Capabilities:
- Smart query routing (decides when to search vs. answer directly)
- Real-time web search with Tavily
- Streaming responses
- Source attribution
- REST API with FastAPI
- Conversation memory (maintains context across queries within the same session)
NOTE: Based on Anthropic claude-3-haiku-20240307 because it is cheap and I know it well, which makes it a good fit for prototyping.
- Create a .env file:
cp .env_example .env
- Edit .env and add your API keys:
ANTHROPIC_API_KEY=your_anthropic_api_key
TAVILY_API_KEY=your_tavily_api_key
- Build and run with Docker Compose:
docker-compose up --build
The app will be available at http://localhost:8000
- Install dependencies:
uv sync
- Create a .env file:
cp .env_example .env
- Edit .env and add your API keys
- Run the API:
uv run uvicorn app.main:app --reload
You can use the notebook to test the agent and the app. See experiments/playground.ipynb
Endpoint: POST http://localhost:8000/query
Request Schema:
{
"query": "string",
"thread_id": "string" // Optional, defaults to "default". Use to maintain conversation context.
}
Request Headers:
- Content-Type: application/json (required)
Response:
- Content-Type: text/plain
- Format: Streaming text response with metadata suffix
The response streams the agent's answer followed by metadata indicating whether a search was performed and any source URLs used.
Response Format:
<agent_response_text>
___METADATA___:{"searched": boolean, "sources": ["url1", "url2", ...]}
Response Fields:
- searched (boolean): Indicates whether the agent performed a web search
- sources (array of strings): List of source URLs referenced in the response (only populated when searched is true)
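Because the metadata arrives as a suffix of the plain-text body, a client has to split it off itself. The following is a minimal sketch of one way to do that in Python, not code from this repo. The helper name `split_response` and the fallback metadata returned when the marker is missing are assumptions made for illustration.

```python
import json

# Marker that separates the answer text from the JSON metadata (from the format above).
METADATA_MARKER = "___METADATA___:"


def split_response(body: str) -> tuple[str, dict]:
    """Split a fully received response body into (answer_text, metadata)."""
    # Split on the LAST marker so marker-like text inside the answer is left alone.
    answer, sep, raw_meta = body.rpartition(METADATA_MARKER)
    if not sep:
        # Assumption: if no suffix is present (e.g. the stream was cut off),
        # return the text unchanged with empty metadata.
        return body.strip(), {"searched": False, "sources": []}
    return answer.strip(), json.loads(raw_meta.strip())


# Made-up sample body, only to show the shape of the data.
sample = 'Sample answer text.\n___METADATA___:{"searched": true, "sources": ["https://example.com"]}'
answer, meta = split_response(sample)
print(answer)
print(meta["searched"], meta["sources"])
```

The helper expects the complete body. When consuming the stream chunk by chunk, buffer the text until the connection closes before calling it, since the marker may be split across chunks.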
Behavior:
- The agent uses smart query routing to determine if a web search is needed
- Queries requiring current data (e.g., exchange rates, stock prices) trigger a search; otherwise the agent answers directly without searching
Examples:
Query requiring web search (current data):
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is the current EUR/USD exchange rate?"}'
Query answered directly (general knowledge):
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is diversification in portfolio management?"}'
Conversation Memory Example:
The agent maintains conversation context within the same thread_id. Use this to have multi-turn conversations:
# First query - introduce yourself and ask a question
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "My name is Nicholas. I want to know what is the current EUR/USD exchange rate today?", "thread_id": "test_id"}'
# Second query - the agent remembers your name from the previous query
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is my name?", "thread_id": "test_id"}'
# Response: "Your name is Nicholas."
NOTE: Conversation memory is kept in process memory only. It is lost when the server restarts, and each thread_id has its own separate history.
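The same two-turn conversation can be driven from Python. This is a rough sketch using only the standard library, assuming the server is running locally on port 8000 as described above. The helper names `build_request`, `ask`, and `demo` are hypothetical and not part of the app.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/query"


def build_request(query: str, thread_id: str = "default") -> urllib.request.Request:
    """Build the POST request; thread_id selects which conversation to continue."""
    payload = json.dumps({"query": query, "thread_id": thread_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, thread_id: str = "default") -> str:
    """Send one query and return the whole body (answer text plus metadata suffix)."""
    # For simplicity this waits for the complete body instead of printing chunks as they stream in.
    with urllib.request.urlopen(build_request(query, thread_id)) as resp:
        return resp.read().decode("utf-8")


def demo() -> None:
    """Two turns in one thread; needs the API running locally."""
    print(ask("My name is Nicholas.", thread_id="test_id"))
    print(ask("What is my name?", thread_id="test_id"))
```

Calling `demo()` with a different `thread_id` on the second turn should start from an empty history, since context is tracked per thread.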
Error Responses:
- 503 Service Unavailable - Agent not initialized (wait a moment and retry)
- 422 Unprocessable Entity - Invalid request format or query validation failed (e.g., empty query, query too long)
- 500 Internal Server Error - Processing error
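Since a 503 only means the agent is still starting, a client can retry it and treat the other errors as final. The following is a sketch of that policy, not behavior the API mandates. The function `post_with_retry`, the exponential backoff delays, and the assumption that success is reported as HTTP 200 are all illustrative choices. The `send` callable stands in for whatever HTTP client you use and should return a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds, retrying only on 503 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:  # assumption: success is reported as 200
            return body
        if status == 503 and attempt < max_attempts:
            # Agent not initialized yet: wait 0.5s, 1s, 2s, ... and try again.
            sleep(base_delay * 2 ** (attempt - 1))
            continue
        # 422 and 500 (or a final 503) are not retried: the same request would fail again.
        raise RuntimeError(f"request failed with HTTP {status}: {body}")
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.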
For the full API reference, open the interactive docs at http://localhost:8000/docs