Complete guide to using the Clinical Intelligence RAG REST API.
python main.py
The API will run on http://0.0.0.0:8000
Once the server is running, visit http://localhost:8000/docs to access the interactive Swagger UI where you can test all endpoints directly.
GET /
Simple health check endpoint to verify the API is running.
Response:
{
"status": "healthy"
}
POST /query
Submit a clinical question and receive AI-powered responses using the RAG pipeline.
Request Body:
{
"question": "What is the patient's oxygen saturation and the doctor's plan?"
}
Response:
{
"question": "What is the patient's oxygen saturation and the doctor's plan?",
"answer": "The patient's oxygen saturation is 98% on room air. The doctor's plan is to monitor oxygen levels during activity and provide supplemental oxygen if saturation drops below 95%.",
"context_sources": [
{
"source": "document_name.pdf",
"page": 2,
"relevance_score": 0.98
}
]
}
import requests
api_url = "http://localhost:8000/query"
payload = {
"question": "What medications is the patient taking?"
}
response = requests.post(api_url, json=payload)
result = response.json()
print(result["answer"])
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"question": "What is the patient'\''s current diagnosis?"
}' \
| jq .
const response = await fetch('http://localhost:8000/query', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: "What is the patient's oxygen saturation?"
})
});
const data = await response.json();
console.log(data.answer);
The RAG pipeline processes your query as follows:
- Semantic Search - Your question is embedded and compared against the Pinecone vector database
- Context Retrieval - The top 4 most relevant clinical document chunks are retrieved
- Contextual Compression - Retrieved chunks are re-ranked and noisy segments are filtered out
- LLM Processing - The question + context are sent to your selected LLM provider
- Response Generation - The LLM generates a clinical response grounded in the retrieved context
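The steps above can be illustrated with a small, self-contained sketch. It is not the project's implementation: word-count vectors stand in for the real embedding model, an in-memory list stands in for Pinecone, and the `min_score` threshold and all function names are assumptions made for illustration. The final LLM call is left out.

```python
import math
import re
from collections import Counter

TOP_K = 4  # the guide states that the top 4 chunks are retrieved


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, k: int = TOP_K) -> list:
    """Semantic search + retrieval: score every chunk and keep the k best."""
    q = embed(question)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def compress(scored: list, min_score: float = 0.1) -> list:
    """Compression: drop low-scoring chunks (the threshold is an arbitrary assumption)."""
    return [chunk for score, chunk in scored if score >= min_score]


def build_prompt(question: str, context: list) -> str:
    """LLM input: combine the question and the surviving context chunks."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "Oxygen saturation is 98% on room air.",
    "Plan: monitor oxygen levels during activity.",
    "Cafeteria opens at 8 am.",
]
question = "What is the patient's oxygen saturation?"
context = compress(retrieve(question, chunks))
prompt = build_prompt(question, context)
print(prompt)
```

In this toy run the unrelated cafeteria chunk shares no words with the question, so the compression step removes it before the prompt is built.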
The API automatically uses the LLM provider specified by LLM_PROVIDER in your .env file. Set it to one of the following values:
LLM_PROVIDER=ANTHROPIC # Default to Claude 3.5 Sonnet
LLM_PROVIDER=OPENAI # Use GPT-4o
LLM_PROVIDER=BEDROCK # Use AWS Bedrock (Claude Sonnet)
- For extraction tasks: Use OpenAI's GPT-4o-mini (cheapest)
- For complex reasoning: Use Claude 3.5 Sonnet (best quality)
- For HIPAA compliance: Use AWS Bedrock (compliant deployment)
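As a rough sketch of how a setting like `LLM_PROVIDER` might be read and validated, the helper below accepts only the three values listed above. The function name `resolve_provider`, the case-insensitive matching, and the fallback to `ANTHROPIC` when the variable is unset are assumptions; the project's actual loader may behave differently.

```python
import os
from typing import Mapping, Optional

# Labels copied from the guide's .env comments; they are descriptions, not model IDs.
PROVIDERS = {
    "ANTHROPIC": "Claude 3.5 Sonnet",
    "OPENAI": "GPT-4o",
    "BEDROCK": "AWS Bedrock (Claude Sonnet)",
}
DEFAULT_PROVIDER = "ANTHROPIC"  # assumed fallback when LLM_PROVIDER is unset


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the validated provider name from env (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("LLM_PROVIDER", DEFAULT_PROVIDER).strip().upper()
    if raw not in PROVIDERS:
        raise ValueError(
            f"Unsupported LLM_PROVIDER {raw!r}; expected one of {sorted(PROVIDERS)}"
        )
    return raw


provider = resolve_provider({"LLM_PROVIDER": "openai"})
print(provider, "->", PROVIDERS[provider])
```

Failing fast on an unknown value surfaces a typo in .env at startup rather than on the first query.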
Modify the number of context chunks retrieved:
# In core/orchestrator.py
docs = vdb.similarity_search(prompt, k=3) # Lower k = faster, less context
docs = vdb.similarity_search(prompt, k=10) # Higher k = slower, more context
# Enable compression (default)
docs = vdb.similarity_search(prompt, enable_compression=True)
# Disable for testing
docs = vdb.similarity_search(prompt, enable_compression=False)
400 Bad Request:
{
"detail": "Missing required field: 'question'"
}
500 Internal Server Error:
{
"detail": "Failed to connect to Pinecone vector database"
}
Check your .env configuration and API keys if you encounter errors.
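A client may want to separate successful answers from the error bodies shown above. The sketch below assumes that error responses always carry a `detail` field and that successful responses follow the `/query` schema documented earlier; `QueryError`, `parse_query_response`, and `format_sources` are hypothetical names, not part of the API.

```python
class QueryError(Exception):
    """Raised when the API answers with a non-2xx status (hypothetical helper type)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def parse_query_response(status_code: int, body: dict) -> dict:
    """Return the parsed body on success, raise QueryError otherwise."""
    if 200 <= status_code < 300:
        return body
    # Error bodies in this guide carry a "detail" field; fall back if it is absent.
    raise QueryError(status_code, str(body.get("detail", "unknown error")))


def format_sources(body: dict) -> list:
    """Render each context source as 'name p.N (score)'."""
    return [
        f"{src['source']} p.{src['page']} ({src['relevance_score']:.2f})"
        for src in body.get("context_sources", [])
    ]


# Example with the response shape documented for POST /query.
sample = {
    "question": "What is the patient's oxygen saturation?",
    "answer": "The patient's oxygen saturation is 98% on room air.",
    "context_sources": [
        {"source": "document_name.pdf", "page": 2, "relevance_score": 0.98}
    ],
}
body = parse_query_response(200, sample)
print(body["answer"])
print(format_sources(body))
```

With the `requests` client used earlier in this guide, the call would look like `parse_query_response(response.status_code, response.json())`, wrapped in a `try`/`except QueryError` block.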
For production deployment with Docker:
docker-compose up --build
See DOCKER.md for detailed containerization instructions.
Enable LangChain tracing for detailed pipeline monitoring:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your-langsmith-key
python main.py
Traces will be sent to LangSmith for analysis.
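If traces do not show up, a quick check that both variables are visible to the Python process can help. The helper below is a hypothetical convenience, not part of the project or of LangChain; it only inspects the two environment variables named above.

```python
import os
from typing import Mapping, Optional


def tracing_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when tracing is switched on and an API key is present."""
    env = os.environ if env is None else env
    enabled = env.get("LANGCHAIN_TRACING_V2", "").strip().lower() == "true"
    has_key = bool(env.get("LANGCHAIN_API_KEY", "").strip())
    return enabled and has_key


print("Tracing configured:", tracing_configured())
```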
- UI Usage: See UI.md
- Advanced Features: See FEATURES.md
- Setup Guide: See SETUP.md