The Streamlit app provides a web interface for semantic search over the PubMed document collection.

- Semantic Search - Natural language queries
- ⚡ Fast Search Mode - HNSW approximate search (~95% accurate, 2-5ms)
- 🎯 Exact Search Mode - Brute-force search (100% accurate, 10-50ms)
- Visual Analytics - Score distributions, section analysis
- Search History - Quick access to recent queries
- ⚙️ Configurable Settings - Adjust results, thresholds, and more

Start Qdrant:

    # Using Docker
    docker run -p 6333:6333 qdrant/qdrant
    # Or start local Qdrant if already configured

Start the app:

    # Using uv
    uv run streamlit run app.py
    # Or with regular Python (if venv activated)
    streamlit run app.py

Open your browser to: http://localhost:8501

- Enter your query in the search box. Example: "diabetes prevention strategies"
- Click "Search" or press Enter
- View results with:
  - Relevance scores
  - Abstract content
  - Section labels
  - Timing information

Max Results:
- Slider: 1-50 results
- Default: 10
- Use: Adjust how many documents to retrieve

Score Threshold:
- Slider: 0.0 - 1.0
- Default: 0.0 (show all)
- Use: Filter out low-relevance results
- Tip: Set to 0.5-0.6 for high-quality results only

Exact Search Mode:
- Toggle: Enable/Disable
- Default: Disabled (Fast Mode)
- Fast Mode: ⚡ HNSW approximate search (faster)
- Exact Mode: 🎯 Brute-force search (slower, 100% accurate)
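
To make the effect of the result limit and score threshold concrete, here is a minimal sketch of how the two settings could be applied to a list of scored hits. The `Hit` type and `apply_settings` function are hypothetical names used for illustration only; the app itself may pass these values straight to Qdrant instead of filtering in Python.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical container for one search result."""
    abstract_id: str
    score: float


def apply_settings(hits, max_results=10, score_threshold=0.0):
    """Keep hits at or above the threshold, best first, capped at max_results."""
    kept = [h for h in hits if h.score >= score_threshold]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:max_results]


hits = [Hit("a", 0.91), Hit("b", 0.42), Hit("c", 0.63), Hit("d", 0.55)]
top = apply_settings(hits, max_results=2, score_threshold=0.5)
print([h.abstract_id for h in top])
```

With a threshold of 0.0 (the default) nothing is filtered out, so only the result limit applies.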

⚡ Fast Mode is best for:
- Regular searches
- Interactive exploration
- Quick lookups
- Production use
- Large result sets

Performance:
- Search time: 2-5ms
- Accuracy: ~95-98%
- Scales to millions of documents

Example:

    Query: "HIV treatment"
    ⚡ Fast Mode: 3.2ms | Found 10 results | Score: 0.87

🎯 Exact Mode is best for:
- Critical research
- Quality assurance
- Benchmarking
- Small collections
- Cases where maximum accuracy is needed

Performance:
- Search time: 10-50ms
- Accuracy: 100%
- Slower on large collections

Example:

    Query: "HIV treatment"
    🎯 Exact Mode: 28.5ms | Found 10 results | Score: 0.88
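
As a rough illustration of what "brute force" means here, the sketch below scores a query vector against every stored vector and keeps the best matches. It assumes cosine similarity as the metric, which this guide does not confirm, and the function names and toy vectors are made up for the example. In the app the comparison runs inside Qdrant, not in Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_search(query, vectors, limit=10):
    """Score every stored vector against the query and return the top matches."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
vectors = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [0.7, 0.7]}
results = exact_search([1.0, 0.1], vectors, limit=2)
print(results)
```

Because every vector is compared, the cost grows linearly with collection size, which is why Exact Mode slows down on large collections while an HNSW index does not.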

To enable Exact Mode in the app:
- Look at the left sidebar
- Find the "Search Settings" section
- Check the box: "🎯 Exact Search Mode"
- Run your search

Visual indicator:
- The search spinner shows: "🎯 Exact Search in progress..."
- Results display a blue "🎯 Exact" mode badge
- Compare the timing with Fast Mode

Try these example queries:
- HIV treatment effectiveness
- diabetes prevention strategies
- cancer immunotherapy clinical trials
- COVID-19 vaccine efficacy
- hypertension management
- cardiovascular disease prevention
- antibiotic resistance mechanisms
- mental health interventions
- depression treatment outcomes
- anxiety disorder therapies
- vaccination programs effectiveness
- obesity prevention strategies
- smoking cessation methods

Each result displays:

    🟢 Result 1 - Abstract ID: 12345678 (Score: 0.8534)
    ├─ Relevance Score: 0.8534 (higher = more relevant)
    ├─ Sentences: 15
    ├─ Dataset Split: TRAIN
    ├─ Sections: BACKGROUND, METHODS, RESULTS
    └─ Abstract Content: Full text of the abstract...

| Score Range | Quality | Color |
|---|---|---|
| 0.80 - 1.00 | Excellent match | 🟢 Green |
| 0.50 - 0.79 | Good match | 🟡 Yellow |
| 0.00 - 0.49 | Weak match | 🔴 Red |
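
The score bands in the table can be expressed as a small lookup. This is a sketch of the mapping only; `score_quality` is a hypothetical helper, not a function taken from the app.

```python
def score_quality(score):
    """Map a relevance score to the (quality, color) band from the table."""
    if score >= 0.80:
        return ("Excellent match", "green")
    if score >= 0.50:
        return ("Good match", "yellow")
    return ("Weak match", "red")


for s in (0.8534, 0.61, 0.32):
    print(s, score_quality(s))
```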

Timing is reported as a breakdown:

    ⚡ Total Time: 47.3 ms
    ├─ 🧠 Embedding Time: 43.8 ms (generating query vector)
    └─ Search Time: 3.5 ms (Qdrant search)

Typical Performance:
- Embedding: 40-50ms (first search), ~0ms (cached)
- Fast Search: 2-5ms
- Exact Search: 10-50ms
- Total (Fast): 42-55ms
- Total (Exact): 50-100ms
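
The gap between a first and a cached embedding can be reproduced with a small timing sketch. Here `embed` is a stand-in that sleeps briefly instead of running a real model, and caching via `functools.lru_cache` is an assumption about how such a cache might work, not a description of the app's internals.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=128)
def embed(query):
    """Stand-in for the embedding model: slow on first call, cached afterwards."""
    time.sleep(0.01)  # simulate model inference
    return tuple(float(ord(ch)) for ch in query[:8])


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


_, first_ms = timed(embed, "HIV treatment")
_, cached_ms = timed(embed, "HIV treatment")
print(f"first: {first_ms:.2f} ms, cached: {cached_ms:.4f} ms")
```

The total time shown in the app is the sum of the embedding and search stages, so repeating a query removes most of the cost.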

Results tab:
- Expandable cards for each result
- Full abstract content
- Metadata and labels

Analysis tab:
- Score Statistics: Avg, Max, Min, Std Dev
- Section Distribution: Bar chart of abstract sections
- Helps you understand the composition of the results

Distribution tab:
- Histogram: Distribution of relevance scores
- Score vs Rank: How scores decrease with rank
- Useful for quality assessment
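
The statistics and section counts shown in the analysis views are simple aggregates. The sketch below computes them with the standard library. The result dictionaries and their `score` and `sections` keys are illustrative assumptions, as is the use of the population standard deviation.

```python
import statistics
from collections import Counter


def score_stats(scores):
    """Average, max, min, and (population) standard deviation of the scores."""
    return {
        "avg": statistics.mean(scores),
        "max": max(scores),
        "min": min(scores),
        "std_dev": statistics.pstdev(scores),
    }


def section_counts(results):
    """Count how often each section label appears across all results."""
    return Counter(label for r in results for label in r["sections"])


results = [
    {"score": 0.9, "sections": ["BACKGROUND", "METHODS"]},
    {"score": 0.8, "sections": ["METHODS", "RESULTS"]},
    {"score": 0.7, "sections": ["RESULTS"]},
]
stats = score_stats([r["score"] for r in results])
counts = section_counts(results)
print(stats)
print(counts.most_common())
```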

Click "Connection Settings" in the sidebar:

    Qdrant Host: localhost
    Qdrant Port: 6333
    Collection Name: pubmed_documents

Use when:
- Connecting to a remote Qdrant server
- Using a different collection
- Using a custom port configuration

Search History:
- Location: Bottom of the sidebar
- Capacity: Last 10 searches
- Use: Click any previous query to re-run it
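
A fixed-capacity history like this one maps naturally onto a bounded deque. The `SearchHistory` class below is a hypothetical sketch of that idea. Moving a repeated query back to the front is an assumption, not documented behavior of the app.

```python
from collections import deque


class SearchHistory:
    """Keeps the most recent queries, newest first, up to a fixed capacity."""

    def __init__(self, capacity=10):
        self._items = deque(maxlen=capacity)

    def add(self, query):
        # Re-running an old query moves it to the front instead of duplicating it.
        if query in self._items:
            self._items.remove(query)
        self._items.appendleft(query)  # oldest entry drops off the right when full

    def recent(self):
        return list(self._items)


history = SearchHistory()
for i in range(12):
    history.add(f"query {i}")
print(history.recent())
```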

The sidebar shows:

    ✅ Connected to Qdrant
    Documents: 20,000
    Collection Status: GREEN
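
The same information can be checked outside the app. The sketch below assumes Qdrant's REST endpoint `GET /collections/{name}` and a response whose `result` object carries `points_count` and `status` fields. Verify both against your Qdrant version. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def collection_url(host="localhost", port=6333, name="pubmed_documents"):
    """Build the (assumed) REST URL for collection info."""
    return f"http://{host}:{port}/collections/{name}"


def summarize(payload):
    """Extract the document count and status from a collection-info response."""
    result = payload["result"]
    return {
        "documents": result.get("points_count"),
        "status": str(result.get("status", "")).upper(),
    }


def fetch_collection_info(host="localhost", port=6333,
                          name="pubmed_documents", timeout=5.0):
    """Query a running Qdrant server; raises OSError if it is unreachable."""
    with urlopen(collection_url(host, port, name), timeout=timeout) as resp:
        return summarize(json.load(resp))


# Offline example using a hand-written payload in the assumed response shape.
sample = {"result": {"status": "green", "points_count": 20000}, "status": "ok"}
print(summarize(sample))
```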

- Run a search in Fast Mode:
  - Uncheck "🎯 Exact Search Mode"
  - Search: "machine learning"
  - Note the timing and top results
- Run the same search in Exact Mode:
  - Check "🎯 Exact Search Mode"
  - Search: "machine learning"
  - Compare timing and results

Timing:

    Fast Mode:  Search Time: 3.2ms  | Total: 47.5ms
    Exact Mode: Search Time: 28.1ms | Total: 72.3ms

Results:

    Fast Mode:  Top 5 usually match Exact Mode's top 5
    Exact Mode: Guaranteed best matches

Accuracy:
- Fast: 4-5 out of 5 results match Exact Mode
- Exact: 5 out of 5 (by definition)
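
The "4-5 out of 5" comparison is an overlap measure between the two top-k lists. A minimal sketch, with made-up IDs and a hypothetical `overlap_at_k` helper:

```python
def overlap_at_k(fast_ids, exact_ids, k=5):
    """Fraction of the exact top-k that also appears in the fast top-k."""
    return len(set(fast_ids[:k]) & set(exact_ids[:k])) / k


fast = [101, 102, 103, 104, 106]   # IDs returned by Fast Mode (illustrative)
exact = [101, 102, 103, 104, 105]  # IDs returned by Exact Mode (illustrative)
print(overlap_at_k(fast, exact))
```

An overlap of 0.8 corresponds to 4 of 5 results matching. Ordering within the top k is ignored by this measure.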

Problem: Cannot connect to Qdrant

Solutions:
- Check that Qdrant is running: `docker ps | grep qdrant`
- Verify that port 6333 is open
- Check the connection settings in the sidebar
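
If Docker is not available, a plain TCP check can confirm whether anything is listening on the Qdrant port. This sketch only tests reachability, so it does not prove the listener is Qdrant. `port_is_open` is a hypothetical helper name.

```python
import socket


def port_is_open(host="localhost", port=6333, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Qdrant port reachable:", port_is_open())
```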

Problem: Query returns 0 results

Solutions:
- Lower the score threshold to 0
- Try a different query
- Check that the collection has documents (see the "Documents" count in the sidebar)

Problem: Searches take >1 second

Solutions:
- Use Fast Mode (uncheck Exact Search)
- Check Qdrant server health
- Reduce the number of results
- Check the network connection (if Qdrant is remote)

Problem: "Loading embedding model" takes >30 seconds

Solutions:
- The first load is slow because the model is downloaded
- Subsequent loads use the cached model and are fast
- Check your internet connection for the initial download
- The model is cached in `~/.cache/torch/sentence_transformers/`

For speed:
- Use Fast Mode (default)
- Enable caching (automatic in the app)
- Use lower result limits (10 instead of 50)
- Run Qdrant locally (not remote)

For accuracy:
- Use Exact Mode for critical queries
- Increase the score threshold (filter weak results)
- Review multiple results (not just the top one)
- Try query variations (different phrasings)

| Key | Action |
|---|---|
| Enter | Search (when in query box) |
| Tab | Navigate fields |
| Esc | Clear focus |

    # Default
    streamlit run app.py
    # Custom port
    streamlit run app.py --server.port 8080
    # Listen on all network interfaces
    streamlit run app.py --server.address 0.0.0.0
    # Headless mode
    streamlit run app.py --server.headless true
    # Combined
    streamlit run app.py \
      --server.port 8080 \
      --server.address 0.0.0.0 \
      --server.headless true \
      --theme.base light

- Exploratory Search (Fast Mode):
  - Query: "diabetes treatment"
  - Mode: ⚡ Fast
  - Results: 20
  - Threshold: 0.0
- Refine Search (add threshold):
  - Query: "diabetes type 2 insulin therapy"
  - Mode: ⚡ Fast
  - Results: 10
  - Threshold: 0.6
- Final Validation (Exact Mode):
  - Query: "type 2 diabetes insulin therapy outcomes"
  - Mode: 🎯 Exact
  - Results: 10
  - Threshold: 0.7
- Analyze Results:
  - Review the Analysis tab
  - Check the score distribution
  - Export findings

    ┌─────────────────────────────────────────────────┐
    │ 🔬 PubMed Semantic Search                       │
    ├─────────────────────────────────────────────────┤
    │                                                 │
    │  [Enter your search query...      ]  [Search]   │
    │                                                 │
    ├─────────────────────────────────────────────────┤
    │  ⚡ 47ms  🧠 44ms  Search 3ms  ⚡ Fast          │
    ├─────────────────────────────────────────────────┤
    │                                                 │
    │  Results | Analysis | Distribution              │
    │                                                 │
    │  🟢 Result 1 - Abstract ID: 12345 (0.8534)      │
    │  🟢 Result 2 - Abstract ID: 67890 (0.8421)      │
    │  ...                                            │
    └─────────────────────────────────────────────────┘

    ┌──────────────────────┐
    │ ⚙️ Configuration     │
    ├──────────────────────┤
    │ Search Settings      │
    │                      │
    │ Max Results: [10]    │
    │                      │
    │ Score Threshold:     │
    │ [■■□□□□□□□□] 0.0     │
    │                      │
    │ ☑️ 🎯 Exact Search    │
    │                      │
    ├──────────────────────┤
    │ Collection Info      │
    │                      │
    │ ✅ Connected         │
    │ Documents: 20,000    │
    └──────────────────────┘

A: The two search modes differ as follows:
- Fast Mode: Uses HNSW approximate search. ~95% accurate, 2-5ms search time.
- Exact Mode: Uses brute-force search. 100% accurate, 10-50ms search time.

A: Use Exact Mode for:
- Critical research requiring maximum accuracy
- Benchmarking and quality assurance
- Small collections where the speed difference is minimal
- Cases where you need guaranteed best results

A: The first search is slower because it loads and caches the embedding model (~1-2 seconds). Subsequent searches are fast.

A: Yes, you can compare the two modes directly. Run the same query twice:
- First with Fast Mode (box unchecked)
- Then with Exact Mode (box checked)

Then compare the results and timing.

A: Fast and Exact results usually overlap closely: 4-5 of the top 5 results are the same. Exact Mode is guaranteed to return the true best matches, and Fast Mode comes very close.

A: To speed up searches:
- Use Fast Mode (default)
- Reduce the result limit (10 instead of 50)
- Run Qdrant locally
- Rely on the embedding cache, which is enabled automatically

For issues or questions:
- Check this guide
- Review the troubleshooting section
- Check the Qdrant connection
- Verify that the collection has documents

Happy Searching! 🔬