Document Version: 1.0
Last Updated: 2025-10-12
Status: Production Ready
AI-CoScientist requires a strategic, boundary-crossing literature monitoring system that captures cutting-edge research at the intersection of multiple disciplines. This document defines the complete monitoring strategy designed to:
- ✅ Capture interdisciplinary research across brain imaging, data science, psychology, machine learning, foundation models, and AI for science
- ✅ Target top-tier conferences: NeurIPS, ICLR, ICML, MICCAI
- ✅ Optimize efficiency: 400-600 high-quality papers/month (not thousands of irrelevant papers)
- ✅ Enable scientific discovery: Papers that inspire and inform AI-CoScientist development
Problem: Generic monitoring (e.g., all cs.AI papers) produces:
- 🔴 Volume overload: 200+ papers/day, 95% irrelevant
- 🔴 Low signal-to-noise: Gaming AI, robotics swamp neuroscience papers
- 🔴 Missing conferences: ArXiv/PubMed don't directly filter by conference
Solution: 8 targeted sources + 4 precision alerts = 2-stage filtering
Stage 1: Broad Collection (8 sources)
↓
ArXiv: 4 strategic category combinations
PubMed: 4 precision MeSH queries
↓
~100-150 papers/day collected
Stage 2: Precision Filtering (4 alerts)
↓
Keyword-based boundary detection
Conference paper identification
↓
~10-20 papers/day surface to researchers
↓
Final: 400-600 papers/month for review
Key Insight: NeurIPS/ICLR/ICML papers typically appear on ArXiv as preprints before conference acceptance. We capture them through strategic category + keyword combinations.
Configuration:
{
"source_type": "arxiv",
"category": "cs.LG,cs.AI,stat.ML",
"sync_frequency": "daily"
}
Strategy:
- Target: NeurIPS, ICLR, ICML preprints
- Coverage: 90%+ of accepted papers appear here first
- Volume: ~50-100 papers/day
- Filter: Alert keywords narrow to neuroscience/brain imaging papers
Why this works:
- ML conference submissions → ArXiv preprint (typical workflow)
- cs.LG = Machine Learning (ICML, NeurIPS core)
- cs.AI = Artificial Intelligence (ICLR, NeurIPS applications)
- stat.ML = Statistical ML (theory papers)
Expected papers:
- "Decoding visual cortex with transformers" (NeurIPS 2024)
- "Foundation models for fMRI analysis" (ICLR 2024)
- "Causal inference in neuroscience with ML" (ICML 2024)
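The mapping from the comma-separated `category` value to an actual ArXiv request is not spelled out here. The following is a minimal sketch, assuming the public ArXiv API endpoint and its `cat:` search prefix; the helper name `build_arxiv_query_url` and the `max_results` default are illustrative and not part of AI-CoScientist.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query_url(category: str, max_results: int = 100) -> str:
    """Turn a comma-separated category string into an ArXiv API URL.

    Categories are OR-ed together and results are sorted newest first.
    Hypothetical helper: the real sync code may build its request differently.
    """
    cats = [c.strip() for c in category.split(",") if c.strip()]
    if not cats:
        raise ValueError("at least one category is required")
    params = {
        "search_query": " OR ".join(f"cat:{c}" for c in cats),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query_url("cs.LG,cs.AI,stat.ML")
```

Sorting by submission date keeps a daily sync cheap, since only the newest entries need to be compared against what is already stored.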
Configuration:
{
"source_type": "arxiv",
"category": "q-bio.NC,cs.NE,q-bio.QM",
"sync_frequency": "daily"
}
Strategy:
- Target: Direct neuroscience + computation intersection
- Volume: ~10-20 papers/day (small, high-quality)
- Filter: Already highly relevant, minimal filtering needed
Why this works:
- q-bio.NC = Neurons and Cognition (neuroscience core)
- cs.NE = Neural and Evolutionary Computing
- q-bio.QM = Quantitative Methods (computational biology)
Expected papers:
- "Neural population dynamics modeling"
- "Brain connectivity analysis with graph neural networks"
- "Computational models of cognition"
Conference relevance: NeurIPS Computational Neuroscience track, COSYNE
Configuration:
{
"source_type": "arxiv",
"category": "cs.CV,eess.IV,physics.med-ph",
"sync_frequency": "weekly"
}
Strategy:
- Target: MICCAI-style research (brain imaging + deep learning)
- Volume: ~20-30 papers/week
- Filter: "brain", "neuroimaging", "fMRI" keywords
Why this works:
- cs.CV = Computer Vision (image analysis methods)
- eess.IV = Image and Video Processing (medical imaging)
- physics.med-ph = Medical Physics (imaging physics)
Expected papers:
- "3D brain segmentation with transformers" (MICCAI)
- "Self-supervised learning for fMRI" (MICCAI/NeurIPS)
- "Multimodal neuroimaging fusion" (MICCAI)
Configuration:
{
"source_type": "arxiv",
"category": "cs.AI,cs.CL,cs.HC",
"sync_frequency": "weekly"
}
Strategy:
- Target: AI for scientific discovery papers
- Volume: ~15-25 papers/week
- Filter: "scientific discovery", "hypothesis generation", "automated experiment"
Why this works:
- cs.AI = AI applications (broad)
- cs.CL = NLP/LLMs (scientific text mining)
- cs.HC = Human-Computer Interaction (research tools)
Expected papers:
- "LLMs for hypothesis generation" (NeurIPS AI4Science)
- "Automated experimental design" (ICML)
- "Self-driving laboratories" (Science Robotics)
Direct relevance: AI-CoScientist competitors and inspiration
Configuration:
{
"source_type": "pubmed",
"query": "(Brain Mapping[MeSH] OR Neuroimaging[MeSH] OR Magnetic Resonance Imaging[MeSH]) "
"AND (Machine Learning[MeSH] OR Deep Learning[Title/Abstract] OR Neural Networks, Computer[MeSH])",
"sync_frequency": "weekly"
}
Strategy:
- Target: Published brain imaging + ML research
- Volume: ~15-25 papers/week
- Journals: NeuroImage, Nature Neuroscience, Brain, PNAS
Why this works:
- MeSH terms = precision (far fewer false positives, e.g. biological "neural network" papers that free-text search would match)
- Brain Mapping/Neuroimaging = core techniques
- Machine Learning[MeSH] = properly tagged ML papers
Expected papers:
- "Deep learning for Alzheimer's prediction from MRI"
- "Transformer-based fMRI decoding"
- "Brain-age prediction with neural networks"
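To show how a MeSH query like the one above could be sent to PubMed, here is a minimal sketch that builds an NCBI E-utilities ESearch URL restricted to recently added records. The helper name `build_pubmed_search_url` and the `days_back` and `retmax` defaults are assumptions for illustration; the actual sync implementation is not described in this document.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query text copied from the configuration above.
BRAIN_IMAGING_ML_QUERY = (
    "(Brain Mapping[MeSH] OR Neuroimaging[MeSH] OR Magnetic Resonance Imaging[MeSH]) "
    "AND (Machine Learning[MeSH] OR Deep Learning[Title/Abstract] "
    "OR Neural Networks, Computer[MeSH])"
)


def build_pubmed_search_url(query: str, days_back: int = 7, retmax: int = 100) -> str:
    """Build an ESearch URL for records added in the last `days_back` days.

    Hypothetical helper; a weekly sync is assumed to look back 7 days.
    """
    params = {
        "db": "pubmed",
        "term": query,
        "reldate": days_back,  # relative date window in days
        "datetype": "edat",    # filter on the date the record entered PubMed
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_pubmed_search_url(BRAIN_IMAGING_ML_QUERY)
```

ESearch returns only PubMed IDs, so a second request (for example EFetch or ESummary) would still be needed to retrieve titles and abstracts.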
Configuration:
{
"source_type": "pubmed",
"query": "(Mental Disorders[MeSH] OR Psychiatry[MeSH] OR Psychology[MeSH]) "
"AND (Machine Learning[MeSH] OR Computational Biology[MeSH] OR Data Science[Title/Abstract]) "
"AND (Neuroimaging[MeSH] OR Brain[MeSH])",
"sync_frequency": "weekly"
}
Strategy:
- Target: Psychology + biological psychology + data science + brain imaging
- Volume: ~10-15 papers/week
- Journals: Biological Psychiatry, JAMA Psychiatry, Molecular Psychiatry
Why this works:
- Captures boundary-crossing research (4 disciplines)
- Mental disorders + ML + brain imaging = computational psychiatry core
- High clinical impact (translational research)
Expected papers:
- "Depression prediction from resting-state fMRI"
- "Computational models of psychiatric disorders"
- "Digital phenotyping with ML"
Configuration:
{
"source_type": "pubmed",
"query": "(Artificial Intelligence[MeSH] OR Deep Learning[Title/Abstract]) "
"AND (Biomedical Research[MeSH] OR Drug Discovery[MeSH] OR Precision Medicine[MeSH])",
"sync_frequency": "weekly"
}
Strategy:
- Target: AI applications in real scientific discovery
- Volume: ~20-30 papers/week
- Journals: Cell, Nature Medicine, Science Translational Medicine
Why this works:
- Captures AI actually contributing to science (not just methods)
- Drug discovery, precision medicine = concrete scientific impact
- Inspiration for AI-CoScientist's experimental design module
Expected papers:
- "AlphaFold applications in drug discovery"
- "AI-designed experiments in cancer research"
- "Precision medicine with machine learning"
Configuration:
{
"source_type": "pubmed",
"query": "(Cognition[MeSH] OR Cognitive Science[Title/Abstract]) "
"AND (Neural Networks, Computer[MeSH] OR large language model[Title/Abstract] "
"OR foundation model[Title/Abstract] OR transformer[Title/Abstract])",
"sync_frequency": "weekly"
}
Strategy:
- Target: Cutting-edge cognitive science + LLMs/transformers
- Volume: ~5-10 papers/week (emerging field)
- Journals: Nature, Science, PNAS, Trends in Cognitive Sciences
Why this works:
- Most boundary-crossing: cognition + foundation models
- Brain-inspired AI ↔ AI-informed neuroscience
- Recent explosion of interest (ChatGPT era)
Expected papers:
- "LLMs as models of human cognition"
- "Brain-inspired transformers"
- "Cognitive architectures using foundation models"
Configuration:
{
"topic": "Brain Decoding + Foundation Models",
"keywords": [
"brain decoding", "neural decoding", "fMRI",
"transformer", "foundation model", "large language model", "CLIP",
"visual cortex", "neural representation", "encoding model",
"shared embedding", "cross-modal"
],
"frequency": "daily"
}
Strategy: Captures the hottest interdisciplinary topic
- NeurIPS/ICLR papers using CLIP/transformers for brain decoding
- Examples: "Mind-reading with transformers", "CLIP for fMRI"
- High citation potential, boundary-crossing
Configuration:
{
"topic": "AI for Scientific Discovery",
"keywords": [
"automated experiment", "hypothesis generation", "scientific discovery",
"research automation", "experimental design", "active learning",
"bayesian optimization", "self-driving lab", "robot scientist",
"literature mining", "knowledge graph"
],
"frequency": "daily"
}
Strategy: Direct AI-CoScientist relevance
- Competitors: A-Lab, self-driving laboratories
- Inspiration: hypothesis generation, experiment design
- ICML/NeurIPS AI4Science track
Configuration:
{
"topic": "Computational Psychiatry",
"keywords": [
"computational psychiatry", "mental disorder prediction",
"depression", "anxiety", "schizophrenia", "ADHD",
"resting-state fMRI", "functional connectivity",
"predictive model", "biomarker", "precision psychiatry"
],
"frequency": "weekly"
}
Strategy: Clinical impact + interdisciplinary
- Psychology + biology + ML + brain imaging
- Translational research (high citation)
- Biological Psychiatry papers
Configuration:
{
"topic": "Multimodal Neuroimaging + AI",
"keywords": [
"multimodal", "cross-modal", "fusion", "integration",
"fMRI", "EEG", "MEG", "PET", "DTI",
"vision-language", "contrastive learning", "self-supervised",
"MICCAI", "medical image analysis"
],
"frequency": "daily"
}
Strategy: MICCAI + NeurIPS intersection
- Modern ML techniques (contrastive learning, SSL)
- Applied to multimodal brain data
- Foundation model influence on medical imaging
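The alert configurations above only list keywords, so the matching rule itself is left open. The sketch below shows one plausible interpretation: case-insensitive whole-phrase matching against title and abstract, with an alert firing once a minimum number of distinct keywords is found. The `Alert` class, the function names, and the `min_hits=2` threshold are assumptions for illustration, not the documented behaviour.

```python
import re
from dataclasses import dataclass


@dataclass
class Alert:
    topic: str
    keywords: list


def matched_keywords(text: str, alert: Alert) -> list:
    """Return the alert keywords that occur in `text` as whole words or phrases."""
    hits = []
    for keyword in alert.keywords:
        # Lookarounds stop "CLIP" from matching inside "clipping".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


def triggered_alerts(title: str, abstract: str, alerts: list, min_hits: int = 2) -> dict:
    """Map each triggered alert topic to the keywords that fired it.

    Requiring two or more hits (an assumed threshold) approximates
    boundary detection: one neuroscience term plus one ML term.
    """
    text = f"{title} {abstract}"
    result = {}
    for alert in alerts:
        hits = matched_keywords(text, alert)
        if len(hits) >= min_hits:
            result[alert.topic] = hits
    return result
```

For example, a paper titled "CLIP for fMRI" whose abstract mentions a transformer would fire the "Brain Decoding + Foundation Models" alert, while a paper that only mentions "gradient clipping" would not.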
Daily (2 daily sources + alerts):
- Core ML: 50-100 papers → 5-10 relevant (alert filtering)
- Computational Neuroscience: 10-20 papers → 8-15 relevant (high signal)
- Daily total: 13-25 papers
Weekly (6 weekly sources):
- Medical Imaging: 20-30 papers → 10-15 relevant
- AI for Science: 15-25 papers → 8-12 relevant
- 4 PubMed sources: 50-80 papers → 30-50 relevant
- Weekly total: 48-77 papers
Monthly:
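- Daily: 13-25 × 30 = 390-750 papers
- Weekly: 48-77 × 4 = 192-308 papers
- Raw sum: 582-1058 papers
- Monthly total: ~400-600 highly relevant papers (target; the raw sum above exceeds it, so per-source estimates or alert thresholds need tuning to land in this range)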
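The volume arithmetic above can be checked with a few lines of code. This is a minimal sketch using the per-source "relevant" ranges listed in this section, with the same 30 days and 4 weeks per month; the variable and function names are illustrative.

```python
DAYS_PER_MONTH = 30
WEEKS_PER_MONTH = 4

# (low, high) relevant papers per sync, taken from the estimates above.
daily_relevant = {
    "Core ML": (5, 10),
    "Computational Neuroscience": (8, 15),
}
weekly_relevant = {
    "Medical Imaging": (10, 15),
    "AI for Science": (8, 12),
    "PubMed (4 sources)": (30, 50),
}


def total(ranges: dict) -> tuple:
    """Sum the low ends and the high ends of a set of ranges."""
    lows, highs = zip(*ranges.values())
    return sum(lows), sum(highs)


def monthly_estimate() -> tuple:
    """Combine the daily and weekly totals into one monthly (low, high) range."""
    d_low, d_high = total(daily_relevant)
    w_low, w_high = total(weekly_relevant)
    return (
        d_low * DAYS_PER_MONTH + w_low * WEEKS_PER_MONTH,
        d_high * DAYS_PER_MONTH + w_high * WEEKS_PER_MONTH,
    )
```

Under these inputs the raw monthly sum comes out above the ~400-600 target, which suggests the target assumes tighter alert filtering or lower per-source yields than the ranges listed.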
Expected conference paper capture rate:
| Conference | Papers/Year | ArXiv Rate | Expected Capture |
|---|---|---|---|
| NeurIPS | ~3000 | 95% | ~100-150 relevant |
| ICLR | ~2000 | 98% | ~80-120 relevant |
| ICML | ~2500 | 90% | ~90-130 relevant |
| MICCAI | ~500 | 70% | ~30-50 relevant |
| Total | ~8000 | ~90% | ~300-450 papers |
Coverage quality:
- ✅ High precision: Only boundary-crossing papers
- ✅ Early access: Preprints before conference (3-6 months early)
- ✅ Complete: Abstracts, authors, PDFs available
Expected paper quality:
- Citation potential: High (interdisciplinary papers cite more)
- Journal tier: Top 10% journals (Nature, Science, Cell, PNAS)
- Relevance score: >80% directly applicable to AI-CoScientist
Boundary-crossing verification:
- ✅ 2+ disciplines: Every paper spans multiple fields
- ✅ 4 conference targets: NeurIPS/ICLR/ICML/MICCAI covered
- ✅ AI-CoScientist alignment: Direct inspiration for features
# 1. Ensure services running
poetry run alembic upgrade head
poetry run celery -A src.core.celery_app worker --loglevel=info &
poetry run celery -A src.core.celery_app beat --loglevel=info &
poetry run uvicorn src.main:app --reload &
# 2. Run setup script
python scripts/setup_strategic_monitoring.py
# 3. Verify
curl http://localhost:8000/api/v1/monitoring/sources | jq
# 4. Trigger first sync
curl -X POST http://localhost:8000/api/v1/monitoring/sync/all
# Check source status
curl http://localhost:8000/api/v1/monitoring/sources | \
jq '.[] | {id, type: .source_type, status, last_sync: .last_sync_time}'
# Check alerts
curl http://localhost:8000/api/v1/monitoring/alerts | \
jq '.[] | {topic, active, keywords: (.keywords | length)}'
# View statistics
curl http://localhost:8000/api/v1/monitoring/sources/{source_id}/statistics
Conference submission seasons (increase frequency):
- March-May: NeurIPS submission → daily → hourly
- September-October: ICLR submission → daily → hourly
- January-February: ICML submission → daily → hourly
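The seasonal schedule above can be turned into a small lookup that a scheduler could consult before each sync. This is a minimal sketch: the month sets are read directly from the list above, while the function names and the idea of switching automatically by calendar month are assumptions, not an existing AI-CoScientist feature.

```python
from datetime import date

# Months (1-12) derived from the submission seasons listed above.
SUBMISSION_SEASONS = {
    "NeurIPS": {3, 4, 5},
    "ICLR": {9, 10},
    "ICML": {1, 2},
}


def conferences_in_season(today: date) -> list:
    """Return the conferences whose submission season includes `today`."""
    return sorted(
        name for name, months in SUBMISSION_SEASONS.items() if today.month in months
    )


def sync_frequency_for(today: date, normal: str = "daily") -> str:
    """Pick "hourly" during any submission season, else the normal frequency."""
    return "hourly" if conferences_in_season(today) else normal
```

The returned value matches the `sync_frequency` strings used in the source configurations, so it could be passed straight to the source update endpoint.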
Dynamic adjustment:
# Increase during submission season
curl -X PATCH http://localhost:8000/api/v1/monitoring/sources/{id} \
-H "Content-Type: application/json" \
-d '{"sync_frequency": "hourly"}'
# Return to normal after deadline
curl -X PATCH http://localhost:8000/api/v1/monitoring/sources/{id} \
-H "Content-Type: application/json" \
-d '{"sync_frequency": "daily"}'
Iterative improvement:
- Review papers surfaced by alerts
- Identify false positives (irrelevant papers matching keywords)
- Add exclusion terms or refine keywords
- Update alerts via API
Example refinement:
# Too broad: "neural network"
# Better: "neural network" + "brain" (requires both)
# Best: "neural decoding" OR "brain-computer interface"
- ✅ 400-600 papers/month collected
- ✅ >80% relevance rate (papers reviewed vs. collected)
- ✅ >90% conference coverage (NeurIPS/ICLR/ICML/MICCAI preprints)
- ✅ <5% duplicates (same paper from multiple sources)
- ✅ <24h latency (ArXiv publication → AI-CoScientist database)
- ✅ Boundary-crossing: Every paper spans 2+ disciplines
- ✅ Conference quality: >50% from target conferences
- ✅ Inspiration rate: >10 papers/month directly inform AI-CoScientist features
- ✅ Discovery rate: >5 papers/month reveal new research directions
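The duplicate metric above needs some way to recognise the same paper arriving from both ArXiv and PubMed. The following is a minimal sketch, assuming papers are plain dicts with a `title` and an optional `doi` field (hypothetical field names); it treats two records as duplicates when either their DOI or their normalized title matches.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def paper_keys(paper: dict) -> set:
    """Identity keys for a paper: its normalized title and, if present, its DOI."""
    keys = set()
    title = normalize_title(paper.get("title", ""))
    if title:
        keys.add("title:" + title)
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        keys.add("doi:" + doi)
    return keys


def deduplicate(papers: list) -> tuple:
    """Keep the first occurrence of each paper; return (unique, duplicate_rate)."""
    seen = set()
    unique = []
    for paper in papers:
        keys = paper_keys(paper)
        if not (keys & seen):
            unique.append(paper)
        seen |= keys
    duplicate_rate = 1 - len(unique) / len(papers) if papers else 0.0
    return unique, duplicate_rate
```

Title matching is deliberately loose, since a preprint usually has no DOI; a stricter variant could also compare author lists before merging.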
Motivation: Direct access to ICLR/NeurIPS accepted papers
Implementation:
# New source type: openreview
{
"source_type": "openreview",
"venue": "ICLR.cc/2024/Conference",
"decision": "Accept",
"keywords": ["neuroscience", "brain", "cognitive"]
}
Benefits:
- ✅ 100% conference coverage (no preprint reliance)
- ✅ Review scores available
- ✅ Author rebuttals and discussion
Motivation: Find influential papers by tracking citations
Implementation:
- Integrate Semantic Scholar API
- Track citation counts monthly
- Identify "rising stars" (rapid citation growth)
- Build citation graph for related work discovery
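The "rising stars" idea above can be sketched independently of any particular citation API. The example below assumes that monthly citation-count snapshots have already been stored per paper; the function names and the `min_growth` threshold of 5 citations per month are illustrative assumptions.

```python
def citation_growth(monthly_counts: list) -> float:
    """Average citations gained per month across the stored snapshots."""
    if len(monthly_counts) < 2:
        return 0.0
    return (monthly_counts[-1] - monthly_counts[0]) / (len(monthly_counts) - 1)


def rising_stars(history: dict, min_growth: float = 5.0) -> list:
    """Return paper IDs whose growth meets the threshold, fastest first.

    `history` maps a paper ID to its list of monthly citation counts,
    oldest snapshot first.
    """
    growth = {paper_id: citation_growth(counts) for paper_id, counts in history.items()}
    selected = [paper_id for paper_id, g in growth.items() if g >= min_growth]
    return sorted(selected, key=lambda paper_id: growth[paper_id], reverse=True)
```

Ranking on growth rather than absolute count keeps well-established papers with flat citation curves out of the list.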
Motivation: Prioritize highest-impact papers
Scoring factors:
- Author h-index (h-index > 50 = high priority)
- Venue prestige (Nature/Science = 10x weight)
- Citation velocity (citations/month since publication)
- Relevance score (LLM-based abstract similarity)
Output: Ranked list of papers to review first
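One way to combine the scoring factors above into a single ranking is sketched below. Only the h-index cutoff of 50 and the 10x venue weight come from the list above; the `Paper` fields, the 2x h-index boost, and the way velocity and relevance are multiplied together are assumptions chosen for illustration.

```python
from dataclasses import dataclass

TOP_VENUES = {"nature", "science"}  # venues given the 10x weight above


@dataclass
class Paper:
    title: str
    max_author_h_index: int
    venue: str
    citations: int
    months_since_publication: float
    relevance: float  # 0..1, e.g. from an LLM-based abstract similarity step


def priority_score(paper: Paper) -> float:
    """Combine relevance, citation velocity, author and venue signals."""
    velocity = paper.citations / max(paper.months_since_publication, 1.0)
    score = paper.relevance * (1.0 + velocity)
    if paper.max_author_h_index > 50:
        score *= 2.0  # assumed boost for the "high priority" rule
    if paper.venue.strip().lower() in TOP_VENUES:
        score *= 10.0
    return score


def rank(papers: list) -> list:
    """Return papers ordered from highest to lowest priority."""
    return sorted(papers, key=priority_score, reverse=True)
```

Multiplying by relevance first means an irrelevant paper stays near the bottom even when it comes from a prestigious venue.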
- https://arxiv.org/category_taxonomy
- Focus: cs.LG, cs.AI, cs.CV, q-bio.NC, stat.ML
- https://meshb.nlm.nih.gov/search
- Use for building precise medical queries
- NeurIPS: https://neurips.cc/
- ICLR: https://iclr.cc/
- ICML: https://icml.cc/
- MICCAI: http://www.miccai.org/
- API: https://api.openreview.net/
- Browse: https://openreview.net/
Weekly:
- Review alert effectiveness (false positive rate)
- Adjust keywords if needed
- Check sync success rate
Monthly:
- Analyze paper distribution (conferences, topics)
- Tune sync frequencies
- Review collected papers for quality
Quarterly:
- Update ArXiv categories (new categories emerge)
- Refine PubMed queries
- Add new alerts for emerging topics
No papers collected:
- Check Celery worker logs
- Verify source status (active?)
- Test ArXiv/PubMed APIs directly
- Check rate limits
Too many irrelevant papers:
- Review alert keywords
- Add exclusion terms
- Narrow ArXiv categories
- Refine PubMed MeSH terms
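The fixes listed above (exclusion terms, narrower keywords) can be expressed as a small matching rule. This is a minimal sketch using plain case-insensitive substring checks; the function name and the `any_of` / `all_of` / `none_of` parameters are hypothetical and do not describe the alert API.

```python
def matches_refined(text: str, any_of: list, all_of: tuple = (), none_of: tuple = ()) -> bool:
    """Return True when `text` passes the refined keyword rule.

    any_of:  at least one of these terms must appear
    all_of:  every one of these terms must also appear
    none_of: exclusion terms; any occurrence rejects the paper
    """
    lowered = text.lower()

    def has(term: str) -> bool:
        return term.lower() in lowered

    return (
        any(has(term) for term in any_of)
        and all(has(term) for term in all_of)
        and not any(has(term) for term in none_of)
    )
```

For instance, requiring "brain" alongside "neural network" drops a game-playing paper, and adding "robot" as an exclusion term removes robotics papers that happen to mention brain signals.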
Missing conference papers:
- Check ArXiv comment field extraction
- Verify submission season timing
- Add OpenReview integration (Phase B)
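The first check above depends on pulling a venue out of the free-text comment that authors attach to an ArXiv entry. A minimal sketch of such an extractor follows, assuming comments look like "Accepted at NeurIPS 2024" or "ICLR'24 camera-ready"; the regular expression and the function name are illustrative and will miss other phrasings.

```python
import re

CONFERENCE_PATTERN = re.compile(
    r"\b(NeurIPS|ICLR|ICML|MICCAI)\b(?:\s*'?(\d{4}|\d{2}))?",
    re.IGNORECASE,
)
CANONICAL = {"neurips": "NeurIPS", "iclr": "ICLR", "icml": "ICML", "miccai": "MICCAI"}


def extract_conference(comment):
    """Return (conference, year or None) from an ArXiv comment, else None."""
    if not comment:
        return None
    match = CONFERENCE_PATTERN.search(comment)
    if match is None:
        return None
    name = CANONICAL[match.group(1).lower()]
    year = match.group(2)
    if year is None:
        return name, None
    if len(year) == 2:
        year = "20" + year  # assume two-digit years are in the 2000s
    return name, int(year)
```

Only the first target conference mentioned is returned, which is usually enough to tag a paper for the conference quality metric.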
This monitoring strategy transforms AI-CoScientist from a passive tool into an active participant in cutting-edge research. By strategically capturing boundary-crossing papers from top conferences, we ensure the system:
- ✅ Stays current with latest methodological advances
- ✅ Identifies opportunities for new features and capabilities
- ✅ Maintains relevance in rapidly evolving AI × science landscape
- ✅ Inspires innovation through exposure to diverse interdisciplinary work
Next Steps:
- Run python scripts/setup_strategic_monitoring.py
- Monitor first sync results
- Refine based on paper quality
- Scale to full production
Document Maintainer: AI-CoScientist Team
Review Schedule: Quarterly
Last Review: 2025-10-12