A comprehensive research automation system for social sciences and humanities, built as a Claude Desktop Extension (DXT).
This project was inspired by and builds upon the excellent work from:
- AI-Scientist v2 - Advanced automated scientific research capabilities
- Lotus Wisdom MCP - Model Context Protocol implementation patterns
Special thanks to these projects for demonstrating the potential of AI-assisted research automation.
The Autonomous Scientist is an advanced research assistant that combines:
- Multi-language OCR processing for academic PDFs
- Discipline-specific analysis for 8 academic fields
- LaTeX document generation with proper citation formatting
- Comprehensive literature search across multiple academic databases
- Hardware optimization for Intel i3-12100F + 16GB DDR4
- Psychology - APA 7th Edition formatting, experimental design analysis
- Neuroscience - Nature format, neuroimaging methodology analysis
- Education - APA Educational format, curriculum and pedagogy analysis
- Sociology - ASA format, social network and demographic analysis
- Anthropology - AAA format, ethnographic and cultural analysis
- Philosophy - Chicago 17th Edition, argument structure analysis
- Political Science - APSA format, institutional and electoral analysis
- International Relations - APSA format, conflict and diplomacy analysis
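The discipline list above amounts to a lookup from field to citation format. As a minimal sketch of that idea (the key names, the object shape, and both helper functions are illustrative assumptions, not the extension's actual code):

```javascript
// Lookup table transcribed from the discipline list above.
// Key names and structure are assumptions made for illustration.
const DISCIPLINE_STYLES = {
  psychology: "APA 7th Edition",
  neuroscience: "Nature",
  education: "APA Educational",
  sociology: "ASA",
  anthropology: "AAA",
  philosophy: "Chicago 17th Edition",
  "political-science": "APSA",
  "international-relations": "APSA",
};

// Normalize a free-form name ("Political Science") to a table key.
function disciplineKey(name) {
  return name.trim().toLowerCase().replace(/\s+/g, "-");
}

// Return the citation format for a discipline, or null if it is not listed.
function citationStyleFor(name) {
  const key = disciplineKey(name);
  return Object.prototype.hasOwnProperty.call(DISCIPLINE_STYLES, key)
    ? DISCIPLINE_STYLES[key]
    : null;
}

console.log(citationStyleFor("Political Science")); // APSA
console.log(citationStyleFor("Chemistry")); // null
```

Returning null for an unlisted field lets a caller fall back to the user's default citation style instead of failing.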
Install the DXT CLI:
npm install -g @anthropic-ai/dxt
Build the extension:
cd autonomous-scientist-extension
npm install
npm run build
npm run pack
Install in Claude Desktop:
- Open Claude Desktop
- Navigate to Extensions → Install Extension
- Select the generated .dxt file
- Follow the configuration wizard
Configure settings:
- Set your primary research discipline
- Configure citation style preferences
- Set workspace directory
- Add API keys (optional but recommended)
The extension supports extensive user configuration through Claude Desktop's settings:
- Primary Research Discipline: Your main field of study
- Default Citation Style: APA, Chicago, MLA, ASA, AAA, or APSA
- Workspace Directory: Where to store research projects
- API Keys: Semantic Scholar, CrossRef, OpenAI, Anthropic
- Cache Size: Disk cache limit (1-20GB)
- OCR Languages: Comma-separated language codes
- Advanced Features: Enable experimental functionality
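Several of these options have constraints that are easy to check before use: the citation style is one of six values, the cache size is limited to 1-20 GB, and OCR languages arrive as a comma-separated string. The sketch below shows one way such settings might be normalized. The setting names (citationStyle, cacheSizeGb, ocrLanguages) and the function itself are hypothetical; the real keys are defined by the extension's manifest.

```javascript
// Hypothetical setting names; the extension's manifest defines the real ones.
const ALLOWED_STYLES = ["APA", "Chicago", "MLA", "ASA", "AAA", "APSA"];

function normalizeSettings(raw) {
  const errors = [];

  // Citation style must be one of the six documented options.
  if (!ALLOWED_STYLES.includes(raw.citationStyle)) {
    errors.push(`Unsupported citation style: ${raw.citationStyle}`);
  }

  // Cache size is documented as 1-20 GB.
  const cacheSizeGb = Number(raw.cacheSizeGb);
  if (!Number.isFinite(cacheSizeGb) || cacheSizeGb < 1 || cacheSizeGb > 20) {
    errors.push("Cache size must be between 1 and 20 GB");
  }

  // OCR languages arrive as a comma-separated string, e.g. "en, de,fr".
  const ocrLanguages = String(raw.ocrLanguages ?? "")
    .split(",")
    .map((code) => code.trim().toLowerCase())
    .filter((code) => code.length > 0);
  if (ocrLanguages.length === 0) {
    errors.push("At least one OCR language code is required");
  }

  return {
    ok: errors.length === 0,
    errors,
    settings: { citationStyle: raw.citationStyle, cacheSizeGb, ocrLanguages },
  };
}

const result = normalizeSettings({
  citationStyle: "APA",
  cacheSizeGb: "5",
  ocrLanguages: "en, de,fr",
});
console.log(result.ok, result.settings.ocrLanguages);
```

Collecting every problem into an errors array, instead of throwing on the first one, lets a settings screen report all invalid fields at once.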
- Semantic Scholar: 200M+ academic papers (free)
- ArXiv: Latest preprints (free)
- CrossRef: Publisher metadata (free)
- OpenAI: Enhanced analysis (paid, optional)
- Anthropic: Extended Claude access (paid, optional)
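For orientation, the three free sources each expose a public HTTP search endpoint. The sketch below builds one search URL per source. How the extension itself queries these services is not documented here, so the function name, return shape, and choice of parameters are assumptions.

```javascript
// Build one search URL per free source.
// The function and its return shape are illustrative assumptions.
function buildSearchUrls(query, limit = 10) {
  const semanticScholar = new URL(
    "https://api.semanticscholar.org/graph/v1/paper/search"
  );
  semanticScholar.searchParams.set("query", query);
  semanticScholar.searchParams.set("limit", String(limit));

  const arxiv = new URL("http://export.arxiv.org/api/query");
  arxiv.searchParams.set("search_query", `all:${query}`);
  arxiv.searchParams.set("max_results", String(limit));

  const crossref = new URL("https://api.crossref.org/works");
  crossref.searchParams.set("query", query);
  crossref.searchParams.set("rows", String(limit));

  return {
    semanticScholar: semanticScholar.href,
    arxiv: arxiv.href,
    crossref: crossref.href,
  };
}

const urls = buildSearchUrls("cognitive behavioral therapy", 5);
console.log(urls);
```

Each URL can be passed to the global fetch available in Node.js 18+. Semantic Scholar and CrossRef respond with JSON, while arXiv responds with an Atom XML feed, so the responses need separate parsers.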
- comprehensive_literature_search - Multi-source academic search
- analyze_by_discipline - Specialized analysis for your field
- identify_research_gaps - Automatic gap identification
- process_academic_pdf - Complete PDF analysis with OCR
- batch_process_pdfs - Efficient batch processing
- ocr_multilingual - Advanced multi-language OCR
- generate_latex_paper - Complete LaTeX documents
- format_citations - Multi-style citation formatting
- compile_to_pdf - LaTeX compilation with error handling
- analyze_psychology_research - APA methodology analysis
- analyze_neuroscience_paper - Neuroimaging analysis
- analyze_education_study - Pedagogical analysis
- analyze_philosophy_argument - Argument structure
- And 4 more specialized analyzers...
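Claude Desktop invokes these tools on the user's behalf, but it can help to see what a call looks like on the wire. In the Model Context Protocol, a tool is invoked with a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request; the argument names (query, max_results) are assumptions, since the tools' input schemas are not listed here.

```javascript
let nextId = 1;

// Build a JSON-RPC 2.0 "tools/call" request as used by the Model Context Protocol.
function buildToolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names are assumptions; the tool's real input schema may differ.
const request = buildToolCall("comprehensive_literature_search", {
  query: "cognitive behavioral therapy",
  max_results: 20,
});

// The stdio transport carries one JSON message per line.
console.log(JSON.stringify(request));
```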
User: "I need a literature review on cognitive behavioral therapy"
→ Extension searches multiple databases
→ Processes provided PDFs with OCR
→ Analyzes content by psychology discipline
→ Generates complete LaTeX literature review
User: "Analyze this neuroscience paper" [attach PDF]
→ OCR processing with quality enhancement
→ Extracts methodology, brain regions, findings
→ Provides quality assessment and recommendations
→ Formats citations in desired style
User: Uploads German psychology paper (scanned)
→ Auto-detects German language
→ Applies academic OCR enhancement
→ Analyzes with psychology-specific patterns
→ Generates English summary with APA citations
Comprehensive workflow for conducting literature reviews:
- Multi-source literature search
- PDF processing and analysis
- Gap identification
- LaTeX document generation
Deep analysis of academic documents:
- OCR processing with quality enhancement
- Discipline-specific content analysis
- Citation extraction and formatting
- Research recommendations
- CPU: Intel i3-12100F or equivalent (4 cores)
- RAM: 16GB DDR4
- Storage: 5GB free space for cache
- OS: Windows 10/11, macOS 12+, or Linux
- Claude Desktop: v0.8.0 or later
- Node.js: v18.0.0 or later
- Memory usage optimized for 16GB systems
- Intelligent caching (2GB memory + 5GB disk)
- Adaptive processing based on available resources
- Hardware-aware concurrency limits
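A hardware-aware concurrency limit can be derived from the core count and free memory. The following is a minimal sketch of that idea, not the extension's actual logic; the per-job memory estimate (perJobMb) and the reserve (reserveMb) are hypothetical tuning values.

```javascript
const os = require("node:os");

// Pick a worker count from logical cores and free memory.
// perJobMb and reserveMb are hypothetical tuning values, not measured figures.
function concurrencyLimit({ cores, freeMemMb, perJobMb = 512, reserveMb = 2048 }) {
  // Leave one core for the host application, but always allow one job.
  const byCpu = Math.max(1, cores - 1);
  // Keep a memory reserve, then see how many jobs fit in what remains.
  const byMemory = Math.max(1, Math.floor((freeMemMb - reserveMb) / perJobMb));
  return Math.min(byCpu, byMemory);
}

const limit = concurrencyLimit({
  cores: os.cpus().length,
  freeMemMb: os.freemem() / (1024 * 1024),
});
console.log(`Processing up to ${limit} PDFs at a time`);
```

Taking the smaller of the CPU and memory limits means whichever resource is scarcer on the current machine decides how many PDFs are processed in parallel.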
- Local Processing: All PDF processing occurs on your machine
- Encrypted Storage: API keys stored with AES-256 encryption
- No Data Sharing: Research content never leaves your system
- Secure APIs: Only established academic APIs
- Optional Cloud: Premium APIs are optional enhancements
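To illustrate what AES-256 encryption of an API key can look like, here is a sketch using Node's built-in crypto module. The cipher mode (GCM), the key derivation via scrypt, and the storage layout are all assumptions, since this README states only that AES-256 is used.

```javascript
const crypto = require("node:crypto");

// Encrypt a secret with AES-256-GCM. The key must be 32 bytes long.
// Output layout (an assumption): base64(iv || authTag || ciphertext).
function encryptSecret(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce per message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverse the layout above; throws if the key is wrong or the data was altered.
function decryptSecret(encoded, key) {
  const data = Buffer.from(encoded, "base64");
  const iv = data.subarray(0, 12);
  const tag = data.subarray(12, 28);
  const ciphertext = data.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]).toString("utf8");
}

// Placeholder passphrase and salt, for demonstration only.
const key = crypto.scryptSync("example-passphrase", "example-salt", 32);
const stored = encryptSecret("sk-example-api-key", key);
console.log(decryptSecret(stored, key)); // sk-example-api-key
```

GCM is used in this sketch because it authenticates the ciphertext, so a tampered or corrupted key file fails loudly at decryption instead of yielding a wrong key.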
High Memory Usage
- Extension automatically manages memory limits
- Reduce concurrent PDF processing if needed
- Clear cache through extension settings
OCR Quality Issues
- Ensure PDF resolution >150 DPI
- Try different language settings
- Use quality enhancement for scanned documents
API Connection Failures
- Check internet connection
- Verify API keys in settings
- Test individual API connections
✅ Stable Components:
- Simplified MCP Server (server/index-simple.js) - Core functionality working
- DXT Package Integration - Properly formatted for Claude Desktop
- Basic Research Tools - Literature search, PDF processing, API setup
🚧 In Development:
- Full TypeScript Conversion - Complex tools being migrated from TypeScript
- Advanced PDF Processing - OCR and citation extraction
- Discipline-Specific Analyzers - Specialized analysis tools
Issue Resolution:
- ✅ Server Disconnection: Fixed TypeScript compatibility issues by creating a simplified server
- ✅ MCP JSON Parsing: Resolved stdout pollution in the mcp-science-web wrapper
- ✅ Package Installation: Updated manifest.json for proper DXT structure
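Stdout pollution matters because an MCP server on the stdio transport uses stdout for protocol messages only, one JSON message per line, so any stray log line breaks the client's JSON parsing. The sketch below shows a common way to keep the channel clean. It is not the actual fix applied to the wrapper, and all function names are illustrative.

```javascript
// Diagnostics go to stderr so that stdout carries only protocol messages.
function log(...parts) {
  process.stderr.write(`[server] ${parts.map(String).join(" ")}\n`);
}

// Serialize one protocol message as a single line of JSON.
// JSON.stringify escapes embedded newlines, so the output is always one line.
function frameMessage(message) {
  return JSON.stringify(message) + "\n";
}

// Route stray console.log calls (for example from dependencies) to stderr.
// Call once at startup as redirectConsoleLog(); not invoked in this sketch.
function redirectConsoleLog(target = console) {
  target.log = (...parts) => log(...parts);
}

log("starting up"); // written to stderr
process.stdout.write(frameMessage({ jsonrpc: "2.0", id: 1, result: {} }));
```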
Architecture Decision: The project now uses a hybrid approach:
- Simplified Server (index-simple.js) for immediate functionality
- Full Implementation (index.js) for advanced features (in progress)
This ensures users have a working research assistant while development continues.
Runtime Requirements:
- Node.js 18+ with ES Module support
- Python 3.8+ with MCP SDK
- Git for version control
Optional Integrations:
- LaTeX distribution for document generation
- Tesseract OCR for multilingual text extraction
autonomous-scientist-extension/
├── manifest.json # DXT manifest (updated for simplified server)
├── server/ # MCP server implementation
│ ├── index-simple.js # Simplified working server (current)
│ ├── index.js # Full server (TypeScript conversion in progress)
│ ├── tools/ # Tool implementations (40+ tools)
│ └── utils/ # Utility modules (security, memory, cache)
├── assets/ # Icons and screenshots
├── templates/ # LaTeX templates for disciplines
└── autonomous-scientist-dxt.dxt # Packaged extension file
- Clone the repository
- Install dependencies: npm install
- Build TypeScript: npm run build
- Package DXT: npm run pack
- Follow the existing code structure
- Implement proper error handling
- Add comprehensive logging
- Test with multiple document types
MIT License - See LICENSE file for details
For issues and support:
- Check the troubleshooting section
- Review Claude Desktop Extension documentation
- Report bugs with system specifications
- Include example files (anonymized) for OCR issues
Transform your research workflow with the Autonomous Scientist Desktop Extension, a research automation tool for the social sciences and humanities.
Optimized for your hardware • Secure by design • Research-focused