A LangChain-powered question-answering system for scientific papers using Ollama LLM and ChromaDB for vector storage.
- Load and process PDF papers automatically
- Smart document chunking and embedding generation
- Interactive Q&A command line interface
- Vector similarity search using ChromaDB
- Progress tracking for document processing
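Under the hood, a system like this typically follows the standard LangChain retrieval pattern: load the PDFs, split them into chunks, embed the chunks with an Ollama embedding model, persist the vectors in ChromaDB, and answer questions through a retrieval chain. The sketch below is illustrative only, not the project's actual code; it assumes the langchain-community package layout, and the file name papers/example.pdf is a placeholder. Model names and paths mirror the .env example later in this README.

```python
# Illustrative pipeline sketch -- not the project's actual code.
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# 1. Load a PDF and split it into overlapping chunks
docs = PyPDFLoader("papers/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# 2. Embed the chunks with Ollama and persist them in ChromaDB
embeddings = OllamaEmbeddings(
    model="snowflake-arctic-embed:335m-l-fp16",
    base_url="http://localhost:11434",
)
store = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")

# 3. Answer questions with a retrieval-augmented chain
llm = Ollama(model="deepseek-r1:14b", base_url="http://localhost:11434")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.invoke({"query": "What problem does this paper address?"})["result"])
```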
- Python 3.8+
- Ollama running locally
- PDF papers to analyze
- Clone the repository:
git clone <repository-url>
cd papers-qa-agent
- Set up the environment and install dependencies:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
- Configure environment variables in .env:
PROJECT_NAME=scientific_papers_qa
PAPERS_DIR=./papers
OLLAMA_MODEL=deepseek-r1:14b
OLLAMA_EMBEDDING_MODEL=snowflake-arctic-embed:335m-l-fp16
CHROMA_DB_DIR=./chroma_db
OLLAMA_BASE_URL=http://localhost:11434
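These settings are typically read at startup. The snippet below is a minimal sketch of how that might look using python-dotenv (assumed to be available); the project's actual configuration code may differ.

```python
# Sketch of loading the .env settings with python-dotenv (an assumption,
# not necessarily how this project does it).
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env in the working directory

PAPERS_DIR = os.getenv("PAPERS_DIR", "./papers")
CHROMA_DB_DIR = os.getenv("CHROMA_DB_DIR", "./chroma_db")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "deepseek-r1:14b")
OLLAMA_EMBEDDING_MODEL = os.getenv("OLLAMA_EMBEDDING_MODEL", "snowflake-arctic-embed:335m-l-fp16")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
```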
- Add PDF papers to the papers directory (the folder set by PAPERS_DIR)
- Run the application:
python __main__.py
Available commands:
- audio: Use the speech-to-text (STT) feature to ask a question by voice
- help: Show available commands
- metadata: Display information about the loaded papers
- exit: Quit the application
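The loop below is a rough sketch of how such a command dispatcher might look; it is not the project's actual CLI, and answer_question / transcribe_audio are hypothetical placeholders for the retrieval chain and the STT helper.

```python
# Illustrative command-dispatch loop (a sketch, not the project's CLI code).
def answer_question(question: str) -> str:
    # Hypothetical placeholder for the retrieval-augmented chain
    return "(retrieval-augmented answer would go here)"

def transcribe_audio() -> str:
    # Hypothetical placeholder for the pyaudio/STT helper
    return "(speech-to-text transcription would go here)"

while True:
    command = input("> ").strip()
    if command == "exit":
        break
    elif command == "help":
        print("Available commands: audio, help, metadata, exit")
    elif command == "metadata":
        print("(information about the loaded papers)")
    elif command == "audio":
        print(answer_question(transcribe_audio()))
    else:
        # Anything else is treated as a question about the papers
        print(answer_question(command))
```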
pyaudio might require additional system-level dependencies:
- On Windows: No additional requirements
- On Linux:
sudo apt-get install python3-pyaudio
- On macOS:
brew install portaudio
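For reference, the snippet below sketches how a short voice question could be captured with pyaudio and written to a WAV file for transcription. It is illustrative only; the project's STT feature may record and transcribe audio differently, and the sample rate, duration, and file name are arbitrary.

```python
# Sketch of recording a short voice question with pyaudio (illustrative only).
import wave
import pyaudio

RATE, CHUNK, SECONDS = 16000, 1024, 5

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()

# Save the recording so it can be handed to a speech-to-text model
with wave.open("question.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

p.terminate()
```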