A course-assistant demo built with Retrieval-Augmented Generation (RAG), local embeddings, FAISS retrieval, and DashScope/Qwen generation.
This project is designed to answer machine learning course questions using lecture materials and a local vector index.
The pipeline combines:
- semantic retrieval over course materials
- query rewriting for better search quality
- intent routing for task-aware handling
- retrieval planning before search execution
- coverage checks before grounded answer generation
- LLM answer generation
- a fallback path when retrieval quality is too low
- a Streamlit demo UI
- Semantic retrieval with local embeddings
- FAISS-based vector search
- RAG question answering
- Intent-aware retrieval planning
- Raw/wiki mixed-source balancing
- Coverage-aware refusal for weak retrieval
- Query rewriting for short English questions
- Fallback answering when retrieval is weak
- Streamlit web interface
- Support for course-oriented Q&A workflows
- Python
- Streamlit
- DashScope / Qwen
- FAISS
- LangChain community loaders and splitters
- Sentence-Transformers
User Query
-> Intent Router
-> Retrieval Planner
-> Plan Executor
-> Coverage Checker
-> Prompt Builder
-> LLM Generation
-> Answer
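The flow above can be read as a chain of stages that each update a shared state object. The following is a minimal sketch of that idea, not the project's implementation: the `State` fields, the stage functions, the toy `CORPUS`, and the keyword-based routing are all hypothetical stand-ins for the real router, FAISS search, and DashScope call.

```python
from dataclasses import dataclass, field
from typing import List

# Toy corpus standing in for the FAISS index (hypothetical content).
CORPUS = [
    "Overfitting means a model fits noise in the training data.",
    "Regularization penalizes large weights to reduce overfitting.",
]

@dataclass
class State:
    query: str
    intent: str = ""
    planned_queries: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    covered: bool = False
    prompt: str = ""
    answer: str = ""

def _tokens(text):
    return {w.strip("?.,").lower() for w in text.split()}

def route_intent(s):
    # Placeholder rule; the real router is assumed to be more elaborate.
    s.intent = "comparison" if " vs " in s.query.lower() else "definition"
    return s

def plan_retrieval(s):
    s.planned_queries = [s.query]
    return s

def execute_plan(s):
    # Naive word overlap instead of vector search.
    wanted = set().union(*(_tokens(q) for q in s.planned_queries))
    s.evidence = [doc for doc in CORPUS if wanted & _tokens(doc)]
    return s

def check_coverage(s):
    s.covered = len(s.evidence) >= 1
    return s

def build_prompt(s):
    s.prompt = "Context:\n" + "\n".join(s.evidence) + "\nQuestion: " + s.query
    return s

def generate(s):
    # Placeholder for the LLM call; weak coverage takes the fallback path.
    if s.covered:
        s.answer = f"[grounded answer from {len(s.evidence)} passages]"
    else:
        s.answer = "[fallback answer]"
    return s

STAGES = [route_intent, plan_retrieval, execute_plan,
          check_coverage, build_prompt, generate]

def run_pipeline(query):
    state = State(query=query)
    for stage in STAGES:
        state = stage(state)
    return state
```

Keeping every intermediate result on one state object is one way to make the intent, planned queries, and coverage status easy to display in a UI.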
Instead of using one fixed retrieval strategy for every question, the system now builds a small retrieval plan before searching.
Different intents use different retrieval behavior:
- definition: small wiki support plus a small amount of raw evidence
- comparison: query decomposition plus reranking to retrieve both concepts
- summary: broader raw coverage so chapter-level summaries are less fragmentary
- quiz: diversified retrieval so generated questions cover multiple concepts
- diagnosis: prioritize coverage inspection before direct answer generation
This makes the pipeline easier to debug and explain because the app can show:
- detected intent
- planned queries
- raw/wiki document balance
- coverage status
- retrieved evidence
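As an illustration of intent-specific planning, the sketch below maps each intent to a small plan record. The `RetrievalPlan` fields, the `k` values, and the "A vs B" splitting rule are assumptions made for this example; the actual planner in `rag.py` may be structured differently.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalPlan:
    wiki_k: int                # how many wiki chunks to request
    raw_k: int                 # how many raw lecture chunks to request
    decompose: bool = False    # split the query into sub-queries
    rerank: bool = False
    diversify: bool = False
    inspect_coverage_first: bool = False

# Illustrative numbers only, not the project's settings.
PLANS = {
    "definition": RetrievalPlan(wiki_k=2, raw_k=2),
    "comparison": RetrievalPlan(wiki_k=2, raw_k=4, decompose=True, rerank=True),
    "summary": RetrievalPlan(wiki_k=1, raw_k=8),
    "quiz": RetrievalPlan(wiki_k=2, raw_k=6, diversify=True),
    "diagnosis": RetrievalPlan(wiki_k=2, raw_k=4, inspect_coverage_first=True),
}

def build_plan(intent, query):
    # Unknown intents fall back to the definition plan (an assumption).
    plan = PLANS.get(intent, PLANS["definition"])
    queries = [query]
    if plan.decompose:
        # Split "A vs B" so each concept gets its own search.
        parts = re.split(r"\s+(?:vs\.?|versus)\s+", query, flags=re.IGNORECASE)
        if len(parts) > 1:
            queries = [p.strip(" ?") for p in parts]
    return {
        "intent": intent,
        "queries": queries,
        "wiki_k": plan.wiki_k,
        "raw_k": plan.raw_k,
        "rerank": plan.rerank,
    }
```

Returning the plan as plain data keeps it easy to log or display alongside the detected intent and the raw/wiki balance.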
pdf_ai_project/
|-- app.py
|-- eval_agent.py
|-- llm_client.py
|-- rag.py
|-- build_vectorstore.py
|-- build_wiki.py
|-- run_demo.bat
|-- run_demo.ps1
|-- requirements.txt
|-- eval/
| |-- eval_questions.jsonl
| |-- judge_prompt.txt
| `-- runs/
|-- raw/
|-- wiki/
|-- faiss_index/
`-- README.md
On first run, the embedding model must either be downloadable from Hugging Face or already be available locally before the app can start.
If your environment cannot reach Hugging Face, set EMBEDDING_MODEL_PATH in .env to a local sentence-transformers/all-MiniLM-L6-v2 snapshot directory.
This project uses DashScope for model inference. Make sure DASHSCOPE_API_KEY is available before running the app.
Example:
DASHSCOPE_API_KEY=your_api_key
You can store it in a local .env file. A sample file is included as .env.example.
Optional offline setting:
EMBEDDING_MODEL_PATH=C:\Users\yourname\.cache\huggingface\hub\models--sentence-transformers--all-MiniLM-L6-v2\snapshots\<snapshot-id>
If you are on Windows, you can start the demo by double-clicking:
run_demo.bat
Or run it manually in PowerShell:
./run_demo.ps1
The startup script will:
- Check Python
- Check or create .env
- Prompt for DASHSCOPE_API_KEY if it is missing
- Install dependencies from requirements.txt
- Build the FAISS index automatically if it is missing
- Start the Streamlit app
If the embedding model is not already cached, the startup may still fail in a restricted network environment. In that case, point EMBEDDING_MODEL_PATH to a local model snapshot first.
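One way to handle this choice in code is to prefer EMBEDDING_MODEL_PATH when it is set and otherwise use the hub model name. The helper below is a hypothetical sketch (the function name `resolve_embedding_model` is not taken from the project) and uses only the standard library.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

def resolve_embedding_model(env=None):
    """Return a local snapshot directory if configured, else the hub model name."""
    env = os.environ if env is None else env
    local = env.get("EMBEDDING_MODEL_PATH", "").strip()
    if not local:
        # No override: the library will try to download or use its cache.
        return DEFAULT_MODEL
    path = Path(local).expanduser()
    if not path.is_dir():
        # Fail early with a clear message instead of a network timeout later.
        raise FileNotFoundError(
            f"EMBEDDING_MODEL_PATH is set but is not a directory: {path}"
        )
    return str(path)
```

The returned string can be passed to `SentenceTransformer(...)`, which accepts either a hub model name or a local directory.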
- Clone the repository
git clone <your-repo-url>
cd pdf_ai_project
- Install dependencies
pip install -r requirements.txt
- Build the vector index if needed
python build_vectorstore.py
- Start the app
streamlit run app.py
If you update the PDFs, wiki files, or embedding workflow, rebuild the vector index:
python build_vectorstore.py
The project now includes a lightweight evaluation agent for regression testing.
It can:
- read questions from eval/eval_questions.jsonl
- batch-run ask_rag() against the local vector store
- record answers, rewritten queries, intents, coverage status, and retrieved sources
- compute simple retrieval metrics such as wiki hit, raw hit, keyword hit, and latency
- optionally call an LLM-as-judge with eval/judge_prompt.txt
- generate run artifacts under eval/runs/<timestamp>/
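The simple retrieval metrics listed above could be computed along the following lines. This is a sketch under stated assumptions: it treats a source as wiki or raw based on the first directory in its path (matching the wiki/ and raw/ folders), and the function names and result keys are hypothetical rather than the ones eval_agent.py uses.

```python
import time

def source_kind(path):
    # Classify a retrieved source by its top-level folder.
    first = path.replace("\\", "/").lstrip("./").split("/", 1)[0]
    return first if first in ("wiki", "raw") else "other"

def score_case(answer, sources, keywords, latency_s):
    kinds = {source_kind(s) for s in sources}
    text = answer.lower()
    return {
        "wiki_hit": "wiki" in kinds,
        "raw_hit": "raw" in kinds,
        # True if at least one expected keyword appears in the answer.
        "keyword_hit": any(k.lower() in text for k in keywords),
        "latency_s": round(latency_s, 3),
    }

def timed(fn, *args):
    # Measure wall-clock latency of a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

In a batch run, `timed` would wrap the call that answers each question, and `score_case` would be applied to the answer and its retrieved source paths.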
Run a basic evaluation first:
python eval_agent.py --limit 5
Run the judged version after the basic run looks healthy:
python eval_agent.py --limit 5 --judge
Each run produces:
- results.jsonl: one JSON record per case
- summary.json: aggregate metrics for the run
- report.md: human-readable summary, failure analysis, and case details
This is especially useful for tracking:
- which intents are performing well
- whether retrieval is pulling wiki/raw evidence as expected
- which cases are failing because of coverage or routing
- whether bad cases are hard negatives or true course-coverage gaps
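To track per-intent performance outside the generated report, the JSONL records can be aggregated with a few lines of Python. The sketch below assumes each record carries an `intent` string and a boolean `coverage_ok` field; those field names are guesses for illustration and should be adjusted to whatever results.jsonl actually contains.

```python
import json
from collections import defaultdict

def summarize(lines):
    """Group JSONL records by intent and compute a coverage rate per group."""
    stats = defaultdict(lambda: {"cases": 0, "covered": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        bucket = stats[rec.get("intent", "unknown")]
        bucket["cases"] += 1
        bucket["covered"] += int(bool(rec.get("coverage_ok")))
    return {
        intent: {**b, "coverage_rate": b["covered"] / b["cases"]}
        for intent, b in stats.items()
    }

# Example with inline records; a real run would iterate over the open file.
sample = [
    '{"intent": "quiz", "coverage_ok": true}',
    '{"intent": "quiz", "coverage_ok": false}',
    '{"intent": "summary", "coverage_ok": true}',
]
per_intent = summarize(sample)
```

Because a file object iterates line by line, passing an open results.jsonl handle to `summarize` works the same way as the list used here.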
- Multi-document upload support
- Chat history memory
- Streaming output
- Web deployment
- Agent-based tutoring features
1572408266@qq.com