- Dependencies
  - Install `pdfjs-dist` (for reading PDF text in the browser).
  - Install `@xenova/transformers` (for local semantic embeddings).
- UI Updates (Upload & Progress)
  - Add an "Upload PDF" button in `Sidebar.tsx`.
  - Add a visual overlay/progress bar in `App.tsx` indicating:
    - PDF parsing state
    - Embedding generation state (with a percentage)
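One way to model the two overlay states is a discriminated union that `App.tsx` can switch on. This is a minimal sketch: the `UploadProgress` type, the phase names, and the label strings are all hypothetical and not taken from the existing code.

```typescript
// Hypothetical state shape for the upload overlay; every name here is illustrative.
type UploadProgress =
  | { phase: "idle" }
  | { phase: "parsing" }
  | { phase: "embedding"; done: number; total: number }
  | { phase: "ready"; chunkCount: number }
  | { phase: "error"; message: string };

// Converts "chunks embedded so far" into a whole-number percentage in [0, 100].
function embeddingPercent(done: number, total: number): number {
  if (total <= 0) return 0; // avoid dividing by zero before the total is known
  const pct = Math.round((done / total) * 100);
  return Math.min(100, Math.max(0, pct));
}

// Maps the current state to the text shown in the overlay.
function progressLabel(p: UploadProgress): string {
  switch (p.phase) {
    case "idle":
      return "";
    case "parsing":
      return "Parsing PDF...";
    case "embedding":
      return `Generating embeddings... ${embeddingPercent(p.done, p.total)}%`;
    case "ready":
      return `Ready (${p.chunkCount} chunks indexed)`;
    case "error":
      return `Upload failed: ${p.message}`;
  }
}
```

Keeping the percentage calculation outside the component should make it easy to unit test without rendering anything.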
- PDF Processing & Chunking
  - Create `src/lib/pdfParser.ts`.
  - Write logic to extract raw text from the uploaded PDF.
  - Write logic to chunk text into paragraphs (~300-500 chars).
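The chunking step could be sketched roughly as follows. It assumes the extracted text separates paragraphs with blank lines, which may not hold for every PDF. The names `chunkText`, `splitLong`, and `ChunkOptions` are hypothetical. Short paragraphs are merged until they reach the lower bound, and paragraphs above the upper bound are cut at the last space inside the limit.

```typescript
// Hypothetical options type; the 300/500 defaults mirror the target range in the plan.
interface ChunkOptions {
  minChars: number;
  maxChars: number;
}

// Splits one oversized paragraph into pieces of at most maxChars characters,
// cutting at the last space inside the window when there is one.
function splitLong(text: string, maxChars: number): string[] {
  const pieces: string[] = [];
  let rest = text;
  while (rest.length > maxChars) {
    let cut = rest.lastIndexOf(" ", maxChars);
    if (cut <= 0) cut = maxChars; // no usable space: fall back to a hard cut
    pieces.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) pieces.push(rest);
  return pieces;
}

// Groups raw text into chunks of roughly minChars..maxChars characters.
function chunkText(
  raw: string,
  opts: ChunkOptions = { minChars: 300, maxChars: 500 }
): string[] {
  const paragraphs = raw
    .split(/\n\s*\n/) // blank lines mark paragraph boundaries (an assumption)
    .map((p) => p.replace(/\s+/g, " ").trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let buffer = "";
  for (const para of paragraphs) {
    const candidate = buffer ? `${buffer} ${para}` : para;
    if (candidate.length <= opts.maxChars) {
      buffer = candidate; // still fits: keep merging
    } else {
      if (buffer) chunks.push(buffer); // flush what was collected so far
      const pieces = splitLong(para, opts.maxChars);
      buffer = pieces.pop() ?? ""; // the tail may merge with the next paragraph
      chunks.push(...pieces);
    }
    if (buffer.length >= opts.minChars) {
      chunks.push(buffer);
      buffer = "";
    }
  }
  if (buffer) chunks.push(buffer); // the final chunk may be shorter than minChars
  return chunks;
}
```

No chunk exceeds `maxChars`, but the last chunk, or one flushed just before a long paragraph, can fall below `minChars`. Whether that matters depends on how the embeddings behave on very short inputs.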
- Web Worker for AI Embeddings
  - Create `src/workers/embedWorker.ts`.
  - Initialize `Xenova/all-MiniLM-L6-v2` via `transformers.js`.
  - Set up a messaging bridge to pass chunks to the worker and receive vectors.
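The messaging bridge might be shaped like the sketch below. The message types (`WorkerRequest`, `WorkerResponse`), the `Embedder` type, and `createEmbedHandler` are hypothetical names; the plan only says that chunks go in and vectors come out. The model call is injected as a function so the protocol can be exercised without loading the model, and the trailing comment shows one plausible way to wire it to the `transformers.js` feature-extraction pipeline.

```typescript
// Hypothetical message shapes for the main thread <-> worker bridge.
type WorkerRequest = { type: "embed"; chunks: string[] };
type WorkerResponse =
  | { type: "progress"; done: number; total: number }
  | { type: "result"; vectors: number[][] }
  | { type: "error"; message: string };

// Anything that turns one chunk of text into one vector.
type Embedder = (text: string) => Promise<number[]>;

// Builds the worker-side handler: embeds chunks one at a time and reports
// progress after each, so the UI can show a percentage.
function createEmbedHandler(
  embed: Embedder,
  post: (msg: WorkerResponse) => void
): (req: WorkerRequest) => Promise<void> {
  return async function handle(req: WorkerRequest): Promise<void> {
    try {
      const vectors: number[][] = [];
      for (const chunk of req.chunks) {
        vectors.push(await embed(chunk));
        post({ type: "progress", done: vectors.length, total: req.chunks.length });
      }
      post({ type: "result", vectors });
    } catch (err) {
      post({ type: "error", message: err instanceof Error ? err.message : String(err) });
    }
  };
}

// Possible wiring inside src/workers/embedWorker.ts (not executed in this sketch):
//
//   import { pipeline } from "@xenova/transformers";
//   const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
//   const embed: Embedder = async (text) => {
//     const out = await extractor(text, { pooling: "mean", normalize: true });
//     return Array.from(out.data as Float32Array);
//   };
//   const handle = createEmbedHandler(embed, (msg) => self.postMessage(msg));
//   self.onmessage = (e: MessageEvent<WorkerRequest>) => { void handle(e.data); };
```

Embedding one chunk per call keeps progress reporting simple. Batching several chunks per call would likely be faster, at the cost of coarser progress updates.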
- Retrieval Engine Upgrade
  - Update `src/lib/retrieval.ts` to compute cosine similarity between the query embedding and chunk embeddings.
  - Refactor `retrieve()` to use the uploaded chunks instead of the hardcoded `NCERT_CHUNKS` when a document is uploaded.
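A minimal sketch of the similarity ranking follows. Cosine similarity is the dot product of two vectors divided by the product of their lengths. The types `EmbeddedChunk` and `ScoredChunk` and the helpers `rankChunks` and `selectCorpus` are hypothetical, and the existing `retrieve()` signature is not known here, so this shows only the scoring and the corpus switch.

```typescript
// Hypothetical shapes; the real chunk type in src/lib/retrieval.ts may differ.
interface EmbeddedChunk {
  text: string;
  vector: number[];
}
interface ScoredChunk {
  text: string;
  score: number;
}

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the query embedding and keeps the best topK.
function rankChunks(query: number[], chunks: EmbeddedChunk[], topK = 3): ScoredChunk[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Uses the uploaded document's chunks when there are any, else the built-in corpus.
function selectCorpus<T>(uploaded: T[] | null, fallback: T[]): T[] {
  return uploaded !== null && uploaded.length > 0 ? uploaded : fallback;
}
```

If the embeddings are already normalized to unit length, the division is redundant and a plain dot product gives the same ranking.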
- Conversational LLM Generation
  - Integrate `Xenova/Qwen1.5-0.5B-Chat` in the Web Worker.
  - Pass the retrieved chunk context to the LLM and prompt it to generate conversational answers.
  - Implement a UI streaming animation that renders LLM tokens as they arrive.
  - Configure fallback keyword-retrieval thresholds to handle unrelated queries safely.
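The prompt assembly and the relevance threshold could be combined along these lines. This is a sketch under assumptions: `planAnswer`, `AnswerPlan`, the `MIN_SCORE` value of 0.3, and the wording of the prompt and fallback reply are all hypothetical and would need tuning against real queries. It builds role/content chat messages and leaves the model-specific chat template to the tokenizer.

```typescript
// Hypothetical shapes for retrieved context and chat messages.
interface ScoredChunk {
  text: string;
  score: number;
}
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Either a prompt to send to the LLM, or a canned reply when nothing relevant was found.
type AnswerPlan =
  | { kind: "generate"; messages: ChatMessage[] }
  | { kind: "fallback"; reply: string };

// Assumed cut-off; the right value depends on the embedding model and the corpus.
const MIN_SCORE = 0.3;

function planAnswer(
  question: string,
  retrieved: ScoredChunk[],
  minScore: number = MIN_SCORE
): AnswerPlan {
  // Drop chunks that are too dissimilar to the query to be useful context.
  const relevant = retrieved.filter((c) => c.score >= minScore);
  if (relevant.length === 0) {
    // Unrelated query: skip generation rather than let the model guess.
    return {
      kind: "fallback",
      reply: "I could not find anything about that in the uploaded document.",
    };
  }
  const context = relevant.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return {
    kind: "generate",
    messages: [
      {
        role: "system",
        content:
          "Answer using only the context below. " +
          "If the context does not contain the answer, say so.\n\nContext:\n" +
          context,
      },
      { role: "user", content: question },
    ],
  };
}
```

Returning a plan instead of calling the model directly keeps the threshold logic testable on the main thread, while the worker only has to handle the `generate` case.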
- Testing & Polish
  - Test with a sample PDF.
  - Ensure the browser UI does not freeze during the embedding process.
  - Verify semantic search correctly returns the most relevant paragraph.
  - Confirm dynamic UI text correctly reflects the uploaded document.