A question-answering system built on DeepSeek's open-source technology. It retrieves relevant passages from a vector database and uses large language models to generate accurate answers from them.
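The retrieve-then-generate flow can be illustrated with a minimal, dependency-free sketch. It assumes a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database. The prompt template and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical and not taken from this repository. The actual call to the language model is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (a brute-force stand-in for a vector DB search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble retrieved chunks and the question into a single prompt for the LLM (hypothetical template)."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder chunks; in the real system these would come from the vector database.
    docs = [
        "Placeholder chunk about model architecture.",
        "Placeholder chunk about training data.",
        "Placeholder chunk about inference speed.",
    ]
    question = "What is the model architecture?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a production setup, `embed` would call an embedding model and `retrieve` would query the index in `vector_db/`. The prompt-building step stays essentially the same.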
- python_scripts/
  - Data extraction from PDFs
  - Vector database creation
- datasets/: Contains training and evaluation data, including:
  - DeepSeek technical documentation
  - Open-source week announcements
  - Design notes
- Notebooks/: Jupyter notebooks for:
  - Model fine-tuning
  - Inference and evaluation
  - Q&A generation
- fine_tuning_data/: Contains data for fine-tuning the models
- vector_db/: Contains the vector database
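Before the vector database can be created, text extracted from the PDFs usually has to be split into chunks that are small enough to embed. The following sketch shows one common approach: fixed-size word windows with overlap. It is not necessarily the method used in `python_scripts/`. The window size, overlap, and the `Chunk` record fields are illustrative assumptions. PDF extraction, embedding, and storage are not shown.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical field: name of the document the text came from
    index: int   # position of this chunk within its source
    text: str    # the chunk text that would be embedded and stored


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(documents: dict[str, str], size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Turn {source name: extracted text} into chunk records ready for embedding."""
    records = []
    for source, text in documents.items():
        for i, piece in enumerate(chunk_words(text, size, overlap)):
            records.append(Chunk(source, i, piece))
    return records


if __name__ == "__main__":
    # Placeholder text standing in for the output of the PDF extraction step.
    for record in build_records({"example.pdf": "a b c d e f g"}, size=3, overlap=1):
        print(record)
```

Overlapping windows help keep a sentence that straddles a chunk boundary retrievable from at least one chunk. The trade-off is a somewhat larger index.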