feat: add example configuration files and update inference class ref #6
ahmedshahriar merged 1 commit into main
Conversation
Pull Request Overview
This PR updates the codebase to use generic placeholder values instead of specific author names, renames a class for better clarity, enables repository document export/import functionality, and modernizes the project configuration. It also adds example configuration files and documentation improvements.
- Replaces hardcoded author names with generic placeholders (e.g., `<author name>`, `<Author Name>`) across various files
- Renames `LLMInferenceTransformersLocal` to `LLMInferenceTransformers` for improved naming clarity
- Enables `RepositoryDocument` export/import in the data warehouse functionality
Reviewed Changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/rag.py | Replaced hardcoded author name with generic placeholder |
| tools/data_warehouse.py | Enabled RepositoryDocument export/import and fixed grammar error ("doesn't exists" → "doesn't exist") |
| tests/payloads/payload.json.example | Added new example payload file with generic author placeholder |
| pyproject.toml | Modernized license format to PEP 639 standard with explicit license-files field |
| core/model/inference/test.py | Updated to use renamed LLMInferenceTransformers class |
| core/model/inference/inference.py | Renamed class from LLMInferenceTransformersLocal to LLMInferenceTransformers and removed outdated comment |
| core/model/inference/__init__.py | Updated exports to reflect class rename |
| core/model/finetuning/llm_ghostwriter_finetune_sft.ipynb | Added example output as commented documentation |
| core/model/finetuning/llm_ghostwriter_finetune_dpo.ipynb | Added comments explaining DPO learning rate best practices and example output |
| core/infrastructure/inference_pipeline_api.py | Updated to use renamed LLMInferenceTransformers class |
| core/application/rag/self_query.py | Replaced hardcoded author name with generic placeholder and added clarifying comment |
| core/application/rag/query_expansion.py | Added clarifying comment about model selection |
| configs/feature_engineering.yaml.example | Added new example configuration file with generic author names |
| configs/digital_data_etl_author_name.yaml.example | Added new example configuration file for ETL with generic author data |
| .gitignore | Updated to ignore additional local directories and configuration files |
| .env.example | Added Qdrant Cloud configuration and commented OLLAMA_API_URL option |
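To illustrate the PEP 639 change noted for `pyproject.toml` above, the modernized license metadata would look roughly like this (the project name, version, and license identifier below are placeholders, not the repository's actual values):

```toml
[project]
name = "example-project"   # placeholder, not the real project name
version = "0.1.0"          # placeholder version

# PEP 639 style: license is an SPDX expression string, not a table
license = "MIT"

# explicit license-files field pointing at the license text in the repo
license-files = ["LICENSE"]
```

Older metadata used `license = { file = "LICENSE" }` or a `License ::` classifier; PEP 639 replaces both with the SPDX string plus `license-files`.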
```yaml
# blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# github Repositories
```
Inconsistent comment formatting: "blog Posts" should be "Blog Posts" or "blog posts" to match the capitalization pattern used for "github Repositories" below (which should likely be "GitHub Repositories").
Suggested change:

```diff
-# blog Posts
-- https://johndoe.blog/post1
-- https://johndoe.blog/post2
-# github Repositories
+# Blog Posts
+- https://johndoe.blog/post1
+- https://johndoe.blog/post2
+# GitHub Repositories
```
```yaml
# blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# github Repositories
```
Inconsistent capitalization: "github Repositories" should be "GitHub Repositories" to properly capitalize the brand name and maintain consistency.
Suggested change:

```diff
-# github Repositories
+# GitHub Repositories
```
```diff
@@ -1,6 +1,7 @@
 OPENAI_MODEL_ID=gpt-4.1-nano
 OPENAI_API_KEY=<str>
 OLLAMA_MODEL_ID=llama3.2:3b#replace with your model
```
Missing spaces around the comment delimiter. Should be `llama3.2:3b # replace with your model` for consistency with other comment formatting.
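With that fix applied, and together with the Qdrant Cloud variables this PR adds to `.env.example`, the relevant lines would read roughly as follows (values shown here are placeholders):

```
OPENAI_MODEL_ID=gpt-4.1-nano
OPENAI_API_KEY=<str>
OLLAMA_MODEL_ID=llama3.2:3b # replace with your model
# OLLAMA_API_URL=<str>  # optional, commented out by default

# Qdrant Cloud settings (placeholder values)
USE_QDRANT_CLOUD=true
QDRANT_CLOUD_URL=<str>
QDRANT_APIKEY=<str>
```

The space before `#` matters because some dotenv parsers otherwise treat the `#` and everything after it as part of the value.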
This pull request introduces several configuration and code improvements to support cloud-based Qdrant vector database integration, refactors the Hugging Face inference model class for clarity, and adds new example configuration and payload files to streamline onboarding and testing. It also updates documentation and comments for better maintainability and understanding.
Cloud Vector Database Integration:
- Added `USE_QDRANT_CLOUD`, `QDRANT_CLOUD_URL`, and `QDRANT_APIKEY` to `.env.example` to support cloud-hosted vector databases.
- Added a commented `OLLAMA_API_URL` option to `.env.example`.

Inference Model Refactor:
- Renamed `LLMInferenceTransformersLocal` to `LLMInferenceTransformers` across the codebase for consistency and clarity (`core/model/inference/inference.py`, `__init__.py`, usage in APIs and tests).

Configuration and Example Files:
- Added new example configuration files for digital data ETL (`configs/digital_data_etl_author_name.yaml.example`) and feature engineering (`configs/feature_engineering.yaml.example`).
- Added an example payload file, `tests/payloads/payload.json.example`.

Documentation and Usability:
- Added explanatory comments and example outputs to the finetuning notebooks (`llm_ghostwriter_finetune_dpo.ipynb`, `llm_ghostwriter_finetune_sft.ipynb`).
- Modernized the license format in `pyproject.toml` for compliance and clarity.

Minor Fixes and Data Handling:
- Enabled export/import of `RepositoryDocument` in the data warehouse tool and fixed a typo in an import assertion.

Let me know if you have questions about any specific change!
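The `.example` files added here are meant to be copied and personalized before use. A minimal sketch of that onboarding step in Python (the `copy_example` helper and the exact placeholder token are assumptions for illustration, not part of this PR):

```python
from pathlib import Path

# Placeholder token assumed to appear in the example files.
PLACEHOLDER = "<author name>"


def copy_example(example_path: Path, author: str) -> Path:
    """Copy a *.example file next to itself without the .example suffix,
    substituting the generic author placeholder with a real name."""
    target = example_path.with_suffix("")  # config.yaml.example -> config.yaml
    text = example_path.read_text()
    target.write_text(text.replace(PLACEHOLDER, author))
    return target
```

For instance, running `copy_example(Path("configs/feature_engineering.yaml.example"), "Jane Doe")` would produce `configs/feature_engineering.yaml` with the placeholder filled in.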