feat: add example configuration files and update inference class ref #6

Merged: ahmedshahriar merged 1 commit into main from staging on Nov 7, 2025

@ahmedshahriar (Owner)

This pull request adds configuration options for a cloud-hosted Qdrant vector database, renames the Hugging Face inference model class for clarity, and adds example configuration and payload files to streamline onboarding and testing. It also updates documentation and comments to make the code easier to maintain and understand.

Cloud Vector Database Integration:

  • Added Qdrant Cloud configuration options (USE_QDRANT_CLOUD, QDRANT_CLOUD_URL, QDRANT_APIKEY) to .env.example to support cloud-hosted vector databases.
  • Provided an example for setting the Ollama API URL in .env.example.
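
A minimal sketch of how these flags might be consumed, assuming the settings are read from the environment. Only `USE_QDRANT_CLOUD`, `QDRANT_CLOUD_URL`, and `QDRANT_APIKEY` come from `.env.example`; the helper name and the local `QDRANT_HOST` / `QDRANT_PORT` fallback are hypothetical and not taken from this PR.

```python
import os


def qdrant_connection_kwargs(env=None):
    """Return keyword arguments for a Qdrant client based on environment flags.

    Hypothetical helper: only the three cloud variable names come from
    .env.example; QDRANT_HOST and QDRANT_PORT are assumed for the local case.
    """
    env = os.environ if env is None else env
    flag = env.get("USE_QDRANT_CLOUD", "false").strip().lower()
    use_cloud = flag in {"1", "true", "yes"}

    if use_cloud:
        url = env.get("QDRANT_CLOUD_URL")
        api_key = env.get("QDRANT_APIKEY")
        if not url or not api_key:
            raise ValueError(
                "USE_QDRANT_CLOUD is set but QDRANT_CLOUD_URL or QDRANT_APIKEY is missing"
            )
        return {"url": url, "api_key": api_key}

    # Local fallback (assumed variable names and defaults).
    return {
        "host": env.get("QDRANT_HOST", "localhost"),
        "port": int(env.get("QDRANT_PORT", "6333")),
    }
```

The returned dictionary could then be passed to `qdrant_client.QdrantClient(**kwargs)`, which accepts `url`, `api_key`, `host`, and `port`. Failing fast on a missing URL or key keeps a half-configured cloud setup from silently falling back to a local instance.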

Inference Model Refactor:

  • Renamed the Hugging Face Transformers inference class from LLMInferenceTransformersLocal to LLMInferenceTransformers across the codebase for consistency and clarity (core/model/inference/inference.py, __init__.py, usage in APIs and tests).
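
The PR renames the class everywhere and does not mention keeping the old name. For downstream code that still imports `LLMInferenceTransformersLocal`, a deprecation shim along these lines could ease migration. This is a hypothetical sketch, not part of this PR, and the class body below is only a stand-in for the real implementation.

```python
import warnings


class LLMInferenceTransformers:
    """Stand-in for the renamed class in core/model/inference/inference.py."""


class LLMInferenceTransformersLocal(LLMInferenceTransformers):
    """Deprecated alias for LLMInferenceTransformers (hypothetical shim)."""

    def __init__(self, *args, **kwargs):
        # Warn at the caller's location, then behave exactly like the new class.
        warnings.warn(
            "LLMInferenceTransformersLocal is deprecated; "
            "use LLMInferenceTransformers instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Subclassing rather than plain aliasing means `isinstance` checks against the new name keep working while old call sites get a visible warning.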

Configuration and Example Files:

  • Added example configuration files for digital data ETL (configs/digital_data_etl_author_name.yaml.example) and feature engineering (configs/feature_engineering.yaml.example).
  • Added a sample payload for API testing in tests/payloads/payload.json.example.
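
One way to turn the `.example` files into working copies is sketched below, assuming the convention is simply to drop the `.example` suffix. The helper name is hypothetical; the PR itself does not prescribe how the files are copied.

```python
from pathlib import Path
import shutil


def materialize_examples(root, suffix=".example"):
    """Copy every '*.example' file under root to the same path without the suffix.

    Existing target files are left untouched so local edits are never
    overwritten. Returns the list of files that were created.
    """
    created = []
    for example in sorted(Path(root).rglob(f"*{suffix}")):
        if not example.is_file():
            continue
        target = example.with_name(example.name[: -len(suffix)])
        if not target.exists():
            shutil.copyfile(example, target)
            created.append(target)
    return created
```

Calling `materialize_examples("configs")` would, under that assumption, create `configs/feature_engineering.yaml` from `configs/feature_engineering.yaml.example` on the first run and do nothing on later runs.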

Documentation and Usability:

  • Improved comments and added example outputs in the DPO and SFT fine-tuning notebooks (llm_ghostwriter_finetune_dpo.ipynb, llm_ghostwriter_finetune_sft.ipynb).
  • Updated license specification in pyproject.toml for compliance and clarity.
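
For context, PEP 639 makes `license` a plain SPDX expression string and lists license files under `license-files`, whereas the older layout used a table such as `{ file = "LICENSE" }`. The check below is a small sketch over an already-parsed `[project]` table; the function name and the sample values are illustrative only, since the project's actual license is not stated in this PR.

```python
def has_explicit_pep639_license(project):
    """Return True if a parsed [project] table uses the PEP 639 layout
    with an explicit license-files list."""
    license_value = project.get("license")
    license_files = project.get("license-files")
    return isinstance(license_value, str) and isinstance(license_files, list)


# Illustrative values only (assumed, not read from the real pyproject.toml).
old_style = {"license": {"file": "LICENSE"}}
new_style = {"license": "MIT", "license-files": ["LICENSE"]}
```

Here `has_explicit_pep639_license(old_style)` is false and `has_explicit_pep639_license(new_style)` is true.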

Minor Fixes and Data Handling:

  • Enabled export of RepositoryDocument in the data warehouse tool and fixed a grammatical error in the import assertion ("doesn't exists" → "doesn't exist").
  • Updated prompt and query examples to use placeholder author names for better generalization.
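
With placeholders in the examples, callers need to substitute a real name before use. The sketch below shows one way to fill the placeholder and to detect one that was left unfilled. The spelling `<author name>` follows the review summary in this conversation; the helper names are hypothetical and not part of the PR.

```python
import re

# Matches the placeholder case-insensitively, e.g. "<author name>" or "<Author Name>".
PLACEHOLDER = re.compile(r"<author name>", re.IGNORECASE)


def fill_author(template, author):
    """Replace every author placeholder in template with a concrete name."""
    if not author.strip():
        raise ValueError("author must be a non-empty name")
    # A function replacement avoids treating backslashes in the name as escapes.
    return PLACEHOLDER.sub(lambda _match: author, template)


def has_unfilled_placeholder(text):
    """Return True if text still contains an author placeholder."""
    return PLACEHOLDER.search(text) is not None
```

A guard such as `has_unfilled_placeholder(query)` before sending a query could catch an example file that was copied but never edited.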

Let me know if you have questions about any specific change!

Copilot AI review requested due to automatic review settings November 7, 2025 05:43
Copilot AI left a comment

Pull Request Overview

This PR updates the codebase to use generic placeholder values instead of specific author names, renames a class for better clarity, enables repository document export/import functionality, and modernizes the project configuration. It also adds example configuration files and documentation improvements.

  • Replaces hardcoded author names with generic placeholders (e.g., <author name>, <Author Name>) across various files
  • Renames LLMInferenceTransformersLocal to LLMInferenceTransformers for improved naming clarity
  • Enables RepositoryDocument export/import in the data warehouse functionality

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tools/rag.py | Replaced hardcoded author name with generic placeholder |
| tools/data_warehouse.py | Enabled RepositoryDocument export/import and fixed grammar error ("doesn't exists" → "doesn't exist") |
| tests/payloads/payload.json.example | Added new example payload file with generic author placeholder |
| pyproject.toml | Modernized license format to PEP 639 standard with explicit license-files field |
| core/model/inference/test.py | Updated to use renamed LLMInferenceTransformers class |
| core/model/inference/inference.py | Renamed class from LLMInferenceTransformersLocal to LLMInferenceTransformers and removed outdated comment |
| core/model/inference/__init__.py | Updated exports to reflect class rename |
| core/model/finetuning/llm_ghostwriter_finetune_sft.ipynb | Added example output as commented documentation |
| core/model/finetuning/llm_ghostwriter_finetune_dpo.ipynb | Added comments explaining DPO learning rate best practices and example output |
| core/infrastructure/inference_pipeline_api.py | Updated to use renamed LLMInferenceTransformers class |
| core/application/rag/self_query.py | Replaced hardcoded author name with generic placeholder and added clarifying comment |
| core/application/rag/query_expansion.py | Added clarifying comment about model selection |
| configs/feature_engineering.yaml.example | Added new example configuration file with generic author names |
| configs/digital_data_etl_author_name.yaml.example | Added new example configuration file for ETL with generic author data |
| .gitignore | Updated to ignore additional local directories and configuration files |
| .env.example | Added Qdrant Cloud configuration and commented OLLAMA_API_URL option |

Comment on lines +4 to +7
# blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# github Repositories

Copilot AI Nov 7, 2025

Inconsistent comment capitalization: "blog Posts" should be "Blog Posts" (or "blog posts"), and "github Repositories" below should likely be "GitHub Repositories".

Suggested change, before:
# blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# github Repositories
Suggested change, after:
# Blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# GitHub Repositories

# blog Posts
- https://johndoe.blog/post1
- https://johndoe.blog/post2
# github Repositories

Copilot AI Nov 7, 2025

Inconsistent capitalization: "github Repositories" should be "GitHub Repositories" to properly capitalize the brand name and maintain consistency.

Suggested change, before:
# github Repositories
Suggested change, after:
# GitHub Repositories

@@ -1,6 +1,7 @@
OPENAI_MODEL_ID=gpt-4.1-nano
OPENAI_API_KEY=<str>
OLLAMA_MODEL_ID=llama3.2:3b#replace with your model

Copilot AI Nov 7, 2025

Missing spaces around the comment delimiter. Should be llama3.2:3b # replace with your model for consistency with other comment formatting.
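
Beyond consistency, the spacing can change the parsed value. Some .env parsers only treat `#` as the start of an inline comment on an unquoted value when whitespace precedes it. Whether this project's loader follows that rule is an assumption; the sketch below models the rule in isolation with a hypothetical helper rather than any specific library.

```python
import re


def parse_unquoted_env_line(line):
    """Parse KEY=value for an unquoted value, dropping a trailing
    '<whitespace>#...' comment.

    Sketch of one common rule only; it ignores quoting, `export`, and escapes.
    """
    key, _, raw = line.partition("=")
    # Only a '#' preceded by whitespace starts a comment under this rule.
    value = re.sub(r"\s+#.*", "", raw).strip()
    return key.strip(), value
```

Under this rule, `OLLAMA_MODEL_ID=llama3.2:3b#replace with your model` keeps the comment text as part of the value, while `OLLAMA_MODEL_ID=llama3.2:3b # replace with your model` yields just `llama3.2:3b`.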

@ahmedshahriar merged commit 1822377 into main on Nov 7, 2025
9 checks passed