feat: add example configuration files and update inference class ref by ahmedshahriar · Pull Request #6 · ahmedshahriar/llm-ghostwriter

ahmedshahriar · 2025-11-07T05:43:52Z

This pull request adds configuration options for cloud-hosted Qdrant vector database integration, renames the Hugging Face inference model class for clarity, and adds example configuration and payload files to streamline onboarding and testing. It also updates documentation and comments for maintainability.

Cloud Vector Database Integration:

Added Qdrant Cloud configuration options (USE_QDRANT_CLOUD, QDRANT_CLOUD_URL, QDRANT_APIKEY) to .env.example to support cloud-hosted vector databases.
Provided an example for setting the Ollama API URL in .env.example.
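
As a rough sketch of how these variables might be consumed, the snippet below chooses connection arguments based on USE_QDRANT_CLOUD. Only the three cloud variables come from .env.example; the helper name, the local-mode variables (QDRANT_DATABASE_HOST, QDRANT_DATABASE_PORT), and their defaults are assumptions for illustration, and the repository may wire its settings differently.

```python
import os
from typing import Mapping


def qdrant_connection_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for a Qdrant client from environment values.

    Hypothetical helper, not part of this PR.
    """
    flag = env.get("USE_QDRANT_CLOUD", "false").strip().lower()
    if flag in {"1", "true", "yes"}:
        url = env.get("QDRANT_CLOUD_URL")
        api_key = env.get("QDRANT_APIKEY")
        if not url or not api_key:
            raise ValueError(
                "QDRANT_CLOUD_URL and QDRANT_APIKEY are required "
                "when USE_QDRANT_CLOUD is enabled"
            )
        return {"url": url, "api_key": api_key}
    # Local fallback; these variable names and defaults are assumed.
    return {
        "host": env.get("QDRANT_DATABASE_HOST", "localhost"),
        "port": int(env.get("QDRANT_DATABASE_PORT", "6333")),
    }
```

The returned dictionary could then be passed to `qdrant_client.QdrantClient(**kwargs)`, which accepts either `url`/`api_key` or `host`/`port`.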

Inference Model Refactor:

Renamed the Hugging Face Transformers inference class from LLMInferenceTransformersLocal to LLMInferenceTransformers across the codebase for consistency and clarity (core/model/inference/inference.py, __init__.py, usage in APIs and tests). [1] [2] [3] [4] [5] [6]

Configuration and Example Files:

Added example configuration files for digital data ETL (configs/digital_data_etl_author_name.yaml.example) and feature engineering (configs/feature_engineering.yaml.example). [1] [2]
Added a sample payload for API testing in tests/payloads/payload.json.example.

Documentation and Usability:

Improved comments and added example outputs in the Jupyter notebooks for DPO and supervised fine-tuning (llm_ghostwriter_finetune_dpo.ipynb, llm_ghostwriter_finetune_sft.ipynb). [1] [2]
Updated license specification in pyproject.toml for compliance and clarity.

Minor Fixes and Data Handling:

Enabled export of RepositoryDocument in the data warehouse tool and fixed a typo in the import assertion message. [1] [2]
Updated prompt and query examples to use placeholder author names for better generalization. [1] [2]

Let me know if you have questions about any specific change!

Copilot

Pull Request Overview

This PR updates the codebase to use generic placeholder values instead of specific author names, renames a class for better clarity, enables repository document export/import functionality, and modernizes the project configuration. It also adds example configuration files and documentation improvements.

Replaces hardcoded author names with generic placeholders (e.g., <author name>, <Author Name>) across various files
Renames LLMInferenceTransformersLocal to LLMInferenceTransformers for improved naming clarity
Enables RepositoryDocument export/import in the data warehouse functionality

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

File	Description
tools/rag.py	Replaced hardcoded author name with generic placeholder
tools/data_warehouse.py	Enabled RepositoryDocument export/import and fixed grammar error ("doesn't exists" → "doesn't exist")
tests/payloads/payload.json.example	Added new example payload file with generic author placeholder
pyproject.toml	Modernized license format to PEP 639 standard with explicit license-files field
core/model/inference/test.py	Updated to use renamed LLMInferenceTransformers class
core/model/inference/inference.py	Renamed class from LLMInferenceTransformersLocal to LLMInferenceTransformers and removed outdated comment
core/model/inference/__init__.py	Updated exports to reflect class rename
core/model/finetuning/llm_ghostwriter_finetune_sft.ipynb	Added example output as commented documentation
core/model/finetuning/llm_ghostwriter_finetune_dpo.ipynb	Added comments explaining DPO learning rate best practices and example output
core/infrastructure/inference_pipeline_api.py	Updated to use renamed LLMInferenceTransformers class
core/application/rag/self_query.py	Replaced hardcoded author name with generic placeholder and added clarifying comment
core/application/rag/query_expansion.py	Added clarifying comment about model selection
configs/feature_engineering.yaml.example	Added new example configuration file with generic author names
configs/digital_data_etl_author_name.yaml.example	Added new example configuration file for ETL with generic author data
.gitignore	Updated to ignore additional local directories and configuration files
.env.example	Added Qdrant Cloud configuration and commented OLLAMA_API_URL option

Copilot · 2025-11-07T05:46:24Z

configs/digital_data_etl_author_name.yaml.example

+    # blog Posts
+    - https://johndoe.blog/post1
+    - https://johndoe.blog/post2
+    # github Repositories


Inconsistent comment formatting: "blog Posts" should be "Blog Posts" or "blog posts" to match the capitalization pattern used for "github Repositories" below (which should likely be "GitHub Repositories").

Suggested change

-    # blog Posts
-    - https://johndoe.blog/post1
-    - https://johndoe.blog/post2
-    # github Repositories
+    # Blog Posts
+    - https://johndoe.blog/post1
+    - https://johndoe.blog/post2
+    # GitHub Repositories

Copilot · 2025-11-07T05:46:24Z

configs/digital_data_etl_author_name.yaml.example

+    # blog Posts
+    - https://johndoe.blog/post1
+    - https://johndoe.blog/post2
+    # github Repositories


Inconsistent capitalization: "github Repositories" should be "GitHub Repositories" to properly capitalize the brand name and maintain consistency.

Suggested change

-    # github Repositories
+    # GitHub Repositories

Copilot · 2025-11-07T05:46:25Z

.env.example

@@ -1,6 +1,7 @@
 OPENAI_MODEL_ID=gpt-4.1-nano
 OPENAI_API_KEY=<str>
 OLLAMA_MODEL_ID=llama3.2:3b#replace with your model


Missing space after the comment delimiter. Should be llama3.2:3b # replace with your model for consistency with other comment formatting.
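
The missing space can be more than cosmetic. Some dotenv-style loaders treat `#` as the start of a comment only when whitespace precedes it, in which case the trailing text would become part of the value. The sketch below models that single rule in a simplified way. It is not any specific library's implementation, and the function names are made up for illustration.

```python
import re


def parse_env_value(raw: str) -> str:
    """Parse an unquoted .env value under one assumed rule:
    '#' starts a comment only when preceded by whitespace.
    """
    return re.sub(r"\s+#.*", "", raw).strip()


def parse_env_line(line: str) -> tuple[str, str]:
    """Split KEY=VALUE and strip an inline comment from the value."""
    key, _, raw = line.partition("=")
    return key.strip(), parse_env_value(raw)


# Without a space, the would-be comment stays in the value.
print(parse_env_line("OLLAMA_MODEL_ID=llama3.2:3b#replace with your model"))
# ('OLLAMA_MODEL_ID', 'llama3.2:3b#replace with your model')

# With a space before '#', the comment is stripped.
print(parse_env_line("OLLAMA_MODEL_ID=llama3.2:3b # replace with your model"))
# ('OLLAMA_MODEL_ID', 'llama3.2:3b')
```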

feat: add example configuration files and update inference class references

33c67a3

Copilot AI review requested due to automatic review settings November 7, 2025 05:43

Copilot AI reviewed Nov 7, 2025

ahmedshahriar merged commit 1822377 into main Nov 7, 2025
9 checks passed

ahmedshahriar merged 1 commit into main from staging