This project is an AI-powered article generator that fine-tunes models using OpenAI's API. The system generates and translates articles across multiple languages and supports managing multiple distinct projects. Each project has its own configuration, fine-tuning datasets, and article prompts, providing flexibility for creating unique content streams.
- Fine-tunes AI models with custom datasets.
- Supports multiple projects with isolated settings, titles, and output directories.
- Generates and translates articles using AI-powered prompts.
- Automatically translates generated articles into multiple languages.
- Manages fine-tuning job statuses and efficiently uploads new training files only when necessary.
- Publishes blog posts to HubSpot CMS with automatic markdown-to-HTML conversion.
The system is designed to handle multiple distinct projects, each with its own dataset, prompts, and article generation pipeline. Each project is maintained in its own directory, allowing you to easily manage different streams of content.
Each project is stored in its own directory under a common root folder. For example:
```
projects/
│
├── productivity/
│   ├── tuning/      # Fine-tuning data
│   ├── results/     # Generated articles
│   ├── titles.txt   # List of article titles
│   ├── prompts/     # Custom prompt files for articles
│   └── .env         # Environment variables specific to this project
│
└── tech/
    ├── tuning/
    ├── results/
    ├── titles.txt
    ├── prompts/
    └── .env
```
Each project has its own environment settings, training data, and output directories, allowing multiple distinct projects to run independently within the system.
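This layout maps naturally to path lookups keyed by project ID. As a minimal sketch of the idea (the helper name and dictionary keys below are illustrative assumptions, not the project's actual code):

```python
from pathlib import Path

def project_paths(project_id: str, root: str = "projects") -> dict:
    """Resolve the per-project files and directories for a given project ID."""
    base = Path(root) / project_id
    return {
        "tuning": base / "tuning",      # fine-tuning data
        "results": base / "results",    # generated articles
        "titles": base / "titles.txt",  # list of article titles
        "prompts": base / "prompts",    # custom prompt files
        "env": base / ".env",           # project-specific environment
    }

paths = project_paths("productivity")
if paths["titles"].exists():
    titles = paths["titles"].read_text().splitlines()
```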
```sh
git clone https://github.com/yourusername/ai-blogger.git
cd ai-blogger
```

uv is a fast Python package and environment manager that replaces raw pip usage in this project.
- Install `uv` (skip this if you already have it):

  ```sh
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Create a dedicated local environment in `.venv` and activate it:

  ```sh
  uv venv .venv
  source .venv/bin/activate
  ```
To ensure every uv command (including uv sync) targets .venv, export UV_PROJECT_ENVIRONMENT once per shell session or add it to your shell startup file:
```sh
export UV_PROJECT_ENVIRONMENT=.venv
```

Resolve and install project dependencies with uv (this reads from `pyproject.toml` and produces a `uv.lock` on the first run):

```sh
uv sync
```

`uv sync` populates the virtual environment using uv's resolver and installer.
The English spaCy language model (en_core_web_sm) is bundled as a direct dependency, so uv sync installs it automatically. If you need to refresh the model manually, run:
```sh
uv run python -m spacy download en_core_web_sm
```

- Copy the example environment file:

  ```sh
  cp .env.example .env
  ```

- Edit the `.env` file and configure the necessary environment variables, such as OpenAI API keys, paths, and project-specific settings.
Use the data preparation script to process the training data into the format needed for fine-tuning:
```sh
uv run python prepare.py
```

The preparation script processes Markdown and text files, converting them into a format suitable for AI fine-tuning. Markdown files are parsed to retain their hierarchical structure, and plain text files are cleaned up so that only valuable content is extracted.
This project uses ruff for Python linting and formatting, with auto-fix enabled by default.
- Run linting and formatting manually:

  ```sh
  uv run ruff check
  uv run ruff format
  ```
- Install git hooks (run once):

  ```sh
  uv run pre-commit install
  ```
The installed pre-commit hooks run ruff with --fix, so staged Python files are automatically formatted and linted before each commit.
To fine-tune the model and generate articles for a specific project, run the main script and provide the project ID:
```sh
uv run python main.py <project_id>
```

For example, to run the productivity project:

```sh
uv run python main.py productivity
```

This will:
- Combine fine-tuning files for the project.
- Upload the training file (if necessary) and wait for an available fine-tuning slot.
- Start the fine-tuning process if the training data has changed.
- Generate articles based on the titles in the `titles.txt` file and the article prompts.
- Translate the generated articles into multiple languages, as specified in the project settings.
- Fine-Tuning: The system checks for changes in the fine-tuning dataset by comparing file hashes (see the sketch below). If the dataset has changed, the system uploads the new dataset and fine-tunes the model. If there are no changes, the previous model is reused.
- Article Generation: Once the fine-tuning is complete (or the previous model is reused), the system generates articles for each title listed in the `titles.txt` file. The article content is based on a customizable prompt provided in the project.
- Translations: Each article is automatically translated into multiple languages based on the configuration in the project's environment file (`.env`) or project settings.
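The change-detection step can be pictured as hashing the combined training file and comparing the result against the hash recorded for the last fine-tune. A minimal sketch of that idea follows; the function and file names here are illustrative assumptions, not the project's actual code:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a training file so dataset changes can be detected cheaply."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def needs_fine_tune(training_file: Path, hash_record: Path) -> bool:
    """Return True only if the dataset changed since the last recorded run."""
    current = file_sha256(training_file)
    previous = hash_record.read_text().strip() if hash_record.exists() else None
    if current == previous:
        return False                     # unchanged: reuse the previous model
    hash_record.write_text(current)      # record the new hash for next time
    return True                          # changed: upload and fine-tune again
```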
You can configure the languages to which articles will be translated by setting environment variables or adjusting the project's .env file.
```env
SUPPORTED_LANGUAGES=en,es,fr,de
```

This configuration will generate translations in English, Spanish, French, and German. You can add or remove languages by updating this setting.
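Inside the code, a setting like this is typically read and split into a list, along these lines (a sketch of the common pattern, not the project's exact parsing):

```python
import os

# "en,es,fr,de" -> ["en", "es", "fr", "de"]; defaults to English if unset
languages = [
    code.strip()
    for code in os.getenv("SUPPORTED_LANGUAGES", "en").split(",")
    if code.strip()
]
```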
Each project contains its own .env file, where you can configure settings such as the OpenAI API key, supported languages, fine-tuning settings, and more.
```env
OPENAI_API_KEY=your_openai_api_key
GPT_MODEL=gpt-3.5-turbo
SUPPORTED_LANGUAGES=en,es,fr,de
```

This .env file contains project-specific settings like API keys and the list of supported languages.
The data preparation script (prepare.py) is used to process all text and Markdown files within a project's tuning/ directory. This script reads files, cleans up HTML tags, processes Markdown hierarchies, and creates training data in JSONL format for fine-tuning. It ensures that the text is properly structured for AI fine-tuning.
To run the data preparation script:
```sh
uv run python prepare.py
```

This will process all .txt and .md files in the project's tuning directory, generating a .jsonl file that is ready for fine-tuning.
For Markdown files, the script parses headers and treats them as context for paragraphs. For example, in a file with the following structure:
```md
# Header1
Paragraph 1

## Subheader1
Paragraph 2
```

The generated prompts will maintain the context of the headers and subheaders:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Header1: Subheader1"},
    {"role": "assistant", "content": "Header1: Subheader1: Paragraph 2"}
  ]
}
```

This hierarchical structure ensures that the AI model understands the context of the text during fine-tuning.
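One way to produce such records is to track the current header stack while walking the Markdown lines and emit one message triple per paragraph. The sketch below illustrates that idea under simplifying assumptions (each non-blank line is treated as a paragraph); it is not the project's actual implementation:

```python
import json

def markdown_to_jsonl(md_text: str) -> str:
    """Emit one fine-tuning record per paragraph, prefixed by its header path."""
    headers = []  # current header stack, e.g. ["Header1", "Subheader1"]
    records = []
    for line in md_text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header depth = number of leading '#'; truncate the stack to it.
            level = len(line) - len(line.lstrip("#"))
            headers = headers[: level - 1] + [line.lstrip("# ").strip()]
        else:
            context = ": ".join(headers)
            records.append(json.dumps({"messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": context},
                {"role": "assistant", "content": f"{context}: {line}"},
            ]}))
    return "\n".join(records)
```

Applied to the example above, the "Paragraph 2" line yields exactly the record shown, with "Header1: Subheader1" as its context.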
You can easily manage multiple projects within this system. Each project has its own set of configurations, tuning files, and output directories. To switch between projects, simply specify the project ID when running the script.
- Create a new project directory under the `projects/` folder.
- Copy the necessary configuration files (such as `.env`, `titles.txt`, and prompts) from an existing project.
- Modify the configuration and settings to suit the new project.
- Run the new project using:

  ```sh
  uv run python main.py <new_project_id>
  ```
The project includes a HubSpot publishing script that automatically publishes your generated blog posts to HubSpot CMS.
```sh
# 1. Configure your HubSpot API token in the project's .env
# HUBSPOT_API_TOKEN=your-token-here

# 2. List available content groups (blogs)
./hubspot-publish.py list-groups productivity

# 3. Preview what would be published (dry run)
./publish-to-hubspot.sh --dry-run --limit 5

# 4. Publish posts
./publish-to-hubspot.sh --limit 10
```

- Automatically converts markdown to HTML (see the sketch after this list)
- Extracts metadata (title, category, description, keywords)
- Maps categories to HubSpot content groups
- Replaces `{url}` placeholders with your app URL
- Creates posts as DRAFTS for review
- Tracks published posts to prevent duplicates
- Selects the latest humanized English version of each post
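As an illustration of the conversion and placeholder steps, the snippet below uses the third-party `markdown` package; the `APP_URL` variable and function name are assumptions for the sketch, not the script's actual code:

```python
import os

import markdown  # third-party package: `uv add markdown` or `pip install markdown`

def render_post(md_text: str) -> str:
    """Convert a markdown post to HTML and fill in {url} placeholders."""
    app_url = os.getenv("APP_URL", "https://example.com")  # assumed setting name
    html = markdown.markdown(md_text)  # basic markdown -> HTML conversion
    return html.replace("{url}", app_url)
```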
For detailed instructions, see HUBSPOT_PUBLISHING.md.