In this project, we build a machine learning workflow using AG2. The workflow involves data analysis, preprocessing, and model training to build a machine learning model.
Machine learning workflows typically involve several key steps:
- Data Analysis and Exploration: Understanding dataset size, columns, and distributions.
- Data Preprocessing: Cleaning data, handling missing values, and encoding categorical variables.
- Model Training: Training a model, comparing different models, and tuning hyperparameters.
The workflow follows the steps of data analysis, preprocessing, and model training. Each step is executed by a specific agent, and the transition between steps is determined by the success or failure of the previous step.
We follow a state machine design to build the machine learning workflow:
InitandEnd: Represent the start and end of the workflow.Explore: Analyze the dataset.- Agents: Data Explorer → Code Executor
- Transition: If code execution is successful, move to
Preprocess; otherwise, remain inExplore.
Preprocess: Clean and prepare data.- Agents: Data Preprocessor → Code Executor
- Transition: A language model determines whether all necessary preprocessing steps have been completed. If yes, move to
Train; otherwise, return toExplorefor further analysis.
Train: Train a machine learning model.- Agents: Model Trainer → Code Executor
- Transition: The model is trained in two iterations to compare performance. If the maximum trials are reached, move to
Summarize. If code execution fails, remain inTrain(failed trials do not count).
Summarize: Generate a summary of the workflow.- Agents: Summarizer
- Transition: Always moves to
End.
At the Explore, Preprocess, and Train states:
- A language model agent is invoked first.
- A code executor then executes the generated code.
- If execution fails, the workflow remains in the same state.
- If execution succeeds, conditions are checked to determine whether to transition to the next state.
This structured workflow ensures an efficient and iterative approach to machine learning model building.
This project demonstrates the following AG2 features:
## TAGS
TAGS: data analysis, groupchat, stateflow, code execution, kaggle, automated machine learning, workflow automation, model training, data preprocessing, state machine, hyperparameter tuning
- Python 3.12 or higher
- OpenAI API key
- Clone and navigate to the folder:
git clone https://github.com/ag2ai/build-with-ag2.git
cd build-with-ag2/automate-ml-for-kaggle- Install dependencies:
uv sync- Set up environment variables:
cp .env.example .env
# Edit .env with your OpenAI API keyRun the automated ML workflow:
uv run python main.pyThe workflow will:
- Analyze the dataset (
house_prices_train.csv) - Preprocess the data automatically
- Train and compare multiple models
- Generate performance visualizations
- Output a comprehensive summary
For more information or any questions, please refer to the documentation or reach out to us!
- View Documentation at: https://docs.ag2.ai/latest/
- Find AG2 on github: https://github.com/ag2ai/ag2
- Join us on Discord: https://discord.gg/pAbnFJrkgZ
- Email us at: support@ag2.ai
This project is licensed under the Apache License 2.0. See the LICENSE for details.