Now that we have converted the notebook into a Python script, we can use an orchestrator to turn the script into a production pipeline.
There's no video for this unit, but you can use ChatGPT to help you with this.
For that, you first need to choose an orchestrator. For example:
- Airflow
- Prefect
- Dagster
- Kestra
- Mage
- or some other tool
Once you have chosen one, work through these steps:
- Configure the tool to run locally
- Run the simplest "hello world" workflow
- Get the code from the previous unit (see code)
- Use the tool to orchestrate the steps in the pipeline
- Schedule the workflow to run monthly
  - The training data should be from two months ago
  - The validation data should be from one month ago
- Learn to run the workflow for some of the past months
- Learn to deploy the tool to the cloud
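The monthly schedule and the backfill step above both come down to a bit of date arithmetic: given the date of a run, pick the training month (two months earlier) and the validation month (one month earlier). Here is a minimal, orchestrator-agnostic sketch in plain Python. The function names `shift_month`, `data_months`, and `backfill_dates`, as well as the `YYYY-MM` string format, are illustrative assumptions and not part of the course code; most orchestrators pass the logical run date to the workflow in their own way.

```python
from datetime import date


def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Return (year, month) shifted by `delta` months; delta may be negative."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def data_months(run_date: date) -> dict[str, str]:
    """Map a run date to the training and validation months as YYYY-MM strings."""
    train_year, train_month = shift_month(run_date.year, run_date.month, -2)
    val_year, val_month = shift_month(run_date.year, run_date.month, -1)
    return {
        "train": f"{train_year:04d}-{train_month:02d}",
        "val": f"{val_year:04d}-{val_month:02d}",
    }


def backfill_dates(start: date, end: date) -> list[date]:
    """List the first day of every month from start to end, inclusive."""
    dates = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        dates.append(date(year, month, 1))
        year, month = shift_month(year, month, 1)
    return dates


if __name__ == "__main__":
    # Hypothetical backfill: replay the workflow for three past monthly runs.
    for run_date in backfill_dates(date(2023, 1, 1), date(2023, 3, 1)):
        print(run_date.isoformat(), data_months(run_date))
```

For a run dated 2023-03-01, this picks 2023-01 for training and 2023-02 for validation. Keeping the date logic in a plain function makes it easy to test outside the orchestrator and to reuse when re-running past months.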
For guidance, you can refer to past cohorts of the course:
- Prefect - 2022 and 2023
- Mage - 2024
You can also rely on ChatGPT or similar tools. They are very helpful.
More information here.
If you want to run MLflow with Docker, you can do this:
Create a Dockerfile for MLflow, e.g. mlflow.dockerfile:
FROM python:3.10-slim

RUN pip install mlflow==2.12.1

EXPOSE 5000

CMD [ \
    "mlflow", "server", \
    "--backend-store-uri", "sqlite:///home/mlflow_data/mlflow.db", \
    "--host", "0.0.0.0", \
    "--port", "5000" \
]

Add it to the docker-compose.yaml:
mlflow:
  build:
    context: .
    dockerfile: mlflow.dockerfile
  ports:
    - "5000:5000"
  volumes:
    - "${PWD}/mlflow_data:/home/mlflow_data/"

In your code, make sure you use the same version of mlflow (mlflow==2.12.1).
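When you run it, MLflow should be accessible at http://mlflow:5000 from other containers in the same Compose network, and at http://localhost:5000 from the host through the 5000:5000 port mapping.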
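To point the pipeline code at this server, something like the following sketch could work. It assumes the `mlflow` package is installed wherever the pipeline runs. The helper names, the `inside_compose` flag, and the experiment name `orchestration-demo` are hypothetical; `MLFLOW_TRACKING_URI` is the environment variable MLflow reads for its tracking server address.

```python
import os
from importlib import metadata

EXPECTED_MLFLOW_VERSION = "2.12.1"  # the version pinned in mlflow.dockerfile


def resolve_tracking_uri(inside_compose: bool) -> str:
    """Pick the MLflow server URL; MLFLOW_TRACKING_URI overrides the default."""
    # Assumption: code in another Compose service reaches the server by its
    # service name, while code on the host uses the published port.
    default = "http://mlflow:5000" if inside_compose else "http://localhost:5000"
    return os.environ.get("MLFLOW_TRACKING_URI", default)


def client_matches_server(expected: str = EXPECTED_MLFLOW_VERSION) -> bool:
    """Return True if the locally installed mlflow matches the pinned version."""
    try:
        return metadata.version("mlflow") == expected
    except metadata.PackageNotFoundError:
        return False


def configure_mlflow(inside_compose: bool, experiment: str = "orchestration-demo") -> None:
    """Point the MLflow client at the server and select an experiment."""
    import mlflow  # imported lazily so the helpers above work without mlflow

    mlflow.set_tracking_uri(resolve_tracking_uri(inside_compose))
    mlflow.set_experiment(experiment)
```

Calling `configure_mlflow(inside_compose=True)` at the start of the pipeline would then send runs to the containerized server; `client_matches_server()` is a cheap guard against a client/server version mismatch.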
Did you take notes? Add them here: