Azure MLFlow deployment and instructions for how to use it.
Install the Azure CLI and log in, selecting the correct subscription (probably "ARC") as active:

```
brew install azure-cli pwgen
az login
```

`pwgen` is used to auto-generate passwords.
To run these commands you will need to know:

- The Azure resource group the MLFlow server has been deployed into (default: `arc-turing-mlflow`)
- The name of the MLFlow container app (default: same as the resource group)
Running the following script:

```
cd setup-env
bash make_env.sh
```

will:

- Prompt for a username and password to create on the MLFlow server
- Ask for the resource group and app name of the deployed MLFlow server
- Create the user on the server
- Save a `.env` file containing the necessary environment variables to set to use it (`MLFLOW_TRACKING_URI`, `AZURE_STORAGE_CONNECTION_STRING`, `MLFLOW_TRACKING_USERNAME`, `MLFLOW_TRACKING_PASSWORD`)
Source the saved `.env` file (`source .env`) before running scripts that use mlflow, or add the variables to your `.bash_profile`/`.zprofile`/similar.
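As a quick sanity check before running anything, you can verify from Python that the variables are set. The variable names below come from the `.env` file described above; the helper function itself is just an illustration, not part of this repo:

```python
import os

# The four variables written to .env by make_env.sh
REQUIRED_VARS = [
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "AZURE_STORAGE_CONNECTION_STRING",
]


def missing_mlflow_vars(environ=None):
    """Return the names of any required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_mlflow_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All MLFlow environment variables are set.")
```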
The Turing IP address is automatically added to the allow-list as part of the deployment. If you need to add another, run:

```
cd setup-env
bash add_ip.sh
```

This will prompt for an IP address/address range to add, and a suitable label for it.
Install the Python dependencies:

```
uv sync
```

The main ones are:

- `mlflow`: The Python library for interacting with an MLFlow server
- `psutil`, `nvidia-ml-py`: If you want to log system stats (CPU and GPU, respectively) with your job
- `azure-storage-blob`, `azure-identity`: If you want to log artifacts (files, e.g. models), as these are stored in an Azure blob
- `hyperopt`: The package MLFlow recommends for hyperparameter sweeps

The rest of the dependencies in `pyproject.toml` are just for the examples.
You must have the following environment variables exported in your environment:

- `MLFLOW_TRACKING_URI` - the URL of the MLFlow server
- `MLFLOW_TRACKING_USERNAME` - your MLFlow username
- `MLFLOW_TRACKING_PASSWORD` - your MLFlow password
- `AZURE_STORAGE_CONNECTION_STRING` - the connection string for the Azure storage account for artefacts (only needed if you're logging artefacts to Azure). If you want to log artifacts locally instead, you should be able to do so by setting the `artifact_location` when creating the MLFlow experiment you are logging results to, e.g. `mlflow.create_experiment("experiment_name", artifact_location="/your/local/path")`.
The scripts in `mlflow-examples` give a few examples of using MLFlow.
To start:

- Find the name of the MLFlow resource group in the ARC subscription in the Azure portal (https://portal.azure.com), e.g. `arc-mlflow-test`.
- Find the name of the MLFlow container app, e.g. `mlflow-app`.
- Set the correct values for these in the first two lines of `mlflow-examples/.env`.
- Load the environment variables:

```
cd mlflow-examples
source .env
```
Example scripts you can run:

- `uv run mlflow-examples/hello.py`: Basic logging of a parameter, metric, and artifact.
- `uv run mlflow-examples/train.py`: Automated logging of metrics and models with the HuggingFace transformers Trainer.
- `uv run mlflow-examples/sweep.py`: A hyperparameter sweep.
If you go to the `MLFLOW_TRACKING_URI` in a browser and enter your username and password, you should get to the UI and be able to browse through your tracked experiments and artefacts.
The `mlflow-container` and `pgbouncer-container` directories contain Dockerfiles for MLFlow and PgBouncer (for managing connections to the database). The images are hosted with the GitHub container registry, and will be rebuilt whenever a change is pushed to the relevant directory in the repo.
First, edit any variables you would like to in `container-app/.env` - this specifies names, passwords, and IP restrictions for the deployment, for example. Ensure the resource group you're specifying doesn't already exist. By default, passwords are auto-generated and access to the MLFlow server is restricted to the deployment IP address.
```
cd container-app-deployment
bash deploy.sh
```

To delete the deployment when you are finished with it:

```
az group delete --name $RESOURCE_GROUP
```

where `$RESOURCE_GROUP` is the name of the resource group you deployed MLFlow to.