Description
Which example? Describe the issue
The code below is based on this sample notebook:
https://github.com/Azure/azureml-examples/blob/main/notebooks/using-mlflow/train-with-mlflow/xgboost_service_principal.ipynb
I get a "RepresenterError" when logging a model while authenticated as a service principal (SP), but not when authenticated as a user. The SP is an app registration and has the role "AzureML Data Scientist" on the AML workspace. Logging metrics with the SP does not produce errors, but logging a model does. I have tested with mlflow 1.27.0 and 1.28.0. Is this expected behaviour or a bug? Calling mlflow.sklearn.autolog(), as done in the sample notebook, does not work either.
Example error output:
"name": "RepresenterError",
"message": "('cannot represent an object', OrderedDict([('name', 'mlflow-env'), ('channels', ['conda-forge']), ('dependencies', ['python=3.9.7', 'pip<=21.2.4', {'pip': ['mlflow', 'cloudpickle==2.1.0', 'psutil==5.9.1', 'scikit-learn==1.1.2', 'typing-extensions==4.3.0']}])]))", .... and so on ....
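For anyone triaging this: the message text has the same shape as the error PyYAML's safe dumper raises when it is handed a `collections.OrderedDict`, which it has no representer for. The sketch below only demonstrates that PyYAML behaviour in isolation; that this is what happens inside mlflow when the conda environment is written is an assumption, not something confirmed here. The helper name `dump_conda_env` and the shortened environment are hypothetical.

```python
from collections import OrderedDict

import yaml  # PyYAML


def dump_conda_env(env):
    """Serialize env with PyYAML's safe dumper; return (ok, yaml_text_or_error)."""
    try:
        return True, yaml.safe_dump(env)
    except yaml.representer.RepresenterError as exc:
        # args[0] is the short message, args[1] is the offending object
        return False, exc.args[0]


# Shortened, illustrative copy of the environment from the error message
env = OrderedDict([
    ("name", "mlflow-env"),
    ("channels", ["conda-forge"]),
    ("dependencies", ["python=3.9.7", {"pip": ["mlflow"]}]),
])

ok_ordered, detail_ordered = dump_conda_env(env)      # OrderedDict input
ok_plain, detail_plain = dump_conda_env(dict(env))    # plain dict input

print("OrderedDict:", ok_ordered, detail_ordered)
print("dict:", ok_plain)
```

If the plain `dict` serializes and the `OrderedDict` does not in the failing environment too, comparing the installed PyYAML and SDK versions between the SP and user setups may help narrow things down.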
Additional context
This code will generate the error:
#%%
import os
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import pandas as pd
from azureml.core import Workspace
### SECTION A: Creating credentials from a service principal produces the error
os.environ['AZURE_TENANT_ID']="..."
os.environ['AZURE_CLIENT_ID']="..."
os.environ['AZURE_CLIENT_SECRET']="..."
credentials = DefaultAzureCredential()
ml_client = MLClient.from_config(credential=credentials)
ws = ml_client.workspaces.get(name="wsname")
mlflow.set_tracking_uri(ws.mlflow_tracking_uri)
### END A
### SECTION B: Alternative: user credentials, which do not produce the error (replace A with B)
#ws = Workspace.from_config('config.json')
#ws.get_mlflow_tracking_uri()
#mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
### END B
mlflow.set_experiment("expname")
# Fit a simple model and log it
data_uri = "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv"
df = pd.read_csv(data_uri)
X = df.drop(["species"], axis=1)
y = df["species"]
enc = LabelEncoder()
y = enc.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier()
model.fit(X_train, y_train)
# log model
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, 'modelpath')
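Since the script mixes the v2 SDK (`azure.ai.ml`) with the v1 SDK (`azureml.core`), the exact package versions in the environment are probably relevant. A minimal sketch for collecting them, using only the standard library, is shown here. The list of distribution names is an assumption about which packages matter for this report, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names assumed relevant to this report; adjust as needed.
PACKAGES = ["mlflow", "azureml-mlflow", "azure-ai-ml", "azureml-core", "PyYAML"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this once in the environment used for Section A and once in the one used for Section B makes any version difference between the two setups visible.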