Commit 62d819d

kriscon-db and MrPowers authored
Added initial documentation for models (unitycatalog#447)
**PR Checklist**

- [x] A description of the changes is added to the description of this PR.
- [ ] If there is a related issue, make sure it is linked to this PR.
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added or modified a feature, documentation in `docs` is updated

**Description of changes**

Initial version of model documentation. Quickstart manually verified on my local machine.

---------

Co-authored-by: Matthew Powers <[email protected]>
1 parent a7cfdc9 commit 62d819d

3 files changed: +324 -1 lines changed


docs/quickstart.md (+78)
@@ -146,6 +146,84 @@ Delete the table to clean up:
bin/uc table delete --full_name unity.default.my_table
```

## Manage models in Unity Catalog using MLflow

Unity Catalog supports the management and governance of ML models as securable assets. Starting with
[MLflow 2.16.1](https://mlflow.org/releases/2.16.1), MLflow offers integrated support for using Unity Catalog as the
backing resource for the MLflow model registry. This means the MLflow client can interact directly with your
Unity Catalog service to create and access registered models.

## Set up MLflow for use with Unity Catalog

In your desired development environment, install MLflow 2.16.1 or higher:

```sh
$ pip install mlflow
```

The installation of MLflow includes the MLflow CLI tool, so you can start a local MLflow server with its UI by running the command below in your terminal:

```sh
$ mlflow ui
```

The server logs the address it is listening on, for example:

```
(mlflow) [master][~/Documents/mlflow_team/mlflow]$ mlflow ui
[2023-10-25 19:39:12 -0700] [50239] [INFO] Starting gunicorn 20.1.0
[2023-10-25 19:39:12 -0700] [50239] [INFO] Listening at: http://127.0.0.1:5000 (50239)
```

Next, from within a Python script or shell, import MLflow and set the tracking URI and the registry URI.

```python
import mlflow

# MLflow tracking server started by `mlflow ui` above.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
# Unity Catalog server; the `uc:` prefix selects Unity Catalog as the model registry.
mlflow.set_registry_uri("uc:http://127.0.0.1:8080")
```

At this point, your MLflow environment is ready for use with the newly started MLflow tracking server and the Unity Catalog server acting as your model registry.
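
Before training anything, it can help to confirm that both servers are actually listening. The sketch below is one way to do that with only the Python standard library. The helper names `split_registry_uri` and `is_listening` are hypothetical (they are not part of MLflow or Unity Catalog), and the parsing assumes the registry URI has the `uc:<server URL>` shape used above.

```python
import socket
from urllib.parse import urlparse

UC_PREFIX = "uc:"  # prefix on the registry URI set above


def split_registry_uri(registry_uri):
    """Return (host, port) of the server behind a 'uc:<url>' registry URI.

    Hypothetical helper for illustration; not an MLflow API.
    """
    if not registry_uri.startswith(UC_PREFIX):
        raise ValueError(f"expected a URI starting with {UC_PREFIX!r}: {registry_uri!r}")
    parsed = urlparse(registry_uri[len(UC_PREFIX):])
    if parsed.scheme not in ("http", "https") or parsed.hostname is None:
        raise ValueError(f"expected an http(s) server URL after {UC_PREFIX!r}: {registry_uri!r}")
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Addresses taken from the setup steps above.
    checks = {
        "MLflow tracking server": ("127.0.0.1", 5000),
        "Unity Catalog server": split_registry_uri("uc:http://127.0.0.1:8080"),
    }
    for name, (host, port) in checks.items():
        state = "reachable" if is_listening(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {state}")
```

A successful TCP connection only shows that something is listening on the port. It does not prove the process behind it is MLflow or Unity Catalog, so treat this as a quick sanity check rather than a health check.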

You can quickly train a test model to validate that the MLflow/Unity Catalog integration is fully working.

```python
# Continues the session above, where the tracking and registry URIs were set.
import mlflow
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X_train, y_train)
    # Take the first row of the training dataset as the model input example.
    input_example = X_train.iloc[[0]]
    # Log the model and register it as a new version in UC.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="unity.default.iris",
    )

# Load version 1 of the registered model back from Unity Catalog.
loaded_model = mlflow.pyfunc.load_model("models:/unity.default.iris/1")
predictions = loaded_model.predict(X_test)
iris_feature_names = datasets.load_iris().feature_names
result = pd.DataFrame(X_test, columns=iris_feature_names)
result["actual_class"] = y_test
result["predicted_class"] = predictions
result[:4]
```

This code snippet creates a registered model `unity.default.iris` and logs the trained model as model version 1. It then loads the model from the Unity Catalog server and performs batch inference on the test set using the loaded model.
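
The model name and the URI passed to `load_model` both follow the three-level `<catalog>.<schema>.<name>` pattern that the CLI also uses for tables and functions. As a small illustration, the hypothetical helpers below (they are not MLflow APIs) assemble the full name and the `models:/<full_name>/<version>` URI that appear in the snippet above. The validation rules are assumptions made for this sketch, not rules taken from Unity Catalog.

```python
def full_model_name(catalog, schema, model):
    """Join the three name levels into '<catalog>.<schema>.<model>'."""
    parts = (catalog, schema, model)
    for part in parts:
        # Assumption for this sketch: a component is non-empty and contains
        # neither '.' nor '/', so the joined name stays unambiguous.
        if not part or "." in part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


def model_version_uri(catalog, schema, model, version):
    """Build the 'models:/<full_name>/<version>' URI for one model version."""
    return f"models:/{full_model_name(catalog, schema, model)}/{int(version)}"


print(full_model_name("unity", "default", "iris"))       # unity.default.iris
print(model_version_uri("unity", "default", "iris", 1))  # models:/unity.default.iris/1
```

The second string is the same URI that the snippet above passes to `mlflow.pyfunc.load_model`.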

## APIs and Compatibility

- Open API specification: See the Unity Catalog Rest API.

docs/usage/cli.md (+5 -1)
@@ -415,7 +415,11 @@ bin/uc function delete --full_name <catalog>.<schema>.<function_name>
 - `schema` : The name of the schema.
 - `function_name` : The name of the function.

-## 6. CLI Server Configuration
+## 6. Registered model and model version management
+
+Please refer to [MLflow documentation](https://mlflow.org/docs/latest/index.html) to learn how to use MLflow to create, register, update, use, and delete registered models and model versions.
+
+## 7. CLI Server Configuration

 By default, the CLI tool is configured to interact with a local reference server running at `http://localhost:8080`.
 The CLI can be configured to talk to Databricks Unity Catalog by one of the following methods:
