This guide walks through: (1) creating an AutoML experiment on Red Hat OpenShift AI, (2) deploying the best model, and (3) using that deployment with this MCP server so tools (e.g. invoke_churn) call your live model.
It is based on the Red Hat AI examples – Predict Customer Churn tutorial. You can follow the same pattern for other datasets (e.g. credit risk) by swapping the pipeline inputs and schema.
| Phase | What you do |
|---|---|
| 1. AutoML experiment | Create a project, S3 connections, run the AutoML pipeline (e.g. Telco Churn), view the leaderboard. |
| 2. Deploy best model | Register the chosen model, set up the AutoGluon ServingRuntime (KServe), deploy the model, get the inference URL and token. |
| 3. Use with MCP server | Set DEPLOYMENT_URL and DEPLOYMENT_TOKEN in .env; run the MCP server; tools will POST to your deployment. |
- Access to Red Hat OpenShift AI (self-managed or cloud).
- Permissions to create projects, connections, pipelines, workbenches, and deployments.
- This repo with the MCP AutoML server (see main README for Python/setup).
Follow the churn prediction tutorial up to and including View the leaderboard. Summary:
- In OpenShift AI, go to Projects and create a new project (e.g. customer-churn-ml or credit-risk-ml).
Create two S3-compatible connections in the project:
- Results storage – for pipeline artifacts and the leaderboard (e.g. automl-results-s3). You will use this when configuring the Pipeline Server.
- Training data – for the dataset (e.g. customer-churn-data-s3). Note the connection name; you will use it as train_data_secret_name in the pipeline run.
See: Create the S3 connections.
- In your project, open Pipelines (or project details) and Configure pipeline server.
- Set the Object storage connection to the same bucket/credentials as your results S3 connection so runs and artifacts (leaderboard, models) are stored there.
- Choose Default database or External MySQL as needed. Create/Save and wait until the Pipeline Server is ready.
See: Configure the Pipeline Server.
- In Workbenches, create a workbench and attach both the results and training-data S3 connections so you can access artifacts and data without a restart.
See: Create workbench with connections attached.
- Download the dataset (e.g. WA_FnUseC_TelcoCustomerChurn.csv for churn).
- Upload it to the bucket used by the training data connection. Note the bucket name and object key (path) for the pipeline run.
See: Upload the training dataset to S3.
- Get the compiled AutoML pipeline from the repo: autogluon_tabular_training_pipeline (branch rhoai_automl), e.g. pipeline.yaml.
- In OpenShift AI Pipelines, add it as a new Pipeline Definition (upload/create from YAML).
- Create a pipeline run with at least:
  - train_data_secret_name – name of the training-data S3 connection
  - train_data_bucket_name – bucket name
  - train_data_file_key – object key of the CSV (e.g. data/WA_FnUseC_TelcoCustomerChurn.csv)
  - label_column – e.g. Churn
  - task_type – e.g. binary
  - top_n – e.g. 3
- Start the run and wait for completion.
See: Add the AutoML pipeline, Run AutoML with the required inputs.
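Before starting a run, it can help to check the inputs listed above. The following is a minimal sketch: the helper `missing_params` and the bucket name are hypothetical and not part of the pipeline or this repo; only the six parameter names and the example values come from the list above.

```python
from __future__ import annotations

# Parameter names taken from the run inputs listed above.
REQUIRED_PARAMS = (
    "train_data_secret_name",
    "train_data_bucket_name",
    "train_data_file_key",
    "label_column",
    "task_type",
    "top_n",
)


def missing_params(params: dict) -> list[str]:
    """Return the required parameter names that are absent or empty."""
    return [name for name in REQUIRED_PARAMS if params.get(name) in (None, "")]


run_params = {
    "train_data_secret_name": "customer-churn-data-s3",
    "train_data_bucket_name": "my-training-bucket",  # hypothetical bucket name
    "train_data_file_key": "data/WA_FnUseC_TelcoCustomerChurn.csv",
    "label_column": "Churn",
    "task_type": "binary",
    "top_n": 3,
}

# An empty list means every required input has a value.
print(missing_params(run_params))
```

The values in `run_params` are the ones you would then enter in the pipeline run form in OpenShift AI.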
- Open the run Artifacts and locate the leaderboard (e.g. HTML). Download or open it and pick the best model (e.g. top-ranked) to deploy.
See: View the leaderboard.
Follow the tutorial from Model Registry through Deployment Scoring.
- Create a Model registry (one-time, if not already done) under Settings → Model resources and operations → AI registry settings.
- In Registry → Model registry, Register model:
  - Model location: Object storage (S3), using the same artifact store as your pipeline.
  - Path: root folder of one refitted predictor (e.g. under .../autogluon-models-full-refit/<task_id>/model_artifact/<ModelName>_FULL/).
  - Set Model name, Version, and Source model format (e.g. custom / AutoGluon), then Register.
See: Model Registry.
- Build the serving image on the cluster (ImageStream + BuildConfig) using the tutorial’s YAML (Git source, Dockerfile, output to ImageStream).
- Create the ServingRuntime from the tutorial’s ServingRuntime YAML:
  - Set metadata.namespace to your project.
  - Set spec.containers[0].image to the built image (e.g. image-registry.openshift-image-registry.svc:5000/<namespace>/autogluonkserveimagev1:latest).
- In OpenShift AI: Settings → Serving runtimes → Add serving runtime → upload the YAML, select REST and Predictive model, then Create.
See: Prepare the ServingRuntime for AutoGluon with KServe.
- Projects → your project → Deployments → Deploy model.
- Model location: S3; use the path to the refitted model (same as registry or from run artifacts).
- Model type: Predictive model.
- Model framework: e.g. autogluon - 1.
- Serving runtime: AutoGluon ServingRuntime for KServe.
- In Advanced settings:
  - Require token authentication – enable if you want to use a Bearer token (recommended for the MCP server).
  - Make model deployment available through an external route – enable so you can call the endpoint from your machine (for the MCP server).
- Deploy model and wait until the deployment is running.
See: Model Deployment.
- Open the deployment details. Under Inference endpoint, copy the external URL (only if you enabled the external route).
- The deployment URL for the MCP server is the predict endpoint, for example:
  <EXTERNAL_BASE_URL>/v1/models/<MODEL_NAME>:predict
  where <MODEL_NAME> is the deployment’s Resource name (lowercase, no spaces). Example: https://my-model-myproject.apps.example.com/v1/models/my-churn-model:predict
- Token (if you enabled token auth): Projects → your project → Deployments → expand the deployment → use the Token secret value as DEPLOYMENT_TOKEN.
See: Deployment Scoring for the exact request format (e.g. instances with per-field arrays). The MCP server sends payloads in that same shape.
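The URL pattern above can be assembled programmatically. This is a small sketch, not code from this repo: `build_predict_url` is a hypothetical helper, and the name check only encodes the "lowercase, no spaces" rule stated above.

```python
def build_predict_url(external_base_url: str, model_name: str) -> str:
    """Join the external route and the deployment's resource name
    into <base>/v1/models/<name>:predict."""
    if not model_name or model_name != model_name.lower() or " " in model_name:
        raise ValueError("model_name must be non-empty, lowercase, without spaces")
    # Drop a trailing slash so the result never contains '//v1'.
    base = external_base_url.rstrip("/")
    return f"{base}/v1/models/{model_name}:predict"


print(build_predict_url("https://my-model-myproject.apps.example.com/", "my-churn-model"))
```

The resulting string is the value to use for DEPLOYMENT_URL.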
The MCP server sends a JSON body like:

    {
      "instances": [
        { "feature1": [value1], "feature2": [value2], ... }
      ]
    }

Your deployment (e.g. AutoGluon churn) expects one object per instance with the same feature names as in training. This repo provides churn_schema.json and the tool invoke_churn in tools_config.yaml. The schema properties match the churn model’s inputs (e.g. gender, tenure, Contract, Churn, as in the tutorial’s curl example).
Set DEPLOYMENT_URL (and DEPLOYMENT_TOKEN if used) to your churn predict endpoint. To add another tool, add a new JSON Schema and entry in tools_config.yaml with its own schema_path and the same deployment_url_env / deployment_token_env if needed.
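To make the request shape concrete, here is a minimal sketch that wraps a flat record into an instances body with per-field arrays. The helper `to_instances_payload` and the sample values are illustrative assumptions; the MCP server's own code may build the body differently.

```python
import json


def to_instances_payload(record: dict) -> dict:
    """Wrap each feature value in a one-element list and
    place the resulting object in the 'instances' array."""
    instance = {name: [value] for name, value in record.items()}
    return {"instances": [instance]}


# Sample values are made up; the feature names come from the churn schema.
payload = to_instances_payload(
    {"gender": "Female", "tenure": 12, "Contract": "Month-to-month"}
)
print(json.dumps(payload, indent=2))
```

Feature names must match the training columns exactly, including capitalization.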
In the MCP server directory, ensure .env exists and contains the deployment values from Phase 2:

    mv template.env .env

Edit .env:
- DEPLOYMENT_URL = full predict URL, e.g. https://my-model-myproject.apps.example.com/v1/models/my-churn-model:predict
- DEPLOYMENT_TOKEN = token from the deployment’s Token secret (leave empty or omit if you did not enable token auth)
Keep LLAMA_STACK_CLIENT_* (or other LLM vars) if you use the demo client; see main README.
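As a quick sanity check, the two variables can be read and validated as sketched below. This assumes the values are already exported into the process environment; how the MCP server itself loads .env is not shown here, and `load_deployment_config` is a hypothetical helper.

```python
import os


def load_deployment_config(env=os.environ):
    """Return (url, token) from an environment-like mapping.

    The token is None when DEPLOYMENT_TOKEN is missing or empty,
    which matches a deployment without token authentication.
    """
    url = env.get("DEPLOYMENT_URL", "").strip()
    if not url:
        raise RuntimeError("DEPLOYMENT_URL is not set")
    token = env.get("DEPLOYMENT_TOKEN", "").strip() or None
    return url, token


# Example with an explicit mapping instead of the real environment.
example_env = {
    "DEPLOYMENT_URL": "https://my-model-myproject.apps.example.com/v1/models/my-churn-model:predict",
    "DEPLOYMENT_TOKEN": "",
}
print(load_deployment_config(example_env))
```

Passing a plain dict, as in the example, makes the check easy to run without touching your real environment.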
- Start the server:

      python mcp_automl/mcp_server.py

- Use the deployment via MCP:
  - Demo client: run python mcp_automl/interact_with_mcp.py and ask a question that triggers the tool (e.g. churn).
  - Cursor: add the MCP server with URL http://127.0.0.1:8000/sse (see README – Attaching to Cursor and Ollama).
  - Ollama: use an MCP-capable client with the same URL, or use the demo client with Ollama as the LLM.
The tool will POST to DEPLOYMENT_URL with Authorization: Bearer <DEPLOYMENT_TOKEN> and the validated input as instances; the response (e.g. predictions) is returned to the caller.
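The request described above can be sketched with the Python standard library alone. This is an illustration of the HTTP call, not the server's implementation: `build_predict_request` is a hypothetical helper, and the HTTP client the repo actually uses is not shown here.

```python
import json
import urllib.request


def build_predict_request(url, instances, token=None):
    """Build (but do not send) a POST request carrying an 'instances' body."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed when token authentication is enabled on the deployment.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_predict_request(
    "https://my-model-myproject.apps.example.com/v1/models/my-churn-model:predict",
    [{"tenure": [12]}],  # made-up single-feature instance
    token="example-token",  # placeholder, not a real token
)
print(req.get_method(), req.full_url)
```

To send it against a live deployment, pass the request to `urllib.request.urlopen(req)` and parse the response body as JSON.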
| Step | Where | What to set / do |
|---|---|---|
| AutoML run | OpenShift AI Pipelines | Pipeline definition, run params (train_data_*, label_column, task_type, top_n) |
| Leaderboard | Run Artifacts | Pick best model for deployment |
| Deploy model | Project → Deployments | S3 path, AutoGluon runtime, external route, token auth |
| Inference URL | Deployment details | <base>/v1/models/<resource-name>:predict |
| Token | Deployment → Token secret | DEPLOYMENT_TOKEN in .env |
| MCP server | .env | DEPLOYMENT_URL, DEPLOYMENT_TOKEN |
| Tool schema | tools_config.yaml + churn_schema.json | Match deployment input features |
- Red Hat AI examples – Predict Customer Churn (AutoML) – full tutorial (project, S3, pipeline, leaderboard, predictor notebook, model registry, ServingRuntime, deployment, scoring).
- autogluon_tabular_training_pipeline – pipeline source (branch rhoai_automl).
- KServe V1 Protocol – inference request/response format.
- This repo’s README – MCP server setup, tools, Cursor/Ollama, and .env.