Commit d0a838f

multi-modal AI workloads foundations
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
1 parent 1a6ab1d commit d0a838f


60 files changed: +17455 −1 lines

courses/.gitignore

Lines changed: 1 addition & 1 deletion

@@ -12,4 +12,4 @@ public/
 
 # Content production folder
 production/
-outputs/
+outputs/
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Multi-modal AI pipeline\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "<div align=\"left\">\n",
+    "<a target=\"_blank\" href=\"https://console.anyscale.com/\"><img src=\"https://img.shields.io/badge/🚀 Run_on-Anyscale-9hf\"></a>&nbsp;\n",
+    "<a href=\"https://github.com/anyscale/multimodal-ai\" role=\"button\"><img src=\"https://img.shields.io/static/v1?label=&amp;message=View%20On%20GitHub&amp;color=586069&amp;logo=github&amp;labelColor=2f363d\"></a>&nbsp;\n",
+    "</div>\n",
+    "\n",
+    "💻 Run this entire tutorial on [Anyscale](https://www.anyscale.com/) for free:\n",
+    "**https://console.anyscale.com/template-preview/image-search-and-classification** or access the repo [here](https://github.com/ray-project/ray/tree/master/doc/source/ray-overview/examples/e2e-multimodal-ai-workloads).\n",
+    "\n",
+    "This tutorial focuses on the fundamental challenges of multimodal AI workloads at scale:\n",
+    "\n",
+    "- **🔋 Compute**: managing heterogeneous clusters, reducing idle time, and handling complex dependencies\n",
+    "- **📈 Scale**: integrating with the Python ecosystem, improving observability, and enabling effective debugging\n",
+    "- **🛡️ Reliability**: ensuring fault tolerance, leveraging checkpointing, and supporting job resumability\n",
+    "- **🚀 Production**: bridging dev-to-prod gaps, enabling fast iteration, maintaining zero downtime, and meeting SLAs\n",
+    "\n",
+    "This tutorial covers how Ray addresses each of these challenges and demonstrates the solutions hands-on by implementing scalable batch inference, distributed training, and online serving workloads.\n",
+    "\n",
+    "- [**`01-Batch-Inference.ipynb`**](https://github.com/anyscale/multimodal-ai/tree/main/notebooks/01-Batch-Inference.ipynb): ingest and preprocess data at scale using [Ray Data](https://docs.ray.io/en/latest/data/data.html) to generate and store embeddings for an image dataset of different dog breeds.\n",
+    "- [**`02-Distributed-Training.ipynb`**](https://github.com/anyscale/multimodal-ai/tree/main/notebooks/02-Distributed-Training.ipynb): preprocess data and train an image classifier using [Ray Train](https://docs.ray.io/en/latest/train/train.html), then save model artifacts to a model registry (MLOps).\n",
+    "- [**`03-Online-Serving.ipynb`**](https://github.com/anyscale/multimodal-ai/tree/main/notebooks/03-Online-Serving.ipynb): deploy an online service using [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) that uses the trained model to generate predictions.\n",
+    "- Create production batch [**Jobs**](https://docs.anyscale.com/platform/jobs/) for offline workloads like embedding generation and model training, and production online [**Services**](https://docs.anyscale.com/platform/services/) that can scale.\n",
+    "\n",
+    "<img src=\"https://raw.githubusercontent.com/anyscale/multimodal-ai/refs/heads/main/images/overview.png\" width=1000>\n",
+    "\n",
+    "## Development\n",
+    "\n",
+    "The application is developed on [Anyscale Workspaces](https://docs.anyscale.com/platform/workspaces/), which enables development without worrying about infrastructure—just like working on a laptop. Workspaces come with:\n",
+    "- **Development tools**: Spin up a remote session from your local IDE (Cursor, VS Code, etc.) and start coding, using the same tools you love but with the power of Anyscale's compute.\n",
+    "- **Dependencies**: Install dependencies using familiar tools like pip or uv. Anyscale propagates all dependencies to the cluster's worker nodes.\n",
+    "- **Compute**: Leverage reserved instance capacity or spot instances from any compute provider of your choice by deploying Anyscale into your account. Alternatively, use the Anyscale cloud for a fully serverless experience.\n",
+    "  - Under the hood, a cluster spins up and is efficiently managed by Anyscale.\n",
+    "- **Debugging**: Leverage a [distributed debugger](https://docs.anyscale.com/platform/workspaces/workspaces-debugging/#distributed-debugger) to get the same VS Code-like debugging experience.\n",
+    "\n",
+    "Learn more about Anyscale Workspaces in the [official documentation](https://docs.anyscale.com/platform/workspaces/).\n",
+    "\n",
+    "<div align=\"center\">\n",
+    "  <img src=\"https://raw.githubusercontent.com/anyscale/multimodal-ai/refs/heads/main/images/compute.png\" width=600>\n",
+    "</div>\n",
+    "\n",
+    "### Additional dependencies\n",
+    "\n",
+    "You can manage the additional dependencies with either `uv` or `pip`.\n",
+    "\n",
+    "```bash\n",
+    "# uv setup instructions\n",
+    "uv init .  # creates pyproject.toml, the uv lockfile, etc.\n",
+    "ray_wheel_url=http://localhost:9478/ray/$(pip freeze | grep -oP '^ray @ file:///home/ray/\\.whl/\\K.*')\n",
+    "uv add \"$ray_wheel_url[data, train, tune, serve]\"  # use Anyscale's performant Ray runtime\n",
+    "uv add $(grep -v '^\\s*#' requirements.txt)\n",
+    "uv add --editable ./doggos\n",
+    "```\n",
+    "\n",
+    "```bash\n",
+    "# pip setup instructions\n",
+    "pip install -q -r /home/ray/default/requirements.txt\n",
+    "pip install -e ./doggos\n",
+    "```\n",
+    "\n",
+    "**Note**: Run the entire tutorial for free on [Anyscale](https://console.anyscale.com/)—all dependencies come pre-installed, and compute autoscales automatically. To run it elsewhere, install the dependencies from the [`containerfile`](https://github.com/anyscale/multimodal-ai/tree/main/containerfile) and provision the appropriate GPU resources.\n",
+    "\n",
+    "## Production\n",
+    "Seamlessly integrate with your existing CI/CD pipelines by leveraging the Anyscale [CLI](https://docs.anyscale.com/reference/quickstart-cli) or [SDK](https://docs.anyscale.com/reference/quickstart-sdk) to deploy [highly available services](https://docs.anyscale.com/platform/services) and run [reliable batch jobs](https://docs.anyscale.com/platform/jobs). Developing in an environment nearly identical to production—a multi-node cluster—drastically accelerates the dev-to-prod transition. This tutorial also introduces proprietary RayTurbo features that optimize workloads for performance, fault tolerance, scale, and observability.\n",
+    "\n",
+    "```bash\n",
+    "anyscale job submit -f /home/ray/default/configs/generate_embeddings.yaml\n",
+    "anyscale job submit -f /home/ray/default/configs/train_model.yaml\n",
+    "anyscale service deploy -f /home/ray/default/configs/service.yaml\n",
+    "```\n",
+    "\n",
+    "## No infrastructure headaches\n",
+    "Abstract away infrastructure from your ML/AI developers so they can focus on core ML development. You can also better manage compute resources and costs with [enterprise governance and observability](https://www.anyscale.com/blog/enterprise-governance-observability) and [admin capabilities](https://docs.anyscale.com/administration/overview): set [resource quotas](https://docs.anyscale.com/reference/resource-quotas/), set [priorities for different workloads](https://docs.anyscale.com/administration/cloud-deployment/global-resource-scheduler), and gain [observability of utilization across your entire compute fleet](https://docs.anyscale.com/administration/resource-management/telescope-dashboard).\n",
+    "Users running on a Kubernetes cloud (EKS, GKE, etc.) can still access the proprietary RayTurbo optimizations demonstrated in this tutorial by deploying the [Anyscale Kubernetes Operator](https://docs.anyscale.com/administration/cloud-deployment/kubernetes/)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "```{toctree}\n",
+    ":hidden:\n",
+    "\n",
+    "notebooks/01-Batch-Inference\n",
+    "notebooks/02-Distributed-Training\n",
+    "notebooks/03-Online-Serving\n",
+    "```"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
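The `01-Batch-Inference` notebook added above centers on a map-over-batches pattern: split a dataset into fixed-size batches, apply an embedding function to each batch, and collect the results. As a framework-free sketch of that pattern (plain Python standing in for Ray Data's distributed `map_batches`; the function and file names here are hypothetical):

```python
# Sketch of the batch "map" pattern used for embedding generation.
# Ray Data's map_batches applies the same idea across a cluster
# instead of a local loop; all names below are illustrative.

def embed_batch(batch):
    """Compute a toy 'embedding' (just the path length) per item."""
    return [{"path": p, "embedding": [float(len(p))]} for p in batch]

def map_batches(items, fn, batch_size):
    """Apply fn to fixed-size slices of items and flatten the results."""
    out = []
    for i in range(0, len(items), batch_size):
        out.extend(fn(items[i:i + batch_size]))
    return out

paths = [f"img_{i}.jpg" for i in range(5)]
records = map_batches(paths, embed_batch, batch_size=2)
print(records[0])  # → {'path': 'img_0.jpg', 'embedding': [9.0]}
```

In the real pipeline, `embed_batch` would run a vision model on GPU workers and the records would be written to a vector store rather than held in a list.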

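The `03-Online-Serving` notebook wraps the trained model in a Ray Serve deployment. The core pattern, loading the model once and invoking it per request, can be sketched without Ray as follows (the class and breed list are hypothetical stand-ins; Serve adds replicas, routing, and autoscaling around such a class via `@serve.deployment`):

```python
# Sketch of the serving pattern: state loaded once in __init__,
# reused for every request in __call__. Names are illustrative.

class DogBreedClassifier:
    def __init__(self, breeds):
        self.breeds = breeds  # stands in for loading model weights once

    def __call__(self, embedding):
        # Toy "prediction": pick a breed from the embedding's sum.
        idx = int(sum(embedding)) % len(self.breeds)
        return {"breed": self.breeds[idx]}

clf = DogBreedClassifier(["beagle", "collie", "pug"])
print(clf([1.0, 2.0]))  # → {'breed': 'beagle'}
```

A real deployment would run the classifier over request payloads behind an HTTP endpoint, with the breed labels and model artifacts pulled from the registry populated in `02-Distributed-Training`.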