Commit ee27863

test course gen and structure
Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>
1 parent 10aafe5 commit ee27863

File tree

52 files changed: +89891 −0 lines changed


courses/workloads/PyTorch_Lightning/00_workload/thumbnail.png renamed to courses/workloads/PyTorch_Lightning/thumbnail.png

File renamed without changes.

courses/workloads/Ray_Data_Batch_Inference/00_workload/00_lesson/lesson.html

Lines changed: 7538 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Batch Inference with Ray Data\n",
    "© 2025, Anyscale. All Rights Reserved\n",
    "\n",
    "💻 **Launch Locally**: You can run this notebook locally.\n",
    "\n",
    "🚀 **Launch on Cloud**: Consider running this notebook on a Ray cluster (click [here](http://console.anyscale.com/register) to start one on Anyscale).\n",
    "\n",
    "This example shows how to do batch inference with Ray Data.\n",
    "\n",
    "Batch inference with Ray Data lets you efficiently generate predictions from machine learning models on large datasets by processing many data points at once. Instead of running inference one row at a time, which is slow and resource-inefficient, batch inference uses vectorized computation and parallelism to maximize throughput. This is especially useful with modern deep learning models, which are optimized for batch processing on CPUs, GPUs, or Apple Silicon devices.\n",
    "\n",
    "The typical workflow begins by loading your dataset, such as a public dataset from Hugging Face, into a Ray Dataset. Ray Data can automatically partition the data for parallel processing, or you can repartition it explicitly to control the number of data blocks. Once the data is loaded, you define a callable class (such as a text embedding model) that loads the machine learning model in its constructor and implements a `__call__` method to process each batch. Ray Data's `map_batches` API then applies this callable to each batch, with options to control concurrency and resource allocation (e.g., the number of GPUs).\n",
    "\n",
    "This approach lets you spin up multiple concurrent model instances, each processing different batches of data in parallel, which yields a significant speedup on large datasets. After inference, you can materialize the results, inspect the output, and shut down the Ray cluster to free up resources. Batch inference with Ray Data is scalable, flexible, and integrates seamlessly with modern ML workflows, making it a powerful tool for production and research environments alike."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Outline\n",
    "\n",
    "**In this notebook, we go through a typical ML batch inference workflow:**\n",
    "\n",
    "- Architecture\n",
    "- Import libraries\n",
    "- Load a public dataset from Hugging Face and move it into the Ray object store\n",
    "- Batch inference class\n",
    "  - Create a Ray actor class to load an ML model. In this example, we use the SentenceTransformer library from Hugging Face to load a sentence embedding model.\n",
    "- Create batches of data to run inference on\n",
    "- Deploying at scale\n",
    "- Inference on the entire dataset\n",
    "- Out-of-memory errors\n",
    "- Summary"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
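
The lesson text above describes the callable-class-plus-`map_batches` pattern without showing it. Here is a minimal sketch of that pattern; the model name, dataset, column name, batch size, and concurrency are illustrative assumptions, not taken from this commit.

import ray
from datasets import load_dataset
from sentence_transformers import SentenceTransformer


class TextEmbedder:
    """Callable class: loads the model once per replica, then embeds whole batches."""

    def __init__(self):
        # Model choice is an assumption for illustration.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch):
        # `batch` arrives as a dict of NumPy arrays; "text" is an assumed column name.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch


# Load a public Hugging Face dataset into a Ray Dataset.
ds = ray.data.from_huggingface(load_dataset("ag_news", split="train[:1%]"))

# Apply the callable to each batch; `concurrency` controls how many model
# replicas run in parallel (pass num_gpus=1 to pin each replica to a GPU).
embedded = ds.map_batches(TextEmbedder, batch_size=64, concurrency=2)

print(embedded.take(1))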
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
---
2+
theme: seriph
3+
background: /slides_background.png
4+
class: text-center
5+
drawings:
6+
persist:
7+
# slide transition: https://sli.dev/guide/animations.html#slide-transitions
8+
# transition: fade
9+
# enable MDC Syntax: https://sli.dev/features/mdc
10+
mdc: true
11+
# duration of the presentation
12+
duration: 15min
13+
addons:
14+
- fancy-arrow
15+
- slidev-addon-tldraw
16+
- slidev-component-spotlight
17+
- slidev-component-poll
18+
- slidev-addon-typst
19+
---
20+
21+
22+
# Batch Inference with Ray Data
23+
24+
---
25+
26+
# Batch Inference with Ray Data
27+
© 2025, Anyscale. All Rights Reserved
28+
29+
💻 **Launch Locally**: You can run this notebook locally.
30+
31+
🚀 **Launch on Cloud**: Think about running this notebook on a Ray Cluster (Click [here](http://console.anyscale.com/register) to easily start a Ray cluster on Anyscale)
32+
33+
This example shows how to do batch inference with Ray Data.
34+
35+
Batch inference with Ray Data enables you to efficiently generate predictions from machine learning models on large datasets by processing multiple data points at once. Instead of running inference on one row at a time, which can be slow and resource-inefficient, batch inference leverages vectorized computation and parallelism to maximize throughput. This is especially useful when working with modern deep learning models, which are optimized for batch processing on CPUs, GPUs, or Apple Silicon devices.
36+
37+
The typical workflow begins by loading your dataset—such as a public dataset from Hugging Face—into a Ray Dataset. Ray Data can automatically partition the data for parallel processing, or you can repartition it explicitly to control the number of data blocks. Once the data is loaded, you define a callable class (such as a text embedding model) that loads the machine learning model in its constructor and implements a `__call__` method to process each batch. Ray Data’s `map_batches` API is then used to apply this callable to each batch of data, with options to control concurrency and resource allocation (e.g., number of GPUs).
38+
39+
This approach allows you to spin up multiple concurrent model instances, each processing different batches of data in parallel. The result is a significant speedup in inference time, especially for large datasets. After inference, you can materialize the results, inspect the output, and shut down the Ray cluster to free up resources. Batch inference with Ray Data is scalable, flexible, and integrates seamlessly with modern ML workflows, making it a powerful tool for production and research environments alike.
40+
41+
---
42+
43+
### Outline
44+
45+
<b>In this notebook, we go through a typical ML batch inference workflow:</b>
46+
47+
48+
Architecture
49+
Import Libraries
50+
Load a public dataset from Hugging Face and move it into Ray Data object store.
51+
Batch Inference Class
52+
- Create a Ray actor class to load a ML model. In this example, we use SentenceTransformer library from Hugging Face to load a sentence embedding model.
53+
Create batches of data to do inference.
54+
Deploying at Scale
55+
Inference on the entire dataset
56+
Out of memory errors
57+
Summary
58+
</ul>
59+
</div>
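
The outline's "Deploying at scale" and "Out-of-memory errors" items come down to a few knobs on the same API. A second hypothetical sketch follows, assuming a Parquet dataset on S3 and the same illustrative embedding model; the path, block count, batch size, and GPU settings are assumptions, not taken from this commit.

import ray
from sentence_transformers import SentenceTransformer


class TextEmbedder:
    def __init__(self):
        # Same illustrative model as the notebook sketch above.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch):
        # "text" is an assumed column name.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch


# Hypothetical input location; any Ray Data source works here.
ds = ray.data.read_parquet("s3://my-bucket/reviews/")

# Repartition explicitly to control the number of blocks, and hence the
# degree of parallelism across the cluster.
ds = ds.repartition(200)

embedded = ds.map_batches(
    TextEmbedder,
    batch_size=16,   # smaller batches lower peak memory if replicas hit OOM
    concurrency=4,   # four concurrent model replicas
    num_gpus=1,      # dedicate one GPU to each replica
)

embedded.materialize()  # execute the pipeline and pin results in the object store
ray.shutdown()          # free cluster resources when done

Shrinking `batch_size` is usually the first fix for out-of-memory errors, since peak memory per replica scales with the number of rows it holds at once.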
