"[](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/Choosing_Your_Engine_ONNX_vs_OpenVINO_in_Spark_NLP.ipynb)\n",
12
+
"\n",
13
+
"# Choosing Your Inference Engine: ONNX vs OpenVINO in Spark NLP 🚀\n",
14
+
"\n",
15
+
"This notebook walks you through the `engine` parameter introduced in Spark NLP, which lets you **choose which deep learning backend** is used when downloading pretrained models.\n",
"- **`pretrainedEngine(name, lang, engine=...)`** — download a pretrained model with a specific engine backend\n",
24
+
"\n",
25
+
"Let's keep in mind a few things before we start 😊\n",
26
+
"- The engine you pick **changes the actual binary file downloaded** — ONNX models ship `.onnx` weights while OpenVINO models ship `.xml`/`.bin` weights. You can verify this with `ls` directly in the Spark NLP cache folder.\n",
27
+
"- All engines produce the **same results** for the same model — the difference is purely about runtime performance characteristics and hardware compatibility.\n",
28
+
"- ONNX is the default and works on all hardware. OpenVINO is optimized for Intel CPUs/GPUs/NPUs and can give significant speedups on those platforms."
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m772.6/772.6 kB\u001b[0m \u001b[31m39.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
76
+
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m200.5/200.5 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
77
+
"\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
78
+
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
79
+
"dataproc-spark-connect 1.1.0 requires pyspark[connect]~=4.0.0, but you have pyspark 3.5.4 which is incompatible.\u001b[0m\u001b[31m\n",
"print(f\"\\n✅ Loaded! Engine reported by model: {model_openvino.getEngine()}\")"
225
+
],
226
+
"id": "download_openvino_cell"
227
+
},
228
+
{
229
+
"cell_type": "markdown",
230
+
"metadata": {
231
+
"id": "section8"
232
+
},
233
+
"source": [
234
+
"## 4. When to Use Which Engine\n",
235
+
"\n",
236
+
"### Quick decision guide\n",
237
+
"\n",
238
+
"| Scenario | Recommended engine |\n",
239
+
"|---|---|\n",
240
+
"| Running on a mixed or unknown hardware cluster | `onnx` (default) |\n",
241
+
"| Running on Intel CPUs (Xeon, Core) | `openvino` |\n",
242
+
"| Running on Intel integrated GPU or Arc GPU | `openvino` |\n",
243
+
"| Running on Intel NPU (Core Ultra) | `openvino` |\n",
244
+
"| Running on NVIDIA GPU | `onnx` (with CUDA EP) |\n",
245
+
"| Reproducing results from old Spark NLP models | `tensorflow` |\n",
246
+
"| Maximum portability and ecosystem compatibility | `onnx` |\n",
247
+
"\n",
248
+
"### Performance notes\n",
249
+
"\n",
250
+
"- **OpenVINO** typically gives **1.5×–4× throughput improvement** over ONNX on Intel CPUs due to model-level graph optimizations and hardware-specific kernel fusion.\n",
251
+
"- **ONNX** is the safest choice for heterogeneous clusters (a mix of Intel, AMD, ARM workers) since it runs everywhere.\n",
252
+
"- Both `onnx` and `openvino` are significantly faster than `tensorflow` for inference in Spark NLP.\n",