
docs(notebook): Sparse threshold optimization with GraphAI notebook #571


Open: wants to merge 3 commits into base `main`

@Joshua-Briggs (Member) commented Mar 23, 2025

User description

This notebook builds a hybrid router with a sparse encoder and shows how to optimize it using the `.fit` method. It then moves on to GraphAI, where two routers are used to showcase guardrail capabilities.


PR Type

Documentation, Enhancement


Description

  • New notebook demonstrating GraphAI integration.

  • Implements hybrid routers with sparse and dense encoders.

  • Adds evaluation, threshold tuning, and scam detection examples.

  • Provides a complete GraphAI workflow with node routing.
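The threshold tuning listed above amounts to searching for per-route score thresholds that maximize accuracy on labeled data. The following is a minimal, self-contained sketch of that idea, not semantic-router's actual `fit` implementation: it assumes each query already has a similarity score per route (the toy scores are invented for illustration), predicts `None` when the top-scoring route falls below its own threshold, and uses a simple coordinate-wise grid search. The names `predict`, `accuracy`, and `fit_thresholds` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A labeled sample: hypothetical precomputed similarity scores per route,
# plus the expected route name (None means "should match no route").
Sample = Tuple[Dict[str, float], Optional[str]]


def predict(scores: Dict[str, float], thresholds: Dict[str, float]) -> Optional[str]:
    """Return the top-scoring route, or None if it scores below that route's threshold."""
    best_route = max(scores, key=scores.get)
    return best_route if scores[best_route] >= thresholds[best_route] else None


def accuracy(data: List[Sample], thresholds: Dict[str, float]) -> float:
    """Fraction of samples whose predicted route matches the expected label."""
    correct = sum(predict(scores, thresholds) == label for scores, label in data)
    return correct / len(data)


def fit_thresholds(
    data: List[Sample], routes: List[str], steps: int = 101
) -> Dict[str, float]:
    """Coordinate-wise grid search over [0, 1]: tune one route's threshold at a time."""
    thresholds = {route: 0.0 for route in routes}
    candidates = [i / (steps - 1) for i in range(steps)]
    for route in routes:
        best_t, best_acc = thresholds[route], accuracy(data, thresholds)
        for t in candidates:
            acc = accuracy(data, {**thresholds, route: t})
            # Strict ">" keeps the lowest threshold that reaches the best accuracy.
            if acc > best_acc:
                best_t, best_acc = t, acc
        thresholds[route] = best_t
    return thresholds


# Hypothetical toy scores for two routes.
toy_data: List[Sample] = [
    ({"byd": 0.82, "tesla": 0.40}, "byd"),
    ({"byd": 0.35, "tesla": 0.78}, "tesla"),
    ({"byd": 0.30, "tesla": 0.25}, None),
    ({"byd": 0.22, "tesla": 0.33}, None),
]

fitted = fit_thresholds(toy_data, routes=["byd", "tesla"])
print("Fitted thresholds:", fitted)
print(f"Accuracy: {accuracy(toy_data, fitted) * 100:.2f}%")
```

In this toy example, the `None`-labeled samples are what push each threshold up from zero; without them, a threshold of 0.0 would already be optimal.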


Changes walkthrough 📝

Relevant files
Documentation
sparse-threshold-optimization-guardrail-graphai.ipynb
Add comprehensive GraphAI integration notebook                     

docs/integrations/graphai/sparse-threshold-optimization-guardrail-graphai.ipynb

  • Introduces a comprehensive Jupyter Notebook for hybrid router
    examples.
  • Shows installation steps and imports for semantic router components.
  • Implements first router for BYD and related queries with threshold
    fitting.
  • Provides a second router for scam detection and integrates GraphAI
    workflow.
  • +1446/-0

The notebook shows how to create hybrid routers with a sparse encoder, optimize their thresholds, and then uses a GraphAI example to showcase the router's guardrail use cases.
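The guardrail use case can be pictured as mapping a router's output to an action. Below is a minimal sketch under stated assumptions, not the notebook's GraphAI code: it follows the notebook's route comments (BYD queries allowed; Tesla, Polestar, and Rivian queries blocked or redirected), assumes unmatched queries (`None`) pass through, and the `guardrail_decision` function and its return strings are hypothetical.

```python
from typing import Optional

# Competitor routes that the notebook marks as "blocked or redirected".
BLOCKED_ROUTES = {"tesla", "polestar", "rivian"}


def guardrail_decision(route_name: Optional[str]) -> str:
    """Map a route name (or None for no match) to a hypothetical guardrail action.

    Assumption: competitor routes are redirected; "byd" and unmatched
    queries (None) are allowed through unchanged.
    """
    if route_name in BLOCKED_ROUTES:
        return "redirect"
    return "allow"


# Example: route names as a router might return them via `router(query).name`.
for name in ["byd", "tesla", None]:
    print(f"{name} -> {guardrail_decision(name)}")
```

In a GraphAI workflow, a decision like this would determine which node a query is sent to next; the notebook's second, scam-detection router could feed a similar check.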

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Complexity

    The notebook introduces a large volume of code in one file, including configuration for multiple routers, evaluation routines, and graph definitions. Consider refactoring or modularizing the code into smaller functions or separate modules to enhance readability and maintainability.
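One possible way to act on this suggestion is to factor out the resale questions that every route in the notebook repeats with only the brand name changed. The `build_utterances` helper and `RESALE_TEMPLATES` list below are hypothetical; the resulting list could then be passed as `utterances=` to `Route` exactly as the notebook does.

```python
from typing import List

# Resale questions repeated in every route of the notebook; "{brand}" is filled per route.
RESALE_TEMPLATES = [
    "Can I sell my {brand}?",
    "How much is my {brand} worth?",
    "What is the resale value of my {brand}?",
    "How much can I get for my {brand}?",
    "How much can I sell my {brand} for?",
]


def build_utterances(brand: str, specific: List[str]) -> List[str]:
    """Append the shared resale questions for `brand` to its brand-specific utterances."""
    return specific + [template.format(brand=brand) for template in RESALE_TEMPLATES]


byd_utterances = build_utterances(
    "BYD",
    [
        "Tell me about the BYD Seal.",
        "What is the battery capacity of the BYD Dolphin?",
        "How does BYD's Blade Battery work?",
        "Is the BYD Atto 3 a good EV?",
    ],
)
print(len(byd_utterances))
```

With this, `Route(name="byd", utterances=byd_utterances)` would reproduce the notebook's BYD route, since the two lists match item for item.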

    {
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/10-sparse-threshold-optimization-guardrail-graphai.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/10-sparse-threshold-optimization-guardrail-graphai.ipynb)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Sparse Encoder"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Install Prerequisites"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 222,
       "metadata": {},
       "outputs": [],
       "source": [
        "!pip install -qU \\\n",
        "   \"semantic-router>=0.1.4\" \\\n",
        "   graphai-lib==0.0.2"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Creating a Hybrid Router with a Sparse Encoder"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "To begin, we import the `Route` class from the `semantic_router` package.\n",
        "\n",
        "Then we define the routes for our semantic router. In this example we use routes for BYD, Tesla, Polestar, and Rivian, giving each route a name and a list of utterances that represent it.\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 223,
       "metadata": {},
       "outputs": [],
       "source": [
        "from semantic_router import Route\n",
        "\n",
        "# Route for BYD-related queries (allowed)\n",
        "byd = Route(\n",
        "    name=\"byd\",\n",
        "    utterances=[\n",
        "        \"Tell me about the BYD Seal.\",\n",
        "        \"What is the battery capacity of the BYD Dolphin?\",\n",
        "        \"How does BYD's Blade Battery work?\",\n",
        "        \"Is the BYD Atto 3 a good EV?\",\n",
        "        \"Can I sell my BYD?\",\n",
        "        \"How much is my BYD worth?\",\n",
        "        \"What is the resale value of my BYD?\",\n",
        "        \"How much can I get for my BYD?\",\n",
        "        \"How much can I sell my BYD for?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Route for Tesla-related queries (blocked or redirected)\n",
        "tesla = Route(\n",
        "    name=\"tesla\",\n",
        "    utterances=[\n",
        "        \"Is Tesla better than BYD?\",\n",
        "        \"Tell me about the Tesla Model 3.\",\n",
        "        \"How does Tesla’s autopilot compare to other EVs?\",\n",
        "        \"What’s new in the Tesla Cybertruck?\",\n",
        "        \"Can I sell my Tesla?\",\n",
        "        \"How much is my Tesla worth?\",\n",
        "        \"What is the resale value of my Tesla?\",\n",
        "        \"How much can I get for my Tesla?\",\n",
        "        \"How much can I sell my Tesla for?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Route for Polestar-related queries (blocked or redirected)\n",
        "polestar = Route(\n",
        "    name=\"polestar\",\n",
        "    utterances=[\n",
        "        \"What’s the range of the Polestar 2?\",\n",
        "        \"Is Polestar a good alternative to other EVs?\",\n",
        "        \"How does Polestar compare to other EVs?\",\n",
        "        \"Can I sell my Polestar?\",\n",
        "        \"How much is my Polestar worth?\",\n",
        "        \"What is the resale value of my Polestar?\",\n",
        "        \"How much can I get for my Polestar?\",\n",
        "        \"How much can I sell my Polestar for?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Route for Rivian-related queries (blocked or redirected)\n",
        "rivian = Route(\n",
        "    name=\"rivian\",\n",
        "    utterances=[\n",
        "        \"Tell me about the Rivian R1T.\",\n",
        "        \"How does Rivian's off-road capability compare to other EVs?\",\n",
        "        \"Is Rivian's charging network better than other EVs?\",\n",
        "        \"Can I sell my Rivian?\",\n",
        "        \"How much is my Rivian worth?\",\n",
        "        \"What is the resale value of my Rivian?\",\n",
        "        \"How much can I get for my Rivian?\",\n",
        "        \"How much can I sell my Rivian for?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Combine all routes\n",
        "routes = [byd, tesla, polestar, rivian]"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next we define the sparse encoder. First, we import the `AurelioSparseEncoder` class from the `semantic_router.encoders.aurelio` module.\n",
        "\n",
        "This requires an Aurelio API key, which can be obtained from the [Aurelio Platform website](https://platform.aurelio.ai/settings/api-keys).\n",
        "\n",
        "Now we can define the sparse encoder using the `bm25` model."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 224,
       "metadata": {},
       "outputs": [],
       "source": [
        "import os\n",
        "from getpass import getpass\n",
        "from semantic_router.encoders.aurelio import AurelioSparseEncoder\n",
        "\n",
        "os.environ[\"AURELIO_API_KEY\"] = os.environ.get(\"AURELIO_API_KEY\") or getpass(\n",
        "    \"Enter your Aurelio API key: \"\n",
        ")\n",
        "# sparse encoder for term matching\n",
        "sparse_encoder = AurelioSparseEncoder(name=\"bm25\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next we define the dense encoder. As before, we first import the `OpenAIEncoder` class, this time from the `semantic_router.encoders` package.\n",
        "\n",
        "This will also require an OpenAI API key, which can be obtained from the [OpenAI Platform website](https://platform.openai.com/api-keys).\n",
        "\n",
        "Now we can define the dense encoder using the `text-embedding-3-small` model with a score threshold of 0.3."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 225,
       "metadata": {},
       "outputs": [],
       "source": [
        "from semantic_router.encoders import OpenAIEncoder\n",
        "\n",
        "os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\") or getpass(\n",
        "    \"Enter your OpenAI API key: \"\n",
        ")\n",
        "# dense encoder for semantic meaning\n",
        "encoder = OpenAIEncoder(name=\"text-embedding-3-small\", score_threshold=0.3)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Now we have all the components needed to create our hybrid router: the routes, the sparse encoder, and the dense encoder. **(A standard semantic router uses only dense embeddings; the hybrid router combines dense and sparse embeddings.)**\n",
        "\n",
        "When instantiating the `HybridRouter` class, we pass in the dense encoder, sparse encoder, routes, and the `auto_sync` parameter (set to `\"local\"` here)."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 226,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:20 - semantic_router.utils.logger - WARNING - hybrid.py:54 - __init__() - No index provided. Using default HybridLocalIndex.\n",
          "2025-03-23 13:11:20 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "2025-03-23 13:11:21 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "2025-03-23 13:11:22 - semantic_router.utils.logger - WARNING - hybrid_local.py:47 - add() - Function schemas are not supported for HybridLocalIndex.\n",
          "2025-03-23 13:11:22 - semantic_router.utils.logger - WARNING - hybrid_local.py:49 - add() - Metadata is not supported for HybridLocalIndex.\n",
          "2025-03-23 13:11:22 - semantic_router.utils.logger - WARNING - hybrid_local.py:210 - _write_config() - No config is written for HybridLocalIndex.\n"
         ]
        }
       ],
       "source": [
        "from semantic_router.routers import HybridRouter\n",
        "\n",
        "first_router = HybridRouter(\n",
        "    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync=\"local\"\n",
        ")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "To check if the router is synced we can use the `is_synced` method."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 227,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:22 - semantic_router.utils.logger - WARNING - base.py:316 - _read_config() - This method should be implemented by subclasses.\n"
         ]
        },
        {
         "data": {
          "text/plain": [
           "False"
          ]
         },
         "execution_count": 227,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "first_router.is_synced()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "To check the current route thresholds we can use the `get_thresholds` method, which returns a dictionary mapping each route name to its threshold as a float."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 228,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Default route thresholds: {'byd': 0.09, 'tesla': 0.09, 'polestar': 0.09, 'rivian': 0.09}\n"
         ]
        }
       ],
       "source": [
        "route_thresholds = first_router.get_thresholds()\n",
        "print(\"Default route thresholds:\", route_thresholds)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We can also use the `get_utterance_diff` method to see the difference in utterances between the local and remote routes."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 229,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "['  byd: Can I sell my BYD?',\n",
           " \"  byd: How does BYD's Blade Battery work?\",\n",
           " '  byd: How much can I get for my BYD?',\n",
           " '  byd: How much can I sell my BYD for?',\n",
           " '  byd: How much is my BYD worth?',\n",
           " '  byd: Is the BYD Atto 3 a good EV?',\n",
           " '  byd: Tell me about the BYD Seal.',\n",
           " '  byd: What is the battery capacity of the BYD Dolphin?',\n",
           " '  byd: What is the resale value of my BYD?',\n",
           " '  polestar: Can I sell my Polestar?',\n",
           " '  polestar: How does Polestar compare to other EVs?',\n",
           " '  polestar: How much can I get for my Polestar?',\n",
           " '  polestar: How much can I sell my Polestar for?',\n",
           " '  polestar: How much is my Polestar worth?',\n",
           " '  polestar: Is Polestar a good alternative to other EVs?',\n",
           " '  polestar: What is the resale value of my Polestar?',\n",
           " '  polestar: What’s the range of the Polestar 2?',\n",
           " '  rivian: Can I sell my Rivian?',\n",
           " \"  rivian: How does Rivian's off-road capability compare to other EVs?\",\n",
           " '  rivian: How much can I get for my Rivian?',\n",
           " '  rivian: How much can I sell my Rivian for?',\n",
           " '  rivian: How much is my Rivian worth?',\n",
           " \"  rivian: Is Rivian's charging network better than other EVs?\",\n",
           " '  rivian: Tell me about the Rivian R1T.',\n",
           " '  rivian: What is the resale value of my Rivian?',\n",
           " '  tesla: Can I sell my Tesla?',\n",
           " '  tesla: How does Tesla’s autopilot compare to other EVs?',\n",
           " '  tesla: How much can I get for my Tesla?',\n",
           " '  tesla: How much can I sell my Tesla for?',\n",
           " '  tesla: How much is my Tesla worth?',\n",
           " '  tesla: Is Tesla better than BYD?',\n",
           " '  tesla: Tell me about the Tesla Model 3.',\n",
           " '  tesla: What is the resale value of my Tesla?',\n",
           " '  tesla: What’s new in the Tesla Cybertruck?']"
          ]
         },
         "execution_count": 229,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "first_router.get_utterance_diff()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next, we can call the `get_utterances` method on the router's `index` attribute to list the utterances stored in the index."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 230,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "[Utterance(route='byd', utterance='Can I sell my BYD?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance=\"How does BYD's Blade Battery work?\", function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='How much can I get for my BYD?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='How much can I sell my BYD for?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='How much is my BYD worth?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='Is the BYD Atto 3 a good EV?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='Tell me about the BYD Seal.', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='What is the battery capacity of the BYD Dolphin?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='byd', utterance='What is the resale value of my BYD?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='Can I sell my Polestar?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='How does Polestar compare to other EVs?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='How much can I get for my Polestar?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='How much can I sell my Polestar for?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='How much is my Polestar worth?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='Is Polestar a good alternative to other EVs?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='What is the resale value of my Polestar?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='polestar', utterance='What’s the range of the Polestar 2?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='Can I sell my Rivian?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance=\"How does Rivian's off-road capability compare to other EVs?\", function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='How much can I get for my Rivian?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='How much can I sell my Rivian for?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='How much is my Rivian worth?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance=\"Is Rivian's charging network better than other EVs?\", function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='Tell me about the Rivian R1T.', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='rivian', utterance='What is the resale value of my Rivian?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='Can I sell my Tesla?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='How does Tesla’s autopilot compare to other EVs?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='How much can I get for my Tesla?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='How much can I sell my Tesla for?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='How much is my Tesla worth?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='Is Tesla better than BYD?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='Tell me about the Tesla Model 3.', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='What is the resale value of my Tesla?', function_schemas=None, metadata={}, diff_tag=' '),\n",
           " Utterance(route='tesla', utterance='What’s new in the Tesla Cybertruck?', function_schemas=None, metadata={}, diff_tag=' ')]"
          ]
         },
         "execution_count": 230,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "first_router.index.get_utterances()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We can already test our router by calling it on a few utterances and printing the route each one is assigned to."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 231,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:22 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Tell me about BYD's Blade Battery. -> byd\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:23 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Does the Tesla Model 3 have better range? -> tesla\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:24 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "What are the key features of the Polestar 2? -> polestar\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:25 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Is Rivian's R1T better for off-roading? -> rivian\n"
         ]
        }
       ],
       "source": [
        "for utterance in [\n",
        "    \"Tell me about BYD's Blade Battery.\",\n",
        "    \"Does the Tesla Model 3 have better range?\",\n",
        "    \"What are the key features of the Polestar 2?\",\n",
        "    \"Is Rivian's R1T better for off-roading?\",\n",
        "]:\n",
        "    print(f\"{utterance} -> {first_router(utterance).name}\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We can also use the `evaluate` method to measure the router's accuracy, passing in the test utterances as `X` and their expected route names as `y`."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 232,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:26 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:01<00:00,  1.06s/it]"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy: 100.00%\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "\n"
         ]
        }
       ],
       "source": [
        "test_data = [\n",
        "    (\"Tell me about BYD's Blade Battery.\", \"byd\"),\n",
        "    (\"Does the Tesla Model 3 have better range?\", \"tesla\"),\n",
        "    (\"What are the key features of the Polestar 2?\", \"polestar\"),\n",
        "    (\"Is Rivian's R1T better for off-roading?\", \"rivian\"),\n",
        "]\n",
        "\n",
        "# unpack the test data\n",
        "X, y = zip(*test_data)\n",
        "\n",
        "# evaluate using the default thresholds\n",
        "accuracy = first_router.evaluate(X=X, y=y)\n",
        "print(f\"Accuracy: {accuracy*100:.2f}%\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Unfortunately, a test set of only 4 utterances (one per route) is not enough to get a good understanding of the router's performance.\n",
        "\n",
        "So we will use a larger dataset of BYD-, Tesla-, Polestar-, and Rivian-related queries, along with general-knowledge queries that should not match any route (labeled `None`), to evaluate the router.\n"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 233,
       "metadata": {},
       "outputs": [],
       "source": [
        "test_data = [\n",
        "    # BYD-related queries\n",
        "    (\"Tell me about the BYD Seal.\", \"byd\"),\n",
        "    (\"What is the battery capacity of the BYD Dolphin?\", \"byd\"),\n",
        "    (\"How does BYD's Blade Battery work?\", \"byd\"),\n",
        "    (\"Is the BYD Atto 3 a good EV?\", \"byd\"),\n",
        "    (\"What’s the range of the BYD Tang?\", \"byd\"),\n",
        "    (\"Does BYD offer fast-charging stations?\", \"byd\"),\n",
        "    (\"How is the BYD Han different from the Seal?\", \"byd\"),\n",
        "    (\"Is BYD the largest EV manufacturer in China?\", \"byd\"),\n",
        "    (\"What is the top speed of the BYD Seal?\", \"byd\"),\n",
        "    (\"Compare the BYD Dolphin and the BYD Atto 3.\", \"byd\"),\n",
        "    (\"How does BYD’s battery technology compare to Tesla’s?\", \"byd\"),\n",
        "    (\"What makes the BYD Blade Battery safer?\", \"byd\"),\n",
        "    (\"Does BYD have plans to expand to Europe?\", \"byd\"),\n",
        "    (\"How efficient is the BYD Tang in terms of range?\", \"byd\"),\n",
        "    (\"What are the latest BYD electric vehicle models?\", \"byd\"),\n",
        "    (\"How does the BYD Han compare to the Tesla Model S?\", \"byd\"),\n",
        "    (\"What is the warranty on BYD EV batteries?\", \"byd\"),\n",
        "    (\"Which BYD model is the best for long-distance driving?\", \"byd\"),\n",
        "    (\"Does BYD manufacture its own battery cells?\", \"byd\"),\n",
        "    # Tesla-related queries\n",
        "    (\"Is Tesla better than BYD?\", \"tesla\"),\n",
        "    (\"Tell me about the Tesla Model 3.\", \"tesla\"),\n",
        "    (\"How does Tesla’s autopilot compare to other EVs?\", \"tesla\"),\n",
        "    (\"What’s new in the Tesla Cybertruck?\", \"tesla\"),\n",
        "    (\"What is Tesla’s Full Self-Driving feature?\", \"tesla\"),\n",
        "    (\"How long does it take to charge a Tesla?\", \"tesla\"),\n",
        "    (\"Tell me about the Tesla Roadster.\", \"tesla\"),\n",
        "    (\"How much does a Tesla Model S cost?\", \"tesla\"),\n",
        "    (\"Which Tesla model has the longest range?\", \"tesla\"),\n",
        "    (\"What are the main differences between the Tesla Model S and Model 3?\", \"tesla\"),\n",
        "    (\"How safe is Tesla’s Autopilot?\", \"tesla\"),\n",
        "    (\"Does Tesla use LFP batteries?\", \"tesla\"),\n",
        "    (\"What is the Tesla Supercharger network?\", \"tesla\"),\n",
        "    (\"How does Tesla’s Plaid mode work?\", \"tesla\"),\n",
        "    (\"Which Tesla is best for off-roading?\", \"tesla\"),\n",
        "    # Polestar-related queries\n",
        "    (\"What’s the range of the Polestar 2?\", \"polestar\"),\n",
        "    (\"Is Polestar a good alternative?\", \"polestar\"),\n",
        "    (\"How does Polestar compare to Tesla?\", \"polestar\"),\n",
        "    (\"Tell me about the Polestar 3.\", \"polestar\"),\n",
        "    (\"Is the Polestar 2 fully electric?\", \"polestar\"),\n",
        "    (\"What is Polestar’s performance like?\", \"polestar\"),\n",
        "    (\"Does Polestar offer any performance upgrades?\", \"polestar\"),\n",
        "    (\"How is Polestar's autonomous driving technology?\", \"polestar\"),\n",
        "    (\"What is the battery capacity of the Polestar 2?\", \"polestar\"),\n",
        "    (\"How does Polestar differ from Volvo?\", \"polestar\"),\n",
        "    (\"Is Polestar planning a fully electric SUV?\", \"polestar\"),\n",
        "    (\"How does the Polestar 4 compare to other EVs?\", \"polestar\"),\n",
        "    (\"What are Polestar’s sustainability goals?\", \"polestar\"),\n",
        "    (\"How much does a Polestar 3 cost?\", \"polestar\"),\n",
        "    (\"Does Polestar have its own fast-charging network?\", \"polestar\"),\n",
        "    # Rivian-related queries\n",
        "    (\"Tell me about the Rivian R1T.\", \"rivian\"),\n",
        "    (\"How does Rivian's off-road capability compare to other EVs?\", \"rivian\"),\n",
        "    (\"Is Rivian's charging network better than other EVs?\", \"rivian\"),\n",
        "    (\"What is the range of the Rivian R1S?\", \"rivian\"),\n",
        "    (\"How much does a Rivian R1T cost?\", \"rivian\"),\n",
        "    (\"Tell me about Rivian’s plans for new EVs.\", \"rivian\"),\n",
        "    (\"How does Rivian’s technology compare to other EVs?\", \"rivian\"),\n",
        "    (\"What are the best off-road features of the Rivian R1T?\", \"rivian\"),\n",
        "    (\"What’s the towing capacity of the Rivian R1T?\", \"rivian\"),\n",
        "    (\"How does the Rivian R1S differ from the R1T?\", \"rivian\"),\n",
        "    (\"What’s special about Rivian’s adventure network?\", \"rivian\"),\n",
        "    (\"How much does it cost to charge a Rivian?\", \"rivian\"),\n",
        "    (\"Does Rivian have a lease program?\", \"rivian\"),\n",
        "    (\"What are Rivian’s future expansion plans?\", \"rivian\"),\n",
        "    (\"How long does it take to charge a Rivian at home?\", \"rivian\"),\n",
        "    # None category (general knowledge)\n",
        "    (\"What is the capital of France?\", None),\n",
        "    (\"How many people live in the US?\", None),\n",
        "    (\"When is the best time to visit Bali?\", None),\n",
        "    (\"How do I learn a language?\", None),\n",
        "    (\"Tell me an interesting fact.\", None),\n",
        "    (\"What is the best programming language?\", None),\n",
        "    (\"I'm interested in learning about llama 2.\", None),\n",
        "    (\"What is the capital of the moon?\", None),\n",
        "    (\"Who was the first person to walk on the moon?\", None),\n",
        "    (\"What’s the best way to cook a steak?\", None),\n",
        "    (\"How do I start a vegetable garden?\", None),\n",
        "    (\"What’s the most popular dog breed?\", None),\n",
        "    (\"Tell me about the history of the Roman Empire.\", None),\n",
        "    (\"How do I improve my photography skills?\", None),\n",
        "    (\"What are some good book recommendations?\", None),\n",
        "    (\"How does the stock market work?\", None),\n",
        "    (\"What’s the best way to stay fit?\", None),\n",
        "    (\"What’s the weather like in London today?\", None),\n",
        "    (\"Who won the last FIFA World Cup?\", None),\n",
        "    (\"What’s the difference between a crocodile and an alligator?\", None),\n",
        "    (\"Tell me about the origins of jazz music.\", None),\n",
        "    (\"What’s the fastest animal on land?\", None),\n",
        "    (\"How does Bitcoin mining work?\", None),\n",
        "    (\"What are the symptoms of the flu?\", None),\n",
        "    (\"How do I start a YouTube channel?\", None),\n",
        "    (\"What’s the best travel destination for solo travelers?\", None),\n",
        "    (\"Who invented the light bulb?\", None),\n",
        "    (\"What are the rules of chess?\", None),\n",
        "    (\"Tell me about ancient Egyptian mythology.\", None),\n",
        "    (\"How do I train my dog to sit?\", None),\n",
        "    (\"What’s the difference between espresso and regular coffee?\", None),\n",
        "    (\"What’s a good beginner-friendly programming language?\", None),\n",
        "    (\"What are some good stretching exercises?\", None),\n",
        "    (\"How do I bake a chocolate cake?\", None),\n",
        "    (\"What’s the best way to save money?\", None),\n",
        "    (\"How do airplanes stay in the air?\", None),\n",
        "    (\"What are the benefits of meditation?\", None),\n",
        "    (\"How do I learn basic Spanish?\", None),\n",
        "    (\"What’s the best way to pack for a trip?\", None),\n",
        "    (\"What’s the most common phobia?\", None),\n",
        "    (\"How do I take care of a bonsai tree?\", None),\n",
        "    (\"What’s the best way to clean a laptop keyboard?\", None),\n",
        "    (\"Tell me about the Great Wall of China.\", None),\n",
        "    (\"What’s the best way to learn to swim?\", None),\n",
        "    (\"How does WiFi work?\", None),\n",
        "    (\"What’s the healthiest type of bread?\", None),\n",
        "    (\"What’s the origin of the word ‘quarantine’?\", None),\n",
        "    (\"How do I find a good apartment?\", None),\n",
        "    (\"What are some good mindfulness techniques?\", None),\n",
        "    (\"How do I set up a home theater system?\", None),\n",
        "]"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Because this test set is larger, it gives a more reliable estimate of the router's accuracy than a handful of examples would."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 234,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "['Tell me about the BYD Seal.', 'What is the battery capacity of the BYD Dolphin?', \"How does BYD's Blade Battery work?\", 'Is the BYD Atto 3 a good EV?', 'What’s the range of the BYD Tang?', 'Does BYD offer fast-charging stations?', 'How is the BYD Han different from the Seal?', 'Is BYD the largest EV manufacturer in China?', 'What is the top speed of the BYD Seal?', 'Compare the BYD Dolphin and the BYD Atto 3.', 'How does BYD’s battery technology compare to Tesla’s?', 'What makes the BYD Blade Battery safer?', 'Does BYD have plans to expand to Europe?', 'How efficient is the BYD Tang in terms of range?', 'What are the latest BYD electric vehicle models?', 'How does the BYD Han compare to the Tesla Model S?', 'What is the warranty on BYD EV batteries?', 'Which BYD model is the best for long-distance driving?', 'Does BYD manufacture its own battery cells?', 'Is Tesla better than BYD?', 'Tell me about the Tesla Model 3.', 'How does Tesla’s autopilot compare to other EVs?', 'What’s new in the Tesla Cybertruck?', 'What is Tesla’s Full Self-Driving feature?', 'How long does it take to charge a Tesla?', 'Tell me about the Tesla Roadster.', 'How much does a Tesla Model S cost?', 'Which Tesla model has the longest range?', 'What are the main differences between the Tesla Model S and Model 3?', 'How safe is Tesla’s Autopilot?', 'Does Tesla use LFP batteries?', 'What is the Tesla Supercharger network?', 'How does Tesla’s Plaid mode work?', 'Which Tesla is best for off-roading?', 'What’s the range of the Polestar 2?', 'Is Polestar a good alternative?', 'How does Polestar compare to Tesla?', 'Tell me about the Polestar 3.', 'Is the Polestar 2 fully electric?', 'What is Polestar’s performance like?', 'Does Polestar offer any performance upgrades?', \"How is Polestar's autonomous driving technology?\", 'What is the battery capacity of the Polestar 2?', 'How does Polestar differ from Volvo?', 'Is Polestar planning a fully electric SUV?', 'How does the Polestar 4 compare to other EVs?', 'What are Polestar’s sustainability goals?', 'How much does a Polestar 3 cost?', 'Does Polestar have its own fast-charging network?', 'Tell me about the Rivian R1T.', \"How does Rivian's off-road capability compare to other EVs?\", \"Is Rivian's charging network better than other EVs?\", 'What is the range of the Rivian R1S?', 'How much does a Rivian R1T cost?', 'Tell me about Rivian’s plans for new EVs.', 'How does Rivian’s technology compare to other EVs?', 'What are the best off-road features of the Rivian R1T?', 'What’s the towing capacity of the Rivian R1T?', 'How does the Rivian R1S differ from the R1T?', 'What’s special about Rivian’s adventure network?', 'How much does it cost to charge a Rivian?', 'Does Rivian have a lease program?', 'What are Rivian’s future expansion plans?', 'How long does it take to charge a Rivian at home?', 'What is the capital of France?', 'How many people live in the US?', 'When is the best time to visit Bali?', 'How do I learn a language?', 'Tell me an interesting fact.', 'What is the best programming language?', \"I'm interested in learning about llama 2.\", 'What is the capital of the moon?', 'Who was the first person to walk on the moon?', 'What’s the best way to cook a steak?', 'How do I start a vegetable garden?', 'What’s the most popular dog breed?', 'Tell me about the history of the Roman Empire.', 'How do I improve my photography skills?', 'What are some good book recommendations?', 'How does the stock market work?', 'What’s the best way to stay fit?', 'What’s the weather like in London today?', 'Who won the last FIFA World Cup?', 'What’s the difference between a crocodile and an alligator?', 'Tell me about the origins of jazz music.', 'What’s the fastest animal on land?', 'How does Bitcoin mining work?', 'What are the symptoms of the flu?', 'How do I start a YouTube channel?', 'What’s the best travel destination for solo travelers?', 'Who invented the light bulb?', 'What are the rules of chess?', 'Tell me about ancient Egyptian mythology.', 'How do I train my dog to sit?', 'What’s the difference between espresso and regular coffee?', 'What’s a good beginner-friendly programming language?', 'What are some good stretching exercises?', 'How do I bake a chocolate cake?', 'What’s the best way to save money?', 'How do airplanes stay in the air?', 'What are the benefits of meditation?', 'How do I learn basic Spanish?', 'What’s the best way to pack for a trip?', 'What’s the most common phobia?', 'How do I take care of a bonsai tree?', 'What’s the best way to clean a laptop keyboard?', 'Tell me about the Great Wall of China.', 'What’s the best way to learn to swim?', 'How does WiFi work?', 'What’s the healthiest type of bread?', 'What’s the origin of the word ‘quarantine’?', 'How do I find a good apartment?', 'What are some good mindfulness techniques?', 'How do I set up a home theater system?']\n",
          "['byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'byd', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'tesla', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'polestar', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', 'rivian', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]\n"
         ]
        }
       ],
       "source": [
        "# unpack the test data\n",
        "X, y = zip(*test_data)\n",
        "\n",
        "X = list(X)\n",
        "y = list(y)\n",
        "\n",
        "print(X)\n",
        "print(y)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We start by setting each route's threshold manually with `set_threshold`, so we can compare the accuracy these values give with the accuracy after optimization."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 235,
       "metadata": {},
       "outputs": [],
       "source": [
        "first_router.set_threshold(route_name=\"byd\", threshold=0.42424242424242425)\n",
        "first_router.set_threshold(route_name=\"tesla\", threshold=0.31313131313131315)\n",
        "first_router.set_threshold(route_name=\"polestar\", threshold=0.84640342822161)\n",
        "first_router.set_threshold(route_name=\"rivian\", threshold=0.12121212121212122)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We can confirm the thresholds currently set on the router with `get_thresholds`, then evaluate the router using them."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 236,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Current route thresholds: {'byd': 0.42424242424242425, 'tesla': 0.31313131313131315, 'polestar': 0.84640342822161, 'rivian': 0.12121212121212122}\n"
         ]
        }
       ],
       "source": [
        "route_thresholds = first_router.get_thresholds()\n",
        "print(\"Current route thresholds:\", route_thresholds)"
       ]
      },
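      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "To build intuition for what these per-route thresholds do, the sketch below assumes a simplified decision rule: take the highest-scoring route for a query and return it only if its score reaches that route's threshold, otherwise return `None`. The helper `pick_route` and the similarity scores are hypothetical and not part of the semantic-router API. Under this rule, a high threshold such as the one set for `polestar` (about 0.85) means only very close matches are accepted."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Hypothetical helper illustrating per-route thresholds (not the semantic-router API).\n",
        "def pick_route(scores: dict, thresholds: dict):\n",
        "    \"\"\"Return the top-scoring route if it reaches its threshold, otherwise None.\"\"\"\n",
        "    top_route = max(scores, key=scores.get)\n",
        "    return top_route if scores[top_route] >= thresholds[top_route] else None\n",
        "\n",
        "\n",
        "# Thresholds rounded from the values set above\n",
        "example_thresholds = {\"byd\": 0.42, \"tesla\": 0.31, \"polestar\": 0.85, \"rivian\": 0.12}\n",
        "\n",
        "# Made-up similarity scores for two queries\n",
        "print(pick_route({\"byd\": 0.61, \"tesla\": 0.20, \"polestar\": 0.10, \"rivian\": 0.05}, example_thresholds))  # byd\n",
        "print(pick_route({\"byd\": 0.15, \"tesla\": 0.10, \"polestar\": 0.70, \"rivian\": 0.08}, example_thresholds))  # None"
       ]
      },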
      {
       "cell_type": "code",
       "execution_count": 237,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:27 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.24s/it]"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy: 68.42%\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "\n"
         ]
        }
       ],
       "source": [
        "# evaluate using the manually set thresholds\n",
        "accuracy = first_router.evaluate(X=X, y=y)\n",
        "print(f\"Accuracy: {accuracy*100:.2f}%\")"
       ]
      },
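      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Conceptually, the accuracy reported by `evaluate` is the fraction of test queries whose predicted route equals the expected label, with a `None` label counting as correct only when no route matches. The hypothetical `route_accuracy` helper below computes that metric from made-up predictions; it illustrates the metric and is not the library's implementation."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "def route_accuracy(predictions, labels):\n",
        "    \"\"\"Fraction of positions where the predicted route equals the expected label.\"\"\"\n",
        "    if len(predictions) != len(labels):\n",
        "        raise ValueError(\"predictions and labels must have the same length\")\n",
        "    correct = sum(pred == label for pred, label in zip(predictions, labels))\n",
        "    return correct / len(labels)\n",
        "\n",
        "\n",
        "# Hypothetical predictions for four queries; None means \"no route matched\"\n",
        "example_preds = [\"byd\", \"tesla\", None, \"rivian\"]\n",
        "example_labels = [\"byd\", \"polestar\", None, \"rivian\"]\n",
        "print(f\"Accuracy: {route_accuracy(example_preds, example_labels) * 100:.2f}%\")  # Accuracy: 75.00%"
       ]
      },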
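      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Alternatively, we can call the `fit` method, which searches for the per-route thresholds that maximize accuracy on the labelled data.\n",
        "\n",
        "Note that here we fit and evaluate on the same data, so the accuracy reported afterwards is optimistic; in practice, evaluate on a separate held-out set."
       ]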
      },
      {
       "cell_type": "code",
       "execution_count": 238,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:30 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.29s/it]\n",
          "Training: 100%|██████████| 500/500 [00:16<00:00, 31.17it/s, acc=0.95]\n"
         ]
        }
       ],
       "source": [
        "# Call the fit method\n",
        "first_router.fit(X=X, y=y)"
       ]
      },
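      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "The progress bar above shows `fit` running 500 training iterations and reporting the best accuracy found. As a rough mental model only, the sketch below performs a simple random search: it repeatedly samples one threshold per route from a 100-point grid on [0, 1] and keeps the combination with the highest accuracy on made-up, precomputed scores. The data and helper names are hypothetical, and whether `fit` works exactly this way is an assumption."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "import random\n",
        "\n",
        "# Made-up precomputed similarity scores per route, plus the expected route\n",
        "# for each query (None means no route should match).\n",
        "toy_scores = [\n",
        "    {\"scam\": 0.82, \"other\": 0.30},\n",
        "    {\"scam\": 0.25, \"other\": 0.71},\n",
        "    {\"scam\": 0.40, \"other\": 0.35},\n",
        "]\n",
        "toy_labels = [\"scam\", \"other\", None]\n",
        "\n",
        "\n",
        "def toy_predict(score_row, thresholds):\n",
        "    \"\"\"Top-scoring route if it reaches its threshold, otherwise None.\"\"\"\n",
        "    route = max(score_row, key=score_row.get)\n",
        "    return route if score_row[route] >= thresholds[route] else None\n",
        "\n",
        "\n",
        "def toy_accuracy(thresholds):\n",
        "    pairs = zip(toy_scores, toy_labels)\n",
        "    correct = sum(toy_predict(row, thresholds) == label for row, label in pairs)\n",
        "    return correct / len(toy_labels)\n",
        "\n",
        "\n",
        "def random_threshold_search(routes, iterations=500, seed=0):\n",
        "    rng = random.Random(seed)\n",
        "    grid = [i / 99 for i in range(100)]  # 100 evenly spaced candidates in [0, 1]\n",
        "    best = {route: 0.0 for route in routes}  # start from permissive thresholds\n",
        "    best_acc = toy_accuracy(best)\n",
        "    for _ in range(iterations):\n",
        "        candidate = {route: rng.choice(grid) for route in routes}\n",
        "        candidate_acc = toy_accuracy(candidate)\n",
        "        if candidate_acc > best_acc:  # keep only strict improvements\n",
        "            best, best_acc = candidate, candidate_acc\n",
        "    return best, best_acc\n",
        "\n",
        "\n",
        "best_thresholds, best_acc = random_threshold_search([\"scam\", \"other\"])\n",
        "print(best_thresholds, f\"accuracy={best_acc:.2f}\")"
       ]
      },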
      {
       "cell_type": "code",
       "execution_count": 239,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Updated route thresholds: {'byd': 0.4141414141414142, 'tesla': 0.4746046321803898, 'polestar': 0.7575757575757577, 'rivian': 0.7373737373737375}\n"
         ]
        }
       ],
       "source": [
        "route_thresholds = first_router.get_thresholds()\n",
        "print(\"Updated route thresholds:\", route_thresholds)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 240,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:49 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.55s/it]"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy: 94.74%\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "\n"
         ]
        }
       ],
       "source": [
        "accuracy = first_router.evaluate(X=X, y=y)\n",
        "print(f\"Accuracy: {accuracy*100:.2f}%\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Creating a Second Hybrid Router for Scam Detection"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "In this section we repeat the same process for a second router.\n",
        "\n",
        "This router acts as a scam detector: it matches common scam-like requests (for example, unusual payment methods) to the `scam` route and ordinary customer questions to the `other` route."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 241,
       "metadata": {},
       "outputs": [],
       "source": [
        "# Route for scam-like queries (blocked)\n",
        "scam = Route(\n",
        "    name=\"scam\",\n",
        "    utterances=[\n",
        "        \"Can you give me a discount?\",\n",
        "        \"I need to pay you in bitcoin\",\n",
        "        \"I need to pay you in cash\",\n",
        "        \"I need to pay you in gift card\",\n",
        "        \"I want you to pay me in bitcoin\",\n",
        "        \"I want you to pay me in cash\",\n",
        "        \"I want you to pay me in gift card\",\n",
        "        \"Could you lower the price?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Route for legitimate customer queries (allowed)\n",
        "other = Route(\n",
        "    name=\"other\",\n",
        "    utterances=[\n",
        "        \"What is the price of the product?\",\n",
        "        \"What is the delivery time?\",\n",
        "        \"What is the return policy?\",\n",
        "        \"What is the warranty?\",\n",
        "        \"What is the refund policy?\",\n",
        "        \"What is the shipping cost?\",\n",
        "        \"What is the shipping time?\",\n",
        "        \"What is the shipping policy?\",\n",
        "        \"How much can I sell my EV for?\",\n",
        "        \"How much can I sell my Tesla for?\",\n",
        "        \"How much can I sell my Polestar for?\",\n",
        "        \"How much can I sell my Rivian for?\",\n",
        "        \"How much can I sell my BYD for?\",\n",
        "        \"How much can I sell my other EV for?\",\n",
        "    ],\n",
        ")\n",
        "\n",
        "# Combine all routes\n",
        "routes = [scam, other]"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 242,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:11:50 - semantic_router.utils.logger - WARNING - hybrid.py:54 - __init__() - No index provided. Using default HybridLocalIndex.\n",
          "2025-03-23 13:11:51 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "2025-03-23 13:11:52 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "2025-03-23 13:11:53 - semantic_router.utils.logger - WARNING - hybrid_local.py:47 - add() - Function schemas are not supported for HybridLocalIndex.\n",
          "2025-03-23 13:11:53 - semantic_router.utils.logger - WARNING - hybrid_local.py:49 - add() - Metadata is not supported for HybridLocalIndex.\n",
          "2025-03-23 13:11:53 - semantic_router.utils.logger - WARNING - hybrid_local.py:210 - _write_config() - No config is written for HybridLocalIndex.\n"
         ]
        }
       ],
       "source": [
        "second_router = HybridRouter(\n",
        "    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync=\"local\"\n",
        ")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 243,
       "metadata": {},
       "outputs": [],
       "source": [
        "test_data = [\n",
        "    # scam routes\n",
        "    (\"Can you give me a discount?\", \"scam\"),\n",
        "    (\"I need to pay you in bitcoin\", \"scam\"),\n",
        "    (\"I need to pay you in cash\", \"scam\"),\n",
        "    (\"I need to pay you in gift card\", \"scam\"),\n",
        "    (\"I want you to pay me in bitcoin\", \"scam\"),\n",
        "    (\"I want you to pay me in cash\", \"scam\"),\n",
        "    (\"I want you to pay me in gift card\", \"scam\"),\n",
        "    (\"Could you lower the price?\", \"scam\"),\n",
        "    (\"Can I pay with cryptocurrency?\", \"scam\"),\n",
        "    (\"Do you accept payments through Venmo only?\", \"scam\"),\n",
        "    (\"Can I send you money via Western Union?\", \"scam\"),\n",
        "    (\"Can I overpay and get a refund?\", \"scam\"),\n",
        "    (\"Can we complete this deal outside the platform?\", \"scam\"),\n",
        "    (\"I'll pay you later, just ship it now.\", \"scam\"),\n",
        "    (\"Can I get a refund without returning the item?\", \"scam\"),\n",
        "    (\"I’ll send extra money if you process this quickly.\", \"scam\"),\n",
        "    (\"Can you mark this transaction as a gift?\", \"scam\"),\n",
        "    (\"Can I use multiple gift cards to pay?\", \"scam\"),\n",
        "    (\"Can you split the payment across different methods?\", \"scam\"),\n",
        "    (\"Can you wire me money first as a guarantee?\", \"scam\"),\n",
        "    (\"Can you send the product before I pay?\", \"scam\"),\n",
        "    (\"Can you help me transfer money?\", \"scam\"),\n",
        "    (\"Can you provide fake receipts?\", \"scam\"),\n",
        "    (\"Can you process my payment through an unusual method?\", \"scam\"),\n",
        "    (\"Can I pay you in prepaid debit cards?\", \"scam\"),\n",
        "    # other routes\n",
        "    (\"What is the price of the product?\", \"other\"),\n",
        "    (\"What is the delivery time?\", \"other\"),\n",
        "    (\"What is the return policy?\", \"other\"),\n",
        "    (\"Do you offer international shipping?\", \"other\"),\n",
        "    (\"How long does it take for delivery?\", \"other\"),\n",
        "    (\"Is there a warranty for this product?\", \"other\"),\n",
        "    (\"Do you provide customer support?\", \"other\"),\n",
        "    (\"Can I track my order?\", \"other\"),\n",
        "    (\"Is express shipping available?\", \"other\"),\n",
        "    (\"What payment methods do you accept?\", \"other\"),\n",
        "    (\"Do you offer bulk discounts?\", \"other\"),\n",
        "    (\"What are the shipping costs?\", \"other\"),\n",
        "    (\"Can I cancel my order?\", \"other\"),\n",
        "    (\"Do you have a physical store?\", \"other\"),\n",
        "    (\"Can I change my shipping address?\", \"other\"),\n",
        "    (\"Is there a restocking fee for returns?\", \"other\"),\n",
        "    (\"Do you have customer reviews?\", \"other\"),\n",
        "    (\"Is this product available in other colors?\", \"other\"),\n",
        "    (\"Do you provide installation services?\", \"other\"),\n",
        "    (\"How can I contact customer service?\", \"other\"),\n",
        "    (\"Are there any current promotions or sales?\", \"other\"),\n",
        "    (\"Can I pick up my order instead of delivery?\", \"other\"),\n",
        "    # add some None routes to prevent excessively small thresholds\n",
        "    (\"What is the capital of France?\", None),\n",
        "    (\"How many people live in the US?\", None),\n",
        "    (\"When is the best time to visit Bali?\", None),\n",
        "    (\"How do I learn a language?\", None),\n",
        "    (\"Tell me an interesting fact.\", None),\n",
        "    (\"What is the best programming language?\", None),\n",
        "    (\"I'm interested in learning about llama 2.\", None),\n",
        "    (\"What is the capital of the moon?\", None),\n",
        "    (\"Who discovered gravity?\", None),\n",
        "    (\"What are some healthy breakfast options?\", None),\n",
        "    (\"How do I start a vegetable garden?\", None),\n",
        "    (\"What are the symptoms of the flu?\", None),\n",
        "    (\"What’s the most spoken language in the world?\", None),\n",
        "    (\"How does WiFi work?\", None),\n",
        "    (\"What are the benefits of meditation?\", None),\n",
        "    (\"How do I improve my memory?\", None),\n",
        "    (\"What is the speed of light?\", None),\n",
        "    (\"Who wrote 'To Kill a Mockingbird'?\", None),\n",
        "    (\"How does an electric car work?\", None),\n",
        "    (\"What’s the best way to save money?\", None),\n",
        "    (\"How do I bake a chocolate cake?\", None),\n",
        "    (\"What’s the healthiest type of bread?\", None),\n",
        "    (\"Who invented the internet?\", None),\n",
        "    (\"How do airplanes stay in the air?\", None),\n",
        "    (\"What are some famous landmarks in Italy?\", None),\n",
        "    (\"What’s the difference between a virus and bacteria?\", None),\n",
        "    (\"How do I learn to play the guitar?\", None),\n",
        "    (\"What’s the best way to learn to swim?\", None),\n",
        "    (\"What’s the tallest mountain in the world?\", None),\n",
        "    (\"How does the stock market work?\", None),\n",
        "]"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 244,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "['Can you give me a discount?', 'I need to pay you in bitcoin', 'I need to pay you in cash', 'I need to pay you in gift card', 'I want you to pay me in bitcoin', 'I want you to pay me in cash', 'I want you to pay me in gift card', 'Could you lower the price?', 'Can I pay with cryptocurrency?', 'Do you accept payments through Venmo only?', 'Can I send you money via Western Union?', 'Can I overpay and get a refund?', 'Can we complete this deal outside the platform?', \"I'll pay you later, just ship it now.\", 'Can I get a refund without returning the item?', 'I’ll send extra money if you process this quickly.', 'Can you mark this transaction as a gift?', 'Can I use multiple gift cards to pay?', 'Can you split the payment across different methods?', 'Can you wire me money first as a guarantee?', 'Can you send the product before I pay?', 'Can you help me transfer money?', 'Can you provide fake receipts?', 'Can you process my payment through an unusual method?', 'Can I pay you in prepaid debit cards?', 'What is the price of the product?', 'What is the delivery time?', 'What is the return policy?', 'Do you offer international shipping?', 'How long does it take for delivery?', 'Is there a warranty for this product?', 'Do you provide customer support?', 'Can I track my order?', 'Is express shipping available?', 'What payment methods do you accept?', 'Do you offer bulk discounts?', 'What are the shipping costs?', 'Can I cancel my order?', 'Do you have a physical store?', 'Can I change my shipping address?', 'Is there a restocking fee for returns?', 'Do you have customer reviews?', 'Is this product available in other colors?', 'Do you provide installation services?', 'How can I contact customer service?', 'Are there any current promotions or sales?', 'Can I pick up my order instead of delivery?', 'What is the capital of France?', 'How many people live in the US?', 'When is the best time to visit Bali?', 'How do I learn a language?', 'Tell me an interesting fact.', 'What is the best programming language?', \"I'm interested in learning about llama 2.\", 'What is the capital of the moon?', 'Who discovered gravity?', 'What are some healthy breakfast options?', 'How do I start a vegetable garden?', 'What are the symptoms of the flu?', 'What’s the most spoken language in the world?', 'How does WiFi work?', 'What are the benefits of meditation?', 'How do I improve my memory?', 'What is the speed of light?', \"Who wrote 'To Kill a Mockingbird'?\", 'How does an electric car work?', 'What’s the best way to save money?', 'How do I bake a chocolate cake?', 'What’s the healthiest type of bread?', 'Who invented the internet?', 'How do airplanes stay in the air?', 'What are some famous landmarks in Italy?', 'What’s the difference between a virus and bacteria?', 'How do I learn to play the guitar?', 'What’s the best way to learn to swim?', 'What’s the tallest mountain in the world?', 'How does the stock market work?']\n",
          "['scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'scam', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', 'other', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]\n"
         ]
        }
       ],
       "source": [
        "# unpack the test data\n",
        "X, y = zip(*test_data)\n",
        "\n",
        "X = list(X)\n",
        "y = list(y)\n",
        "\n",
        "print(X)\n",
        "print(y)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 245,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:11:54 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.36s/it]\n",
          "Training: 100%|██████████| 500/500 [00:07<00:00, 63.91it/s, acc=0.84]\n"
         ]
        }
       ],
       "source": [
        "# Call the fit method\n",
        "second_router.fit(X=X, y=y)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 246,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]2025-03-23 13:12:04 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
          "Generating embeddings: 100%|██████████| 1/1 [00:02<00:00,  2.13s/it]"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Accuracy: 84.42%\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "\n"
         ]
        }
       ],
       "source": [
        "accuracy = second_router.evaluate(X=X, y=y)\n",
        "print(f\"Accuracy: {accuracy*100:.2f}%\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 247,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Updated route thresholds: {'scam': 0.36363636363636365, 'other': 0.33333333333333337}\n"
         ]
        }
       ],
       "source": [
        "route_thresholds = second_router.get_thresholds()\n",
        "print(\"Updated route thresholds:\", route_thresholds)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "### Creating a GraphAI Workflow with Sparse Router Detection"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next, we use both routers as guardrails inside a GraphAI workflow.\n"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "We import the `OpenAILLM` class from the `semantic_router.llms` package.\n",
        "\n",
        "We then instantiate it with the `gpt-4o-2024-08-06` model, which we will use to generate responses to the user's queries."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 248,
       "metadata": {},
       "outputs": [],
       "source": [
        "from pydantic_ai import Agent\n",
        "\n",
        "from semantic_router.llms import OpenAILLM\n",
        "\n",
        "llm = OpenAILLM(name=\"gpt-4o-2024-08-06\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next, we define the nodes of the graph.\n",
        "\n",
        "Each node is an async function decorated with `@node` from the `graphai` package; steps that choose which node runs next are decorated with `@router` instead.\n",
        "\n",
        "The user's `query` is passed to each node that needs it."
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "For this example, we create a `Respond` node that uses the LLM to answer the user's query.\n",
        "\n",
        "We also create a `Check` node with a `Check_Router`, which uses `first_router` to classify the query by car brand, and a `CheckScam` node with a `CheckScam_Router`, which uses `second_router` to flag scam-like requests.\n",
        "\n",
        "If the query is about BYD, `Check_Router` sends it on to `Respond`; if it is about Tesla, Polestar, or Rivian, it instead returns a predefined refusal message."
       ]
      },
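      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Because the routing decisions depend only on the route names returned by the two routers, they can be checked without any API calls. The hypothetical `guardrail` helper below mirrors the branches of `Check_Router` and `CheckScam_Router`, applying the brand check and then the scam check in sequence. Chaining the two checks this way is an assumption about the intended flow, and the route names passed in stand in for `first_router(text=query).name` and `second_router(text=query).name`."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
        "from typing import Optional\n",
        "\n",
        "\n",
        "def guardrail(brand_route: Optional[str], scam_route: Optional[str]) -> str:\n",
        "    \"\"\"Return \"Respond\" if the query may be answered, otherwise a refusal message.\n",
        "\n",
        "    brand_route: route name from the brand router (stand-in for first_router)\n",
        "    scam_route: route name from the scam router (stand-in for second_router)\n",
        "    \"\"\"\n",
        "    if brand_route != \"byd\":\n",
        "        return f\"We don't talk about {brand_route} here\"\n",
        "    if scam_route != \"other\":\n",
        "        return f\"We don't talk about {scam_route} here\"\n",
        "    return \"Respond\"\n",
        "\n",
        "\n",
        "print(guardrail(\"byd\", \"other\"))  # Respond\n",
        "print(guardrail(\"tesla\", \"other\"))  # We don't talk about tesla here\n",
        "print(guardrail(\"byd\", \"scam\"))  # We don't talk about scam here"
       ]
      },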
      {
       "cell_type": "code",
       "execution_count": 249,
       "metadata": {},
       "outputs": [],
       "source": [
        "from graphai import router, node\n",
        "from semantic_router.schema import Message\n",
        "\n",
        "\n",
        "@node(start=True)\n",
        "async def Check():\n",
        "    print(\"Check\")\n",
        "    return {\"result\": \"Checking for BYD specific queries\"}\n",
        "\n",
        "\n",
        "@router()\n",
        "async def Check_Router(query: str):\n",
        "    print(\"Check_Router\")\n",
        "    print(query)\n",
        "    result = first_router(text=query)\n",
        "    print(result.name)\n",
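        "    if result.name == \"byd\":\n",
        "        # BYD query: continue to the scam check\n",
        "        return {\"result\": \"Checking Scam\", \"choice\": \"CheckScam\"}\n",
        "    else:\n",
        "        # Any other route (or no match): refuse and end the graph\n",
        "        return {\"result\": f\"We don't talk about {result.name} here\", \"choice\": \"Node_End\"}\n",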
        "\n",
        "\n",
        "@node()\n",
        "async def CheckScam():\n",
        "    print(\"CheckScam\")\n",
        "    return {\"result\": \"Checking for Scam specific queries\"}\n",
        "\n",
        "\n",
        "@router()\n",
        "async def CheckScam_Router(query: str):\n",
        "    print(\"CheckScam_Router\")\n",
        "    result = second_router(text=query)\n",
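        "    if result.name == \"other\":\n",
        "        # Not flagged as a scam: let the LLM answer\n",
        "        return {\"result\": \"Responding to query\", \"choice\": \"Respond\"}\n",
        "    else:\n",
        "        # Flagged by the scam router: refuse and end the graph\n",
        "        return {\"result\": f\"We don't talk about {result.name} here\", \"choice\": \"Node_End\"}\n",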
        "\n",
        "\n",
        "@node()\n",
        "async def Respond(query: str):\n",
        "    print(\"Respond\")\n",
        "    messages = [\n",
        "        Message(role=\"system\", content=\"You are a helpful assistant, be wary of scams.\"),\n",
        "        Message(\n",
        "            role=\"user\",\n",
        "            content=f\"Respond to the following query from the user: {query}\",\n",
        "        ),\n",
        "    ]\n",
        "    # OpenAILLM returns the model's reply as text\n",
        "    response = llm(messages=messages)\n",
        "    return {\"result\": response}\n",
        "\n",
        "\n",
        "@node(end=True)\n",
        "async def Node_End():\n",
        "    print(\"Node_End\")\n",
        "    return {\"output\": \"Completed\"}"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Next we need to define the `graph` object."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 250,
       "metadata": {},
       "outputs": [],
       "source": [
        "from graphai import Graph\n",
        "\n",
        "graph = Graph()"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "First, we add each node to the graph.\n",
        "\n",
        "Next, we register each router with `add_router`, giving its source nodes, the router function, and the destinations it can choose between.\n",
        "\n",
        "Finally, we connect `Respond` to `Node_End` with a plain edge."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 251,
       "metadata": {},
       "outputs": [],
       "source": [
        "for node_fn in [Check, CheckScam, Respond, Node_End]:\n",
        "    graph.add_node(node_fn)\n",
        "\n",
        "# First guardrail: BYD queries go on to the scam check, everything else ends\n",
        "graph.add_router(\n",
        "    sources=[Check], router=Check_Router, destinations=[CheckScam, Node_End]\n",
        ")\n",
        "\n",
        "# Second guardrail: non-scam queries are answered, scams end the graph\n",
        "graph.add_router(\n",
        "    sources=[CheckScam], router=CheckScam_Router, destinations=[Respond, Node_End]\n",
        ")\n",
        "\n",
        "# After answering, always finish at Node_End\n",
        "graph.add_edge(source=Respond, destination=Node_End)"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Before running a query, we can check which node the graph starts from and which nodes end it."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 254,
       "metadata": {},
       "outputs": [
        {
         "data": {
          "text/plain": [
           "(graphai.nodes.base._Node._node.<locals>.NodeClass,\n",
           " [graphai.nodes.base._Node._node.<locals>.NodeClass])"
          ]
         },
         "execution_count": 254,
         "metadata": {},
         "output_type": "execute_result"
        }
       ],
       "source": [
        "graph.start_node, graph.end_nodes"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "Then we pass a query to the graph's asynchronous `execute` method and `await` the result."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 261,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Check\n",
          "Check_Router\n",
          "how much can i sell my byd for?\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:13:23 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "byd\n",
          "Respond\n"
         ]
        },
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "2025-03-23 13:13:26 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
         ]
        },
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Node_End\n"
         ]
        }
       ],
       "source": [
        "response = await graph.execute(input={\"query\": \"how much can i sell my byd for?\"})"
       ]
      },
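      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "`graph.execute` is a coroutine, which is why the cell above uses `await`. Jupyter allows top-level `await`, but a plain Python script does not, so outside a notebook you would need to run it inside an event loop. The following is a minimal sketch, assuming `graph` is built exactly as above:\n",
        "\n",
        "```python\n",
        "import asyncio\n",
        "\n",
        "\n",
        "async def main():\n",
        "    # Run the whole guardrail graph for a single query\n",
        "    return await graph.execute(input={\"query\": \"how much can i sell my byd for?\"})\n",
        "\n",
        "\n",
        "response = asyncio.run(main())\n",
        "```"
       ]
      },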
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "`execute` returns the graph's final state as a dictionary, so we can print the LLM's answer (`result`) alongside the original `query`."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 263,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "Response:  To determine how much you can sell your BYD vehicle for, you'll need to consider several factors:\n",
          "\n",
          "1. **Model and Year**: The specific model and year of your BYD vehicle will significantly impact its value. Newer models or those with desirable features typically sell for more.\n",
          "\n",
          "2. **Condition**: The overall condition of the car, including the exterior, interior, and mechanical components, will affect its price. Cars in excellent condition with no major issues will fetch higher prices.\n",
          "\n",
          "3. **Mileage**: Lower mileage usually increases a car's value, as it suggests less wear and tear.\n",
          "\n",
          "4. **Market Demand**: The demand for BYD vehicles in your area can influence the selling price. If there is high demand and low supply, you might be able to sell for a higher price.\n",
          "\n",
          "5. **Location**: Prices can vary based on your geographic location due to differences in demand and local market conditions.\n",
          "\n",
          "6. **Modifications and Features**: Any additional features or\n",
          "Query:  how much can i sell my byd for?\n"
         ]
        }
       ],
       "source": [
        "print(\"Response: \", response[\"result\"])\n",
        "print(\"Query: \", response[\"query\"])"
       ]
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": ".venv",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.12.7"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 2
    }
    


    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Suggestion: Safely retrieve API keys

    Use os.environ.get to safely retrieve API keys and avoid a KeyError when the
    environment variables are not set.

    docs/integrations/graphai/sparse-threshold-optimization-guardrail-graphai.ipynb [146-174]

    -os.environ["AURELIO_API_KEY"] = os.environ["AURELIO_API_KEY"] or getpass(
    +os.environ["AURELIO_API_KEY"] = os.environ.get("AURELIO_API_KEY") or getpass(
         "Enter your Aurelio API key: "
     )
    -os.environ["OPENAI_API_KEY"] = os.environ["OPENAI_API_KEY"] or getpass(
    +os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY") or getpass(
         "Enter your OpenAI API key: "
     )
    Suggestion importance[1-10]: 7


    Why: The suggestion replaces direct environment variable access with os.environ.get to avoid a potential KeyError when the API key is not set. It improves safety by ensuring that missing keys are handled gracefully, though it is a minor error-handling improvement rather than a fix for a critical bug.

    Impact: Medium
