
Opensearch Plugin

Madhumita Subramaniam edited this page Apr 29, 2025 · 10 revisions

dovie

title Dovie AI Demo: Permissioned Chat Bot


actor User
participant BOT
participant Deepseek
participant AS
participant API
participant OpenSearch
participant Disk
participant Plugin
participant Open Search Cedarling
participant Lock Server
participant ITDR


autonumber 1

User<->BOT: Invoke Bot
box over BOT: Dovie.ai\nStarting up!
BOT<->AS: Register
BOT<->AS: Get JWT Access Token
BOT->API: Get me all data for tenant Acme_Inc and Account Foo_Bar
API->OpenSearch: Return all data for tenant=Acme_Inc and account=Foo_Bar
OpenSearch<->Disk: fetch bits
OpenSearch->Plugin: Filter out unauthorized data
Plugin<->Open Search Cedarling: Authorize data against policies
Plugin->OpenSearch: data
OpenSearch->API: data
API->BOT: data
BOT<->Deepseek: train
BOT->User: Can I help you?
Open Search Cedarling->Lock Server: Send Logs
Lock Server->ITDR: Call ITDR API with identity-key correlated data
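Steps 7 to 9 of the diagram (OpenSearch handing results to the plugin, which asks the OpenSearch Cedarling about each one and passes back only what is allowed) can be sketched in plain Python. This is a minimal sketch under assumptions, not the plugin's real code: `filter_hits` is a hypothetical name, and the `authorize` callback stands in for whatever call the plugin actually makes to Cedarling.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]
# Hypothetical authorizer signature: (principal_id, action, document) -> allowed?
Authorizer = Callable[[str, str, Dict[str, Any]], bool]


def filter_hits(hits: List[Hit], principal_id: str, action: str,
                authorize: Authorizer) -> List[Hit]:
    """Keep only the search hits the principal may read (fail closed)."""
    allowed = []
    for hit in hits:
        source = hit.get("_source", {})
        try:
            ok = authorize(principal_id, action, source)
        except Exception:
            # Treat any authorizer failure as a deny instead of leaking data.
            ok = False
        if ok:
            allowed.append(hit)
    return allowed


if __name__ == "__main__":
    # Stand-in for the OpenSearch Cedarling: allow only Acme_Inc / Foo_Bar docs.
    def demo_authorize(principal_id, action, doc):
        return doc.get("company") == "Acme_Inc" and doc.get("account") == "Foo_Bar"

    hits = [
        {"_id": "1", "_source": {"company": "Acme_Inc", "account": "Foo_Bar"}},
        {"_id": "2", "_source": {"company": "Acme_Inc", "account": "Other"}},
        {"_id": "3", "_source": {"company": "Globex", "account": "Foo_Bar"}},
    ]
    kept = filter_hits(hits, "bot:SupportBot", "ViewTicket", demo_authorize)
    print([h["_id"] for h in kept])  # ['1']
```

The loop fails closed: if the authorizer errors out, the hit is dropped rather than returned, in keeping with the default-deny stance of the rest of the design.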

Notes

  • Although the diagram above shows only the OpenSearch Cedarling sending logs to the Lock Server, the BOT, AS, and API also have their own Cedarling policy stores and produce logs. This is how we get a chain of custody from the device to the database (i.e., Zero Trust).

  • What if certain records for tenant Acme are labeled confidential, and there is a policy that no confidential information should be returned to the bot?
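In Cedar this case is normally handled with a `forbid` policy rather than by editing the `permit`: a matching `forbid` always wins over any `permit`, and a request that matches no `permit` is denied. The Python sketch below imitates only that combining rule; it is not Cedar, and the `confidential` attribute is a hypothetical label (the `is_private` field in the tickets index could play a similar role).

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Rule = Callable[[Request], bool]


def decide(request: Request, permits: List[Rule], forbids: List[Rule]) -> bool:
    """Cedar-style combining: any matching forbid denies, otherwise at least
    one permit must match (no match at all means deny)."""
    if any(rule(request) for rule in forbids):
        return False
    return any(rule(request) for rule in permits)


def same_tenant_and_account(req: Request) -> bool:
    """Permit rule: the record belongs to the tenant/account being served."""
    res, ctx = req["resource"], req["context"]
    return res["tenant"] == ctx["tenant"] and res["account"] == ctx["account"]


def confidential_to_bot(req: Request) -> bool:
    """Forbid rule for the hypothetical label. A Cedar version might read:
    forbid(principal is Bot, action, resource)
    when { resource has confidential && resource.confidential };
    """
    return req["principal_type"] == "Bot" and bool(req["resource"].get("confidential", False))


base = {"principal_type": "Bot", "context": {"tenant": "Acme_Inc", "account": "Foo_Bar"}}
public = dict(base, resource={"tenant": "Acme_Inc", "account": "Foo_Bar"})
secret = dict(base, resource={"tenant": "Acme_Inc", "account": "Foo_Bar", "confidential": True})

print(decide(public, [same_tenant_and_account], [confidential_to_bot]))  # True
print(decide(secret, [same_tenant_and_account], [confidential_to_bot]))  # False
```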

Basic reference documents:

  1. OpenSearch terminology: https://www.instaclustr.com/blog/learning-opensearch-from-scratch-part-1/

  2. DeepSeek-OpenSearch connector and chatbot: https://opensearch.org/blog/OpenSearch-Now-Supports-DeepSeek-Chat-Models/

Cedarling schema

  • Principal: AI BOT
  • Action: Read
  • Resource: tickets
  • Context: tenant = ABC, account = PQR, level = 1
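These four parts make up one authorization request. A minimal way to picture it, assuming a plain record rather than Cedarling's actual request format (`AuthzRequest` is a hypothetical class), is:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class AuthzRequest:
    """Hypothetical record of one principal/action/resource/context request."""
    principal: str
    action: str
    resource: str
    context: Dict[str, Any] = field(default_factory=dict)


req = AuthzRequest(
    principal="AI BOT",
    action="Read",
    resource="tickets",
    context={"tenant": "ABC", "account": "PQR", "level": 1},
)

# A JSON form like this is convenient for logging each decision request.
print(json.dumps(asdict(req), sort_keys=True))
```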

Entity Types:

  1. User: the human making the request
  2. Bot: the software service (e.g. bot:SupportBot)
  3. Tenant: the top-level organization (e.g. Acme_Inc)
  4. Account: an account within a tenant (e.g. Foo_Bar under Acme_Inc)
  5. Ticket: a support ticket
entity User {
  roles: Set<String>,
  tenant: Tenant,
  account: Account
};

entity Bot {
  // bots may be allowed to act on behalf of users
  authorized_users: Set<User>
};

entity Tenant;

entity Account {
  tenant: Tenant
};

entity Ticket {
  tenant: Tenant,
  account: Account
};

action ViewTicket appliesTo {
  principal: [Bot],
  resource: [Ticket],
  // the user the bot is acting for on this request
  context: { user: User }
};


Policy

permit(
  principal is Bot,
  action == Action::"ViewTicket",
  resource is Ticket
)
when {
  // Cedar has no "for some element of a set" operator, so the request
  // names the user the bot is acting for and the policy checks that user.
  principal.authorized_users.contains(context.user) &&
  context.user.tenant == resource.tenant &&
  context.user.account == resource.account
};

Example:

User::"user:agent_1" {
  roles: ["support_agent"],
  tenant: Tenant::"tenant:Acme_Inc",
  account: Account::"account:Foo_Bar"
}

Bot::"bot:SupportBot" {
  authorized_users: [User::"user:agent_1"]
}

Ticket::"ticket:456" {
  tenant: Tenant::"tenant:Acme_Inc",
  account: Account::"account:Foo_Bar"
}
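The rule the policy expresses can be checked by hand against the example entities. The Python sketch below only illustrates that logic and is not Cedar or Cedarling; it assumes the bot names the user it is acting for (here user:agent_1), and `ENTITIES` and `can_view_ticket` are hypothetical names.

```python
from typing import Any, Dict

# The example entities above as plain dicts (entity references reduced to ids).
ENTITIES: Dict[str, Dict[str, Any]] = {
    "user:agent_1": {
        "roles": {"support_agent"},
        "tenant": "tenant:Acme_Inc",
        "account": "account:Foo_Bar",
    },
    "bot:SupportBot": {"authorized_users": {"user:agent_1"}},
    "ticket:456": {"tenant": "tenant:Acme_Inc", "account": "account:Foo_Bar"},
}


def can_view_ticket(bot_id: str, user_id: str, ticket_id: str) -> bool:
    """Allow only if the bot may act for user_id and that user's tenant and
    account both match the ticket's; unknown entities or attributes deny."""
    bot = ENTITIES.get(bot_id)
    user = ENTITIES.get(user_id)
    ticket = ENTITIES.get(ticket_id)
    if bot is None or user is None or ticket is None:
        return False
    if user_id not in bot.get("authorized_users", set()):
        return False
    tenant, account = user.get("tenant"), user.get("account")
    if tenant is None or account is None:
        return False
    return tenant == ticket.get("tenant") and account == ticket.get("account")


print(can_view_ticket("bot:SupportBot", "user:agent_1", "ticket:456"))  # True
```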

Indices in OpenSearch

Index            Purpose
users            Store user metadata
tickets          Store ticket metadata and questions
ticket_answers   Store support answers

Index: Tickets

PUT tickets
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "title": { "type": "text" },
      "description": { "type": "text" },
      "status": { "type": "keyword" },
      "is_deleted": { "type": "boolean" },
      "date_added": { "type": "date" },
      "date_modified": { "type": "date" },
      "assigned_to_id": { "type": "keyword" },
      "created_by_id": { "type": "keyword" },
      "modified_by_id": { "type": "keyword" },
      "is_private": { "type": "boolean" },
      "link_url": { "type": "keyword" },
      "answers_no": { "type": "integer" },
      "send_copy": { "type": "boolean" },

      "os_type": { "type": "keyword" },
      "os_version": { "type": "keyword" },
      "ram": { "type": "keyword" },
      "gluu_server_version_id": { "type": "keyword" },
      "gluu_server_version_comments": { "type": "text" },
      "created_for_id": { "type": "keyword" },
      "issue_type": { "type": "keyword" },
      "last_notification_sent": { "type": "date" },
      "ticket_category": { "type": "keyword" },
      "company_association_id": { "type": "keyword" },
      "visits": { "type": "integer" },

      "os_version_name": { "type": "text" },
      "meta_keywords": { "type": "text" },
      "set_default_gluu": { "type": "boolean" },
      "os_name": { "type": "keyword" },
      "container_management": { "type": "keyword" },
      "deployment_architecture": { "type": "keyword" },
      "gluu_edition": { "type": "keyword" },

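      "company": { "type": "keyword" },
      "account": { "type": "keyword" },
      "full_text": { "type": "text" }
    }
  }
}

The company and account fields are used for tenant filtering; because they are keyword fields, term filters on them are exact and case-sensitive. full_text is an optional field for RAG/chatbot search.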
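The full_text field only helps if something fills it. A minimal sketch, assuming that joining a ticket's title, description, and a few categorical fields is good enough for search and RAG, is to build it per row before indexing; `build_full_text` and the chosen field list are hypothetical.

```python
from typing import Any, Dict, Iterable

# Assumed choice of fields; adjust to whatever the bot should be able to search.
FULL_TEXT_FIELDS = ("title", "description", "issue_type", "ticket_category", "gluu_edition")


def build_full_text(ticket: Dict[str, Any], fields: Iterable[str] = FULL_TEXT_FIELDS) -> str:
    """Join the non-empty text fields of a ticket row into one searchable string."""
    parts = []
    for name in fields:
        value = ticket.get(name)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return "\n".join(parts)


row = {"title": "Login fails", "description": " SAML error after upgrade ", "issue_type": None}
row["full_text"] = build_full_text(row)
print(repr(row["full_text"]))  # 'Login fails\nSAML error after upgrade'
```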

Index: Ticket_answers

PUT ticket_answers
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "ticket_id": { "type": "keyword" },
      "answer": { "type": "text" },
      "link_url": { "type": "keyword" },
      "privacy": { "type": "keyword" },
      "is_deleted": { "type": "boolean" },
      "date_added": { "type": "date" },
      "date_modified": { "type": "date" },
      "created_by_id": { "type": "keyword" },
      "send_copy": { "type": "boolean" },
      "is_from_email": { "type": "boolean" },

      "company": { "type": "keyword" },
      "account": { "type": "keyword" },
      "full_text": { "type": "text" }
    }
  }
}

As with tickets, full_text is useful for search/RAG.
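Because filtering relies on company and account, each answer needs those fields too, even though an answer row points at its ticket only through ticket_id. Assuming the MySQL ticket_answers table does not carry tenant columns itself, one option is to copy them from the parent ticket before indexing. `attach_tenant_fields` is a hypothetical helper; answers whose ticket is missing are set aside rather than indexed without tenant data.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def attach_tenant_fields(tickets: List[Row], answers: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Copy company/account from each answer's parent ticket.

    Returns (enriched_answers, orphans). Orphans have no matching ticket and
    should stay out of the index so they never appear without tenant data.
    """
    by_id = {str(t["id"]): t for t in tickets}
    enriched, orphans = [], []
    for answer in answers:
        parent = by_id.get(str(answer.get("ticket_id")))
        if parent is None:
            orphans.append(answer)
            continue
        enriched.append({**answer, "company": parent.get("company"), "account": parent.get("account")})
    return enriched, orphans


tickets = [{"id": 1, "company": "Acme_Inc", "account": "Foo_Bar"}]
answers = [
    {"id": 10, "ticket_id": 1, "answer": "Restart the service."},
    {"id": 11, "ticket_id": 99, "answer": "Orphaned reply."},
]
ok, orphans = attach_tenant_fields(tickets, answers)
print(ok[0]["company"], ok[0]["account"], len(orphans))  # Acme_Inc Foo_Bar 1
```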

Index: Users

PUT users
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "username": { "type": "keyword" },
      "email": { "type": "keyword" },
      "first_name": { "type": "text" },
      "last_name": { "type": "text" },
      "is_active": { "type": "boolean" },
      "is_superuser": { "type": "boolean" },
      "is_staff": { "type": "boolean" },
      "last_login": { "type": "date" },
      "date_joined": { "type": "date" },
      "modified": { "type": "date" },

      "company": { "type": "keyword" },
      "is_company_admin": { "type": "boolean" },
      "job_title": { "type": "text" },
      "mobile_number": { "type": "keyword" },
      "idp_uuid": { "type": "keyword" },
      "company_association_id": { "type": "keyword" },
      "timezone": { "type": "keyword" },

      "receive_all_notifications": { "type": "boolean" },
      "crm_uuid": { "type": "keyword" },
      "get_email_notification": { "type": "boolean" },
      "all_ticket_permission": { "type": "boolean" },
      "is_from_registration": { "type": "boolean" },
      "is_onboarding_email_sent": { "type": "boolean" }
    }
  }
}

Steps in OpenSearch

  1. Create indices
  2. Import data from MySQL
from opensearchpy import OpenSearch, helpers
import pymysql
import pymysql.cursors

# --- Config ---
MYSQL_CONFIG = {
    "host": "localhost",
    "user": "your_user",
    "password": "your_password",
    "db": "your_db",
    "cursorclass": pymysql.cursors.DictCursor,  # return rows as dicts
}

OPENSEARCH_CONFIG = {
    "hosts": [{"host": "localhost", "port": 9200}],
    "http_auth": ("admin", "admin"),  # if using basic auth
    "use_ssl": False,  # set to True if the cluster's security plugin requires HTTPS
}

# --- Connect to MySQL and OpenSearch ---
mysql_conn = pymysql.connect(**MYSQL_CONFIG)
os_client = OpenSearch(**OPENSEARCH_CONFIG)

# --- Helper to bulk index ---
def bulk_index(index_name, docs):
    actions = [
        {
            "_index": index_name,
            # Reuse the row's id column as the document id so that re-running
            # the import overwrites documents instead of duplicating them.
            "_id": str(doc["id"]),
            "_source": doc,
        }
        for doc in docs
    ]
    # helpers.bulk returns (success_count, errors) and raises on errors by default.
    success, _ = helpers.bulk(os_client, actions)
    return success

# --- Read Tickets ---
def get_tickets():
    with mysql_conn.cursor() as cursor:
        cursor.execute("SELECT * FROM tickets")
        return cursor.fetchall()

# --- Read Ticket Answers ---
def get_ticket_answers():
    with mysql_conn.cursor() as cursor:
        cursor.execute("SELECT * FROM ticket_answers")
        return cursor.fetchall()

# --- Main ---
if __name__ == "__main__":
    try:
        print("Fetching tickets...")
        tickets = get_tickets()
        print(f"Fetched {len(tickets)} tickets")

        print("Indexing tickets...")
        print(f"Indexed {bulk_index('tickets', tickets)} tickets")

        print("Fetching ticket answers...")
        answers = get_ticket_answers()
        print(f"Fetched {len(answers)} answers")

        print("Indexing answers...")
        print(f"Indexed {bulk_index('ticket_answers', answers)} answers")

        print("✅ Done.")
    finally:
        mysql_conn.close()
  3. Query along these lines:
GET tickets/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "company": "Acme_Inc" }},
        { "term": { "account": "Foo_Bar" }}
      ]
    }
  }
}

  4. Integrate DeepSeek embeddings for vector search
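The last step is only named here. Whatever embedding model is used, the tenant and account filters from the query above still need to hold, so a reasonable ordering is to restrict candidates to the caller's tenant and account first and rank by vector similarity second. The sketch below shows that ordering with plain cosine similarity over toy, made-up vectors; it does not call DeepSeek or OpenSearch, and `filtered_knn` is a hypothetical name.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_knn(query_vec: Sequence[float], docs: List[Dict[str, Any]],
                 company: str, account: str, k: int = 3) -> List[Tuple[str, float]]:
    """Restrict to the caller's tenant/account first, then rank by similarity."""
    candidates = [d for d in docs if d["company"] == company and d["account"] == account]
    scored = [(d["id"], cosine(query_vec, d["embedding"])) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings", made up for illustration.
docs = [
    {"id": "t1", "company": "Acme_Inc", "account": "Foo_Bar", "embedding": [1.0, 0.0, 0.0]},
    {"id": "t2", "company": "Acme_Inc", "account": "Foo_Bar", "embedding": [0.6, 0.8, 0.0]},
    {"id": "t3", "company": "Globex", "account": "Foo_Bar", "embedding": [1.0, 0.0, 0.0]},
]
print(filtered_knn([1.0, 0.0, 0.0], docs, "Acme_Inc", "Foo_Bar", k=2))
```

Note that t3 is an exact match for the query vector but never appears in Acme_Inc results, because it is excluded before ranking.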
