NeuraDesk

Production-grade agentic IT/HR service platform — multi-agent AI that autonomously resolves enterprise tickets

Python 3.11 License: MIT Tests

Live Demo


What It Does

Enterprise IT/HR teams spend 60–80% of their time on repetitive tickets — password resets, access provisioning, leave approvals, incident creation — that follow predictable patterns. The bottleneck is not intelligence; it is routing, context retrieval, and safe execution at scale.

NeuraDesk routes every incoming ticket through four specialized LangGraph agents: an Intake Agent that classifies intent with a DSPy-optimized classifier, a Knowledge Agent that retrieves relevant articles via hybrid FAISS + BM25 retrieval with cross-encoder reranking, an Action Agent that executes enterprise API calls behind an explicit confirmation gate for destructive operations, and an Escalation Agent that hands off unresolved tickets with full state attached.

Architecture

graph TD
    A([Employee]) -->|text or screenshot| B[Intake Agent<br/>Vision OCR · DSPy triage · 9 categories]
    B -->|category + confidence| C[Knowledge Agent<br/>FAISS + BM25 + cross-encoder RAG]
    C -->|grounded resolution| D[Action Agent<br/>ITSM · HR · IAM APIs]

    D -->|destructive intent| GATE{Confirmation gate<br/>access_revoke · account_lock · account_delete}
    GATE -->|confirmed| IAM[IAM API<br/>bearer auth · audit log]
    IAM -->|executed| E
    GATE -->|cancelled| G

    D -->|resolved| E([Ticket resolved<br/>WebSocket · LangSmith trace])
    D -->|low confidence · unknown · API error| F[Escalation Agent]
    F -->|structured handoff| G([Human queue<br/>email · Slack])

    C -.->|A2A HTTP/SSE| EXT1([External agent])
    C -.->|MCP stdio| EXT2([Claude Desktop])

Key:

  • Solid arrows — primary ticket flow
  • Dashed arrows — external protocol surfaces
| Agent | Role |
| --- | --- |
| Intake | DSPy-optimized classifier: assigns category, intent, priority, and confidence score |
| Knowledge | Hybrid retrieval: FAISS semantic search + BM25 lexical search + cross-encoder reranking |
| Action | Executes ITSM/HR API calls; blocks destructive operations until explicitly confirmed |
| Escalation | Routes to the correct support tier with complete agent context attached |
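
The routing in the diagram can be sketched in plain Python. This is not the project's LangGraph wiring: the field names, the `LOW_CONFIDENCE` cut-off of 0.5, and the route labels are all assumptions made for illustration. Only the edges themselves (escalate on low confidence, unknown category, or API error; gate destructive intents; otherwise resolve) come from the diagram.

```python
from typing import Literal, Optional, TypedDict


class TicketState(TypedDict, total=False):
    """Hypothetical subset of the state; agents/state.py may differ."""
    category: str
    confidence: float
    destructive: bool
    confirmed: Optional[bool]  # None until the user confirms or cancels
    api_error: bool


# Assumed threshold; the README does not state the real value.
LOW_CONFIDENCE = 0.5

Route = Literal["resolved", "await_confirmation", "execute_iam", "escalate"]


def route_after_action(state: TicketState) -> Route:
    """Pick the next hop after the Action Agent, following the diagram's edges."""
    if (
        state.get("api_error")
        or state.get("category", "unknown") == "unknown"
        or state.get("confidence", 0.0) < LOW_CONFIDENCE
    ):
        return "escalate"
    if state.get("destructive"):
        confirmed = state.get("confirmed")
        if confirmed is None:
            return "await_confirmation"  # parked at the confirmation gate
        return "execute_iam" if confirmed else "escalate"
    return "resolved"
```

In a LangGraph graph, a function of this shape would typically be passed as the condition of a conditional edge, with each returned label mapped to a node name.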

Stack

| Layer | Technology |
| --- | --- |
| Orchestration | LangGraph 1.2, typed TicketState |
| RAG | FAISS + rank-bm25 + sentence-transformers cross-encoder |
| Prompt optimization | DSPy 2.5 |
| LLM | Groq (llama-3.3-70b-versatile); swappable via LLM_PROVIDER env var (Anthropic/OpenAI supported) |
| Tracing | LangSmith; every node is a named span |
| API | FastAPI 0.115, WebSocket streaming, structlog |
| Auth | JWT (PyJWT) + bcrypt, 8-hour sessions |
| Database | PostgreSQL, SQLAlchemy 2.0 mapped_column |
| Cloud | GCP Cloud Run, Docker, docker-compose |
| Testing | pytest, RAGAS evaluation suite |
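
The RAG row combines a semantic ranking (FAISS) with a lexical one (BM25) before reranking. This README does not say how the two result lists are merged, so the sketch below uses reciprocal rank fusion (RRF) as one common option. The function name, the constant `k=60`, and the document ids are illustrative assumptions, and the cross-encoder reranking stage is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 3
) -> List[str]:
    """Merge several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical result lists from the two retrievers.
semantic_hits = ["vpn_setup", "leave_policy", "vpn_reset"]
lexical_hits = ["vpn_setup", "vpn_reset", "mfa_enroll"]
fused = reciprocal_rank_fusion([semantic_hits, lexical_hits])
```

RRF uses only rank positions, so it avoids having to normalize FAISS distances against BM25 scores, which live on different scales.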

Key Features

  • ✅ Multi-agent orchestration with LangGraph — 4 nodes, typed TicketState, conditional routing
  • ✅ Hybrid RAG — FAISS semantic + BM25 lexical + cross-encoder reranking
  • ✅ DSPy-optimized ticket classifier with offline prompt compilation
  • ✅ Multimodal input — plain text and base-64 encoded screenshots
  • ✅ A2A protocol endpoint on the Knowledge Agent (agent-to-agent interop)
  • ✅ Production safety — explicit confirmation gate blocks all destructive API calls
  • ✅ Auto-escalation with complete agent state forwarded to the human queue
  • ✅ LangSmith tracing on every node — no silent agent execution
  • ✅ JWT auth + session management (GET /auth/sessions, remote revocation), JSON-lines audit log, structured errors
  • ✅ RAGAS evaluation suite with CI enforcement on faithfulness and answer relevance
  • ✅ Prompt injection guard — detects system:, role-override, and jailbreak patterns; caps confidence to 0.3 so adversarial tickets escalate safely
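
The prompt injection guard in the last feature above can be sketched as a pattern check that caps the classifier's confidence. The three regular expressions here are illustrative stand-ins, not the seven patterns in core/security.py, and the function names are hypothetical. Only the 0.3 cap comes from this README.

```python
import re
from typing import Tuple

# Illustrative patterns only; the project's real list is not reproduced here.
_INJECTION_PATTERNS = [
    re.compile(r"^\s*system\s*:", re.IGNORECASE | re.MULTILINE),
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
]

CONFIDENCE_CAP = 0.3  # value stated in the feature list


def looks_like_injection(text: str) -> bool:
    """Return True when any pattern matches the ticket text."""
    return any(pattern.search(text) for pattern in _INJECTION_PATTERNS)


def guard_confidence(text: str, confidence: float) -> Tuple[float, bool]:
    """Cap confidence for suspicious text so routing falls through to escalation."""
    flagged = looks_like_injection(text)
    capped = min(confidence, CONFIDENCE_CAP) if flagged else confidence
    return capped, flagged
```

Capping rather than rejecting keeps the ticket in the normal flow: a flagged ticket still reaches a human, because low confidence routes to the Escalation Agent.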

Quickstart

Prerequisites: Python 3.11; Docker (optional, for Postgres; SQLite works locally); jq for the curl examples; Node.js with npm for the frontend

git clone https://github.com/Subh24ai/neuradesk.git
cd neuradesk

python3.11 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip && pip install -e ".[dev]"

cp .env.example .env
# Fill in: GROQ_API_KEY, ENTERPRISE_API_SECRET, API_SECRET_KEY, A2A_API_KEY

Full stack with Docker (recommended):

docker-compose up --build
# Backend → localhost:8000   Enterprise mock API → localhost:8001

Local dev without Docker (SQLite fallback):

# Terminal 1 — Enterprise mock API (port 8001):
ENTERPRISE_API_SECRET=local-dev-secret-123 \
  uvicorn services.enterprise_api:app --port 8001

# Terminal 2 — Main backend (port 8000):
uvicorn api.main:app --reload --port 8000

# Terminal 3 — Frontend (port 3000):
cd frontend && npm install && npm run dev

Submit your first ticket:

# Step 1 — register. With no SMTP configured, the OTP is returned inline as
# "dev_otp" (dev convenience only). Capture it with jq:
OTP=$(curl -s -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "password": "testpass123", "first_name": "Ada", "last_name": "Lovelace", "org_name": "Acme Inc"}' | jq -r .dev_otp)

# Step 2 — verify the OTP to activate the account and get an 8-hour JWT:
TOKEN=$(curl -s -X POST http://localhost:8000/auth/verify-otp \
  -H "Content-Type: application/json" \
  -d "{\"email\": \"you@company.com\", \"otp\": \"$OTP\"}" | jq -r .access_token)

curl -s -X POST http://localhost:8000/tickets \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "I forgot my password"}' | jq .

Run tests:

pytest tests/ -v          # 202 tests

Run the load test:

# Start both servers, then:
python3 tests/load_test.py

Project Structure

neuradesk/
├── agents/              # LangGraph nodes and typed TicketState
│   ├── state.py         # Single TypedDict threaded through every node
│   ├── graph.py         # Wiring, conditional routing, entry point
│   ├── intake_node.py   # Category · intent · priority · confidence
│   ├── knowledge_node.py # FAISS + BM25 retrieval + cross-encoder reranking
│   ├── action_node.py   # Enterprise API dispatch + destructive-action gate
│   └── escalation_node.py # Human handoff with full state
├── api/                 # FastAPI app — auth, ticket routes, WebSocket stream, admin SSE
├── core/                # LLM factory, DSPy config, security (injection guard)
├── notifications/       # Slack incoming-webhook alerts
├── storage/             # GCS image upload utility
├── services/            # Mock ITSM/HR endpoints + async JSON-lines audit log
├── rag/                 # Retriever (FAISS + BM25 + cross-encoder)
├── dspy_modules/        # DSPy signatures and compiled classifiers
├── tracing/             # LangSmith @traceable helpers, trace URL utilities
├── tests/               # pytest suite — agents, API, RAG, security
├── infra/               # Dockerfile, GCP Cloud Run config
└── docker-compose.yml   # PostgreSQL · backend · enterprise mock API

API Reference

Main API — port 8000

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| POST | /auth/register | | Create account, sends OTP email; returns dev_otp hint when SMTP unset |
| POST | /auth/verify-otp | | Verify OTP code; returns 8-hour JWT access_token |
| POST | /auth/login | | Login, returns 8-hour JWT |
| POST | /tickets | JWT | Create pending ticket; agent graph runs over WebSocket /ws/{ticket_id} |
| GET | /tickets/ | JWT | Last 20 tickets for the authenticated user |
| GET | /tickets/{id} | JWT | Full ticket state by ID |
| WS | /ws/{ticket_id} | | Stream per-node status events in real time |
| POST | /tickets/{id}/confirm-action | JWT | Confirm a destructive action awaiting authorization |
| POST | /tickets/{id}/cancel | JWT | Cancel a destructive action (routes to escalation) |
| GET | /auth/sessions | JWT | List active sessions for the current user |
| DELETE | /auth/sessions/{jti} | JWT | Revoke a session by JTI (remote sign-out) |
| GET | /admin/stream | JWT (query) | SSE stream of resolved/escalated ticket events (admin) |
| GET | /.well-known/agent.json | | A2A Agent Card |
| POST | /tasks/send | | A2A synchronous knowledge-retrieval task |

Enterprise Mock API — port 8001

All endpoints require Authorization: Bearer <ENTERPRISE_API_SECRET> and append to services/audit.jsonl.

| Method | Endpoint | Destructive | Description |
| --- | --- | --- | --- |
| POST | /itsm/reset-password | No | Generate temporary password |
| POST | /itsm/provision-access | No | Grant resource role |
| POST | /hr/approve-leave | No | Approve leave request |
| POST | /itsm/create-incident | No | Open incident record |
| POST | /itsm/notify-manager | No | Email reporting manager |
| POST | /iam/revoke-access | Yes | Remove user's existing access to a resource |
| POST | /iam/lock-account | Yes | Disable/suspend a user's account |
| POST | /iam/delete-account | Yes | Permanently delete a user account (requires confirm=true) |
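
The confirmation gate in front of the three IAM endpoints can be sketched as a small dispatcher. The intent names come from the architecture diagram. The function name, the status strings, and the `call_api` callback are hypothetical and stand in for whatever action_node.py actually does.

```python
from typing import Any, Callable, Dict, Optional

# Intents named on the confirmation gate in the architecture diagram.
DESTRUCTIVE_INTENTS = frozenset({"access_revoke", "account_lock", "account_delete"})

ApiCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def dispatch_action(
    intent: str,
    payload: Dict[str, Any],
    call_api: ApiCall,
    confirmed: Optional[bool] = None,
) -> Dict[str, Any]:
    """Run non-destructive intents immediately; hold destructive ones for a decision."""
    if intent in DESTRUCTIVE_INTENTS:
        if confirmed is None:
            # No decision yet: nothing is sent to the enterprise API.
            return {"status": "awaiting_confirmation", "intent": intent}
        if not confirmed:
            return {"status": "cancelled", "intent": intent}
    return {"status": "executed", "intent": intent, "result": call_api(intent, payload)}
```

Keeping the check in one dispatcher means a new destructive endpoint only needs its intent added to the set, rather than a guard in every caller.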

MCP Interface

NeuraDesk ships an MCP tool server (mcp_server.py) that exposes a read-only slice of the platform to MCP clients such as Claude Desktop over stdio. It reuses the same hybrid retriever, database, and category taxonomy as the agent graph. Because the transport is stdio (the client launches the server locally), it carries no bearer auth, unlike the HTTP A2A and enterprise APIs.

| Tool | Parameters | Description |
| --- | --- | --- |
| search_knowledge | query, org_id | Hybrid FAISS + BM25 + cross-encoder search over the global KB merged with an org's private docs; returns the top 3 chunks |
| get_ticket_status | ticket_id | Current status, category, resolution, and escalation details for a ticket |
| resolve_ticket_info | category | Top 3 knowledge-base resolution steps for a ticket category (retrieval-only) |
| list_ticket_categories | | All supported ticket categories with descriptions |

Run it (stdio):

python mcp_server.py

The server reads DATABASE_URL from the environment — point it at the same database the API uses.

Use with Claude Desktop

Add this to claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json). The snippet uses absolute paths — replace every /ABS/PATH/ placeholder with your own virtualenv Python and repo paths:

{
  "mcpServers": {
    "neuradesk": {
      "command": "/ABS/PATH/neuradesk/.venv/bin/python",
      "args": ["/ABS/PATH/neuradesk/mcp_server.py"],
      "env": {
        "DATABASE_URL": "sqlite:////ABS/PATH/neuradesk/neuradesk.db"
      }
    }
  }
}

Restart Claude Desktop; the four tools then appear under the MCP (🔌) menu. Try "What ticket categories does NeuraDesk support?" or "Search the knowledge base for VPN setup."

Benchmarks

| Metric | Value |
| --- | --- |
| Ticket resolution latency (P50) | 4.28s (Groq LLM) |
| Ticket resolution latency (P95) | 4.71s (Groq LLM) |
| Concurrent users tested | 10 (3 workers) |
| Success rate | 100/100 |
| RAG faithfulness score (RAGAS) | 1.000 (10-question eval, llama-3.1-8b-instant judge) |
| RAG answer relevancy (RAGAS) | 0.439 (10-question eval, llama-3.1-8b-instant judge) |
| DSPy classifier accuracy, zero-shot | 92.6% (25/27) |
| DSPy classifier accuracy, compiled | 96.3% (26/27) |

See BENCHMARKS.md for full breakdown and latency footnote.
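
For readers who want to reproduce figures of this kind, here is a minimal sketch of the arithmetic. It uses the nearest-rank percentile method, which is an assumption: tests/load_test.py may compute percentiles differently. The sample latencies are made up for illustration and are not the benchmark data.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Made-up latencies in seconds, for illustration only.
latencies = [4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```

The classifier rows follow directly from the counts: `accuracy_pct(25, 27)` gives 92.6 and `accuracy_pct(26, 27)` gives 96.3.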

Known Limitations

Fixed in v1.1

| Issue | Fix |
| --- | --- |
| ✅ No escalation notifications | Email + Slack alerts fire from escalation_node |
| ✅ Destructive actions unconfirmable from UI | Confirm/cancel flow via POST /tickets/{id}/confirm-action |
| ✅ FAISS not updated on KB upload | add_documents() called after every admin KB insert |
| ✅ JWT not revocable | TokenBlocklist table; every authenticated request checked |
| ✅ Images not persisted after upload | GCS upload utility; URL stored in tickets.image_url |
| ✅ No admin real-time push | SSE stream at GET /admin/stream with per-org queue |
| ✅ WebSocket no reconnect logic | Exponential backoff (1 s / 2 s / 4 s) + fetch on reconnect |
| ✅ No Slack notifications on escalation | Slack incoming-webhook via notifications/slack.py |
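
The reconnect schedule in the WebSocket row (1 s / 2 s / 4 s) can be sketched as follows. The project's reconnect logic lives in the frontend and is not shown in this README, so this Python version only illustrates the schedule. The function names and the choice to give up after the third retry are assumptions.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0, attempts: int = 3) -> Iterator[float]:
    """Yield the wait before each retry: 1 s, 2 s, 4 s with the defaults."""
    delay = base
    for _ in range(attempts):
        yield delay
        delay *= factor


def connect_with_retry(connect: Callable[[], T], sleep: Callable[[float], None] = time.sleep) -> T:
    """Try once immediately, then retry after each backoff delay."""
    last_error: Optional[ConnectionError] = None
    for delay in [0.0, *backoff_delays()]:
        if delay:
            sleep(delay)
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("gave up after retries") from last_error
```

After a successful reconnect, the table's "fetch on reconnect" step would re-read the ticket, for example via GET /tickets/{id}, to recover any events missed while disconnected.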

Fixed in v1.2

| Issue | Fix |
| --- | --- |
| ✅ Hardcoded false <2s stat in UI | Updated to ~4s to match BENCHMARKS.md P50 |
| ✅ No prompt injection filter | core/security.py: 7 patterns, confidence capped to 0.3, 13 tests |
| ✅ RAG threshold disabled (0.0) | Threshold set to 0.35; off-topic tickets now escalate instead of hallucinating |
| ✅ Mock sessions in Account panel | Real GET /auth/sessions + DELETE /auth/sessions/{jti} endpoints wired to UI |
| ✅ Docker inter-container env var mismatch | ENTERPRISE_API_URL → ENTERPRISE_API_BASE_URL across codebase |
| ✅ GCP deployment live | Cloud Run URL; see Live Demo above |
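
The effect of the 0.35 RAG threshold can be sketched as a filter ahead of answer generation. Which score the project thresholds (raw similarity or the reranker's output) is not stated here, so the sketch assumes a single relevance score per chunk where higher is better. The function names and sample chunks are hypothetical.

```python
from typing import List, Tuple

RELEVANCE_THRESHOLD = 0.35  # value stated in the v1.2 fix


def select_context(
    scored_chunks: List[Tuple[str, float]],
    threshold: float = RELEVANCE_THRESHOLD,
    top_k: int = 3,
) -> List[str]:
    """Keep the best-scoring chunks at or above the threshold."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in kept[:top_k]]


def next_step(scored_chunks: List[Tuple[str, float]]) -> str:
    """Answer only when grounded context exists; otherwise hand off to a human."""
    return "answer" if select_context(scored_chunks) else "escalate"
```

With the threshold at 0.0 every chunk passes, so an off-topic ticket still receives context and the model answers from irrelevant text. A non-zero threshold lets the empty-context case escalate instead.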

Remaining

  • Mock enterprise APIs — services/enterprise_api.py stubs only; no real ITSM/HR integration
  • Groq single point of failure — no fallback LLM configured
  • Audit log retention is capped: RotatingFileHandler, 10 MB max, 5 backups (~60 MB cap), so the oldest entries are discarded on rotation
  • RAG answer relevancy is 0.44 (0.439 in Benchmarks) because the corpus is small; expected to improve with more KB documents
  • SUPPORT_EMAIL / SMTP must be configured manually per org
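
The audit-log limits listed above map onto the standard library's `RotatingFileHandler` as in the sketch below. The formatter, logger name, and record fields are illustrative assumptions, and this version is synchronous, whereas the project describes its audit log as async. The 10 MB and 5-backup values come from this README.

```python
import json
import logging
from logging.handlers import RotatingFileHandler


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Extra fields passed as extra={"audit": {...}}; hypothetical convention.
        entry.update(getattr(record, "audit", {}))
        return json.dumps(entry)


def build_audit_logger(path: str) -> logging.Logger:
    """Return a logger writing JSON lines with 10 MB rotation and 5 backups."""
    logger = logging.getLogger("neuradesk.audit.sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = RotatingFileHandler(
            path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
        )
        handler.setFormatter(JsonLineFormatter())
        logger.addHandler(handler)
    return logger
```

With these settings the active file plus five backups total at most about 60 MB. A call such as `build_audit_logger("audit.jsonl").info("reset-password", extra={"audit": {"user": "ada"}})` appends one line.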

Roadmap

  • Week 1 — Core scaffold: LangGraph skeleton, FastAPI, JWT auth, mock enterprise API, 33 passing tests
  • Week 2 — RAG (faithfulness 1.0) ✓, DSPy 96.3% ✓, all agents live ✓ — 80 tests green
  • Week 3 — A2A protocol ✓, LangSmith tracing ✓, CI/CD ✓ — 111 tests green
  • Week 4 — React frontend ✓, GCP deployment ✓, load test ✓ (P50 4.28s, 100/100 success)
  • v1.1 hardening — 8 production fixes: escalation alerts, confirmation flow, FAISS live update, JWT revocation, image persistence, SSE admin push, WS reconnect, Slack webhook — 169 tests green
  • v1.2 security — prompt injection guard, RAG threshold fix, real session management, Docker env fix, GCP Cloud Run deploy — 202 tests green

Built by Subhash Gupta · GitHub
