AI Architecture Deep Dive

API Endpoint

The AI implementation is accessible at: http://localhost:3000/api/ai/query

System Flow

Core Request Pipeline

Step	Component	Description
1. Query Reception	Frontend → API	User submits natural language question
2. Cache Check	Exact-match cache	Returns cached response if exact normalized query seen within 5 min
3. Complexity Scoring	`requiresAIPath()` function	Scores query based on aggregations, entity mentions, phrase patterns; threshold ≥4 triggers AI path
4a. Simple Path	Keyword Search	Extracts keywords, checks keyword cache, then searches `searchText` using PostgreSQL full-text
4b. Complex Path	Text-to-SQL Model	Generates SQL → post-processing fixes → executes against database
5. Response Generation	Response Model	Formats results into conversational answer
6. Evaluation	Judge Model	Asynchronously scores SQL quality and logs results

Detailed Flow Description

1. Query Reception

When a user interacts with the AI chatbot by sending a query, the application executes the following decision tree.

2. Cache Check

The application first checks if the exact query was sent within the past 5 minutes:

Cache HIT: Returns cached results immediately
Cache MISS: Proceeds to complexity analysis

3. Query Classification

The application determines whether the user prompt requires AI assistance:

A. Keyword Search Path (Low Complexity) If the query is not complex enough to need the TEXT2SQL_MODEL:

Check for cached table data
If cache MISS, query searchText fields (pre-optimized for rapid lookup)
Store both query (as key) and results in cache
Proceed to response generation

B. AI Path (High Complexity) If the TEXT2SQL_MODEL is required:

Send user query to TEXT2SQL_MODEL
Convert natural language to SQL
Execute generated SQL against PostgreSQL
Proceed to response generation

4. Response Generation

All results paths converge at the AI_RESPONSE_MODEL:

Keyword search results
AI-generated query results
Cached query results

The response model converts raw data into human-friendly format before returning to the frontend chatbot.

Note: This flow could be improved if cache were set AFTER human-friendly responses were generated and validated, so valid responses could immediately be returned to the user and responses would not need to be generated every time. However, a more robust validation step would be necessary to prevent invalid responses from being cached.

5. Non-Blocking Evaluation

The JUDGE_MODEL operates asynchronously, not blocking user response:

Evaluation Logic:

Test set matches: If query matches src/server/aiTest/test-questions.json, compare resultsCount against expected count from ground truth. (See also: src/server/aiTest/test_questions.md for quick copy/paste testing)
No match: LLM-as-Judge autonomously evaluates quality

Output: Scores and explanations are saved to: src/server/aiTest/judgements/

Component Details

Text-to-SQL Model (7B)

Aspect	Current Implementation	Production Target
Purpose	Convert natural language to PostgreSQL queries	Same
Model	Generic 7B with prompt engineering	Fine-tuned on company-specific query patterns
Post-processing	Extensive regex fixes: SQL extraction, quote fixing, UNION correction, identifier validation	Fine-tuning would eliminate most post-processing
Limitations	May hallucinate tables/columns; requires aggressive cleaning; not all models follow instructions	Specialized model would generate cleaner SQL

Response Model (7B)

Aspect	Current Implementation	Production Target
Purpose	Transform raw data into natural language answers	Same
Model	Generic 7B with instruction prompts	Fine-tuned on company tone and terminology
Note	Same model also serves as Judge (dual-purpose)	Would use specialized models in production

Judge Model (7B)

Aspect	Current Implementation	Production Target
Purpose	Asynchronous quality evaluation	Same + human validation
Logic	Test set match → compare counts; No match → LLM-as-Judge	Add semantic correctness checks, human sampling
Limitations	Count comparison doesn't verify data correctness; LLM-judge accuracy bounded by model capabilities	Human validation would catch subtle errors

Design Decisions & Trade-offs

Decision	Why	Trade-off
Model specialization	Each task (SQL, response, evaluation) has different requirements	More complex orchestration
Non-blocking evaluation	Users get immediate responses	Quality feedback is delayed
Local-only models	Data privacy, no API costs	Limited to 7B parameter models
5-minute TTL cache	Simple implementation	Misses semantic similarities
Dual-purpose response/judge model	Reduces resource requirements	Likely compromises performance of both tasks
In-memory caching	Fast, simple for prototype	Does not scale beyond small datasets (RAM limits, no distribution)
Rule-based complexity scoring	Zero-cost, transparent routing	Brittle at scale - misses novel query patterns; would need better classifier for production

Caching Strategy

Aspect	Current Implementation	Production Enhancement
Type	In-memory, exact-match	Semantic + partial caching
TTL	5 minutes	Variable based on data freshness needs
Scope	Full query results	Partial results, embeddings
Scale Limitation	Works only with small datasets	Would need Redis or Memcached, sharding, and eviction policies

Production Roadmap

If this were moving to production, I'd prioritize:

Phase	Focus	Key Improvements
Phase 1	Accuracy	Add RAG and fine-tune models on actual query logs; add confidence scoring; basic human review for edge cases
Phase 2	Reliability	Add observability, better evaluation, governance. Log everything, catch errors, regenerate; monitoring and alerting
Phase 3	Consider adding Security & Privacy layer	Add PII detection/redaction; output guardrails if any chance data might be exposed to internet, even though this application runs locally and offline
Phase 4	Continuous Improvement	Use human feedback to augment training data
Phase 5	Scale	Replace in-memory cache with Redis; add semantic caching; implement query logging for analysis

What I'm Still Learning

This project sparked curiosity about what comes next for future projects. I'm actively exploring:

Topic	Why It Matters
Semantic caching	Moving beyond exact-match to cache by meaning - critical for production scale
Agentic AI	Moving from single-turn SQL generation to using an AI agent that can plan, decompose tasks, interact with filesystems, spawn sub-agents, call tools and use skills in a self-correcting loop
Agent observability	Understanding why an agent failed or what decision it made
PII handling	Protecting user data isn't optional - learning detection/redaction patterns
Guardrails	Preventing prompt injection and inappropriate outputs before they reach users
Human-in-the-loop	Designing efficient review workflows for edge cases
Evaluation metrics	Result counts aren't enough - need semantic correctness
Scaling AI systems	Understanding where this prototype would break

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AI Architecture Deep Dive

API Endpoint

System Flow

Core Request Pipeline

Detailed Flow Description

1. Query Reception

2. Cache Check

3. Query Classification

4. Response Generation

5. Non-Blocking Evaluation

Component Details

Text-to-SQL Model (7B)

Response Model (7B)

Judge Model (7B)

Design Decisions & Trade-offs

Caching Strategy

Production Roadmap

What I'm Still Learning

Related Documentation

FilesExpand file tree

ai-architecture.md

Latest commit

History

ai-architecture.md

File metadata and controls

AI Architecture Deep Dive

API Endpoint

System Flow

Core Request Pipeline

Detailed Flow Description

1. Query Reception

2. Cache Check

3. Query Classification

4. Response Generation

5. Non-Blocking Evaluation

Component Details

Text-to-SQL Model (7B)

Response Model (7B)

Judge Model (7B)

Design Decisions & Trade-offs

Caching Strategy

Production Roadmap

What I'm Still Learning

Related Documentation