Refactor existing HIPAA compliance Slack bot to support:
- Swappable LLM providers (currently Gemini, future Claude)
- Adaptive confidence thresholds based on question complexity
- Hybrid knowledge base (local vector DB + live integration search)
- Multi-bot architecture (HIPAA bot now, architecture/onboarding bots later)
- Clean separation of concerns with ports & adapters pattern
Working bot with basic RAG functionality. Needs architectural improvements for:
- Provider swappability (locked to Gemini)
- Better confidence scoring (static threshold)
- Prompt management (hardcoded)
- Knowledge base routing (no hybrid strategy)
Clients (External API wrappers):
- SlackClient - Parse events, send messages
- VectorDBClient - Pinecone/Weaviate wrapper
- LLMClient - Gemini/Claude API wrapper
Services (Business logic):
- Augmenter - Personality injection, prompt building, output formatting
- ConfidenceEvaluator - Adaptive threshold based on question complexity + chunk quality
- KnowledgeBaseRouter - Select primary KB, fall back to integrations
- PromptManager - Load prompts from files
Providers (Swappable backends):
- RAGProvider (interface) - embed(), retrieve(), generate()
- GeminiRAGProvider (current)
- ClaudeRAGProvider (future)
Core:
- MessageValidator - Pre-filter confidence scoring
- BotEngine - Orchestrates flow
Bots (Domain-specific):
- HipaaBot - Uses RAGProvider + Augmenter + config
App:
- app.py - Flask, webhooks, DI setup
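As one illustration of the services layer, here is a minimal sketch of what an Augmenter could look like. The method names and arguments follow the `build_prompt(question, chunks, personality, rules)` and `format_output(answer, config)` calls used in the message flow; the `Chunk` fields, the prompt layout, and the `footer` config key are assumptions made for the example, not the existing bot's behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical shape; the real Chunk is whatever the vector DB client returns.
    text: str
    score: float
    source: str = ""


class Augmenter:
    """Builds prompts and formats answers; knows nothing about the LLM provider."""

    def build_prompt(self, question: str, chunks: List[Chunk],
                     personality: str, rules: List[str]) -> str:
        # Number the chunks so the model can refer back to them.
        context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, 1))
        rule_lines = "\n".join(f"- {r}" for r in rules)
        return (
            f"{personality}\n\nRules:\n{rule_lines}\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )

    def format_output(self, answer: str, config: dict) -> str:
        # 'footer' is an assumed config key used only for this sketch.
        footer = config.get("footer", "")
        body = answer.strip()
        return f"{body}\n\n{footer}" if footer else body
```

Because the class only takes and returns strings and chunks, the same Augmenter could be shared by later bots with a different personality and rule set.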
Current: Static threshold (e.g., 0.75)
Target: Adjust based on question complexity
from typing import List

class ConfidenceEvaluator:
    BASE_THRESHOLD = 0.75
    MAX_ADJUSTMENT = 0.20

    # Chunk and _calculate_chunk_quality() are defined elsewhere in the bot.
    def should_answer(self, question: str, chunks: List["Chunk"]) -> tuple[bool, dict]:
        complexity = self._calculate_complexity(question)  # 0.0-1.0
        chunk_quality = self._calculate_chunk_quality(chunks)
        # Harder questions lower the bar, by at most MAX_ADJUSTMENT
        adjusted_threshold = self.BASE_THRESHOLD - (complexity * self.MAX_ADJUSTMENT)
        should_answer = chunk_quality['avg_score'] >= adjusted_threshold
        # Expose the inputs to the decision for logging (key names are illustrative)
        metadata = {
            'complexity': complexity,
            'adjusted_threshold': adjusted_threshold,
            'chunk_quality': chunk_quality,
        }
        return should_answer, metadata

    def _calculate_complexity(self, question: str) -> float:
        score = 0.0
        q_lower = question.lower()
        word_count = len(question.split())
        # Length factor
        if word_count > 25:
            score += 0.3
        elif word_count > 15:
            score += 0.15
        # Specificity indicators
        if any(w in q_lower for w in ['specific', 'edge case', 'exception']):
            score += 0.25
        # Comparative/multi-part
        if any(w in q_lower for w in ['vs', 'versus', 'compared']):
            score += 0.2
        return min(score, 1.0)
Current: Direct Gemini calls
Target: Abstract interface
from abc import ABC, abstractmethod
from typing import List

class RAGProvider(ABC):
    @abstractmethod
    def embed(self, text: str) -> List[float]:
        pass

    @abstractmethod
    def retrieve(self, query_vector: List[float], collection: str, top_k: int) -> List["Chunk"]:
        pass

    @abstractmethod
    def generate(self, prompt: str) -> str:
        pass

class GeminiRAGProvider(RAGProvider):
    def __init__(self, api_key, vector_db_client):
        self.embedder = GeminiEmbedder(api_key)
        self.retriever = VectorDBRetriever(vector_db_client)
        self.generator = GeminiGenerator(api_key)
    # embed(), retrieve() and generate() must still be implemented here,
    # delegating to the three helpers above, before the class can be instantiated.
Current: Hardcoded prompts
Target: File-based with runtime reload
prompts/
├── hipaa_bot/
│   ├── system.txt
│   ├── user_template.txt
│   └── fallback.txt
└── architecture_bot/
    ├── system.txt
    └── user_template.txt
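class PromptManager:
    def __init__(self, prompts_dir: str = "prompts"):
        self.prompts_dir = prompts_dir
        self._cache = {}

    def load(self, bot_name: str, template_name: str) -> str:
        path = f"{self.prompts_dir}/{bot_name}/{template_name}.txt"
        # Load and cache
        if path not in self._cache:
            with open(path, encoding="utf-8") as f:
                self._cache[path] = f.read()
        return self._cache[path]

    def reload(self):
        # Clear cache for runtime editing
        self._cache.clear()
Current: Local vector DB only
Target: Local KB + live integration fallback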
from typing import List

class KnowledgeBaseRouter:
    def retrieve(self, question: str, config: dict) -> List["Chunk"]:
        # Step 1: Search local KB
        collection = config['collection']  # assumed config key; `collection` was undefined before
        local_chunks = self.vector_db.search(question, collection)

        # Step 2: Assess if we need live data
        needs_live = self._assess_recency_need(question)  # "who's working on X now?"
        local_quality = self._assess_quality(local_chunks)

        # Step 3: Decide strategy
        if needs_live['required']:
            live_chunks = self._search_integrations(question)
            return self._merge_chunks(local_chunks, live_chunks)
        if local_quality['sufficient']:
            return local_chunks

        # Local is weak: returns local chunks for now; falling back to
        # self._search_integrations(question) is the open alternative.
        return local_chunks
Slack message
↓
SlackClient.parse() → Message object
↓
MessageValidator.validate() → Pass/Fail (pre-filter)
↓
BotEngine.route() → HipaaBot
↓
HipaaBot.process():
    1. query_vector = RAGProvider.embed(question)
    2. chunks = RAGProvider.retrieve(query_vector, collection)
    3. should_answer, metadata = ConfidenceEvaluator.should_answer(question, chunks)
    4. if not should_answer → return fallback
    5. prompt = Augmenter.build_prompt(question, chunks, personality, rules)
    6. answer = RAGProvider.generate(prompt)
    7. formatted = Augmenter.format_output(answer, config)
↓
SlackClient.send(formatted) → Thread reply
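The seven steps of HipaaBot.process() above can be sketched as plain Python with the collaborators injected through the constructor. This is a sketch under assumptions: the config keys (`collection`, `top_k`, `fallback`, `personality`, `rules`) are invented for the example, and the three stub classes stand in for the real provider, evaluator, and augmenter so the flow runs without Slack, a vector DB, or an LLM.

```python
from typing import List, Tuple


class HipaaBot:
    """Follows the numbered steps of HipaaBot.process() in order."""

    def __init__(self, provider, evaluator, augmenter, config: dict):
        self.provider = provider
        self.evaluator = evaluator
        self.augmenter = augmenter
        self.config = config  # key names used below are assumptions

    def process(self, question: str) -> str:
        cfg = self.config
        query_vector = self.provider.embed(question)                    # step 1
        chunks = self.provider.retrieve(
            query_vector, cfg["collection"], cfg["top_k"])              # step 2
        should_answer, _metadata = self.evaluator.should_answer(
            question, chunks)                                           # step 3
        if not should_answer:                                           # step 4
            return cfg["fallback"]
        prompt = self.augmenter.build_prompt(
            question, chunks, cfg["personality"], cfg["rules"])         # step 5
        answer = self.provider.generate(prompt)                         # step 6
        return self.augmenter.format_output(answer, cfg)                # step 7


# Stand-ins for demonstration only.
class StubProvider:
    def embed(self, text: str) -> List[float]:
        return [0.0]

    def retrieve(self, query_vector, collection, top_k):
        return ["Covered entities must safeguard PHI."][:top_k]

    def generate(self, prompt: str) -> str:
        return " Yes, PHI must be safeguarded. "


class StubEvaluator:
    def __init__(self, verdict: bool):
        self.verdict = verdict

    def should_answer(self, question, chunks) -> Tuple[bool, dict]:
        return self.verdict, {}


class StubAugmenter:
    def build_prompt(self, question, chunks, personality, rules) -> str:
        return f"{personality}\n{chunks}\n{question}"

    def format_output(self, answer: str, config: dict) -> str:
        return answer.strip()


CONFIG = {
    "collection": "hipaa_docs",
    "top_k": 3,
    "fallback": "I'm not confident enough to answer that.",
    "personality": "You are a careful HIPAA assistant.",
    "rules": ["Cite the source document."],
}
```

With `StubEvaluator(True)` the bot returns the generated answer; with `StubEvaluator(False)` it returns the fallback text and never calls `generate()`, which matches step 4 of the flow.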
Preserve existing functionality, enhance architecture incrementally:
- Create RAGProvider abstract class
- Wrap current Gemini code in GeminiRAGProvider
- Update bot to use provider interface
- No behavior change, just abstraction
- Extract current threshold logic
- Add complexity scoring
- Keep existing threshold as fallback
- Enhancement: smarter answering
- Move prompts to files
- Add reload capability
- Keep existing prompts as default
- Enhancement: editable prompts
- Extract personality/formatting logic
- Centralize prompt building
- Enhancement: reusable across bots
- Make config per-bot
- Add BotEngine router
- Future: architecture_bot, onboarding_bot
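For the multi-bot step, a BotEngine router might look something like the following. Routing by Slack channel ID is an assumption made for this sketch (the plan only says that BotEngine routes a message to HipaaBot), and `register()`, the channel IDs, and `EchoBot` are hypothetical names.

```python
from typing import Dict, Iterable, Optional


class BotEngine:
    """Maps an incoming message's channel to a registered bot."""

    def __init__(self, default_bot: Optional[str] = None):
        self._bots: Dict[str, object] = {}
        self._channel_map: Dict[str, str] = {}
        self._default_bot = default_bot

    def register(self, name: str, bot: object, channels: Iterable[str] = ()) -> None:
        self._bots[name] = bot
        for channel in channels:
            self._channel_map[channel] = name

    def route(self, channel: str):
        # Fall back to the default bot when the channel has no explicit mapping.
        name = self._channel_map.get(channel, self._default_bot)
        if name is None or name not in self._bots:
            raise LookupError(f"No bot registered for channel {channel!r}")
        return self._bots[name]


class EchoBot:
    """Placeholder bot used only to exercise the router."""

    def __init__(self, label: str):
        self.label = label

    def process(self, question: str) -> str:
        return f"{self.label}: {question}"


engine = BotEngine(default_bot="hipaa_bot")
engine.register("hipaa_bot", EchoBot("hipaa"), channels=["C_COMPLIANCE"])
engine.register("architecture_bot", EchoBot("arch"), channels=["C_ARCH"])
```

Adding a later bot then means one more `register()` call plus its own config and prompt directory, without touching the existing HipaaBot.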
- max(new_structure, current_code) - Keep what works, enhance what doesn't
- No big-bang rewrites - Incremental refactoring
- Preserve existing behavior - Tests should still pass
- Add abstraction where needed - Don't over-engineer
- File-based config - Simple, version-controlled
- Slack integration (if working)
- Vector DB setup (if working)
- Gemini API calls (wrap, don't replace)
- Basic message flow (preserve)
- Hardcoded thresholds → Adaptive
- Coupled LLM provider → Swappable interface
- Hardcoded prompts → File-based
- Monolithic bot → Layered architecture
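To make the move from a hardcoded threshold to an adaptive one concrete, here is a worked example that reuses the constants and the three complexity heuristics from the ConfidenceEvaluator sketch. The two questions and the average chunk score of 0.68 are made up for illustration.

```python
BASE_THRESHOLD = 0.75
MAX_ADJUSTMENT = 0.20


def calculate_complexity(question: str) -> float:
    """Same heuristics as _calculate_complexity in the evaluator sketch."""
    score = 0.0
    q_lower = question.lower()
    word_count = len(question.split())
    if word_count > 25:
        score += 0.3
    elif word_count > 15:
        score += 0.15
    if any(w in q_lower for w in ["specific", "edge case", "exception"]):
        score += 0.25
    if any(w in q_lower for w in ["vs", "versus", "compared"]):
        score += 0.2
    return min(score, 1.0)


def adjusted_threshold(question: str) -> float:
    return BASE_THRESHOLD - calculate_complexity(question) * MAX_ADJUSTMENT


SIMPLE = "What is PHI?"
COMPLEX = (
    "How does the minimum necessary standard apply to a specific edge case "
    "where a business associate shares records with a subcontractor, "
    "compared with a covered entity doing the same for treatment purposes?"
)

AVG_CHUNK_SCORE = 0.68  # made-up retrieval score for illustration

for label, question in (("simple", SIMPLE), ("complex", COMPLEX)):
    threshold = adjusted_threshold(question)
    print(f"{label}: threshold={threshold:.2f}, answer={AVG_CHUNK_SCORE >= threshold}")
```

The short question scores 0.0 and keeps the 0.75 threshold, so a 0.68 average is rejected. The long question hits all three heuristics (0.3 + 0.25 + 0.2 = 0.75), which lowers the threshold to about 0.60, so the same chunks are accepted. Note that 0.75 is the highest complexity these rules can produce, so the threshold never drops below roughly 0.60 even though MAX_ADJUSTMENT allows 0.55.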
hipaa-bot/
├── app.py                          # Flask + DI
├── config/
│   ├── providers.yaml              # LLM provider configs
│   └── bots/
│       └── hipaa_bot.yaml          # Bot-specific config
├── prompts/
│   └── hipaa_bot/
│       ├── system.txt
│       ├── user_template.txt
│       └── fallback.txt
├── bots/
│   └── hipaa_bot.py
├── core/
│   ├── bot_engine.py
│   └── message_validator.py
├── services/
│   ├── augmenter.py
│   ├── confidence_evaluator.py
│   ├── knowledge_base_router.py    # Future
│   └── prompt_manager.py
├── providers/
│   ├── rag_provider.py             # Interface
│   ├── gemini_provider.py
│   └── claude_provider.py          # Future
├── clients/
│   ├── slack_client.py
│   ├── vector_db_client.py
│   └── llm_client.py
└── data/
    └── hipaa_docs/
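A quick way to check the load/cache/reload behavior against the `prompts/<bot>/<template>.txt` layout is to run a small prompt manager over a temporary directory. This is a sketch, not the final service: keying the cache by `(bot_name, template_name)` and the `demo()` helper are choices made for the example.

```python
import tempfile
from pathlib import Path


class PromptManager:
    def __init__(self, prompts_dir):
        self.prompts_dir = Path(prompts_dir)
        self._cache = {}

    def load(self, bot_name: str, template_name: str) -> str:
        key = (bot_name, template_name)
        if key not in self._cache:
            path = self.prompts_dir / bot_name / f"{template_name}.txt"
            self._cache[key] = path.read_text(encoding="utf-8")
        return self._cache[key]

    def reload(self) -> None:
        # Drop cached prompts so the next load() re-reads from disk.
        self._cache.clear()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        system = Path(tmp) / "hipaa_bot" / "system.txt"
        system.parent.mkdir(parents=True)
        system.write_text("v1", encoding="utf-8")

        manager = PromptManager(tmp)
        first = manager.load("hipaa_bot", "system")

        # Edit the file on disk: the cached copy is served until reload().
        system.write_text("v2", encoding="utf-8")
        cached = manager.load("hipaa_bot", "system")

        manager.reload()
        fresh = manager.load("hipaa_bot", "system")
        return first, cached, fresh
```

`demo()` returns `("v1", "v1", "v2")`: the edit only becomes visible after `reload()`, so how `reload()` gets triggered at runtime (a Slack command, a file watcher, a timer) is a decision still to be made.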
After refactoring:
- ✅ Can swap Gemini → Claude with config change only
- ✅ Adaptive threshold lowers false negatives on complex questions
- ✅ Prompts editable without code changes
- ✅ Clean separation: clients → services → providers → bots
- ✅ Easy to add new bots (architecture_bot, etc.)
- ✅ All existing tests pass
- ✅ No regression in answer quality
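One way the "config change only" criterion could be met is a small provider registry keyed by name. Everything below is a sketch under assumptions: the `provider` and `settings` keys are an invented config shape (the contents of providers.yaml are not specified), and the two stub classes stand in for the real Gemini and Claude providers.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class RAGProvider(ABC):
    @abstractmethod
    def embed(self, text: str) -> List[float]: ...

    @abstractmethod
    def retrieve(self, query_vector: List[float], collection: str, top_k: int) -> list: ...

    @abstractmethod
    def generate(self, prompt: str) -> str: ...


PROVIDERS: Dict[str, Callable[[dict], RAGProvider]] = {}


def register_provider(name: str):
    """Class decorator that records a provider under a config-friendly name."""
    def decorator(factory):
        PROVIDERS[name] = factory
        return factory
    return decorator


class _StubProvider(RAGProvider):
    label = "stub"

    def __init__(self, settings: dict):
        self.settings = settings

    def embed(self, text: str) -> List[float]:
        return [float(len(text))]

    def retrieve(self, query_vector, collection, top_k) -> list:
        return []

    def generate(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


@register_provider("gemini")
class GeminiStub(_StubProvider):
    label = "gemini"


@register_provider("claude")
class ClaudeStub(_StubProvider):
    label = "claude"


def build_provider(config: dict) -> RAGProvider:
    name = config["provider"]  # assumed key
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"Unknown provider {name!r}; known: {sorted(PROVIDERS)}") from None
    return factory(config.get("settings", {}))
```

Switching `{"provider": "gemini"}` to `{"provider": "claude"}` then changes which class the bot receives, while the bot code keeps calling the same three methods.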
- Start with Phase 1 (RAGProvider interface)
- Preserve working code - wrap, don't rewrite
- Add tests for new components before refactoring
- Keep git history clean - one phase per commit
- Ask before big changes - discuss tradeoffs
Current priority: Phase 1 - Extract RAGProvider interface while preserving all existing Gemini functionality.
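For Phase 1, one possible way to pin behavior before wrapping the Gemini code is a pair of tests written against the provider interface, using an in-memory fake. This is a sketch: `answer_question()` is a simplified stand-in for the bot's current flow with a static 0.75 threshold, chunks are modeled as `(text, score)` tuples, and the collection name and fallback text are placeholders.

```python
import unittest


class FakeRAGProvider:
    """In-memory stand-in exposing the same three methods as RAGProvider."""

    def __init__(self, chunks, answer):
        self.chunks = chunks    # list of (text, score) tuples
        self.answer = answer
        self.calls = []

    def embed(self, text):
        self.calls.append("embed")
        return [0.0]

    def retrieve(self, query_vector, collection, top_k):
        self.calls.append("retrieve")
        return self.chunks[:top_k]

    def generate(self, prompt):
        self.calls.append("generate")
        return self.answer


FALLBACK = "I'm not sure; please check with the compliance team."


def answer_question(provider, question, collection="hipaa_docs",
                    top_k=3, threshold=0.75):
    # Simplified stand-in for the existing flow, static threshold included.
    chunks = provider.retrieve(provider.embed(question), collection, top_k)
    if not chunks or sum(score for _, score in chunks) / len(chunks) < threshold:
        return FALLBACK
    context = "\n".join(text for text, _ in chunks)
    return provider.generate(f"{context}\n\nQ: {question}")


class ProviderContractTest(unittest.TestCase):
    def test_answers_when_chunks_are_strong(self):
        provider = FakeRAGProvider([("PHI must be encrypted.", 0.9)], "Yes.")
        self.assertEqual(answer_question(provider, "Encrypt PHI?"), "Yes.")
        self.assertEqual(provider.calls, ["embed", "retrieve", "generate"])

    def test_falls_back_without_calling_generate(self):
        provider = FakeRAGProvider([("Unrelated text.", 0.4)], "Yes.")
        self.assertEqual(answer_question(provider, "Encrypt PHI?"), FALLBACK)
        self.assertNotIn("generate", provider.calls)
```

Saved as a test module, this runs with `python -m unittest`. Once GeminiRAGProvider exists, the same assertions can be pointed at the real bot entry point to confirm that the wrapping changed no behavior.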