Full audit covering security, performance, architecture, code quality, and workspace hygiene.
Ran 3 parallel specialized agents (security-engineer, architecture-strategist, performance-engineer) plus direct quality checks.
| # | Task | Severity | Status | Notes |
|---|------|----------|--------|-------|
| B1 | Add WebSocket authentication to /ws/transcribe and /ws/speak | HIGH | ✅ Done | _authenticate_websocket() helper validates JWT cookie on connect. Rejects with code 4001 if invalid. src/api/routes/voice.py |
| B2 | Add CSRF protection for state-changing POST endpoints | HIGH | ✅ Done | CSRFMiddleware in src/api/middleware.py: OWASP custom-header pattern (HX-Request/X-Requested-With). 15 tests in tests/api/test_csrf.py. |
| B3 | Add VocabularyRepository.get_by_id() method | HIGH | ✅ Done | Single-row lookup replaces get_all() + Python filter in ReviewService. src/db/repository.py |
| B4 | Persist LangGraph checkpointer across requests | HIGH | ✅ Done | Documented singleton pattern with get_checkpointer() async context manager. Dev limitation accepted. src/agent/checkpointer.py |
| B5 | Fix 9 layer violations (inner layers importing from API) | HIGH | ✅ Done | Created src/config.py, src/validation.py, src/db/client.py. Old locations are re-export shims. 8 inner-layer imports updated. |
| B6 | Fix ReviewService direct DB access bypassing repository | HIGH | ✅ Done | ReviewService now uses VocabularyRepository methods exclusively. No more direct client.table() calls. src/services/review.py |
| B7 | Extract large route files into focused modules (SRP) | HIGH | ✅ Done | lessons.py (817→468 lines): business logic extracted to src/services/lesson_completion.py. |
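The custom-header pattern behind B2 works because a cross-site HTML form cannot set custom headers. A cross-origin fetch that tries to set them must first pass a CORS preflight. The following is a minimal, framework-agnostic sketch of that check. It assumes the middleware exempts only safe methods and accepts either header named in the notes. `passes_csrf_check` and `SAFE_METHODS` are hypothetical names, not the project's CSRFMiddleware.

```python
# Hypothetical sketch of the OWASP custom-request-header CSRF check.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def passes_csrf_check(method: str, headers: dict[str, str]) -> bool:
    """Return True if the request may proceed under the custom-header pattern."""
    if method.upper() in SAFE_METHODS:
        # Safe methods must not change state, so they need no extra header.
        return True
    # HTTP header names are case-insensitive, so compare on lowercased names.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("hx-request", "").lower() == "true":
        return True  # HTMX sends HX-Request: true on the requests it issues
    return lowered.get("x-requested-with", "").lower() == "xmlhttprequest"
```

A plain `fetch()` call therefore needs to add `X-Requested-With: XMLHttpRequest` itself, while HTMX requests pass without extra configuration.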
Priority: P2 — Medium Severity ✅ All Done
| # | Task | Severity | Status | Notes |
|---|------|----------|--------|-------|
| B8 | Nonce-based CSP for CDN scripts | MEDIUM | ✅ Done | Per-request secrets.token_urlsafe(16) nonce in script-src replaces unsafe-inline. unsafe-eval retained for Tailwind CDN (removing it requires a build-time CSS migration). All `<script>` tags in 6 templates use `nonce="{{ request.state.csp_nonce }}"`. 7 tests in tests/api/test_security_headers.py. |
| B9 | Add rate limiting to voice endpoints | MEDIUM | ✅ Done | REST: @rate_limited() on POST /api/speak (10 calls/60s). WebSocket: WebSocketMessageRateLimiter class with per-connection sliding window (30 msgs/60s). src/api/rate_limit.py, src/api/routes/voice.py. |
| B10 | Cache LangGraph graph compilation | MEDIUM | ✅ Done | Module-level _graph_cache: dict[int, CompiledStateGraph] keyed by id(checkpointer). clear_graph_cache() helper for tests. src/agent/graph.py. |
| B11 | Cache ChatAnthropic instances per profile | MEDIUM | ✅ Done | @lru_cache(maxsize=8) on get_llm(). clear_llm_cache() helper for test isolation. src/agent/llm.py. |
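B11's per-profile caching can be sketched with `functools.lru_cache`. To keep the sketch runnable without credentials, `FakeChatModel` stands in for `ChatAnthropic`, and the `PROFILES` table and its model names are placeholders. Only `get_llm`, `clear_llm_cache`, and `maxsize=8` come from the notes above.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeChatModel:
    """Stand-in for ChatAnthropic so the sketch runs without credentials."""
    model: str
    temperature: float


# Placeholder profile table; the real profile names and models are not shown here.
PROFILES = {
    "conversation": ("placeholder-model-a", 0.7),
    "analysis": ("placeholder-model-b", 0.0),
}


@lru_cache(maxsize=8)
def get_llm(profile: str) -> FakeChatModel:
    """Build the client once per profile; later calls return the cached object."""
    model, temperature = PROFILES[profile]
    return FakeChatModel(model=model, temperature=temperature)


def clear_llm_cache() -> None:
    """Drop every cached client, e.g. between tests."""
    get_llm.cache_clear()
```

Because `lru_cache` keys on the argument, every call with the same profile returns the same object. Beyond eight distinct profiles, the least recently used entry is evicted. This design assumes the client is safe to share across concurrent requests.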
Full audit covering security, architecture, code quality, dependencies, and deployment.
Remediated on 2026-02-23 via fix/codebase-improvements-2 branch using 10 parallel worktree agents.
Priority: P0 — Security Critical ✅ All Done
| # | Task | Severity | Status | Notes |
|---|------|----------|--------|-------|
| A1 | Add security headers middleware | CRITICAL | ✅ Done | SecurityHeadersMiddleware in src/api/middleware.py: CSP, HSTS, X-Frame-Options, X-Content-Type-Options. |
| | | | | fail_ci_if_error: true in Codecov action. CI now enforces coverage thresholds. |
| A24 | Reduce conversation version cookie max_age | LOW | ✅ Done | Reduced from 1 year to 30 days. |
Previous Improvements (2026-02-18) — ✅ All Complete
Priority: High — ✅ All Done
| # | Task | Status | Notes |
|---|------|--------|-------|
| 1 | Fix VocabularyRepository.upsert() race condition | ✅ Done | Insert-first pattern that catches PostgreSQL error 23505 (unique_violation). complete_lesson() also switched to a single .upsert(on_conflict=...). increment_correct() documented as concurrency-limited. |
| 2 | Remove get_supabase_admin() from agent nodes | ✅ Done | The user-scoped Supabase client flows through the supabase_client field of ConversationState/ReviewState. chat.py passes user_client into graph state. |
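The insert-first pattern from item 1 can be illustrated with the standard-library `sqlite3` module standing in for Supabase/PostgreSQL. The function tries the `INSERT` first and falls back to an `UPDATE` only when the unique constraint fires. SQLite raises `sqlite3.IntegrityError` in the case where PostgreSQL reports SQLSTATE 23505. The `vocabulary` schema, the `times_seen` counter, and `upsert_word` are hypothetical.

```python
import sqlite3


def upsert_word(conn: sqlite3.Connection, user_id: str, word: str) -> str:
    """Insert first; if the unique key already exists, update that row instead."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO vocabulary (user_id, word, times_seen) VALUES (?, ?, 1)",
                (user_id, word),
            )
        return "inserted"
    except sqlite3.IntegrityError:
        # PostgreSQL reports this same condition as SQLSTATE 23505 (unique_violation).
        with conn:
            conn.execute(
                "UPDATE vocabulary SET times_seen = times_seen + 1 "
                "WHERE user_id = ? AND word = ?",
                (user_id, word),
            )
        return "updated"


# In-memory database with a hypothetical schema; the UNIQUE constraint arbitrates races.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocabulary (user_id TEXT, word TEXT, times_seen INTEGER, "
    "UNIQUE (user_id, word))"
)
```

With select-then-insert, two concurrent requests can both see "no row yet" and both try to create it. Inserting first leaves that decision to the database's unique constraint, so there is no such window.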
MAX_MESSAGE_LENGTH = 2000 already exists at line 198.
| # | Task | Status | Files | Notes |
|---|------|--------|-------|-------|
| 6 | Fix new_conversation checkpoint clearing | ✅ Done | src/api/routes/chat.py | Conversation versioning via cookie: each "new conversation" gets a new UUID, which creates a fresh thread_id. |
| 7 | Narrow broad except Exception blocks | ✅ Done | auth.py, service.py | AuthApiError in auth routes; (YAMLError, ValidationError, OSError) in lesson service. |
| 8 | Move keyword filtering server-side in get_due_by_keywords() | ✅ Done | src/db/repository.py | Uses .or_() with ilike filters instead of fetching all rows. |
Priority: Low — ✅ All Done
| # | Task | Status | Files | Notes |
|---|------|--------|-------|-------|
| 9 | Remove dead EffectiveUser code | ❌ Not dead | src/api/auth.py | Actively used for guest session handling. Task invalid. |
| 10 | Delete dead feedback.py stub node | ✅ Done | Deleted | 51-line stub, never imported. |
| 11 | Remove stub methods in VocabularyService | ✅ Partial | src/services/vocabulary.py | extract_vocabulary() removed. get_word_bank() is NOT a stub: it calls self._repo.get_recent(). |
| 12 | Clean up f-string logging | ✅ Done | All agent nodes | Fixed across scaffold.py, analyze.py, respond.py, review.py. |
| 13 | Document learn.py route in architecture.md | ✅ Done | docs/architecture.md | Added Learn (Phase 14) section with endpoint signatures. |
| 14 | Update stale deployment configs | ✅ Done | render.yaml, Dockerfile | Replaced SQLite references with Supabase env vars. |
| 15 | Consider LLM instance caching | ✅ Done | src/agent/llm.py | Profile-based caching via get_llm(): instances are reused per profile. |
JavaScript Quality Improvements (2026-02-25 audit) — ✅ All Done
Identified during code review of src/static/js/ (app.js, stream.js, voice.js — 1549 lines total).
All items resolved as part of Phase 16 ES Module Migration.
Priority: Medium — ✅ All Done
| # | Task | File | Status | Notes |
|---|------|------|--------|-------|
| J1 | Remove dead HTMX event handlers | app.js | ✅ Done | Dead handlers removed in ESM refactor |
| J2 | Cache DOM elements at init | app.js | ✅ Done | dom.js caches elements |
| J3 | Replace Math.random()-based scroll throttle | stream.js | ✅ Done | Deterministic throttle via tokenCounter % 3 |
| J4 | Cache send button reference in stream.js | stream.js | ✅ Done | stream.js caches send button |
Priority: Low / Cleanup — ✅ All Done
| # | Task | File | Status | Notes |
|---|------|------|--------|-------|
| J5 | Remove console.log calls in production | app.js | ✅ Done | console.log removed in ESM refactor |
| J6 | Remove unused welcomeMessage variable | app.js | ✅ Done | Dead variable removed in ESM refactor |
| J7 | Standardize var → const/let in voice.js | voice.js | ✅ N/A | Intentionally ES5 per ADR-009 (loaded as type="module" but uses var/prototype) |
| J8 | Sanitize HTMX error detail before console logging | app.js | ✅ Done | htmx-handlers.js now logs sanitized error details |
Priority: High Effort (Future) — ✅ Done
| # | Task | Status | Notes |
|---|------|--------|-------|
| J9 | Add JS unit tests | ✅ Done | 186 Vitest tests across all modules |
Future Ideas
| Task | Notes |
|------|-------|
| Scenario roleplay | Ordering food, booking hotel |
| Multiple AI personas | Beyond Hermano |
| Offline mode | PWA with service worker |
Completed Phases
| Phase | Name | Key Deliverable |
|-------|------|-----------------|
| 0 | Project Setup | FastAPI + HTMX + Tailwind, CI/CD, pre-commit |
| 1 | Basic Chat | LangGraph StateGraph, level-adaptive responses |
| 2 | Grammar Feedback | Analyze node, gentle corrections UI |
| 3 | Scaffolding | Conditional routing, word banks, click-to-insert |
| 4 | Persistence | PostgresSaver checkpointing, session management |
| 5 | Supabase Auth | JWT auth, multi-user isolation, 829+ tests |
| 6 | Micro-Lessons | Pydantic models, 5 Spanish A0 lessons, HTMX player |
| 7 | Progress Tracking | Dashboard stats, vocabulary, charts, streaks |
| 8 | Guest Sessions | Session cookies, auth-gated data features |
| 9 | AI-Enhanced Lessons | LangGraph subgraphs, Hermano personalization |
| 10 | Content Expansion | 60 lessons (3 languages × 4 levels × 5 categories) |
| 11 | Theme System | Pronunciation tips, collapsible UI (originally Nordic, now Spanish-inspired) |
All audit items now complete: P1 (7 HIGH) + P2 (10 MEDIUM) + P3 (7 LOW) = 24/24 findings resolved
2026-02-27: P2 Audit Remediation — 10 MEDIUM Severity Items
Branch: fix/p2-audit-remediation (PR #41)
Scope: All 10 P2 (MEDIUM severity) findings from 2026-02-26 audit (B8-B17)
Method: Direct implementation across 3 sessions on git worktree
Changes:
B15: get_supabase_admin() cached with @lru_cache. clear_supabase_cache() clears both.
B16: Pydantic V2 model_config = ConfigDict(from_attributes=True) replaces V1 class Config.
B11: get_llm() cached with @lru_cache(maxsize=8). clear_llm_cache() for test isolation.
B13: ruff check --select F401,F841,ERA001 — all clean, no dead code.
B10: Graph compilation cached per checkpointer in _graph_cache dict. clear_graph_cache() helper.
B12: Review queries use get_due_for_review() + get_in_rotation_count() instead of get_all() + Python filter.
B9: REST TTS rate-limited with @rate_limited() decorator. WebSocket endpoints use WebSocketMessageRateLimiter sliding window.
B14: Narrowed except Exception to specific types (httpx.HTTPError, ConnectionError, OSError) in voice routes.
B17: python-json-logger dependency, LOG_FORMAT setting, JSON formatter in main.py.
B8: Per-request CSP nonce (secrets.token_urlsafe(16)) replaces unsafe-inline in script-src. All <script> tags across 6 templates use nonce="{{ request.state.csp_nonce }}".
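B9's per-connection sliding window (30 messages per 60 seconds) can be sketched with a deque of timestamps. `SlidingWindowLimiter` is a hypothetical name, not the project's `WebSocketMessageRateLimiter`. Its injectable `clock` parameter is an assumption added only so the window can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_messages within any window_seconds-long interval."""

    def __init__(
        self,
        max_messages: int = 30,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps: deque = deque()  # accepted message times, oldest first

    def allow(self) -> bool:
        now = self._clock()
        # Evict timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_messages:
            return False  # over the limit; rejected messages are not recorded
        self._timestamps.append(now)
        return True
```

One instance would live for the lifetime of a single WebSocket connection, and the receive loop would call `allow()` before handling each message. Rejected messages are not recorded, so a client regains capacity as soon as its oldest accepted messages age out.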
Test isolation: clear_llm_cache(), clear_graph_cache(), clear_supabase_cache() added to autouse reset_settings fixture in conftest.py. Fixed PostgREST mock chaining in csrf_app fixture.
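The graph cache from B10 and the `clear_graph_cache()` reset that the autouse fixture calls can be sketched as follows. `get_compiled_graph` is an illustrative name. `compile_fn` is a hypothetical stand-in for compiling the LangGraph `StateGraph` against a given checkpointer.

```python
from typing import Any, Callable

# Compiled graphs keyed by id(checkpointer), mirroring the B10 notes.
_graph_cache: dict = {}


def get_compiled_graph(checkpointer: Any, compile_fn: Callable[[Any], Any]) -> Any:
    """Compile once per checkpointer object and reuse the result afterwards."""
    key = id(checkpointer)
    if key not in _graph_cache:
        _graph_cache[key] = compile_fn(checkpointer)
    return _graph_cache[key]


def clear_graph_cache() -> None:
    """Reset the cache so one test's compiled graph cannot leak into the next."""
    _graph_cache.clear()


# Usage: a fake compile step that records how often it actually runs.
compile_calls: list = []


def fake_compile(checkpointer: Any) -> dict:
    compile_calls.append(checkpointer)
    return {"checkpointer": checkpointer}  # keeps a reference to the checkpointer
```

Keying on `id()` is only safe while the checkpointer stays alive, because two objects with non-overlapping lifetimes may share an id. If the cached value holds a reference to its checkpointer, as a graph compiled with that checkpointer would, the id cannot be recycled while the entry exists.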
Key new files: tests/api/test_security_headers.py (7 CSP nonce tests)
E2E Validated: Playwright MCP verified chat, lessons, and progress pages load with 0 JS errors. CSP nonce confirmed in response headers and rendered HTML.
2026-02-26: P1 Audit Remediation — 7 HIGH Severity Items
Branch: fix/p1-audit-remediation
Scope: All 7 P1 (HIGH severity) findings from 2026-02-26 audit (B1-B7)
Method: Mix of parallel worktree agents and direct implementation across 2 sessions
Changes:
B1: WebSocket auth — _authenticate_websocket() validates JWT cookie on /ws/transcribe and /ws/speak. Rejects with code 4001.
B2: CSRF middleware — CSRFMiddleware using OWASP custom-header pattern. HX-Request: true (HTMX) or X-Requested-With: XMLHttpRequest (fetch). 15 tests.
B3: VocabularyRepository.get_by_id() — single-row lookup, replaces full-table scan in ReviewService.
B4: Checkpointer docs — documented singleton pattern, dev limitation accepted.
B5: Layer violations — created src/config.py, src/validation.py, src/db/client.py as canonical modules. Old API locations are re-export shims. 8 inner-layer imports fixed.
B6: ReviewService — refactored to use VocabularyRepository exclusively, no more direct client.table() calls.
B7: Lesson completion — extracted business logic from lessons.py (817→468 lines) into src/services/lesson_completion.py.
Key new files: src/config.py, src/validation.py, src/db/client.py, src/services/lesson_completion.py, tests/api/test_csrf.py
Test patches: ~30 test files updated for relocated mock targets (patch paths changed from src.api.routes.lessons.* → src.services.lesson_completion.*, src.api.config.* → src.config.*, etc.)
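The following is a framework-agnostic sketch of the decision B1's `_authenticate_websocket()` makes. It reads the JWT from a cookie, verifies it, and returns either the claims or close code 4001. The function name, the cookie name `access_token`, and the injected `verify_jwt` callable are assumptions. Real verification (signature, expiry, audience) would happen inside that callable. RFC 6455 reserves close codes 4000 to 4999 for private use, which makes 4001 a safe application-defined choice.

```python
from typing import Callable, Mapping, Optional

# RFC 6455 reserves close codes 4000-4999 for private (application) use.
WS_CLOSE_UNAUTHORIZED = 4001


def authenticate_websocket(
    cookies: Mapping[str, str],
    verify_jwt: Callable[[str], Optional[dict]],
    cookie_name: str = "access_token",  # assumed cookie name
) -> tuple:
    """Return (claims, None) on success or (None, close_code) on failure."""
    token = cookies.get(cookie_name)
    if not token:
        return None, WS_CLOSE_UNAUTHORIZED  # no auth cookie on the handshake
    claims = verify_jwt(token)  # signature and expiry checks live in the verifier
    if claims is None:
        return None, WS_CLOSE_UNAUTHORIZED
    return claims, None
```

In a Starlette/FastAPI endpoint, a non-None close code would be passed to `await websocket.close(code=...)` before any transcription or speech work starts.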
Phase 4: Checkpointer module, session management, 72 new tests
Phase 5: ADR-001 for Supabase, design doc, found AsyncSqliteSaver bug → MemorySaver fallback
Phase 3: Scaffold node, conditional routing, click-to-insert word banks
2025-01-17: Phases 1-2 + Test Coverage Upgrade
Phase 1: LangGraph StateGraph with respond node, HTMX chat UI
Phase 2: Analyze node for grammar detection, collapsible feedback UI
Test coverage: 37% → 98% (328 → 641 tests)
Notes for Future Agents
Project Status
Phase 19 conversational lessons are complete and merged. One known issue remains: exercise string matching is too strict for FillBlank/Translate exercises, so the LLM praises a correct answer while the badge still shows "Not quite".
All 24 audit findings (P1 HIGH + P2 MEDIUM + P3 LOW) are resolved.
Quick Reference
Personality: "Hermano" — friendly big brother tutor (see src/agent/prompts.py)
Language Adapter: LANGUAGE_ADAPTER dict in prompts.py — never use string replacement
Guest Flow: Session cookie (UUID v4 validated) for chat, auth-gated data features (progress, vocab, review)
Checkpointer: PostgresSaver for production, MemorySaver fallback for dev
Key Constraint: lesson_progress stores base IDs without language/level scoping; PathService always scopes calls
Cookie Security: All cookies go through src/api/cookies.py — signed with itsdangerous, environment-aware secure flag
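The Guest Flow line says session cookies carry a validated UUID v4. A minimal sketch of that validation with the standard `uuid` module follows. `parse_guest_session_id` is a hypothetical name, and the sketch covers only the format check, not the itsdangerous signing described in the Cookie Security line.

```python
import uuid
from typing import Optional


def parse_guest_session_id(raw: Optional[str]) -> Optional[uuid.UUID]:
    """Accept only canonical UUID v4 strings; anything else means no session."""
    if not raw:
        return None
    try:
        value = uuid.UUID(raw)
    except ValueError:
        return None
    # uuid.UUID() also accepts braces, "urn:uuid:" prefixes and unhyphenated hex,
    # so require the canonical lowercase form and version 4 explicitly.
    if value.version != 4 or str(value) != raw:
        return None
    return value
```

Rejecting non-canonical spellings means a single session has exactly one valid cookie value, which keeps lookups keyed on the string form unambiguous.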
Security Architecture (Post-Audit 2026-03-02)
CSP Nonce: Per-request secrets.token_urlsafe(16) nonce in script-src replaces unsafe-inline. Generated before call_next(), stored on request.state.csp_nonce. All <script> tags use nonce="{{ request.state.csp_nonce }}". unsafe-eval retained for Tailwind CDN eval() dependency.
Cache-Control: SecurityHeadersMiddleware adds Cache-Control for /static/ paths: max-age=3600 (debug), max-age=86400 (production). Non-static paths intentionally omit the header.
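The two header rules above can be sketched as plain functions. Each request gets a fresh nonce from `secrets.token_urlsafe(16)`. The `script-src` directive allows that nonce plus `'unsafe-eval'` instead of `'unsafe-inline'`. A `Cache-Control` max-age is set only for `/static/` paths. Several parts are assumptions. The `'self'` source and `CDN_HOSTS` are placeholders, and the rest of the real policy (other CSP directives, HSTS, X-Frame-Options) is omitted. The sketch also leaves out the Starlette plumbing that stores the nonce on `request.state.csp_nonce`.

```python
import secrets

# Placeholder: the real policy lists the CDN origins the templates load from.
CDN_HOSTS = ("https://cdn.example.invalid",)


def make_csp_nonce() -> str:
    """16 random bytes, URL-safe base64 without padding (22 characters)."""
    return secrets.token_urlsafe(16)


def security_headers(path: str, nonce: str, debug: bool) -> dict:
    """Build the CSP script-src and, for static assets only, Cache-Control."""
    script_src = " ".join(("'self'", f"'nonce-{nonce}'", "'unsafe-eval'", *CDN_HOSTS))
    headers = {"Content-Security-Policy": f"script-src {script_src}"}
    if path.startswith("/static/"):
        # One hour while debugging, one day in production; other paths get no header.
        headers["Cache-Control"] = f"max-age={3600 if debug else 86400}"
    return headers
```

The nonce must exist before `call_next()` because the template renders inside `call_next()`. The rendered `nonce="..."` attributes must match the value the header advertises afterwards.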
.npmrc — npm config: save-exact, no fund/audit prompts (P3)
Key Docs
docs/product.md — What we're building
docs/architecture.md — How we're building it (incl. voice architecture section)
docs/api.md — API endpoint reference (incl. voice WebSocket/REST endpoints)
docs/codebase-summary.md — Full codebase crash course
docs/design/ — Phase-by-phase design documents
docs/adr/ — Architectural Decision Records
Quick Commands
make install # Install dependencies
make install-hooks # Install pre-commit hooks
make dev # Run dev server
make test # Run tests
make check # Run all checks (lint + typecheck)