Analytics/dashboard by namtroi · Pull Request #59 · namtroi/RAGBase

namtroi · 2025-12-26T22:13:21Z

PR Type

Enhancement, Tests, Documentation

Description

- Analytics Dashboard: New analytics system with overview, processing time breakdown, quality metrics, and documents routes that provide insight into RAG pipeline performance.
- Metrics Collection: End-to-end metrics tracking from the AI worker, through the backend callback, to the analytics endpoints, covering stage timings (queue, conversion, chunking, embedding) and quality aggregation.
- Chunks Explorer: New API and UI for browsing document chunks, with filtering by quality level, chunk type, and content search, plus a detailed chunk view modal.
- Frontend Analytics UI: New analytics dashboard component with a pipeline funnel visualization across 5 stages, a period selector (24h, 7d, 30d), and summary statistics.
- Multi-file Upload: Batch document upload (at most 3 concurrent uploads, 20-file limit) with real-time progress tracking via `UploadStatusPopup`.
- Custom UI Components: New `Select` dropdown component and `useClickOutside` hook for consistent forms across the application.
- Database Schema: New `ProcessingMetrics` model that stores detailed pipeline metrics in a one-to-one relationship with `Document`.
- Comprehensive Testing: Integration tests for the analytics API, chunks API, metrics callbacks, and E2E analytics flow; unit tests for the metrics collector in the AI worker.
- Documentation: Updated implementation plan with a TDD approach and test specifications; new hybrid search feature documentation.
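The stage timings listed under Metrics Collection can be pictured as a simple stage timer. The sketch below is a TypeScript analogue written for illustration only, not a port of the Python `MetricsCollector`; the `StageTimer` class, its method names, the injected clock, and the choice of `totalTimeMs` as the sum of the three stages are all assumptions. Injecting the clock keeps the durations deterministic in tests.

```typescript
// Hypothetical stage timer, loosely analogous to the worker's MetricsCollector.
// The clock is injected so durations can be checked without real waiting.
type Stage = 'conversion' | 'chunking' | 'embedding';

interface TimingPayload {
  conversionTimeMs: number;
  chunkingTimeMs: number;
  embeddingTimeMs: number;
  totalTimeMs: number;
}

class StageTimer {
  private readonly now: () => number;
  private startedAt: number | null = null;
  private readonly durations: Partial<Record<Stage, number>> = {};

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Mark the beginning of the next stage.
  start(): void {
    this.startedAt = this.now();
  }

  // Record the time elapsed since start() under the given stage.
  end(stage: Stage): number {
    if (this.startedAt === null) {
      throw new Error(`end('${stage}') called without start()`);
    }
    const elapsed = this.now() - this.startedAt;
    this.durations[stage] = elapsed;
    this.startedAt = null;
    return elapsed;
  }

  // Store a duration measured elsewhere (e.g. embedding time reported by the pipeline).
  set(stage: Stage, ms: number): void {
    this.durations[stage] = ms;
  }

  // Callback-style snapshot; the total is assumed to be the sum of the stages.
  toPayload(): TimingPayload {
    const conversionTimeMs = this.durations.conversion ?? 0;
    const chunkingTimeMs = this.durations.chunking ?? 0;
    const embeddingTimeMs = this.durations.embedding ?? 0;
    return {
      conversionTimeMs,
      chunkingTimeMs,
      embeddingTimeMs,
      totalTimeMs: conversionTimeMs + chunkingTimeMs + embeddingTimeMs,
    };
  }
}
```

Because each stage is stored once under its own key, the total cannot double-count a stage as long as each stage is measured exclusively.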

Diagram Walkthrough

```mermaid
flowchart LR
  A["AI Worker<br/>MetricsCollector"] -->|"timing, size,<br/>quality metrics"| B["Callback<br/>ProcessingMetrics"]
  B -->|"upsert metrics"| C["Backend<br/>ProcessingMetrics DB"]
  C -->|"aggregate data"| D["Analytics Routes<br/>overview, processing,<br/>quality, documents"]
  D -->|"fetch metrics"| E["Frontend<br/>Analytics Dashboard"]
  C -->|"query chunks"| F["Chunks API<br/>list & detail"]
  F -->|"fetch chunks"| G["Frontend<br/>Chunks Explorer"]
  H["Document Upload"] -->|"batch process<br/>max 3 concurrent"| A
  H -->|"progress tracking"| I["UploadStatusPopup"]
```
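At the callback step in the diagram, the backend derives two timings from the worker's metrics: queue time (document creation to processing start, clamped at zero) and user wait time (queue time plus total processing time). A minimal sketch of that arithmetic follows; `deriveTimings` is a hypothetical name, and treating a missing `startedAt` as zero queue time mirrors the referred code quoted in the compliance review.

```typescript
// Hypothetical helper mirroring the callback's derived timings:
//   queueTimeMs    = max(0, startedAt - document.createdAt)
//   userWaitTimeMs = queueTimeMs + totalTimeMs
interface DerivedTimings {
  queueTimeMs: number;
  userWaitTimeMs: number;
}

function deriveTimings(
  documentCreatedAt: Date,
  startedAt: Date | null,
  totalTimeMs: number,
): DerivedTimings {
  // No start timestamp means no measurable queue time.
  const queueTimeMs = startedAt
    ? Math.max(0, startedAt.getTime() - documentCreatedAt.getTime())
    : 0;
  return { queueTimeMs, userWaitTimeMs: queueTimeMs + totalTimeMs };
}
```

For example, a document created at 10:00:00 whose processing starts at 10:00:05 and takes 2,000 ms has a 5,000 ms queue time and a 7,000 ms user wait time.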

File Walkthrough

Relevant files

Tests

7 files

- `apps/backend/tests/integration/routes/analytics-api.test.ts` (+345/-0): Analytics API integration tests with metrics validation. Covers the overview, processing, quality, and documents endpoints, plus document chunk retrieval with pagination and ordering. Adds a `seedDocumentWithMetrics` helper that creates test data with ProcessingMetrics, and validates responses for period filtering (24h, 7d, 30d) and metric calculations.
- `apps/backend/tests/integration/routes/chunks-api.test.ts` (+266/-0): Chunks explorer API integration tests with filtering. Covers pagination, quality filtering (excellent/good/low), chunk type filtering, and content search. Adds a `seedChunk` helper that creates test chunks with quality scores and metadata, and validates the chunk detail endpoint, including 404 handling.
- `apps/backend/tests/integration/routes/callback-metrics.test.ts` (+336/-0): Processing metrics callback integration tests. Verifies that a ProcessingMetrics record is created during document callback processing, that queue time is calculated from document creation to processing start, that user wait time is the sum of queue and processing times, that metrics are upserted on retry callbacks, and that missing metrics are handled gracefully.
- `apps/backend/tests/integration/routes/analytics-e2e.test.ts` (+266/-0): Analytics end-to-end flow integration tests. Exercises the complete flow from document processing to metrics retrieval: ProcessingMetrics creation via the callback and its reflection in the analytics endpoints, the chunks explorer (including quality filtering) after processing, and the processing time breakdown and metrics aggregation.
- `apps/ai-worker/tests/test_metrics.py` (+260/-0): Metrics collector unit tests. A test suite for the MetricsCollector class with 30+ test cases covering timing capture for the conversion, chunking, and embedding stages; size metrics; chunking efficiency metrics; quality aggregation; oversized chunk detection; and edge cases such as empty chunks and missing metadata.
- `apps/ai-worker/tests/test_text_processor.py` (+15/-8): Text processor test clarifications. Updates test comments to reflect the actual text-processing behavior, clarifying that MD content is post-processed but its structure is preserved. JSON-specific tests now explicitly instantiate JsonConverter.
- `apps/ai-worker/tests/test_main.py` (+8/-2): Pipeline mock updates in the main tests. Mock pipeline return values now use the new tuple format `(chunks, embedding_time_ms)`, since `pipeline.run()` now returns a tuple instead of a list.
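The tests above validate period filtering for 24h, 7d, and 30d. A minimal sketch of mapping a period to a date range follows, assuming fixed-length windows counted back from "now" and treating `all` as starting at the Unix epoch; `dateRangeFor` and the epoch choice are assumptions, not the routes' actual code. Passing `now` explicitly keeps the range deterministic in tests.

```typescript
// Hypothetical helper: map a period query value to a [start, end] range.
// Fixed-length windows ending at `now`; 'all' starts at the Unix epoch
// (an assumption; a route could also simply omit the lower bound).
type Period = '24h' | '7d' | '30d' | 'all';

const HOUR_MS = 60 * 60 * 1000;
const PERIOD_MS: Record<Exclude<Period, 'all'>, number> = {
  '24h': 24 * HOUR_MS,
  '7d': 7 * 24 * HOUR_MS,
  '30d': 30 * 24 * HOUR_MS,
};

interface DateRange {
  start: Date;
  end: Date;
}

function dateRangeFor(period: Period, now: Date = new Date()): DateRange {
  const end = new Date(now.getTime());
  // TypeScript narrows `period` to the fixed-length windows in the else branch.
  const start =
    period === 'all' ? new Date(0) : new Date(end.getTime() - PERIOD_MS[period]);
  return { start, end };
}
```

The resulting `start`/`end` pair maps directly onto a `created_at >= start AND created_at <= end` filter like the one in the trends query quoted in the review.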

Enhancement

34 files

- `apps/backend/src/routes/analytics/documents-route.ts` (+217/-0): Analytics documents and chunks detail routes. Adds `GET /api/analytics/documents`, which returns a paginated list of documents with their processing metrics, and `GET /api/analytics/documents/:id/chunks`, which returns the chunks of a specific document. Supports period filtering (24h, 7d, 30d, all), sorting by `totalTimeMs`, `avgQualityScore`, or `createdAt`, and pagination with configurable page and limit parameters.
- `apps/backend/src/routes/chunks/chunks-route.ts` (+182/-0): Chunks explorer API routes with filtering. Adds `GET /api/chunks`, a paginated, filterable chunk list (20 items per page by default) that can be filtered by documentId, quality level (excellent/good/low), chunk type, quality flags, and content search, and `GET /api/chunks/:id`, which returns full chunk detail with metadata, or a 404 error response for non-existent chunks.
- `apps/frontend/src/api/endpoints.ts` (+147/-0): Frontend API types and endpoints for analytics. Adds TypeScript interfaces for analytics data (`AnalyticsOverview`, `AnalyticsProcessing`, `AnalyticsQuality`), chunk data (`ChunkListItem`, `ChunkDetail`), and document metrics; a new `analyticsApi` object with methods for the overview, processing, quality, and documents endpoints; and a new `chunksApi` object with list and detail methods for the chunks explorer.
- `apps/backend/src/routes/analytics/processing-route.ts` (+118/-0): Analytics processing time breakdown route. Adds `GET /api/analytics/processing`, which returns the processing time breakdown by stage: average conversion, chunking, embedding, queue, and user wait times. Includes trend data grouped by hour (24h period) or by day (other periods) and supports period filtering with date range calculations.
- `apps/backend/src/routes/analytics/quality-route.ts` (+120/-0): Analytics quality metrics route. Adds `GET /api/analytics/quality`, which returns the quality score distribution across excellent (>=0.85), good (0.70-0.84), and low (<0.70) categories, plus a breakdown of quality flags aggregated with counts from ProcessingMetrics. Supports period filtering for time-based analysis.
- `apps/backend/src/routes/internal/callback-route.ts` (+78/-0): ProcessingMetrics creation in the callback route. Adds a Phase 5 step to the callback handler that calculates `queueTimeMs` (from document creation to processing start) and `userWaitTimeMs` (queue time plus total processing time), then stores all timing and quality metrics with an upsert so that retries are handled.
- `apps/backend/src/routes/analytics/overview-route.ts` (+105/-0): Analytics overview dashboard route. Adds `GET /api/analytics/overview`, which returns summary statistics for the dashboard: total documents and chunks, average processing times, quality scores, and average queue and user wait times. Supports period filtering (24h, 7d, 30d, all) with date range calculations.
- `apps/backend/src/validators/callback-validator.ts` (+22/-0): Metrics schema for the callback validator. Adds `MetricsSchema` for validating detailed processing metrics in callbacks, covering timing fields (`conversionTimeMs`, `chunkingTimeMs`, `embeddingTimeMs`, `totalTimeMs`), size, chunking efficiency, and quality metrics, and exports a `ProcessingMetrics` type for callback payload validation.
- `apps/frontend/src/hooks/use-analytics.ts` (+45/-0): Frontend analytics custom hooks. New React Query hooks for analytics data: `useAnalyticsOverview`, `useAnalyticsProcessing`, and `useAnalyticsQuality` take a period parameter; `useAnalyticsDocuments` supports pagination and period; and `useDocumentChunks` fetches the chunks of a specific document.
- `apps/backend/src/app.ts` (+10/-1): Registration of the analytics routes. Imports the analytics routes (overview, processing, quality, documents) and the chunks route from their new index files and registers them in the protected scope, so they require API key authentication.
- `apps/frontend/src/hooks/use-click-outside.ts` (+27/-0): Click-outside detection hook. A new utility hook that detects clicks outside a referenced element, handles both mouse and touch events, cleans up its event listeners on unmount, and takes a generic type parameter for the HTML element type.
- `apps/frontend/src/hooks/use-chunks.ts` (+19/-0): Frontend chunks explorer hooks. New React Query hooks for the chunks explorer: `useChunksList`, with optional filter parameters, and `useChunkDetail`, which fetches a single chunk with a longer cache time.
- `apps/backend/tests/mocks/python-worker-mock.ts` (+1/-0): Metrics support in the Python worker mock. Adds an optional `metrics` field to the `ProcessingResult` interface so the mock can return detailed processing metrics in callback responses.
- `apps/ai-worker/src/metrics.py` (+174/-0): Metrics collection for the processing pipeline. A new `MetricsCollector` class captures pipeline metrics using the `TimingMetrics`, `SizeMetrics`, `ChunkingMetrics`, and `QualitySummary` dataclasses, provides methods for timing stages and setting size, chunking, and quality metrics, and converts the result to the callback payload format via `to_dict()`.
- `apps/ai-worker/src/main.py` (+32/-3): Metrics collection in the AI worker. Integrates `MetricsCollector` into the document processing workflow: captures timing for the conversion, chunking, and embedding stages, records size metrics (raw bytes and markdown characters), collects chunking efficiency and quality metrics, and includes the metrics in the callback.
- `apps/ai-worker/src/converters/pptx_converter.py` (+11/-8): Slide marker constant in the PPTX converter. Extracts the slide marker string into a `SLIDE_MARKER` constant in place of a magic string and uses it both when injecting slide markers and in the (fixed) slide count calculation.
- `apps/ai-worker/src/pipeline.py` (+11/-7): Embedding time measurement in the pipeline. `run()` now times the embedding stage and returns a `(chunks, embedding_time_ms)` tuple so the embedding time can be reported separately in the metrics; the return type annotation and docstring are updated accordingly.
- `apps/ai-worker/src/models.py` (+2/-0): Metrics field on the processing result. Adds an optional `metrics` field to the `ProcessingResult` dataclass so detailed processing metrics can be included in the callback payload.
- `apps/ai-worker/src/converters/epub_converter.py` (+5/-1): Chapter count tracking in the EPUB converter. Adds a `chapter_count` field to the `ProcessorOutput` return value to record the number of chapters extracted from the EPUB file.
- `apps/ai-worker/src/callback.py` (+2/-0): Metrics in the callback payload. Adds the `metrics` field from `ProcessingResult` to the callback payload sent to the backend.
- `apps/ai-worker/src/chunkers/presentation_chunker.py` (+1/-1): Slide marker update in the presentation chunker. Switches the slide marker to the constant, improving readability and maintainability.
- `apps/frontend/src/components/chunks/ChunkCard.tsx` (+112/-0): Chunk card display component. A new component for chunk list items with quality visualization: it shows the filename, chunk index, content preview, and a quality score badge, displays quality flags and chunk type metadata, and includes a "View Details" button for the full chunk information.
- `apps/frontend/src/components/documents/upload-dropzone.tsx` (+225/-69): Multi-file batch upload with progress tracking. Refactors single-file upload into multi-file batch upload with concurrency control (at most 3 concurrent uploads, 20-file limit). Replaces the `useUploadDocument` hook with direct `documentsApi.upload` calls plus `useQueryClient` for cache invalidation, adds an `UploadStatusPopup` component that shows real-time progress with status indicators (pending, uploading, success, error), and refreshes the document list automatically after uploads complete.
- `apps/frontend/src/components/analytics/AnalyticsPage.tsx` (+261/-0): Analytics dashboard with a pipeline funnel view. A new dashboard component that displays RAG pipeline performance metrics across 5 stages (Upload/Queue, Conversion, Chunking, Quality, Embedding). It provides a period selector (24h, 7d, 30d), summary stat cards with formatted metrics (documents, processing time, quality score, chunks), a pipeline funnel visualization with stage-specific metrics and a quality flag breakdown in the footer, and a styled display of total pipeline time and user wait time.
- `apps/frontend/src/components/chunks/ChunksExplorerPage.tsx` (+219/-0): Chunks explorer with a detail modal. A new page with a paginated list view (20 items per page), filtering by search text, quality level, and chunk type, and a `ChunkDetailModal` component that shows full chunk metadata (quality score, tokens, position, breadcrumbs, quality flags, content). Includes a helper that color-codes quality scores (excellent/good/low) and uses `chunksApi` to fetch the chunk list and individual chunk details.
- `apps/frontend/src/App.tsx` (+78/-45): Restructured navigation with analytics and chunks tabs. Combines the header and tabs into a single header with the logo, main navigation, and a settings button; adds `analytics` and `chunks` tabs alongside the existing documents, drive, query, and settings tabs; updates tab ordering and styling for a clearer visual hierarchy; and adds page content sections for `AnalyticsPage` and `ChunksExplorerPage`.
- `apps/frontend/src/components/documents/document-filters.tsx` (+65/-44): Filters migrated to the custom Select component. Replaces native `<select>` elements with the custom `Select` component for consistent styling and behavior, adds option definitions for the sort, status, availability, connection, and source filters, and improves input styling with hover states and focus rings. All existing filter functionality is preserved.
- `apps/frontend/src/components/documents/pagination.tsx` (+76/-46): Pagination with a jump-to-page input. Replaces the native `<select>` for page size with the custom `Select` component, adds a "jump to page" input where users type a page number and press Enter, improves button styling with transition effects, and reorganizes the controls into logical button groups.
- `apps/frontend/src/components/ui/select.tsx` (+122/-0): New custom Select component. A reusable `Select` with a dropdown menu, keyboard support, and optional count badges. It uses the `useClickOutside` hook for click-outside detection, animates the dropdown, highlights the selected option, supports a disabled state and custom placeholder text, and applies consistent primary-color theming across the application.
- `apps/backend/prisma/schema.prisma` (+48/-2): ProcessingMetrics schema for analytics data. Adds a `ProcessingMetrics` model that stores detailed pipeline metrics (queue, conversion, chunking, and embedding times), size metrics (`rawSizeBytes`, `markdownSizeChars`), chunking efficiency, and aggregated quality data, with a one-to-one, cascade-delete relationship to the `Document` model. Also adds a deprecation comment to the `processingMetadata` JSON field, marking it for migration.
- `apps/frontend/src/components/documents/document-list.tsx` (+16/-28): Simplified document list using the Select component. Removes the header and refresh button from the document list component (moved to the parent), replaces the native `<select>` for folder filtering with the custom `Select` component, and removes unused imports (`RefreshCw`). All filtering and pagination functionality is preserved.
- `apps/frontend/src/components/query/search-form.tsx` (+19/-34): Search form redesigned with a horizontal layout. Switches the form from a vertical to a horizontal flex layout, replaces the native `<select>` for the results count with the custom `Select` component, improves input styling with a search icon and better focus states, and moves the loading indicator from the button into the input field.
- `apps/frontend/src/components/chunks/ChunkFilters.tsx` (+88/-0): Chunk filters component with search and dropdowns. A new filter component for the chunks explorer with a search input, a quality level filter, and a chunk type filter. Search is debounced (300 ms) to avoid excessive filter updates, the quality and type dropdowns use the custom `Select` component with optional count badges, and a `ChunkFilterState` interface is exported for type-safe filter management.
- `apps/frontend/src/components/drive/DriveSyncTab.tsx` (+9/-6): Drive Sync tab header styling. Updates the header to match the new design pattern (icon plus description): adds a `FolderSync` icon, changes the heading from `text-xl` to `text-lg` for consistency with other pages, and improves the header layout and typography.
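The processing route groups its trend data by hour for the 24h period and by day otherwise, using `DATE_TRUNC` in SQL. The sketch below is an in-memory equivalent written only to show the bucketing; `buildTrends` and the other helper names are hypothetical, and computing buckets in UTC is an assumption (Postgres truncates `timestamptz` values in the session time zone).

```typescript
// Hypothetical in-memory equivalent of the trends query: bucket rows by
// hour for the 24h period and by day otherwise, then average per bucket.
// Buckets are computed in UTC (an assumption).
type Period = '24h' | '7d' | '30d' | 'all';
type TruncUnit = 'hour' | 'day';

interface MetricRow {
  createdAt: Date;
  totalTimeMs: number;
}

interface TrendPoint {
  date: string; // ISO timestamp of the bucket start
  count: number;
  avgTotalTimeMs: number;
}

function truncUnitFor(period: Period): TruncUnit {
  return period === '24h' ? 'hour' : 'day';
}

// Like DATE_TRUNC: zero out every field below the chosen unit.
function truncateUtc(date: Date, unit: TruncUnit): Date {
  const d = new Date(date.getTime());
  if (unit === 'day') {
    d.setUTCHours(0, 0, 0, 0);
  } else {
    d.setUTCMinutes(0, 0, 0);
  }
  return d;
}

function buildTrends(rows: MetricRow[], period: Period): TrendPoint[] {
  const unit = truncUnitFor(period);
  const buckets = new Map<number, { count: number; sum: number }>();
  for (const row of rows) {
    const key = truncateUtc(row.createdAt, unit).getTime();
    const bucket = buckets.get(key) ?? { count: 0, sum: 0 };
    bucket.count += 1;
    bucket.sum += row.totalTimeMs;
    buckets.set(key, bucket);
  }
  // Ascending by bucket start, like ORDER BY date ASC.
  return Array.from(buckets.entries())
    .sort(([a], [b]) => a - b)
    .map(([key, { count, sum }]) => ({
      date: new Date(key).toISOString(),
      count,
      avgTotalTimeMs: sum / count,
    }));
}
```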

Miscellaneous

2 files

- `apps/backend/src/routes/analytics/index.ts` (+4/-0): Analytics routes barrel export. A new barrel file that exports the overview, processing, quality, and documents routes.
- `apps/backend/src/routes/chunks/index.ts` (+1/-0): Chunks route barrel export. A new barrel file that exports the chunks route.

Configuration changes

1 file

- `package.json` (+2/-0): Development script and dependencies. Adds a `dev:all` script that runs the backend, worker, and frontend concurrently (it uses docker-compose for services and activates the Python virtual environment), and adds the `concurrently` dependency for parallel process management.

Dependencies

1 file

- `pnpm-lock.yaml` (+44/-0): Added the `concurrently` package. Adds `concurrently@9.2.1` and its transitive dependencies (`rxjs`, `shell-quote`, `supports-color`, `tree-kill`) to the lock file.

Documentation

2 files

- `docs/extension-analytics-dashboard.md` (+128/-16): TDD test specifications added to the analytics implementation plan. Updates the implementation phases to emphasize a TDD, test-first approach; adds detailed test specifications for Phase 1 (metrics collection), Phase 2 (API development), and Phase 4 (E2E tests); expands the backend implementation details with specific test cases for metrics, analytics endpoints, and the chunks API; and notes that, per a project decision, there are no frontend tests, so testing focuses on the worker and backend.
- `docs/extension-hybrid-search.md` (+205/-0): New hybrid search implementation plan. Documents a hybrid search feature that combines semantic (vector) and keyword (BM25) search with RRF reranking, including schema updates for a `tsvector` column, the backend service implementation, API validator changes, and frontend UI updates. Also provides a verification plan (unit tests, integration tests, manual testing) and a TDD implementation order with an estimated 4-5 hour timeline.

…r tasks

- Add ProcessingMetrics model to schema with timing, size, chunking, quality fields
- Add MetricsCollector class in AI worker for timing instrumentation
- Update pipeline.py to return embedding timing
- Integrate metrics collection into main.py processing flow
- Extend callback payload with detailed metrics
- Add MetricsSchema to callback-validator.ts
- Update callback-route.ts to persist ProcessingMetrics records
- Calculate queueTimeMs and userWaitTimeMs in backend
- Add 26 unit tests for MetricsCollector (all passing)
- Add 8 integration tests for callback metrics
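Since `pipeline.run()` now returns `embedding_time_ms` alongside the chunks, a timer wrapped around the whole call measures both chunking and embedding. The first code suggestion in the review derives chunking time by subtraction so embedding is not counted twice. The arithmetic is sketched here in TypeScript for illustration (the worker itself is Python); `splitPipelineTiming` is a hypothetical name, and treating the total as the sum of conversion, chunking, and embedding is an assumption.

```typescript
// Hypothetical helper: separate chunking from embedding when one timer
// covers the whole pipeline.run() call.
interface StageTimes {
  conversionTimeMs: number;
  chunkingTimeMs: number;
  embeddingTimeMs: number;
  totalTimeMs: number;
}

function splitPipelineTiming(
  conversionTimeMs: number,
  pipelineDurationMs: number, // wall time of pipeline.run(): chunking + embedding
  embeddingTimeMs: number, // embedding time reported by pipeline.run()
): StageTimes {
  // Clamp at zero in case the two measurements disagree slightly.
  const chunkingTimeMs = Math.max(0, pipelineDurationMs - embeddingTimeMs);
  return {
    conversionTimeMs,
    chunkingTimeMs,
    embeddingTimeMs,
    // Each stage is counted exactly once.
    totalTimeMs: conversionTimeMs + chunkingTimeMs + embeddingTimeMs,
  };
}
```

With 500 ms of conversion, a 1,200 ms pipeline call, and 900 ms of embedding, chunking comes out at 300 ms and the total at 1,700 ms; adding the raw pipeline duration and the embedding time separately would report 2,600 ms instead.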

- Implement Analytics API endpoints (overview, processing, quality, documents)
- Implement Chunks Explorer API endpoints
- Add ProcessingMetrics schema and relation
- Implement AI Worker metrics collection (MetricsCollector)
- Fix AI Worker tests (PresentationChunker, PPTX, EPUB)
- Fix Backend integration tests (Vector syntax, Auth headers)
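The Chunks Explorer endpoints page through results (20 items per page by default, per the file walkthrough). A minimal sketch of turning 1-based page and limit query values into the `skip`/`take` arguments that Prisma's `findMany` accepts for offset pagination; `toSkipTake`, the fallback rules, and the cap of 100 are assumptions for illustration.

```typescript
// Hypothetical pagination helper: 1-based page/limit query values to
// Prisma-style skip/take. The default of 20 per page follows the walkthrough;
// the fallback rules and the cap of 100 are illustrative assumptions.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

interface PageQuery {
  page?: number;
  limit?: number;
}

// Use the value only if it is a positive integer; otherwise use the fallback.
function positiveIntOr(value: number | undefined, fallback: number): number {
  return value !== undefined && Number.isInteger(value) && value > 0 ? value : fallback;
}

function toSkipTake(query: PageQuery): { skip: number; take: number } {
  const page = positiveIntOr(query.page, 1);
  const take = Math.min(positiveIntOr(query.limit, DEFAULT_LIMIT), MAX_LIMIT);
  return { skip: (page - 1) * take, take };
}

// Page count for a total item count; never less than 1, so an empty list still has page 1.
function totalPages(totalItems: number, take: number): number {
  return Math.max(1, Math.ceil(totalItems / take));
}
```

For example, `toSkipTake({ page: 3 })` yields `{ skip: 40, take: 20 }`, which can be spread into a `findMany` call.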

- Add Analytics Dashboard with Pipeline Funnel view
- Add Chunks Explorer with filtering and detail modal
- Replace Tremor components with native Tailwind for TailwindCSS v4 compatibility
- Add analytics and chunks API endpoints to frontend
- Add E2E tests for analytics flow (analytics-e2e.test.ts)
- Fix API types to match backend response format
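The chunks explorer filters by quality level, and the quality route buckets scores as excellent (>=0.85), good (0.70-0.84), and low (<0.70) and aggregates flag counts. A sketch of both operations follows; treating every score from 0.70 up to, but not including, 0.85 as good (so a score such as 0.845 is not lost between buckets) is an assumption about how the boundaries are applied, and the helper names are hypothetical.

```typescript
// Hypothetical helpers for the excellent/good/low buckets. Boundaries are
// assumed to be: excellent >= 0.85, good >= 0.70 and < 0.85, low < 0.70.
type QualityLevel = 'excellent' | 'good' | 'low';

function classifyQuality(score: number): QualityLevel {
  if (score >= 0.85) return 'excellent';
  if (score >= 0.7) return 'good';
  return 'low';
}

// Count how many scores fall into each bucket.
function qualityDistribution(scores: number[]): Record<QualityLevel, number> {
  const counts: Record<QualityLevel, number> = { excellent: 0, good: 0, low: 0 };
  for (const score of scores) {
    counts[classifyQuality(score)] += 1;
  }
  return counts;
}

// Sum per-document flag counts (shaped like the `qualityFlags` record in the
// metrics payload) into a single breakdown.
function aggregateFlags(perDocument: Array<Record<string, number>>): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const flags of perDocument) {
    for (const [flag, count] of Object.entries(flags)) {
      totals[flag] = (totals[flag] ?? 0) + count;
    }
  }
  return totals;
}
```

Using half-open ranges makes the three buckets cover every score exactly once, which a closed "0.70-0.84" range would not do for scores with more than two decimals.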

- Remove @tremor/react package completely
- Replace Tremor Badge with native Tailwind spans
- Replace Tremor TabGroup with native period selector buttons
- Sync Drive Sync header font with Documents page
- Move Settings tab to right side of navigation
- Reduce bundle size (323KB JS, 30KB CSS)

- Rename analytics-dashboard.md to extension-analytics-dashboard.md
- Rename hybrid-search-implementation.md to extension-hybrid-search.md
- Update chunks filter placeholder: Search -> Filter by text
- Add hybrid search implementation plan with TDD approach

- Replace native select elements with custom Select component
- Implement consistent header design (Icon + Title + Subtitle) across all tabs
- Align Drive Sync and Analytics actions with headers
- Remove redundant headers and Refresh button
- Standardize Search input styling (icon position, no label)

qodo-code-review · 2025-12-26T22:14:33Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪ Raw SQL injection
Description: The trends query uses `Prisma.raw` to inject the `DATE_TRUNC` interval (lines 79 and 85), which could become a SQL injection vector if `truncInterval` is ever derived from user-controlled input or expanded beyond the current hardcoded `'hour' | 'day'` values.
Referred code (processing-route.ts [70-88]):
```typescript
const truncInterval = period === '24h' ? 'hour' : 'day';

const trends = await prisma.$queryRaw<Array<{
  date: Date;
  count: bigint;
  avg_total_time: number;
  avg_queue_time: number;
}>>(
  Prisma.sql`
    SELECT
      DATE_TRUNC(${Prisma.raw(`'${truncInterval}'`)}, created_at) as date,
      COUNT(*) as count,
      AVG(total_time_ms) as avg_total_time,
      AVG(queue_time_ms) as avg_queue_time
    FROM processing_metrics
    WHERE created_at >= ${start} AND created_at <= ${end}
    GROUP BY DATE_TRUNC(${Prisma.raw(`'${truncInterval}'`)}, created_at)
    ORDER BY date ASC
  `
);
```

⚪ Sensitive data exposure
Description: The `GET /api/chunks/:id` endpoint returns the full chunk `content` and detailed metadata (including `location`), which can expose sensitive document text to anyone with API access. Access control and scoping should be verified to prevent unintended data leakage across tenants or users.
Referred code (chunks-route.ts [123-180]):
```typescript
fastify.get<{ Params: { id: string } }>('/api/chunks/:id', async (request, reply) => {
  const { id } = request.params;
  const prisma = getPrismaClient();

  const chunk = await prisma.chunk.findUnique({
    where: { id },
    select: {
      id: true,
      documentId: true,
      chunkIndex: true,
      content: true,
      charStart: true,
      charEnd: true,
      qualityScore: true,
      qualityFlags: true,
      chunkType: true,
      completeness: true,
      hasTitle: true,
      breadcrumbs: true,
      tokenCount: true,
      location: true,
      // ... (clipped 37 lines)
```
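For the raw SQL finding, one way to keep the `Prisma.raw` fragment safe even if more periods are added later is to take it only from a constant allowlist, never from request input. A minimal sketch; `TRUNC_LITERALS` and `truncLiteralFor` are hypothetical names, and the mapping simply mirrors the existing `'24h' ? 'hour' : 'day'` ternary. Request validation should already reject unknown periods, so the throw here is defense in depth.

```typescript
// Hypothetical allowlist: the SQL fragment handed to Prisma.raw always comes
// from this constant table, never from the request itself.
const TRUNC_LITERALS: ReadonlyMap<string, string> = new Map<string, string>([
  ['24h', "'hour'"],
  ['7d', "'day'"],
  ['30d', "'day'"],
  ['all', "'day'"],
]);

function truncLiteralFor(period: string): string {
  const literal = TRUNC_LITERALS.get(period);
  if (literal === undefined) {
    // Unknown input never reaches the query.
    throw new Error(`Unsupported period: ${period}`);
  }
  return literal;
}

// Intended use inside the route (not executed here):
//   Prisma.sql`SELECT DATE_TRUNC(${Prisma.raw(truncLiteralFor(period))}, created_at) ...`
```

A `Map` lookup is used instead of `key in object` so that inherited property names such as `toString` cannot slip through.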
Ticket Compliance
⚪ 🎫 No ticket provided
Codebase Duplication Compliance
⚪ Codebase context is not defined. Follow the guide to enable codebase context checks.
Custom Compliance
🟢 Generic: Meaningful Naming and Self-Documenting Code
Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting.
Status: Passed

🟢 Generic: Secure Logging Practices
Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.
Status: Passed

🔴 Generic: Robust Error Handling and Edge Case Management
Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation.
Status: Invalid date handling. The code constructs `new Date(m.startedAt)` / `new Date(m.completedAt)` without checking that parsing succeeded, which can produce an `Invalid Date`, turn `queueTimeMs` into `NaN`, and persist that value.
Referred code:
```typescript
// Calculate queue time from document creation (enqueued) to processing start
let queueTimeMs = 0;
let startedAt: Date | null = null;
let completedAt: Date | null = null;

if (m.startedAt) {
  startedAt = new Date(m.startedAt);
  // Use document.createdAt as enqueuedAt (when job was added to queue)
  queueTimeMs = Math.max(0, startedAt.getTime() - document.createdAt.getTime());
}
if (m.completedAt) {
  completedAt = new Date(m.completedAt);
}

const totalTimeMs = m.totalTimeMs || 0;
const userWaitTimeMs = queueTimeMs + totalTimeMs;
```

🔴 Generic: Secure Error Handling
Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging.
Status: Leaky validation errors. The route returns `queryResult.error.message` directly to clients, potentially exposing internal validation and schema details instead of a generic user-facing error.
Referred code:
```typescript
fastify.get('/api/analytics/overview', async (request, reply) => {
  const queryResult = PeriodQuerySchema.safeParse(request.query);
  if (!queryResult.success) {
    return reply.status(400).send({
      error: 'VALIDATION_ERROR',
      message: queryResult.error.message,
    });
  }
  // ...
```

🔴 Generic: Security-First Input Validation and Data Handling
Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities.
Status: Weak timestamp validation. `startedAt` and `completedAt` are accepted as arbitrary strings rather than validated datetimes, so malformed inputs can break later processing and persist invalid derived metrics.
Referred code:
```typescript
const MetricsSchema = z.object({
  startedAt: z.string().optional(),
  completedAt: z.string().optional(),
  conversionTimeMs: z.number().int().nonnegative().optional(),
  chunkingTimeMs: z.number().int().nonnegative().optional(),
  embeddingTimeMs: z.number().int().nonnegative().optional(),
  totalTimeMs: z.number().int().nonnegative().optional(),
  rawSizeBytes: z.number().int().nonnegative().optional(),
  markdownSizeChars: z.number().int().nonnegative().optional(),
  totalChunks: z.number().int().nonnegative().optional(),
  avgChunkSize: z.number().nonnegative().optional(),
  oversizedChunks: z.number().int().nonnegative().optional(),
  avgQualityScore: z.number().min(0).max(1).optional(),
  qualityFlags: z.record(z.number()).optional(),
  totalTokens: z.number().int().nonnegative().optional(),
}).passthrough();
```

⚪ Generic: Comprehensive Audit Trails
Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.
Status: Missing user context. The new metrics write path logs `processing_metrics_saved` without any user identifier or request context, so audit trails may be insufficient, depending on whether global request/user logging exists elsewhere.
Referred code:
```typescript
// Phase 5: Create ProcessingMetrics record
if (result.metrics) {
  const m = result.metrics;

  // Calculate queue time from document creation (enqueued) to processing start
  let queueTimeMs = 0;
  let startedAt: Date | null = null;
  let completedAt: Date | null = null;

  if (m.startedAt) {
    startedAt = new Date(m.startedAt);
    // Use document.createdAt as enqueuedAt (when job was added to queue)
    queueTimeMs = Math.max(0, startedAt.getTime() - document.createdAt.getTime());
  }
  if (m.completedAt) {
    completedAt = new Date(m.completedAt);
  }

  const totalTimeMs = m.totalTimeMs || 0;
  const userWaitTimeMs = queueTimeMs + totalTimeMs;
  // ... (clipped 55 lines)
```
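For the invalid-date and weak-timestamp findings, one option is to parse the optional timestamps defensively so that a malformed string can never turn `queueTimeMs` into `NaN`. A minimal sketch; `parseOptionalTimestamp` and `safeQueueTimeMs` are hypothetical names, and falling back to zero queue time (rather than rejecting the callback with a 400) is a design assumption. Tightening `MetricsSchema` itself with Zod's datetime string validation would be the complementary fix at the validation layer.

```typescript
// Hypothetical defensive parsing for the optional metrics timestamps.
// Returns null for missing, blank, or unparseable values so that date
// arithmetic never sees an Invalid Date.
function parseOptionalTimestamp(value: string | undefined): Date | null {
  if (value === undefined || value.trim() === '') return null;
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? null : date;
}

// Queue time from document creation to processing start, clamped at zero.
// A missing or malformed startedAt falls back to 0 (a design assumption;
// rejecting the callback is the stricter alternative).
function safeQueueTimeMs(createdAt: Date, startedAtRaw: string | undefined): number {
  const startedAt = parseOptionalTimestamp(startedAtRaw);
  if (startedAt === null) return 0;
  return Math.max(0, startedAt.getTime() - createdAt.getTime());
}
```

Note that `Math.max(0, NaN)` is `NaN`, so the clamp in the referred code does not protect against an invalid date by itself; the explicit validity check is what prevents it.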

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-26T22:15:55Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Separate chunking from embedding timing To fix a double-counting error, explicitly calculate `chunking_time_ms` by subtracting `embedding_time_ms` from the total pipeline duration. apps/ai-worker/src/main.py [154-159] metrics_collector.start_stage() chunks, embedding_time_ms = processing_pipeline.run( output.markdown, category ) -metrics_collector.end_chunking() +pipeline_duration_ms = metrics_collector.end_chunking() +# subtract embedding to get actual chunking time +chunking_time_ms = max(0, pipeline_duration_ms - embedding_time_ms) +metrics_collector._metrics.timing.chunking_time_ms = chunking_time_ms metrics_collector.set_embedding_time(embedding_time_ms) Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a critical bug where `embedding_time_ms` was being double-counted in the total processing time. The proposed fix accurately separates chunking and embedding timings, ensuring correct metric calculation.	High
General: Refactor upsert and prevent overwriting timestamps (Impact: Medium)

Refactor the `upsert` operation to remove the duplicated field list, and stop overwriting the `enqueuedAt` and `startedAt` timestamps on updates so that retries preserve the initial processing metrics.

apps/backend/src/routes/internal/callback-route.ts [160-208]

     const totalTimeMs = m.totalTimeMs || 0;
     const userWaitTimeMs = queueTimeMs + totalTimeMs;
    +const metricsData = {
    +  pageCount: result.pageCount,
    +  ocrApplied: result.ocrApplied,
    +  completedAt,
    +  queueTimeMs,
    +  conversionTimeMs: m.conversionTimeMs || 0,
    +  chunkingTimeMs: m.chunkingTimeMs || 0,
    +  embeddingTimeMs: m.embeddingTimeMs || 0,
    +  totalTimeMs,
    +  userWaitTimeMs,
    +  rawSizeBytes: m.rawSizeBytes || 0,
    +  markdownSizeChars: m.markdownSizeChars || 0,
    +  totalChunks: m.totalChunks || result.chunks.length,
    +  avgChunkSize: m.avgChunkSize || 0,
    +  oversizedChunks: m.oversizedChunks || 0,
    +  avgQualityScore: m.avgQualityScore || 0,
    +  qualityFlags: m.qualityFlags || {},
    +  totalTokens: m.totalTokens || 0,
    +};
    +
     await prisma.processingMetrics.upsert({
       where: { documentId },
    -  update: {
    -    pageCount: result.pageCount,
    -    ocrApplied: result.ocrApplied,
    +  update: metricsData,
    +  create: {
    +    ...metricsData,
    +    documentId,
         enqueuedAt: document.createdAt,
         startedAt,
    -    completedAt,
    -    queueTimeMs,
    -    conversionTimeMs: m.conversionTimeMs || 0,
    -    chunkingTimeMs: m.chunkingTimeMs || 0,
    -    embeddingTimeMs: m.embeddingTimeMs || 0,
    -    totalTimeMs,
    -    userWaitTimeMs,
    -    rawSizeBytes: m.rawSizeBytes || 0,
    -    markdownSizeChars: m.markdownSizeChars || 0,
    -    totalChunks: m.totalChunks || result.chunks.length,
    -    avgChunkSize: m.avgChunkSize || 0,
    -    oversizedChunks: m.oversizedChunks || 0,
    -    avgQualityScore: m.avgQualityScore || 0,
    -    qualityFlags: m.qualityFlags || {},
    -    totalTokens: m.totalTokens || 0,
    -  },
    -  create: {
    -    documentId,
    -    pageCount: result.pageCount,
    -    ocrApplied: result.ocrApplied,
    -    enqueuedAt: document.createdAt,
    -    startedAt,
    -    completedAt,
    -    queueTimeMs,
    -    conversionTimeMs: m.conversionTimeMs || 0,
    -    chunkingTimeMs: m.chunkingTimeMs || 0,
    -    embeddingTimeMs: m.embeddingTimeMs || 0,
    -    totalTimeMs,
    -    userWaitTimeMs,
    -    rawSizeBytes: m.rawSizeBytes || 0,
    -    markdownSizeChars: m.markdownSizeChars || 0,
    -    totalChunks: m.totalChunks || result.chunks.length,
    -    avgChunkSize: m.avgChunkSize || 0,
    -    oversizedChunks: m.oversizedChunks || 0,
    -    avgQualityScore: m.avgQualityScore || 0,
    -    qualityFlags: m.qualityFlags || {},
    -    totalTokens: m.totalTokens || 0,
       },
     });

Suggestion importance (1-10): 8
Why: This identifies a logical flaw: retry callbacks would overwrite the initial timestamps and corrupt metrics derived from them, such as `queueTimeMs`. It also removes the duplicated field list, which improves maintainability.

General: Correct gradient class typo (Impact: Medium)

Replace the invalid Tailwind CSS class `bg-liner-to-r` with `bg-gradient-to-r` so that the background gradient renders.

apps/frontend/src/components/analytics/AnalyticsPage.tsx [241]

    -<div className="bg-liner-to-r from-blue-50 to-indigo-50 rounded-xl p-6 border border-blue-100">
    +<div className="bg-gradient-to-r from-blue-50 to-indigo-50 rounded-xl p-6 border border-blue-100">

Suggestion importance (1-10): 8
Why: `bg-liner-to-r` is not a Tailwind utility, so the intended gradient is never applied. `bg-gradient-to-r` is the Tailwind CSS v3 name; if the project uses Tailwind CSS v4, the typo more likely stands for `bg-linear-to-r`, the v4 name for the same gradient utility.

General: Use value instead of index for state (Impact: Medium)

Store the selected period's `value` (e.g. `'7d'`) in state directly instead of its array index.

apps/frontend/src/components/analytics/AnalyticsPage.tsx [94-131]

     export function AnalyticsPage() {
    -  const [periodIndex, setPeriodIndex] = useState(1); // Default to 7d
    -  const period = periods[periodIndex].value;
    +  const [period, setPeriod] = useState<Period>('7d'); // Default to 7d
       const { data: overview, isLoading: loadingOverview } = useAnalyticsOverview(period);
       const { data: processing, isLoading: loadingProcessing } = useAnalyticsProcessing(period);
       const { data: quality, isLoading: loadingQuality } = useAnalyticsQuality(period);
       const isLoading = loadingOverview || loadingProcessing || loadingQuality;
       return (
         <div className="space-y-6">
           {/* Header + Actions */}
           <div className="flex items-center justify-between">
             <div>
               <h2 className="text-lg font-semibold text-gray-900 flex items-center gap-2">
                 <BarChart3 className="w-5 h-5 text-gray-400" />
                 Analytics
               </h2>
               <p className="text-sm text-gray-500">Monitor your RAG pipeline performance</p>
             </div>
             {/* Period Selector */}
             <div className="flex items-center bg-gray-100 rounded-lg p-1">
    -          {periods.map((p, idx) => (
    +          {periods.map((p) => (
                 <button
                   key={p.value}
    -              onClick={() => setPeriodIndex(idx)}
    -              className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${periodIndex === idx
    +              onClick={() => setPeriod(p.value)}
    +              className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${period === p.value
                     ? 'bg-white text-gray-900 shadow-sm'
                     : 'text-gray-600 hover:text-gray-900'
                     }`}
                 >
                   {p.label}
                 </button>
               ))}
             </div>
           </div>
     ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance (1-10): 7
Why: Storing the period `value` in state instead of its index decouples the state from the structure of the `periods` array, so reordering the array cannot silently change which period is selected.

General: Default metrics to empty object (Impact: Medium)

Make the `metrics` field of the callback payload default to an empty dictionary (`{}`) instead of `None`, so the backend never receives `null`.

apps/ai-worker/src/callback.py [34]

    -"metrics": result.metrics,
    +"metrics": result.metrics or {},

Suggestion importance (1-10): 7
Why: The callback payload becomes more robust: `metrics` is always a dictionary, which avoids `null` values and improves compatibility with the backend.

General: Improve query function for robustness (Impact: Low)

Refactor the `useQuery` call that fetches chunk details so that `queryFn` always returns a promise and rejects if the ID is missing, which is more idiomatic for TanStack Query.

apps/frontend/src/components/chunks/ChunksExplorerPage.tsx [135-140]

     // Fetch selected chunk detail
     const { data: selectedChunk } = useQuery({
       queryKey: ['chunks', 'detail', selectedChunkId],
    -  queryFn: () => selectedChunkId ? chunksApi.get(selectedChunkId) : null,
    +  queryFn: () => {
    +    if (!selectedChunkId) {
    +      // This should not be reached if `enabled` is correctly set, but it's good practice
    +      return Promise.reject(new Error("No chunk ID selected"));
    +    }
    +    return chunksApi.get(selectedChunkId);
    +  },
       enabled: !!selectedChunkId,
     });

Suggestion importance (1-10): 5
Why: The rewrite aligns `queryFn` with TanStack Query's idiomatic usage, although the original code already works because `enabled: !!selectedChunkId` keeps the query from running without an ID.

Possible issue: Ensure consistent document count logic (Impact: Medium)

Derive the `totalDocuments` count from the `processingMetrics` aggregation instead of querying the `document` table separately, so that it is computed over the same records as the other statistics.

apps/backend/src/routes/analytics/overview-route.ts [77-83]

    -  // Total documents in period
    -  prisma.document.count({
    -    where: {
    -      status: 'COMPLETED',
    -      createdAt: { gte: start, lte: end },
    -    },
    -  }),
    +  // Total documents in period is derived from metrics aggregation
    +  // to ensure consistency with other stats.
    +  // This avoids counting documents created in the period but processed outside of it.
    +  Promise.resolve(metricsAgg._count),

Note: this only works if `metricsAgg` has already been awaited when this array is built; if it is produced by the same `Promise.all`, read its count after the `await` instead. With Prisma's `aggregate`, `_count` is a plain number only when the query uses `_count: true`; with `_count: { _all: true }` the value is `metricsAgg._count._all`.

Suggestion importance (1-10): 7
Why: `totalDocuments` and the other metrics could otherwise be based on different sets of records, a subtle inconsistency on the analytics dashboard.

Possible issue: Fix incorrect slide marker replacement (Impact: Low)

Match a run of one or more consecutive `---` rules in a single replacement, so that consecutive horizontal rules produce one slide marker instead of leaving a stray separator and splitting slides incorrectly.

apps/ai-worker/src/converters/pptx_converter.py [127-128]

    -    # We look for --- surrounded by newlines
    -    if re.search(r"\n\s*---\s*\n", markdown):
    --        return re.sub(r"\n\s*---\s*\n", "\n\n\n\n", markdown)
    -+        return re.sub(r"\n\s*---\s*\n", f"\n\n{SLIDE_MARKER}\n\n", markdown)
    +    # Match one or more `---` rule lines (each after a newline) so consecutive separators collapse
    +    if re.search(r"(?:\n\s*---[ \t]*)+\n", markdown):
    +        return re.sub(r"(?:\n\s*---[ \t]*)+\n", f"\n\n{SLIDE_MARKER}\n\n", markdown)
     
         # Strategy B: Fallback to H1 Headers
         # Only if no --- found (e.g., custom template)

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance (1-10): 6
Why: With the single-rule pattern, each match consumes the newline that the next rule needs, so in `---\n---` only the first rule is replaced and a stray `---` remains, leading to incorrect chunking. Repeating the group collapses the whole run; keeping the trailing `\n` and allowing only spaces or tabs after `---` stops lines that merely start with `---` (such as `---foo`) from matching.

Possible issue: Refactor concurrency logic for robustness (Impact: Low)

Refactor `processQueue` to use a simpler worker-pool pattern: start `MAX_CONCURRENT` workers that each pull from a shared queue, and wait for all of them with `Promise.all` instead of the more complex `Promise.race` loop.

apps/frontend/src/components/documents/upload-dropzone.tsx [163-196]

     // Process queue with concurrency limit
     const processQueue = useCallback(async (items: UploadItem[]) => {
       const queue = [...items];
    -  const active: Promise<void>[] = [];
    -  const startNext = async () => {
    -    if (queue.length === 0) return;
    -
    -    const item = queue.shift()!;
    -    const promise = uploadFile(item).finally(() => {
    -      const idx = active.indexOf(promise);
    -      if (idx > -1) active.splice(idx, 1);
    -    });
    -    active.push(promise);
    +  const worker = async () => {
    +    while (queue.length > 0) {
    +      const item = queue.shift();
    +      if (item) {
    +        await uploadFile(item);
    +      }
    +    }
       };
    -  // Initial batch
    -  for (let i = 0; i < Math.min(MAX_CONCURRENT, items.length); i++) {
    -    startNext();
    -  }
    -
    -  // Process remaining
    -  while (queue.length > 0 || active.length > 0) {
    -    if (active.length > 0) {
    -      await Promise.race(active);
    -      if (queue.length > 0) {
    -        startNext();
    -      }
    -    }
    -  }
    +  const workers = Array(MAX_CONCURRENT).fill(null).map(worker);
    +  await Promise.all(workers);
       // Done - refresh documents list
       await queryClient.invalidateQueries({ queryKey: ['documents'] });
     }, [uploadFile, queryClient]);

Suggestion importance (1-10): 6
Why: The worker-pool pattern is a more readable and robust way to bound concurrency than the `Promise.race` bookkeeping, which improves maintainability.
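A minimal sketch of the retry-safe upsert from the callback-route suggestion, written without Prisma so it runs standalone. `MetricsData`, `FirstRunTimestamps`, and `buildMetricsUpsertArgs` are hypothetical names and the field list is trimmed; the returned object only mirrors the `where`/`update`/`create` arguments that Prisma's `upsert` takes. Because the first-run timestamps appear only in `create`, a retried callback refreshes the metrics but leaves `enqueuedAt` and `startedAt` untouched.

```typescript
// Hypothetical, trimmed subset of the metrics reported by the worker callback.
interface MetricsData {
  completedAt: Date;
  queueTimeMs: number;
  totalTimeMs: number;
  totalChunks: number;
}

// Timestamps describing the first processing attempt; they must not change on retries.
interface FirstRunTimestamps {
  enqueuedAt: Date;
  startedAt: Date;
}

interface MetricsUpsertArgs {
  where: { documentId: string };
  update: MetricsData;
  create: MetricsData & FirstRunTimestamps & { documentId: string };
}

// Builds upsert arguments from one shared payload so `update` and `create`
// cannot drift apart; first-run timestamps are written only on create.
function buildMetricsUpsertArgs(
  documentId: string,
  metricsData: MetricsData,
  firstRun: FirstRunTimestamps,
): MetricsUpsertArgs {
  return {
    where: { documentId },
    update: { ...metricsData },
    create: { ...metricsData, ...firstRun, documentId },
  };
}

const args = buildMetricsUpsertArgs(
  "doc-1",
  { completedAt: new Date("2025-12-26T10:05:00Z"), queueTimeMs: 1200, totalTimeMs: 9000, totalChunks: 12 },
  { enqueuedAt: new Date("2025-12-26T10:00:00Z"), startedAt: new Date("2025-12-26T10:00:01Z") },
);
console.log(Object.keys(args.update).join(", ")); // completedAt, queueTimeMs, totalTimeMs, totalChunks
console.log("startedAt" in args.create); // true
```

In the route, the resulting object would be handed to `prisma.processingMetrics.upsert(...)`; keeping one shared payload is what prevents the two branches from diverging when a metric is added later.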
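The slide-separator problem can be checked with a small TypeScript sketch. The converter itself is Python, but JavaScript regexes use the same syntax for this pattern, so the string can be passed to `re.sub` unchanged. The `SLIDE_MARKER` value and `collapseSlideSeparators` are hypothetical; the pattern is a one-or-more variant of the separator regex that still requires a newline after the last rule, not code taken from the repository.

```typescript
// Hypothetical placeholder; pptx_converter.py defines its own SLIDE_MARKER.
const SLIDE_MARKER = "[[SLIDE]]";

// A run of one or more `---` lines (blank lines allowed between them). The final
// `\n` keeps lines that only start with `---` (for example "---foo") from matching.
const SEPARATOR_RUN = /(?:\n\s*---[ \t]*)+\n/g;

// Replaces each run of consecutive `---` rules with a single slide marker.
function collapseSlideSeparators(markdown: string): string {
  return markdown.replace(SEPARATOR_RUN, `\n\n${SLIDE_MARKER}\n\n`);
}

// The single-rule pattern consumes the newline that the second rule needs.
const singleRule = /\n\s*---\s*\n/g;
const input = "Slide 1\n---\n---\nSlide 2";
console.log(JSON.stringify(input.replace(singleRule, `\n\n${SLIDE_MARKER}\n\n`)));
// "Slide 1\n\n[[SLIDE]]\n\n---\nSlide 2"   (stray rule left behind)
console.log(JSON.stringify(collapseSlideSeparators(input)));
// "Slide 1\n\n[[SLIDE]]\n\nSlide 2"
```

Requiring the trailing newline matters: without it, a pattern like `(\n\s*---\s*)+` would also rewrite the start of `----` rules or of any line that begins with three dashes.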

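The worker-pool pattern from the `processQueue` suggestion can be pulled out into a standalone helper that runs outside React. `runWithConcurrency` is a hypothetical name, and the fake `task` below stands in for `uploadFile`. If the task can reject, `Promise.all` rejects on the first failure while the remaining workers keep draining the queue, so a follow-up step such as the `invalidateQueries` refresh would be skipped; catching errors inside the task avoids that.

```typescript
// Runs `task` for every item with at most `limit` tasks in flight at once.
// Each worker pulls from a shared queue until it is empty. Because this code
// runs on a single JavaScript thread, two workers can never shift the same item.
async function runWithConcurrency<T>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  const queue = [...items];

  const worker = async (): Promise<void> => {
    while (queue.length > 0) {
      const item = queue.shift() as T;
      await task(item);
    }
  };

  // Start no more workers than there are items.
  const workerCount = Math.max(0, Math.min(limit, queue.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Usage: seven fake uploads with a limit of 3, tracking the peak number in flight.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let peak = 0;
let finished = 0;

runWithConcurrency([1, 2, 3, 4, 5, 6, 7], 3, async (n) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await sleep(10 * n);
  inFlight--;
  finished++;
}).then(() => console.log(`peak=${peak} finished=${finished}`)); // peak=3 finished=7
```

Compared with the `Promise.race` version, there is no array of active promises to maintain: concurrency is bounded simply by how many workers are started.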
namtroi added 9 commits December 25, 2025 21:25

docs: refine analytics plan with TDD approach

3e5894a

docs: Refine Analytics Dashboard plan with TDD test cases and granular tasks

774f18d

feat: implement analytics dashboard and chunks explorer

26eaa97

namtroi merged commit bde88d3 into main Dec 26, 2025
7 checks passed

qodo-code-review Bot added the Review effort 3/5 label Dec 26, 2025


Analytics/dashboard#59

namtroi merged 9 commits into main from analytics/dashboard

namtroi commented Dec 26, 2025 (edited by qodo-code-review Bot)


qodo-code-review Bot commented Dec 26, 2025: PR Compliance Guide 🔍

qodo-code-review Bot commented Dec 26, 2025: PR Code Suggestions ✨

