
Analytics/dashboard#59

Merged
namtroi merged 9 commits into main from analytics/dashboard on Dec 26, 2025

@namtroi (Owner) commented on Dec 26, 2025

PR Type

Enhancement, Tests, Documentation


Description

  • Analytics Dashboard: New comprehensive analytics system with overview, processing time breakdown, quality metrics, and documents routes providing RAG pipeline performance insights

  • Metrics Collection: End-to-end metrics tracking from AI worker through backend callback to analytics endpoints, including timing (queue, conversion, chunking, embedding) and quality aggregation

  • Chunks Explorer: New API and UI for browsing document chunks with filtering by quality level, chunk type, and content search, plus detailed chunk view modal

  • Frontend Analytics UI: New analytics dashboard component displaying pipeline funnel visualization across 5 stages with period selector (24h, 7d, 30d) and summary statistics

  • Multi-file Upload: Enhanced document upload with batch processing (max 3 concurrent, 20 file limit) and real-time progress tracking via UploadStatusPopup

  • Custom UI Components: New Select dropdown component and useClickOutside hook for improved form consistency across the application

  • Database Schema: Added ProcessingMetrics model for storing detailed pipeline metrics with one-to-one relationship to Document

  • Comprehensive Testing: Integration tests for analytics API, chunks API, metrics callbacks, and E2E analytics flow; unit tests for metrics collector in AI worker

  • Documentation: Updated implementation plan with TDD approach and test specifications; new hybrid search feature documentation


Diagram Walkthrough

flowchart LR
  A["AI Worker<br/>MetricsCollector"] -->|"timing, size,<br/>quality metrics"| B["Callback<br/>ProcessingMetrics"]
  B -->|"upsert metrics"| C["Backend<br/>ProcessingMetrics DB"]
  C -->|"aggregate data"| D["Analytics Routes<br/>overview, processing,<br/>quality, documents"]
  D -->|"fetch metrics"| E["Frontend<br/>Analytics Dashboard"]
  C -->|"query chunks"| F["Chunks API<br/>list & detail"]
  F -->|"fetch chunks"| G["Frontend<br/>Chunks Explorer"]
  H["Document Upload"] -->|"batch process<br/>max 3 concurrent"| A
  H -->|"progress tracking"| I["UploadStatusPopup"]

File Walkthrough

Relevant files
Tests
7 files
analytics-api.test.ts
Analytics API integration tests with metrics validation   

apps/backend/tests/integration/routes/analytics-api.test.ts

  • Comprehensive integration tests for analytics API endpoints covering
    overview, processing, quality, and documents routes
  • Tests for document chunk retrieval with pagination and ordering
  • Helper function seedDocumentWithMetrics to create test data with
    ProcessingMetrics
  • Validates API responses for period filtering (24h, 7d, 30d) and metric
    calculations
+345/-0 
chunks-api.test.ts
Chunks explorer API integration tests with filtering         

apps/backend/tests/integration/routes/chunks-api.test.ts

  • Integration tests for chunks explorer API with pagination and
    filtering
  • Tests for quality filtering (excellent/good/low), chunk type
    filtering, and content search
  • Helper function seedChunk to create test chunks with quality scores
    and metadata
  • Validates chunk detail endpoint and 404 error handling
+266/-0 
callback-metrics.test.ts
Processing metrics callback integration tests                       

apps/backend/tests/integration/routes/callback-metrics.test.ts

  • Tests for ProcessingMetrics record creation during document callback
    processing
  • Validates queue time calculation from document creation to processing
    start
  • Tests user wait time calculation as sum of queue and processing times
  • Tests metrics upsert on retry callbacks and graceful handling of
    missing metrics
+336/-0 
analytics-e2e.test.ts
Analytics end-to-end flow integration tests                           

apps/backend/tests/integration/routes/analytics-e2e.test.ts

  • End-to-end tests for complete analytics flow from document processing
    to metrics retrieval
  • Tests ProcessingMetrics creation via callback and reflection in
    analytics endpoints
  • Validates chunks explorer after document processing with quality
    filtering
  • Tests processing time breakdown and metrics aggregation
+266/-0 
test_metrics.py
Metrics collector unit tests                                                         

apps/ai-worker/tests/test_metrics.py

  • Comprehensive test suite for MetricsCollector class with 30+ test
    cases
  • Tests timing capture for conversion, chunking, and embedding stages
  • Tests size metrics, chunking efficiency metrics, and quality
    aggregation
  • Tests oversized chunk detection and edge cases like empty chunks and
    missing metadata
+260/-0 
test_text_processor.py
Text processor tests clarification                                             

apps/ai-worker/tests/test_text_processor.py

  • Updated test comments to reflect actual behavior of text processing
  • Added explicit use of JsonConverter for JSON file tests
  • Clarified that MD content is post-processed but structure is preserved
  • Tests now properly instantiate JsonConverter for JSON-specific tests
+15/-8   
test_main.py
Main tests pipeline mock updates                                                 

apps/ai-worker/tests/test_main.py

  • Updated mock pipeline return values to match new tuple format (chunks,
    embedding_time_ms)
  • Tests now expect pipeline.run() to return tuple instead of list
+8/-2     
Enhancement
34 files
documents-route.ts
Analytics documents and chunks detail routes                         

apps/backend/src/routes/analytics/documents-route.ts

  • New route GET /api/analytics/documents returning paginated list of
    documents with processing metrics
  • New route GET /api/analytics/documents/:id/chunks returning chunks for
    specific document
  • Supports period filtering (24h, 7d, 30d, all) and sorting by
    totalTimeMs, avgQualityScore, or createdAt
  • Includes pagination with configurable page and limit parameters
+217/-0 
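
As a rough illustration of the page and limit parameters described for documents-route.ts, the sketch below converts them into an offset window. The names `toPageWindow` and `PageWindow`, and the cap of 100 items per page, are hypothetical and are not taken from the PR.

```typescript
// Hypothetical helper: turns 1-based page/limit query values into an offset window.
interface PageWindow {
  skip: number;       // rows to skip before the first returned row
  take: number;       // rows to return
  totalPages: number; // pages needed to cover `total` rows at this limit
}

function toPageWindow(page: number, limit: number, total: number, maxLimit = 100): PageWindow {
  // Clamp inputs so malformed query values cannot produce negative offsets.
  const safePage = Number.isFinite(page) ? Math.max(1, Math.floor(page)) : 1;
  const safeLimit = Number.isFinite(limit)
    ? Math.min(Math.max(1, Math.floor(limit)), maxLimit)
    : maxLimit;
  return {
    skip: (safePage - 1) * safeLimit,
    take: safeLimit,
    totalPages: Math.ceil(total / safeLimit),
  };
}
```

The `skip` and `take` names follow the usual offset-pagination vocabulary; whether the route uses offset pagination internally is an assumption.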
chunks-route.ts
Chunks explorer API routes with filtering                               

apps/backend/src/routes/chunks/chunks-route.ts

  • New route GET /api/chunks for paginated, filterable chunks list with
    default 20 items per page
  • Supports filtering by documentId, quality level (excellent/good/low),
    chunk type, quality flags, and content search
  • New route GET /api/chunks/:id returning full chunk detail with
    metadata
  • Returns 404 for non-existent chunks with proper error response
+182/-0 
endpoints.ts
Frontend API types and endpoints for analytics                     

apps/frontend/src/api/endpoints.ts

  • Added TypeScript interfaces for analytics data structures
    (AnalyticsOverview, AnalyticsProcessing, AnalyticsQuality)
  • Added interfaces for chunk data (ChunkListItem, ChunkDetail) and
    document metrics
  • New analyticsApi object with methods for overview, processing,
    quality, and documents endpoints
  • New chunksApi object with list and detail methods for chunks explorer
+147/-0 
processing-route.ts
Analytics processing time breakdown route                               

apps/backend/src/routes/analytics/processing-route.ts

  • New route GET /api/analytics/processing returning processing time
    breakdown by stage
  • Aggregates average conversion, chunking, embedding, queue, and user
    wait times
  • Includes trends data grouped by hour (24h) or day (other periods)
  • Supports period filtering with date range calculations
+118/-0 
quality-route.ts
Analytics quality metrics route                                                   

apps/backend/src/routes/analytics/quality-route.ts

  • New route GET /api/analytics/quality returning quality score
    distribution and flags breakdown
  • Calculates distribution across excellent (>=0.85), good (0.70-0.84),
    and low (<0.70) categories
  • Aggregates quality flags from ProcessingMetrics with counts
  • Supports period filtering for time-based analysis
+120/-0 
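
The thresholds listed for quality-route.ts can be sketched as a small classifier. This is a minimal illustration, not the route's actual code: `classifyQuality` and `qualityDistribution` are hypothetical names, and treating "good" as the half-open range from 0.70 up to (but not including) 0.85 is an assumption made so that scores such as 0.845 land in exactly one bucket.

```typescript
type QualityLevel = "excellent" | "good" | "low";

// Assumption: buckets are half-open, so every score maps to exactly one level.
function classifyQuality(score: number): QualityLevel {
  if (score >= 0.85) return "excellent";
  if (score >= 0.7) return "good";
  return "low";
}

// Counts how many scores fall into each level.
function qualityDistribution(scores: number[]): Record<QualityLevel, number> {
  const counts: Record<QualityLevel, number> = { excellent: 0, good: 0, low: 0 };
  for (const score of scores) {
    counts[classifyQuality(score)] += 1;
  }
  return counts;
}
```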
callback-route.ts
Callback route ProcessingMetrics creation                               

apps/backend/src/routes/internal/callback-route.ts

  • Added Phase 5 ProcessingMetrics record creation in callback handler
  • Calculates queueTimeMs from document creation to processing start
  • Calculates userWaitTimeMs as sum of queue and total processing times
  • Uses upsert to handle retry scenarios, stores all timing and quality
    metrics
+78/-0   
overview-route.ts
Analytics overview dashboard route                                             

apps/backend/src/routes/analytics/overview-route.ts

  • New route GET /api/analytics/overview returning summary statistics for
    dashboard
  • Aggregates total documents, chunks, average processing times, and
    quality scores
  • Calculates queue time and user wait time averages
  • Supports period filtering (24h, 7d, 30d, all) with date range
    calculations
+105/-0 
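
The period filtering shared by the analytics routes can be pictured as a mapping from a period label to a date range. The sketch below is an assumption about how such a helper might look: `periodToRange` is a hypothetical name, and using the Unix epoch as the start of the `all` range is an illustrative choice rather than something the PR states.

```typescript
type Period = "24h" | "7d" | "30d" | "all";

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the [start, end] window for a period, ending at `now`.
function periodToRange(period: Period, now: Date = new Date()): { start: Date; end: Date } {
  const end = now;
  switch (period) {
    case "24h":
      return { start: new Date(end.getTime() - DAY_MS), end };
    case "7d":
      return { start: new Date(end.getTime() - 7 * DAY_MS), end };
    case "30d":
      return { start: new Date(end.getTime() - 30 * DAY_MS), end };
    case "all":
      // Assumption: "all" is modelled as everything since the Unix epoch.
      return { start: new Date(0), end };
  }
}
```

Passing `now` explicitly keeps the function deterministic, which makes period boundaries easy to assert in integration tests.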
callback-validator.ts
Callback validator metrics schema                                               

apps/backend/src/validators/callback-validator.ts

  • Added MetricsSchema for validating detailed processing metrics in
    callbacks
  • Includes timing fields (conversionTimeMs, chunkingTimeMs,
    embeddingTimeMs, totalTimeMs)
  • Includes size, chunking efficiency, and quality metrics fields
  • Added ProcessingMetrics type export for callback payload validation
+22/-0   
use-analytics.ts
Frontend analytics custom hooks                                                   

apps/frontend/src/hooks/use-analytics.ts

  • New custom hooks for analytics data fetching using React Query
  • useAnalyticsOverview, useAnalyticsProcessing, useAnalyticsQuality
    hooks with period parameter
  • useAnalyticsDocuments hook with pagination and period support
  • useDocumentChunks hook for fetching chunks of specific document
+45/-0   
app.ts
App registration of analytics routes                                         

apps/backend/src/app.ts

  • Imported analytics routes (overview, processing, quality, documents)
    from new index
  • Imported chunks route from new index
  • Registered all analytics and chunks routes with protected scope
  • Routes are protected by API key authentication
+10/-1   
use-click-outside.ts
Click outside detection utility hook                                         

apps/frontend/src/hooks/use-click-outside.ts

  • New utility hook for detecting clicks outside a referenced element
  • Handles both mouse and touch events
  • Properly cleans up event listeners on unmount
  • Generic type parameter for HTML element type
+27/-0   
use-chunks.ts
Frontend chunks explorer custom hooks                                       

apps/frontend/src/hooks/use-chunks.ts

  • New custom hooks for chunks explorer data fetching
  • useChunksList hook with optional filtering parameters
  • useChunkDetail hook for fetching single chunk with longer cache time
  • Both hooks use React Query for state management
+19/-0   
python-worker-mock.ts
Python worker mock metrics support                                             

apps/backend/tests/mocks/python-worker-mock.ts

  • Added optional metrics field to ProcessingResult interface
  • Allows mock to return detailed processing metrics in callback
    responses
+1/-0     
metrics.py
Metrics collection for processing pipeline                             

apps/ai-worker/src/metrics.py

  • New MetricsCollector class for capturing processing pipeline metrics
  • Dataclasses for TimingMetrics, SizeMetrics, ChunkingMetrics,
    QualitySummary
  • Methods for timing stages, setting size/chunking/quality metrics
  • to_dict() method converts metrics to callback payload format
+174/-0 
main.py
AI worker metrics collection integration                                 

apps/ai-worker/src/main.py

  • Integrated MetricsCollector into document processing workflow
  • Captures timing for conversion, chunking, and embedding stages
  • Records size metrics (raw bytes and markdown characters)
  • Collects chunking efficiency and quality metrics, includes metrics in
    callback
+32/-3   
pptx_converter.py
PPTX converter slide marker constant                                         

apps/ai-worker/src/converters/pptx_converter.py

  • Extracted slide marker string to constant SLIDE_MARKER for
    maintainability
  • Updated slide marker injection logic to use constant
  • Fixed slide count calculation to use constant
  • Improved code clarity with constant reference instead of magic string
+11/-8   
pipeline.py
Pipeline embedding time measurement                                           

apps/ai-worker/src/pipeline.py

  • Modified run() method to return tuple of (chunks, embedding_time_ms)
  • Added timing measurement for embedding stage
  • Returns embedding time separately for metrics collection
  • Updated return type annotation and docstring
+11/-7   
models.py
Processing result metrics field                                                   

apps/ai-worker/src/models.py

  • Added optional metrics field to ProcessingResult dataclass
  • Allows detailed processing metrics to be included in callback payload
+2/-0     
epub_converter.py
EPUB converter chapter count tracking                                       

apps/ai-worker/src/converters/epub_converter.py

  • Added chapter_count field to ProcessorOutput return value
  • Tracks number of chapters extracted from EPUB file
+5/-1     
callback.py
Callback metrics payload inclusion                                             

apps/ai-worker/src/callback.py

  • Added metrics field to callback payload from ProcessingResult
  • Includes detailed processing metrics in callback sent to backend
+2/-0     
presentation_chunker.py
Presentation chunker slide marker update                                 

apps/ai-worker/src/chunkers/presentation_chunker.py

  • Updated slide marker from empty string to constant
  • Improves readability and maintainability of presentation chunking
+1/-1     
ChunkCard.tsx
Chunk card display component                                                         

apps/frontend/src/components/chunks/ChunkCard.tsx

  • New component for displaying chunk list items with quality
    visualization
  • Shows filename, chunk index, content preview, and quality score badge
  • Displays quality flags and chunk type metadata
  • Includes "View Details" button for accessing full chunk information
+112/-0 
upload-dropzone.tsx
Multi-file batch upload with progress tracking                     

apps/frontend/src/components/documents/upload-dropzone.tsx

  • Refactored from single-file upload to multi-file batch upload with
    concurrent processing (max 3 concurrent, 20 file limit)
  • Replaced custom hook useUploadDocument with direct documentsApi.upload
    calls and useQueryClient for cache invalidation
  • Added UploadStatusPopup component displaying real-time upload progress
    with status indicators (pending, uploading, success, error)
  • Implemented queue processing with concurrency control and automatic
    document list refresh after uploads complete
+225/-69
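
The queue processing with concurrency control described for upload-dropzone.tsx could look roughly like the worker-pool sketch below. It is not the component's actual code: `runWithConcurrency` and `Outcome` are hypothetical names, and the 20-file limit is not modelled here because the PR summary does not say how excess files are handled.

```typescript
// Hypothetical result type: one entry per input item, success or failure.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` tasks in flight at once.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the queue is drained.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await worker(items[i], i) };
      } catch (error) {
        // A failed item is recorded and does not stop the other uploads.
        results[i] = { ok: false, error };
      }
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With `limit` set to 3 this matches the "max 3 concurrent" behaviour from the description; per-item status values such as pending, uploading, success, and error could be updated inside `worker`.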
AnalyticsPage.tsx
New analytics dashboard with pipeline funnel view               

apps/frontend/src/components/analytics/AnalyticsPage.tsx

  • New analytics dashboard component displaying RAG pipeline performance
    metrics across 5 stages (Upload/Queue, Conversion, Chunking, Quality,
    Embedding)
  • Implements period selector (24h, 7d, 30d) and summary stat cards with
    formatted metrics (documents, processing time, quality score, chunks)
  • Pipeline funnel visualization with stage-specific metrics and quality
    flag breakdown in footer
  • Total pipeline time and user wait time display with custom styling
+261/-0 
ChunksExplorerPage.tsx
New chunks explorer with detail modal                                       

apps/frontend/src/components/chunks/ChunksExplorerPage.tsx

  • New chunks explorer page with paginated list view (20 items per page)
    and filtering by search, quality level, and chunk type
  • ChunkDetailModal component showing full chunk metadata (quality score,
    tokens, position, breadcrumbs, quality flags, content)
  • Quality color coding helper function for visual distinction
    (excellent/good/low scores)
  • Integration with chunksApi for fetching chunks list and individual
    chunk details
+219/-0 
App.tsx
Restructured app navigation with analytics and chunks tabs

apps/frontend/src/App.tsx

  • Reorganized navigation: combined header and tabs into single header
    with logo, main nav, and settings button
  • Added two new tabs: analytics and chunks alongside existing documents,
    drive, query, and settings
  • Updated tab ordering and styling with improved visual hierarchy
  • Added new page content sections for AnalyticsPage and
    ChunksExplorerPage
+78/-45 
document-filters.tsx
Migrate filters to custom Select component                             

apps/frontend/src/components/documents/document-filters.tsx

  • Replaced native select elements with custom Select component for
    consistent styling and functionality
  • Added options definitions for sort, status, availability, connection,
    and source filters
  • Improved input styling with hover states and focus rings
  • Maintained all existing filter functionality with new component API
+65/-44 
pagination.tsx
Enhance pagination with jump-to-page input                             

apps/frontend/src/components/documents/pagination.tsx

  • Replaced native select with custom Select component for page size
    selection
  • Added "jump to page" input field allowing users to type page number
    and press Enter
  • Improved button styling with transition effects and better visual
    organization
  • Reorganized pagination controls into logical button groups
+76/-46 
select.tsx
New custom Select component for form inputs                           

apps/frontend/src/components/ui/select.tsx

  • New reusable Select component with dropdown menu, keyboard support,
    and optional count badges
  • Features click-outside detection via useClickOutside hook, animated
    dropdown, and selected state highlighting
  • Supports disabled state and custom placeholder text
  • Provides consistent styling across the application with primary color
    theming
+122/-0 
schema.prisma
Add ProcessingMetrics schema for analytics data                   

apps/backend/prisma/schema.prisma

  • Added new ProcessingMetrics model to store detailed pipeline metrics
    (queue time, conversion, chunking, embedding times)
  • Includes size metrics (rawSizeBytes, markdownSizeChars), chunking
    efficiency, and aggregated quality data
  • One-to-one relationship with Document model with cascade delete
  • Added deprecation comment on processingMetadata JSON field, marking it
    for migration
+48/-2   
document-list.tsx
Simplify document list and use Select component                   

apps/frontend/src/components/documents/document-list.tsx

  • Removed header with refresh button from document list component (moved
    to parent)
  • Replaced native select with custom Select component for folder
    filtering
  • Removed unused imports (RefreshCw) and simplified component structure
  • Maintained all filtering and pagination functionality
+16/-28 
search-form.tsx
Redesign search form with horizontal layout                           

apps/frontend/src/components/query/search-form.tsx

  • Refactored search form layout from vertical to horizontal with flex
    layout
  • Replaced native select with custom Select component for results count
    selection
  • Improved input styling with search icon and better focus states
  • Moved loading indicator inside input field instead of button
+19/-34 
ChunkFilters.tsx
New chunk filters component with search and dropdowns       

apps/frontend/src/components/chunks/ChunkFilters.tsx

  • New filter component for chunks explorer with search input, quality
    level filter, and chunk type filter
  • Implements debounced search (300ms) to avoid excessive filter updates
  • Uses custom Select component for quality and type dropdowns with
    optional count badges
  • Exports ChunkFilterState interface for type-safe filter management
+88/-0   
DriveSyncTab.tsx
Update Drive Sync tab header styling                                         

apps/frontend/src/components/drive/DriveSyncTab.tsx

  • Updated header styling to match new design pattern with icon and
    description
  • Changed heading from text-xl to text-lg for consistency with other
    pages
  • Added FolderSync icon to header for visual consistency
  • Improved header layout and typography
+9/-6     
Miscellaneous
2 files
index.ts
Analytics routes barrel export                                                     

apps/backend/src/routes/analytics/index.ts

  • New barrel export file for analytics routes
  • Exports overview, processing, quality, and documents routes
+4/-0     
index.ts
Chunks route barrel export                                                             

apps/backend/src/routes/chunks/index.ts

  • New barrel export file for chunks routes
  • Exports chunks route
+1/-0     
Configuration changes
1 files
package.json
Development script and dependencies                                           

package.json

  • Added dev:all script for running backend, worker, and frontend
    concurrently
  • Added concurrently dependency for parallel process management
  • Script uses docker-compose for services and activates Python virtual
    environment
+2/-0     
Dependencies
1 files
pnpm-lock.yaml
Add concurrently package dependency                                           

pnpm-lock.yaml

  • Added concurrently@9.2.1 dependency with transitive dependencies
    (rxjs, shell-quote, supports-color, tree-kill)
  • Updated lock file to reflect new package versions and their dependency
    trees
+44/-0   
Documentation
2 files
extension-analytics-dashboard.md
Add TDD test specifications to analytics implementation plan

docs/extension-analytics-dashboard.md

  • Updated implementation phases with TDD approach emphasis and
    test-first methodology
  • Added detailed test specifications for Phase 1 (metrics collection),
    Phase 2 (API development), and Phase 4 (E2E tests)
  • Expanded backend implementation details with specific test cases for
    metrics, analytics endpoints, and chunks API
  • Added note about no frontend tests per project decision, focusing on
    worker and backend tests
+128/-16
extension-hybrid-search.md
New hybrid search implementation plan documentation           

docs/extension-hybrid-search.md

  • New documentation for hybrid search feature combining semantic
    (vector) and keyword (BM25) search with RRF reranking
  • Includes schema updates for tsvector column, backend service
    implementation, API validator changes, and frontend UI updates
  • Provides verification plan with unit tests, integration tests, and
    manual testing procedures
  • Specifies implementation order using TDD approach with estimated 4-5
    hour timeline
+205/-0 

- Add ProcessingMetrics model to schema with timing, size, chunking, quality fields
- Add MetricsCollector class in AI worker for timing instrumentation
- Update pipeline.py to return embedding timing
- Integrate metrics collection into main.py processing flow
- Extend callback payload with detailed metrics
- Add MetricsSchema to callback-validator.ts
- Update callback-route.ts to persist ProcessingMetrics records
- Calculate queueTimeMs and userWaitTimeMs in backend
- Add 26 unit tests for MetricsCollector (all passing)
- Add 8 integration tests for callback metrics
- Implement Analytics API endpoints (overview, processing, quality, documents)
- Implement Chunks Explorer API endpoints
- Add ProcessingMetrics schema and relation
- Implement AI Worker metrics collection (MetricsCollector)
- Fix AI Worker tests (PresentationChunker, PPTX, EPUB)
- Fix Backend integration tests (Vector syntax, Auth headers)
- Add Analytics Dashboard with Pipeline Funnel view
- Add Chunks Explorer with filtering and detail modal
- Replace Tremor components with native Tailwind for TailwindCSS v4 compatibility
- Add analytics and chunks API endpoints to frontend
- Add E2E tests for analytics flow (analytics-e2e.test.ts)
- Fix API types to match backend response format
- Remove @tremor/react package completely
- Replace Tremor Badge with native Tailwind spans
- Replace Tremor TabGroup with native period selector buttons
- Sync Drive Sync header font with Documents page
- Move Settings tab to right side of navigation
- Reduce bundle size (323KB JS, 30KB CSS)
- Rename analytics-dashboard.md to extension-analytics-dashboard.md
- Rename hybrid-search-implementation.md to extension-hybrid-search.md
- Update chunks filter placeholder: Search -> Filter by text
- Add hybrid search implementation plan with TDD approach
- Replace native select elements with custom Select component
- Implement consistent header design (Icon + Title + Subtitle) across all tabs
- Align Drive Sync and Analytics actions with headers
- Remove redundant headers and Refresh button
- Standardize Search input styling (icon position, no label)
@namtroi merged commit bde88d3 into main on Dec 26, 2025
7 checks passed
@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Raw SQL injection

Description: The trends query uses Prisma.raw to inject the DATE_TRUNC interval (line 79/85), which can
become a SQL-injection vector if truncInterval is ever derived from user-controlled input
or expanded beyond the current hardcoded 'hour' | 'day' values.
processing-route.ts [70-88]

Referred Code
const truncInterval = period === '24h' ? 'hour' : 'day';
const trends = await prisma.$queryRaw<Array<{
  date: Date;
  count: bigint;
  avg_total_time: number;
  avg_queue_time: number;
}>>(
  Prisma.sql`
    SELECT 
      DATE_TRUNC(${Prisma.raw(`'${truncInterval}'`)}, created_at) as date,
      COUNT(*) as count,
      AVG(total_time_ms) as avg_total_time,
      AVG(queue_time_ms) as avg_queue_time
    FROM processing_metrics
    WHERE created_at >= ${start} AND created_at <= ${end}
    GROUP BY DATE_TRUNC(${Prisma.raw(`'${truncInterval}'`)}, created_at)
    ORDER BY date ASC
  `
);
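
One way to keep the interpolated interval safe if it ever stops being hardcoded is to check it against a closed allowlist immediately before it reaches the raw SQL fragment. The sketch below is dependency-free and only illustrates that guard: `TRUNC_INTERVALS`, `truncIntervalFor`, and `assertTruncInterval` are hypothetical names, not identifiers from processing-route.ts.

```typescript
// Closed set of values that may ever be spliced into DATE_TRUNC.
const TRUNC_INTERVALS = ["hour", "day"] as const;
type TruncInterval = (typeof TRUNC_INTERVALS)[number];

function isTruncInterval(value: unknown): value is TruncInterval {
  return typeof value === "string" && (TRUNC_INTERVALS as readonly string[]).includes(value);
}

// Mirrors the rule in the referred code: hourly buckets for 24h, daily otherwise.
function truncIntervalFor(period: string): TruncInterval {
  return period === "24h" ? "hour" : "day";
}

// Hypothetical guard to call right before the value is embedded in raw SQL.
function assertTruncInterval(value: unknown): TruncInterval {
  if (!isTruncInterval(value)) {
    throw new Error("Unsupported DATE_TRUNC interval");
  }
  return value;
}
```

Because the guard rejects anything outside the allowlist, a later change that derives the interval from request input would fail loudly instead of reaching the database.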
Sensitive data exposure

Description: The GET /api/chunks/:id endpoint returns full chunk content and detailed metadata
(including location) which can expose sensitive document text to anyone with API access,
so access control/scoping should be verified to prevent unintended data leakage across
tenants/users.
chunks-route.ts [123-180]

Referred Code
fastify.get<{ Params: { id: string } }>('/api/chunks/:id', async (request, reply) => {
  const { id } = request.params;
  const prisma = getPrismaClient();

  const chunk = await prisma.chunk.findUnique({
    where: { id },
    select: {
      id: true,
      documentId: true,
      chunkIndex: true,
      content: true,
      charStart: true,
      charEnd: true,
      qualityScore: true,
      qualityFlags: true,
      chunkType: true,
      completeness: true,
      hasTitle: true,
      breadcrumbs: true,
      tokenCount: true,
      location: true,


 ... (clipped 37 lines)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Invalid date handling: The code constructs new Date(m.startedAt) / new Date(m.completedAt) without validating
parse success, which can produce Invalid Date and lead to queueTimeMs becoming NaN and
being persisted.

Referred Code
// Calculate queue time from document creation (enqueued) to processing start
let queueTimeMs = 0;
let startedAt: Date | null = null;
let completedAt: Date | null = null;

if (m.startedAt) {
  startedAt = new Date(m.startedAt);
  // Use document.createdAt as enqueuedAt (when job was added to queue)
  queueTimeMs = Math.max(0, startedAt.getTime() - document.createdAt.getTime());
}

if (m.completedAt) {
  completedAt = new Date(m.completedAt);
}

const totalTimeMs = m.totalTimeMs || 0;
const userWaitTimeMs = queueTimeMs + totalTimeMs;
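
A minimal sketch of how the NaN case could be avoided, assuming malformed timestamps should simply be treated as missing. `parseTimestamp` and `computeWaitTimes` are hypothetical helpers written for illustration; they are not part of callback-route.ts.

```typescript
// Hypothetical helper: returns null instead of an Invalid Date.
function parseTimestamp(value: string | undefined): Date | null {
  if (!value) return null;
  const parsed = new Date(value);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

interface WaitTimes {
  queueTimeMs: number;    // document creation to processing start
  userWaitTimeMs: number; // queue time plus total processing time
}

function computeWaitTimes(
  createdAt: Date,
  startedAtRaw: string | undefined,
  totalTimeMs: number | undefined,
): WaitTimes {
  const startedAt = parseTimestamp(startedAtRaw);
  // Assumption: a missing or malformed startedAt falls back to a queue time of 0.
  const queueTimeMs = startedAt
    ? Math.max(0, startedAt.getTime() - createdAt.getTime())
    : 0;
  return { queueTimeMs, userWaitTimeMs: queueTimeMs + (totalTimeMs ?? 0) };
}
```

For example, a document created at 10:00:00 whose processing started at 10:00:05 and took 1500 ms yields a queue time of 5000 ms and a user wait time of 6500 ms, while a malformed `startedAt` yields 0 and 1500 ms instead of NaN.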

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Leaky validation errors: The route returns queryResult.error.message directly to clients, potentially exposing
internal validation/schema details instead of a generic user-facing error.

Referred Code
fastify.get('/api/analytics/overview', async (request, reply) => {
  const queryResult = PeriodQuerySchema.safeParse(request.query);

  if (!queryResult.success) {
    return reply.status(400).send({
      error: 'VALIDATION_ERROR',
      message: queryResult.error.message,
    });
  }

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Weak timestamp validation: startedAt and completedAt are accepted as arbitrary strings (not validated as datetimes),
enabling malformed inputs that later break processing and can persist invalid derived
metrics.

Referred Code
const MetricsSchema = z.object({
  startedAt: z.string().optional(),
  completedAt: z.string().optional(),
  conversionTimeMs: z.number().int().nonnegative().optional(),
  chunkingTimeMs: z.number().int().nonnegative().optional(),
  embeddingTimeMs: z.number().int().nonnegative().optional(),
  totalTimeMs: z.number().int().nonnegative().optional(),
  rawSizeBytes: z.number().int().nonnegative().optional(),
  markdownSizeChars: z.number().int().nonnegative().optional(),
  totalChunks: z.number().int().nonnegative().optional(),
  avgChunkSize: z.number().nonnegative().optional(),
  oversizedChunks: z.number().int().nonnegative().optional(),
  avgQualityScore: z.number().min(0).max(1).optional(),
  qualityFlags: z.record(z.number()).optional(),
  totalTokens: z.number().int().nonnegative().optional(),
}).passthrough();

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing user context: The new metrics write path logs processing_metrics_saved without any user identifier or
request context, so audit trails may be insufficient depending on whether global
request/user logging exists elsewhere.

Referred Code
// Phase 5: Create ProcessingMetrics record
if (result.metrics) {
  const m = result.metrics;

  // Calculate queue time from document creation (enqueued) to processing start
  let queueTimeMs = 0;
  let startedAt: Date | null = null;
  let completedAt: Date | null = null;

  if (m.startedAt) {
    startedAt = new Date(m.startedAt);
    // Use document.createdAt as enqueuedAt (when job was added to queue)
    queueTimeMs = Math.max(0, startedAt.getTime() - document.createdAt.getTime());
  }

  if (m.completedAt) {
    completedAt = new Date(m.completedAt);
  }

  const totalTimeMs = m.totalTimeMs || 0;
  const userWaitTimeMs = queueTimeMs + totalTimeMs;


 ... (clipped 55 lines)


Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review


PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Separate chunking from embedding timing

To fix a double-counting error, explicitly calculate chunking_time_ms by
subtracting embedding_time_ms from the total pipeline duration.

apps/ai-worker/src/main.py [154-159]

 metrics_collector.start_stage()
 chunks, embedding_time_ms = processing_pipeline.run(
     output.markdown, category
 )
-metrics_collector.end_chunking()
+pipeline_duration_ms = metrics_collector.end_chunking()
+# subtract embedding to get actual chunking time
+chunking_time_ms = max(0, pipeline_duration_ms - embedding_time_ms)
+metrics_collector._metrics.timing.chunking_time_ms = chunking_time_ms
 metrics_collector.set_embedding_time(embedding_time_ms)
Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies a critical bug where embedding_time_ms was being double-counted in the total processing time. The proposed fix accurately separates chunking and embedding timings, ensuring correct metric calculation.
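
The arithmetic behind this fix can be shown with a small worked example. The sketch below is a TypeScript illustration of the same subtraction (the worker itself is Python); `splitPipelineTiming` is a hypothetical name, and the numbers are made up.

```typescript
interface StageTiming {
  chunkingMs: number;
  embeddingMs: number;
}

// Hypothetical helper mirroring the suggested fix: the measured pipeline stage
// covers chunking plus embedding, so chunking is the (non-negative) remainder.
function splitPipelineTiming(pipelineMs: number, embeddingMs: number): StageTiming {
  const chunkingMs = Math.max(0, pipelineMs - embeddingMs);
  return { chunkingMs, embeddingMs };
}

// Illustrative numbers: a 1200 ms pipeline run of which 900 ms was embedding.
const timing = splitPipelineTiming(1200, 900);
const summed = timing.chunkingMs + timing.embeddingMs;
```

With these inputs the chunking share is 300 ms and the two stages sum back to the measured 1200 ms. Recording the full 1200 ms as chunking time and then adding 900 ms of embedding would report 2100 ms for the same work.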

High
Refactor upsert and prevent overwriting timestamps

Refactor the upsert operation to avoid code duplication. Prevent overwriting
enqueuedAt and startedAt timestamps on updates to preserve the accuracy of
initial processing metrics during retries.

apps/backend/src/routes/internal/callback-route.ts [160-208]

           const totalTimeMs = m.totalTimeMs || 0;
           const userWaitTimeMs = queueTimeMs + totalTimeMs;
 
+          const metricsData = {
+            pageCount: result.pageCount,
+            ocrApplied: result.ocrApplied,
+            completedAt,
+            queueTimeMs,
+            conversionTimeMs: m.conversionTimeMs || 0,
+            chunkingTimeMs: m.chunkingTimeMs || 0,
+            embeddingTimeMs: m.embeddingTimeMs || 0,
+            totalTimeMs,
+            userWaitTimeMs,
+            rawSizeBytes: m.rawSizeBytes || 0,
+            markdownSizeChars: m.markdownSizeChars || 0,
+            totalChunks: m.totalChunks || result.chunks.length,
+            avgChunkSize: m.avgChunkSize || 0,
+            oversizedChunks: m.oversizedChunks || 0,
+            avgQualityScore: m.avgQualityScore || 0,
+            qualityFlags: m.qualityFlags || {},
+            totalTokens: m.totalTokens || 0,
+          };
+
           await prisma.processingMetrics.upsert({
             where: { documentId },
-            update: {
-              pageCount: result.pageCount,
-              ocrApplied: result.ocrApplied,
+            update: metricsData,
+            create: {
+              ...metricsData,
+              documentId,
               enqueuedAt: document.createdAt,
               startedAt,
-              completedAt,
-              queueTimeMs,
-              conversionTimeMs: m.conversionTimeMs || 0,
-              chunkingTimeMs: m.chunkingTimeMs || 0,
-              embeddingTimeMs: m.embeddingTimeMs || 0,
-              totalTimeMs,
-              userWaitTimeMs,
-              rawSizeBytes: m.rawSizeBytes || 0,
-              markdownSizeChars: m.markdownSizeChars || 0,
-              totalChunks: m.totalChunks || result.chunks.length,
-              avgChunkSize: m.avgChunkSize || 0,
-              oversizedChunks: m.oversizedChunks || 0,
-              avgQualityScore: m.avgQualityScore || 0,
-              qualityFlags: m.qualityFlags || {},
-              totalTokens: m.totalTokens || 0,
-            },
-            create: {
-              documentId,
-              pageCount: result.pageCount,
-              ocrApplied: result.ocrApplied,
-              enqueuedAt: document.createdAt,
-              startedAt,
-              completedAt,
-              queueTimeMs,
-              conversionTimeMs: m.conversionTimeMs || 0,
-              chunkingTimeMs: m.chunkingTimeMs || 0,
-              embeddingTimeMs: m.embeddingTimeMs || 0,
-              totalTimeMs,
-              userWaitTimeMs,
-              rawSizeBytes: m.rawSizeBytes || 0,
-              markdownSizeChars: m.markdownSizeChars || 0,
-              totalChunks: m.totalChunks || result.chunks.length,
-              avgChunkSize: m.avgChunkSize || 0,
-              oversizedChunks: m.oversizedChunks || 0,
-              avgQualityScore: m.avgQualityScore || 0,
-              qualityFlags: m.qualityFlags || {},
-              totalTokens: m.totalTokens || 0,
             },
           });
Suggestion importance[1-10]: 8

Why: This suggestion correctly identifies a logical flaw where retry callbacks would overwrite initial timestamps, corrupting key metrics like queueTimeMs. It also proposes a valid refactoring that improves code maintainability by removing duplication.
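
The create-only versus update split can be illustrated without a database. The following is a minimal in-memory sketch, assuming a reduced row shape; `upsertMetrics`, `MetricsRow`, and the `Map`-backed table are hypothetical stand-ins for the Prisma call, not the PR's code.

```typescript
interface MetricsRow {
  documentId: string;
  enqueuedAt: Date;
  startedAt: Date | null;
  completedAt: Date | null;
  totalTimeMs: number;
}

// Fields that may change on every callback.
type MutableMetrics = Pick<MetricsRow, "completedAt" | "totalTimeMs">;
// Fields written only when the row is first created.
type CreateOnly = Pick<MetricsRow, "enqueuedAt" | "startedAt">;

const table = new Map<string, MetricsRow>();

function upsertMetrics(
  documentId: string,
  data: MutableMetrics,
  createOnly: CreateOnly,
): MetricsRow {
  const existing = table.get(documentId);
  // On update, only the mutable fields are overwritten; the original
  // enqueuedAt and startedAt survive a retry.
  const row: MetricsRow = existing
    ? { ...existing, ...data }
    : { documentId, ...createOnly, ...data };
  table.set(documentId, row);
  return row;
}
```

One detail may be worth checking against the suggested diff: `queueTimeMs` stays in the shared `metricsData`, so a retry would still recompute it from the retry's `startedAt` even though the stored `startedAt` is preserved.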

Medium
Correct gradient class typo

Correct the Tailwind CSS class bg-liner-to-r to bg-gradient-to-r to fix a
background gradient rendering issue.

apps/frontend/src/components/analytics/AnalyticsPage.tsx [241]

-<div className="bg-liner-to-r from-blue-50 to-indigo-50 rounded-xl p-6 border border-blue-100">
+<div className="bg-gradient-to-r from-blue-50 to-indigo-50 rounded-xl p-6 border border-blue-100">
Suggestion importance[1-10]: 8

Why: The suggestion corrects a typo in a Tailwind CSS class name from bg-liner-to-r to bg-gradient-to-r, which fixes a visual bug where the intended background gradient was not being applied.

Medium
Use value instead of index for state

Refactor the state management for the time period selector to use the period's
value (e.g., '7d') directly, instead of relying on its array index.

apps/frontend/src/components/analytics/AnalyticsPage.tsx [94-131]

 export function AnalyticsPage() {
-    const [periodIndex, setPeriodIndex] = useState(1); // Default to 7d
-    const period = periods[periodIndex].value;
+    const [period, setPeriod] = useState<Period>('7d'); // Default to 7d
 
     const { data: overview, isLoading: loadingOverview } = useAnalyticsOverview(period);
     const { data: processing, isLoading: loadingProcessing } = useAnalyticsProcessing(period);
     const { data: quality, isLoading: loadingQuality } = useAnalyticsQuality(period);
 
     const isLoading = loadingOverview || loadingProcessing || loadingQuality;
 
     return (
         <div className="space-y-6">
             {/* Header + Actions */}
             <div className="flex items-center justify-between">
                 <div>
                     <h2 className="text-lg font-semibold text-gray-900 flex items-center gap-2">
                         <BarChart3 className="w-5 h-5 text-gray-400" />
                         Analytics
                     </h2>
                     <p className="text-sm text-gray-500">Monitor your RAG pipeline performance</p>
                 </div>
 
                 {/* Period Selector */}
                 <div className="flex items-center bg-gray-100 rounded-lg p-1">
-                    {periods.map((p, idx) => (
+                    {periods.map((p) => (
                         <button
                             key={p.value}
-                            onClick={() => setPeriodIndex(idx)}
-                            className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${periodIndex === idx
+                            onClick={() => setPeriod(p.value)}
+                            className={`px-3 py-1.5 text-sm font-medium rounded-md transition-colors ${period === p.value
                                 ? 'bg-white text-gray-900 shadow-sm'
                                 : 'text-gray-600 hover:text-gray-900'
                                 }`}
                         >
                             {p.label}
                         </button>
                     ))}
                 </div>
             </div>
 ...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7


Why: Storing the period value in state instead of its index makes the component more robust and maintainable by decoupling the state from the periods array structure.
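
The suggested `useState<Period>('7d')` relies on a `Period` type that the snippet does not show. One possible definition, sketched here under the assumption that `periods` is a constant array of `{ value, label }` pairs (the labels below are illustrative), derives the type from the array so the two cannot drift apart:

```typescript
// Assumed shape of the `periods` array; the label strings are illustrative.
const periods = [
  { value: "24h", label: "24 hours" },
  { value: "7d", label: "7 days" },
  { value: "30d", label: "30 days" },
] as const;

// Union of the literal values: "24h" | "7d" | "30d".
type Period = (typeof periods)[number]["value"];

// Runtime guard, useful if a period ever arrives as a plain string
// (for example from a URL query parameter).
function isPeriod(value: string): value is Period {
  return periods.some((p) => p.value === value);
}

const defaultPeriod: Period = "7d";
```

With this in place, reordering or extending `periods` no longer changes which option is the default, which is the robustness gain the suggestion describes.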

Medium
Default metrics to empty object

Ensure the metrics field in the callback payload defaults to an empty dictionary
({}) instead of None to prevent sending null values.

apps/ai-worker/src/callback.py [34]

-"metrics": result.metrics,
+"metrics": result.metrics or {},
Suggestion importance[1-10]: 7

Why: This change makes the callback payload more robust by ensuring metrics is always a dictionary, preventing potential null values and improving backend compatibility.

Medium
Improve query function for robustness

Refactor the useQuery hook for fetching chunk details to ensure the queryFn
always returns a promise, throwing an error if the ID is missing, which is more
idiomatic for TanStack Query.

apps/frontend/src/components/chunks/ChunksExplorerPage.tsx [135-140]

 // Fetch selected chunk detail
 const { data: selectedChunk } = useQuery({
     queryKey: ['chunks', 'detail', selectedChunkId],
-    queryFn: () => selectedChunkId ? chunksApi.get(selectedChunkId) : null,
+    queryFn: () => {
+        if (!selectedChunkId) {
+            // This should not be reached if `enabled` is correctly set, but it's good practice
+            return Promise.reject(new Error("No chunk ID selected"));
+        }
+        return chunksApi.get(selectedChunkId);
+    },
     enabled: !!selectedChunkId,
 });
Suggestion importance[1-10]: 5

Why: The suggestion improves the queryFn to align with TanStack Query's idiomatic usage, enhancing robustness, although the original code works due to the enabled flag.

Low
Possible issue
Ensure consistent document count logic

To ensure data consistency, derive the totalDocuments count from the
processingMetrics aggregation instead of querying the document table separately.

apps/backend/src/routes/analytics/overview-route.ts [77-83]

-      // Total documents in period
-      prisma.document.count({
-        where: {
-          status: 'COMPLETED',
-          createdAt: { gte: start, lte: end },
-        },
-      }),
+      // Total documents in period is derived from metrics aggregation
+      // to ensure consistency with other stats.
+      // This avoids counting documents created in the period but processed outside of it.
+      Promise.resolve(metricsAgg._count),
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a potential data inconsistency issue where totalDocuments and other metrics could be based on different sets of records, leading to a subtle bug on the analytics dashboard.
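
If the count query sits in the same `Promise.all` array as the aggregation, `metricsAgg` is not yet available when that array is built, so the count would have to be derived after the aggregation resolves. A minimal sketch of that ordering follows; `aggregateMetrics` is a stand-in returning fixed data, and the `_count` and `_avg` shapes are assumptions rather than the route's actual query result.

```typescript
interface MetricsAggregate {
  _count: number;
  _avg: { totalTimeMs: number | null };
}

// Stand-in for the real aggregation query; returns fixed data for illustration.
async function aggregateMetrics(): Promise<MetricsAggregate> {
  return { _count: 3, _avg: { totalTimeMs: 1500 } };
}

interface Overview {
  totalDocuments: number;
  avgTotalTimeMs: number;
}

async function buildOverview(): Promise<Overview> {
  // Await the aggregation first so every derived figure uses the same record set.
  const metricsAgg = await aggregateMetrics();
  return {
    totalDocuments: metricsAgg._count,
    avgTotalTimeMs: metricsAgg._avg.totalTimeMs ?? 0,
  };
}
```

Because both figures come from a single result, a document created in the period but processed outside it cannot be counted in one number and missing from the other.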

Medium
Fix incorrect slide marker replacement

Update the regex for slide marker replacement so that a run of consecutive
horizontal rules is matched as a single separator, preventing incorrect
slide separation.

apps/ai-worker/src/converters/pptx_converter.py [127-128]

-        # We look for --- surrounded by newlines
-        if re.search(r"\n\s*---\s*\n", markdown):
-            return re.sub(r"\n\s*---\s*\n", f"\n\n{SLIDE_MARKER}\n\n", markdown)
+        # We look for one or more `---` rules surrounded by newlines to handle consecutive separators
+        if re.search(r"(?:\n\s*---)+\s*\n", markdown):
+            return re.sub(r"(?:\n\s*---)+\s*\n", f"\n\n{SLIDE_MARKER}\n\n", markdown)
 
         # Strategy B: Fallback to H1 Headers
         # Only if no --- found (e.g., custom template)

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a potential bug where consecutive slide separators would not be handled correctly, leading to incorrect chunking. Matching the whole run of rules in one pass, with the trailing whitespace kept outside the repeated group, avoids the first match consuming the newline that the next rule needs.
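
The behaviour on back-to-back rules can be checked with a small port to TypeScript regular expressions, which backtrack the same way as Python's `re` for these patterns. This is a sketch: the `SLIDE_MARKER` value and the input string are assumptions for illustration. It compares the original single-rule pattern with a run-matching pattern that repeats only "newline plus rule" and consumes trailing whitespace once at the end.

```typescript
// Placeholder marker value, assumed for illustration only.
const SLIDE_MARKER = "<!-- slide -->";

// Original pattern: the match consumes the newline after a rule, so a rule on
// the very next line has no leading newline left to match and is skipped.
const singleRule = /\n\s*---\s*\n/g;

// Run-matching pattern: repeat only "newline + rule", then consume the
// trailing whitespace once, so back-to-back rules become one separator.
const ruleRun = /(?:\n\s*---)+\s*\n/g;

function markSlides(markdown: string, pattern: RegExp): string {
  return markdown.replace(pattern, `\n\n${SLIDE_MARKER}\n\n`);
}

const input = "Slide A\n---\n---\nSlide B";
const withSingleRule = markSlides(input, singleRule); // leaves a stray "---"
const withRuleRun = markSlides(input, ruleRun); // one marker, no stray rule
```

Placement of the trailing `\s*` matters: if it sits inside the repeated group, as in `(\n\s*---\s*)+`, the first iteration consumes the newline and the match ends after a single rule, leaving the second `---` in place just as the original pattern does. Requiring the final `\n` also keeps the pattern from matching `---` followed directly by other characters.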

Low
Refactor concurrency logic for robustness

Refactor the processQueue function to use a simpler and more robust worker pool
pattern with Promise.all instead of the more complex Promise.race
implementation.

apps/frontend/src/components/documents/upload-dropzone.tsx [163-196]

 // Process queue with concurrency limit
 const processQueue = useCallback(async (items: UploadItem[]) => {
   const queue = [...items];
-  const active: Promise<void>[] = [];
 
-  const startNext = async () => {
-    if (queue.length === 0) return;
-
-    const item = queue.shift()!;
-    const promise = uploadFile(item).finally(() => {
-      const idx = active.indexOf(promise);
-      if (idx > -1) active.splice(idx, 1);
-    });
-    active.push(promise);
+  const worker = async () => {
+    while (queue.length > 0) {
+      const item = queue.shift();
+      if (item) {
+        await uploadFile(item);
+      }
+    }
   };
 
-  // Initial batch
-  for (let i = 0; i < Math.min(MAX_CONCURRENT, items.length); i++) {
-    startNext();
-  }
-
-  // Process remaining
-  while (queue.length > 0 || active.length > 0) {
-    if (active.length > 0) {
-      await Promise.race(active);
-      if (queue.length > 0) {
-        startNext();
-      }
-    }
-  }
+  const workers = Array(MAX_CONCURRENT).fill(null).map(worker);
+  await Promise.all(workers);
 
   // Done - refresh documents list
   await queryClient.invalidateQueries({ queryKey: ['documents'] });
 }, [uploadFile, queryClient]);
Suggestion importance[1-10]: 6

Why: The suggested worker pool pattern is a more robust and readable way to handle concurrency than the current Promise.race implementation, improving code maintainability.
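
For reference, the worker-pool pattern can be written as a standalone helper. This is a generic sketch rather than the component's code: `runPool` and its parameters are hypothetical names, and the error handling is an added assumption about how failures should be treated.

```typescript
// Runs `task` over `items` with at most `limit` tasks in flight at once.
async function runPool<T>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  const queue = [...items];

  // Each worker pulls the next item until the shared queue is empty.
  // There is no await between the length check and shift(), so two
  // workers can never take the same item.
  const worker = async (): Promise<void> => {
    while (queue.length > 0) {
      const item = queue.shift() as T;
      try {
        await task(item);
      } catch (err) {
        // Keep the worker alive so one failure does not strand queued items.
        console.error("task failed", err);
      }
    }
  };

  const workerCount = Math.min(limit, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

The `try`/`catch` is a deliberate difference from the diff above: if `uploadFile` can reject, a worker without it stops at the first failure and `Promise.all` rejects before the documents query is invalidated. Whether `uploadFile` handles its own errors is not visible in this excerpt.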

Low