test: update unit and integration tests for converters, chunkers, and… by namtroi · Pull Request #69 · namtroi/RAGBase

namtroi · 2025-12-29T19:25:26Z

User description

… profiles

PR Type

Tests

Description

- Updated profile model tests to use flexible assertions instead of hardcoded values
  - Allows schema defaults to be tuned without breaking tests
- Added comprehensive tests for PDF post-processing methods
  - Tests for page artifact removal and code block cleanup
- Added header_levels parameter tests for DocumentChunker
  - Validates clamping behavior and breadcrumb tracking
- Created new PyMuPDFConverter test suite for link stripping
- Updated TabularChunker test to verify large table handling behavior
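
To make the "clamping behavior and breadcrumb tracking" item concrete, here is a minimal sketch of what such behavior could look like. The function names `clamp_header_levels` and `breadcrumbs` are hypothetical and are not taken from the RAGBase code; only the default of 3 and the 1 to 6 bounds come from the file walkthrough in this PR.

```python
import re

DEFAULT_HEADER_LEVELS = 3  # default value stated in the PR's file walkthrough


def clamp_header_levels(value: int = DEFAULT_HEADER_LEVELS) -> int:
    """Clamp a requested header depth into the range 1..6 (hypothetical helper)."""
    return max(1, min(6, value))


def breadcrumbs(markdown: str, header_levels: int = DEFAULT_HEADER_LEVELS) -> list[list[str]]:
    """Return the heading trail in effect after each tracked heading.

    This is an illustrative assumption about breadcrumb tracking, not the
    project's DocumentChunker implementation.
    """
    depth = clamp_header_levels(header_levels)
    trail: list[str] = []
    snapshots: list[list[str]] = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if not match:
            continue
        level = len(match.group(1))
        if level > depth:
            continue  # headings deeper than the limit are not tracked
        # Drop entries at this level or deeper, then record the new heading.
        del trail[level - 1:]
        trail.append(match.group(2))
        snapshots.append(list(trail))
    return snapshots


doc = "# A\n## B\n### C\n## D\n"
trails_depth_2 = breadcrumbs(doc, header_levels=2)
```

With `header_levels=2`, the `### C` heading is ignored, so the trail moves from `A > B` directly to `A > D`. Out-of-range values such as 0 or 9 are pulled back to 1 and 6 respectively.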

Diagram Walkthrough

flowchart LR
  A["Profile Model Tests"] -->|"Flexible assertions"| B["Schema-agnostic validation"]
  C["Converter Tests"] -->|"Post-processing methods"| D["PDF artifact removal"]
  E["PyMuPDF Tests"] -->|"Link stripping"| F["Markdown link handling"]
  G["DocumentChunker Tests"] -->|"Header levels"| H["Breadcrumb tracking"]
  I["TabularChunker Tests"] -->|"Large table behavior"| J["Single chunk verification"]

File Walkthrough

Relevant files

Tests

profile-model.test.ts: `Convert profile defaults to flexible assertions` (+32/-25)
apps/backend/tests/unit/models/profile-model.test.ts
- Replaced hardcoded default value assertions with flexible range-based checks
- Added comments explaining behavior-driven testing approach
- Validates reasonable defaults exist without enforcing specific values
- Allows schema defaults to be tuned independently of tests

test_base_converter.py: `Add PDF post-processing method tests` (+65/-0)
apps/ai-worker/tests/test_base_converter.py
- Added `TestPostProcessPdf` class with 3 test methods for PDF post-processing
- Added `TestPostProcessPymupdf` class with 4 test methods for PyMuPDF-specific processing
- Tests verify removal of page artifacts, empty code blocks, and soft linebreak merging
- Tests validate full processing chain including normalization

test_document_chunker.py: `Add header_levels parameter validation tests` (+59/-0)
apps/ai-worker/tests/test_document_chunker.py
- Added `TestHeaderLevels` class with 7 comprehensive test methods
- Tests validate default header_levels value of 3
- Tests verify clamping behavior for values below 1 and above 6
- Tests confirm breadcrumb tracking for different header level configurations

test_pymupdf_converter.py: `Create PyMuPDFConverter link stripping tests` (+84/-0)
apps/ai-worker/tests/test_pymupdf_converter.py
- Created new test file with `TestStripHiddenLinks` class
- Added 10 test methods covering markdown link stripping functionality
- Tests cover basic links, nested brackets, multiple links, edge cases
- Documents current behavior including URL parentheses edge case

test_tabular_chunker.py: `Implement large table chunking behavior test` (+13/-5)
apps/ai-worker/tests/test_tabular_chunker.py
- Replaced placeholder test with actual implementation
- Tests verify large Markdown tables remain as single chunk
- Generates 100-row table and validates chunk count and metadata
- Documents current behavior of keeping Markdown tables intact
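
The link stripping tests mention a "URL parentheses edge case" without spelling it out. The sketch below shows one way such an edge case can arise with a simple regex-based stripper. The `strip_links` function and its pattern are assumptions made for illustration; the actual PyMuPDFConverter method and its exact behavior are not shown in this PR page.

```python
import re

# Hypothetical pattern: [text](target), where text contains no "]" and
# target contains no ")". Not taken from the RAGBase source.
_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def strip_links(markdown: str) -> str:
    """Replace each inline Markdown link with its visible text."""
    return _LINK.sub(r"\1", markdown)


basic = strip_links("See [docs](https://example.com) now")
multiple = strip_links("[a](x) and [b](y)")

# Edge case: the target stops at the first ")", so a URL that itself
# contains parentheses leaves a stray ")" behind in this sketch.
parens = strip_links("[wiki](https://en.wikipedia.org/wiki/Foo_(bar))")
```

A test that documents current behavior would assert the imperfect result for the last input rather than the ideal one, which makes any later fix show up as an intentional test change.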

qodo-code-review · 2025-12-29T19:26:01Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢 No security concerns identified. No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance
⚪ 🎫 No ticket provided

Codebase Duplication Compliance
⚪ Codebase context is not defined. Follow the guide to enable codebase context checks.

Custom Compliance 🟢
- Generic: Comprehensive Audit Trails
  Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance.
  Status: Passed
- Generic: Meaningful Naming and Self-Documenting Code
  Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting.
  Status: Passed
- Generic: Robust Error Handling and Edge Case Management
  Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation.
  Status: Passed
- Generic: Secure Error Handling
  Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging.
  Status: Passed
- Generic: Secure Logging Practices
  Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.
  Status: Passed
- Generic: Security-First Input Validation and Data Handling
  Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities.
  Status: Passed

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-29T19:27:18Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Impact: Medium
Suggestion: Ensure entire table content is preserved

In `test_large_markdown_table_stays_single_chunk`, strengthen the assertion to verify that the chunk's content is identical to the original table input, not just that it contains a specific string.

apps/ai-worker/tests/test_tabular_chunker.py [82-94]

     def test_large_markdown_table_stays_single_chunk(self, chunker):
         """Large Markdown tables remain as single chunk (current behavior)."""
         # Generate a large table with 100 rows
         header = "| Name | Age | City |\n|---|---|---|\n"
         rows = "| Alice | 30 | NYC |\n" * 100
         large_table = header + rows

         chunks = chunker.chunk(large_table)

         # Current behavior: Markdown tables are kept as single chunk
         assert len(chunks) == 1
         assert chunks[0]["metadata"]["chunk_type"] == "tabular"
    -    assert "Alice" in chunks[0]["content"]
    +    assert chunks[0]["content"] == large_table

Suggestion importance [1-10]: 7

Why: The suggestion correctly points out a weakness in the test and proposes a stricter assertion to ensure the entire table content is preserved, which improves test robustness.
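
To illustrate why the stricter assertion matters, the sketch below runs both checks against a deliberately broken stand-in chunker. `truncating_chunk` is a hypothetical function invented for this example; it is not the project's TabularChunker and only mimics the chunk dictionary shape used in the test above.

```python
def build_table(rows: int = 100) -> str:
    """Build the same kind of 100-row Markdown table used in the test."""
    header = "| Name | Age | City |\n|---|---|---|\n"
    return header + "| Alice | 30 | NYC |\n" * rows


def truncating_chunk(text: str) -> list[dict]:
    """Hypothetical buggy chunker: silently keeps only the first 10 lines."""
    kept = "".join(text.splitlines(keepends=True)[:10])
    return [{"content": kept, "metadata": {"chunk_type": "tabular"}}]


table = build_table()
chunks = truncating_chunk(table)

# The substring check still passes even though most rows were dropped.
weak_check = "Alice" in chunks[0]["content"]

# The equality check detects the data loss.
strict_check = chunks[0]["content"] == table
```

Here `weak_check` is true and `strict_check` is false, so only the equality assertion would fail the test. One caveat: exact equality assumes the chunker returns the input unchanged; if it legitimately normalizes whitespace, the comparison would need to account for that.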

Commit d21e099: test: update unit and integration tests for converters, chunkers, and…

… profiles

qodo-code-review Bot added the Review effort 2/5 label Dec 29, 2025

namtroi merged commit 6db3baf into main Dec 29, 2025
7 checks passed

namtroi merged 1 commit into main from pdf/optimize
