feat(clp-s-ffi-sfa): Expose file list and range info metadata in ClpArchiveReader; Refactor and clean up unit tests. #2093

Open
Bill-hbrhbr wants to merge 6 commits into y-scope:main from Bill-hbrhbr:clp-s-ffi-sfa/list-files

Bill-hbrhbr (Contributor) commented Mar 12, 2026

Description

This change extends clp_s::ffi::sfa::ClpArchiveReader to expose per-file metadata derived from the range index, including each file's name and index range.

The new ClpArchiveReader::FileInfo class should roughly match the JS prototype FileInfo.
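For orientation, here is a minimal sketch of what such a per-file record could look like. The accessor names get_file_name(), get_start_index(), get_end_index(), and get_event_count() come from the review comments on this PR. The constructor, the std::size_t index type, and the half-open [start, end) range are assumptions, so the PR's actual FileInfo may differ.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch of a per-file record; the PR's FileInfo may differ.
// Assumes a half-open event range [start_index, end_index).
class FileInfo {
public:
    FileInfo(std::string file_name, std::size_t start_index, std::size_t end_index)
            : m_file_name{std::move(file_name)},
              m_start_index{start_index},
              m_end_index{end_index} {}

    [[nodiscard]] auto get_file_name() const -> std::string const& { return m_file_name; }

    [[nodiscard]] auto get_start_index() const -> std::size_t { return m_start_index; }

    [[nodiscard]] auto get_end_index() const -> std::size_t { return m_end_index; }

    // Number of log events that belong to this file.
    [[nodiscard]] auto get_event_count() const -> std::size_t {
        return m_end_index - m_start_index;
    }

private:
    std::string m_file_name;
    std::size_t m_start_index;
    std::size_t m_end_index;
};
```

Under these assumptions, FileInfo{"app.log", 10, 25}.get_event_count() returns 15.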

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • C++ unit tests and downstream clp-ffi-js tests pass.

Summary by CodeRabbit

  • New Features
    • Archive reader now exposes per-file metadata: file names, per-file event counts and index ranges.
  • Reliability / Bug Fixes
    • Archive precomputation validates range-index structure, returns a clear MalformedRangeIndex error on failure, and runs during reader creation; reader cleanup now clears per-file metadata.
  • Tests
    • Tests refactored to a Result-based flow with centralized validation for archive readers.

Bill-hbrhbr requested review from a team and gibber9809 as code owners March 12, 2026 17:27
coderabbitai bot commented Mar 12, 2026

Walkthrough

Precomputation of archive metadata was moved out of the constructor and made to return Result<void>. It now validates range-index entries, populates m_file_names and m_file_infos, updates m_event_count, and constructors/factories call precompute before returning. Tests were refactored to Result-based helpers; a new error MalformedRangeIndex was added.
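The validation described above might look roughly like the following. This is a sketch, not the PR's code. The hypothetical RangeEntry struct stands in for a range-index entry. An empty std::error_code stands in for a successful Result<void>, and std::errc::invalid_argument stands in for MalformedRangeIndex. "Contiguous" is assumed to mean that each entry starts where the previous one ended.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical, simplified range-index entry. `file_name` is std::nullopt when the
// entry has no string-valued filename field.
struct RangeEntry {
    std::size_t start_index;
    std::size_t end_index;  // Exclusive.
    std::optional<std::string> file_name;
};

struct FileRange {
    std::string file_name;
    std::size_t start_index;
    std::size_t end_index;  // Exclusive.
};

// Hypothetical stand-in for precompute_archive_metadata(): validates the index and
// fills the outputs. Returns an empty error_code on success. On failure the outputs
// are left untouched.
auto precompute_metadata(
        std::vector<RangeEntry> const& range_index,
        std::vector<std::string>& file_names,
        std::vector<FileRange>& file_infos,
        std::size_t& event_count
) -> std::error_code {
    std::vector<std::string> names;
    std::vector<FileRange> infos;
    std::size_t total{0};
    std::optional<std::size_t> prev_end;

    for (auto const& entry : range_index) {
        bool const is_empty_range{entry.end_index <= entry.start_index};
        bool const has_gap{prev_end.has_value() && entry.start_index != *prev_end};
        if (is_empty_range || has_gap || false == entry.file_name.has_value()) {
            // Stand-in for MalformedRangeIndex.
            return std::make_error_code(std::errc::invalid_argument);
        }
        names.push_back(*entry.file_name);
        infos.push_back({*entry.file_name, entry.start_index, entry.end_index});
        total += entry.end_index - entry.start_index;
        prev_end = entry.end_index;
    }

    // Commit results only after the whole index has been validated.
    file_names = std::move(names);
    file_infos = std::move(infos);
    event_count = total;
    return {};
}
```

The sketch builds results into local vectors and commits them only at the end. As a result, the outputs stay unchanged when validation fails, and a reader is never left half-populated.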

Changes

  • Archive Reader Core (components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp, components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp):
    Added FileInfo type and new members m_file_names, m_file_infos. precompute_archive_metadata() signature changed to Result<void>; it validates contiguous, non-empty range_index entries with string filenames, computes per-range event counts, populates metadata, and updates m_event_count. Constructors/factory functions now call precompute and return fully initialized readers; close() and move_from handle new containers. Added required headers.
  • Error Codes (components/core/src/clp_s/ffi/sfa/SfaErrorCode.hpp, components/core/src/clp_s/ffi/sfa/SfaErrorCode.cpp):
    Added MalformedRangeIndex enum value and corresponding error message in the error category.
  • Tests (components/core/tests/test-clp_s-ffi_sfa_reader.cpp):
    Refactored tests to use Result-based helper functions (create_reader_from_path, create_reader_from_bytes, run_single_log_file_test, assert_reader_matches_expected), consolidated cases with GENERATE, and validated reader metadata (file names, indices, event counts) for both path and in-memory readers. Removed prior pre-cleanup behavior.
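Adding an error value typically means extending an enum and the category's message() switch. The following standard-library sketch shows that pattern. SfaErrorCodeEnum, the category name, and the message text are hypothetical. The PR's SfaErrorCode files may use a project-specific error-category helper instead.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical error enum; only MalformedRangeIndex is named in the PR.
enum class SfaErrorCodeEnum : int {
    Success = 0,
    MalformedRangeIndex,
};

class SfaErrorCategory : public std::error_category {
public:
    [[nodiscard]] auto name() const noexcept -> char const* override { return "clp_s::ffi::sfa"; }

    [[nodiscard]] auto message(int ev) const -> std::string override {
        switch (static_cast<SfaErrorCodeEnum>(ev)) {
            case SfaErrorCodeEnum::Success:
                return "Success.";
            case SfaErrorCodeEnum::MalformedRangeIndex:
                return "The archive's range index is malformed.";
        }
        return "Unknown error.";
    }
};

auto get_sfa_error_category() -> std::error_category const& {
    static SfaErrorCategory const category{};
    return category;
}

// Lets SfaErrorCodeEnum values convert implicitly to std::error_code.
namespace std {
template <>
struct is_error_code_enum<SfaErrorCodeEnum> : true_type {};
}  // namespace std

// Found via argument-dependent lookup by std::error_code's converting constructor.
auto make_error_code(SfaErrorCodeEnum e) -> std::error_code {
    return {static_cast<int>(e), get_sfa_error_category()};
}
```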

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 22.73%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately and specifically describes the primary changes: exposing file list and range info metadata in ClpArchiveReader and refactoring unit tests.


coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp`:
- Around line 133-142: Loop is currently emitting one FileInfo per range
fragment; instead detect split fragments using the archive constant
(clp_s::archive_constants::_file_split_number) and coalesce by filename: for
each range (range.fields, key clp_s::constants::range_index::cFilename) check if
a FileInfo for that filename already exists in m_file_infos and if so update its
start_idx = min(existing.start_idx, start_idx) and end_idx =
max(existing.end_idx, end_idx) (and do not push a duplicate into m_file_names),
otherwise emplace a new FileInfo as currently done; ensure detection uses the
split-number field when present so fragments are merged into a single logical
file entry.
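Here is a minimal sketch of the suggested coalescing. It assumes a hypothetical Fragment struct in place of real range-index entries. It merges purely by file name and does not model the _file_split_number check that the comment mentions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical range fragment; several fragments may share one file name when
// the original file was split.
struct Fragment {
    std::string file_name;
    std::size_t start_index;
    std::size_t end_index;  // Exclusive.
};

// Merges fragments that share a file name into one entry, preserving the order
// in which each file name first appears.
auto coalesce_by_file_name(std::vector<Fragment> const& fragments) -> std::vector<Fragment> {
    std::vector<Fragment> merged;
    std::unordered_map<std::string, std::size_t> position_of;  // File name -> index in merged.
    for (auto const& fragment : fragments) {
        auto const it{position_of.find(fragment.file_name)};
        if (position_of.end() == it) {
            position_of.emplace(fragment.file_name, merged.size());
            merged.push_back(fragment);
            continue;
        }
        auto& existing{merged[it->second]};
        existing.start_index = std::min(existing.start_index, fragment.start_index);
        existing.end_index = std::max(existing.end_index, fragment.end_index);
    }
    return merged;
}
```

Taking the min and max assumes that a file's fragments are adjacent in the index. If another file's events sat between two fragments, the merged range would also cover them, so an implementation may want to reject or separately handle that case.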

In `@components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp`:
- Around line 17-42: The header uses std::string (members and return type in
class FileInfo and method get_file_name which returns std::string const&
referencing m_file_name) but doesn't include <string>; add the missing `#include <string>`
at the top of ClpArchiveReader.hpp so the declaration of std::string
and uses of m_file_name/get_file_name compile without relying on transitive
includes.

In `@components/core/tests/test-clp_s-ffi_sfa_reader.cpp`:
- Around line 77-88: The tests currently only assert internal consistency
between reader.get_file_names() and reader.get_file_infos() but do not verify
the actual expected filenames or ranges; update the test to assert concrete
expectations for each fixture: replace the non-empty and equality checks with
assertions that the sizes equal the expected number of files, then iterate over
reader.get_file_names() / reader.get_file_infos() and assert file_name ==
expected_filenames[i], file_info.get_file_name() == expected_filenames[i],
file_info.get_start_index() == expected_ranges[i].start,
file_info.get_end_index() == expected_ranges[i].end (or expected_event_count
when appropriate), and file_info.get_event_count() == expected_ranges[i].count
so multi-file ordering and ranges are validated explicitly.
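The explicit, ordered comparison that the comment asks for can be sketched as a plain helper. ExpectedFile, ActualFile, and matches_expected are hypothetical. In the real test, ActualFile would be ClpArchiveReader::FileInfo and each comparison would be a Catch2 assertion. The expected event count is derived as end minus start, which assumes half-open ranges.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical expected values for one file in a fixture archive.
struct ExpectedFile {
    std::string file_name;
    std::size_t start_index;
    std::size_t end_index;  // Exclusive.
};

// Hypothetical stand-in for the reader's per-file metadata.
struct ActualFile {
    std::string file_name;
    std::size_t start_index;
    std::size_t end_index;
    std::size_t event_count;
};

// Returns true only if names and ranges match the expectations exactly and in
// order, so a reordering or an off-by-one in any file fails the check.
auto matches_expected(
        std::vector<std::string> const& file_names,
        std::vector<ActualFile> const& file_infos,
        std::vector<ExpectedFile> const& expected
) -> bool {
    if (file_names.size() != expected.size() || file_infos.size() != expected.size()) {
        return false;
    }
    for (std::size_t i{0}; i < expected.size(); ++i) {
        auto const& want{expected[i]};
        auto const& got{file_infos[i]};
        if (file_names[i] != want.file_name || got.file_name != want.file_name
            || got.start_index != want.start_index || got.end_index != want.end_index
            || got.event_count != want.end_index - want.start_index)
        {
            return false;
        }
    }
    return true;
}
```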

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74dbdec3-ee2e-4ba0-a30a-f56f11d3494f

📥 Commits

Reviewing files that changed from the base of the PR and between 5798e0e and 7f4bec7.

📒 Files selected for processing (3)
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp
  • components/core/tests/test-clp_s-ffi_sfa_reader.cpp

Bill-hbrhbr marked this pull request as draft March 12, 2026 18:19
Bill-hbrhbr marked this pull request as ready for review March 13, 2026 01:41
coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/core/tests/test-clp_s-ffi_sfa_reader.cpp (1)

43-58: ⚠️ Potential issue | 🟡 Minor

Clear the per-fixture output directory before recompressing.

This helper reuses a deterministic output_dir based on log_path.stem(), but it never removes stale contents under that path. A dirty or interrupted prior run can therefore influence the next archive and hide regressions. Removing the directory first keeps the fixture hermetic.

💡 Proposed fix

```diff
 auto const output_dir{root_output_dir / log_path.stem().string()};
+std::filesystem::remove_all(output_dir);
 
 auto const archive_stats = compress_archive(
```

Based on learnings, files and directories are intentionally removed at the beginning of CLP tests so existing content cannot influence the result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/tests/test-clp_s-ffi_sfa_reader.cpp` around lines 43 - 58,
The helper generate_single_file_archive reuses a deterministic output_dir
(derived from get_archive_output_root_dir() and log_path.stem()) but doesn't
clear previous contents, so remove any stale data before recompressing: call
std::filesystem::remove_all(output_dir) (handle errors/exceptions as
appropriate) and then recreate the directory
(std::filesystem::create_directories(output_dir)) prior to calling
compress_archive; reference output_dir and generate_single_file_archive to
locate the change.
♻️ Duplicate comments (1)
components/core/tests/test-clp_s-ffi_sfa_reader.cpp (1)

81-92: 🛠️ Refactor suggestion | 🟠 Major

Add at least one real multi-file archive case.

These checks still only exercise the one-file path, so the new ordering and non-zero start-index behaviour for later files never executes here. A regression in per-file range accumulation or file ordering would still pass. Please add a fixture that packs multiple input files into one archive and assert each filename/range explicitly.

Also applies to: 125-140
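One way to write expectations for a multi-file fixture is to derive each file's range from its known event count with a running sum. This guarantees that every file after the first has a non-zero start index. The sketch assumes that files appear in input order and that ranges are half-open. Range and expected_ranges are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expected range for one file.
struct Range {
    std::string file_name;
    std::size_t start_index;
    std::size_t end_index;  // Exclusive.
};

// Builds back-to-back [start, end) ranges from (file name, event count) pairs.
auto expected_ranges(std::vector<std::pair<std::string, std::size_t>> const& files)
        -> std::vector<Range> {
    std::vector<Range> ranges;
    ranges.reserve(files.size());
    std::size_t next_start{0};
    for (auto const& [file_name, event_count] : files) {
        ranges.push_back({file_name, next_start, next_start + event_count});
        next_start += event_count;
    }
    return ranges;
}
```

For example, counts of 3, 5, and 2 yield [0, 3), [3, 8), and [8, 10).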

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/tests/test-clp_s-ffi_sfa_reader.cpp` around lines 81 - 92,
The test only covers the single-file path so it doesn't exercise multi-file
ordering or non-zero start-index behavior; update the test that uses reader by
adding a fixture that packs multiple input files into one archive, invoke
reader.get_file_names() and reader.get_file_infos(), assert the returned sizes >
1, then iterate the files asserting the expected file name ordering and per-file
ranges using file_info.get_start_index(), file_info.get_end_index(), and
file_info.get_event_count() for each file to verify accumulated start/end
indices and counts match the inputs; ensure at least one case has a non-zero
start index to catch regressions in per-file range accumulation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp`:
- Around line 32-34: Update the docblocks for the ClpArchiveReader factory API
(both create() variants that now call precompute_archive_metadata()) to document
that callers may receive the new MalformedRangeIndex failure in addition to the
previously listed errors; specifically mention MalformedRangeIndex alongside
existing error cases in the header comments for the create() paths and note that
it is propagated from ClpArchiveReader::precompute_archive_metadata().
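The factory pattern that the comment refers to works as follows. The factory constructs the reader, runs the precompute step, and returns its error instead of a partially initialized object. The docblock lists that error. In this sketch, Reader, create, the std::variant result, and the bool that simulates a malformed index are hypothetical stand-ins for ClpArchiveReader, its Result type, and real archive input.

```cpp
#include <string>
#include <system_error>
#include <variant>
#include <vector>

class Reader {
public:
    /**
     * Creates a reader and precomputes its per-file metadata.
     * @param range_index_is_valid Hypothetical flag standing in for real archive input.
     * @return A fully initialized reader on success, or an error code on failure, including:
     * - std::errc::invalid_argument (standing in for MalformedRangeIndex) if
     *   precompute_archive_metadata() rejects the range index.
     */
    static auto create(bool range_index_is_valid) -> std::variant<Reader, std::error_code> {
        Reader reader;
        if (auto const ec{reader.precompute_archive_metadata(range_index_is_valid)}; ec) {
            return ec;  // Never hand out a partially initialized reader.
        }
        return reader;
    }

    [[nodiscard]] auto get_file_names() const -> std::vector<std::string> const& {
        return m_file_names;
    }

private:
    Reader() = default;

    auto precompute_archive_metadata(bool range_index_is_valid) -> std::error_code {
        if (false == range_index_is_valid) {
            return std::make_error_code(std::errc::invalid_argument);
        }
        m_file_names = {"a.log", "b.log"};
        return {};
    }

    std::vector<std::string> m_file_names;
};
```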


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 18eb316c-5297-4226-af65-989b3e528e79

📥 Commits

Reviewing files that changed from the base of the PR and between 7f4bec7 and bffd4cc.

📒 Files selected for processing (5)
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.cpp
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp
  • components/core/src/clp_s/ffi/sfa/SfaErrorCode.cpp
  • components/core/src/clp_s/ffi/sfa/SfaErrorCode.hpp
  • components/core/tests/test-clp_s-ffi_sfa_reader.cpp

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp`:
- Around line 92-102: The two accessor methods have inconsistent return
semantics: get_file_names() returns a vector by value while get_file_infos()
returns a const reference, so a caller holding it sees an emptied vector (and
loses any iterators or element references taken from it) if close() clears
m_file_names/m_file_infos; make them consistent by returning the same type for
both. Prefer returning by value for safety (change get_file_infos() to return
std::vector<FileInfo> by value) or, if you choose efficiency, return const& for
both (change get_file_names() to return std::vector<std::string> const&), and
update callers accordingly; refer to get_file_names(), get_file_infos(),
close(), and members m_file_names/m_file_infos when applying the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 000ac3b6-3d8c-4b4b-bd2b-e5e08171c181

📥 Commits

Reviewing files that changed from the base of the PR and between bffd4cc and 771d56e.

📒 Files selected for processing (1)
  • components/core/src/clp_s/ffi/sfa/ClpArchiveReader.hpp

Comment on lines +92 to +102
```cpp
/**
 * @return Source file names in range-index order.
 */
[[nodiscard]] auto get_file_names() const -> std::vector<std::string> { return m_file_names; }

/**
 * @return Source file metadata in range index order.
 */
[[nodiscard]] auto get_file_infos() const -> std::vector<FileInfo> const& {
    return m_file_infos;
}
```

🧹 Nitpick | 🔵 Trivial

Consider making return types consistent between get_file_names() and get_file_infos().

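get_file_names() returns by value (a safe copy), while get_file_infos() returns by const reference. That is efficient, but if close() is called while the reference is held, the caller sees an emptied vector and any iterators or element references taken from it are invalidated; the reference dangles outright once the reader is destroyed. Given that close() clears both vectors, this inconsistency could lead to subtle lifetime bugs.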

Consider either:

  • Return both by const reference for efficiency and consistent semantics, or
  • Return both by value for safety

For SFA-scoped archives with typically small file counts, the performance difference is likely negligible.
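The difference between the two return styles can be shown with a hypothetical MiniReader (not the real class) whose close() clears both vectors. Plain strings stand in for FileInfo. A by-value copy keeps its contents after close(), while a const& taken earlier sees the emptied member.

```cpp
#include <string>
#include <vector>

// Hypothetical reader with one accessor of each style.
class MiniReader {
public:
    MiniReader() : m_file_names{"a.log", "b.log"}, m_file_infos{"a.log", "b.log"} {}

    [[nodiscard]] auto get_file_names() const -> std::vector<std::string> { return m_file_names; }

    [[nodiscard]] auto get_file_infos() const -> std::vector<std::string> const& {
        return m_file_infos;
    }

    void close() {
        m_file_names.clear();
        m_file_infos.clear();
    }

private:
    std::vector<std::string> m_file_names;
    std::vector<std::string> m_file_infos;  // Simplified: strings stand in for FileInfo.
};
```

Reading through the earlier const& after close() is well-defined here but yields an empty vector. Only iterators or element references taken before close() become invalid.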

