feat(clp-s): Add get_metadata_for_log_event function to clp_s::ArchiveReader (resolves #2012). #2033

Open
gibber9809 wants to merge 6 commits into y-scope:main from gibber9809:get-metadata-for-log-event

Conversation

@gibber9809 gibber9809 commented Feb 25, 2026

Description

This PR implements the get_metadata_for_log_event function requested in #2012. The implementation builds a map of end_index -> metadata with one entry for each range that contains at least one record, then looks up the metadata for a given index using the map's upper_bound() function.

This PR also adds some assertions in test-clp_s-range-index to confirm that get_metadata_for_log_event works as expected.
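
The lookup described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: the class name RangeMetadataLookup and the add_range helper are hypothetical, std::string stands in for nlohmann::json to keep the example dependency-free, std::out_of_range stands in for OperationFailed, and ranges are assumed to be half-open ([start_index, end_index)) and contiguous.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the adaptor; std::string replaces nlohmann::json.
class RangeMetadataLookup {
public:
    // Records metadata for the half-open range [start_index, end_index).
    // Empty ranges are skipped so they never shadow a real range.
    auto add_range(int64_t start_index, int64_t end_index, std::string metadata) -> void {
        assert(start_index <= end_index);
        if (start_index != end_index) {
            m_end_index_to_metadata.emplace(end_index, std::move(metadata));
        }
    }

    // Returns the metadata of the range containing log_event_idx.
    [[nodiscard]] auto get_metadata_for_log_event(int64_t log_event_idx) const
            -> std::string const& {
        if (log_event_idx < 0) {
            throw std::out_of_range("negative log event index");
        }
        // upper_bound yields the first range whose exclusive end is greater than the index.
        auto const it{m_end_index_to_metadata.upper_bound(log_event_idx)};
        if (m_end_index_to_metadata.end() == it) {
            throw std::out_of_range("log event index is past the last range");
        }
        return it->second;
    }

private:
    std::map<int64_t, std::string> m_end_index_to_metadata;
};
```

Keying the map by the exclusive end index is what makes upper_bound() sufficient: an index equal to a range's end belongs to the next range, which is exactly the first key strictly greater than the index.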

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added tests to validate that get_metadata_for_log_event retrieves the correct metadata for a given index, and throws for invalid indexes.

Summary by CodeRabbit

  • New Features

    • Added ability to retrieve metadata for individual log events, exposed as JSON.
  • Tests

    • Added tests validating per-log-event metadata retrieval, including boundary and invalid-index error handling.

@gibber9809 gibber9809 requested a review from a team as a code owner February 25, 2026 19:47
@gibber9809 gibber9809 linked an issue Feb 25, 2026 that may be closed by this pull request

coderabbitai bot commented Feb 25, 2026

Walkthrough

Adds ArchiveReader::get_metadata_for_log_event and matching ArchiveReaderAdaptor support: adaptor stores per-range metadata, populates it when reading range index, and exposes lookup by log-event index. Tests extended to validate per-log-event metadata and boundary conditions; nlohmann::json forward-declared.

Changes

| Cohort / File(s) | Summary |
|---|---|
| API Declarations: components/core/src/clp_s/ArchiveReader.hpp, components/core/src/clp_s/ArchiveReaderAdaptor.hpp | Added [[nodiscard]] auto get_metadata_for_log_event(int64_t) -> nlohmann::json const& declarations; added std::map<int64_t, nlohmann::json> m_non_empty_range_metadata_map to adaptor; forward-declared nlohmann::json. |
| Implementation: components/core/src/clp_s/ArchiveReaderAdaptor.cpp | Populate m_non_empty_range_metadata_map for non-empty ranges in try_read_range_index; implemented get_metadata_for_log_event using upper_bound lookup with validation and throws on invalid/missing entries. |
| Tests: components/core/tests/test-clp_s-range_index.cpp | Included ArchiveReaderAdaptor.hpp, added constexpr size_t cNumRecordsInInputFile = 4; verify metadata for first N log events matches range metadata; added boundary tests for out-of-range indices. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature added: a new get_metadata_for_log_event function in clp_s::ArchiveReader, matching the actual changes across all modified files. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/ArchiveReaderAdaptor.cpp`:
- Around line 339-346: In get_metadata_for_log_event, move the negative-index
guard (log_event_idx < 0) to before the
m_non_empty_range_metadata_map.upper_bound call to avoid unnecessary lookups,
and after obtaining the iterator from upper_bound verify that the found range
actually contains log_event_idx (e.g., check log_event_idx >= corresponding
range's start_index or whatever field marks range start) before returning
it->second; if the iterator is end() or the index falls outside the range, throw
OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__).
- Around line 153-155: m_non_empty_range_metadata_map is currently copying
m_range_index.back().fields (potentially large nlohmann::json); change the map
value to store a pointer (nlohmann::json const*) or std::reference_wrapper<const
nlohmann::json> instead of a copy, update the map type in the header, modify the
emplace in try_read_range_index to insert the address/reference of
m_range_index.back().fields, and update callers such as
get_metadata_for_log_event to dereference the stored pointer/reference when
accessing metadata; keep the invariant that m_range_index is not
mutated/reallocated after population.

In `@components/core/src/clp_s/ArchiveReaderAdaptor.hpp`:
- Line 5: Replace the unused `#include` <functional> with `#include` <map> in
ArchiveReaderAdaptor.hpp so std::map is properly declared for
m_non_empty_range_metadata_map; locate the declaration of
m_non_empty_range_metadata_map (and the surrounding class/struct in
ArchiveReaderAdaptor) and update the header includes to remove <functional> and
add <map> consistent with ArchiveReader.hpp.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d754cbf and 44b49f9.

📒 Files selected for processing (4)
  • components/core/src/clp_s/ArchiveReader.hpp
  • components/core/src/clp_s/ArchiveReaderAdaptor.cpp
  • components/core/src/clp_s/ArchiveReaderAdaptor.hpp
  • components/core/tests/test-clp_s-range_index.cpp

@junhaoliao junhaoliao added this to the March 2026 milestone Mar 7, 2026

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (3)
components/core/src/clp_s/ArchiveReaderAdaptor.hpp (1)

1-10: ⚠️ Potential issue | 🟡 Minor

Missing <map> include for std::map usage.

The file uses std::map at line 198 for m_non_empty_range_metadata_map, but <map> is not included in the header includes.

Proposed fix
 #include <cstddef>
+#include <map>
 #include <memory>
 #include <optional>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/ArchiveReaderAdaptor.hpp` around lines 1 - 10, The
header is missing the <map> include required for std::map usage; add `#include`
<map> to the include block at the top of ArchiveReaderAdaptor.hpp so the
declaration of m_non_empty_range_metadata_map (and any other std::map uses)
compiles; look for the include list near the top of the file and insert <map>
alongside the other standard headers.
components/core/src/clp_s/ArchiveReaderAdaptor.cpp (2)

369-376: ⚠️ Potential issue | 🟡 Minor

Move negative index validation before the map lookup.

The log_event_idx < 0 check is evaluated after upper_bound, performing an unnecessary lookup when the index is already known to be invalid. Reordering improves clarity and avoids wasted work.

Proposed fix
 auto ArchiveReaderAdaptor::get_metadata_for_log_event(int64_t log_event_idx)
         -> nlohmann::json const& {
+    if (log_event_idx < 0) {
+        throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
+    }
     auto const it{m_non_empty_range_metadata_map.upper_bound(log_event_idx)};
-    if (m_non_empty_range_metadata_map.end() == it || log_event_idx < 0) {
+    if (m_non_empty_range_metadata_map.end() == it) {
         throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
     }
     return it->second;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/ArchiveReaderAdaptor.cpp` around lines 369 - 376,
Check for a negative log_event_idx before doing the map lookup in
ArchiveReaderAdaptor::get_metadata_for_log_event: move the log_event_idx < 0
validation to the top of the function and throw
OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__) immediately if
negative, then call m_non_empty_range_metadata_map.upper_bound(log_event_idx)
and proceed as before to avoid an unnecessary lookup when the index is invalid.

171-173: 🧹 Nitpick | 🔵 Trivial

Map stores a copy of fields — consider storing a pointer instead.

Since m_range_index owns the RangeIndexEntry objects and lives for the lifetime of the adaptor, storing a nlohmann::json const* instead of copying potentially large JSON objects would reduce memory overhead.

Sketch using pointer

In the header, change the map value type:

-    std::map<int64_t, nlohmann::json> m_non_empty_range_metadata_map;
+    std::map<int64_t, nlohmann::json const*> m_non_empty_range_metadata_map;

In try_read_range_index:

         if (start_index != end_index) {
-            m_non_empty_range_metadata_map.emplace(end_index, m_range_index.back().fields);
+            m_non_empty_range_metadata_map.emplace(end_index, &m_range_index.back().fields);
         }

In get_metadata_for_log_event:

-    return it->second;
+    return *(it->second);

Note: This is safe as long as m_range_index is not mutated after population, which appears to be the case.
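
One caveat with the pointer approach may be worth checking before adopting it. If m_range_index is a std::vector (an assumption; its container type is not shown in this thread), taking &m_range_index.back().fields while the vector is still growing could leave earlier pointers dangling after a reallocation. A minimal sketch of one way around that is to build the pointer map in a second pass, once the index is fully populated. The RangeIndexEntry layout and the build_non_empty_range_metadata_map helper below are hypothetical, and std::string stands in for nlohmann::json.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified entry; std::string stands in for nlohmann::json.
struct RangeIndexEntry {
    int64_t start_index;
    int64_t end_index;
    std::string fields;
};

// Builds end_index -> pointer-to-fields for every non-empty range.
// Call only after `range_index` is fully populated: a later push_back may
// reallocate the vector and invalidate every stored pointer.
[[nodiscard]] auto build_non_empty_range_metadata_map(
        std::vector<RangeIndexEntry> const& range_index
) -> std::map<int64_t, std::string const*> {
    std::map<int64_t, std::string const*> metadata_map;
    for (auto const& entry : range_index) {
        assert(entry.start_index <= entry.end_index);
        if (entry.start_index != entry.end_index) {
            metadata_map.emplace(entry.end_index, &entry.fields);
        }
    }
    return metadata_map;
}
```

If the map has to be filled incrementally instead, a container with stable element addresses on push_back (such as std::deque) would be another option.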

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp_s/ArchiveReaderAdaptor.cpp` around lines 171 - 173,
The code currently copies large nlohmann::json objects into
m_non_empty_range_metadata_map; change the map to store nlohmann::json const*
(pointer) instead of a copy, update the header/type of
m_non_empty_range_metadata_map accordingly, set the pointer when populating the
map in try_read_range_index by using &m_range_index.back().fields (or the
appropriate RangeIndexEntry instance), and update get_metadata_for_log_event and
any other access sites to dereference the pointer (handle nulls if needed) while
preserving const correctness; ensure comments note that m_range_index must not
be mutated after population so the pointer remains valid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 813fdf19-d406-4950-879c-30aaf9ee91b7

📥 Commits

Reviewing files that changed from the base of the PR and between 53041d4 and 7eba757.

📒 Files selected for processing (3)
  • components/core/src/clp_s/ArchiveReader.hpp
  • components/core/src/clp_s/ArchiveReaderAdaptor.cpp
  • components/core/src/clp_s/ArchiveReaderAdaptor.hpp

* @throws ArchiveReaderAdaptor::OperationFailed when `log_event_idx` cannot be mapped to
* any metadata.
*/
[[nodiscard]] auto get_metadata_for_log_event(int64_t log_event_idx) -> nlohmann::json const& {

Should log_event_idx simply be an unsigned type like uint64_t?

Comment on lines +371 to +372
auto const it{m_non_empty_range_metadata_map.upper_bound(log_event_idx)};
if (m_non_empty_range_metadata_map.end() == it || log_event_idx < 0) {

It seems the lookup isn't needed at all if the log event idx is negative, which makes me feel more strongly about using uint64_t.
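
For illustration, here is a rough sketch of what the lookup might look like with an unsigned index. It is not taken from the PR: the free-function form and the parameter names are hypothetical, std::string stands in for nlohmann::json, and std::out_of_range stands in for OperationFailed. It assumes the map keys would also become uint64_t.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical free function mirroring the lookup; std::string replaces nlohmann::json.
[[nodiscard]] inline auto get_metadata_for_log_event(
        std::map<uint64_t, std::string> const& end_index_to_metadata,
        uint64_t log_event_idx
) -> std::string const& {
    // No negative-index branch is needed: the type rules it out.
    auto const it{end_index_to_metadata.upper_bound(log_event_idx)};
    if (end_index_to_metadata.end() == it) {
        throw std::out_of_range("log event index is not covered by any range");
    }
    return it->second;
}
```

A caller that converts a negative int64_t to uint64_t ends up with a very large value, which no key can exceed, so the end() check still rejects it.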

Comment on lines +171 to +173
if (start_index != end_index) {
m_non_empty_range_metadata_map.emplace(end_index, m_range_index.back().fields);
}

@Bill-hbrhbr Bill-hbrhbr Mar 13, 2026


I guess I've always wondered whether the range index accepts files with empty ranges, or gaps between adjacent ranges.
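
To make the gap question concrete: with a map keyed only by end_index, an index that falls in a gap would resolve to the next range's metadata. A minimal sketch of one way to detect that is to keep each range's start index next to its metadata and check it after upper_bound(). The RangeMetadata struct, the add_range helper, and find_metadata are hypothetical names, std::string stands in for nlohmann::json, and returning nullptr replaces throwing OperationFailed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical value type: keeps the range start next to the metadata so gaps can be detected.
struct RangeMetadata {
    int64_t start_index;
    std::string fields;  // stand-in for nlohmann::json
};

// Keyed by the exclusive end index of each non-empty range.
using RangeMetadataMap = std::map<int64_t, RangeMetadata>;

auto add_range(
        RangeMetadataMap& ranges,
        int64_t start_index,
        int64_t end_index,
        std::string fields
) -> void {
    assert(start_index <= end_index);
    if (start_index != end_index) {
        ranges.emplace(end_index, RangeMetadata{start_index, std::move(fields)});
    }
}

// Returns nullptr when the index is past the last range or before the start of the
// range found by upper_bound (which covers gaps and, for non-negative starts, negatives).
[[nodiscard]] auto find_metadata(RangeMetadataMap const& ranges, int64_t log_event_idx)
        -> std::string const* {
    auto const it{ranges.upper_bound(log_event_idx)};
    if (ranges.end() == it || log_event_idx < it->second.start_index) {
        return nullptr;
    }
    return &it->second.fields;
}
```

Whether the extra check is needed depends on the answer to the question above: if the range index is guaranteed to be contiguous, the end-index-only map is enough.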

Development

Successfully merging this pull request may close these issues.

ArchiveReader::get_metadata_for_log_event API

3 participants