Skip to content

feat(kv-ir): Add UnstructuredIrDeserializerImpl implementation for unstructured IR deserialization (resolves #2096).#2227

Open
jonathan-imanu wants to merge 10 commits intoy-scope:mainfrom
jonathan-imanu:feat/ir_deserializer
Open

feat(kv-ir): Add UnstructuredIrDeserializerImpl implementation for unstructured IR deserialization (resolves #2096).#2227
jonathan-imanu wants to merge 10 commits intoy-scope:mainfrom
jonathan-imanu:feat/ir_deserializer

Conversation

@jonathan-imanu
Copy link
Copy Markdown
Contributor

@jonathan-imanu jonathan-imanu commented Apr 25, 2026

Description

Completes the implementation of #2096.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added a unit test for old IR deserilizaton.

Summary by CodeRabbit

  • New Features

    • Backward-compatible unstructured IR stream support with selectable encoding variants and preserved message/timestamp semantics.
  • Bug Fixes / Diagnostics

    • Improved deserialization error reporting for invalid reference-timestamp metadata, invalid timestamp values, and missing required schema nodes.
  • Chores

    • Build updated to include the new unstructured deserializer components.
  • Tests

    • New end-to-end tests validating serialization/deserialization of unstructured log events across encoding variants.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Factory now validates protocol version and selects either the existing KvIrDeserializerImpl (Supported) or a new templated UnstructuredIrDeserializerImpl (BackwardCompatible) supporting four- and eight-byte encodings; related decoding helpers, error codes, build files, and tests were added or adjusted.

Changes

Unstructured IR deserialization feature

Layer / File(s) Summary
Data Shape / Declarations
components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.hpp, components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
Add templated UnstructuredIrDeserializerImpl<encoded_variable_t> and declare deserialize_timestamp<encoded_variable_t> API; new factory usage and private state (pending insertions, cached node IDs, previous-timestamp for 4-byte).
Core Implementation
components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp, components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
Implement unstructured decoding for 4- and 8-byte encodings: schema-node insertion emission, log-event deserialization, timestamp delta (4-byte) vs absolute (8-byte) handling, 4-byte reference-timestamp metadata parsing, error conversions, and explicit template instantiations.
Factory / Integration
components/core/src/clp/ffi/ir_stream/Deserializer.hpp
create_generic now validates protocol version and chooses between KvIrDeserializerImpl (Supported) and UnstructuredIrDeserializerImpl (BackwardCompatible), storing the impl in a std::unique_ptr and passing it into Deserializer.
Interface docs / Errors
components/core/src/clp/ffi/ir_stream/DeserializerImpl.hpp, components/core/src/clp/ffi/ir_stream/IrDeserializationError.hpp, components/core/src/clp/ffi/ir_stream/IrDeserializationError.cpp
Document tweaks for key_name_buffer; add new error enumerators (InvalidReferenceTimestampMetadata, InvalidReferenceTimestampValue, MissingRequiredSchemaNodes) and map messages for them.
Build / Wiring
components/core/CMakeLists.txt, components/core/src/clp_s/CMakeLists.txt
Add UnstructuredIrDeserializerImpl.{cpp,hpp} to unit test and clp_s source lists so the new implementation is compiled into test and clp_s targets.
Schema utilities
components/core/src/clp/ffi/SchemaTree.hpp
Mark SchemaTree::NodeLocator ctor/dtor, copy/move, and accessors as constexpr (copy/move noexcept) with no shape change.
Small API renames / forwarding
components/core/src/clp/ffi/ir_stream/KvIrDeserializerImpl.hpp, .../KvIrDeserializerImpl.cpp, components/core/src/clp/ffi/ir_stream/Deserializer.hpp
Rename schema-insertion key parameter to key_name_buffer and pass through to implementations.
Tests
components/core/tests/test-ir_encoding_methods.cpp
Add end-to-end serde test for unstructured IR stream: serialize preamble+events+EOF, deserialize until EndOfStream, validate counts, schema nodes, timestamps/messages, and EOF error behavior.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Factory as Deserializer Factory
    participant PV as Protocol Validator
    participant KvImpl as KvIrDeserializerImpl
    participant UnstructImpl as UnstructuredIrDeserializerImpl
    participant Reader as ReaderInterface
    participant Schema as SchemaTree

    Client->>Factory: create_generic(version, encoding_type, metadata, ...)
    Factory->>PV: validate_protocol_version(version)
    alt Supported
        PV-->>Factory: Supported
        Factory->>KvImpl: construct KvIrDeserializerImpl(...)
        KvImpl-->>Factory: instance
    else BackwardCompatible
        PV-->>Factory: BackwardCompatible
        Factory->>UnstructImpl: UnstructuredIrDeserializerImpl::create(metadata)
        UnstructImpl-->>Factory: unique_ptr<UnstructuredIrDeserializerImpl>
    end
    Factory-->>Client: Deserializer instance

    Client->>UnstructImpl: get_next_ir_unit_type(Reader)
    alt Pending schema insertions exist
        UnstructImpl-->>Client: SchemaTreeNodeInsertion (message/timestamp)
    else
        UnstructImpl->>Reader: read protocol tag
        Reader-->>UnstructImpl: tag
        alt tag == LogEvent
            UnstructImpl->>Schema: resolve node IDs for message & timestamp
            Schema-->>UnstructImpl: NodeLocator info
            UnstructImpl-->>Client: KeyValuePairLogEvent
        else tag == EndOfStream
            UnstructImpl-->>Client: EndOfStream
        else tag == UtcOffsetChange
            UnstructImpl-->>Client: UtcOffsetChange
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • gibber9809
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding an UnstructuredIrDeserializerImpl implementation for unstructured IR deserialization, directly reflecting the primary purpose of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get your free trial and get 200 agent minutes per Slack user (a $50 value).


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jonathan-imanu jonathan-imanu marked this pull request as ready for review April 26, 2026 00:16
@jonathan-imanu jonathan-imanu requested review from a team and gibber9809 as code owners April 26, 2026 00:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`:
- Around line 199-241: The code currently increments m_previous_timestamp before
running fallible steps; instead compute a local new_timestamp =
m_previous_timestamp + timestamp_delta and use that local value in
build_encoded_text_ast/when creating the timestamp Value and in
KeyValuePairLogEvent::create, but do not assign it back to m_previous_timestamp
until after build_encoded_text_ast, resolve_required_node_ids and
KeyValuePairLogEvent::create succeed (i.e., just prior to the successful
return). Update references in this block (deserialize_log_event,
build_encoded_text_ast, resolve_required_node_ids, KeyValuePairLogEvent::create)
to use the local new_timestamp and only mutate m_previous_timestamp on the
success path.
- Around line 48-74: Move the doxygen comment block so it immediately precedes
the template declaration for build_encoded_text_ast; specifically place the
entire /** ... */ docblock above the line "template <ir::EncodedVariableTypeReq
encoded_variable_t>" so the documentation is attached to the templated function
declaration (build_encoded_text_ast) rather than following the template line,
leaving the function signature and body (including
EncodedTextAst<encoded_variable_t>::create usage) unchanged.
- Around line 155-169: Replace the verbose has_error()/error()/value() handling
with the project macro to be concise and move-friendly: use
YSTDLIB_ERROR_HANDLING_TRYX when calling deserialize_tag (currently held in
tag_result) and for the two encoded_text_ast_result usages; specifically wrap
the deserialize_tag(reader) call with YSTDLIB_ERROR_HANDLING_TRYX to return its
error or extract the tag, and do the same for the encoded_text_ast_result
occurrences so they return early on error and yield the inner value consistently
with the rest of the file (keep existing semantics for
cProtocol::Eof/cProtocol::Payload checks and the final IrUnitType returns).

In `@components/core/tests/test-ir_encoding_methods.cpp`:
- Around line 1389-1390: Remove the redundant manual EOF push: delete the call
that does ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof) because
serialize_log_events already appends cProtocol::Eof; ensure tests rely on the
helper-produced EOF (used by is_stream_completed() and decode_ir_complete) and
keep only the serialize_log_events<TestType>(...) call followed by assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4ea8206-10f2-4989-8ebc-ae5df0a2d712

📥 Commits

Reviewing files that changed from the base of the PR and between 32d2d43 and 1616cd7.

📒 Files selected for processing (6)
  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.hpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/tests/test-ir_encoding_methods.cpp

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Comment on lines +1389 to +1390
REQUIRE(serialize_log_events<TestType>(test_log_events, preamble_ts, ir_buf, encoded_logtypes));
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Redundant Eof byte appended after serialize_log_events.

serialize_log_events already appends cProtocol::Eof at line 288, so line 1390 pushes a second EOF that the deserializer never reaches (it halts at the first one and is_stream_completed() becomes true, which is exactly what the assertion at line 1452 then exercises). The byte is harmless but obscures the fact that the EOF is produced by the helper, and it's inconsistent with decode_ir_complete (lines 1001-1064), which relies solely on the helper's EOF.

🧹 Suggested cleanup
     REQUIRE(serialize_log_events<TestType>(test_log_events, preamble_ts, ir_buf, encoded_logtypes));
-    ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
REQUIRE(serialize_log_events<TestType>(test_log_events, preamble_ts, ir_buf, encoded_logtypes));
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof);
REQUIRE(serialize_log_events<TestType>(test_log_events, preamble_ts, ir_buf, encoded_logtypes));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/tests/test-ir_encoding_methods.cpp` around lines 1389 - 1390,
Remove the redundant manual EOF push: delete the call that does
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof) because
serialize_log_events already appends cProtocol::Eof; ensure tests rely on the
helper-produced EOF (used by is_stream_completed() and decode_ir_complete) and
keep only the serialize_log_events<TestType>(...) call followed by assertions.

std::vector<SchemaTree::NodeLocator> m_pending_schema_insertions;
std::optional<SchemaTree::Node::id_t> m_optional_message_node_id;
std::optional<SchemaTree::Node::id_t> m_optional_timestamp_node_id;
ir::epoch_time_ms_t m_previous_timestamp{0};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This member shouldn't exist for eight-byte encoded vars. Check:

[[no_unique_address]] std::conditional_t<

Comment on lines +48 to +74
template <ir::EncodedVariableTypeReq encoded_variable_t>
/**
* Builds an encoded text AST from the components of an unstructured IR log event.
* @tparam encoded_variable_t
* @param logtype
* @param dict_vars
* @param encoded_vars
* @return A result containing the encoded text AST on success, or an error code indicating the
* failure:
* - Forwards `EncodedTextAst::create`'s return values on failure.
*/
[[nodiscard]] auto build_encoded_text_ast(
std::string const& logtype,
std::vector<std::string> const& dict_vars,
std::vector<encoded_variable_t> encoded_vars
) -> ystdlib::error_handling::Result<EncodedTextAst<encoded_variable_t>> {
StringBlob string_blob;
for (auto const& dict_var : dict_vars) {
string_blob.append(dict_var);
}
string_blob.append(logtype);

return EncodedTextAst<encoded_variable_t>::create(
std::move(encoded_vars),
std::move(string_blob)
);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this. The existing encoded text AST method can be used to deserialize unstructured IR, which uses string blob directly.

m_pending_schema_insertions.pop_back();

key_name.clear();
return std::pair{false, node_locator};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A design decsion I forgot to mention: the timestamp key should be considered auto-generated to match our convention in the kv-ir logging library. I think the ``pending_schema_insertionsshould have a pair ofis_auto_gen, node_locator` instead.

std::vector<std::string> dict_vars;
ir::epoch_time_ms_t timestamp_delta{};

if (auto const err{deserialize_log_event<ir::four_byte_encoded_variable_t>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the previous comment, I think we should use the latest version of deserialize_encoded_text_ast directly as kv-ir, and then deserialize the timestamp/timestamp-delta manually from the stream. Check this implementation for details.

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Comment on lines +91 to +99
// Constructors
explicit UnstructuredIrDeserializerImpl(
std::vector<SchemaTree::NodeLocator> initial_insertions
);

UnstructuredIrDeserializerImpl(
std::vector<SchemaTree::NodeLocator> initial_insertions,
ir::epoch_time_ms_t reference_timestamp
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add constraints to only enable the first method for eight-byte encoded, and the second for four-byte encoded. Check std::enable_if_t: https://en.cppreference.com/cpp/types/enable_if.

Comment on lines +110 to +122
template <ir::EncodedVariableTypeReq encoded_variable_t>
UnstructuredIrDeserializerImpl<encoded_variable_t>::UnstructuredIrDeserializerImpl(
std::vector<SchemaTree::NodeLocator> initial_insertions
)
: m_pending_schema_insertions{std::move(initial_insertions)} {}

template <ir::EncodedVariableTypeReq encoded_variable_t>
UnstructuredIrDeserializerImpl<encoded_variable_t>::UnstructuredIrDeserializerImpl(
std::vector<SchemaTree::NodeLocator> initial_insertions,
ir::epoch_time_ms_t reference_timestamp
)
: m_pending_schema_insertions{std::move(initial_insertions)},
m_previous_timestamp{reference_timestamp} {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pay attention to the symbol declaration ordering. The constructor is declared after other functions, then its definition should follow the declaration order.

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (5)
components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp (5)

39-46: ⚠️ Potential issue | 🟠 Major

Schema insertions still don't distinguish auto-generated vs user-generated keys.

create_schema_insertions returns a std::vector<NodeLocator> (line 39) and deserialize_ir_unit_schema_tree_node_insertion always returns {false, node_locator} (line 178), so both message and timestamp are emitted as user-generated insertions. Per the kv-ir convention, timestamp should be auto-generated; the pending stack should hold (is_auto_gen, NodeLocator) pairs and resolve_required_node_ids (lines 132-137) should look up timestamp against auto_gen_keys_schema_tree rather than user_gen_keys_schema_tree. As-is, downstream consumers expecting the kv-ir layout will see a mismatched data model.

Also applies to: 167-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 39 - 46, create_schema_insertions currently produces NodeLocator
entries but doesn't mark which keys are auto-generated, and
deserialize_ir_unit_schema_tree_node_insertion always returns {false,
node_locator} so timestamp is treated as user-generated; update
create_schema_insertions to mark the timestamp NodeLocator as auto-generated,
change deserialize_ir_unit_schema_tree_node_insertion to return a pair
(is_auto_gen, NodeLocator) with true for timestamp, update the pending stack
type to hold (bool, SchemaTree::NodeLocator) and modify
resolve_required_node_ids to consult auto_gen_keys_schema_tree (not
user_gen_keys_schema_tree) when resolving the timestamp key so auto-generated
keys are stored/checked in the auto-gen table.

210-217: 🧹 Nitpick | 🔵 Trivial

Use YSTDLIB_ERROR_HANDLING_TRYX for the encoded_text_ast_result returns.

The tag deserialization at line 155 already uses YSTDLIB_ERROR_HANDLING_TRYX, but the two encoded_text_ast_result sites (lines 215-217 and 271-273) still use the verbose has_error()/error()/value() pattern. Switching them to the macro keeps this file consistent with Deserializer.hpp and the rest of the file, and is more move-friendly.

♻️ Proposed change (apply to both specializations)
-    auto encoded_text_ast_result{build_encoded_text_ast<ir::four_byte_encoded_variable_t>(
-            logtype,
-            dict_vars,
-            std::move(encoded_vars)
-    )};
-    if (encoded_text_ast_result.has_error()) {
-        return encoded_text_ast_result.error();
-    }
+    auto encoded_text_ast{YSTDLIB_ERROR_HANDLING_TRYX(
+            build_encoded_text_ast<ir::four_byte_encoded_variable_t>(
+                    logtype,
+                    dict_vars,
+                    std::move(encoded_vars)
+            )
+    )};
@@
-    user_gen_pairs.emplace(message_node_id, Value{std::move(encoded_text_ast_result.value())});
+    user_gen_pairs.emplace(message_node_id, Value{std::move(encoded_text_ast)});

Also applies to: 266-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 210 - 217, Replace the verbose has_error()/error()/value() handling
for the results of build_encoded_text_ast in UnstructuredIrDeserializerImpl.cpp
with the YSTDLIB_ERROR_HANDLING_TRYX macro: locate the two spots that call
build_encoded_text_ast<ir::four_byte_encoded_variable_t> (the
encoded_text_ast_result at ~lines 215-217 and the other specialization at
~266-273) and wrap/unwrap the result using YSTDLIB_ERROR_HANDLING_TRYX so the
error is propagated and the value is retrieved in a move-friendly way, matching
how tag deserialization uses the macro elsewhere in this file.

195-237: ⚠️ Potential issue | 🟠 Major

m_previous_timestamp is mutated before fallible work completes.

Line 208 advances m_previous_timestamp before build_encoded_text_ast (line 210) and resolve_required_node_ids (line 220) run. If either fails, the persistent timestamp has already moved forward, so any subsequent diagnostic or recovery path observes a corrupted running timestamp. Compute the new timestamp into a local, use the local for the emitted Value (line 227) and KeyValuePairLogEvent::create, and only assign back to m_previous_timestamp on the success path immediately before the final return.

🛡️ Proposed reordering
-    m_previous_timestamp += timestamp_delta;
-
     auto encoded_text_ast_result{build_encoded_text_ast<ir::four_byte_encoded_variable_t>(
             logtype,
             dict_vars,
             std::move(encoded_vars)
     )};
     if (encoded_text_ast_result.has_error()) {
         return encoded_text_ast_result.error();
     }

     auto const [message_node_id, timestamp_node_id]{
             YSTDLIB_ERROR_HANDLING_TRYX(this->resolve_required_node_ids(user_gen_keys_schema_tree))
     };

+    auto const new_timestamp{m_previous_timestamp + timestamp_delta};
+
     KeyValuePairLogEvent::NodeIdValuePairs user_gen_pairs;
     user_gen_pairs.emplace(message_node_id, Value{std::move(encoded_text_ast_result.value())});
     user_gen_pairs.emplace(
             timestamp_node_id,
-            Value{static_cast<value_int_t>(m_previous_timestamp)}
+            Value{static_cast<value_int_t>(new_timestamp)}
     );

-    return KeyValuePairLogEvent::create(
+    auto result{KeyValuePairLogEvent::create(
             auto_gen_keys_schema_tree,
             user_gen_keys_schema_tree,
             {},
             std::move(user_gen_pairs),
             utc_offset
-    );
+    )};
+    if (false == result.has_error()) {
+        m_previous_timestamp = new_timestamp;
+    }
+    return result;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 195 - 237, The code advances m_previous_timestamp before fallible
operations; compute the new timestamp into a local variable (e.g., new_timestamp
= m_previous_timestamp + timestamp_delta), use that local for the Value placed
into user_gen_pairs and for the utc_offset/KeyValuePairLogEvent::create call,
run build_encoded_text_ast and resolve_required_node_ids as currently written,
and only on the success path (just before returning the created
KeyValuePairLogEvent) assign m_previous_timestamp = new_timestamp; references to
help locate the spots: m_previous_timestamp, timestamp_delta,
build_encoded_text_ast<ir::four_byte_encoded_variable_t>,
resolve_required_node_ids, and KeyValuePairLogEvent::create.

152-164: 🧹 Nitpick | 🔵 Trivial

Inconsistent std::pair usage in return statements.

Line 164 was updated to use braced-list return (return {IrUnitType::LogEvent, tag};), but lines 152, 158, and 161 still wrap their returns in std::pair{...}. Make them consistent for clarity.

♻️ Proposed change
-    if (false == m_pending_schema_insertions.empty()) {
-        return std::pair{IrUnitType::SchemaTreeNodeInsertion, encoded_tag_t{0}};
-    }
+    if (false == m_pending_schema_insertions.empty()) {
+        return {IrUnitType::SchemaTreeNodeInsertion, encoded_tag_t{0}};
+    }

     auto const tag{YSTDLIB_ERROR_HANDLING_TRYX(deserialize_tag(reader))};

     if (cProtocol::Eof == tag) {
-        return std::pair{IrUnitType::EndOfStream, tag};
+        return {IrUnitType::EndOfStream, tag};
     }
     if (cProtocol::Payload::UtcOffsetChange == tag) {
-        return std::pair{IrUnitType::UtcOffsetChange, tag};
+        return {IrUnitType::UtcOffsetChange, tag};
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 152 - 164, The returns in UnstructuredIrDeserializerImpl.cpp are
inconsistent: some use std::pair{...} while one uses braced-list return
{IrUnitType::LogEvent, tag};; make them consistent by replacing the
std::pair{...} returns (the SchemaTreeNodeInsertion and
UtcOffsetChange/EndOfStream branches) with the same braced-list style used for
the LogEvent return so all return statements return {IrUnitType::..., tag}
(referencing IrUnitType, encoded_tag_t, deserialize_tag,
YSTDLIB_ERROR_HANDLING_TRYX, and cProtocol::Eof /
cProtocol::Payload::UtcOffsetChange to locate the branches).

48-74: 🛠️ Refactor suggestion | 🟠 Major

build_encoded_text_ast helper is redundant.

The existing deserialize_encoded_text_ast path used by kv-ir already produces an EncodedTextAst from a StringBlob; the unstructured deserializer can call it directly after manually deserializing the timestamp/timestamp-delta from the stream (see decoding_methods.cpp). Wrapping that into a separate helper here is unnecessary and duplicates assembly logic. Removing it would also let you drop the dict_vars/encoded_vars/logtype triple buffered locally in deserialize_ir_unit_kv_pair_log_event.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 48 - 74, The build_encoded_text_ast helper is redundant—remove
build_encoded_text_ast and instead call the existing
deserialize_encoded_text_ast (the kv-ir path) directly after you deserialize the
timestamp/timestamp-delta in deserialize_ir_unit_kv_pair_log_event; assemble a
StringBlob in-place (as done in decoding_methods.cpp) and pass it to
deserialize_encoded_text_ast to get an EncodedTextAst, and drop the temporary
dict_vars/encoded_vars/logtype buffers previously created in
deserialize_ir_unit_kv_pair_log_event so the unstructured deserializer reuses
the shared assembly logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`:
- Around line 39-46: create_schema_insertions currently produces NodeLocator
entries but doesn't mark which keys are auto-generated, and
deserialize_ir_unit_schema_tree_node_insertion always returns {false,
node_locator} so timestamp is treated as user-generated; update
create_schema_insertions to mark the timestamp NodeLocator as auto-generated,
change deserialize_ir_unit_schema_tree_node_insertion to return a pair
(is_auto_gen, NodeLocator) with true for timestamp, update the pending stack
type to hold (bool, SchemaTree::NodeLocator) and modify
resolve_required_node_ids to consult auto_gen_keys_schema_tree (not
user_gen_keys_schema_tree) when resolving the timestamp key so auto-generated
keys are stored/checked in the auto-gen table.
- Around line 210-217: Replace the verbose has_error()/error()/value() handling
for the results of build_encoded_text_ast in UnstructuredIrDeserializerImpl.cpp
with the YSTDLIB_ERROR_HANDLING_TRYX macro: locate the two spots that call
build_encoded_text_ast<ir::four_byte_encoded_variable_t> (the
encoded_text_ast_result at ~lines 215-217 and the other specialization at
~266-273) and wrap/unwrap the result using YSTDLIB_ERROR_HANDLING_TRYX so the
error is propagated and the value is retrieved in a move-friendly way, matching
how tag deserialization uses the macro elsewhere in this file.
- Around line 195-237: The code advances m_previous_timestamp before fallible
operations; compute the new timestamp into a local variable (e.g., new_timestamp
= m_previous_timestamp + timestamp_delta), use that local for the Value placed
into user_gen_pairs and for the utc_offset/KeyValuePairLogEvent::create call,
run build_encoded_text_ast and resolve_required_node_ids as currently written,
and only on the success path (just before returning the created
KeyValuePairLogEvent) assign m_previous_timestamp = new_timestamp; references to
help locate the spots: m_previous_timestamp, timestamp_delta,
build_encoded_text_ast<ir::four_byte_encoded_variable_t>,
resolve_required_node_ids, and KeyValuePairLogEvent::create.
- Around line 152-164: The returns in UnstructuredIrDeserializerImpl.cpp are
inconsistent: some use std::pair{...} while one uses braced-list return
{IrUnitType::LogEvent, tag};; make them consistent by replacing the
std::pair{...} returns (the SchemaTreeNodeInsertion and
UtcOffsetChange/EndOfStream branches) with the same braced-list style used for
the LogEvent return so all return statements return {IrUnitType::..., tag}
(referencing IrUnitType, encoded_tag_t, deserialize_tag,
YSTDLIB_ERROR_HANDLING_TRYX, and cProtocol::Eof /
cProtocol::Payload::UtcOffsetChange to locate the branches).
- Around line 48-74: The build_encoded_text_ast helper is redundant—remove
build_encoded_text_ast and instead call the existing
deserialize_encoded_text_ast (the kv-ir path) directly after you deserialize the
timestamp/timestamp-delta in deserialize_ir_unit_kv_pair_log_event; assemble a
StringBlob in-place (as done in decoding_methods.cpp) and pass it to
deserialize_encoded_text_ast to get an EncodedTextAst, and drop the temporary
dict_vars/encoded_vars/logtype buffers previously created in
deserialize_ir_unit_kv_pair_log_event so the unstructured deserializer reuses
the shared assembly logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 12f62508-3b04-497e-9e49-468aa14fcef3

📥 Commits

Reviewing files that changed from the base of the PR and between 1616cd7 and 1a19382.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/core/tests/test-ir_encoding_methods.cpp (1)

1389-1390: ⚠️ Potential issue | 🟡 Minor

Drop the extra EOF append.

serialize_log_events() already writes cProtocol::Eof, so Line 1390 leaves trailing unread bytes and weakens what this test is actually proving about end-of-stream handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/tests/test-ir_encoding_methods.cpp` around lines 1389 - 1390,
Remove the redundant append of cProtocol::Eof after calling
serialize_log_events<TestType>(...) because serialize_log_events already writes
the end-of-stream marker; specifically, delete the
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof) call so the test verifies
EOF handling as produced by serialize_log_events rather than leaving trailing
unread bytes.
components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp (1)

117-147: ⚠️ Potential issue | 🟡 Minor

Delay updating m_previous_timestamp until the success path.

Line 127 mutates persistent state before resolve_required_node_ids(...) and KeyValuePairLogEvent::create(...) can still fail. If either returns an error, the deserializer has advanced its running timestamp without having produced a log event.

Suggested fix
     ir::epoch_time_ms_t timestamp_delta{};
     if (auto const err{deserialize_timestamp<ir::four_byte_encoded_variable_t>(
                 reader,
                 timestamp_tag,
                 timestamp_delta
         )};
         IRErrorCode_Success != err)
     {
         return ir_error_code_to_errc(err);
     }
-    m_previous_timestamp += timestamp_delta;
+    auto const timestamp{m_previous_timestamp + timestamp_delta};

     auto const [message_node_id, timestamp_node_id]{YSTDLIB_ERROR_HANDLING_TRYX(
             this->resolve_required_node_ids(auto_gen_keys_schema_tree, user_gen_keys_schema_tree)
     )};

     KeyValuePairLogEvent::NodeIdValuePairs auto_gen_pairs;
     auto_gen_pairs.emplace(
             timestamp_node_id,
-            Value{static_cast<value_int_t>(m_previous_timestamp)}
+            Value{static_cast<value_int_t>(timestamp)}
     );
     KeyValuePairLogEvent::NodeIdValuePairs user_gen_pairs;
     user_gen_pairs.emplace(message_node_id, Value{std::move(encoded_text_ast)});

-    return KeyValuePairLogEvent::create(
+    auto result = KeyValuePairLogEvent::create(
             auto_gen_keys_schema_tree,
             user_gen_keys_schema_tree,
             std::move(auto_gen_pairs),
             std::move(user_gen_pairs),
             utc_offset
     );
+    if (result.has_error()) {
+        return result.error();
+    }
+    m_previous_timestamp = timestamp;
+    return result;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`
around lines 117 - 147, Compute the new timestamp in a local variable instead of
mutating m_previous_timestamp immediately: call deserialize_timestamp to get
timestamp_delta, then compute auto new_timestamp = m_previous_timestamp +
timestamp_delta and use new_timestamp when populating the timestamp Value (e.g.
Value{static_cast<value_int_t>(new_timestamp)}) and when calling
KeyValuePairLogEvent::create; only after resolve_required_node_ids(...) and
KeyValuePairLogEvent::create(...) succeed, assign m_previous_timestamp =
new_timestamp. This ensures deserialize_timestamp, resolve_required_node_ids,
and KeyValuePairLogEvent::create can fail without advancing the persistent
m_previous_timestamp.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/tests/test-ir_encoding_methods.cpp`:
- Around line 1412-1443: The test loop currently validates timestamp and message
but omits verifying UTC offset round-trip; update the assertions in the loop
that iterates deserialized_log_events (the block using deserialized_log_event,
user_gen_schema_tree, auto_gen_node_id_value_pairs, user_gen_node_id_value_pairs
and serialized_json_result) to also assert that
deserialized_log_event.get_utc_offset() equals the original offset from the
matching test_log_events entry (created by create_test_log_events()), ensuring
UtcOffsetChange handling is validated; add a REQUIRE comparing
deserialized_log_event.get_utc_offset() to
test_log_events.at(idx).get_utc_offset() (or equivalent getter) immediately
after the existing timestamp/message checks.

---

Duplicate comments:
In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`:
- Around line 117-147: Compute the new timestamp in a local variable instead of
mutating m_previous_timestamp immediately: call deserialize_timestamp to get
timestamp_delta, then compute auto new_timestamp = m_previous_timestamp +
timestamp_delta and use new_timestamp when populating the timestamp Value (e.g.
Value{static_cast<value_int_t>(new_timestamp)}) and when calling
KeyValuePairLogEvent::create; only after resolve_required_node_ids(...) and
KeyValuePairLogEvent::create(...) succeed, assign m_previous_timestamp =
new_timestamp. This ensures deserialize_timestamp, resolve_required_node_ids,
and KeyValuePairLogEvent::create can fail without advancing the persistent
m_previous_timestamp.

In `@components/core/tests/test-ir_encoding_methods.cpp`:
- Around line 1389-1390: Remove the redundant append of cProtocol::Eof after
calling serialize_log_events<TestType>(...) because serialize_log_events already
writes the end-of-stream marker; specifically, delete the
ir_buf.push_back(clp::ffi::ir_stream::cProtocol::Eof) call so the test verifies
EOF handling as produced by serialize_log_events rather than leaving trailing
unread bytes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99537c36-6661-490a-9c5e-fcc70f334315

📥 Commits

Reviewing files that changed from the base of the PR and between 1a19382 and 0df70b6.

📒 Files selected for processing (5)
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/tests/test-ir_encoding_methods.cpp

Comment on lines +1412 to +1443
for (size_t idx{0}; idx < deserialized_log_events.size(); ++idx) {
auto const& deserialized_log_event{deserialized_log_events.at(idx)};
auto const& user_gen_schema_tree{deserialized_log_event.get_user_gen_keys_schema_tree()};
auto const& auto_gen_node_id_value_pairs{
deserialized_log_event.get_auto_gen_node_id_value_pairs()
};
auto const& user_gen_node_id_value_pairs{
deserialized_log_event.get_user_gen_node_id_value_pairs()
};

REQUIRE((2 == user_gen_schema_tree.get_size()));
REQUIRE((user_gen_schema_tree.has_node(
clp::ffi::SchemaTree::NodeLocator{
clp::ffi::SchemaTree::cRootId,
"message",
clp::ffi::SchemaTree::Node::Type::Str
}
)));
REQUIRE((1 == auto_gen_node_id_value_pairs.size()));
REQUIRE((1 == user_gen_node_id_value_pairs.size()));

auto const serialized_json_result{deserialized_log_event.serialize_to_json()};
REQUIRE_FALSE(serialized_json_result.has_error());
auto const& [auto_gen_json_obj, user_gen_json_obj]{serialized_json_result.value()};
REQUIRE(
(test_log_events.at(idx).get_timestamp()
== auto_gen_json_obj.at("timestamp").template get<epoch_time_ms_t>())
);
REQUIRE(
(test_log_events.at(idx).get_message()
== user_gen_json_obj.at("message").template get<string>())
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the UTC offset round-trip here as well.

create_test_log_events() deliberately changes UTC offsets, but this loop only checks timestamp and message. A regression in UtcOffsetChange handling would still pass this new legacy-IR serde test unless you also verify deserialized_log_event.get_utc_offset().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/tests/test-ir_encoding_methods.cpp` around lines 1412 - 1443,
The test loop currently validates timestamp and message but omits verifying UTC
offset round-trip; update the assertions in the loop that iterates
deserialized_log_events (the block using deserialized_log_event,
user_gen_schema_tree, auto_gen_node_id_value_pairs, user_gen_node_id_value_pairs
and serialized_json_result) to also assert that
deserialized_log_event.get_utc_offset() equals the original offset from the
matching test_log_events entry (created by create_test_log_events()), ensuring
UtcOffsetChange handling is validated; add a REQUIRE comparing
deserialized_log_event.get_utc_offset() to
test_log_events.at(idx).get_utc_offset() (or equivalent getter) immediately
after the existing timestamp/message checks.

Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the details of unstructured deserializer impl.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reusing the existing one, can we make a result-based version for deserialize_timestamp that returns Result<ir::epoch_time_ms_t>?


template <ir::EncodedVariableTypeReq encoded_variable_t>
auto UnstructuredIrDeserializerImpl<encoded_variable_t>::create(
EncodingType encoding_type,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This param is redundant since the encoding type is already indicated by the template parameter, right? Let's remove it.

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
}
auto const ref_timestamp_iter{metadata.find(cProtocol::Metadata::ReferenceTimestampKey)};
if (metadata.end() == ref_timestamp_iter || false == ref_timestamp_iter->is_string()) {
return std::errc::protocol_error;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use our own error code, not std::errc, right?

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
auto const [is_auto_gen, node_locator]{m_pending_schema_insertions.back()};
m_pending_schema_insertions.pop_back();

key_name.clear();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized this is confusing. The original purpose of passing key_name is to provide a buffer for key name deserialization. In our case, it is not used. I think it might be worth renaming the interface's key_name to key_name_buffer and indicating that it might not be used, depending on the implementation.

SchemaTree::NodeLocator{SchemaTree::cRootId, cTimestampKey, SchemaTree::Node::Type::Int}
)};
if (false == message_node_id.has_value() || false == timestamp_node_id.has_value()) {
return std::errc::protocol_error;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we should use our own error code.

Comment on lines +231 to +236
auto const message_node_id{user_gen_keys_schema_tree->try_get_node_id(
SchemaTree::NodeLocator{SchemaTree::cRootId, cMessageKey, SchemaTree::Node::Type::Str}
)};
auto const timestamp_node_id{auto_gen_keys_schema_tree->try_get_node_id(
SchemaTree::NodeLocator{SchemaTree::cRootId, cTimestampKey, SchemaTree::Node::Type::Int}
)};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two locators appear twice in this file, how about drop the constexpr key name and make the locators themselves constexpr?

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.hpp Outdated
jonathan-imanu and others added 3 commits May 2, 2026 22:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp/ffi/ir_stream/IrDeserializationError.hpp`:
- Around line 19-26: The new enum members were inserted mid-list in
IrDeserializationErrorEnum, which changes the numeric values of existing members
(e.g., InvalidTag, UnsupportedVersion) and is a breaking change; restore
stability by either appending any new error members to the end of
IrDeserializationErrorEnum or by assigning explicit numeric values to every enum
member so existing numeric codes (including InvalidReferenceTimestampMetadata,
InvalidReferenceTimestampValue, InvalidTag, MissingRequiredSchemaNodes,
UnknownSchemaTreeNodeType, UnknownValueType, UnsupportedMetadataFormat,
UnsupportedVersion) keep their original values; update the enum declaration in
IrDeserializationErrorEnum accordingly and add a comment explaining that numeric
values are stable and must not be reordered.

In `@components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp`:
- Around line 145-148: The NodeIdValuePairs initialization is inconsistent:
replace the emplace-based builds for KeyValuePairLogEvent::NodeIdValuePairs
(auto_gen_pairs and user_gen_pairs) with the same aggregate initialization
pattern used in the four-byte specialization—construct the containers with
brace-initialized entries (using timestamp_node_id/Value{...} and
message_node_id/Value{...} referencing absolute_timestamp and
std::move(encoded_text_ast)) so both specializations use identical
initialization style.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75db059c-2aac-47d2-aeb2-bbdcf2c52ac3

📥 Commits

Reviewing files that changed from the base of the PR and between 4fd5df9 and 25fad2d.

📒 Files selected for processing (13)
  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/SchemaTree.hpp
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp
  • components/core/src/clp/ffi/ir_stream/DeserializerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/IrDeserializationError.cpp
  • components/core/src/clp/ffi/ir_stream/IrDeserializationError.hpp
  • components/core/src/clp/ffi/ir_stream/KvIrDeserializerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/KvIrDeserializerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/CMakeLists.txt

Comment on lines +19 to +26
InvalidReferenceTimestampMetadata,
InvalidReferenceTimestampValue,
InvalidTag,
UnsupportedMetadataFormat,
UnsupportedVersion,
MissingRequiredSchemaNodes,
UnknownSchemaTreeNodeType,
UnknownValueType,
UnsupportedMetadataFormat,
UnsupportedVersion,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep existing IrDeserializationErrorEnum values stable.

Lines 19-26 insert new members in the middle of a public uint8_t enum, which renumbers existing codes like InvalidTag and UnsupportedVersion. That is a breaking change for any caller that persists or branches on the numeric value.

Proposed fix
 enum class IrDeserializationErrorEnum : uint8_t {
     DuplicateKey = 1,
-    EncodedTextAstDeserializationFailure,
-    EndOfStream,
-    IncompleteStream,
-    InvalidKeyGroupOrdering,
-    InvalidMagicNumber,
-    InvalidReferenceTimestampMetadata,
-    InvalidReferenceTimestampValue,
-    InvalidTag,
-    MissingRequiredSchemaNodes,
-    UnknownSchemaTreeNodeType,
-    UnknownValueType,
-    UnsupportedMetadataFormat,
-    UnsupportedVersion,
+    EncodedTextAstDeserializationFailure = 2,
+    EndOfStream = 3,
+    IncompleteStream = 4,
+    InvalidKeyGroupOrdering = 5,
+    InvalidMagicNumber = 6,
+    InvalidTag = 7,
+    UnknownSchemaTreeNodeType = 8,
+    UnknownValueType = 9,
+    UnsupportedMetadataFormat = 10,
+    UnsupportedVersion = 11,
+    InvalidReferenceTimestampMetadata = 12,
+    InvalidReferenceTimestampValue = 13,
+    MissingRequiredSchemaNodes = 14,
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/core/src/clp/ffi/ir_stream/IrDeserializationError.hpp` around
lines 19 - 26, The new enum members were inserted mid-list in
IrDeserializationErrorEnum, which changes the numeric values of existing members
(e.g., InvalidTag, UnsupportedVersion) and is a breaking change; restore
stability by either appending any new error members to the end of
IrDeserializationErrorEnum or by assigning explicit numeric values to every enum
member so existing numeric codes (including InvalidReferenceTimestampMetadata,
InvalidReferenceTimestampValue, InvalidTag, MissingRequiredSchemaNodes,
UnknownSchemaTreeNodeType, UnknownValueType, UnsupportedMetadataFormat,
UnsupportedVersion) keep their original values; update the enum declaration in
IrDeserializationErrorEnum accordingly and add a comment explaining that numeric
values are stable and must not be reordered.

Comment thread components/core/src/clp/ffi/ir_stream/UnstructuredIrDeserializerImpl.cpp Outdated
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality wise looks good to me. Validated that:

  • Unstructured IR logs can be ingested into clp-s using timestamp-key="@timestamp".
  • Unstructured IR logs can be searched by clp-s cli.

Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.hpp Outdated
* deserialize.
*/
template <ir::EncodedVariableTypeReq encoded_variable_t>
[[nodiscard]] auto deserialize_timestamp(ReaderInterface& reader, encoded_tag_t encoded_tag)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about deserialize_timestamp_or_timestamp_delta?

Comment on lines +634 to +643
epoch_time_ms_t ts{};
if (auto const error_code = deserialize_timestamp<encoded_variable_t>(reader, encoded_tag, ts);
IRErrorCode_Success != error_code)
{
if (IRErrorCode_Corrupted_IR == error_code) {
return IrDeserializationError{IrDeserializationErrorEnum::InvalidTag};
}
return IrDeserializationError{IrDeserializationErrorEnum::IncompleteStream};
}
return ts;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
epoch_time_ms_t ts{};
if (auto const error_code = deserialize_timestamp<encoded_variable_t>(reader, encoded_tag, ts);
IRErrorCode_Success != error_code)
{
if (IRErrorCode_Corrupted_IR == error_code) {
return IrDeserializationError{IrDeserializationErrorEnum::InvalidTag};
}
return IrDeserializationError{IrDeserializationErrorEnum::IncompleteStream};
}
return ts;
if constexpr (is_same_v<encoded_variable_t, eight_byte_encoded_variable_t>) {
if (cProtocol::Payload::TimestampVal != encoded_tag) {
return IrDeserializationError{IrDeserializationErrorEnum::InvalidTag};
}
return deserialize_int<epoch_time_ms_t>(reader);
} else {
if (cProtocol::Payload::TimestampDeltaByte == encoded_tag) {
return static_cast<epoch_time_ms_t>(
YSTDLIB_ERROR_HANDLING_TRYX(deserialize_int<int8_t>(reader))
);
}
if (cProtocol::Payload::TimestampDeltaShort == encoded_tag) {
return static_cast<epoch_time_ms_t>(
YSTDLIB_ERROR_HANDLING_TRYX(deserialize_int<int16_t>(reader))
);
}
if (cProtocol::Payload::TimestampDeltaInt == encoded_tag) {
return static_cast<epoch_time_ms_t>(
YSTDLIB_ERROR_HANDLING_TRYX(deserialize_int<int32_t>(reader))
);
}
if (cProtocol::Payload::TimestampDeltaLong == encoded_tag) {
return static_cast<epoch_time_ms_t>(
YSTDLIB_ERROR_HANDLING_TRYX(deserialize_int<int64_t>(reader))
);
}
return IrDeserializationError{IrDeserializationErrorEnum::InvalidTag};
}

I think we should just rewrite it using the latest helpers. We should do a clean-up to remove all old/outdated helpers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update the docstring since we're now forwarding the error code from deserialize_int instead.

Comment on lines +19 to +20
* Virtual base class defining methods for deserializing IR units and log events
* called by `Deserializer`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be reverted (might be introduced by IDE auto-formatter).
Please check the diff everytime you make a commit to make sure all changes are intentional.

Comment on lines +41 to +42
* @return A result containing the deserialized UTC offset on success, or an
* error code indicating the failure:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert.

Comment thread components/core/src/clp/ffi/ir_stream/Deserializer.hpp Outdated
Comment on lines +272 to +276
if (metadata_json.contains(cProtocol::Metadata::UserDefinedMetadataKey)
&& false == metadata_json.at(cProtocol::Metadata::UserDefinedMetadataKey).is_object())
{
return std::errc::protocol_not_supported;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the metadata check. It should be supported for both kv-ir and the old IR format. Let's move it outside of this if-else cluster.

Comment thread components/core/src/clp/ffi/ir_stream/Deserializer.hpp Outdated

case IrUnitType::SchemaTreeNodeInsertion: {
std::string key_name;
std::string key_name_buffer;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this buffer should be made as a member variable of Deserializer to avoid memory reallocation. Can you create an issue to track it? @coderabbitai

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@LinZhihao-723, I've created a GitHub issue to track moving key_name_buffer to a member variable of Deserializer. The link will be provided once the script completes.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo y-scope/clp \
  --title "refactor(kv-ir): Make \`key_name_buffer\` a member variable of \`Deserializer\` to avoid repeated memory reallocation" \
  --body "## Summary

In \`components/core/src/clp/ffi/ir_stream/Deserializer.hpp\`, the \`key_name_buffer\` variable inside \`Deserializer::deserialize_next_ir_unit\` is currently declared as a local \`std::string\` within the \`IrUnitType::SchemaTreeNodeInsertion\` case block:

\`\`\`cpp
case IrUnitType::SchemaTreeNodeInsertion: {
    std::string key_name_buffer;
    ...
}
\`\`\`

This causes a heap allocation on every call to \`deserialize_next_ir_unit\` that encounters a schema tree node insertion IR unit.

## Proposed Change

Move \`key_name_buffer\` to be a member variable of \`Deserializer\`, and clear it at the start of each use. This allows the allocated memory to be reused across calls, avoiding repeated heap allocations.

## References

- PR: https://github.com/y-scope/clp/pull/2227
- Comment: https://github.com/y-scope/clp/pull/2227#discussion_r3191785760
- Requested by: `@LinZhihao-723`
"

Length of output: 43


@LinZhihao-723, I've created the issue to track this: #2249

The issue captures the proposal to move key_name_buffer from a local variable inside the IrUnitType::SchemaTreeNodeInsertion case block to a member variable of Deserializer, so that the allocated memory can be reused across calls instead of being reallocated on every schema tree node insertion encountered during deserialization.


🧠 Learnings used
Learnt from: SharafMohamed
Repo: y-scope/clp PR: 2080
File: components/core/src/clp/ProfilerReporter.hpp:34-53
Timestamp: 2026-03-07T10:37:02.035Z
Learning: In components/core/src/clp/ProfilerReporter.hpp, do not suggest or implement resetting Profiler::reset_runtime_measurements() in ProfilerReporter constructors. The Reporter is intended to accumulate and report all runtime measurements up to its destruction, enabling mid-run reporting with a single reporter created at program start. When reviewing similar code, prefer preserving accumulated profiler state and only report/clear at explicit points if the design requires it; avoid constructor-time resets in reporter components.

} else {
if (EncodingType::FourByte == encoding_type) {
deserializer_impl = YSTDLIB_ERROR_HANDLING_TRYX(
UnstructuredIrDeserializerImpl<ir::four_byte_encoded_variable_t>::create(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the docstring to reflect that it forwards UnstructuredIrDeserializerImpl::create's return values on failure.

jonathan-imanu and others added 2 commits May 7, 2026 16:18
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants