[WIP] Ack/Nack for routing in transform_processor #1798

albertlockett · 2026-01-15T21:27:23Z

Opening this as a draft for now. There are some additional test cases and cleanup I want to do, but I first wanted to make sure my handling of the Pdata Contexts was correct.

Closes: #1784

Now that OPL has route_to, combining it with if/else can create a scenario where a batch gets split:

logs |
if (severity_text == "ERROR") {
  route_to "out_port1"
}
// implicit collect everything that didn't go in "if" branch

A pipeline like this would emit two batches:

"ERROR" logs on the processor's "out_port1"
all other logs on the default out port
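To make the split concrete, here is a hypothetical sketch of the routing above applied to a simplified in-memory batch. `LogRecord` and `split_batch` are illustrative names, not types from the transform processor (which operates on OTAP/Arrow batches); the point is only that one inbound batch yields two outbound batches, each of which later needs its own Ack/Nack.

```rust
/// Hypothetical, simplified log record: only the field the example predicate reads.
/// (The real processor works on Arrow record batches, not a Vec of structs.)
#[derive(Debug, Clone, PartialEq)]
struct LogRecord {
    severity_text: String,
}

/// Splits one inbound batch the way the OPL example routes it: records matching
/// the `if` predicate go to "out_port1"; everything else falls through to the
/// default out port. Each side becomes its own outbound batch.
fn split_batch(batch: Vec<LogRecord>) -> (Vec<LogRecord>, Vec<LogRecord>) {
    batch
        .into_iter()
        .partition(|r| r.severity_text == "ERROR")
}

fn main() {
    let batch: Vec<LogRecord> = ["ERROR", "INFO", "WARN"]
        .iter()
        .map(|s| LogRecord { severity_text: s.to_string() })
        .collect();
    let (out_port1, default_port) = split_batch(batch);
    // prints "out_port1: 1 record(s), default: 2 record(s)"
    println!(
        "out_port1: {} record(s), default: {} record(s)",
        out_port1.len(),
        default_port.len()
    );
}
```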

If the batch has subscribers, then when we process a pdata we must keep the context for the inbound batch and create new contexts for the outbound batches. Once all the outbound batches have been Ack'd/Nack'd, we must then Ack/Nack the inbound context.

This PR adds a Contexts type for juggling the inbound/outbound contexts and updates the transform processor to manage contexts and Ack/Nack correctly.

codecov · 2026-01-15T21:30:06Z

Codecov Report

❌ Patch coverage is 90.46322% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.56%. Comparing base (4b64646) to head (0bfc412).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1798      +/-   ##
==========================================
- Coverage   84.22%   83.56%   -0.67%     
==========================================
  Files         486      506      +20     
  Lines      140885   147598    +6713     
==========================================
+ Hits       118657   123336    +4679     
- Misses      21694    23728    +2034     
  Partials      534      534

Components	Coverage Δ
otap-dataflow	`84.36% <90.46%> (-1.10%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.52% <ø> (-0.01%)`	⬇️
syslog_cef_receivers	`∅ <ø> (∅)`
otel-arrow-go	`53.50% <ø> (ø)`


albertlockett · 2026-01-15T21:37:30Z

@jmacd could I ask you for a review on this :)? I was using the batch_processor Ack/Nack implementation as inspiration.

One thing in particular I'd like to ensure is correct is the behaviour of juggling the contexts.

Basically, when we find we've split the batch:

For the inbound batch, insert an entry into the inbounds slot map in Contexts containing the number of outbound batches and the original inbound context.
For each outbound batch, insert an entry into the outbounds slot map in Contexts whose value points back at the inbound key.
Subscribe the outbound context using calldata derived from the outbounds slot map key.

Then, when we receive an Ack/Nack:

Look up the inbound key in the outbounds slot map using the key from the calldata, and clear that outbound entry.
Decrement the count of outbounds in the inbound entry. If the count reaches zero, Ack/Nack the original inbound context.
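The split/Ack bookkeeping described above can be sketched with plain `HashMap`s standing in for the slot maps. This is a minimal model under stated assumptions, not the PR's code: `InboundContext` is a hypothetical stand-in for the pipeline context, keys come from a counter rather than a slot map, and the outcome rule (the inbound context is Nack'd with the first reason seen if any outbound batch was Nack'd) is inferred from the `set_failed` and `clear_outbound` signatures quoted in the review comments below. The PR has separate `set_failed` and `clear_outbound` methods; this sketch folds them into one call.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the pipeline's per-message context.
#[derive(Debug, Clone, PartialEq)]
struct InboundContext(&'static str);

type Key = u64;

/// One entry per split inbound batch.
struct Inbound {
    num_outbound: usize,          // outbound batches not yet Ack/Nack'd
    context: InboundContext,      // original context, answered once at the end
    error_reason: Option<String>, // first Nack reason seen, if any
}

/// One entry per outbound batch, pointing back at its inbound entry.
struct Outbound {
    inbound_key: Key,
}

#[derive(Default)]
struct Contexts {
    next_key: Key, // simple counter instead of slot-map keys (assumption)
    inbound: HashMap<Key, Inbound>,
    outbound: HashMap<Key, Outbound>,
}

impl Contexts {
    fn alloc_key(&mut self) -> Key {
        self.next_key += 1;
        self.next_key
    }

    /// Batch was split into `num_outbound` batches: store the inbound context and
    /// return one outbound key per batch (the value that would go into the calldata
    /// used to subscribe that outbound context).
    fn register_split(&mut self, context: InboundContext, num_outbound: usize) -> Vec<Key> {
        let inbound_key = self.alloc_key();
        self.inbound.insert(
            inbound_key,
            Inbound { num_outbound, context, error_reason: None },
        );
        let mut keys = Vec::with_capacity(num_outbound);
        for _ in 0..num_outbound {
            let key = self.alloc_key();
            self.outbound.insert(key, Outbound { inbound_key });
            keys.push(key);
        }
        keys
    }

    /// Ack (`error == None`) or Nack (`error == Some(..)`) for one outbound batch.
    /// Returns the inbound context and the overall outcome once the last
    /// outbound batch has been answered; `None` while some are still pending.
    fn clear_outbound(
        &mut self,
        outbound_key: Key,
        error: Option<String>,
    ) -> Option<(InboundContext, Option<String>)> {
        let Outbound { inbound_key } = self.outbound.remove(&outbound_key)?;
        let inbound = self.inbound.get_mut(&inbound_key)?;
        if inbound.error_reason.is_none() {
            inbound.error_reason = error;
        }
        inbound.num_outbound -= 1;
        if inbound.num_outbound > 0 {
            return None;
        }
        let done = self.inbound.remove(&inbound_key)?;
        Some((done.context, done.error_reason))
    }
}

fn main() {
    let mut contexts = Contexts::default();
    let keys = contexts.register_split(InboundContext("inbound-1"), 2);
    // first outbound batch Ack'd: the inbound batch is still waiting
    assert!(contexts.clear_outbound(keys[0], None).is_none());
    // second outbound batch Nack'd: now the inbound context is Nack'd with that reason
    let done = contexts.clear_outbound(keys[1], Some("export failed".to_string()));
    println!("{:?}", done);
}
```

Keeping only the first Nack reason is an arbitrary choice in this sketch; concatenating all reasons would work equally well with the same structure.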

I'm also curious whether this is the correct way to set up Ack messages in tests, using the outbound contexts to simulate a downstream component having Ack/Nack'd the batch:

otel-arrow/rust/otap-dataflow/crates/otap/src/transform_processor.rs

Lines 996 to 1017 in e5a0975

    
    // now we'll Ack the outbound messages and ensure that we eventually emit an ack
    // for the inbound message
    let call_data = outbound_context1.current_calldata().unwrap();
    let mut ack1 = AckMsg::new(OtapPdata::new(
        outbound_context1,
        OtapPayload::empty(SignalType::Logs),
    ));
    ack1.calldata = call_data;
    let call_data = outbound_context2.current_calldata().unwrap();
    let mut ack2 = AckMsg::new(OtapPdata::new(
        outbound_context2,
        OtapPayload::empty(SignalType::Logs),
    ));
    ack2.calldata = call_data;
    let call_data = outbound_context3.current_calldata().unwrap();
    let mut ack3 = AckMsg::new(OtapPdata::new(
        outbound_context3,
        OtapPayload::empty(SignalType::Logs),
    ));
    ack3.calldata = call_data;

I realize I'm asking you to reverse engineer a lot of code here, so happy to walk through it on Teams if that's easier :)

jmacd

Looks good. Looks like maybe the new code could be applied to the batch_processor in a future PR, maybe.

jmacd · 2026-01-16T19:37:02Z

@albertlockett I think you want to use a call to Context::next_ack, for the test section you quoted. This does what the effect handler would have done when the recipient responded with an Ack. See how batch_processor.rs tests use Context::next_ack, basically.

albertlockett · 2026-01-17T03:25:46Z

> @albertlockett I think you want to use a call to Context::next_ack, for the test section you quoted. This does what the effect handler would have done when the recipient responded with an Ack. See how batch_processor.rs tests use Context::next_ack, basically.

Thanks @jmacd, that worked! Made this change in c031e62.

albertlockett · 2026-01-17T03:30:44Z

> Looks good. Looks like maybe the new code could be applied to the batch_processor in a future PR, maybe.

Yeah, I think we could reuse Contexts with a little refactoring. The gap in the current implementation is that it doesn't expect an outbound batch to be associated with more than one inbound batch (currently we only split batches; we never combine them). I imagine this change will eventually be needed because we'd want OPL to support batching, and once that is in place we could reuse this in batch_processor.
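As a rough illustration of that refactoring, an outbound entry could hold a list of inbound keys instead of one, with each inbound entry counting how many outbound batches it still waits on. This is a hypothetical sketch: `ManyToManyContexts`, `add_inbound`, `add_outbound` and `ack_outbound` are illustrative names, `HashMap` stands in for the slot maps, and Nack handling is omitted.

```rust
use std::collections::HashMap;

type Key = u64;

/// Hypothetical many-to-many variant: an outbound batch may combine rows from
/// several inbound batches, and an inbound batch may feed several outbound batches.
#[derive(Default)]
struct ManyToManyContexts {
    /// inbound key -> (outbound batches still pending, inbound context label)
    inbound: HashMap<Key, (usize, String)>,
    /// outbound key -> every inbound batch that contributed rows to it
    outbound: HashMap<Key, Vec<Key>>,
}

impl ManyToManyContexts {
    fn add_inbound(&mut self, key: Key, context: &str) {
        self.inbound.insert(key, (0, context.to_string()));
    }

    /// Registers an outbound batch and bumps the pending count of each contributor.
    fn add_outbound(&mut self, key: Key, inbound_keys: Vec<Key>) {
        for k in &inbound_keys {
            if let Some((pending, _)) = self.inbound.get_mut(k) {
                *pending += 1;
            }
        }
        self.outbound.insert(key, inbound_keys);
    }

    /// Ack of one outbound batch: returns the inbound contexts that are now fully Ack'd.
    fn ack_outbound(&mut self, key: Key) -> Vec<String> {
        let mut completed = Vec::new();
        for k in self.outbound.remove(&key).unwrap_or_default() {
            let done = match self.inbound.get_mut(&k) {
                Some((pending, _)) => {
                    *pending -= 1;
                    *pending == 0
                }
                None => false,
            };
            if done {
                if let Some((_, context)) = self.inbound.remove(&k) {
                    completed.push(context);
                }
            }
        }
        completed
    }
}

fn main() {
    let mut c = ManyToManyContexts::default();
    c.add_inbound(1, "in-1");
    c.add_inbound(2, "in-2");
    c.add_outbound(10, vec![1, 2]); // combines rows from both inbound batches
    c.add_outbound(11, vec![2]); // holds the rest of in-2
    println!("{:?}", c.ack_outbound(10)); // prints ["in-1"]
    println!("{:?}", c.ack_outbound(11)); // prints ["in-2"]
}
```

The one-to-many case handled by the PR is the special case where every outbound list has exactly one element.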

lalitb · 2026-01-17T04:19:17Z

rust/otap-dataflow/crates/otap/src/transform_processor/context.rs

+    }
+
+    pub fn set_failed(&mut self, outbound_key: Key, error_reason: String) {
+        if let Some(inbound) = self.inbound.get_mut(outbound_key) {


Should we first look up the outbound entry to get the inbound_key, as is done in clear_outbound (line 115)?
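A minimal sketch of the lookup order suggested here: resolve the outbound key to its inbound key first, then update the inbound entry. Plain `HashMap`s stand in for the slot maps and the struct fields are simplified assumptions; only the `set_failed(outbound_key, error_reason)` shape comes from the diff excerpt.

```rust
use std::collections::HashMap;

type Key = u64;

/// Simplified inbound entry: only the Nack reason is modelled here (assumption).
struct Inbound {
    error_reason: Option<String>,
}

/// Each outbound entry points back at the inbound batch it was split from.
struct Outbound {
    inbound_key: Key,
}

#[derive(Default)]
struct Contexts {
    inbound: HashMap<Key, Inbound>,
    outbound: HashMap<Key, Outbound>,
}

impl Contexts {
    /// Records a Nack reason for the inbound batch that `outbound_key` was split from.
    /// Keys of the two maps are allocated independently, so indexing `inbound` with an
    /// outbound key could hit an unrelated entry or none at all; resolve it first.
    fn set_failed(&mut self, outbound_key: Key, error_reason: String) {
        let Some(outbound) = self.outbound.get(&outbound_key) else {
            return; // unknown or already-cleared outbound key
        };
        if let Some(inbound) = self.inbound.get_mut(&outbound.inbound_key) {
            // keep the first reason if several outbound batches fail
            inbound.error_reason.get_or_insert(error_reason);
        }
    }
}

fn main() {
    let mut contexts = Contexts::default();
    contexts.inbound.insert(7, Inbound { error_reason: None });
    contexts.outbound.insert(1, Outbound { inbound_key: 7 });
    contexts.set_failed(1, "export failed".to_string());
    // the reason lands on inbound entry 7, not on a (non-existent) inbound entry 1
    println!("{:?}", contexts.inbound[&7].error_reason);
}
```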

lalitb · 2026-01-17T04:22:47Z

rust/otap-dataflow/crates/otap/src/transform_processor/context.rs

+    /// Ack/NAck'd
+    pub fn clear_outbound(&mut self, outbound_key: Key) -> Option<(Context, Option<String>)> {
+        let inbound_key = {
+            let outbound = self.outbound.get(outbound_key)?;


Maybe I am missing something, but it seems we get the slot and never remove it from self.outbound?
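A sketch of the fix this comment implies: take the entry out with `remove` instead of reading it with `get`, so each Ack/Nack'd outbound batch frees its slot. Plain `HashMap`s stand in for the slot maps, `inbound_pending` is a simplified assumption (the real inbound entry also holds the context), and the return type is reduced to the inbound key.

```rust
use std::collections::HashMap;

type Key = u64;

/// Each outbound entry points back at the inbound batch it was split from.
struct Outbound {
    inbound_key: Key,
}

#[derive(Default)]
struct Contexts {
    /// inbound key -> number of outbound batches not yet Ack/Nack'd (simplified)
    inbound_pending: HashMap<Key, usize>,
    outbound: HashMap<Key, Outbound>,
}

impl Contexts {
    /// Takes the outbound entry out of the map with `remove` (rather than only
    /// reading it with `get`), so every answered outbound batch frees its slot.
    /// Returns the inbound key once its last outbound batch has been cleared.
    fn clear_outbound(&mut self, outbound_key: Key) -> Option<Key> {
        let inbound_key = self.outbound.remove(&outbound_key)?.inbound_key;
        let pending = self.inbound_pending.get_mut(&inbound_key)?;
        *pending -= 1;
        if *pending == 0 {
            self.inbound_pending.remove(&inbound_key);
            Some(inbound_key)
        } else {
            None
        }
    }
}

fn main() {
    let mut contexts = Contexts::default();
    contexts.inbound_pending.insert(1, 2);
    contexts.outbound.insert(10, Outbound { inbound_key: 1 });
    contexts.outbound.insert(11, Outbound { inbound_key: 1 });
    println!("{:?}", contexts.clear_outbound(10)); // prints None
    println!("{:?}", contexts.clear_outbound(11)); // prints Some(1)
    println!("outbound entries left: {}", contexts.outbound.len()); // prints 0
}
```

A side effect of removing the entry is that a duplicate Ack for the same outbound key returns `None` instead of decrementing the inbound count a second time.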

lalitb · 2026-01-17T04:45:02Z

rust/otap-dataflow/crates/otap/src/transform_processor/context.rs

+            // insert outbound
+            let outbound = Outbound { inbound_key };
+            self.outbound
+                .allocate(|| (outbound, ()))


If allocate fails, we return early, but num_outbound has already been incremented at line 88. This would leave the inbound context stuck, since its outbound count can never reach zero.
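One possible way to address this, sketched with a hypothetical `BoundedSlots` type standing in for a slot map whose `allocate` can fail: allocate every outbound slot first, roll back on failure, and only then bump `num_outbound`. Whether the processor should instead Nack the inbound batch on allocation failure is a separate design question this sketch does not settle.

```rust
use std::collections::HashMap;

type Key = u64;

/// Hypothetical bounded slot map whose `allocate` can fail.
struct BoundedSlots<V> {
    capacity: usize,
    next_key: Key,
    slots: HashMap<Key, V>,
}

impl<V> BoundedSlots<V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, next_key: 0, slots: HashMap::new() }
    }

    fn allocate(&mut self, value: V) -> Option<Key> {
        if self.slots.len() >= self.capacity {
            return None;
        }
        self.next_key += 1;
        self.slots.insert(self.next_key, value);
        Some(self.next_key)
    }
}

struct Inbound {
    num_outbound: usize,
}

/// Registers `n` outbound batches for `inbound_key`. All outbound slots are
/// allocated before `num_outbound` changes; on failure the slots taken so far
/// are released and the count is left untouched, so the inbound entry never
/// waits for an outbound batch that was never registered.
fn register_outbounds(
    inbound: &mut HashMap<Key, Inbound>,
    outbound: &mut BoundedSlots<Key>, // value = owning inbound key
    inbound_key: Key,
    n: usize,
) -> Option<Vec<Key>> {
    if !inbound.contains_key(&inbound_key) {
        return None;
    }
    let mut keys = Vec::with_capacity(n);
    for _ in 0..n {
        match outbound.allocate(inbound_key) {
            Some(key) => keys.push(key),
            None => {
                // roll back the partial registration
                for key in &keys {
                    outbound.slots.remove(key);
                }
                return None;
            }
        }
    }
    inbound.get_mut(&inbound_key)?.num_outbound += keys.len();
    Some(keys)
}

fn main() {
    let mut inbound = HashMap::new();
    inbound.insert(1, Inbound { num_outbound: 0 });
    let mut outbound = BoundedSlots::new(2);
    // three outbound batches but only two free slots: nothing is registered
    assert!(register_outbounds(&mut inbound, &mut outbound, 1, 3).is_none());
    assert_eq!(inbound[&1].num_outbound, 0);
    let keys = register_outbounds(&mut inbound, &mut outbound, 1, 2).unwrap();
    println!("registered {:?}, pending = {}", keys, inbound[&1].num_outbound);
}
```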

albertlockett added 7 commits January 15, 2026 11:52

implemented code that has no tests

6f853d2

added tests for context

8f594b2

added some processor tests

f5734a0

comments in contexts

25df7b8

comments

ef1ff11

add TODO for optimization

af42a99

remove dead code

e5a0975

albertlockett requested a review from a team as a code owner January 15, 2026 21:27

github-project-automation bot added this to OTel-Arrow Jan 15, 2026

github-actions bot added the rust Pull requests that update Rust code label Jan 15, 2026

jmacd reviewed Jan 16, 2026


albertlockett added 2 commits January 16, 2026 22:22

fix suggestion for handling Ack in test

c031e62

fix clippies

0bfc412

lalitb reviewed Jan 17, 2026

