[WIP] Worker visibility design#938

Closed
yuandrew wants to merge 6 commits intotemporalio:masterfrom
yuandrew:worker-visibility-design

Conversation

@yuandrew
Contributor

What was changed

Why?

Checklist

  1. Closes

  2. How was this tested:

  3. Any docs updates needed?

Comment thread core/src/worker/client.rs
#[derive(Debug)]
pub(crate) struct WorkerHeartbeatInfo {
in_mem_thing: Option<InMemoryThing>,
pub(crate) data: WorkerHeartbeatData,
Member

Doesn't seem like this needs to be here - you can just call capture_heartbeat_data right before sending it out

Contributor Author

As in the whole struct? Isn't there state we want to keep in between heartbeats, like elapsed_since_last_heartbeat?

Member

Stupid GH highlight - no, just the data. But yes, good point, we do need a way to compute deltas between heartbeats. Maybe just calling this last_data could clear things up.
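
To make the delta idea concrete, here is a minimal sketch, assuming the struct keeps only the previous snapshot under the suggested name last_data. HeartbeatSnapshot, its total_processed_tasks field, and the signature given to capture_heartbeat_data are hypothetical and not taken from the PR.

```rust
/// Hypothetical cumulative counters read at heartbeat time.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct HeartbeatSnapshot {
    pub total_processed_tasks: u64,
}

/// Simplified stand-in for the PR's struct: only the previous snapshot is kept.
#[derive(Debug, Default)]
pub struct WorkerHeartbeatInfo {
    last_data: Option<HeartbeatSnapshot>,
}

impl WorkerHeartbeatInfo {
    /// Returns how many tasks were processed since the previous heartbeat
    /// and remembers `current` for the next call. (Assumed signature.)
    pub fn capture_heartbeat_data(&mut self, current: HeartbeatSnapshot) -> u64 {
        // On the first heartbeat there is no previous snapshot, so the
        // delta is measured against an all-zero default.
        let previous = self.last_data.replace(current).unwrap_or_default();
        current
            .total_processed_tasks
            .saturating_sub(previous.total_processed_tasks)
    }
}
```

With this shape, feeding in cumulative totals of 5 and then 12 would yield deltas of 5 and 7, which is the kind of value a field like last_interval_processed_tasks could carry.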

Comment thread core/src/telemetry/in_memory.rs Outdated
Comment on lines +15 to +16
Otel(Arc<dyn CoreMeter>),
Prometheus(Arc<dyn CoreMeter>),
Member

There's no need for these variants if they're both just the same underlying type and the behavior is undifferentiated.

But, I'm realizing I may have been thinking about this a bit stupidly to start with anyway. I'm not sure we need to really be this clever. Sorry for leading you down that path.

Clearly we still need the "in memory thing" - but I don't think we need CompositeMeter. This comes from two observations:

  1. There aren't actually all that many metrics that we need to send that we are already recording
  2. The new ones we do add, we don't want to be sent out along with all the other metrics (or, if we do, we want to control that). For example, the last_interval_processed_tasks is something that would never make sense to export along with the normal metrics.

So, I think we can simplify things quite a bit here. For the metrics which are already recorded, and also need to be used for heartbeats, I propose we simply make wrappers like:

struct MetricAndHeartbeatCounter {
    metric: Arc<dyn Counter>,
    for_heartbeat: HeartbeatCounter,
}

And use that from MetricsContext. Here HeartbeatCounter just directly wraps the underlying otel Counter<u64> which is provided by the in-memory meter provider. (Side note: I need to change CoreMeter to return Boxes instead of Arcs, since it looks like the Arcs aren't needed most of the time. I'll do that in my PR.)

This all means we double-record for the handful of things that both get exported as metrics and sent with the heartbeat, but... not such a big deal, esp if that's all hidden inside the MetricsContext.
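
A rough sketch of the double-recording wrapper follows, under stated assumptions: the Counter trait here is a one-method stand-in rather than the crate's real trait, HeartbeatCounter is backed by an AtomicU64 instead of the otel Counter<u64> mentioned above, and ExportedCounter exists only so the sketch can be exercised.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-in for the exported counter trait (assumed shape, not the real one).
pub trait Counter: Send + Sync {
    fn add(&self, value: u64);
}

/// Stand-in for the in-memory counter that is read at heartbeat time.
/// An atomic is swapped in here for the otel instrument.
#[derive(Clone, Default)]
pub struct HeartbeatCounter(Arc<AtomicU64>);

impl HeartbeatCounter {
    pub fn add(&self, value: u64) {
        self.0.fetch_add(value, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Toy exporter-side counter, present only to make the sketch testable.
#[derive(Default)]
pub struct ExportedCounter(AtomicU64);

impl ExportedCounter {
    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

impl Counter for ExportedCounter {
    fn add(&self, value: u64) {
        self.0.fetch_add(value, Ordering::Relaxed);
    }
}

/// Records every increment twice: once for export, once for heartbeats.
pub struct MetricAndHeartbeatCounter {
    metric: Arc<dyn Counter>,
    for_heartbeat: HeartbeatCounter,
}

impl MetricAndHeartbeatCounter {
    pub fn new(metric: Arc<dyn Counter>, for_heartbeat: HeartbeatCounter) -> Self {
        Self { metric, for_heartbeat }
    }

    pub fn add(&self, value: u64) {
        self.metric.add(value);
        self.for_heartbeat.add(value);
    }
}
```

Call sites only ever see add, so the second write stays an internal detail of whatever owns the wrapper.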

Contributor Author

So instead of registering an in_mem_reader to the existing MeterProviderBuilder, you're suggesting we create a separate in-mem meter provider that we double record with for only the metrics we need, and plumb that through to MetricsContext?

Member

@Sushisource Sushisource Jun 18, 2025

Precisely. Feel free to DM me and we can zoom etc if you have questions

Contributor Author

Pushed a new update with CompositeMeter removed. Plumbed MetricAndHeartbeatCounter through to total_sticky_cache_hit and can currently see its value as 1, so plumbing should be in a working state.

Comment thread core/src/worker/mod.rs Outdated
Comment on lines +320 to +322
// TODO: Use existing MetricsContext or a new meter to record and export these metrics, possibly through the same MetricsCallBuffer
// ANDREW: metrics is temporal metrics, meter is user metrics
let (metrics, meter, in_mem_thing) = if let Some(ti) = telem_instance {
Member

You definitely want to use the existing Context - we don't want to have to pass around more stuff.

As for the TemporalMeter - yeah, seemingly that needs to bypass all of this, since that can get exposed to users via the custom slot suppliers and the lang-side metrics buffer etc.

MetricsContext, however, is not pub, and can hide the in-mem thing internally, so, stuffing it in there seems appropriate.

Comment thread tests/integ_tests/metrics_tests.rs Outdated
Comment on lines +557 to +562
let composite_meter = build_otlp_metric_exporter(opts).unwrap();

let in_memory_thing = composite_meter.in_mem_exporter().clone();

rt.telemetry_mut()
.attach_late_init_metrics(Arc::new(composite_meter), Some(in_memory_thing.clone()));
Member

Why build the exporter which returns a composite meter, and then pull the in memory thing out of it just to then pass it back in?

composite_meter already has everything we need, so there shouldn't be any reason this is necessary (other than the type erasure where attach_late_init_metrics only takes a CoreMeter which I suspect is why you ended up here)

IMO lang doesn't need to know about any of this at all - even that a CompositeMeter is a thing that exists. If we end up needing to have lang record some of these metrics on its side we can consider what to do then - maybe just provide a new method on Worker to get something specifically for recording these.

Contributor Author

Created a new CoreMeterWithMem wrapper trait that lets me pull out the InMemoryMeter, lmk your thoughts. InMemoryMeter is currently getting plumbed through TelemetryInstance to be passed to the instrument wrapper, MetricAndHeartbeatCounter
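
For readers following along, the wrapper-trait idea might look roughly like the sketch below. Everything here is an assumed shape: CoreMeter is reduced to a single placeholder method, InMemoryMeter is an empty stand-in, and OtelLikeMeter and attach_metrics_sketch are hypothetical names, not APIs from the PR.

```rust
use std::sync::Arc;

/// Stand-in for the type-erased meter trait (minimal assumed shape).
pub trait CoreMeter: Send + Sync {
    fn name(&self) -> &str;
}

/// Stand-in for the in-memory meter used for heartbeats.
#[derive(Debug, Default)]
pub struct InMemoryMeter;

/// Supertrait extension: still usable as a CoreMeter, but it can also
/// hand back the in-memory meter after the concrete type is erased.
pub trait CoreMeterWithMem: CoreMeter {
    fn in_memory_meter(&self) -> Option<Arc<InMemoryMeter>>;
}

/// Hypothetical concrete meter that owns an in-memory meter.
pub struct OtelLikeMeter {
    pub in_mem: Arc<InMemoryMeter>,
}

impl CoreMeter for OtelLikeMeter {
    fn name(&self) -> &str {
        "otel-like"
    }
}

impl CoreMeterWithMem for OtelLikeMeter {
    fn in_memory_meter(&self) -> Option<Arc<InMemoryMeter>> {
        Some(self.in_mem.clone())
    }
}

/// Hypothetical attach point: it accepts the richer trait object, so the
/// caller does not have to extract the in-memory meter and pass it back in.
pub fn attach_metrics_sketch(meter: Arc<dyn CoreMeterWithMem>) -> Option<Arc<InMemoryMeter>> {
    meter.in_memory_meter()
}
```

The point of the supertrait is that the attach call takes one argument and recovers the in-memory meter itself, which sidesteps the type erasure concern raised earlier in this thread.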

Comment thread core/src/worker/client.rs Outdated
pub versioning_behavior: VersioningBehavior,
}

struct WorkerPollerInfo {
Member

I don't think we really need these - can just use the proto types directly

@yuandrew
Contributor Author

Going a different direction, closing this PR for now

@yuandrew yuandrew closed this Jun 26, 2025