consider redesigning resource states to use `tracing` more idiomatically

Currently, state attributes for resources and async ops are represented by `tracing` events with a particular structure, within the resource's span. This is somewhat complicated, both in the instrumentation and in `console-subscriber`. These events consist of a field with the attribute's name and a value, and two additional optional fields for numeric values: a field `<attribute_name>.op`, which can be "add", "sub", or "override", and a field `<attribute_name>.unit`, which is used to optionally add a unit such as `ms`. The `op` field is used to record numeric values as a delta rather than as an absolute value.

Meanwhile, `tracing` already has a mechanism for attaching field values to spans, and recording new values for those fields, via the [`Span::record`](https://docs.rs/tracing/latest/tracing/span/struct.Span.html#method.record) method. I think we should change state attributes for resources/async ops to use this instead.

## Motivation

Here are some of the issues and questions I've noticed about the current approach.

 - It's fairly complicated
 - When spans for resources and async ops are created, there's no way to know all of the state attributes that will exist for that particular resource. 
 
   This means that as individual `state_update` events occur, the attributes are added to a hashmap one-by-one as the console layer becomes aware of those attributes. This isn't very efficient, as we might reallocate the hashmap multiple times. Also, if these maps are moved out of the aggregator task and updated in the `ConsoleLayer`, as in #238, they would have to be mutated to insert new attribute names, which would require adding a lock, even if the values themselves can all be updated atomically (since they're all integers or booleans). 

   If all these fields were declared up front on the span, however, we would know about all the fields that exist when the resource/async op is created, even if their values aren't populated yet. That way, we could create a single hashmap (or array) with all the attributes already present, and insert it into the span's extensions. This would only need a single allocation. If the values of those attributes are updated atomically, we wouldn't need any kind of locking to update or read from them.
 - I don't understand the advantage of recording changes in numeric values as deltas rather than as absolute values (the `.op` field and "add"/"sub"). All the values will eventually be reported to the `console` as absolute values in the wire format anyway. If we want to use these to display the "direction" in which a value has changed since the last update, we would have to implement that logic in the UI anyway, because the delta value is lost when sending the message on the wire.

   What do we gain from recording these values as deltas? As far as I can tell, it just makes the implementation much more complex.
 - I'm not sure if recording units as a structured field really buys us anything.
 
   In fact, taking the units at _each_ state update just complicates the recording logic, because it allows the unit to differ each time the state is updated. For example,  if the previous state update event for a field was 
   ```rust
   tracing::trace!(target: "runtime::resource::state_update", my_duration = 100, my_duration.op = "add", my_duration.unit = "ms"); 
   ```
   and then another one is recorded like this:
   ```rust
   tracing::trace!(target: "runtime::resource::state_update", my_duration = 360, my_duration.op = "add", my_duration.unit = "us"); 
   ```
   the console-subscriber recording code would have to implement the unit conversion to record the new value correctly. This seems unnecessarily complex to me.

   As far as I can tell, the units field is just to inform the UI about how to display the value. I think that we should change this so that the units are defined by the field _name_ as whatever the lowest-granularity unit that we'll record the resource's state in. That way, `console-subscriber` doesn't need to be aware of units at all, and the `console` UI can detect base units from the name of the field, and use that to display the value however it wants to.

## Proposal

Here's the change I think we should make. I think we should remove the `runtime::resource::state_update` and `runtime::async_op::state_update` events, and replace them with fields declared up front on the resource/async op's span. We should remove the `op` fields, and always record absolute values. Finally, rather than recording units as an additional field, we should just add a base unit to the end of the field name when appropriate, and allow the console UI to handle formatting based on units.

As an example of how we'd change this, consider `tokio::time::Sleep`. Currently, the instrumentation looks like this:

```rust
impl Sleep {
    pub(crate) fn new_timeout(
        deadline: Instant,
        location: Option<&'static Location<'static>>,
    ) -> Sleep {
       // ...

        #[cfg(all(tokio_unstable, feature = "tracing"))]
        let inner = {
            // ...
            let resource_span = tracing::trace_span!(
                "runtime.resource",
                concrete_type = "Sleep",
                kind = "timer",
                loc.file = location.file(),
                loc.line = location.line(),
                loc.col = location.column(),
            );

            let async_op_span = resource_span.in_scope(|| {
                tracing::trace!(
                    target: "runtime::resource::state_update",
                    duration = duration,
                    duration.unit = "ms",
                    duration.op = "override",
                );
                // ...
            });
         // ...
    }


    fn reset_inner(self: Pin<&mut Self>, deadline: Instant) {
       // ...

        #[cfg(all(tokio_unstable, feature = "tracing"))]
        {
            let _resource_enter = me.inner.ctx.resource_span.enter();
            // ...
            let duration =  /* ... */;

            tracing::trace!(
                target: "runtime::resource::state_update",
                duration = duration,
                duration.unit = "ms",
                duration.op = "override",
            );
        }
    }

    // ... and so on ...
}
```

After my proposed change, it would look like this instead, which I think is much simpler:

```rust
impl Sleep {
    pub(crate) fn new_timeout(
        deadline: Instant,
        location: Option<&'static Location<'static>>,
    ) -> Sleep {
       // ...

        #[cfg(all(tokio_unstable, feature = "tracing"))]
        let inner = {
            // ...
            let resource_span = tracing::trace_span!(
                "runtime.resource",
                concrete_type = "Sleep",
                kind = "timer",
                loc.file = location.file(),
                loc.line = location.line(),
                loc.col = location.column(),
                duration_ms = duration,
            );

            let async_op_span = resource_span.in_scope(|| {
                // ... we would still create the async op span, and it would now also 
                // declare its fields up front on the span ...
            });
         // ...
    }


    fn reset_inner(self: Pin<&mut Self>, deadline: Instant) {
       // ...

        #[cfg(all(tokio_unstable, feature = "tracing"))]
        {
            let _resource_enter = me.inner.ctx.resource_span.enter();
            // ...
            let duration =  /* ... */;
            // call `Span::record` with the new value.
            me.inner.ctx.resource_span.record("duration_ms", &duration);
        }
    }

    // ... and so on ...
}
```

## Potential Issues

We would _probably_ still want to handle the current `state_update` events as well, so that resources/async ops from `tokio` 1.15.x will still work with the console. This might limit some of the optimizations we can do if we use the new system instead, but I think we can at least make it so that most of those costs are only paid when using the older versions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

consider redesigning resource states to use `tracing` more idiomatically #246

Motivation

Proposal

Potential Issues

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

consider redesigning resource states to use tracing more idiomatically #246

Description

Motivation

Proposal

Potential Issues

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

consider redesigning resource states to use `tracing` more idiomatically #246