Draft: nrf_edgeai_obsv — input-feature stream + mel descriptor metrics#113
Draft: nrf_edgeai_obsv — input-feature stream + mel descriptor metrics#113rrusak wants to merge 11 commits into
Conversation
…e a source; the core routes updates by it. Features go via nrf_edgeai_obsv_update_features() (SOURCE_FEATURES), probabilities via update() (SOURCE_PROBS). Add mel_energy_desc: mean/max/dynamic-range /floor histograms over the mel vector, scaled by a configured [p01,p99].
…nning and update related configurations
|
You can find the documentation preview for this PR here. |
… type with wide type in loop condition' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
… type with wide type in loop condition' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
| @@ -100,6 +100,20 @@ int nrf_edgeai_obsv_reset(nrf_edgeai_obsv_ctx_t *ctx); | |||
| */ | |||
| int nrf_edgeai_obsv_update(nrf_edgeai_obsv_ctx_t *ctx, const float *probs); | |||
There was a problem hiding this comment.
I'd rename this function - e.g. nrf_edgeai_obsv_update_probs - while it's still not "production ready" and API stability is not crucial at this point. IMO it's not intuitive why regular "update" function updates probabilities, not features.
| enum nrf_edgeai_obsv_metric_id { | ||
| NRF_EDGEAI_OBSV_METRIC_ID_TRANSITION_MATRIX = 2, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_PROBS_DISTRIBUTION = 3, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_PREDICTION_SWITCHING_RATE = 4, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_PROBS_ENTROPY_DIST = 5, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_PROBS_TOP2_MARGIN_DIST = 6, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_MEL_ENERGY_DESC = 7, | ||
| NRF_EDGEAI_OBSV_METRIC_ID_MEL_SPECTRAL_DESC = 8, | ||
| }; |
There was a problem hiding this comment.
Did you think about making some separation between probs-based and feature-based metrics? Like feature-based starting from e.g. 1000? Just a thought for your consideration.
There was a problem hiding this comment.
I don't have a strong opinion on this matter, but since you will be the main maintainer of this library, I would rely more on your opinion :) .
There was a problem hiding this comment.
On second thoughts - I think this may be not the best idea ;) Starting from e.g. 1000 will consume 2 more bytes in CBOR. Also, we actually do not know how many metrics we'll end up with. And if end up with more than 2 types of streams (shot in the dark).
I think we can keep it as it is.
| uint32_t *counts = msd_counts(hdr); | ||
| const uint8_t bins = hdr->bin_num; | ||
|
|
||
| assert(n <= hdr->num_features); |
There was a problem hiding this comment.
MSD has only an assert (which will be compiled out in release builds) while MED limits n to num_features. I think behavior could be aligned across all the metrics if no clear reason for that.
There was a problem hiding this comment.
MED limit features to n = CONFIG_NRF_EDGEAI_OBSV_MEL_ENERGY_DESC_MAX_FEATURES because it's create buffer on stack "float scratch[CONFIG_NRF_EDGEAI_OBSV_MEL_ENERGY_DESC_MAX_FEATURES];" so that it doesn't happen that we go beyond the limits of this buffer, and for MSD there is no such issue. In Kconfig there is CONFIG_NRF_EDGEAI_OBSV_MEL_ENERGY_DESC_MAX_FEATURES but no similar thing for MSD, for MSD only number of bins in distribution.
There was a problem hiding this comment.
I see, then I'm fine with it
There was a problem hiding this comment.
Every other nrf_obsv_metric_* file in this folder is an actual metric (has a *_create(), ops table, and metric_id), but this one is just shared binning helpers. The metric_ prefix reads like "this is a metric," and dist is easy to confuse with the real nrf_obsv_metric_probs_distribution.c.
Could we rename it to something like nrf_obsv_dist_binning (keeps the tie to the DIST_BINNING Kconfig) or nrf_obsv_binning? Location in metrics/ is fine since it's a private helper for the metrics.
| if ((p_ctx == NULL) || (p_feats == NULL)) { | ||
| return -EINVAL; | ||
| } | ||
|
|
||
| dispatch(p_ctx, NRF_EDGEAI_OBSV_SOURCE_FEATURES, p_feats, n); | ||
|
|
||
| return 0; |
There was a problem hiding this comment.
Will we need to keep number of feature extractions too? (like p_ctx->num_inferences)
There was a problem hiding this comment.
I believe we need this info in nrf_edgeai_obsv_model_info_t , for consistency and similarity with num_classes, I will change CDDL schema, what do you think?
| if (vn < 0.0f) { | ||
| vn = 0.0f; | ||
| } else if (vn > 1.0f) { | ||
| vn = 1.0f; | ||
| } |
There was a problem hiding this comment.
Is there a reason for not using clip01() like in MSD?
| if (h_norm < 0.0f) { | ||
| h_norm = 0.0f; | ||
| } else if (h_norm > 1.0f) { | ||
| h_norm = 1.0f; | ||
| } |
There was a problem hiding this comment.
Is there a reason for not using clip01() like in MSD?
| * mel descriptor metrics. The core dispatches an update to this metric only | ||
| * when the fed stream matches this value. | ||
| */ | ||
| uint8_t source; |
There was a problem hiding this comment.
The field holds an enum value but is stored as a raw uint8_t, and dispatch() compares it as a plain uint8_t. Not a big deal, but so we lose the enum's "type safety", and there's no comment explaining the narrowing. Could we either declare it enum nrf_edgeai_obsv_source source or state that narrowing to uint8_t is intentional? (e.g. for struct packing - but here I think it does not help anyway)
| for (uint8_t b = 0; b < bin_num - 1; b++) { | ||
| if (val < edges[b]) { | ||
| return b; | ||
| } | ||
| } |
There was a problem hiding this comment.
Just a nit - b is used for indexing, while _dist_uniform_edges uses i for the same thing. I know it was already like this, it just caught my eye now when moved to new file :)
|
|
||
| (void)p_cfg; | ||
|
|
||
| memset(counts, 0, sizeof(uint32_t) * NRF_EDGEAI_OBSV_MED_NUM_ROWS * hdr->bin_num); |
There was a problem hiding this comment.
Metrics in their init() functions could call clear() functions to de-duplicate "clearing" code. I think it applies to all the metrics currently implemented.
…e_probs and update references, rename nrf_obsv_metric_dist to nrf_obsv_dist_binning
…some code redundancy
| * @param priv Opaque per-instance storage pointer. | ||
| */ | ||
| void (*update)(const float *p_probs, uint16_t n, void *priv); | ||
| void (*update)(const float *p_data, uint16_t n, void *priv); |
There was a problem hiding this comment.
I think we should replace single update interface with update_probs and update_features, then remove the source field. This lib was not released yet and it's experimental so no restrictions on breaking API
| #. Register the metrics with the context using the :c:func:`nrf_edgeai_obsv_register` function. | ||
| #. Bind the Memfault transport once at application startup using the :c:func:`nrf_edgeai_obsv_memfault_init` function. | ||
| #. Call the :c:func:`nrf_edgeai_obsv_update` function after every inference. | ||
| #. Call the :c:func:`nrf_edgeai_obsv_update_probs` function with the output probability vector after every inference. |
There was a problem hiding this comment.
As I understood this is mandatory event if using only feature-source metrics. If that's true emphasize it
This PR extends the observability library from a single output (class-probability) stream to a two-stream design, so metrics can also be computed over the extracted model input features. Each metric declares a source (PROBS or FEATURES) in its descriptor, and the core routes an update only to metrics matching the fed stream (nrf_edgeai_obsv_update vs nrf_edgeai_obsv_update_features), keeping the two cadences independent. On top of this it adds two feature metrics for ww/kws mel spectrograms: mel_energy_desc (mean/max energy, dynamic range, floor ratio, normalized against configured [p01, p99] percentiles) and mel_spectral_desc (eight scale-invariant spectral-shape rows: band ratios, centroid, spread, entropy, flatness, contrast). All metrics remain self-contained descriptors with caller-provided storage and emit a row-major uint32 histogram snapshot for the existing CDR transport. Distribution-style metrics that live in a uniform [0, 1] range no longer store bin edges and bin values in O(1) via a shared helper, while the probability-distribution metric keeps custom edges. The change also hardens the hot path: a runtime cap prevents scratch[] overflow in release builds where assert() is compiled out, and the spectral metric clamps negative inputs to keep logf/sqrtf well-defined.