Put metadata for each column under the stats for that column #122

@bjdebus

Hi @SirMore,

Thanks so much for adding the operation metadata to the output of the quends results. Parsing the resulting output dictionaries (e.g. the compute_statistics() output), I see that there is one key per processed column holding that column's stats results, plus a separate "metadata" key that holds the metadata for all columns.

I was wondering if it would make more sense to put a "metadata" key inside the stats results for each column, rather than keeping the metadata under a separate top-level key in the results. For example:

{'Q_D/Q_GBD': {'confidence_interval': (34.88, 40.4),
               'effective_sample_size': 24,
               'mean': 37.64,
               'mean_uncertainty': 1.41,
               'pm_std': (36.23, 39.05),
               'window_size': 49,
               'sss_start': 162.5,
               'metadata': [{'operation': 'is_stationary'},
                            {'operation': 'trim',
                             'options': {'batch_size': 50,
                                         'method': 'rolling_variance',
                                         'robust': True,
                                         'start_time': 0,
                                         'threshold': 0.5}},
                            {'operation': 'effective_sample_size',
                             'options': {'alpha': 0.05}},
                            {'operation': 'compute_statistics',
                             'options': {'ddof': 1,
                                         'method': 'non-overlapping',
                                         'window_size': None}}]}}

In the example above, I also moved sss_start out of the metadata and into the stats results for that column.

I can see that the current layout takes an operational view, since the metadata can originate from one operation being applied to multiple columns. However, I think storing the metadata with each column's results would make it easier to parse automatically: a user who grabs the statistics for one column keeps all of its metadata, instead of having to remember to grab the metadata separately and store it in a different object.
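
To make the parsing point concrete, here is a minimal sketch of the conversion a user would otherwise have to do by hand. It assumes the top-level "metadata" value maps each column name to its list of operation dicts; that layout is a guess for illustration and may not match the actual quends output. The helper name nest_metadata is hypothetical, and the sample values are shortened from the example above.

```python
from copy import deepcopy


def nest_metadata(results, metadata_key="metadata"):
    """Return a new dict with each column's metadata stored under that column.

    ASSUMPTION: results[metadata_key] maps column name -> list of operation
    dicts. This is an illustrative guess, not the confirmed quends layout.
    """
    flat = deepcopy(results)  # leave the caller's dict untouched
    per_column_meta = flat.pop(metadata_key, {})
    nested = {}
    for column, stats in flat.items():
        entry = dict(stats)
        # Columns without recorded operations get an empty list.
        entry["metadata"] = per_column_meta.get(column, [])
        nested[column] = entry
    return nested


# Hypothetical input in the assumed "separate metadata key" layout.
results = {
    "Q_D/Q_GBD": {"mean": 37.64, "mean_uncertainty": 1.41},
    "metadata": {
        "Q_D/Q_GBD": [
            {"operation": "is_stationary"},
            {"operation": "effective_sample_size", "options": {"alpha": 0.05}},
        ]
    },
}

nested = nest_metadata(results)

# Grabbing one column now carries its processing history along with it.
column_stats = nested["Q_D/Q_GBD"]
operations = [step["operation"] for step in column_stats["metadata"]]
```

With the nested layout, column_stats can be passed around or serialized on its own without a second lookup into a shared metadata object.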

What are your thoughts?

Labels: enhancement (New feature or request)
