Goal
We would like to add more and better metadata tracking to our Rt model runs. Ideally this would be close to what a tool like mlflow's tracking can do, but without needing to set up and manage our own central server.
Context
We currently save a number of fields from the config into the metadata output of the model, e.g. disease, geo_value, task_id, job_id, etc. As we use our new Rt model more and more, new use cases arise that require new metadata fields and tags. To that end, we would like to add three more fields to the configs generated here.
Required
group_id: A string describing the group, default null (JSON) / None (Python). The group id can be applied to a collection of job ids that make up, e.g., a backtest (API_v2_backtest), the production runs for a given week (2025-05-16-production), or some other group ...
tags: A dict[str, Any], default {}. You can add as many key: value tags as you like. These could be useful for noting which runs are production runs, that a run is of importance to a certain person, or the git hash of the model used. It's worth considering whether some of these should be broken out into their own top-level key: value pairs. This was mostly inspired by mlflow's set of default tags. Potential issue: a dynamic dictionary won't play nicely with our downstream conversion to a polars dataframe. Maybe we make this just a list of strings?
notes: Default null (JSON) / None (Python). A place for writing notes for our later selves, e.g. "because of issue X, we decided to use a different prior for parameter Y".
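To make the proposal concrete, here is a minimal Python sketch of the three fields and of one possible answer to the polars concern. Everything beyond the field names and defaults is an assumption made for illustration: the RunMetadata class, the tags_to_strings and to_flat_record helpers, the "key=value" encoding, and the placeholder git hash are all hypothetical and not part of the existing config code.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class RunMetadata:
    """Hypothetical container for the three proposed fields."""

    group_id: str | None = None  # null in JSON when unset
    tags: dict[str, Any] = field(default_factory=dict)  # default {}
    notes: str | None = None  # null in JSON when unset


def tags_to_strings(tags: dict[str, Any]) -> list[str]:
    """Flatten tags into sorted "key=value" strings.

    One possible encoding (an assumption, not a decision): a list of
    strings has the same type for every run, whatever keys were used.
    """
    return [f"{key}={value}" for key, value in sorted(tags.items())]


def to_flat_record(meta: RunMetadata) -> dict[str, Any]:
    """Return a dict whose keys and value types do not depend on the tag keys."""
    record = asdict(meta)
    record["tags"] = tags_to_strings(meta.tags)
    return record


# Illustrative values only; "abc1234" is a placeholder, not a real commit.
example = RunMetadata(
    group_id="2025-05-16-production",
    tags={"production": True, "git_hash": "abc1234"},
    notes="because of issue X, we decided to use a different prior for parameter Y",
)

print(json.dumps(asdict(example), indent=2))  # nested form, as written to the config
print(to_flat_record(example)["tags"])  # flattened form for tabular conversion
```

The trade-off in this sketch is that tag values lose their original types once flattened, so anything we expect to filter on routinely (for example a production flag) may be better promoted to its own top-level field.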