The CLI flow uses collectors -> `csvs.tar.gz` -> `dataframe_engine` -> report -> send.
The Service flow uses collectors -> dataframes in DB -> `*AnonymizedRollup` -> anonymize -> send.
Apart from storage differences, those are two implementations of the same thing.
(Technically 3, as the library/dataframes implementation is a refactor of the cli's, but unused pre-#285)
For the CLI, dataframe engines take care of knowing which data to load (dates), loading and preprocessing the data (`build_dataframe`), merging individual collects (`concat` or `merge`), and post-processing (still in `build_dataframe`), plus the group-by logic in `group`/`regroup`. The report engine then uses the data to build a specific report to send.
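The CLI-side flow described above can be sketched roughly like this; the function names and the CSV schema are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of the CLI flow: extract CSV snapshots from a
# csvs.tar.gz archive, preprocess each into a DataFrame, then merge the
# individual collects with a plain pandas concat.
import io
import tarfile

import pandas as pd


def build_dataframe(raw: pd.DataFrame) -> pd.DataFrame:
    # preprocessing step: sanitize field types (here: parse the date column)
    raw["date"] = pd.to_datetime(raw["date"])
    return raw


def load_collects(archive_path: str) -> pd.DataFrame:
    frames = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".csv"):
                data = tar.extractfile(member).read()
                frames.append(build_dataframe(pd.read_csv(io.BytesIO(data))))
    # merge individual collects into one frame
    return pd.concat(frames, ignore_index=True)
```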
For the service, the anonymized rollup class takes care of preprocessing the data (`prepare`), merging (`merge`), and postprocessing (`base`). The service `daily_metrics_rollup` task takes care of knowing which data to load, and the `daily_anonymize_and_prepare` task takes on the role of preparing the final report.
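For comparison, the service-side rollup has roughly this shape; the `prepare`/`merge` method names mirror the text, while the column names and the dict produced are illustrative assumptions:

```python
# Hypothetical shape of the service-side *AnonymizedRollup class.
import pandas as pd


class AnonymizedRollup:
    def prepare(self, raw: pd.DataFrame) -> pd.DataFrame:
        # preprocessing: normalize field types before aggregation
        return raw.astype({"count": "int64"})

    def merge(self, left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        # combine two prepared frames into one
        return pd.concat([left, right], ignore_index=True)

    def postprocess(self, df: pd.DataFrame) -> dict:
        # dataframe -> plain dict, ready for the anonymize step
        return {"total": int(df["count"].sum())}
```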
Merge those into one set of Rollup classes:
- Some of the dataframe engines have a lot of code implementing merging logic based on an ad-hoc definition language; replacing it with direct pandas calls should make this much simpler to deal with.
- The "knowing which date/data to use" logic should move outside the rollups: it is not something to share between the CLI and the service, and even if it were, it does not belong in the rollup class (no opinionated collectors, no opinionated rollups, no opinionated reports either (eventually), just one set of opinionated classes putting things together).
- The prepare step (raw dataframe -> sane dataframe) should be the place to handle sanitizing the input data, field types, etc.
- The merge step (df & df -> df) should just merge two dataframes into one, treating None as an empty dataframe of the right type.
- The postprocess step (dataframe -> dict) is necessary, but consider which parts should live there vs in the report.
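A minimal sketch of what the unified Rollup could look like under the constraints above; the class/column names and the `dtypes` mechanism are assumptions for illustration, but it shows direct pandas calls for merging and None treated as an empty dataframe of the right type:

```python
# Hypothetical unified Rollup: prepare sanitizes a raw frame, merge folds
# two frames (None counts as an empty, correctly-typed frame), and
# postprocess turns the result into a dict for the report layer.
from typing import Optional

import pandas as pd


class Rollup:
    # column dtypes define the "empty dataframe of the right type"
    dtypes = {"date": "datetime64[ns]", "count": "int64"}

    def empty(self) -> pd.DataFrame:
        return pd.DataFrame(
            {col: pd.Series(dtype=t) for col, t in self.dtypes.items()}
        )

    def prepare(self, raw: pd.DataFrame) -> pd.DataFrame:
        # sanitize input data and field types up front
        out = raw.copy()
        out["date"] = pd.to_datetime(out["date"])
        return out.astype({"count": "int64"})

    def merge(
        self, a: Optional[pd.DataFrame], b: Optional[pd.DataFrame]
    ) -> pd.DataFrame:
        # direct pandas call instead of an ad-hoc merge definition language
        a = self.empty() if a is None else a
        b = self.empty() if b is None else b
        return pd.concat([a, b], ignore_index=True)

    def postprocess(self, df: pd.DataFrame) -> dict:
        return {"rows": len(df), "total": int(df["count"].sum())}
```

Note that date selection is deliberately absent: the caller decides which data to feed `prepare`, keeping the rollup unopinionated about loading.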
Existing implementations: `library/dataframes/` (merged and currently unused), and the CLI (see "use library.dataframes" #285 and "library.dataframes.BaseTraditional - move merge logic to child classes" #313, unfinished/not up to date).

And make a new Report type which follows the existing patterns of dealing with rollups, and produces a JSON anonymized report.