The CLI flow uses collectors -> `csvs.tar.gz` -> `dataframe_engine` -> report -> send.
The Service flow uses collectors -> dataframes in DB -> `*AnonymizedRollup` -> anonymize -> send.
Apart from storage differences, those are two implementations of the same thing.
(Technically 3, as the library/dataframes implementation is a refactor of the cli's, but unused pre-#285)
For the CLI, dataframe engines take care of knowing which data to load (dates), loading and preprocessing the data (`build_dataframe`), merging individual collects (`concat` or `merge`), and post-processing (still in `build_dataframe`), plus the group-by logic in `group`/`regroup`. The report engine then uses the data to build a specific report to send.
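The CLI-side flow described above can be sketched roughly like this; the function names and the CSV schema are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of the CLI flow: extract CSV snapshots from a
# csvs.tar.gz archive, preprocess each into a DataFrame, then merge the
# individual collects with a plain pandas concat.
import io
import tarfile

import pandas as pd


def build_dataframe(raw: pd.DataFrame) -> pd.DataFrame:
    # preprocessing step: sanitize field types (here: parse the date column)
    raw["date"] = pd.to_datetime(raw["date"])
    return raw


def load_collects(archive_path: str) -> pd.DataFrame:
    frames = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".csv"):
                data = tar.extractfile(member).read()
                frames.append(build_dataframe(pd.read_csv(io.BytesIO(data))))
    # merge individual collects into one frame
    return pd.concat(frames, ignore_index=True)
```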
For the service, the anonymized rollup class takes care of preprocessing the data (`prepare`), merging (`merge`), and postprocessing (`base`). The service `daily_metrics_rollup` task takes care of knowing which data to load, and the `daily_anonymize_and_prepare` task takes on the role of preparing the final report.
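For comparison, the service-side rollup has roughly this shape; the `prepare`/`merge` method names mirror the text, while the column names and the dict produced are illustrative assumptions:

```python
# Hypothetical shape of the service-side *AnonymizedRollup class.
import pandas as pd


class AnonymizedRollup:
    def prepare(self, raw: pd.DataFrame) -> pd.DataFrame:
        # preprocessing: normalize field types before aggregation
        return raw.astype({"count": "int64"})

    def merge(self, left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        # combine two prepared frames into one
        return pd.concat([left, right], ignore_index=True)

    def postprocess(self, df: pd.DataFrame) -> dict:
        # dataframe -> plain dict, ready for the anonymize step
        return {"total": int(df["count"].sum())}
```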
Merge those into one set of Rollup classes:
- Some of the dataframe engines have a lot of code implementing merging logic based on an ad-hoc definition language; replacing it with direct pandas calls should make this much simpler to deal with.
- The "knowing which date/data to use" logic should move outside the rollups: it is not something to share between the CLI and the service, and even if it were, it does not belong in the rollup class (no opinionated collectors, no opinionated rollups, no opinionated reports either (eventually), just one set of opinionated classes putting things together).
- The prepare step (raw dataframe -> sane dataframe) should be the place to handle sanitizing the input data, field types, etc.
- The merge step (df & df -> df) should just merge two dataframes into one, treating None as an empty dataframe of the right type.
- The postprocess step (dataframe -> dict) is necessary, but consider which parts should live there vs in the report.
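A minimal sketch of what the unified Rollup could look like under the constraints above; the class/column names and the `dtypes` mechanism are assumptions for illustration, but it shows direct pandas calls for merging and None treated as an empty dataframe of the right type:

```python
# Hypothetical unified Rollup: prepare sanitizes a raw frame, merge folds
# two frames (None counts as an empty, correctly-typed frame), and
# postprocess turns the result into a dict for the report layer.
from typing import Optional

import pandas as pd


class Rollup:
    # column dtypes define the "empty dataframe of the right type"
    dtypes = {"date": "datetime64[ns]", "count": "int64"}

    def empty(self) -> pd.DataFrame:
        return pd.DataFrame(
            {col: pd.Series(dtype=t) for col, t in self.dtypes.items()}
        )

    def prepare(self, raw: pd.DataFrame) -> pd.DataFrame:
        # sanitize input data and field types up front
        out = raw.copy()
        out["date"] = pd.to_datetime(out["date"])
        return out.astype({"count": "int64"})

    def merge(
        self, a: Optional[pd.DataFrame], b: Optional[pd.DataFrame]
    ) -> pd.DataFrame:
        # direct pandas call instead of an ad-hoc merge definition language
        a = self.empty() if a is None else a
        b = self.empty() if b is None else b
        return pd.concat([a, b], ignore_index=True)

    def postprocess(self, df: pd.DataFrame) -> dict:
        return {"rows": len(df), "total": int(df["count"].sum())}
```

Note that date selection is deliberately absent: the caller decides which data to feed `prepare`, keeping the rollup unopinionated about loading.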
Existing implementations: `library/dataframes/` (merged and currently unused), and the CLI (see "use library.dataframes" #285 and "library.dataframes.BaseTraditional - move merge logic to child classes" #313, unfinished/not up to date).

And make a new Report type which follows the existing patterns of dealing with rollups, and produces a JSON anonymized report.