-
Notifications
You must be signed in to change notification settings - Fork 30
Description
I want to only remove fields which have not been used in the past 180 days. Given that timeframe of interest, there are many more than 5000 records in the result of this query on the history explore (notice the limit at the end):
models.WriteQuery(
model="system__activity",
view="history",
fields=[
"query.model",
"query.view",
"query.formatted_fields",
"query.filters",
"history.query_run_count",
],
filters={
"history.created_date": self.timeframe,
"query.model": model.replace("_", "^_"),
"query.view": explore.replace("_", "^_") if explore else "",
"query.formatted_fields": "-NULL",
"history.workspace_id": "production",
},
limit="5000",
),
^this is found in the fetcher.py module.
In my case, there are nearly 6000 different values for "query.formatted_fields" and 480,000 different values for "query.filters" and when combined there are 710,000 unique combinations. So for my case, even increasing the row limit isn't a viable option.
Right now system activity is limited to showing fields used together. It is grouping by every unique set of fields (and sets of filter:value) used in queries. Instead, it would be easier to use and produce fewer result rows if it showed the queries which contain each field (or filter:value). This article about grouping stack overflow posts by individual tag instead of set of tags is analogous:
https://www.googlecloudcommunity.com/gc/Modeling/Analytic-Block-Tags-and-Attributes-with-and-without-Arrays/m-p/591472