Skip to content

Henry vacuum explore does not work when there are >5000 queries on that explore in the timeframe of interest #109

@ajcrutch

Description

@ajcrutch

I want to only remove fields which have not been used in the past 180 days. Given that timeframe of interest, there are many more than 5000 records in the result of this query on the history explore (notice the limit at the end):

models.WriteQuery(
                model="system__activity",
                view="history",
                fields=[
                    "query.model",
                    "query.view",
                    "query.formatted_fields",
                    "query.filters",
                    "history.query_run_count",
                ],
                filters={
                    "history.created_date": self.timeframe,
                    "query.model": model.replace("_", "^_"),
                    "query.view": explore.replace("_", "^_") if explore else "",
                    "query.formatted_fields": "-NULL",
                    "history.workspace_id": "production",
                },
                limit="5000",
            ),

^this is found in the fetcher.py module.

In my case, there are nearly 6000 different values for "query.formatted_fields" and 480,000 different values for "query.filters" and when combined there are 710,000 unique combinations. So for my case, even increasing the row limit isn't a viable option.

Right now system activity is limited to showing fields used together. It is grouping by every unique set of fields (and sets of filter:value) used in queries. Instead, it would be easier to use and produce fewer result rows if it showed the queries which contain each field (or filter:value). This article about grouping stack overflow posts by individual tag instead of set of tags is analogous:
https://www.googlecloudcommunity.com/gc/Modeling/Analytic-Block-Tags-and-Attributes-with-and-without-Arrays/m-p/591472

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions