-
-
Notifications
You must be signed in to change notification settings - Fork 768
chore: unknown flags #9837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: unknown flags #9837
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
0d01642
to
4fb225b
Compare
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces an “unknown flags” feature to track and store unknown flag names reported by clients, including their source applications. The key changes are the addition of a new migration for the unknown_flags table, updates to the store, service, and routing layers, and corresponding tests and fixtures to support the in-memory tracking and periodic flushing of this data.
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/test/fixtures/store.ts | Added FakeUnknownFlagsStore and integrated it into the store. |
src/server-dev.ts | Enabled the new reportUnknownFlags flag in the config. |
src/migrations/20250424185110-unknown-flags.js | Added migration to create the unknown_flags table. |
src/lib/types/* | Updated store, service, and experimental types to include unknown flags. |
src/lib/services/index.ts | Instantiated and passed the UnknownFlagsService in services creation. |
src/lib/routes/admin-api/metrics.ts | Added a new route with UnknownFlagsController for retrieving unknown flags. |
src/lib/features/scheduler/schedule-services.ts | Scheduled flush and clear tasks for the unknownFlagsService. |
src/lib/features/metrics/unknown-flags/* | Added store, service, controller, and fake store for unknown flags. |
src/lib/features/metrics/client-metrics/metrics-service-v2.ts | Enhanced filtering logic and registration of unknown flags in client metrics reporting. |
src/lib/features/metrics/client-metrics/metrics-service-v2.test.ts | Updated tests to incorporate the UnknownFlagsService. |
src/lib/features/metrics/unknown-flags/fake-unknown-flags-store.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good. Have an approval! Small musing: Have we stopped using dbTimer? I notice the last couple of stores there is no timer being used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising
await this.db.transaction(async (tx) => { | ||
await tx(TABLE).delete(); | ||
if (flags.length > 0) { | ||
const rows = flags.map((flag) => ({ | ||
name: flag.name, | ||
app_name: flag.appName, | ||
seen_at: flag.seenAt, | ||
})); | ||
await tx(TABLE).insert(rows); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is to keep only 10 when we store... the latest 10... but I'm not convinced with the delete all... cause it needs to lock the whole table...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s true that multiple pods might flush around the same time, which means concurrent reads and writes are possible. That said, the current approach is designed to handle that safely. Each pod fetches the full current state from the database before flushing, merges it with its local in-memory cache, deduplicates by name
+ appName
, and keeps only the 10 most recent entries. So even if two pods flush close together, the end result is still consistent — just depending on which write lands last.
The DELETE + INSERT is intentional. Since the table is capped at 10 rows and flushes are infrequent, this approach keeps the logic simple and avoids the need for upserts or row-level diffs.
Also worth noting: the only reason we’re storing this in the DB at all is to ensure persistence across pod restarts and availability across pods. If this were purely local state, the in-memory cache would be enough.
If the feature grows in scope, we can revisit the update strategy, but for now this tradeoff feels right.
|
||
async clear(hoursAgo: number): Promise<void> { | ||
return this.db(TABLE) | ||
.whereRaw(`seen_at <= NOW() - INTERVAL '${hoursAgo} hours'`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think seen at is not the best to clear flags, because it might just keep being updated if it's a frequently accessed flag... maybe use created at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think seen_at
is the right field here, since what we care about is whether the flag has been reported recently. If an unknown flag keeps getting reported, it’s still relevant — so we want to keep it in the list. Using created_at
would remove flags that are still active just because they were first seen a while ago, which feels like the wrong signal for this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea of created_at might still be valuable. It will tell you the "age" or how long ago this flag was first seen (if we don't remove them). Which makes me think... maybe we should not delete all and recreate... instead, what if we make a count on the table and we only insert up to 10-count?
We should still delete flags that stopped appearing in the "unknown" group from the table, so it adds some complexity, so let's leave it as an idea here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced of what created_at
really gives us here. What's the benefit you're seeing? How valuable is it to know when an unknown flag was first seen? And are the first 10 really more valuable than the last 10?
I'll leave it for now, but I'm curious about your point of view here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to leave it for now. I'm thinking of differentiating the 2 scenarios we discussed:
- You made a typo or accidentally referenced a flag that doesn't exist. This will stay in code for a long time and therefore the difference between created_at (or when it was first registered as unknown) and seen_at should be large.
- Your normal workflow implies releasing the flag to prod first and then creating it in Unleash. In this case, the difference between created_at and last_seen should be at max a few hours to 1 or 2 days.
Keeping the created_at as "first time the flag was registered as unknown" comes with challenges, which is why I'm ok dropping this, but I hope my point is clearer now.
Can't find |
293609c
to
67aa130
Compare
Discussed with @gastonfournier and we're interested in gathering some data before we invest more time on this feature (e.g., UI). 67aa130 adds a new gauge for the total of unknown flags seen in the last 24 hours, if any (max of 10).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm still thinking on what the behavior should be... Also, what'd happen when multiple instances try to write to the db, what happens if an instance has seen a flag that doesn't exist in Unleash but another instance has not seen it (running the flush in a single instance would miss this unknown flag)?
Overall, this will be a nice first step, we don't have to have all the answers atm.
const validatedToggleNames = await this.filterValidToggleNames( | ||
toggleNamesToValidate, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking: filtering valid toggle names seems to be a responsibility of unknown flag service more than metrics-service. I know there are practical reasons to keeping this logic here, so not something I'm asking to change but just a reflection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already a responsibility of this method before this PR: https://github.com/Unleash/unleash/blob/b322afb097e00f71be1429c378153c97442b724e/src/lib/features/metrics/client-metrics/metrics-service-v2.ts#L134C56-L134C57
I'm guessing because we're not interested in registering metrics for invalid flag names.
It doesn't make much sense to move it over to the new unknown flags service, at least for now, especially since it's behind a flag.
schedulerService.schedule( | ||
unknownFlagsService.flush.bind(unknownFlagsService), | ||
minutesToMilliseconds(2), | ||
'flushUnknownFlags', | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, cause I didn't see us using this feature in our scheduler service, and you may have a good use case here:
schedulerService.schedule( | |
unknownFlagsService.flush.bind(unknownFlagsService), | |
minutesToMilliseconds(2), | |
'flushUnknownFlags', | |
); | |
const minutes = 2; | |
schedulerService.schedule( | |
jobService.singleInstance('unknownFlagsFlushJob', | |
unknownFlagsService.flush.bind(unknownFlagsService), | |
minutes | |
), | |
minutesToMilliseconds(minutes), | |
'flushUnknownFlags', | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't want this to be a single instance job, right?
Each Unleash instance (pod) will receive unknown flags as part of metrics and store them in their own cache, so it makes sense that each Unleash instance (pod) has their own job to flush that cache into the DB.
If anything we could do it for the "clear", but I don't think doing that gives us much either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wanted to comment on the clear one :D but yeah, maybe it's not going to give us much, it was just because I didn't see us using it and it came up in some conversations last week with either @kwasniew @sjaanus or @sastromskis. That we should have the ability of electing a leader for one of these scheduled jobs and I remember we had that already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I didn't remember / wasn't aware of it being a thing in our codebase until now, so thank you for the reminder.
That should be covered by the fact that we're merging the local unknown flags cache with the ones already registered in the DB when we flush: https://github.com/Unleash/unleash/pull/9837/files#diff-544e3c7c0bac8e29a5c0954c0b046e580ad6b11eca94dcb097fac0eedf7a8b7eR61-R79 |
Does this mean we never clean? Imagine a flag |
Yes, it is. We automatically clean old unknown flags due to our 10 max logic and the 24h clean up job that cleans unknown flags seen >24h ago. |
67aa130
to
36cf216
Compare
https://linear.app/unleash/issue/2-3406/hold-unknown-flags-in-memory-and-show-them-in-the-ui-somehow
This PR introduces a suggestion for a “unknown flags” feature.
When clients report metrics for flags that don’t exist in Unleash (e.g. due to typos), we now track a limited set of these unknown flag names along with the appnames that reported them. The goal is to help users identify and clean up incorrect flag usage across their apps.
We store up to 10 unknown flag + appName combinations, keeping only the most recent reports. Data is collected in-memory and flushed periodically to the DB, with deduplication and merging to ensure we don’t exceed the cap even across pods.
We were especially careful to make this implementation defensive, as unknown flags could be reported in very high volumes. Writes are batched, deduplicated, and hard-capped to avoid DB pressure.
No UI has been added yet — this is backend-only for now and intended as a step toward better visibility into client misconfigurations.
I would suggest starting with a simple banner that opens a dialog showing the list of unknown flags and which apps reported them.