Skip to content

Make parsing log filenames future-proof #4518

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

europaul
Copy link
Contributor

Main commit is a8725c7. Instead of relying on a certain structure of log filenames, we should just use regexp to look for timestamps or UUIDs - this way the filename structure can change in the future, but it will still be backwards compatible with our filename parsing logic.

This PR contains the changes for pillar. In a later PR those changes will also be applied to newlog and edge-view.

@europaul europaul force-pushed the newlog-filenames branch 2 times, most recently from 02876f6 to 58d6ee9 Compare January 10, 2025 18:31
@europaul
Copy link
Contributor Author

the yetus warning can probably be ignored since we do the same in other files that import gomega

yetus: pkg/pillar/types/newlogtypes_test.go#L10
revive: should not use dot imports https://revive.run/r#dot-imports

@rene
Copy link
Contributor

rene commented Jan 13, 2025

the yetus warning can probably be ignored since we do the same in other files that import gomega

yetus: pkg/pillar/types/newlogtypes_test.go#L10 revive: should not use dot imports https://revive.run/r#dot-imports

Actually not in all places, like here gomega it's not dot imported: https://github.com/lf-edge/eve/blob/master/pkg/pillar/types/patchenvelopestypes_test.go#L10

I would avoid as much as we can to introduce new Yetus errors and in this case it's really not hard to fix it...

@rene
Copy link
Contributor

rene commented Jan 13, 2025

@europaul , LGTM and I liked the flexibility added by using regex. However, I would fix the Yetus error in order to avoid YAYEE (Yet Another Yetus Error on EVE 😃 )

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

@@ -85,21 +86,43 @@ type NewlogMetrics struct {
AppMetrics logfileMetrics // App metrics
}

// GetTimestampFromGzipName - get timestamp from gzip file name
func GetTimestampFromGzipName(fName string) (time.Time, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see, we still use this func in another package, edgeveiew:

timestamp, err := types.GetTimestampFromGzipName(f.Name())
and
timestamp, err := types.GetTimestampFromGzipName(l.Name())

So, we should be careful here...

@europaul
Copy link
Contributor Author

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

The new functions are (mostly) backward compatible with the old ones. The only problem that I see is with logs that get created before the devices sets the correct time (probably through NTP), so the initial log files look like dev.log.123.gz instead of dev.log.1731491904032.gz. The new function will not pick up a timestamp from them since it doesn't fit into \b\d{13,16}\b regexp.

@europaul
Copy link
Contributor Author

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

@rene
Copy link
Contributor

rene commented Jan 14, 2025

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

The new functions are (mostly) backward compatible with the old ones. The only problem that I see is with logs that get created before the devices sets the correct time (probably through NTP), so the initial log files look like dev.log.123.gz instead of dev.log.1731491904032.gz. The new function will not pick up a timestamp from them since it doesn't fit into \b\d{13,16}\b regexp.

@europaul , can we affirm that the timestamp will always precede the file extension, like this <any file name>.<TIMESTAMP>.<extension (.gz, etc)> ?

If so, the following regex could be used \b(\d{3,16}).\w{2,3}, then the first occurrence will be the timestamp and it will cover the dev.log.123.gz case. If you want to be more picky with the timestamps, another options is to use this regex: \b(\d{13,16}|\d{3}).\w{2,3}, which will cover only valid timestamps and the case for 3 digits....

@OhmSpectator
Copy link
Member

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

So, we are expecting a new PR very after this one is merged... Have you tested the changes in edge view somehow?

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run the tests

@europaul
Copy link
Contributor Author

If so, the following regex could be used \b(\d{3,16}).\w{2,3}, then the first occurrence will be the timestamp and it will cover the dev.log.123.gz case. If you want to be more picky with the timestamps, another options is to use this regex: \b(\d{13,16}|\d{3}).\w{2,3}, which will cover only valid timestamps and the case for 3 digits....

it's actually not just 3 digits - the timestamp can be anything between 0 and now microseconds.

@europaul
Copy link
Contributor Author

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

So, we are expecting a new PR very after this one is merged... Have you tested the changes in edge view somehow?

ok, let's postpone this merge until we can execute it really in sync with the PR for edge-view and newlog. I'll convert it to draft for now

@europaul europaul marked this pull request as draft January 15, 2025 05:31
appUUID = fStr[0]
appUUID, err := types.GetUUIDFromFileName(fName)
if err != nil {
log.Errorf("buildAppUUIDMap: cannot parse app file name %s: %v", fName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this generate/log an error when it is run on a device log filename?

Copy link
Contributor Author

@europaul europaul Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would, because a device log filename doesn't contain a UUID. however this function is only run on app log files

if isApp {
buildAppUUIDMap(f.Name())
}

// GetTimestampFromFileName extracts a microsecond timestamp from a filename
func GetTimestampFromFileName(filename string) (time.Time, error) {
// Regular expression to match a microsecond timestamp (sequence of 13-16 digits)
timestampRegex := regexp.MustCompile(`\b\d{13,16}\b`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to declare this as a variable in the file so that it is compiled at init time and not for each Get call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that absolutely makes sense!

return time.Time{}, fmt.Errorf("getTimestampFromGzipName: invalid log file name %s", fName)
// GetTimestampFromFileName extracts a microsecond timestamp from a filename
func GetTimestampFromFileName(filename string) (time.Time, error) {
// Regular expression to match a microsecond timestamp (sequence of 13-16 digits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment below says millisecond and here you said microsecond. Which one is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for noticing! it's always milliseconds

// GetUUIDFromFileName extracts a UUID from a filename with dot-delimited parts
func GetUUIDFromFileName(filename string) (string, error) {
// UUID regex pattern (supports v4 UUIDs like "123e4567-e89b-12d3-a456-426614174000")
uuidRegex := regexp.MustCompile(`[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define as a local variable in the file?


// Check each part for a UUID match
for _, part := range parts {
if uuidRegex.MatchString(part) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or call the UUID parser function we use elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we mostly use uuid.FromString, which returns a UUID type and I'm fine with string

Instead of parsing filenames according to a certain schema,
we should use regexp to look for timestamps or UUIDs. This
way the file names can change in the future without affecting
backward compatibility with the existing code for parsing
filenames.

Signed-off-by: Paul Gaiduk <[email protected]>
Since not all logs in persist are to be sent, we rename
LogRemainToSendMBytes to MaxGzipLogMBytesInPersist. This
parameter represents the maxinum size of all gzipped logs
in persist.

Signed-off-by: Paul Gaiduk <[email protected]>
@github-actions github-actions bot requested a review from OhmSpectator February 4, 2025 19:37
@europaul
Copy link
Contributor Author

europaul commented Feb 4, 2025

The only problem that I see is with logs that get created before the devices sets the correct time (probably through NTP), so the initial log files look like dev.log.123.gz instead of dev.log.1731491904032.gz. The new function will not pick up a timestamp from them since it doesn't fit into \b\d{13,16}\b regexp.

I fixed this problem, by using the first digits-only part that we find between dots as a timestamp

@europaul
Copy link
Contributor Author

europaul commented Feb 4, 2025

So, we are expecting a new PR very after this one is merged... Have you tested the changes in edge view somehow?

@OhmSpectator should I prepare another PR with changes to edge-view and newlog that would be merged right after this one before we merge this one?

@OhmSpectator
Copy link
Member

@OhmSpectator should I prepare another PR with changes to edge-view and newlog that would be merged right after this one before we merge this one?

I don’t actually remember how we update Pillar as a library. Do we first need to get some kind of hash and add it to the go.mod file of another component? If it’s possible to do within the same PR (I’m not sure), I would prefer to do it that way. I always ask @christoph-zededa or @deitch when dealing with such things.

@europaul
Copy link
Contributor Author

europaul commented Feb 5, 2025

Do we first need to get some kind of hash and add it to the go.mod file of another component?

Yes, that's how we usually do it, we need to first publish pillar before we can update it as a dependency in other modules. Afaik, it's not possible to do both in one PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants