Skip to content

[pkg/ottl] Add ParseSeverity function #37280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 57 commits into
base: main
Choose a base branch
from

Conversation

bacherfl
Copy link
Contributor

@bacherfl bacherfl commented Jan 17, 2025

Description

This PR adds the ParseSeverity function, as discussed in the linked ticket. I also had to make a minor change to the
internal mapGetter, handling the map literals to return a raw map[string]any, instead of a pcommon.Map. This is because if there is a map literal within a slice, the pcommon.Slice.FromRaw cannot handle the pcommon.Map, as it only works with raw data types.

This change is however transparent, and the behavior to the outside of this package does not change.
EDIT: After merging main with the support for value expressions, introduced in #36883, this would affect the type of values returned by ParseValueExpression - previously this could potentially return map[string]any/[]any, but with the changes introduced in this PR, this would return a pcommon.Map/pcommon.Slice.
Please let me know if I should do this change in a separate PR though.

Link to tracking issue

Fixes #35079

Testing

Added unit and e2e tests

Documentation

Describe new function in the readme

Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 18, 2025
@TylerHelmuth
Copy link
Member

@bacherfl is this still draft?

@github-actions github-actions bot removed the Stale label Feb 19, 2025

`target` is a Getter that returns a string or an integer.
`severityMapping` is a map containing the log levels, and a list of values they are mapped from. These values can be either
strings, or map items containing a numeric range, defined by a `min` and `max` key, for the given log level.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make it explicit whether the range has inclusive or exclusive bounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i updated the readme now to make this explicit

return nil, fmt.Errorf("could not get log level: %w", err)
}

logLevel, err := evaluateSeverity(value, severityMap.AsRaw())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern about this function is performance, building up/looping through the severity mappings for every single execution can be expensive. I wish we could enforce the Mapping argument to be a map literal value, so we could improve the lookups, but I think that's currently not supported by OTTL.

That said, I'd try to avoid as many loops as possible, and use the pcommon.Map and pcommon.Slices values instead of parsing them to .AsRaw(). I might be wrong, but I think it would save some cycles/allocs that might worth the change. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern about this function is performance, building up/looping through the severity mappings for every single execution can be expensive.

I fully agree with this part at least. This primary principle behind the stanza implementation is to build the mapping once and then having constant lookup times while processing each record. If we can't make this happen somehow in OTTL then we will pay quite a cost.

Can we build this mapping inside of parseSeverity but outside of the function returned by parseSeverity?

func parseSeverity[K any](target ottl.Getter[K], mapping ottl.PMapGetter[K]) ottl.ExprFunc[K] {
	// build static map of severities once
	return func(ctx context.Context, tCtx K) (any, error) {
		// use static map in every execution
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's a better way but if nothing else, I think we could declare the compiled mapping as a var within parseSeverity and outside the function it returns, and use a sync.Once to compile it the first time the function executes. This is oversimplified psuedocode, but basically:

func parseSeverity[K any](target ottl.Getter[K], mapping ottl.PMapGetter[K]) ottl.ExprFunc[K] {
	var compileOnce sync.Once
	var staticMapping map[string]int
	return func(ctx context.Context, tCtx K) (any, error) {
		compileOnce.Do(func() {
			precompiledMapping, getMappingErr = args.Mapping.Get(ctx, tCtx)
		})

		value, err := target.Get(ctx, tCtx)
		if err != nil {
			return nil, fmt.Errorf("could not get log level: %w", err)
		}

		sev, ok := staticMapping[value]
		if !ok {
			return defaultSeverity
		}
		return sev
	}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My favorite solution is finding a way to force the input to be a literal. It will take OTTL changes, but that is really what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even theoretically possible for a range to be expressed as a literal? Granted, it's not strictly necessary that we support ranges but there are some use cases where it is quite convenient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it even theoretically possible for a range to be expressed as a literal?

Yes I think it's possible, not exactly the range, but the whole map as literal, which for OTTL essentially means that the function's parameter value can be retrieved at the bootstrap time, and differently from the ottl.Getters, it's immutable and does not depend on the transformation context to get accessed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it even theoretically possible for a range to be expressed as a literal?

Yes I think it's possible, not exactly the range, but the whole map as literal, which for OTTL essentially means that the function's parameter value can be retrieved at the bootstrap time, and differently from the ottl.Getters, it's immutable and does not depend on the transformation context to get accessed.

I agree supporting literal inputs is nice but my point is that sometimes it's more user friendly to support something which can be interpreted once when the function is built. I think the notion of a range is one of those situations because e.g. no one wants to have to list all the HTTP status codes in order to assign them to severity levels. This shouldn't be a choice between literal inputs and recomputing a mapping for every context. If we can provide more user friendly inputs AND compute a complete mapping only once, this is better than literal inputs and also better than recomputing the same thing repeatedly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree supporting literal inputs is nice but my point is that sometimes it's more user friendly to support something which can be interpreted once when the function is built

I might be missing the point, but that's what literals should do. The term "literals" is somehow confusing here, for OTTL it means we have a function's parameter that is not a getter, instead, it's a raw immutable value that is available when the function is built.

To support this use case, for example, OTTL needs to be changed so it knowns how to parse inputs like { "error":[ "err", { "min": 3, "max": 4 }]} into a raw pcommon.Map (required by the function argument).
The function usage doesn't change, the only thing that wouldn't be supported is using non-literal values (getters), as they're mutable, and their value might change from one statement to another, for example:

log_statements:
- context: log
  statements:
    - set(severity_number, ParseSeverity(severity_number, { "error":["err", { "min": 3, "max": 4 }]})) # Would work as the argument value is a literal, and cannot be changed by other statements.
    - set(cache["mappings"], { "error":["err", { "min": 3, "max": 4 }]}) 
    - set(severity_number, ParseSeverity(severity_number, cache["mappings"])) # Wouldn't work, as the cache["mappings"] path is mutable (getter), so it's not a literal and needs to be evaluate in every execution.
    - set(cache["mappings"], { "error":["err", { "min": 5, "max": 6 }]}) 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your description of literals and agree we need them. My point though is about what happens after we have parsed the syntax.

Every single line of evaluateSeverityNumberMapping is unnecessary if we build a mapping (immediate lookup, not function evaluation).

Severity ranges tend to have a very reasonable and finite number of possible values so there's no need to evaluate logic every time we see a log. "Is this a range criteria?" "Is there a min?" "What is the min?", "Is this value greater than the min?", etc are all unnecessary if we can precompile this into a lookup table that is instant access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, my answer was more focused on the OTTL side and how we could get the map parsed as literal.
I agree with you, we definitely need to build a lookup table for that, ideally with O(1) access.

rangeMin, gotMin := rangeMap[minKey]
rangeMax, gotMax := rangeMap[maxKey]
if !gotMin || !gotMax {
continue
Copy link
Contributor

@edmocosta edmocosta Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've mixed feelings here, from one side we accept settings like:

set(attributes["test"], ParseSeverity(severity_number, {"info":[{"min": 200}]}))

But here that setting becomes no-op, and there's no error messages or logs alerting the user about that missing key (other than the no matching message). Should we consider this scenario invalid and raise an error instead? In addition to that, should we also raise an error if an invalid key name is present in the mappings ({"info":[{"invalid": 200}]}?

For the min and max keys, I think another option would be making them "optional", so when one of them is suppressed, it could mean "no min/max bounds".

If the behavior is like this for keeping it the same as stanza, I'd add a note into the docs explaining that both keys are required, otherwise the condition is ignored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erroring for unexpected configuration feels right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i updated this section to return an error now

@djaglowski
Copy link
Member

djaglowski commented Apr 23, 2025

I thought the goal was having the same stanza functionality, and not exactly replicate the same configuration structure.

I agree, it's not necessary to have exactly the same structure. My hope was just that we'd keep the lessons learned from the stanza implementation.


The current mappings:

ParseSeverity(severity_number, {
       "error":["err", { "min": 3, "max": 4 }],
       "info":[{ "min": 1, "max": 2 }],
     }
))

Could be expressed as:

PaseSeverity(severity_number, {
     "error": [{"equals": "err"}, {"range":[3,4]}],
     "info":  [{"range":[1,2]}],
})

I think there's a tradeoff here. I agree it's more clear to use {"equals": "err"} but something to keep in mind is that users sometimes want to match against any of a large number of values. ["err", "error", "E", "foo", "bar"] is a lot easier to read than [ {"equals": "err"}, {"equals": "error"}, {"equals": "E"}, {"equals": "foo"}, {"equals": "bar"} ] IMO.

PaseSeverity(severity_number, {
     "error": [{"equals": "err", "range":[3,4]},  {"range":[5,6]}],
})

On the other hand, aside from min/max, I'm not convinced that AND is really all that useful with the current set of criteria. In this example, it would be impossible to find a value that is both equal to "err" and also in a numeric range.

I like the idea in principle though and can imagine that we'd introduce more sophisticated criteria later. This raises other design questions for me though, such as whether order of criteria matters, supporting OR and NOT as well as AND - basically becomes an entire DSL. IMO we should just keep it simple for this first version and can look at a more advanced function separately if a need arises.

@edmocosta
Copy link
Contributor

think there's a tradeoff here. I agree it's more clear to use {"equals": "err"} but something to keep in mind is that users sometimes want to match against any of a large number of values. ["err", "error", "E", "foo", "bar"] is a lot easier to read than [ {"equals": "err"}, {"equals": "error"}, {"equals": "E"}, {"equals": "foo"}, {"equals": "bar"} ] IMO

On the other hand, aside from min/max, I'm not convinced that AND is really all that useful with the current set of criteria. In this example, it would be impossible to find a value that is both equal to "err" and also in a numeric range.

I like the idea in principle though and can imagine that we'd introduce more sophisticated criteria later. This raises other design questions for me though, such as whether order of criteria matters, supporting OR and NOT as well as AND - basically becomes an entire DSL. IMO we should just keep it simple for this first version and can look at a more advanced function separately if a need arises.

Yes, I completely agree with you @djaglowski. My main point was giving conditions a name/type identifier, so it would be easier to extend and validating. The other examples are just possibilities, but I wouldn't suggest implementing them 😄

For example, the range condition has the min and max keys. At the moment, we only have that condition type, so we can check if the criteria item is a map, and validate the expected keys. If in the future we need to introduce another condition type that also uses a map, we would probably need to guess the condition type based on the keys names, which IMO, is not ideal.

Examples:

{"equals":["foo", "bar"]} -> if I know the condition type is "equals", I can validate the value to be a slice of T.
{"range":{"min": 1, "max": 2}} -> "range": validate if the map has both required keys min and max.
{"foo":{"min": 1, "optional": "1"}} -> "foo": validate if the map has the required key min, and if optional is a string.
etc

Thanks!

@TylerHelmuth
Copy link
Member

@djaglowski @edmocosta thanks for reviewing, lets move forward with your suggestions.

@bacherfl
Copy link
Contributor Author

bacherfl commented May 5, 2025

@djaglowski @edmocosta thanks for reviewing, lets move forward with your suggestions.

Thanks for the suggestions, will adapt the PR soon

@bacherfl
Copy link
Contributor Author

bacherfl commented May 8, 2025

reverting back to draft while addressing the comments/suggestions

@bacherfl bacherfl marked this pull request as draft May 8, 2025 06:32
@bacherfl
Copy link
Contributor Author

think there's a tradeoff here. I agree it's more clear to use {"equals": "err"} but something to keep in mind is that users sometimes want to match against any of a large number of values. ["err", "error", "E", "foo", "bar"] is a lot easier to read than [ {"equals": "err"}, {"equals": "error"}, {"equals": "E"}, {"equals": "foo"}, {"equals": "bar"} ] IMO

On the other hand, aside from min/max, I'm not convinced that AND is really all that useful with the current set of criteria. In this example, it would be impossible to find a value that is both equal to "err" and also in a numeric range.

I like the idea in principle though and can imagine that we'd introduce more sophisticated criteria later. This raises other design questions for me though, such as whether order of criteria matters, supporting OR and NOT as well as AND - basically becomes an entire DSL. IMO we should just keep it simple for this first version and can look at a more advanced function separately if a need arises.

Yes, I completely agree with you @djaglowski. My main point was giving conditions a name/type identifier, so it would be easier to extend and validating. The other examples are just possibilities, but I wouldn't suggest implementing them 😄

For example, the range condition has the min and max keys. At the moment, we only have that condition type, so we can check if the criteria item is a map, and validate the expected keys. If in the future we need to introduce another condition type that also uses a map, we would probably need to guess the condition type based on the keys names, which IMO, is not ideal.

Examples:

{"equals":["foo", "bar"]} -> if I know the condition type is "equals", I can validate the value to be a slice of T. {"range":{"min": 1, "max": 2}} -> "range": validate if the map has both required keys min and max. {"foo":{"min": 1, "optional": "1"}} -> "foo": validate if the map has the required key min, and if optional is a string. etc

Thanks!

I updated the structure now to use the suggested structure. For the range condition, there can be two cases though - one where the condition is defined as a map containing min and max, and another where the range is defined via a string placeholder, i.e. the http status code ranges like 2xx, 3xx and so on.

@bacherfl
Copy link
Contributor Author

I think I addressed most points now and updated the PR accordingly. However there is still the issue of having the need to go through the conditions and evaluate them each time the function is invoked, as we would need a way to be able to force map arguments to be a literal. Should we put this PR on hold while this is not possible in OTTL yet?

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 10, 2025
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 24, 2025
@bacherfl bacherfl reopened this Jun 26, 2025
@bacherfl
Copy link
Contributor Author

reopening, but we will need for #40222 (literal value support) before this one can be completed

@github-actions github-actions bot removed the Stale label Jun 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[pkg/ottl] ParseSeverity function
7 participants