-
Notifications
You must be signed in to change notification settings - Fork 490
fix(api): use UInt32Value for ProcessExit status #4392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(api): use UInt32Value for ProcessExit status #4392
Conversation
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
a95b631 to
7c2adc9
Compare
The status field in ProcessExit was a simple number (uint32). In Proto3, if a number is 0, it doesn't show up in JSON. This was hiding the status code when a process exited successfully. We changed it to a wrapper type (UInt32Value). Now, even if the status is 0, it will always appear in the JSON output. Fixes cilium#902 Signed-off-by: Aritra Dey <[email protected]>
7c2adc9 to
249c893
Compare
|
Thanks for this PR! Can you explain where the bug was and how is this PR fixing it? At least for me, it is not obvious by looking at the code :) This way, we can hopefully avoid future issues from similar bugs! |
| // Status code on process exit. For example, the status code can indicate | ||
| // if an error was encountered or the program exited successfully. | ||
| uint32 status = 4; | ||
| google.protobuf.UInt32Value status = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that missing the status field, because it's zero is a real problem, it's the case for all the fields right?
moreover I wonder we could make above change and still stay backward compatible? @kkourt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it seems we are using a mix of google.protobuf.* classes for some integers and some direct types for others https://github.com/cilium/tetragon/blob/main/api/v1/tetragon/tetragon.proto. Indeed with Go and proto I think we can't force the uint32 to be present in the JSON if its value is zero. So using this class is a way of doing it (using pointers is another, maybe worse, way).
This is indeed a breaking change unfortunately :/!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fix this in a reverse compatible way by using the optional label described here to get explicit presence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes unfortunately this is a breaking change.After going through here i think this can be a fix.
uint32 status = 4 [deprecated = true]
google.protobuf.UInt32Value exit_code = 7;
but i haven't tested this yet.I will keep this draft for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using optional will make the thing become a pointer in Go which is still a breaking change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using optional will make the thing become a pointer in Go which is still a breaking change
yes.got this error locally
/src/api/v1/tetragon/tetragon.proto:323:3:Field "4" with name "status" on message "ProcessExit" changed
cardinality from "optional with implicit presence" to "optional with explicit presence"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32 status = 4 [deprecated = true]
google.protobuf.UInt32Value exit_code = 7;
I have tried this way and here is what the output looks like:
For status code 0
{
"process_exit": {
"process": {
"exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMyMTA3NDgxMDk4MjoxMjUyODk=",
"pid": 125289,
"uid": 1000,
"cwd": "/home/aritra/Downloads/projects/tetragon",
"binary": "/usr/bin/true",
"flags": "execve clone",
"start_time": "2025-12-03T18:36:24.375076061Z",
"auid": 1000,
"parent_exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMwOTA1MDAwMDAwMDoxMjUxMTU=",
"tid": 125289,
"in_init_tree": false
},
"parent": {
"exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMwOTA1MDAwMDAwMDoxMjUxMTU=",
"pid": 125115,
"uid": 1000,
"cwd": "/home/aritra/Downloads/projects/tetragon",
"binary": "/usr/bin/bash",
"arguments": "./verify_fix.sh",
"flags": "procFS",
"start_time": "2025-12-03T17:26:41.766178459Z",
"auid": 1000,
"parent_exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNTM1NTg3MDAwMDAwMDoxMTEwMTg=",
"tid": 125115,
"in_init_tree": false
},
"exit_code": 0,
"time": "2025-12-03T18:36:24.375839673Z"
},
"node_name": "aritra-IdeaPad-Slim-3-15IRH8",
"time": "2025-12-03T18:36:24.375838636Z"
}
and for other status code (e.g. 1)
{
"process_exit": {
"process": {
"exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMyMzA4MDE1MjE5MDoxMjUzMzY=",
"pid": 125336,
"uid": 1000,
"cwd": "/home/aritra/Downloads/projects/tetragon",
"binary": "/usr/bin/false",
"flags": "execve clone",
"start_time": "2025-12-03T18:36:26.380417078Z",
"auid": 1000,
"parent_exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMwOTA1MDAwMDAwMDoxMjUxMTU=",
"tid": 125336,
"in_init_tree": false
},
"parent": {
"exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNjMwOTA1MDAwMDAwMDoxMjUxMTU=",
"pid": 125115,
"uid": 1000,
"cwd": "/home/aritra/Downloads/projects/tetragon",
"binary": "/usr/bin/bash",
"arguments": "./verify_fix.sh",
"flags": "procFS",
"start_time": "2025-12-03T17:26:41.766178459Z",
"auid": 1000,
"parent_exec_id": "YXJpdHJhLUlkZWFQYWQtU2xpbS0zLTE1SVJIODoxNTM1NTg3MDAwMDAwMDoxMTEwMTg=",
"tid": 125115,
"in_init_tree": false
},
"status": 1,
"exit_code": 1,
"time": "2025-12-03T18:36:26.381048159Z"
},
"node_name": "aritra-IdeaPad-Slim-3-15IRH8",
"time": "2025-12-03T18:36:26.381047408Z"
}
question: is this redundancy acceptable? I am concerned that having both status and exit_code might confuse users,Does this seem like a good approach?
mtardy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep indeed, not sure what to do with that, let's discuss in https://github.com/cilium/tetragon/pull/4392/files#r2576338233.
Another unrelated question: why time is also missing in your "before fix" output?
Oh, my bad — I pasted the process_exec event output instead of process_exit. I’ve updated the PR description to include the correct output. |
|
I have a theory about what's going on. In protobuf3, the default values are never included during serialization in order to be more compact. Maybe if you add |
Yes, that does fix the missing status: 0 issue. However, it causes every event to include all null/empty fields (e.g., pod: null, docker: "", cap: null, ns: null, process_credentials: null, etc.). |
I wonder if this is something that actually needs to be fixed. I think when a client (API user) deserializes into an object, the default value will be inserted into the object, even though it wasn't included the protobuf message. So from a programatic interface perspective, no information is lost. As such, this problem seems contained to being a human interpretation issue. The human expects the the default value to be present, because they don't know that default value are omitted in protobuf3. If it's really important to be visible to the human, then we could manually add it after the However, I feel like this problem is not really specific to the status key -- it's more general as it pertains to all default values. |
yeah yeah that's why we use the The only valid approach is what you proposed here #4392 (comment). However the real question is do we care enough to do the whole deprecation dance. It's an enhancement I think, but not a crucial one, so not sure we need to bother 🤔. That would be a good topic to bring to the community meeting if you can join next Monday https://isogo.to/tetragon-meeting-notes. |
Fixes #902
Description
This PR fixes an issue where the status field in
ProcessExitevents was missing from JSON output when the exit code was 0 (success).From my testing,.This was not a bug in the tetra CLI, but rather an issue with the API definition itself.
Verification:
Before the Fix:
After:
Changelog