feat(ourlog): Add vercel log drain endpoint #5212
Conversation
fn parse_logs_data(payload: &[u8]) -> Result<Vec<VercelLog>> {
    // Try parsing as JSON array first
    if let Ok(logs) = serde_json::from_slice::<Vec<VercelLog>>(payload) {
        return Ok(logs);
    }
Is there no way to differentiate the payload format based on the request, a header or something?
If we can't, you can peek at the first byte ('[') to figure out whether it is an array or not.
Yeah, we differentiate based on content type, done with a3db213.
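As a rough sketch of that dispatch, assuming the endpoint only distinguishes the two content types used in the tests further down (the enum mirrors the Json/NDJson variants quoted later in this thread; all names here are illustrative, not the actual implementation):

// Illustrative only: the two payload shapes the log drain endpoint accepts.
enum LogDrainFormat {
    Json,
    NdJson,
}

// Assumed mapping from the Content-Type header to the payload format.
fn format_from_content_type(content_type: &str) -> LogDrainFormat {
    match content_type {
        "application/x-ndjson" => LogDrainFormat::NdJson,
        // Treat "application/json" (and anything else) as a JSON array payload.
        _ => LogDrainFormat::Json,
    }
}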
let logs: Vec<VercelLog> = payload_str
    .lines()
    .filter_map(|line| serde_json::from_str::<VercelLog>(line.trim()).ok())
    .collect();
Instead of collecting into a vector here, you could use the iterator and call produce for each item instead. This reduces the amount of necessary allocations.
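A minimal sketch of that shape, assuming produce accepts each converted log (the conversion helper is the one quoted later in this thread); the intermediate Vec and its allocation disappear:

// Parse each line lazily and hand the converted log straight to `produce`.
payload_str
    .lines()
    .filter_map(|line| serde_json::from_str::<VercelLog>(line.trim()).ok())
    .for_each(|log| produce(relay_ourlogs::vercel_log_to_sentry_log(log)));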
if logs.is_empty() {
    relay_log::debug!("Failed to parse any logs from vercel log drain payload");
}
Is that correct? Maybe we just got empty content? Also, the log line is not emitted for [].
I adjusted this in 8879201.
}

// Fall back to NDJSON parsing
let payload_str = std::str::from_utf8(payload).map_err(|e| {
Conversion to UTF-8 first isn't necessary and means scanning an extra time across the data. slice::split is probably enough.
Agree, alongside the iterator improvements. Changed with 8879201.
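A sketch of what skipping the UTF-8 pass could look like, splitting the raw byte slice on newlines and letting serde_json validate each line on its own (produce and the conversion helper are assumed from the surrounding diff):

// Split the raw bytes instead of validating the whole payload as UTF-8 first.
for line in payload.split(|&b| b == b'\n') {
    if line.is_empty() {
        continue;
    }
    // serde_json::from_slice validates and parses each line in one pass.
    if let Ok(log) = serde_json::from_slice::<VercelLog>(line) {
        produce(relay_ourlogs::vercel_log_to_sentry_log(log));
    }
}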
response = self.post(url, headers=headers, data=data)

response.raise_for_status()
Suggested change:

response.raise_for_status()
return response
project_config["config"]["retentions"] = {
    "log": {"standard": 30, "downsampled": 13 * 30},
}
Fine to omit, we can expect retentions to work for an integration if it works for the core logs.
Agree; generally moved some test items around in 488f625.
headers={"Content-Type": "application/json"}, | ||
) | ||
|
||
# Check that the items are properly processed via items_consumer |
What else?
headers={"Content-Type": "application/x-ndjson"}, | ||
) | ||
|
||
# Check that the items are properly processed via items_consumer |
What else (again)?
project_config["config"]["retentions"] = {
    "log": {"standard": 30, "downsampled": 13 * 30},
}
Also mentioned below, we can omit this.
    },
]

# Check outcomes
As opposed to what?
    count += 1;
    let ourlog = relay_ourlogs::vercel_log_to_sentry_log(log);
    produce(ourlog);
}
}

if count == 0 {
    relay_log::debug!("Failed to parse any logs from vercel log drain payload");
This does output when you get empty arrays, but that is unexpected from the log drain endpoint.
After implementing this I also realized that we could return the count from expand and use it accordingly; we could make a similar refactor to the OTLP integration.
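A hypothetical shape for that refactor, with expand reporting how many logs it produced so the caller decides what to log; the signature and the OurLog type name are assumptions, not Relay's actual API:

// Sketch: expand returns the number of logs it handed to `produce`.
fn expand(payload: &[u8], produce: &mut impl FnMut(OurLog)) -> usize {
    let mut count = 0;
    for line in payload.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
        if let Ok(log) = serde_json::from_slice::<VercelLog>(line) {
            produce(relay_ourlogs::vercel_log_to_sentry_log(log));
            count += 1;
        }
    }
    count
}

The caller could then emit the debug log (or an outcome) only when the returned count is zero, and the OTLP endpoint could adopt the same pattern.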
// Vercel Log Drain data in a json array payload
Json,
// Vercel Log Drain data in a newline delimited json payload
NDJson,
Suggested change:

NdJson,

Even if it looks wrong, according to Rust naming conventions this is the proper capitalization.
TEST_CONFIG = {
    "outcomes": {
        "emit_outcomes": True,
        "batch_size": 1,
        "batch_interval": 1,
        "aggregator": {
            "bucket_interval": 1,
            "flush_interval": 1,
        },
    },
    "aggregator": {
        "bucket_interval": 1,
        "initial_delay": 0,
    },
}
Are you sure, because you're just repeating the default config:
relay/tests/integration/fixtures/relay.py, lines 147 to 158 in ca7e20d:
"outcomes": { | |
"batch_size": 1, | |
"batch_interval": 1, | |
"aggregator": { | |
"bucket_interval": 1, | |
"flush_interval": 0, | |
}, | |
}, | |
"aggregator": { | |
"bucket_interval": 1, | |
"initial_delay": 0, | |
}, |
This didn't use to be the default; I changed that two weeks ago, maybe you tried before that?
    continue;
}

if let Ok(log) = serde_json::from_slice::<VercelLog>(line) {
I think we wouldn't want to swallow errors here and should instead emit a single error. This would mean you need to pass in the RecordKeeper and emit an error, but we can also tackle this separately/afterwards.
We should then also add a test which has a single invalid line. That may also address the issue with the count afterwards.
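A hedged sketch of surfacing a parse error instead of dropping it silently; how it would actually be reported through RecordKeeper is left out here, since that API isn't quoted in this thread:

// Remember the first parse error instead of silently skipping every invalid line.
let mut first_error = None;
for line in payload.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
    match serde_json::from_slice::<VercelLog>(line) {
        Ok(log) => produce(relay_ourlogs::vercel_log_to_sentry_log(log)),
        Err(e) if first_error.is_none() => first_error = Some(e),
        Err(_) => {}
    }
}
if let Some(error) = first_error {
    // In the real change this would go through RecordKeeper rather than only a debug log.
    relay_log::debug!("invalid line in vercel log drain payload: {}", error);
}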
Resolves https://linear.app/getsentry/issue/LOGS-389/add-vercel-log-drain-endpoint-to-relay
Building on top of the Vercel log transform added in #5209, this PR adds an endpoint for the Vercel log drain.
This endpoint is feature flagged; you can see the options automator PR here: https://github.com/getsentry/sentry-options-automator/pull/5367