-
Couldn't load subscription status.
- Fork 537
feat(tracing): add tracing spans to all I/O sections #3795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(tracing): add tracing spans to all I/O sections #3795
Conversation
|
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
3f02b2c to
2641ccd
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3795 +/- ##
==========================================
- Coverage 74.32% 73.98% -0.35%
==========================================
Files 147 148 +1
Lines 39722 38882 -840
Branches 39722 38882 -840
==========================================
- Hits 29523 28765 -758
- Misses 8808 8852 +44
+ Partials 1391 1265 -126 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
2641ccd to
e5812b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally very excited to have more tracing in place. Once tests pass I'm comfortable merging this
| let commit_or_bytes = this.commit_or_bytes; | ||
|
|
||
| if this.table_data.is_none() { | ||
| tracing::debug!("committing initial table version 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 in a lot of places we actually just import use tracing::*; which means this module prefix isn't necessary.
IMHO it's reasonable for us to pull tracing::* into most modules 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! It was a way to differentiate log from tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just imported with use tracing::*;, if you found it's ok for readability.
I didn't want to mix log and tracing crates, but it seems reasonable for now! We can tighten the imports or add explicit bridging later if the project grows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do it now to be more explicit or is tracing a very small library?
37d883c to
ebed3e9
Compare
Same! I’m re-requesting a review based on your feedback. I’m still waiting for the code coverage results, and we can proceed with a merge! |
ebed3e9 to
fdd6946
Compare
|
This is stdout/err tracing only, right? |
I used
Where these traces go is entirely up to the user: with Example: Side note: Importantly, if no subscriber is configured, the overhead is nearly zero. |
|
I get that, my point is rather, how would one configure it to use push to an OTEL endpoint from python ^^ |
Ah, sorry! 😀 In Python, we might need to add a small API like deltalake.init_tracing("otel_endpoint") that sets up the exporter inside the Rust runtime, WDYT? I can take time to add it, since we have many cases where the performances is evaluated from Python! |
|
I'm marking this as a draft, please change that when you're ready for this to be merged @fvaleye |
Pull request was converted to draft
d810d7d to
a6738b0
Compare
…tracing Signed-off-by: Florian Valeye <[email protected]>
69785aa to
b6040eb
Compare
Resolved conflicts: - crates/core/src/delta_datafusion/mod.rs: Kept find_files implementation with tracing - crates/core/src/operations/optimize.rs: Merged logging with object_store initialization - crates/core/src/operations/write/mod.rs: Used DeltaPlanner::new() from main, maintained tracing - crates/core/src/protocol/checkpoints.rs: Used specific tracing imports from main
b6040eb to
ada34d4
Compare
|
@fvaleye is this in a state it can be reviewed again? |
Yes! |
| latest_version - steps, | ||
| (latest_version - steps) + 1, | ||
| ) | ||
| async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional async move doesn't seem necessary, why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more for adding an instrument to this part. instrument (commit_span).await
It ensures the span is correctly attached to the future and doesn't cause issues with await points.
But, I can extract the retry logic into a separate function and use #[instrument] macro on it, which would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, yeah the diff was so big I didn't see that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normal, with all these whitespaces 😄
|
|
||
| #[async_trait::async_trait] | ||
| impl<T: ObjectStore + Clone> ObjectStore for DeltaIOStorageBackend<T> { | ||
| #[instrument(skip(self, bytes), fields(path = %location, size = bytes.content_length()))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapper is almost never hit at this point, maybe we need a simple wrapper during storage creation that wraps and instruments these methods.
I also wonder if the object store doesn't have some native tracing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
I don't know, but we could add an issue to track this need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fvaleye yeah sound goods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there changes being made here? Seems like a wrong resolution of merge conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, let me take a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it, but there is a significant change due to the async traced block.
| with tracer.start_as_current_span("delta_write_operation") as span: | ||
| span.set_attribute("table.path", table_path) | ||
| span.set_attribute("operation.mode", "overwrite") | ||
|
|
||
| write_deltalake(table_path, sample_data, mode="overwrite") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a follow up pr could be to do this for users directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just flushing some first pass comments. awesome stuff!!
For everyone reviewing this, I recommend disabling whitespace changes.
| .map_err(|err| -> TransactionError { | ||
| match err { | ||
| ObjectStoreError::AlreadyExists { .. } => { | ||
| warn!("commit entry already exists"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be a warning? conflicting commits might be considered something we expect in the "normal" flow of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question!
I wanted to inform the user about the AlreadyExists error encountered, which falls between info and warn.
| objection store communication. Please file Github issue to request for critical | ||
| openssl upgrade. | ||
|
|
||
| ## Tracing and Observability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that it is high time to update our README, but is this critical enough for people coming in, or should this be in the docs (or both?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical, but I wanted to create a section to improve our issue resolution by adding instructions on how to enable tracing. This would provide additional context when investigating issues.
I can definitely move this somewhere else.
Signed-off-by: Florian Valeye <[email protected]>
Signed-off-by: Florian Valeye <[email protected]>
Signed-off-by: Florian Valeye <[email protected]>
Signed-off-by: Florian Valeye <[email protected]>
Signed-off-by: Florian Valeye <[email protected]>
9402e59 to
78bd04f
Compare
78bd04f to
4e669e1
Compare
Signed-off-by: Florian Valeye <[email protected]>
4e669e1 to
e04ffd1
Compare
Yeah that definetly helps, should have enabled that earlier :P |
Resolved conflicts in write operation by: - Using session refactoring from main (instead of state) - Keeping MetricObserver tracing from feature branch - Maintaining instrumentation spans for performance analysis
Description
To better understand performance in the
delta-rscrate, I added additional tracing to capture more detailed debug-level performance information.Python now uses
OpenTelemetryto collect tracing data emitted from Rust.With this change, we gain true end-to-end visibility: Python spans can serve as parents of Rust spans (and vice versa), ensuring a continuous trace across both runtimes.
Related Issue(s)
Documentation