Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure OpenLineage if present in Spark instrumentation #8541

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mobuchowski
Copy link

What Does This Do

If OpenLineage library is present on the classpath, hijack SparkListenerBus to inject the OpenLineageSparkListener with additional tags ourselves.

Motivation

https://docs.google.com/document/d/14nmf3UcqhzoOfooiveoEBUO-HHejKalLR-qAKzbR2PI/edit?tab=t.0#heading=h.dpfdi5vyt85h

https://docs.google.com/document/d/1EwNCsnhnilL6YHYMg4UYwqK_yKgBEUsxys669h8l_-M/edit?tab=t.0#heading=h.o8z5ox632qxx

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]

@mobuchowski mobuchowski added the inst: apache spark Apache Spark instrumentation label Mar 12, 2025
@mobuchowski mobuchowski requested a review from a team as a code owner March 12, 2025 12:21
Copy link
Contributor

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

@pr-commenter
Copy link

pr-commenter bot commented Mar 12, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master mobuchowski/openlineage-agent-direct
git_commit_date 1742234721 1742245417
git_commit_sha 1c3133b b936442
release_version 1.48.0-SNAPSHOT~1c3133b647 1.48.0-SNAPSHOT~b936442c10
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1742247872 1742247872
ci_job_id 851778439 851778439
ci_pipeline_id 59107964 59107964
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-b3bkiuj-project-304-concurrent-0-x3z8f2to 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-b3bkiuj-project-304-concurrent-0-x3z8f2to 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 5 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.048 s) : 0, 1048363
Total [baseline] (10.511 s) : 0, 10511407
Agent [candidate] (1.041 s) : 0, 1040915
Total [candidate] (10.49 s) : 0, 10489720
section appsec
Agent [baseline] (1.187 s) : 0, 1187125
Total [baseline] (10.738 s) : 0, 10738140
Agent [candidate] (1.181 s) : 0, 1180709
Total [candidate] (10.753 s) : 0, 10752926
section iast
Agent [baseline] (1.176 s) : 0, 1175853
Total [baseline] (11.041 s) : 0, 11041196
Agent [candidate] (1.172 s) : 0, 1171722
Total [candidate] (10.987 s) : 0, 10986616
section profiling
Agent [baseline] (1.262 s) : 0, 1262321
Total [baseline] (10.848 s) : 0, 10847887
Agent [candidate] (1.258 s) : 0, 1258216
Total [candidate] (10.788 s) : 0, 10787514
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.048 s -
Agent appsec 1.187 s 138.761 ms (13.2%)
Agent iast 1.176 s 127.49 ms (12.2%)
Agent profiling 1.262 s 213.958 ms (20.4%)
Total tracing 10.511 s -
Total appsec 10.738 s 226.733 ms (2.2%)
Total iast 11.041 s 529.789 ms (5.0%)
Total profiling 10.848 s 336.48 ms (3.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.041 s -
Agent appsec 1.181 s 139.794 ms (13.4%)
Agent iast 1.172 s 130.806 ms (12.6%)
Agent profiling 1.258 s 217.301 ms (20.9%)
Total tracing 10.49 s -
Total appsec 10.753 s 263.206 ms (2.5%)
Total iast 10.987 s 496.896 ms (4.7%)
Total profiling 10.788 s 297.794 ms (2.8%)
gantt
    title petclinic - break down per module: candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (723.178 ms) : 0, 723178
BytebuddyAgent [candidate] (718.585 ms) : 0, 718585
GlobalTracer [baseline] (241.055 ms) : 0, 241055
GlobalTracer [candidate] (240.152 ms) : 0, 240152
AppSec [baseline] (55.173 ms) : 0, 55173
AppSec [candidate] (55.162 ms) : 0, 55162
Remote Config [baseline] (703.697 µs) : 0, 704
Remote Config [candidate] (691.664 µs) : 0, 692
Telemetry [baseline] (12.938 ms) : 0, 12938
Telemetry [candidate] (11.477 ms) : 0, 11477
section appsec
BytebuddyAgent [baseline] (738.197 ms) : 0, 738197
BytebuddyAgent [candidate] (733.545 ms) : 0, 733545
GlobalTracer [baseline] (236.999 ms) : 0, 236999
GlobalTracer [candidate] (235.85 ms) : 0, 235850
AppSec [baseline] (176.484 ms) : 0, 176484
AppSec [candidate] (176.827 ms) : 0, 176827
Remote Config [baseline] (666.263 µs) : 0, 666
Remote Config [candidate] (653.023 µs) : 0, 653
Telemetry [baseline] (8.712 ms) : 0, 8712
Telemetry [candidate] (8.229 ms) : 0, 8229
IAST [baseline] (21.598 ms) : 0, 21598
IAST [candidate] (21.575 ms) : 0, 21575
section iast
BytebuddyAgent [baseline] (839.629 ms) : 0, 839629
BytebuddyAgent [candidate] (837.79 ms) : 0, 837790
GlobalTracer [baseline] (231.405 ms) : 0, 231405
GlobalTracer [candidate] (230.502 ms) : 0, 230502
AppSec [baseline] (56.975 ms) : 0, 56975
AppSec [candidate] (56.325 ms) : 0, 56325
Remote Config [baseline] (620.037 µs) : 0, 620
Remote Config [candidate] (617.311 µs) : 0, 617
Telemetry [baseline] (8.9 ms) : 0, 8900
Telemetry [candidate] (8.738 ms) : 0, 8738
IAST [baseline] (23.104 ms) : 0, 23104
IAST [candidate] (22.867 ms) : 0, 22867
section profiling
ProfilingAgent [baseline] (97.142 ms) : 0, 97142
ProfilingAgent [candidate] (96.101 ms) : 0, 96101
BytebuddyAgent [baseline] (709.707 ms) : 0, 709707
BytebuddyAgent [candidate] (709.122 ms) : 0, 709122
GlobalTracer [baseline] (350.532 ms) : 0, 350532
GlobalTracer [candidate] (349.196 ms) : 0, 349196
AppSec [baseline] (54.532 ms) : 0, 54532
AppSec [candidate] (54.042 ms) : 0, 54042
Remote Config [baseline] (677.747 µs) : 0, 678
Remote Config [candidate] (675.027 µs) : 0, 675
Telemetry [baseline] (8.901 ms) : 0, 8901
Telemetry [candidate] (8.842 ms) : 0, 8842
Profiling [baseline] (97.166 ms) : 0, 97166
Profiling [candidate] (96.124 ms) : 0, 96124
Loading
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.039 s) : 0, 1039408
Total [baseline] (8.655 s) : 0, 8654785
Agent [candidate] (1.044 s) : 0, 1044117
Total [candidate] (8.671 s) : 0, 8670908
section iast
Agent [baseline] (1.171 s) : 0, 1171014
Total [baseline] (9.235 s) : 0, 9235452
Agent [candidate] (1.172 s) : 0, 1171884
Total [candidate] (9.235 s) : 0, 9234529
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.174 s) : 0, 1173699
Total [baseline] (9.216 s) : 0, 9216427
Agent [candidate] (1.18 s) : 0, 1179818
Total [candidate] (9.216 s) : 0, 9216387
section iast_TELEMETRY_OFF
Agent [baseline] (1.168 s) : 0, 1167904
Total [baseline] (9.215 s) : 0, 9215436
Agent [candidate] (1.174 s) : 0, 1174425
Total [candidate] (9.228 s) : 0, 9228411
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.039 s -
Agent iast 1.171 s 131.607 ms (12.7%)
Agent iast_HARDCODED_SECRET_DISABLED 1.174 s 134.292 ms (12.9%)
Agent iast_TELEMETRY_OFF 1.168 s 128.497 ms (12.4%)
Total tracing 8.655 s -
Total iast 9.235 s 580.667 ms (6.7%)
Total iast_HARDCODED_SECRET_DISABLED 9.216 s 561.642 ms (6.5%)
Total iast_TELEMETRY_OFF 9.215 s 560.651 ms (6.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.044 s -
Agent iast 1.172 s 127.766 ms (12.2%)
Agent iast_HARDCODED_SECRET_DISABLED 1.18 s 135.701 ms (13.0%)
Agent iast_TELEMETRY_OFF 1.174 s 130.308 ms (12.5%)
Total tracing 8.671 s -
Total iast 9.235 s 563.62 ms (6.5%)
Total iast_HARDCODED_SECRET_DISABLED 9.216 s 545.478 ms (6.3%)
Total iast_TELEMETRY_OFF 9.228 s 557.502 ms (6.4%)
gantt
    title insecure-bank - break down per module: candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (717.965 ms) : 0, 717965
BytebuddyAgent [candidate] (720.709 ms) : 0, 720709
GlobalTracer [baseline] (238.817 ms) : 0, 238817
GlobalTracer [candidate] (240.188 ms) : 0, 240188
AppSec [baseline] (54.58 ms) : 0, 54580
AppSec [candidate] (55.482 ms) : 0, 55482
Remote Config [baseline] (681.958 µs) : 0, 682
Remote Config [candidate] (697.586 µs) : 0, 698
Telemetry [baseline] (12.192 ms) : 0, 12192
Telemetry [candidate] (12.185 ms) : 0, 12185
section iast
BytebuddyAgent [baseline] (838.033 ms) : 0, 838033
BytebuddyAgent [candidate] (838.019 ms) : 0, 838019
GlobalTracer [baseline] (229.57 ms) : 0, 229570
GlobalTracer [candidate] (230.463 ms) : 0, 230463
IAST [baseline] (22.752 ms) : 0, 22752
IAST [candidate] (22.697 ms) : 0, 22697
AppSec [baseline] (56.135 ms) : 0, 56135
AppSec [candidate] (56.504 ms) : 0, 56504
Remote Config [baseline] (615.789 µs) : 0, 616
Remote Config [candidate] (606.899 µs) : 0, 607
Telemetry [baseline] (8.735 ms) : 0, 8735
Telemetry [candidate] (8.761 ms) : 0, 8761
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (839.214 ms) : 0, 839214
BytebuddyAgent [candidate] (843.406 ms) : 0, 843406
GlobalTracer [baseline] (230.75 ms) : 0, 230750
GlobalTracer [candidate] (231.356 ms) : 0, 231356
IAST [baseline] (22.987 ms) : 0, 22987
IAST [candidate] (23.199 ms) : 0, 23199
AppSec [baseline] (56.132 ms) : 0, 56132
AppSec [candidate] (57.311 ms) : 0, 57311
Remote Config [baseline] (625.578 µs) : 0, 626
Remote Config [candidate] (620.043 µs) : 0, 620
Telemetry [baseline] (8.771 ms) : 0, 8771
Telemetry [candidate] (8.878 ms) : 0, 8878
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (835.018 ms) : 0, 835018
BytebuddyAgent [candidate] (840.05 ms) : 0, 840050
GlobalTracer [baseline] (229.746 ms) : 0, 229746
GlobalTracer [candidate] (230.819 ms) : 0, 230819
IAST [baseline] (23.231 ms) : 0, 23231
IAST [candidate] (22.452 ms) : 0, 22452
AppSec [baseline] (55.503 ms) : 0, 55503
AppSec [candidate] (56.843 ms) : 0, 56843
Remote Config [baseline] (598.777 µs) : 0, 599
Remote Config [candidate] (616.284 µs) : 0, 616
Telemetry [baseline] (8.612 ms) : 0, 8612
Telemetry [candidate] (8.644 ms) : 0, 8644
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2025-03-17T21:17:06 2025-03-17T21:24:51
git_branch master mobuchowski/openlineage-agent-direct
git_commit_date 1742234721 1742245417
git_commit_sha 1c3133b b936442
release_version 1.48.0-SNAPSHOT~1c3133b647 1.48.0-SNAPSHOT~b936442c10
start_time 2025-03-17T21:16:52 2025-03-17T21:24:37
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1742247091 1742247091
ci_job_id 851778441 851778441
ci_pipeline_id 59107964 59107964
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-w21syp1i-project-304-concurrent-0-ld87pyot 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-w21syp1i-project-304-concurrent-0-ld87pyot 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 14 metrics, 16 unstable metrics.

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.363 ms) : 1343, 1383
.   : milestone, 1363,
appsec (1.738 ms) : 1714, 1762
.   : milestone, 1738,
appsec_no_iast (1.763 ms) : 1738, 1787
.   : milestone, 1763,
code_origins (1.693 ms) : 1666, 1719
.   : milestone, 1693,
iast (1.502 ms) : 1477, 1527
.   : milestone, 1502,
profiling (1.514 ms) : 1489, 1539
.   : milestone, 1514,
tracing (1.492 ms) : 1468, 1516
.   : milestone, 1492,
section candidate
no_agent (1.371 ms) : 1352, 1391
.   : milestone, 1371,
appsec (1.744 ms) : 1721, 1768
.   : milestone, 1744,
appsec_no_iast (1.738 ms) : 1713, 1763
.   : milestone, 1738,
code_origins (1.694 ms) : 1666, 1721
.   : milestone, 1694,
iast (1.516 ms) : 1491, 1542
.   : milestone, 1516,
profiling (1.522 ms) : 1498, 1545
.   : milestone, 1522,
tracing (1.483 ms) : 1458, 1508
.   : milestone, 1483,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.363 ms [1.343 ms, 1.383 ms] -
appsec 1.738 ms [1.714 ms, 1.762 ms] 375.015 µs (27.5%)
appsec_no_iast 1.763 ms [1.738 ms, 1.787 ms] 399.695 µs (29.3%)
code_origins 1.693 ms [1.666 ms, 1.719 ms] 329.75 µs (24.2%)
iast 1.502 ms [1.477 ms, 1.527 ms] 139.084 µs (10.2%)
profiling 1.514 ms [1.489 ms, 1.539 ms] 150.924 µs (11.1%)
tracing 1.492 ms [1.468 ms, 1.516 ms] 129.433 µs (9.5%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.371 ms [1.352 ms, 1.391 ms] -
appsec 1.744 ms [1.721 ms, 1.768 ms] 373.134 µs (27.2%)
appsec_no_iast 1.738 ms [1.713 ms, 1.763 ms] 366.561 µs (26.7%)
code_origins 1.694 ms [1.666 ms, 1.721 ms] 322.206 µs (23.5%)
iast 1.516 ms [1.491 ms, 1.542 ms] 145.1 µs (10.6%)
profiling 1.522 ms [1.498 ms, 1.545 ms] 150.532 µs (11.0%)
tracing 1.483 ms [1.458 ms, 1.508 ms] 111.743 µs (8.1%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.48.0-SNAPSHOT~b936442c10, baseline=1.48.0-SNAPSHOT~1c3133b647
    dateFormat X
    axisFormat %s
section baseline
no_agent (391.224 µs) : 371, 412
.   : milestone, 391,
iast (507.176 µs) : 485, 529
.   : milestone, 507,
iast_FULL (739.847 µs) : 718, 762
.   : milestone, 740,
iast_GLOBAL (559.858 µs) : 538, 582
.   : milestone, 560,
iast_HARDCODED_SECRET_DISABLED (520.511 µs) : 498, 543
.   : milestone, 521,
iast_INACTIVE (466.287 µs) : 445, 488
.   : milestone, 466,
iast_TELEMETRY_OFF (497.896 µs) : 476, 519
.   : milestone, 498,
tracing (459.071 µs) : 438, 480
.   : milestone, 459,
section candidate
no_agent (384.22 µs) : 364, 404
.   : milestone, 384,
iast (513.005 µs) : 490, 536
.   : milestone, 513,
iast_FULL (735.593 µs) : 714, 757
.   : milestone, 736,
iast_GLOBAL (568.012 µs) : 546, 590
.   : milestone, 568,
iast_HARDCODED_SECRET_DISABLED (525.579 µs) : 503, 548
.   : milestone, 526,
iast_INACTIVE (470.549 µs) : 449, 492
.   : milestone, 471,
iast_TELEMETRY_OFF (505.557 µs) : 482, 529
.   : milestone, 506,
tracing (459.114 µs) : 438, 480
.   : milestone, 459,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 391.224 µs [370.602 µs, 411.846 µs] -
iast 507.176 µs [485.394 µs, 528.959 µs] 115.952 µs (29.6%)
iast_FULL 739.847 µs [717.832 µs, 761.862 µs] 348.623 µs (89.1%)
iast_GLOBAL 559.858 µs [538.096 µs, 581.619 µs] 168.633 µs (43.1%)
iast_HARDCODED_SECRET_DISABLED 520.511 µs [498.427 µs, 542.595 µs] 129.287 µs (33.0%)
iast_INACTIVE 466.287 µs [444.801 µs, 487.773 µs] 75.063 µs (19.2%)
iast_TELEMETRY_OFF 497.896 µs [476.4 µs, 519.392 µs] 106.672 µs (27.3%)
tracing 459.071 µs [437.709 µs, 480.433 µs] 67.847 µs (17.3%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 384.22 µs [363.946 µs, 404.495 µs] -
iast 513.005 µs [490.272 µs, 535.738 µs] 128.784 µs (33.5%)
iast_FULL 735.593 µs [713.762 µs, 757.425 µs] 351.373 µs (91.5%)
iast_GLOBAL 568.012 µs [546.116 µs, 589.907 µs] 183.791 µs (47.8%)
iast_HARDCODED_SECRET_DISABLED 525.579 µs [503.328 µs, 547.829 µs] 141.358 µs (36.8%)
iast_INACTIVE 470.549 µs [448.938 µs, 492.161 µs] 86.329 µs (22.5%)
iast_TELEMETRY_OFF 505.557 µs [482.053 µs, 529.062 µs] 121.337 µs (31.6%)
tracing 459.114 µs [438.363 µs, 479.864 µs] 74.894 µs (19.5%)

Dacapo

}

static void setupSparkConf(SparkConf sparkConf) {
sparkConf.set("spark.openlineage.transport.type", "composite");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have a condition on ol version that has composite transport?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now that you mention it, we should have a condition on a pretty much newest version - that can support tags

Signed-off-by: Maciej Obuchowski <[email protected]>
@mobuchowski mobuchowski force-pushed the mobuchowski/openlineage-agent-direct branch from d36e910 to d48bb91 Compare March 13, 2025 15:13
@SuppressForbidden // called at most once per spark application
private static String removeUuidFromEndOfString(String input) {
return input.replaceAll(
"_[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$", "");
}

private Optional<Class> loadClass(String className) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just instrument OL listener instead? I.e. add listener setup code which sets all required settings if not present?

@@ -74,6 +77,7 @@ public abstract class AbstractDatadogSparkListener extends SparkListener {
private final int MAX_COLLECTION_SIZE = 5000;
private final int MAX_ACCUMULATOR_SIZE = 50000;
private final String RUNTIME_TAGS_PREFIX = "spark.datadog.tags.";
private static final String AGENT_OL_ENDPOINT = "openlineage/api/v1/lineage";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related to this PR since it is an agent change, but to keep the same default as openlineage, wondering if it could be easier to keep the endpoint name as /api/v1/lineage in the agent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that /api/v1/lineage endpoint was too confusing, and that as we're setting this - not customer - the additional endpoint setting does not matter that much. We can do a change on the client still, it was not yet released.

@@ -151,8 +156,50 @@ public AbstractDatadogSparkListener(SparkConf sparkConf, String appId, String sp
finishApplication(System.currentTimeMillis(), null, 0, null);
}
}));
initApplicationSpanIfNotInitialized();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should create the application span in the constructor/onApplicationStart.

In the case of Databricks or streaming jobs, we are currently not creating a application span:

  • in databricks the parent of a spark job is the databricks task that launched it
  • in streaming, we don't attach the streaming span to the application span

In those cases, I think we should skip overriding OpenLineage for now

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this here, added check for databricks/streaming

@mobuchowski mobuchowski force-pushed the mobuchowski/openlineage-agent-direct branch 3 times, most recently from a97a316 to ad61bbd Compare March 17, 2025 10:57
Signed-off-by: Maciej Obuchowski <[email protected]>
@mobuchowski mobuchowski force-pushed the mobuchowski/openlineage-agent-direct branch from ad61bbd to b936442 Compare March 17, 2025 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inst: apache spark Apache Spark instrumentation type: enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants