I executed a very simple query with the OpenLineage integration enabled:
use pg.public;
insert into cde(id,value) select id,value from abc;
And got OpenLineage events:
ol_query_start.json
ol_query_complete.json
The resulting events contain the facets `trino_metadata` and `trino_query_statistics`, but there are a few issues that prevent them from being used:
- `trino_metadata.query_plan` is a text representation of the plan. For automatic plan inspection, it would be better to also include the same plan in JSON format, as another field in the same facet.
- The `trino_query_statistics` facet contains a lot of fields, and all of them have string values, even when the value is actually an integer or a floating-point number:
{
"processedInputRows": "2",
"physicalInputRows": "2",
"processedInputBytes": "22",
"analysisTime": "0.039000000",
"internalNetworkRows": "4",
"completedSplits": "11",
"spilledBytes": "0",
"outputBlockedTime": "0.0",
"peakTaskTotalMemory": "288",
"physicalInputBytes": "0"
}
- Some fields in the `trino_query_statistics` facet hold nested values that are objects in the Java `QueryStatistics` class, and instead of JSON they are returned as the string representation of a Java object:
{
"cpuTimeDistribution": "[{stageId=0, tasks=1, p25=15, p50=15, p75=15, p90=15, p95=15, p99=15, min=15, max=15, total=15, average=15.0}, {stageId=1, tasks=1, p25=6, p50=6, p75=6, p90=6, p95=6, p99=6, min=6, max=6, total=6, average=6.0}, {stageId=2, tasks=1, p25=3, p50=3, p75=3, p90=3, p95=3, p99=3, min=3, max=3, total=3, average=3.0}]",
"stageGcStatistics": "[{stageId=0, tasks=1, fullGcTasks=0, minFullGcSec=0, maxFullGcSec=0, totalFullGcSec=0, averageFullGcSec=0}, {stageId=1, tasks=1, fullGcTasks=0, minFullGcSec=0, maxFullGcSec=0, totalFullGcSec=0, averageFullGcSec=0}, {stageId=2, tasks=1, fullGcTasks=0, minFullGcSec=0, maxFullGcSec=0, totalFullGcSec=0, averageFullGcSec=0}]",
"outputBufferUtilization": "[{stageId=0, tasks=1, p01=0.0, p05=0.0, p10=0.0, p25=0.0, p50=0.0, p75=2.30083448689341E-4, p90=3.0994415283203125E-4, p95=3.0994415283203125E-4, p99=3.0994415283203125E-4, min=0.0, max=3.0994415283203125E-4, duration=0.010681886}, {stageId=1, tasks=1, p01=0.0, p05=0.0, p10=0.0, p25=0.0, p50=6.22467014518136E-5, p75=4.163024838600526E-4, p90=5.006790161132812E-4, p95=5.006790161132812E-4, p99=5.006790161132812E-4, min=0.0, max=5.006790161132812E-4, duration=0.013562824}, {stageId=2, tasks=1, p01=0.0, p05=0.0, p10=0.0, p25=0.0, p50=1.5597716669174048E-4, p75=5.968228705004009E-4, p90=6.318092346191406E-4, p95=6.318092346191406E-4, p99=6.318092346191406E-4, min=0.0, max=6.318092346191406E-4, duration=0.018201713}]",
}
This may be okay for a human, but it is not suitable for processing by software, as it requires writing custom parsers.
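To illustrate the kind of custom parser a consumer currently has to write, here is a minimal Python sketch. The function names `parse_java_map_list` and `_to_number` are hypothetical, and the sketch assumes the format seen in the sample above: a flat list of `{key=value, ...}` records whose values contain no braces or `, ` separators. It is a workaround illustration, not part of Trino or OpenLineage.

```python
import re


def _to_number(raw: str):
    """Return int or float when the text is numeric, otherwise the original string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_java_map_list(text: str) -> list:
    """Parse a string such as '[{a=1, b=2.5}, {a=3, b=4.0}]' into a list of dicts.

    Assumption: records are flat, and values never contain '{', '}' or ', '.
    """
    records = []
    # Each record is the text between one pair of braces.
    for body in re.findall(r"\{([^{}]*)\}", text):
        record = {}
        for pair in body.split(", "):
            if not pair:
                continue  # empty record such as '{}'
            key, _, raw = pair.partition("=")
            record[key.strip()] = _to_number(raw.strip())
        records.append(record)
    return records


# Abbreviated sample in the same shape as the cpuTimeDistribution value above.
sample = "[{stageId=0, tasks=1, p25=15, average=15.0}, {stageId=1, tasks=1, p25=6, average=6.0}]"
stages = parse_java_map_list(sample)
```

Any change to the Java `toString()` output would silently break a parser like this, which is why a nested JSON structure in the event would be preferable.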
- Some fields in `trino_query_statistics` are JSON-serialized strings inside the OpenLineage event JSON. Instead, they could be nested JSON objects.