feat: Add now() to troubleshoot pipeline latencies#19386
feat: Add now() to troubleshoot pipeline latencies#19386abhishekrb19 wants to merge 6 commits intoapache:masterfrom
now() to troubleshoot pipeline latencies#19386Conversation
|
Fixes #18208 |
Co-authored-by: aho135 <andrewho135@gmail.com>
|
@abhishekrb19 Curious if you have any thoughts on this old issue: #6513 Seems like the reservations were mainly for using |
| `now()` is non-deterministic: replicated streaming tasks and task replays evaluate it to different | ||
| values, producing inconsistent results across replicas. Do not use `now()` to overwrite `__time`. | ||
| For Kafka, prefer [`kafka.timestamp`](../ingestion/data-formats.md#kafka) as the | ||
| `__time` source. |
There was a problem hiding this comment.
I'm not sure if it makes sense to mention Kafka here. The timestamps aren't equivalent as one is the time the event lands on Kafka and the other is the time the event is ingested in Druid
| |----|-----------| | ||
| |now|now() returns the current system timestamp in milliseconds since epoch (1970-01-01 00:00:00 UTC). This function is evaluated for each row at processing time. It's recommended to use this only for troubleshooting issues - see [Using now() in ingestion](#using-now-in-ingestion).| | ||
| |timestamp|timestamp(expr[,format-string]) parses string expr into date then returns milliseconds from java epoch. without 'format-string' it's regarded as ISO datetime format | | ||
| |unix_timestamp|same with 'timestamp' function but returns seconds instead | |
There was a problem hiding this comment.
| |unix_timestamp|same with 'timestamp' function but returns seconds instead | | |
| |unix_timestamp|unix_timestamp(expr[,format-string]) same with 'timestamp' function but returns seconds instead | |
Just for consistency with other functions
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
This is an automated review by Codex GPT-5
| { | ||
| // Return analysis indicating this is NOT constant | ||
| // by pretending we have a free variable | ||
| return new BindingAnalysis(); |
There was a problem hiding this comment.
[P2] Prevent now() from being planned as constant
Returning an empty BindingAnalysis makes ExpressionPlanner mark now() and expressions that only depend on now() as CONSTANT. Query-time expression selectors then build a ConstantExprEvalSelector and evaluate now() once when the selector is created, and expression filters/virtual columns also get stable cache keys, so native queries can reuse stale timestamps instead of evaluating at processing time. Add explicit non-deterministic handling, or otherwise keep now() out of constant selector/index/cache optimizations.
Fixes #18208
Adds a
now()expression macro to Druid's native expression. It returns the current system timestamp in milliseconds since epoch, evaluated per row at processing time. Intended primarily for ingestion-time troubleshooting - e.g.,capturing
now() - __timeas an ingestion_lag_ms dimension to measure end-to-end pipeline delay.Release note
Added
now()expression function that returns the current system timestamp in milliseconds since epoch. Useful at ingestion time for troubleshooting pipeline delays (e.g., now() - __time). Note:now()is non-deterministic as it evaluates for every row, so it can break idempotency and should not be used to overwrite__time.This PR has: