YTDB-154: Transaction-level query metrics listener#646
YTDB-154: Transaction-level query metrics listener#646
Conversation
Summary of ChangesHello @lpld, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces foundational observability features to YouTrackDB by implementing transaction-level query metrics. It provides a mechanism for tracking query execution and transaction commit times with configurable precision, laying the groundwork for future OpenTelemetry integration. The changes enable developers to monitor database performance with minimal overhead, offering both approximate and exact timing measurements. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Qodana for JVM20 new problems were found
💡 Qodana analysis was run in the pull request mode: only the changed files were checked Contact Qodana teamContact us at qodana-support@jetbrains.com
|
There was a problem hiding this comment.
Code Review
This pull request introduces a transaction-level query metrics listener for observability. The changes are comprehensive, including new interfaces for listeners, updates to YTDBTransaction to support them, a new TinkerPop strategy to inject metrics collection, and modifications to GranularTicker for low-overhead time approximation. The code is well-structured and includes new tests for the added functionality. I've found a couple of issues that should be addressed, for which I've left comments.
| () -> time = System.nanoTime(), | ||
| 0, granularity, TimeUnit.NANOSECONDS | ||
| () -> nanoTimeDifference = System.currentTimeMillis() - System.nanoTime() / 1_000_000, | ||
| timestampRefreshRate, timestampRefreshRate, TimeUnit.MILLISECONDS |
There was a problem hiding this comment.
The timestampRefreshRate parameter is provided in nanoseconds (from GlobalConfiguration and in tests), but it's being used with TimeUnit.MILLISECONDS here. This will cause the adjustment task to run much less frequently than intended. For example, with the default value of 10 seconds (10_000_000_000 ns), this schedules the task to run approximately every 115 days instead of every 10 seconds.
To fix this, you should use TimeUnit.NANOSECONDS.
| timestampRefreshRate, timestampRefreshRate, TimeUnit.MILLISECONDS | |
| timestampRefreshRate, timestampRefreshRate, TimeUnit.NANOSECONDS |
|
|
||
| public @Nonnull String getTrackingId() { | ||
| return trackingId != null ? trackingId : | ||
| " " + activeSession.getActiveTransaction().getId(); |
There was a problem hiding this comment.
The default tracking ID is being generated with a leading space. This seems unintentional. If it's not a typo, a comment explaining the purpose of the space would be helpful. Otherwise, it should be removed to avoid potential issues for consumers of this ID, such as needing to trim it.
| " " + activeSession.getActiveTransaction().getId(); | |
| String.valueOf(activeSession.getActiveTransaction().getId()); |
| () -> nanoTime = System.nanoTime(), | ||
| granularity, granularity, TimeUnit.NANOSECONDS | ||
| ); | ||
| executor.scheduleAtFixedRate( |
There was a problem hiding this comment.
Agree with Gemini.
| public void close() throws Exception { | ||
|
|
||
| final var duration = isLightweight ? ticker.approximateNanoTime() - nano : nano; | ||
| ytdbTx.getQueryMetricsListener().queryFinished( |
There was a problem hiding this comment.
If a traversal is created and closed without calling hasNext()/next(), close() still fires the listener with startMillis = 0 and a garbage duration.
| } | ||
|
|
||
| public @Nonnull String getTrackingId() { | ||
| return trackingId != null ? trackingId : |
|
|
||
| public @Nonnull String getTrackingId() { | ||
| return trackingId != null ? trackingId : | ||
| " " + activeSession.getActiveTransaction().getId(); |
There was a problem hiding this comment.
NPE if activeSesison is null, unlikely possible but good for defensive programming
There was a problem hiding this comment.
Do you think that throwing IllegalStateException here is okay? I am replacing activeSession with getDatabaseSession() call that already checks if the session is not null
|
|
||
| final var ticker = YouTrackDBEnginesManager.instance().getTicker(); | ||
| final String querySummary = YTDBQueryConfigParam.querySummary.getValue(traversal); | ||
| final var metricsStep = new YTDBQueryMetricsStep<>(traversal, ytdbTx, querySummary, ticker); |
There was a problem hiding this comment.
Step is added even if there is no real query listener.
There was a problem hiding this comment.
No, there is a "return" above:
if (!ytdbTx.isMonitoringEnabled()) {
return;
}but I am still gonna change this part a bit, because there is no need to add this step when only transaction listener is registered
There was a problem hiding this comment.
Yes, but check it does, or so presence of the TX listener is enough.
| package com.jetbrains.youtrackdb.internal.common.profiler.monitoring; | ||
|
|
||
| /// Controls the precision of query monitoring. | ||
| public enum QueryMonitoringMode { |
|
|
||
| public GranularTicker(long granularity) { | ||
| public GranularTicker(long granularityNanos, long timestampRefreshRate) { | ||
| this.executor = ThreadPoolExecutors.newSingleThreadScheduledPool("GranularTicker"); |
There was a problem hiding this comment.
I think we need to monitor all usages of thread pools and aggeregate them in one single manager to avoid thread context switching overhead. Please create PR.
There was a problem hiding this comment.
You mean a YouTrack issue?
There was a problem hiding this comment.
There was a problem hiding this comment.
Thank you already noticed
| private volatile long time; | ||
| private final long timestampRefreshRate; | ||
| private volatile long nanoTime; | ||
| private volatile long nanoTimeDifference; |
There was a problem hiding this comment.
Would be good to have comment why we need this.
|
|
||
| private void testQuery(QueryMonitoringMode mode, RememberingListener listener) throws Exception { | ||
| final var rand = RandomUtils.insecure(); | ||
| ((YTDBTransaction) g().tx()) |
There was a problem hiding this comment.
Would not it better if we override metho in GTS ?
There was a problem hiding this comment.
Sorry, could you elaborate?
There was a problem hiding this comment.
tx() returns transaction that users always need to case, let us override it in our graph traversal
There was a problem hiding this comment.
I actually thought about this, but the problem is that YTDBGraphTraversalSource#tx() can actually return DriverRemoteTransaction in remote mode and we can't simply change the type here.
| @Override | ||
| public void close() throws Exception { | ||
|
|
||
| final var duration = isLightweight ? ticker.approximateNanoTime() - nano : nano; |
There was a problem hiding this comment.
No, that is incorrect estimation for lightweight mode, we also check time from the granular ticker as our execution time is till start to the last hasNext/next not till the close, which may happen long time after.
andrii0lomakin
left a comment
There was a problem hiding this comment.
Please also add JMH benchmark to check performance overhead of lightweight mode on dumb querhy like g.inject(1).terate() I am looking forward to seeing results.
9e0e565 to
e016cb7
Compare
2284d2a to
607e5d5
Compare
…op step to collect metrics.
…lose In lightweight mode, duration now covers first-to-last hasNext/next call instead of first call to close. Also skip listener notification if the traversal was never iterated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset trackingId and queryMonitoringMode on commit and rollback. Also rename isMonitoringEnabled to isQueryMetricsEnabled, add @nonnull annotations with runtime checks, and remove unused imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move getValue (renamed to getConfigValue) out of the public API enum into YTDBStrategyUtil. Also move YTDBStrategyUtil from strategy.optimization to strategy package. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Translate query bytecode to a readable Gremlin string using a custom ValueAnonymizingTypeTranslator that preserves labels, property names, step labels, and side-effect keys while parameterizing actual values. Comprehensive test coverage for ~100 Gremlin step variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ticker-measured duration window sits inside the System.nanoTime window, so assert [0, duration + lag] instead of approximate equality. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add QueryMetricsOverheadBenchmark measuring the cost of lightweight monitoring on g.inject(1).iterate(). Enable JMH annotation processor for test compilation in core module. Also add @see tags to QueryMonitoringMode and null-check in isQueryMetricsEnabled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7c2b992 to
d52d364
Compare
…nslator Split has() and property() into separate branches: property key (and optional Cardinality) are structural, values and meta-properties are parameterized. Add test coverage for all property() variants including Map, Cardinality+Map, CardinalityValueTraversal, and meta-properties. Also update class/method docs to accurately describe four argument categories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace g() casts with the base g field where the YTDB-specific type is not needed. Keep g() only for executeInTx and with(YTDBQueryConfigParam). Also apply linter fixes: var inference, unused import removal, line wrapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trackingId test coverage with randomized withTrackingId assertions. Fix Qodana inspections: suppress unused warning on benchmark, add @SuppressWarnings("resource") in YTDBStrategyUtil, remove unused TimeInterval.toMillis(). Fix javadoc references and minor comment tweaks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…csStep Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move TransactionMetricsListener interface and its usages to a separate branch (ytdb-539-observability-tx-listeners) for future implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Motivation:
First part of the observability functionality in YouTrackDB, as described in https://youtrack.jetbrains.com/articles/YTDB-A-39/Observability-tools.
This change adds a low-overhead transaction-level
QueryMetricsListenerthat gets notified about the query execution time. This mechanism is supposed to be a base for our further OpenTelemetry integration. It can operate in two modes:LIGHTWEIGHTandEXACT. Lightweight mode adds almost no overhead to the query, while providing approximate values. Exact mode operates with higher precision at a potential performance cost.Changes:
QueryMetricsListenerinterface for query metrics that can be registered on an instance ofYTDBTransaction.YTDBTransactionnow has a method for listener registering:withQueryListener.withQueryMonitoringModeandwithTrackingId.querySummaryconfiguration parameter that can be used in Gremlin.with()step to provide a client string value that is gonna be associated with the query and passed to the query listener.GranularTickernow has the ability to approximate current time millis. It does so by keeping thecurrentTimeMillisandnanoTimedifference and refreshing it once in 10 seconds.YTDBQueryMetricsStrategyis now registered inYTDBGraphImplAbstract. It adds a newYTDBQueryMetricsStepthat does the execution time calculation based on the configured query execution mode.GranularTickerandYTDBQueryMetricsStrategy.GroovyTranslatorwith a customTypeTranslatoris used for translatingGraphTraversalinto a readable string representation with all the sensitive data anonymized.QueryMetricsOverheadBenchmarkthat measures the performance overhead of lightweight query monitoring on a minimal queryg.inject(1).iterate(). It has 3 different metrics: no monitoring, monitoring with empty "do nothing listener" and monitoring with a reading listener (one that gets the query string representation from theQueryDetails). Here are the results of running the benchmark on my working laptop (Intel i9-13900H):Shortcomings and differences from the original design
YTDBTransactionAPI is different from what has been described in the design document. During the implementation I have come to a conclusion that having separatewithTrackingIdandwithQueryMonitoringModeis clearer for understanding in situations when clients have the ability to register several different listeners on a single transaction.YTDBTransactioninternally keeps one instance of theQueryMetricsListenerinterface:QueryMetricsListener. In the future we will probably need to support multiple listeners of each type, but at the moment I didn't want to focus on this part - so it will probably got changed in the next PRs.