feature:add service-to-middleware relations#117
feature:add service-to-middleware relations#117wsx864321 wants to merge 10 commits intoVictoriaMetrics:masterfrom
Conversation
There was a problem hiding this comment.
2 issues found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/vtselect/traces/query/query.go">
<violation number="1" location="app/vtselect/traces/query/query.go:732">
P2: Middleware edge generation drops valid DB client relations when the parent span kind is not 2/5 due to restrictive inner-join filter.</violation>
</file>
<file name="lib/protoparser/opentelemetry/pb/trace_fields.go">
<violation number="1" location="lib/protoparser/opentelemetry/pb/trace_fields.go:73">
P2: Middleware detection uses `db.system` constant, which may miss spans following newer OTel semconv `db.system.name`.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
d750959 to
a689cc3
Compare
|
Hello. To align the scenario what this pull request want to optimize:
So this puill request want to generate the relation only by (middleware) |
No, currently it is ‘client’ span to be the “child”, ‘server’ or ‘consumer’ span to be the "parent"。 This is sql |
this is query result |
|
By "current implementation", I refer to what we have now in VictoriaMetrics. I'd like to confirm what's missing first, before checking the new implementation.
It seems this is not the case in current version (v0.8.0), could you confirm? |
I misunderstood your meaning, you are right. I just want to increase the relationship between services and middleware |
I see. The idea looks fine. I was thinking whether we really need that
Please feel free to correct me if it's not the case. I'm not familiar with the semantic conventions and haven't tested it. |
Exactly, there's really no need for ‘parent’,because there is ‘resource_attr:service.name’. Can I revise it and resubmit it? |
this is logSql |
|
Feel free to update the branch or perform a force push. |
There was a problem hiding this comment.
2 issues found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/vtselect/traces/query/query.go">
<violation number="1" location="app/vtselect/traces/query/query.go:617">
P1: `GetServiceGraphTimeRange` was repurposed to db/middleware edge extraction but is still used as the core service-to-service graph path, causing functional regression in background graph generation.</violation>
<violation number="2" location="app/vtselect/traces/query/query.go:624">
P2: Service graph namespace dimension is introduced on generation but dropped on read aggregation, causing cross-namespace edge collapse.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
c0c56d0 to
b3321bf
Compare
Updated code,Plz cr,thx. |
3f6a94e to
e5de53c
Compare
e6ba3fe to
89800f8
Compare
|
Hi. While I'm testing this, may I ask if you know of other projects that are also doing similar things (generating relationships from a single span)? This would help us align with them, especially as we consider what the service graph should look like in the future iteration. |
Sorry. I haven't researched any other projects yet |
|
@jiekun Datadog are doing the same and it's a great one :) I've reimplemented the same thing in our dashboards using grafana infinity and a bunch of JSON mutation logic, but it would be great to have it OOTB. Here the Postgres does not report any metrics.
|
This looks great, it seems that we can also achieve the same function, but the data may not be accurate in the case of tail sampling. I will also study Datadog.thx |
|
I looks good overall. But I think conditions like |
|
Also, regarding the use of The new attribute name
|
Of course, you can modify it as you wish |
ok, Then let's use the latest standard db.sysytem.name, and I'll upgrade our SDK to solve it |
1. Simplified the chile node by removing `namespace`. The `namespace` info may be return back since one service can connect to multiple databases in same type (e.g. MySQL), and the `namespace` is an important identifier. However, the existing service graphs doesn't carry this fine-grain information, and it's important to align them. This may be improved together when VictoriaTrace/Jaeger API support visualizing more info. 2. Removed extra flags for enabling svc-db service graph. It can share the same limit of the svc-svc since svc-svc relation (seems to) has higher priority. The change could be reverted if these are not the case in real-world. But let's start from the minimal implementation and iterate when necessary. Signed-off-by: Jiekun <jiekun@victoriametrics.com>
|
Please see the follow-up commit a7bc1d1. However, during my retesting, it appears that the "database" node without an identifier in the global view may be referenced by many services—even though these services are not actually pointing to the "same database" instance or cluster.
Each "edge" of the relation is still show the correct call count (e.g., the I have concern about this:
Edit: Alright. I think it's better to adopt
|
1. Simplified the chile node by removing `namespace`. The `namespace` info may be return back since one service can connect to multiple databases in same type (e.g. MySQL), and the `namespace` is an important identifier. However, the existing service graphs doesn't carry this fine-grain information, and it's important to align them. This may be improved together when VictoriaTrace/Jaeger API support visualizing more info. 2. Removed extra flags for enabling svc-db service graph. It can share the same limit of the svc-svc since svc-svc relation (seems to) has higher priority. The change could be reverted if these are not the case in real-world. But let's start from the minimal implementation and iterate when necessary. Signed-off-by: Jiekun <jiekun@victoriametrics.com>
There was a problem hiding this comment.
5 issues found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="app/vtselect/traces/query/query.go">
<violation number="1" location="app/vtselect/traces/query/query.go:712">
P2: DB namespace dimension was dropped from service-to-DB aggregation, collapsing distinct DB targets into one edge per DB system.</violation>
</file>
<file name="app/victoria-traces/servicegraph/servicegraph.go">
<violation number="1" location="app/victoria-traces/servicegraph/servicegraph.go:115">
P2: Removing middleware flags and guard introduces backward-incompatible startup risk and removes previous middleware opt-out behavior.</violation>
<violation number="2" location="app/victoria-traces/servicegraph/servicegraph.go:121">
P1: DB graph fetch error causes early continue, dropping already-fetched service graph rows for that tenant/time window.</violation>
</file>
<file name="lib/protoparser/opentelemetry/pb/trace_fields.go">
<violation number="1" location="lib/protoparser/opentelemetry/pb/trace_fields.go:42">
P2: DB service-graph field key changed to `span_attr:db.system.name`, which can miss existing telemetry that uses `db.system` and underreport service→middleware relations.</violation>
</file>
<file name="deployment/docker/compose-vt-single.yml">
<violation number="1" location="deployment/docker/compose-vt-single.yml:36">
P2: Default compose now references a fork/dirty VictoriaTraces image tag, creating pull reliability and provenance risks for users.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
GET, I use namespace, actually to add the library name, you can refer to it( https://github.com/open-telemetry/semantic-conventions/blob/main/docs/db/database-spans.md )Of course, I can also accept using SVC: dbname here |
Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apptest/tests/service_graph_test.go">
<violation number="1" location="apptest/tests/service_graph_test.go:77">
P2: The test assumes deterministic dependency ordering, but API/query code does not guarantee order, so this assertion can be flaky with multiple edges.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…VictoriaTraces into fork/wsx864321/feature/middleware_graph_0305
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #117 +/- ##
========================================
Coverage ? 7.71%
========================================
Files ? 61
Lines ? 8829
Branches ? 0
========================================
Hits ? 681
Misses ? 8053
Partials ? 95 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@wsx864321 please take a look if you can, and maybe build and try with your data. We do love to see the feedback. If you need help building an binary or docker image please let me know. If all good, it will be merged to the upstream very soon. |
I have no further questions, it meets my expectations. you can merge at any time |
|
apologies for piping up here :) are you sure you want to display the When building a graph you probably want to make the nodes clickable - and there's nothing to click if there's no semi-unique ID. EDIT: it's probably more about the overall approach reg mapping the 'inferred entities', which can be both external services and databases - feel free to ignore me and we can chat in a separate issue |
|
I personally don't have much experience with During my testing, I noticed that My expectations are as follows:
Anyway, since there's different idea, I think we can hold this pull request for now and see if we have better way before merging the initial implementation. |
|
There's essentially a couple of different tradeoffs from what I've seen; it's nearly impossible to make it work for everyone 😅 The problem 2) would only exist if you're using As an example, the Here's a brain dump on my experience of making it work on our side: Approach 0: db.system.namePros:
Cons:
Approach 1: (caller)+db.system.namePros:
Cons:
Approach 2: db.namespacePros:
Cons:
This is what Datadog is using by default, and before we've had a cloud around The Datadog agent turns Real life sample of a cloud of services connecting to 'main' namespace from an another demo follows. Those have different
Approach 3: (network.peer.address)This is the IP of the DB node you're connected to. Pros:
Cons:
Approach 4: (server.address)Pros:
Cons:
Approach 5: (db.namespace)+(server.address)We effectively fall back to A.4 if Pros:
Cons:
Personally, we went for the A.4 which works for us, but I'd say that A.5 is more suitable for an open solution. |
In fact, I would prefer db.system.name-db.namespace for several reasons:
@jiekun cc |
|
In a cluster architecture you usually have either a single DNS name, or two names (one is a fast read tier and the second is a master for the writes). The problem with the namespaces is that there's a default namespace which is used most of the time - like the 'main' example on my screenshot. |
I agree with this actually, but it's not the case I've experienced. In my previous job for cluster connection, the administrator provides different addresses for different instances, while each addresses is a DNS record and can be switch easily when incident happen. So the case here is that, different users may have different deployments, and it's hard to reach a consensus for now. I got both feedback from you now but it's hard to decide. |
|
Agree, I just remembered MongoDB connection strings, and they can have as many 'seed' DBs as they want... Maybe just make it configurable? Grafana/otelcol's servicegraph does something called 'dimensions' in the config, which control how their service graph is built. |
|
Perhaps we can proceed with the current, most certain solution, which can then be continuously evolved without compatibility issues. |








Describe Your Changes
Increase the relationship mapping between services and middleware.
It should be noted that the middleware name obtained here is db.system, according to the official documentation of Hotel( https://github.com/open-telemetry/semantic-conventions/blob/v1.32.0/docs/database/database-spans.md )db.system.name, But in fact, the constant provided by the official SDK is db.system, so I am using db.system here, which can be discussed here.
Summary by cubic
Adds service-to-database edges to the service graph so you can see which services call which DBs. DB nodes are named
<service>:<db>and can be limited or disabled via-servicegraph.databaseTaskLimit.vtselect.GetServiceDBGraphTimeRangeto build svc→DB edges from client spans withspan_attr:db.system.name; parent=resource_attr:service.name, child=<service>:<db.system.name>; aggregatescallCount.otelpb.SpanAttrDbSystemName; adds an integration test for svc→DB edges (e.g.,serviceA:MongoDB) and documents the new flag.Written for commit f6efba7. Summary will update on new commits.