Skip to content

Conversation

@stevejgordon
Copy link
Contributor

Changes

Adds extra test cases for SQL query text sanitisation and summary parsing to cover more scenarios.

We have used these test cases in the .NET implementation while refining and optimising it.

An open area for discussion is whether it should be optional to skip numeric literal sanitisation when the value is used as part of a TOP (e.g. SELECT TOP (10) FROM MyTable) statement or type declaration (e.g. ALTER TABLE Orders ADD COLUMN OrderStatus NVARCHAR(50)). The tests currently allow this as an optional expectation.

cc @alanwest

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

@maryliag
Copy link
Contributor

whether it should be optional to skip numeric literal sanitization

I think we can give the option, as you mentioned, because it has the advantage of performance (since it's one less action to do), but just to confirm I still rather have sanitization as default, since having the value can increase cardinality.

"name": "summary_truncated",
"input": {
"db.system.name": "other_sql",
"query": "SELECT * FROM Vecnzotjejucwitzgrfifscuevittljlnrlpbruvkezeptqciyvbjhsytmbucbhwttidayecnthaztxbyppbyztcqccedeirgkxzrfezjxfwbtuqxeusroqgbvulgmsvnelovkxqsaqmlogomkhtjuirzhaocxlrmerihnmwaelullionarkmxwdamhduwrbooknqsnilurutgyerxphokeqnnoumbpcfjtmqrbpukjllofiwaltyoawkp o, OrderDetails od"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one just makes me think of

When performing sanitization, instrumentation MAY truncate the sanitized value for performance considerations (since sanitizing has a performance cost).

do have any an implementation that truncates the value? curious because we didn't provide a guide on the size we should truncate, but wondering if is worth a test

Copy link
Member

@alanwest alanwest Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is about the truncation required for db.query.summary not the sanitized db.query.text.

The spec for generating the query summary states:

Instrumentations that parse the query to set db.query.summary SHOULD truncate the summary to 255 characters (ensuring truncation does not occur within an operation name or target).

So in this case db.query.summary is equal to SELECT rather than

SELECT Vecnzotjejucwitzgrfifscuevittljlnrlpbruvkezeptqciyvbjhsytmbucbhwttidayecnthaztxbyppbyztcqccedeirgkxzrfezjxfwbtuqxeusroqgbvulgmsvnelovkxqsaqmlogomkhtjuirzhaocxlrmerihnmwaelullionarkmxwdamhduwrbooknqsnilurutgyerxphokeqnnoumbpcfjtmqrbpukjllofiwaltyoawkp OrderDetails

Comment on lines +1135 to +1146
"name": "malformed_in_clause",
"input": {
"db.system.name": "other_sql",
"query": "SELECT * FROM table WHERE value IN ('abc', 0xAB, .456,"
},
"expected": {
"db.query.text": [
"SELECT * FROM table WHERE value IN (?, ?, ?,"
],
"db.query.summary": "SELECT table"
}
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to hear others' opinions on whether they think we should maintain "malformed" test cases in this shared test suite.

In .NET we test these cases to ensure we're not leaking any sensitive information because it is not easy for us to identify that the query is invalid. If other languages have the same challenge then it might make sense to keep these test cases. On the other hand, if other languages can more readily identify invalid queries, then I'd imagine they would simply refrain from setting db.query.text/db.query.summary in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it make sense to keep it, I don't know if all languages can easily identify it was malformed, so I see a few possible scenarios:

we can identify is malformed

  • replace sensitive information and send what we have sanitize
  • don't return any query text, this way we save some resources and skip sanitization. In this case we can return some message about being malformed, but I assume it would cause an error to be created and we would knew it was malformed

we can't identify is malformed

  • we need to sanitize, because we might think is a valid and send the query text

So I do think we should have the tests, but I'm not clear on what the results should be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to mark test cases representing invalid SQL with a boolean of some sort?

That way if there are languages that would otherwise not set db.query.summary and db.query.text when the query is invalid can skip the test case or validate that these attributes are not set.

Comment on lines +842 to +845
"name": "alter_role",
"input": {
"db.system.name": "other_sql",
"query": "ALTER ROLE app_admin ADD MEMBER johndoe;"
Copy link
Member

@alanwest alanwest Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this syntax may be unique to microsoft.sql_server. For example, postgres has an ALTER ROLE operation but it has a separate GRANT operation for what ADD MEMBER is doing here.

My original intent was for db.system.name to identify the dialect test case was meant to target. In this way, an implementation that only targets a specific dialect could be validated against only the relevant test cases. We were using other_sql to indicate that the test case is relevant for all dialects.

Since we've really only focused on MSSQL Server so far, there may be other test cases in this PR that should just be marked microsoft.sql_server. It's hard to tell though until we study other dialects a bit more. At this point, maybe it just makes sense to do our best to mark things microsoft.sql_server when we know for sure it's just SQL Server. We can come back and refine things as we learn more.

@joaopgrassi joaopgrassi moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Awaiting codeowners approval

Development

Successfully merging this pull request may close these issues.

3 participants