-
Notifications
You must be signed in to change notification settings - Fork 277
[chore] Additional test case cases for SQL sanitisation and summary #2959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[chore] Additional test case cases for SQL sanitisation and summary #2959
Conversation
I think we can give the option, as you mentioned, because it has the advantage of performance (since it's one less action to do), but just to confirm I still rather have sanitization as default, since having the value can increase cardinality. |
| "name": "summary_truncated", | ||
| "input": { | ||
| "db.system.name": "other_sql", | ||
| "query": "SELECT * FROM Vecnzotjejucwitzgrfifscuevittljlnrlpbruvkezeptqciyvbjhsytmbucbhwttidayecnthaztxbyppbyztcqccedeirgkxzrfezjxfwbtuqxeusroqgbvulgmsvnelovkxqsaqmlogomkhtjuirzhaocxlrmerihnmwaelullionarkmxwdamhduwrbooknqsnilurutgyerxphokeqnnoumbpcfjtmqrbpukjllofiwaltyoawkp o, OrderDetails od" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one just makes me think of
When performing sanitization, instrumentation MAY truncate the sanitized value for performance considerations (since sanitizing has a performance cost).
do have any an implementation that truncates the value? curious because we didn't provide a guide on the size we should truncate, but wondering if is worth a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is about the truncation required for db.query.summary not the sanitized db.query.text.
The spec for generating the query summary states:
Instrumentations that parse the query to set db.query.summary SHOULD truncate the summary to 255 characters (ensuring truncation does not occur within an operation name or target).
So in this case db.query.summary is equal to SELECT rather than
SELECT Vecnzotjejucwitzgrfifscuevittljlnrlpbruvkezeptqciyvbjhsytmbucbhwttidayecnthaztxbyppbyztcqccedeirgkxzrfezjxfwbtuqxeusroqgbvulgmsvnelovkxqsaqmlogomkhtjuirzhaocxlrmerihnmwaelullionarkmxwdamhduwrbooknqsnilurutgyerxphokeqnnoumbpcfjtmqrbpukjllofiwaltyoawkp OrderDetails
| "name": "malformed_in_clause", | ||
| "input": { | ||
| "db.system.name": "other_sql", | ||
| "query": "SELECT * FROM table WHERE value IN ('abc', 0xAB, .456," | ||
| }, | ||
| "expected": { | ||
| "db.query.text": [ | ||
| "SELECT * FROM table WHERE value IN (?, ?, ?," | ||
| ], | ||
| "db.query.summary": "SELECT table" | ||
| } | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to hear others' opinions on whether they think we should maintain "malformed" test cases in this shared test suite.
In .NET we test these cases to ensure we're not leaking any sensitive information because it is not easy for us to identify that the query is invalid. If other languages have the same challenge then it might make sense to keep these test cases. On the other hand, if other languages can more readily identify invalid queries, then I'd imagine they would simply refrain from setting db.query.text/db.query.summary in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it make sense to keep it, I don't know if all languages can easily identify it was malformed, so I see a few possible scenarios:
we can identify is malformed
- replace sensitive information and send what we have sanitize
- don't return any query text, this way we save some resources and skip sanitization. In this case we can return some message about being malformed, but I assume it would cause an error to be created and we would knew it was malformed
we can't identify is malformed
- we need to sanitize, because we might think is a valid and send the query text
So I do think we should have the tests, but I'm not clear on what the results should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to mark test cases representing invalid SQL with a boolean of some sort?
That way if there are languages that would otherwise not set db.query.summary and db.query.text when the query is invalid can skip the test case or validate that these attributes are not set.
| "name": "alter_role", | ||
| "input": { | ||
| "db.system.name": "other_sql", | ||
| "query": "ALTER ROLE app_admin ADD MEMBER johndoe;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this syntax may be unique to microsoft.sql_server. For example, postgres has an ALTER ROLE operation but it has a separate GRANT operation for what ADD MEMBER is doing here.
My original intent was for db.system.name to identify the dialect test case was meant to target. In this way, an implementation that only targets a specific dialect could be validated against only the relevant test cases. We were using other_sql to indicate that the test case is relevant for all dialects.
Since we've really only focused on MSSQL Server so far, there may be other test cases in this PR that should just be marked microsoft.sql_server. It's hard to tell though until we study other dialects a bit more. At this point, maybe it just makes sense to do our best to mark things microsoft.sql_server when we know for sure it's just SQL Server. We can come back and refine things as we learn more.
Changes
Adds extra test cases for SQL query text sanitisation and summary parsing to cover more scenarios.
We have used these test cases in the .NET implementation while refining and optimising it.
An open area for discussion is whether it should be optional to skip numeric literal sanitisation when the value is used as part of a TOP (e.g.
SELECT TOP (10) FROM MyTable) statement or type declaration (e.g.ALTER TABLE Orders ADD COLUMN OrderStatus NVARCHAR(50)). The tests currently allow this as an optional expectation.cc @alanwest
Merge requirement checklist
Change log entry added, according to the guidelines in When to add a changelog entry.[chore]Links to the prototypes or existing instrumentations (when adding or changing conventions)