-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Redact sensitive information in catalog queries #24563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
98470bb
to
ed595a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me. Some comments.
return statementRedactingEnabled; | ||
} | ||
|
||
@Config("statement-redacting-enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mosabua for suggestions about config naming. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want an option to disable this. Maybe as a temporary kill switch, but we should remove this as soon as we are happy with this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can prefix with experimental.
in that case like we have done in past to clarify this. Or maybe deprecated.
from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added deprecated.
prefix.
core/trino-main/src/main/java/io/trino/connector/DefaultCatalogFactory.java
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
protected Node visitCreateCatalog(CreateCatalog createCatalog, Void context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way to notice when we need to add new node visitors here?
Should this be a "wrapper" like the various Forwarding***
classes and a test to assert that full set of methods is overridden? That way once new methods get added we'll explicitly need to either override to do no-op or to redact?
WDYT? Might be overkill for now so need to change anything - just to have a discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
Perhaps this test could verify that all (minus exclusions) visit
methods are implemented only for Statement
nodes and not for all possible Node
types.
I think adding such a test is feasible, but I'll hold off for now to ensure we’ve reached agreement on the core parts of this functionality (e.g., the SPI, where the redacting is performed, etc.).
@@ -240,7 +248,7 @@ private <C> void createQueryInternal(QueryId queryId, Span querySpan, Slug slug, | |||
DispatchQuery dispatchQuery = dispatchQueryFactory.createDispatchQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this automatically also handles things like event listener and QueryResource
right?
Might be worth to explicitly call it out in the commit message (although you do imply that by mentioning anything using QueryInfo/BasicQueryInfo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this automatically also handles things like event listener and QueryResource right?
Correct.
I extracted tests confirming that to separate commits into separate commits to avoid distracting from the core functionality of redacting.
I refined the commit message and included your suggestion.
return statementRedactingEnabled; | ||
} | ||
|
||
@Config("statement-redacting-enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want an option to disable this. Maybe as a temporary kill switch, but we should remove this as soon as we are happy with this feature
|
||
public class SensitiveStatementRedactor | ||
{ | ||
public static final String REDACTED_VALUE = "***"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider a better value here than just ***
. We could also consider using a special function like $redacted$()
, which just throws exceptions if you try to actuall call that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
***
seems to be almost what everyone uses for redaction.
Can you expand on the function idea? Is that to make it so that the output of SHOW CREATE CATALOG
(as an example) is valid but still fails when you try to run it.
ed595a1
to
654d3e2
Compare
e049e24
to
c31d6e1
Compare
df20a77
to
2490789
Compare
The SPI will be used by the engine to redact security-sensitive information in statements that manage catalogs. It has been added at the connector factory level, rather than the connector level, to allow more flexibility in retrieving properties. In some cases, we want to perform redacting before a connector is initiated. For example, when we create a new catalog by issuing the CREATE CATALOG statement.
Exposed properties fall into one of the following categories: they are either explicitly marked as security-sensitive or are unknown. The connector assumes that unknown properties might be misspelled security-sensitive properties.
This preparatory commit enables bootstrapping HDFS to retrieve its security-sensitive properties.
2490789
to
7eec53c
Compare
A few questions/suggestions:
|
7eec53c
to
26978c8
Compare
This is a preparatory commit to enable the use of this method in two contexts: - Creating or updating a catalog - Redacting a catalog's security-sensitive properties
This commit introduces redacting of security-sensitive information in the following statements: * CREATE CATALOG * EXPLAIN CREATE CATALOG * PREPARE CREATE CATALOG The current approach is as follows: * For syntactically valid statements, only properties containing sensitive information are masked. * If a query is syntactically valid but retrieving security-sensitive properties fails for any reason (e.g., the query references a nonexistent connector or catalog property evaluation fails), all properties are masked. * If a query fails before or during parsing, nothing is masked. The redacted form is created right before initialization of the QueryStateMachine and is propagated to all places that create QueryInfo and BasicQueryInfo (e.g., REST endpoints, query events, and the system.runtime.queries table). Before this change, QueryInfo/BasicQueryInfo stored the raw query text received from the end user. From now on, the text will be altered for the cases listed above.
@JsonConstructor for TrimmedBasicQueryInfo was introduced to facilitate the deserialization of server responses in tests.
26978c8
to
3c0af7b
Compare
|
This pull request has gone a while without any activity. Tagging for triage help: @mosabua |
{ | ||
ConnectorFactory connectorFactory = connectorFactories.get(catalogProperties.connectorName()); | ||
if (connectorFactory == null) { | ||
// If someone tries to use a non-existent connector, we assume they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an error, actually. It should throw IllegalArgumentException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the query will fail during redaction. The idea is to avoid disrupting the natural flow and let it fail where it normally would if redaction didn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would a query fail during redaction if it hasn’t first failed during analysis? I.e., it’s a condition that should never occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We perform redaction at a very early stage (before the query state machine is created) to modify query text exposed in query events and QueryInfo
. I believe that verifying the existence of a given connector happens only during execution, for example, in CreateCatalogTask
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m very confused about the purpose of this change, then. It redaction happens before analysis, how is the analyzer and execution engine able to see the unredacted values so that it can to its job?
Can you describe the technical approach at a high level so that I don’t have to reverse engineer what the code is trying to achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know if the following helps:
Problem
Currently, when we execute a CREATE CATALOG statement containing plaintext secrets, unredacted query text is exposed via the REST API, the system.runtime.queries table
, and query events.
Goal
Instead of displaying the query text in its raw form in the locations mentioned above, such as:
CREATE CATALOG test USING postgresql
WITH (
"connection-user" = 'bob',
"connection-password" = '1234'
)
we aim to redact security-sensitive property values:
CREATE CATALOG test USING postgresql
WITH (
"connection-user" = 'bob',
"connection-password" = '***'
)
Proposed Solution
The REST API, the system.runtime.queries
table, and query events obtain query text from the QueryInfo
object. Based on our research, the query text contained in QueryInfo
is not interpreted anywhere in the engine.
The QueryInfo
object is created by the QueryStateMachine
. To redact the query text, we propose performing redaction after the query is parsed (to ensure we have the AST, available for traversal and redaction) but before the QueryStateMachine
is created.
Since redaction occurs at an early stage of query processing, we need to duplicate some logic that is typically performed during analysis and execution. For example, this includes evaluating catalog properties. Additionally, we do not want to disrupt the normal query processing flow; therefore, we ensure the query never fails due to redaction. If, for any reason, redaction is not possible, we will resort to masking all properties.
To identify security-sensitive properties for a given connector, we propose introducing a new SPI to expose them: #24562
import static java.util.Objects.requireNonNull; | ||
import static org.weakref.jmx.$internal.guava.collect.ImmutableSet.toImmutableSet; | ||
|
||
public class SensitiveStatementRedactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about this approach. This couples the implementation with the AST structure and visitor infrastructure, and it requires performing actions that should be the responsibility of the analyzer (e.g., the logic in visitCreateCatalog). In particular, the dependency on CreateCatalogTask.evaluateProperties
is not appropriate.
Also, this is not sufficient -- what about values in query plans that may be derived from security sensitive properties? E.g., a TableHandle in a plan may need to include the URL/user/password for a database. Running an EXPLAIN would expose those details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is not sufficient -- what about values in query plans that may be derived from security sensitive properties? E.g., a TableHandle in a plan may need to include the URL/user/password for a database. Running an EXPLAIN would expose those details.
I tried to reproduce this but I couldn't find a case where TableHandle
includes catalog properties. Do you have any specific example when this might happen? Also, is my understanding correct that if security-sensitive properties might leak in EXPLAIN output, it is something that unrelated strictly to dynamic catalogs and might happen for static catalogs as well?
This pull request has gone a while without any activity. Tagging for triage help: @mosabua |
Description
This a follow-up to #24562 that introduces redacting of security-sensitive information in statements containing connector properties, specifically:
CREATE CATALOG
EXPLAIN CREATE CATALOG
PREPARE CREATE CATALOG
The current approach is as follows:
Redacted queries are returned through the REST API, the
system.runtime.queries
table, and query events (QueryCreatedEvent
andQueryCompletedEvent
).Notice that currently this PR includes 7 commits from #24562.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: