Redact sensitive information in catalog queries #24563

piotrrzysko · 2024-12-23T11:40:02Z

Description

This is a follow-up to #24562 that introduces redacting of security-sensitive information in statements containing connector properties, specifically:

CREATE CATALOG
EXPLAIN CREATE CATALOG
PREPARE CREATE CATALOG

The current approach is as follows:

For syntactically valid statements, only properties containing sensitive information are masked.
If a valid query references a nonexistent connector, all properties are masked.
If a query fails before or during parsing, nothing is masked.

Redacted queries are returned through the REST API, the system.runtime.queries table, and query events (QueryCreatedEvent and QueryCompletedEvent).
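The first two masking rules above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class name `PropertyMaskingSketch`, the method `mask`, and the use of an empty `Optional` to represent "connector could not be resolved" are all assumptions made for this example. Only the `***` placeholder is taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public final class PropertyMaskingSketch
{
    // Placeholder value used by the PR for redacted property values
    public static final String REDACTED_VALUE = "***";

    private PropertyMaskingSketch() {}

    // sensitiveNames is empty when the connector could not be resolved
    // (hypothetical representation chosen for this sketch).
    public static Map<String, String> mask(Map<String, String> properties, Optional<Set<String>> sensitiveNames)
    {
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            boolean redact = sensitiveNames
                    .map(names -> names.contains(entry.getKey()))
                    .orElse(true); // unknown connector: mask every property
            masked.put(entry.getKey(), redact ? REDACTED_VALUE : entry.getValue());
        }
        return masked;
    }

    public static void main(String[] args)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("connection-user", "bob");
        properties.put("connection-password", "1234");

        // Known connector: only the sensitive property is masked
        System.out.println(mask(properties, Optional.of(Set.of("connection-password"))));
        // Unknown connector: every property is masked
        System.out.println(mask(properties, Optional.empty()));
    }
}
```

The third rule (nothing is masked when parsing fails) is not modeled here, because without an AST there are no properties to inspect.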

Notice that currently this PR includes 7 commits from #24562.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Section
* Redact sensitive information in statements containing connector properties. ({issue}`23106`)

hashhar

Looks mostly good to me. Some comments.

hashhar · 2025-01-02T08:31:58Z

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

+        return statementRedactingEnabled;
+    }
+
+    @Config("statement-redacting-enabled")


@mosabua for suggestions about config naming. 😄

I'm not sure we want an option to disable this. Maybe as a temporary kill switch, but we should remove this as soon as we are happy with this feature

Agreed. In that case we can prefix it with `experimental.`, as we have done in the past, to make this clear. Or maybe with `deprecated.` from the beginning.

Added deprecated. prefix.

core/trino-main/src/main/java/io/trino/connector/DefaultCatalogFactory.java

hashhar · 2025-01-02T08:35:05Z

core/trino-main/src/main/java/io/trino/sql/SensitiveStatementRedactor.java

+        }
+
+        @Override
+        protected Node visitCreateCatalog(CreateCatalog createCatalog, Void context)


Is there some way to notice when we need to add new node visitors here?

Should this be a "wrapper" like the various Forwarding*** classes, with a test to assert that the full set of methods is overridden? That way, once new methods get added, we'll explicitly need to either override them as a no-op or redact.

WDYT? Might be overkill for now, so no need to change anything - just to have a discussion.

That's a good point.

Perhaps this test could verify that all (minus exclusions) visit methods are implemented only for Statement nodes and not for all possible Node types.

I think adding such a test is feasible, but I'll hold off for now to ensure we’ve reached agreement on the core parts of this functionality (e.g., the SPI, where the redacting is performed, etc.).
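A reflection-based check along the lines discussed above could look roughly like this. It is only a sketch: `BaseVisitor`, `Redactor`, and the `visit*` method names are stand-ins invented for the example, not Trino's `AstVisitor` or the PR's `SensitiveStatementRedactor`. The idea is to list every `visit*` method declared on the base class that the implementation does not declare itself, minus an explicit exclusion list.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public final class VisitorCoverageSketch
{
    private VisitorCoverageSketch() {}

    // Stand-in for an AST visitor base class; method names are illustrative only
    public static class BaseVisitor
    {
        protected String visitCreateCatalog(String node) { return node; }
        protected String visitDropCatalog(String node) { return node; }
        protected String visitQuery(String node) { return node; }
    }

    // Stand-in for the redactor: overrides two of the three visit methods
    public static class Redactor extends BaseVisitor
    {
        @Override
        protected String visitCreateCatalog(String node) { return "redacted"; }

        @Override
        protected String visitQuery(String node) { return node; } // explicit no-op
    }

    // Returns the names of visit* methods declared on base that implementation
    // does not declare itself and that are not explicitly excluded.
    public static Set<String> missingOverrides(Class<?> base, Class<?> implementation, Set<String> exclusions)
    {
        Set<String> missing = new TreeSet<>();
        for (Method method : base.getDeclaredMethods()) {
            if (method.isSynthetic() || !method.getName().startsWith("visit") || exclusions.contains(method.getName())) {
                continue;
            }
            try {
                // getDeclaredMethod only finds methods declared directly on the class
                implementation.getDeclaredMethod(method.getName(), method.getParameterTypes());
            }
            catch (NoSuchMethodException e) {
                missing.add(method.getName());
            }
        }
        return missing;
    }

    public static void main(String[] args)
    {
        System.out.println(missingOverrides(BaseVisitor.class, Redactor.class, Set.of()));
    }
}
```

A test would assert that the returned set is empty, so adding a new `visit*` method to the base class forces an explicit decision: override it or add it to the exclusions.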

hashhar · 2025-01-02T08:37:20Z

core/trino-main/src/main/java/io/trino/dispatcher/DispatchManager.java

@@ -240,7 +248,7 @@ private <C> void createQueryInternal(QueryId queryId, Span querySpan, Slug slug,
            DispatchQuery dispatchQuery = dispatchQueryFactory.createDispatchQuery(


this automatically also handles things like event listener and QueryResource right?

Might be worth explicitly calling it out in the commit message (although you do imply that by mentioning anything using QueryInfo/BasicQueryInfo).

this automatically also handles things like event listener and QueryResource right?

Correct.

I extracted the tests confirming that into separate commits to avoid distracting from the core functionality of redacting.

I refined the commit message and included your suggestion.

dain · 2025-01-02T19:48:04Z

core/trino-main/src/main/java/io/trino/FeaturesConfig.java

+        return statementRedactingEnabled;
+    }
+
+    @Config("statement-redacting-enabled")


I'm not sure we want an option to disable this. Maybe as a temporary kill switch, but we should remove this as soon as we are happy with this feature

dain · 2025-01-02T19:55:45Z

core/trino-main/src/main/java/io/trino/sql/SensitiveStatementRedactor.java

+
+public class SensitiveStatementRedactor
+{
+    public static final String REDACTED_VALUE = "***";


We should consider a better value here than just ***. We could also consider using a special function like $redacted$(), which just throws an exception if you try to actually call it.

*** seems to be almost what everyone uses for redaction.

Can you expand on the function idea? Is that to make it so that the output of SHOW CREATE CATALOG (as an example) is valid but still fails when you try to run it?

The SPI will be used by the engine to redact security-sensitive information in statements that manage catalogs. It has been added at the connector factory level, rather than the connector level, to allow more flexibility in retrieving properties. In some cases, we want to perform redacting before a connector is initiated. For example, when we create a new catalog by issuing the CREATE CATALOG statement.

Exposed properties fall into one of the following categories: they are either explicitly marked as security-sensitive or are unknown. The connector assumes that unknown properties might be misspelled security-sensitive properties.
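The classification described above can be illustrated with a small sketch. The actual SPI shape from #24562 is not shown in this conversation, so the class `RedactablePropertiesSketch`, the method `propertiesToRedact`, and its three-set signature are assumptions for illustration only. A supplied property is redacted when it is marked security-sensitive, or when it is not a known property at all (and so might be a misspelled sensitive one).

```java
import java.util.Set;
import java.util.TreeSet;

public final class RedactablePropertiesSketch
{
    private RedactablePropertiesSketch() {}

    // supplied:  property names given by the user in the statement
    // known:     all property names the connector recognizes (hypothetical input)
    // sensitive: the subset of known names marked as security-sensitive
    public static Set<String> propertiesToRedact(Set<String> supplied, Set<String> known, Set<String> sensitive)
    {
        Set<String> result = new TreeSet<>();
        for (String name : supplied) {
            // Unknown names are treated as potentially misspelled sensitive properties
            if (sensitive.contains(name) || !known.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        Set<String> known = Set.of("connection-url", "connection-user", "connection-password");
        Set<String> sensitive = Set.of("connection-password");
        // "connection-pasword" is deliberately misspelled
        Set<String> supplied = Set.of("connection-url", "connection-password", "connection-pasword");
        System.out.println(propertiesToRedact(supplied, known, sensitive));
    }
}
```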

This preparatory commit enables bootstrapping HDFS to retrieve its security-sensitive properties.

piotrrzysko · 2025-01-20T08:59:59Z

A few questions/suggestions:

1. For now, I’m not masking syntactically invalid or unsupported queries (e.g., EXPLAIN ANALYZE CREATE CATALOG) in any way. Initially, I handled this by replacing the entire query text with ***. However, this seems like a significant change from the user’s perspective. I suggest starting a separate discussion about it and addressing it as a follow-up if needed.

   Another example of syntactically valid but unsupported query (fails with Unsupported statement type: ExecuteImmediate):
```
PREPARE create_catalog FROM
EXECUTE IMMEDIATE 
'CREATE CATALOG cat USING postgresql WITH (
   "connection-url" = ''jdbc:postgresql://localhost:4000/trino'',
   "connection-user" = ''admin'',
   "connection-password" = ''1234''
)';
```
2. Regarding $redacted$() vs. ***, I propose creating a GitHub issue to start a discussion. I believe we need to involve more people in this conversation. If we decide to go with $redacted$(), input from Martin would be necessary, as this is a syntax-related change.
3. I noticed an inconsistency around PREPARE CREATE CATALOG. While issuing PREPARE CREATE CATALOG is allowed, executing the prepared statement is not. Please take a look at this test: link.
   Currently, I’m not masking EXECUTE arguments because I’m unsure which direction we prefer:
   - Forbid PREPARE CREATE CATALOG, or
   - Add full support for it.
4. There are more places than just the query and preparedQuery fields in QueryInfo and query events where prepared statements may leak. The places I’ve identified so far are:
   - io.trino.SessionRepresentation#preparedStatements
   - io.trino.execution.QueryInfo#addedPreparedStatements

   I haven’t addressed these yet because I’d like to first discuss point 3.

@dain @hashhar I'd appreciate your feedback.

This is a preparatory commit to enable the use of this method in two contexts:
- Creating or updating a catalog
- Redacting a catalog's security-sensitive properties

This commit introduces redacting of security-sensitive information in the following statements:
* CREATE CATALOG
* EXPLAIN CREATE CATALOG
* PREPARE CREATE CATALOG

The current approach is as follows:
* For syntactically valid statements, only properties containing sensitive information are masked.
* If a query is syntactically valid but retrieving security-sensitive properties fails for any reason (e.g., the query references a nonexistent connector or catalog property evaluation fails), all properties are masked.
* If a query fails before or during parsing, nothing is masked.

The redacted form is created right before initialization of the QueryStateMachine and is propagated to all places that create QueryInfo and BasicQueryInfo (e.g., REST endpoints, query events, and the system.runtime.queries table). Before this change, QueryInfo/BasicQueryInfo stored the raw query text received from the end user. From now on, the text will be altered for the cases listed above.

@JsonConstructor for TrimmedBasicQueryInfo was introduced to facilitate the deserialization of server responses in tests.

hashhar · 2025-01-23T16:10:22Z

I like the idea of a function that throws an error saying that you need to use a secret reference here, because it allows the SHOW output to be valid SQL while preserving the fact that secrets are required.
IMO we should forbid PREPARE with arguments in general for all DDL - none of them works right now. PREPARE without arguments IMO should still be allowed because, for example, a lot of tools might have a habit of using PreparedStatement in JDBC even if there are no arguments.
From a quick check, QueryInfo is only returned from v1/query - if so, then it's actually "secure", since the endpoint itself is protected via checkCanViewQuery (i.e. sysadmin + owner by default). But yes, users can misconfigure their access control to allow checkCanViewQuery to pass for a wide group of people. Not sure of alternative ways of protecting QueryInfo.

github-actions · 2025-02-13T17:02:47Z

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

martint · 2025-02-13T18:08:32Z

core/trino-main/src/main/java/io/trino/connector/DefaultCatalogFactory.java

+    {
+        ConnectorFactory connectorFactory = connectorFactories.get(catalogProperties.connectorName());
+        if (connectorFactory == null) {
+            // If someone tries to use a non-existent connector, we assume they


This should be an error, actually. It should throw IllegalArgumentException.

But then the query will fail during redaction. The idea is to avoid disrupting the natural flow and let it fail where it normally would if redaction didn't exist.

Why would a query fail during redaction if it hasn’t first failed during analysis? I.e., it’s a condition that should never occur.

We perform redaction at a very early stage (before the query state machine is created) to modify query text exposed in query events and QueryInfo. I believe that verifying the existence of a given connector happens only during execution, for example, in CreateCatalogTask.

I’m very confused about the purpose of this change, then. If redaction happens before analysis, how are the analyzer and execution engine able to see the unredacted values so that they can do their job?

Can you describe the technical approach at a high level so that I don’t have to reverse engineer what the code is trying to achieve?

Please let me know if the following helps:

Problem

Currently, when we execute a CREATE CATALOG statement containing plaintext secrets, unredacted query text is exposed via the REST API, the system.runtime.queries table, and query events.

Goal

Instead of displaying the query text in its raw form in the locations mentioned above, such as:

CREATE CATALOG test USING postgresql WITH ( "connection-user" = 'bob', "connection-password" = '1234' )

we aim to redact security-sensitive property values:

CREATE CATALOG test USING postgresql WITH ( "connection-user" = 'bob', "connection-password" = '***' )

Proposed Solution

The REST API, the system.runtime.queries table, and query events obtain query text from the QueryInfo object. Based on our research, the query text contained in QueryInfo is not interpreted anywhere in the engine.

The QueryInfo object is created by the QueryStateMachine. To redact the query text, we propose performing redaction after the query is parsed (to ensure the AST is available for traversal and redaction) but before the QueryStateMachine is created.

Since redaction occurs at an early stage of query processing, we need to duplicate some logic that is typically performed during analysis and execution. For example, this includes evaluating catalog properties. Additionally, we do not want to disrupt the normal query processing flow; therefore, we ensure the query never fails due to redaction. If, for any reason, redaction is not possible, we will resort to masking all properties.

To identify security-sensitive properties for a given connector, we propose introducing a new SPI to expose them: #24562
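The "never fail due to redaction" rule from the proposal can be sketched as a guarded lookup. This is an illustration under stated assumptions, not the PR's code: `FailSafeRedactionSketch`, `redact`, and the `sensitivePropertyLookup` function are hypothetical names, and the lookup stands in for whatever resolves a connector's security-sensitive properties. If the lookup throws (for example an `IllegalArgumentException` for a nonexistent connector), the sketch masks every property instead of propagating the error, so the query can still fail later where it normally would.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

public final class FailSafeRedactionSketch
{
    public static final String REDACTED_VALUE = "***";

    private FailSafeRedactionSketch() {}

    // sensitivePropertyLookup is a hypothetical stand-in: connector name -> sensitive
    // property names. It is allowed to throw, e.g. for an unknown connector.
    public static Map<String, String> redact(
            String connectorName,
            Map<String, String> properties,
            Function<String, Set<String>> sensitivePropertyLookup)
    {
        try {
            Set<String> sensitive = sensitivePropertyLookup.apply(connectorName);
            return maskWhere(properties, sensitive::contains);
        }
        catch (RuntimeException e) {
            // Redaction must not fail the query: fall back to masking everything
            return maskWhere(properties, name -> true);
        }
    }

    private static Map<String, String> maskWhere(Map<String, String> properties, Predicate<String> shouldMask)
    {
        Map<String, String> masked = new LinkedHashMap<>();
        properties.forEach((name, value) -> masked.put(name, shouldMask.test(name) ? REDACTED_VALUE : value));
        return masked;
    }

    public static void main(String[] args)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("connection-user", "bob");
        properties.put("connection-password", "1234");

        Function<String, Set<String>> failingLookup = name -> {
            throw new IllegalArgumentException("No such connector: " + name);
        };
        System.out.println(redact("missing", properties, failingLookup));
    }
}
```

The trade-off is that a lookup failure over-redacts rather than under-redacts, which matches the stated preference for masking all properties when redaction is not otherwise possible.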

martint · 2025-02-13T18:20:20Z

core/trino-main/src/main/java/io/trino/sql/SensitiveStatementRedactor.java

+import static java.util.Objects.requireNonNull;
+import static org.weakref.jmx.$internal.guava.collect.ImmutableSet.toImmutableSet;
+
+public class SensitiveStatementRedactor


I have concerns about this approach. This couples the implementation with the AST structure and visitor infrastructure, and it requires performing actions that should be the responsibility of the analyzer (e.g., the logic in visitCreateCatalog). In particular, the dependency on CreateCatalogTask.evaluateProperties is not appropriate.

Also, this is not sufficient -- what about values in query plans that may be derived from security sensitive properties? E.g., a TableHandle in a plan may need to include the URL/user/password for a database. Running an EXPLAIN would expose those details.

Also, this is not sufficient -- what about values in query plans that may be derived from security sensitive properties? E.g., a TableHandle in a plan may need to include the URL/user/password for a database. Running an EXPLAIN would expose those details.

I tried to reproduce this, but I couldn't find a case where TableHandle includes catalog properties. Do you have a specific example of when this might happen? Also, is my understanding correct that if security-sensitive properties might leak in EXPLAIN output, this is not strictly related to dynamic catalogs and might happen for static catalogs as well?

github-actions · 2025-03-11T17:03:24Z

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

cla-bot bot added the cla-signed label Dec 23, 2024

This was referenced Dec 23, 2024

Redact sensitive information in catalog queries #23104

Closed

Add connector SPI for returning redactable properties #24562

Open

piotrrzysko force-pushed the redact-sensitive-queries branch from 98470bb to ed595a1 Compare December 23, 2024 13:47

hashhar reviewed Jan 2, 2025

piotrrzysko mentioned this pull request Jan 2, 2025

Extend syntax for Dynamic Catalogs #22188

Open

1 task

dain reviewed Jan 2, 2025

piotrrzysko force-pushed the redact-sensitive-queries branch from ed595a1 to 654d3e2 Compare January 9, 2025 17:51

github-actions bot added hudi Hudi connector iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Jan 9, 2025

piotrrzysko force-pushed the redact-sensitive-queries branch 3 times, most recently from e049e24 to c31d6e1 Compare January 13, 2025 15:28

piotrrzysko force-pushed the redact-sensitive-queries branch 2 times, most recently from df20a77 to 2490789 Compare January 20, 2025 08:34

piotrrzysko added 7 commits January 20, 2025 09:49

Expose security-sensitive properties for HDFS

0352081

This preparatory commit enables bootstrapping HDFS to retrieve its security-sensitive properties.

Expose security-sensitive properties for Hive connector

20aee8e

Expose security-sensitive properties for Iceberg connector

362bc90

Expose security-sensitive properties for Delta Lake connector

6437e20

Expose security-sensitive properties for Hudi connector

e32ef86

piotrrzysko force-pushed the redact-sensitive-queries branch from 2490789 to 7eec53c Compare January 20, 2025 08:50

piotrrzysko force-pushed the redact-sensitive-queries branch from 7eec53c to 26978c8 Compare January 20, 2025 09:28

piotrrzysko added 3 commits January 23, 2025 15:28

Extract evaluateProperties in CreateCatalogTask

5060c7b

Add createCatalogProperties to CatalogManager

152bd6a

This is a preparatory commit to enable the use of this method in two contexts:
- Creating or updating a catalog
- Redacting a catalog's security-sensitive properties

piotrrzysko added 2 commits January 23, 2025 16:47

Ensure queries in system.runtime.queries are redacted

0fa189b

Ensure queries returned via REST API are redacted

3c0af7b

@JsonConstructor for TrimmedBasicQueryInfo was introduced to facilitate the deserialization of server responses in tests.

piotrrzysko force-pushed the redact-sensitive-queries branch from 26978c8 to 3c0af7b Compare January 23, 2025 15:54

piotrrzysko mentioned this pull request Jan 24, 2025

Add function $redacted$() #24790

Open

github-actions bot added the stale label Feb 13, 2025

martint requested changes Feb 13, 2025

github-actions bot removed the stale label Feb 14, 2025

github-actions bot added the stale label Mar 11, 2025

hashhar added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Mar 12, 2025
