Description
When using the DataHub Trino source against a Trino coordinator configured with HTTPS and a self-signed certificate, ingestion fails with an SSL certificate verification error.
The Trino source documentation shows how to enable HTTPS via:
options:
  connect_args:
    http_scheme: "https"
However, there does not seem to be a clearly documented source-level option equivalent to ssl_verify for controlling certificate verification behavior in the Trino source.
In our environment, trying to use a self-signed HTTPS Trino endpoint caused ingestion to fail during schema discovery.
Environment
- DataHub version: v1.5.0.6
- Ingestion mode: UI ingestion through datahub-actions
- Source type: trino
- Trino coordinator: HTTPS enabled on port 8443
- Certificate type: self-signed certificate
- Deployment type: Docker Compose, internal/offline environment
Example recipe
source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    include_views: true
    include_tables: true
    profiling:
      enabled: true
      profile_table_level_only: false
    stateful_ingestion:
      enabled: true
    options:
      connect_args:
        http_scheme: "https"
Current behavior
The ingestion pipeline starts successfully and reaches the Trino metadata extraction stage, but fails when querying Trino over HTTPS.
The failure happens while DataHub is querying Trino metadata, for example:
SELECT "schema_name"
FROM "information_schema"."schemata"
The ingestion then fails with an SSL verification error similar to:
SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate
The full error path shows that the failure occurs in the Trino Python client / requests stack while calling the Trino /v1/statement endpoint.
Workaround
We were able to make direct HTTPS requests from inside the datahub-actions container succeed by setting container-level environment variables:
REQUESTS_CA_BUNDLE=/etc/datahub/certs/trino-server.crt
SSL_CERT_FILE=/etc/datahub/certs/trino-server.crt
After setting these variables, a direct request from inside the container worked:
import requests
r = requests.get("https://trino.example.internal:8443/v1/info", timeout=10)
print(r.status_code)
This returned 200, confirming that the network, Trino HTTPS endpoint, and certificate file were valid.
However, this workaround is global to the datahub-actions container and is not source-specific. It is inconvenient for UI-managed ingestion and for deployments where different Trino sources may need different SSL verification settings.
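Because the workaround depends on process-wide environment variables, it can help to confirm what the ingestion process actually sees. The following is a minimal diagnostic sketch using only the standard library. The helper name `describe_ca_env` is hypothetical, and `CURL_CA_BUNDLE` is included on the assumption that `requests` falls back to it when `REQUESTS_CA_BUNDLE` is unset.

```python
import os
from typing import Mapping, Optional

# Variables from the workaround above, plus CURL_CA_BUNDLE (assumed fallback
# consulted by requests when REQUESTS_CA_BUNDLE is not set).
CA_ENV_VARS = ("REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE", "SSL_CERT_FILE")


def describe_ca_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Report each CA-related variable, its value, and whether the file exists.

    Hypothetical helper for illustration; not part of DataHub or the Trino client.
    """
    env = os.environ if environ is None else environ
    report = {}
    for name in CA_ENV_VARS:
        path = env.get(name)
        report[name] = {
            "value": path,
            # An unset or empty variable never counts as an existing file.
            "file_exists": bool(path) and os.path.isfile(path),
        }
    return report


if __name__ == "__main__":
    for name, info in describe_ca_env().items():
        print(f"{name}: {info['value']!r} (file exists: {info['file_exists']})")
```

Running this inside the datahub-actions container shows whether the variables reached the process and whether the certificate path is readable there. It does not tell you whether the certificate itself is trusted.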
Expected behavior
The Trino source should provide a documented source-level SSL verification option, similar in spirit to other connectors that support SSL verification configuration.
For example:
source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    options:
      connect_args:
        http_scheme: "https"
    ssl_verify: false
Or, for a private CA / self-signed certificate:
source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    options:
      connect_args:
        http_scheme: "https"
    ssl_verify: "/etc/datahub/certs/trino-server.crt"
The value could be passed to the underlying Trino Python client as the verify argument.
Why this is useful
Trino is commonly deployed in private enterprise environments with HTTPS enabled and certificates issued by an internal CA or self-signed certificates. In these environments, users need a straightforward way to configure SSL verification per Trino source.
Relying on global container environment variables such as REQUESTS_CA_BUNDLE works as a workaround, but it is not ideal because:
- It applies globally to the whole container.
- It is harder to manage from UI ingestion.
- It does not support different certificate settings per source.
- It is not obvious from the Trino source documentation.
Proposal
Add and document a Trino source config option such as:
ssl_verify: true | false | "/path/to/cert.pem"
Default behavior should remain unchanged.
Suggested behavior:
- ssl_verify: true: keep default certificate verification.
- ssl_verify: false: disable certificate verification.
- ssl_verify: "/path/to/cert.pem": use the provided certificate bundle for verification.
This option can then be mapped to the underlying Trino Python client's verify parameter.
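To make the proposed mapping concrete, here is a rough sketch of how a source-level `ssl_verify` value might be validated and folded into `connect_args`. It is not DataHub's implementation. The function names `normalize_ssl_verify` and `apply_ssl_verify` are hypothetical, and the sketch assumes that `options.connect_args` is forwarded to the Trino Python client and that the client accepts a `verify` keyword.

```python
from typing import Any, Dict, Union

SslVerify = Union[bool, str]


def normalize_ssl_verify(value: Any) -> SslVerify:
    """Accept true, false, or a non-empty path string; reject anything else."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip():
        return value
    raise ValueError(f"ssl_verify must be true, false, or a path, got {value!r}")


def apply_ssl_verify(options: Dict[str, Any], ssl_verify: Any = True) -> Dict[str, Any]:
    """Return a copy of `options` with connect_args['verify'] derived from ssl_verify.

    Hypothetical helper: assumes connect_args reaches the Trino client unchanged.
    """
    verify = normalize_ssl_verify(ssl_verify)
    merged = dict(options)
    connect_args = dict(merged.get("connect_args", {}))
    if verify is not True:
        # Leave the default (verification on) untouched; an explicit
        # connect_args["verify"] in an existing recipe takes precedence.
        connect_args.setdefault("verify", verify)
    merged["connect_args"] = connect_args
    return merged


if __name__ == "__main__":
    recipe_options = {"connect_args": {"http_scheme": "https"}}
    print(apply_ssl_verify(recipe_options, "/etc/datahub/certs/trino-server.crt"))
```

Skipping the assignment when `ssl_verify` is true keeps existing recipes byte-for-byte identical in what they pass to the client, which matches the requirement that default behavior not change.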
Additional context
A similar class of issue has been discussed for other DataHub sources, such as Tableau, where source-level SSL verification configuration exists. It would be helpful for the Trino source to provide an equivalent documented mechanism.