Trino source should support source-level SSL verification config for HTTPS with self-signed certificates #18016

Description

@ningsh7

When using the DataHub Trino source against a Trino coordinator configured with HTTPS and a self-signed certificate, ingestion fails with an SSL certificate verification error.

The Trino source documentation shows how to enable HTTPS via:

options:
  connect_args:
    http_scheme: "https"

However, there does not seem to be a clearly documented source-level option equivalent to ssl_verify for controlling certificate verification behavior in the Trino source.

In our environment, trying to use a self-signed HTTPS Trino endpoint caused ingestion to fail during schema discovery.

Environment

  • DataHub version: v1.5.0.6
  • Ingestion mode: UI ingestion through datahub-actions
  • Source type: trino
  • Trino coordinator: HTTPS enabled on port 8443
  • Certificate type: self-signed certificate
  • Deployment type: Docker Compose, internal/offline environment

Example recipe

source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    include_views: true
    include_tables: true
    profiling:
      enabled: true
      profile_table_level_only: false
    stateful_ingestion:
      enabled: true
    options:
      connect_args:
        http_scheme: "https"

Current behavior

The ingestion pipeline starts successfully and reaches the Trino metadata extraction stage, but fails when DataHub queries Trino metadata over HTTPS, for example:

SELECT "schema_name"
FROM "information_schema"."schemata"

The ingestion then fails with an SSL verification error similar to:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate

The full error path shows that the failure occurs in the Trino Python client / requests stack while calling the Trino /v1/statement endpoint.

Workaround

We were able to make direct HTTPS requests from inside the datahub-actions container succeed by setting container-level environment variables:

REQUESTS_CA_BUNDLE=/etc/datahub/certs/trino-server.crt
SSL_CERT_FILE=/etc/datahub/certs/trino-server.crt

After setting these variables, a direct request from inside the container worked:

import requests

r = requests.get("https://trino.example.internal:8443/v1/info", timeout=10)
print(r.status_code)

This returned 200, confirming that the network, Trino HTTPS endpoint, and certificate file were valid.

However, this workaround is global to the datahub-actions container and is not source-specific. It is inconvenient for UI-managed ingestion and for deployments where different Trino sources may need different SSL verification settings.

Expected behavior

The Trino source should provide a documented source-level SSL verification option, similar in spirit to other connectors that support SSL verification configuration.

For example:

source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    options:
      connect_args:
        http_scheme: "https"
    ssl_verify: false

Or, for a private CA / self-signed certificate:

source:
  type: trino
  config:
    host_port: "trino.example.internal:8443"
    database: "example_catalog"
    username: "admin"
    password: "${TRINO_PASSWORD}"
    options:
      connect_args:
        http_scheme: "https"
    ssl_verify: "/etc/datahub/certs/trino-server.crt"

The value could be passed to the underlying Trino Python client as the verify argument.
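
As a rough illustration of that mapping, the sketch below merges a source-level `ssl_verify` value into the `connect_args` that are handed to the client. It assumes the `verify` keyword accepted by the Trino Python client's `trino.dbapi.connect`; the `build_connect_args` helper and the `ssl_verify` field are hypothetical names used for illustration, not existing DataHub API.

```python
from typing import Any, Dict, Optional, Union

# Hypothetical type for the proposed option: a bool or a path to a CA bundle.
SslVerify = Union[bool, str]


def build_connect_args(
    connect_args: Optional[Dict[str, Any]] = None,
    ssl_verify: SslVerify = True,
) -> Dict[str, Any]:
    """Return a copy of connect_args with the (hypothetical) ssl_verify applied."""
    merged = dict(connect_args or {})
    # Leave the default path untouched, and let an explicit `verify`
    # already present in connect_args take precedence.
    if ssl_verify is not True and "verify" not in merged:
        merged["verify"] = ssl_verify
    return merged


if __name__ == "__main__":
    print(build_connect_args({"http_scheme": "https"}, "/etc/datahub/certs/trino-server.crt"))
```

With the default `ssl_verify: true` the returned dictionary is identical to the input, so existing recipes would behave exactly as they do today.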

Why this is useful

Trino is commonly deployed in private enterprise environments with HTTPS enabled, using certificates that are either self-signed or issued by an internal CA. In these environments, users need a straightforward way to configure SSL verification per Trino source.

Relying on global container environment variables such as REQUESTS_CA_BUNDLE works as a workaround, but it is not ideal because:

  1. It applies globally to the whole container.
  2. It is harder to manage from UI ingestion.
  3. It does not support different certificate settings per source.
  4. It is not obvious from the Trino source documentation.

Proposal

Add and document a Trino source config option such as:

ssl_verify: true | false | "/path/to/cert.pem"

Default behavior should remain unchanged:

ssl_verify: true

Suggested behavior:

  • ssl_verify: true: keep default certificate verification.
  • ssl_verify: false: disable certificate verification.
  • ssl_verify: "/path/to/cert.pem": use the provided certificate bundle for verification.

This option can then be mapped to the underlying Trino Python client's verify parameter.
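
One possible way to validate the three accepted forms before passing them on is sketched below. The function name `normalize_ssl_verify` is hypothetical, and the handling of the strings "true" and "false" is an assumption about how recipe values might arrive (for example, from environment-variable substitution), not described behavior.

```python
from typing import Union


def normalize_ssl_verify(value: Union[bool, str]) -> Union[bool, str]:
    """Normalize a (hypothetical) ssl_verify setting to a bool or a bundle path."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip()
        # Assumption: treat the literal strings "true"/"false" as booleans.
        if text.lower() in ("true", "false"):
            return text.lower() == "true"
        if text:
            # Anything else is taken to be a path to a certificate bundle.
            return text
    raise ValueError(f"ssl_verify must be true, false, or a certificate path, got {value!r}")


if __name__ == "__main__":
    for candidate in (True, "false", "/path/to/cert.pem"):
        print(repr(candidate), "->", repr(normalize_ssl_verify(candidate)))
```

Rejecting empty strings and other types early would surface a misconfigured recipe at load time rather than as an SSL error during schema discovery.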

Additional context

A similar class of issue has been discussed for other DataHub sources, such as Tableau, where source-level SSL verification configuration exists. It would be helpful for the Trino source to provide an equivalent documented mechanism.
