
monte-carlo-data/custom-connector-setup


custom-connector-setup

A validation toolkit for building custom database connectors. You implement a set of base classes — providing connection logic and Jinja SQL templates for your database dialect — then run the included test suite to verify correctness and discover which metrics and capabilities your connector supports. The end result is a generic agent image that you host, deploy, and then register in Monte Carlo.

Supports multiple connectors side by side so you can build and test several at once.

Using an AI Coding Agent

An AI coding agent can handle the entire workflow — from scaffolding and driver installation to implementing all ~100 template methods, running tests, and building the deployable image. You just provide the database credentials.

Recommended: Claude Code skills

The repo includes four skills that automate the full workflow end-to-end:

| Step | Skill | What it does |
|---|---|---|
| 1 | /create-connector <name> | Scaffold a new connector directory |
| 2 | /setup-connection <name> | Install the driver, implement connection methods, and stub credentials.json; pauses for you to fill in credentials |
| 3 | /implement-connector <name> [hybrid] | Implement all template methods section by section |
| 4 | /build-agent-image <name> [--mode MODE] | Export capabilities and build the deployable Docker image |

The only manual step is filling in credentials.json when /setup-connection pauses. Everything else — scaffolding, driver installation, template implementation, testing, and image building — is handled by the skills.

Fallback: Other AI agents

If you're not using Claude Code, complete steps 1–7 of Quick Start below to set up and verify connectivity (at this stage, step 2 only needs the BaseConnector connection methods), then provide AGENTS.md as context to your LLM along with the connector name. The agent will implement all remaining template methods, run tests iteratively, and export capabilities. Resume at step 10 to review capabilities, then build the deployable image in step 11.

Quick Start

1. Create a connector

python scripts/create_connector.py <name>

This creates connectors/<name>/ with:

  • connector.py — base classes to implement (copy of the canonical template)
  • manifest.json — unique connection_type identifier
  • credentials.json — database credentials (gitignored)
  • requirements.txt — database driver dependencies
  • Dockerfile.extra — system dependency instructions (empty by default)
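If you want to sanity-check a freshly scaffolded directory, a minimal sketch is below. It only assumes the five file names listed above and the manifest shape shown under Project Structure (a connection_type key); check_scaffold is a hypothetical helper, not one of the repo's scripts.

```python
import json
from pathlib import Path

# File names from the scaffolding list; credentials.json is gitignored but still expected locally.
EXPECTED_FILES = (
    "connector.py",
    "manifest.json",
    "credentials.json",
    "requirements.txt",
    "Dockerfile.extra",
)


def check_scaffold(connector_dir: Path) -> list[str]:
    """Return a list of problems found in a connector directory (hypothetical helper)."""
    problems = [f"missing {name}" for name in EXPECTED_FILES if not (connector_dir / name).is_file()]
    manifest_path = connector_dir / "manifest.json"
    if manifest_path.is_file():
        manifest = json.loads(manifest_path.read_text())
        # Assumption: the scaffolder writes a non-empty connection_type string.
        if not manifest.get("connection_type"):
            problems.append("manifest.json has no connection_type")
    return problems
```

Call it as check_scaffold(Path("connectors") / "postgres"); an empty list means every expected file is present.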

2. Implement the connector classes

Edit connectors/<name>/connector.py and fill in the base classes:

| Class | Purpose |
|---|---|
| BaseConnector | Connection lifecycle: create_connection, create_cursor, execute_query, fetch_all_results, close_connection |
| MetadataQueryTemplates | Jinja templates for discovering databases, schemas, tables, and columns |
| QueryLogCollectionTemplates | Jinja template for fetching query logs |
| CustomSQLMonitorTemplates | Jinja templates for custom SQL monitor operations (count wrapping, row limits) |
| QueryLanguageTemplates | ~90 Jinja templates covering type casting, date/time functions, aggregations, comparisons, string operations, and more |
| FunctionalTestOperations | (Optional) Jinja templates for functional validation: DDL/DML operations (create/drop table, insert rows, add/drop columns) that let the test suite run metadata collection before and after each mutation to confirm metrics actually update. This validates that your metadata sources reflect real-time changes. See Functional Validation Tests for details. |
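To make the connection lifecycle concrete, here is a minimal sketch of the five BaseConnector methods written against Python's built-in sqlite3 module. It is illustrative only. The method signatures (what execute_query and fetch_all_results receive and return) are assumptions, so follow the docstrings in connectors/_base/connector.py for the real contract. SqliteConnectorSketch is a hypothetical class that does not subclass the real BaseConnector. The only grounded detail is that credentials arrive on self.credentials, as step 5 describes.

```python
import sqlite3
from typing import Any


class SqliteConnectorSketch:
    """Hypothetical stand-in for a BaseConnector subclass; signatures are assumed."""

    def __init__(self, credentials: dict[str, Any]):
        # Mirrors the connect_args object from credentials.json.
        self.credentials = credentials
        self.connection = None

    def create_connection(self):
        # sqlite3 only needs a path; ":memory:" gives a throwaway database.
        self.connection = sqlite3.connect(self.credentials.get("database", ":memory:"))
        return self.connection

    def create_cursor(self):
        return self.connection.cursor()

    def execute_query(self, cursor, sql: str):
        cursor.execute(sql)
        return cursor

    def fetch_all_results(self, cursor) -> list[tuple]:
        return cursor.fetchall()

    def close_connection(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None


connector = SqliteConnectorSketch({"database": ":memory:"})
connector.create_connection()
cur = connector.execute_query(connector.create_cursor(), "SELECT 1 + 1")
print(connector.fetch_all_results(cur))  # [(2,)]
connector.close_connection()
```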

Every template method returns a template string. Most use format-string placeholders like {x} (substituted later by the backend); some use Jinja {{ variable }} syntax. For example:

def get_avg_function_template(self) -> str:
    return "AVG({x})"                                   # placeholder — {x} substituted later

def get_casting_to_numeric_expression_template(self) -> str:
    return "CAST({{ expression }} AS NUMERIC)"           # Jinja variable — rendered at template time

Each method's docstring documents which pattern it uses, the expected variables, and example implementations for common databases. See How Templates Work for details.
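When reviewing an implementation, it can help to check which of the patterns described under How Templates Work (placeholder, parameterized, hybrid, static) a template string actually uses. The heuristic below is a sketch, not part of the toolkit. It treats a simple {{ name }} as a Jinja variable and a single-brace {name} as a format placeholder. It ignores Jinja filters and control blocks such as {% ... %}.

```python
import re

# Simple Jinja variable such as {{ expression }}.
JINJA_VAR = re.compile(r"\{\{\s*\w+\s*\}\}")
# Single-brace placeholder not adjacent to another brace: matches {x} but not {{ x }}.
FORMAT_PLACEHOLDER = re.compile(r"(?<!\{)\{\w+\}(?!\})")


def classify_template(template: str) -> str:
    has_jinja = bool(JINJA_VAR.search(template))
    has_placeholder = bool(FORMAT_PLACEHOLDER.search(template))
    if has_jinja and has_placeholder:
        return "hybrid"
    if has_jinja:
        return "parameterized"
    if has_placeholder:
        return "placeholder"
    return "static"


print(classify_template("AVG({x})"))                           # placeholder
print(classify_template("CAST({{ expression }} AS NUMERIC)"))  # parameterized
```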

3. Add your database driver

Add your driver to connectors/<name>/requirements.txt:

psycopg2-binary==2.9.9

Then rebuild the Docker image:

docker compose build

4. Add system dependencies (optional)

If your database driver requires system-level libraries (ODBC drivers, native clients, etc.), add the installation commands to connectors/<name>/Dockerfile.extra:

RUN apt-get update && apt-get install -y --no-install-recommends \
    unixodbc-dev \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

Then regenerate the test Dockerfile:

python scripts/generate_test_dockerfile.py

The Dockerfile.extra contents are injected into both the test image and the deployable agent image. The create_connector.py script and the /setup-connection skill regenerate the test Dockerfile automatically — you only need to run the command above if you edit Dockerfile.extra manually after initial setup.

Dockerfile.extra supports RUN, ENV, and ARG instructions. COPY is not supported because the agent image builds in a temporary directory.
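Because only RUN, ENV, and ARG are allowed, a quick pre-flight check can catch a stray COPY before a build fails. The function below is a hypothetical sketch, not one of the repo's scripts. It skips comments and follows backslash line continuations, but it is not a full Dockerfile parser; for example, it does not handle heredocs or parser directives.

```python
ALLOWED_INSTRUCTIONS = {"RUN", "ENV", "ARG"}


def find_disallowed_instructions(dockerfile_extra: str) -> list[tuple[int, str]]:
    """Return (line_number, instruction) pairs that Dockerfile.extra does not support."""
    problems = []
    continuing = False  # True while inside a backslash-continued instruction
    for lineno, raw in enumerate(dockerfile_extra.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments never start an instruction
        if not continuing:
            instruction = line.split(None, 1)[0].upper()  # instructions are case-insensitive
            if instruction not in ALLOWED_INSTRUCTIONS:
                problems.append((lineno, instruction))
        continuing = line.endswith("\\")
    return problems
```

Usage: find_disallowed_instructions(Path("connectors/<name>/Dockerfile.extra").read_text()) returns an empty list when the file only uses supported instructions.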

5. Configure credentials

Add your database credentials to connectors/<name>/credentials.json:

{
  "connect_args": {
    "host": "localhost",
    "port": 5432,
    "database": "mydb",
    "user": "myuser",
    "password": "mypassword"
  }
}

The keys in connect_args are whatever your create_connection() method expects via self.credentials:

def create_connection(self):
    import psycopg2
    return psycopg2.connect(
        host=self.credentials["host"],
        port=int(self.credentials["port"]),
        database=self.credentials["database"],
        user=self.credentials["user"],
        password=self.credentials["password"],
    )

This same JSON format is used for self-hosted credentials when deploying — just swap in production values.
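To fail fast on a malformed credentials file, you could load connect_args and check for the keys your create_connection() reads before starting the test container. This is a sketch. The required-key set matches the psycopg2 example above and is an assumption, so substitute whatever your driver needs. load_connect_args is a hypothetical helper name.

```python
import json
from pathlib import Path

# Keys read by the psycopg2 create_connection() example; adjust for your driver.
REQUIRED_KEYS = frozenset({"host", "port", "database", "user", "password"})


def load_connect_args(path: Path, required: frozenset = REQUIRED_KEYS) -> dict:
    data = json.loads(path.read_text())
    if "connect_args" not in data:
        raise KeyError(f"{path} has no top-level 'connect_args' object")
    connect_args = data["connect_args"]
    missing = set(required) - set(connect_args)
    if missing:
        raise KeyError(f"{path} is missing connect_args keys: {sorted(missing)}")
    return connect_args
```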

6. Build the Docker image

docker compose build

Some database drivers include native libraries built for a specific architecture. If you hit errors loading .so files, rebuild with the correct platform:

docker compose build --build-arg TARGETPLATFORM=linux/amd64

Rebuild whenever you change requirements.txt or Dockerfile.extra (remember to regenerate the test Dockerfile first if you changed Dockerfile.extra).

7. Verify the connection

CONNECTOR=<name> docker compose run --rm test -m connection

This runs two quick checks: connection creation and cursor creation. Fix any credential or networking issues before moving on.

If only one connector exists, you can omit CONNECTOR=:

docker compose run --rm test -m connection

8. Implement and test section by section

Work through each section of connector.py incrementally. Implement the methods for one section, run its corresponding tests, fix any failures, then move on to the next section.

# Metadata collection
CONNECTOR=<name> docker compose run --rm test -m metadata

# Query language prerequisites (needed for metric monitors)
CONNECTOR=<name> docker compose run --rm test -m ql_prerequisites

# Query language metric templates
CONNECTOR=<name> docker compose run --rm test -m ql_metrics

# Custom SQL monitors
CONNECTOR=<name> docker compose run --rm test -m custom_monitors

# Functional validation (optional)
CONNECTOR=<name> docker compose run --rm test -m functional

Rebuild the Docker image (docker compose build) after changing connector.py or requirements.txt.
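The section-by-section loop can also be scripted. The wrapper below is a hypothetical convenience, not part of the repo. It runs the same docker compose command for each marker in the order listed above, starting with the connection check from step 7, and stops at the first failing section. It assumes docker compose is on your PATH and that a non-zero exit code means the section has failures. The run parameter exists only so the loop can be exercised without Docker.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence

SECTIONS = ("connection", "metadata", "ql_prerequisites", "ql_metrics", "custom_monitors", "functional")


def run_sections(
    connector: str,
    sections: Sequence[str] = SECTIONS,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Run each test marker in order; return the first failing marker, or None if all pass."""
    env = {**os.environ, "CONNECTOR": connector}
    for marker in sections:
        cmd = ["docker", "compose", "run", "--rm", "test", "-m", marker]
        print("running", marker)
        if run(cmd, env=env).returncode != 0:
            return marker
    return None
```

Run docker compose build first if you changed connector.py or requirements.txt, then call run_sections("<name>").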

9. Run the full suite and export

Once all sections pass individually, run the full test suite with --export to generate the manifest and passing templates:

CONNECTOR=<name> docker compose run --rm test --export

Note: --export requires the full test suite (no -m filter).

10. Review capabilities

After a full test run with --export, output/<name>/manifest.json is generated with:

  • connection_type — unique identifier for this connector
  • connection_name — connector directory name
  • capabilities — which features your connector supports (metadata collection, query logs, custom SQL monitors, metric monitors, etc.)
  • metrics — which metrics your connector supports, derived from template results and the metrics mapping

Passing templates are exported to output/<name>/templates/.

11. Build a deployable agent image

Once your connector passes tests and templates are exported, package everything into a custom agent image:

python scripts/generate_agent_image.py

This takes the public montecarlodata/agent:latest-generic image as a base and layers on your connector artifacts. The resulting custom agent image contains:

  • Exported templates (output/<name>/templates/) — the passing Jinja templates
  • Manifest (output/<name>/manifest.json) — capabilities and supported metrics
  • Connector code (connector.py) — your connection and execution logic
  • Driver dependencies (requirements.txt, Dockerfile.extra) — database drivers and system libraries

Credentials are NOT included in the image. Your credentials.json stays local and is never copied into the image. Production credentials are provided at deploy time via self-hosted credentials.

The generic agent is an egress-only agent that works across all supported platforms (Docker Compose, Kubernetes, EKS, AKS, GKE). See Generic Agent Platforms for deployment options.

Options:

| Flag | Default | Description |
|---|---|---|
| --version | latest | Agent base image version |
| --connector | All connectors with exported results in output/ | Which connectors to include (repeatable) |
| --docker-platform | linux/amd64 | Docker platform for the image |
| --tag | custom-agent:{version}-generic | Output image tag |
| --mode | auto | auto, full, or hybrid (see Modes below) |

Include specific connectors:

python scripts/generate_agent_image.py --connector postgres --connector mysql

Modes:

| | Full (default) | Hybrid |
|---|---|---|
| Metadata & query logs | Collected by the agent | Pushed externally |
| Requires | supports_metadata == true | supports_custom_sql_monitor == true |
| Metric monitors | Optional (warning if prerequisites are incomplete) | Optional (warning if prerequisites are incomplete) |
| Classes to implement | All 5 required classes (FunctionalTestOperations stays optional) | BaseConnector + CustomSQLMonitorTemplates (+ QueryLanguageTemplates for metric monitors) |
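The Requires row comes down to two capability flags, so you can check which modes an export qualifies for before building. This sketch assumes the flags appear as booleans under a capabilities object in output/<name>/manifest.json, with the keys supports_metadata and supports_custom_sql_monitor as in the table above. Inspect your generated manifest to confirm its actual shape. eligible_modes is a hypothetical helper.

```python
import json
from pathlib import Path

# One required capability flag per mode, taken from the Requires row above.
MODE_REQUIREMENTS = {
    "full": "supports_metadata",
    "hybrid": "supports_custom_sql_monitor",
}


def eligible_modes(manifest: dict) -> list[str]:
    capabilities = manifest.get("capabilities", {})  # assumed location of the flags
    return [mode for mode, flag in MODE_REQUIREMENTS.items() if capabilities.get(flag) is True]


def eligible_modes_from_file(path: Path) -> list[str]:
    return eligible_modes(json.loads(path.read_text()))
```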

Full mode (default) — the agent handles metadata collection and metric monitors:

python scripts/generate_agent_image.py

Hybrid mode: metadata and query logs are pushed externally, so the agent only needs custom SQL monitor support (metric monitors are optional):

python scripts/generate_agent_image.py --mode hybrid

Verify the image:

docker run --rm --entrypoint ls custom-agent:latest-generic /opt/custom-connectors/

Then push the agent image to your container registry and deploy. Your local connectors/<name>/credentials.json is already in the format needed for self-hosted credentials — just swap in production values and configure them at deploy time.

12. Clean up

When you're done, remove the Docker image and any stopped containers:

docker compose down --rmi local

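Apart from running the helper scripts in scripts/ with your local Python, nothing is installed on your machine: the tests, drivers, and system libraries all live inside the container.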

Requirements

Project Structure

custom-connector-setup/
  connectors/
    _base/                                # Provided — do not edit
      connector.py                        # Canonical template with all base classes
      __init__.py                         # Exports the base classes
    <your-database>/                      # Created by you (one directory per connector)
      connector.py                        # Your implementation (fill in stubs)
      credentials.json                    # Database credentials (gitignored)
      manifest.json                       # {"connection_type": "custom-connector-xxx", "name": "..."}
      requirements.txt                    # Database driver deps
      Dockerfile.extra                    # System dependency instructions (optional)
  output/                                 # Auto-generated by --export (gitignored)
    <your-database>/
      manifest.json                       # Test results and supported features
      templates/                          # Passing .j2 templates
  scripts/                                # Provided
    create_connector.py                   # Scaffolding helper (stdlib only)
    generate_agent_image.py               # Builds deployable custom agent Docker image
    generate_test_dockerfile.py           # Regenerates root Dockerfile from Dockerfile.extra files
  tests/                                  # Provided — do not edit
    conftest.py                           # Test fixtures (TestConnector, Templates, QueryTestHelper)
    capabilities_plugin.py                # Pytest plugin — generates manifest.json
    test_connection.py                    # Connection tests
    test_metadata_collection.py           # Metadata discovery tests
    test_custom_monitors.py               # Custom SQL monitor tests
    test_ql_prerequisites.py              # Prerequisite templates for metric monitors
    test_ql_metrics.py                    # Metric-specific templates (AVG, STDDEV, LENGTH, regexp, etc.)
    test_functional_validation.py         # Functional validation tests (real-time metadata accuracy)
  .claude/
    skills/                               # Claude Code automation skills
      create-connector/SKILL.md
      setup-connection/SKILL.md
      implement-connector/SKILL.md
      build-agent-image/SKILL.md
  AGENTS.md                               # Instructions for AI coding agents
  pytest.toml                             # Pytest configuration and markers
  requirements.txt                        # Shared Python dependencies
  Dockerfile                              # Test runner image
  docker-compose.yml                      # Docker Compose configuration

How Templates Work

All customer-provided SQL is expressed as Jinja templates running in a sandboxed environment (jinja2.sandbox.ImmutableSandboxedEnvironment). No raw Python code is ingested by the backend — connection and execution logic stays in your deployment only.

Templates produce SQL fragments and come in three flavors:

Placeholder templates (most common)

These receive no Jinja variables. They output Python format-string placeholders like {x} that the backend substitutes later via .format(x=field_name). Because they pass through Jinja untouched (single braces aren't Jinja syntax), the rendered template is the format string itself.

def get_avg_function_template(self) -> str:
    return "AVG({x})"                     # {x} is a literal — NOT a Jinja variable

def get_is_gt_expression_template(self) -> str:
    return "{x} > {y}"                    # two placeholders

Parameterized templates

These receive named Jinja variables ({{ var }}) that the backend passes as keyword arguments at render time. Use these when the template needs actual values to produce correct SQL.

def get_casting_to_numeric_expression_template(self) -> str:
    return "CAST({{ expression }} AS NUMERIC)"

def add_from_clause_template(self) -> str:
    return "{{ select_clause }} FROM {{ from_expression }}"

Some templates are hybrid — they combine {x} placeholders with Jinja variables:

def get_in_past_days_expression_template(self) -> str:
    return "{x} >= CURRENT_DATE - INTERVAL '{{ days }} days'"
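The two rendering stages of a hybrid template can be traced in plain Python. In this sketch, render_jinja_vars is a deliberately tiny stand-in for Jinja that only substitutes {{ name }} expressions; the real backend uses jinja2's sandboxed environment. The second stage is ordinary str.format, as described under Placeholder templates. The column name order_date is made up for illustration.

```python
import re


def render_jinja_vars(template: str, **variables) -> str:
    """Stand-in for Jinja: replace each {{ name }} with str(variables[name]); leaves {x} untouched."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(variables[m.group(1)]), template)


template = "{x} >= CURRENT_DATE - INTERVAL '{{ days }} days'"

# Stage 1: Jinja render. Single-brace {x} is not Jinja syntax, so it survives unchanged.
stage1 = render_jinja_vars(template, days=7)
# Stage 2: the backend fills the format placeholder with a field name.
sql = stage1.format(x="order_date")
print(sql)  # order_date >= CURRENT_DATE - INTERVAL '7 days'
```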

Static templates

No variables at all — the rendered output is always the same string.

def current_timestamp_func_template(self) -> str:
    return "CURRENT_TIMESTAMP()"

Boolean capability flags are also templates that render to "true" or "false":

def supports_literal_select_template(self) -> str:
    return "true"  # SELECT 1 works without FROM
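Because flags render as strings, whatever consumes them has to turn "true"/"false" back into a boolean. The strict parser below is a hypothetical sketch, not the backend's code. It raises an error on anything else, so a typo such as "yes" fails loudly instead of silently reading as false. It trims whitespace and ignores case, which may be more lenient than the real backend, so returning exactly "true" or "false" remains the safe choice.

```python
def parse_flag(rendered: str) -> bool:
    """Convert a rendered capability flag to bool, rejecting anything but true/false."""
    value = rendered.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"capability flag must render to 'true' or 'false', got {rendered!r}")
```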

Each method's docstring documents which pattern it uses and what variables it expects. Read the docstring before implementing.

How Tests Work

Tests use the ql fixture (a QueryTestHelper instance) that bridges your connector and templates:

@pytest.mark.template(func="get_avg_function_template")
def test_avg(ql):
    data = [{"val": 10}, {"val": 20}, {"val": 30}]
    # Placeholder templates: render first, then .format() to substitute {x}
    avg_expr = ql.render(ql.templates.get_avg_function_template).format(x="val")
    result = ql.select_from_data_source(data, avg_expr)
    assert float(result) == pytest.approx(20.0)

@pytest.mark.template(func="get_casting_to_numeric_expression_template")
def test_cast_numeric(ql):
    data = [{"val": "42"}]
    # Parameterized templates: pass Jinja variables as keyword arguments
    cast_expr = ql.render(ql.templates.get_casting_to_numeric_expression_template, expression="val")
    result = ql.select_from_data_source(data, cast_expr)
    assert float(result) == pytest.approx(42.0)

The helper builds CTEs from Python dicts, renders templates, executes queries against your real database, and validates results.
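The CTE step is the least obvious part of that description. Here is a rough sketch of the idea that uses sqlite3 as a stand-in database. It is not QueryTestHelper's actual code. The CTE name data_source, the UNION ALL layout, and the literal quoting are assumptions, and the standalone select_from_data_source function here only mirrors the name of ql.select_from_data_source. A real implementation would run through your connector rather than sqlite3.

```python
import sqlite3
from typing import Any


def sql_literal(value: Any) -> str:
    # Minimal literal rendering for the sketch: numbers as-is, strings single-quoted and escaped.
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_query(data: list[dict[str, Any]], select_expr: str) -> str:
    """Turn rows of dicts into a UNION ALL CTE and select an expression from it."""
    columns = list(data[0])
    rows = [
        "SELECT " + ", ".join(f"{sql_literal(row[c])} AS {c}" for c in columns)
        for row in data
    ]
    return f"WITH data_source AS ({' UNION ALL '.join(rows)}) SELECT {select_expr} FROM data_source"


def select_from_data_source(data: list[dict[str, Any]], select_expr: str) -> Any:
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(build_query(data, select_expr)).fetchone()[0]
    finally:
        conn.close()


avg_expr = "AVG({x})".format(x="val")  # a placeholder template after .format(), as in test_avg
print(select_from_data_source([{"val": 10}, {"val": 20}, {"val": 30}], avg_expr))  # 20.0
```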

Functional Validation Tests

The standard tests verify that your metadata templates return correct data types, but they don't verify that the data is real-time. Functional validation tests catch stale sources (e.g. statistics tables that only update when stats are collected) by making actual database changes and verifying your metadata queries detect them.

How it works

The tests create a test table, run metadata collection, mutate the table (insert rows, add columns), run collection again, and assert the changes are detected.
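The create, collect, mutate, and re-collect cycle can be sketched end to end against sqlite3. Here, "metadata collection" is just sqlite's own catalog (sqlite_master and PRAGMA table_info) plus a COUNT(*). These stand in for the metadata templates you would write for your database. The table name is borrowed from the FunctionalTestOperations example below, and none of this is the suite's real code. Because COUNT(*) always reflects the live table, this sketch passes trivially. The point of the real tests is to catch metadata sources that lag.

```python
import sqlite3

TABLE = "pandora_functional_test"


def collect_metadata(conn: sqlite3.Connection) -> dict:
    """Stand-in for metadata collection: tables, their columns, and row counts."""
    tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    snapshot = {}
    for table in tables:
        columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]  # r[1] is the column name
        row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        snapshot[table] = {"columns": columns, "row_count": row_count}
    return snapshot


conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (id INTEGER PRIMARY KEY, value TEXT)")
before = collect_metadata(conn)

# Mutate: insert rows and add a column, mirroring what the functional tests do.
conn.executemany(f"INSERT INTO {TABLE} (value) VALUES (?)", [("row_1",), ("row_2",)])
conn.execute(f"ALTER TABLE {TABLE} ADD COLUMN extra TEXT")
after = collect_metadata(conn)
conn.close()

# The same kind of checks the tests make: volume increased, new column detected.
assert after[TABLE]["row_count"] > before[TABLE]["row_count"]
assert "extra" in after[TABLE]["columns"] and "extra" not in before[TABLE]["columns"]
```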

Implementing FunctionalTestOperations

Add a FunctionalTestOperations class to your connector.py. All you need is a table identifier and Jinja templates for basic DDL/DML operations:

class FunctionalTestOperations:
    def get_test_table_identifier(self) -> tuple:
        return ("my_database", "my_schema", "pandora_functional_test")

    def create_test_table_template(self) -> str:
        return "CREATE TABLE {{ schema }}.{{ table }} (id SERIAL PRIMARY KEY, value TEXT)"

    def insert_rows_template(self) -> str:
        return "INSERT INTO {{ schema }}.{{ table }} (value) SELECT 'row_' || g FROM generate_series(1, {{ num_rows }}) g"

    def add_column_template(self) -> str:
        return "ALTER TABLE {{ schema }}.{{ table }} ADD COLUMN {{ column_name }} {{ column_type }}"

    def drop_column_template(self) -> str:
        return "ALTER TABLE {{ schema }}.{{ table }} DROP COLUMN {{ column_name }}"

    def drop_test_table_template(self) -> str:
        return "DROP TABLE IF EXISTS {{ schema }}.{{ table }}"

    def create_lineage_query_template(self) -> str:
        return "SELECT * FROM {{ schema }}.{{ table }} WHERE 1=0"

get_test_table_identifier() returns (database, schema, table) — the single source of truth for the test table identity. The framework injects these as {{ database }}, {{ schema }}, and {{ table }} into every template, so the table name in the SQL always matches what the tests look for in metadata results.
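One way to picture that injection is a renderer that always receives the identifier tuple alongside a template's own variables. Both functions below are hypothetical. render_jinja_vars is a minimal stand-in for Jinja that only handles {{ name }} substitution, and the framework's real injection code is not shown in this README. The column name extra_col is made up for illustration.

```python
import re


def render_jinja_vars(template: str, **variables) -> str:
    # Minimal stand-in for Jinja: substitutes {{ name }} only.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(variables[m.group(1)]), template)


def render_with_identifier(template: str, identifier: tuple, **extra) -> str:
    database, schema, table = identifier
    # Every template gets the same three names, so the SQL and the metadata lookups cannot drift apart.
    return render_jinja_vars(template, database=database, schema=schema, table=table, **extra)


identifier = ("my_database", "my_schema", "pandora_functional_test")
print(render_with_identifier("DROP TABLE IF EXISTS {{ schema }}.{{ table }}", identifier))
# DROP TABLE IF EXISTS my_schema.pandora_functional_test
print(render_with_identifier(
    "ALTER TABLE {{ schema }}.{{ table }} ADD COLUMN {{ column_name }} {{ column_type }}",
    identifier, column_name="extra_col", column_type="TEXT",
))
```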

What the tests verify

| Test | What it validates |
|---|---|
| test_table_discovery_after_create | New table appears in metadata |
| test_table_discovery_after_drop | Dropped table disappears from metadata |
| test_volume_change_after_insert | row_count increases after insert |
| test_byte_count_change_after_insert | byte_count increases after insert |
| test_freshness_change_after_insert | last_update_time advances after insert |
| test_schema_change_after_add_column | New column appears in column metadata |
| test_schema_change_after_drop_column | Dropped column disappears from column metadata |
| test_query_log_capture | Executed query appears in query logs |

Tests auto-skip when stubs are not implemented or when the relevant feature (row_count, freshness, columns, query logs) is not supported by your connector.

Running functional tests

CONNECTOR=<name> docker compose run --rm test -m functional

Advanced Usage

Testing with a local agent build

By default, generate_agent_image.py pulls the public montecarlodata/agent image from Docker Hub as the base. If you need to test against a local or unreleased version of the agent, you can build the apollo-agent image locally and use --base-image to point at it:

# Clone and build the agent locally
git clone https://github.com/monte-carlo-data/apollo-agent.git
cd apollo-agent
docker build -t local-agent .

# Use the local build as the base for your custom image
cd /path/to/custom-connector-setup
python scripts/generate_agent_image.py --base-image local-agent

This is useful for debugging agent-side behavior or verifying your connector works with in-development agent changes before they're published.
