
Proposal: Build Mini-CAT in Python CDK #287

Open
@devin-ai-integration



Overview

Following a conversation with @aaronsteers, this issue proposes incrementally migrating CAT tests from the monorepo to the Python CDK. Running tests directly against Python classes and functions, without requiring Docker containers, makes debugging much easier. The same tests should still be runnable via Docker or the CLI for non-Python connectors, and should support YAML-based sources, including those with custom components.py files.

Tests should be runnable via CLI and/or Docker (for parity with CAT today), and should also be runnable as a pytest suite, for more control and faster debug iteration loops.
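To make the "same tests, different execution modes" idea concrete, here is a minimal sketch of a runner abstraction. All class and function names are hypothetical, not existing CDK APIs. It assumes only that a connector emits one JSON-encoded Airbyte message per stdout line, and it omits mounting config/catalog files into the container, which a real Docker runner would need for check, discover, and read.

```python
import json
import subprocess
from typing import Callable, Iterable, Protocol


class ConnectorRunner(Protocol):
    """Hypothetical interface: run a connector verb and return parsed messages."""

    def run(self, args: list[str]) -> list[dict]: ...


def parse_messages(lines: Iterable[str]) -> list[dict]:
    """Parse stdout lines into message dicts, skipping non-JSON output."""
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # connectors may print non-protocol text; ignore it
        if isinstance(parsed, dict) and "type" in parsed:
            messages.append(parsed)
    return messages


class InProcessRunner:
    """Calls a Python entrypoint directly, so IDE breakpoints work."""

    def __init__(self, entrypoint: Callable[[list[str]], Iterable[str]]):
        self.entrypoint = entrypoint

    def run(self, args: list[str]) -> list[dict]:
        return parse_messages(self.entrypoint(args))


class DockerRunner:
    """Fallback for any language: runs the connector image via the Docker CLI."""

    def __init__(self, image: str):
        self.image = image

    def command(self, args: list[str]) -> list[str]:
        return ["docker", "run", "--rm", "-i", self.image, *args]

    def run(self, args: list[str]) -> list[dict]:
        result = subprocess.run(self.command(args), capture_output=True, text=True, check=False)
        return parse_messages(result.stdout.splitlines())
```

Because both runners return the same parsed messages, every check can be written once and executed either under pytest (in-process) or against a built image.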

Current Test Categories

Connector Acceptance Test (CAT) Checklist

This checklist tracks the tests currently implemented in the CAT framework.

Test Files to Examine:

  • test_incremental.py
  • test_full_refresh.py
  • test_core.py

Test Categories:

1. Specification Tests (test_core.py TestSpec)

  1. Configuration Schema Validation

    • Validates connector configuration against JSON schema
    • Checks enum usage and uniqueness
    • Validates oneOf usage in specs
    • Tests required vs optional fields
    • Validates property types and formats
    • Checks date patterns and formats
  2. Secret Handling

    • Verifies proper marking of secret fields
    • Ensures secrets never appear in outputs
    • Validates OAuth flow parameters
    • Tests OAuth as default auth method
  3. Schema Validation

    • Checks property types (no arrays at root)
    • Validates object structures
    • Ensures backward compatibility
    • Verifies additional properties handling
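As an illustration of the secret-handling checks, the sketch below walks a spec's connectionSpecification and reports string properties that look like credentials but are not marked with airbyte_secret: true. The name-based heuristic (SECRET_NAME_PATTERN) is an assumption for illustration, not CAT's actual rule.

```python
import re
from typing import Any

# Hypothetical heuristic: property names that usually hold credentials.
SECRET_NAME_PATTERN = re.compile(r"(password|secret|token|api_key)", re.IGNORECASE)


def unmarked_secret_fields(schema: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths of secret-looking properties lacking `airbyte_secret: true`."""
    problems: list[str] = []
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}" if path else name
        looks_secret = SECRET_NAME_PATTERN.search(name) and prop.get("type") == "string"
        if looks_secret and not prop.get("airbyte_secret", False):
            problems.append(prop_path)
        # Recurse into nested objects and into each oneOf branch (e.g. auth method choices).
        problems.extend(unmarked_secret_fields(prop, prop_path))
        for index, branch in enumerate(prop.get("oneOf", [])):
            problems.extend(unmarked_secret_fields(branch, f"{prop_path}[{index}]"))
    return problems
```

Recursing into oneOf branches matters because OAuth and API-key options are typically modeled as alternative branches of a credentials object.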

2. Connection Tests (test_core.py TestConnection)

  1. Basic Connection Check
    • Tests successful connection scenarios
    • Validates error handling
    • Verifies connection status messages
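A connection check reduces to inspecting the CONNECTION_STATUS message that `check` emits. A minimal sketch, assuming messages have already been parsed into dicts following the Airbyte protocol's connectionStatus.status field (SUCCEEDED or FAILED); the helper name is hypothetical.

```python
from typing import Iterable

VALID_STATUSES = {"SUCCEEDED", "FAILED"}


def connection_status(messages: Iterable[dict]) -> str:
    """Return the single CONNECTION_STATUS outcome emitted by `check`."""
    statuses = [
        m["connectionStatus"]["status"] for m in messages if m.get("type") == "CONNECTION_STATUS"
    ]
    if len(statuses) != 1:
        raise AssertionError(f"expected exactly one CONNECTION_STATUS message, got {len(statuses)}")
    if statuses[0] not in VALID_STATUSES:
        raise AssertionError(f"unexpected status {statuses[0]!r}")
    return statuses[0]
```

A test would then assert SUCCEEDED for a valid config and FAILED (rather than a crash) for a deliberately invalid one.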

3. Discovery Tests (test_core.py TestDiscovery)

  1. Catalog Structure

    • Verifies stream discovery
    • Validates JSON schemas
    • Ensures unique stream names
    • Checks cursor field definitions
    • Validates primary key existence
  2. Schema Compatibility

    • Tests backward compatibility
    • Validates supported data types
    • Checks sync mode support
    • Verifies primary key data types
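Several catalog-structure checks can be expressed as one pass over the discovered catalog. The sketch below assumes the catalog is a dict shaped like the Airbyte protocol's AirbyteCatalog (streams with name, json_schema, supported_sync_modes, default_cursor_field, source_defined_primary_key); it only checks the top-level element of each cursor or key path, which is a simplification.

```python
from collections import Counter


def catalog_problems(catalog: dict) -> list[str]:
    """Collect structural problems in a discovered catalog (sketch, not exhaustive)."""
    problems: list[str] = []
    streams = catalog.get("streams", [])
    counts = Counter(s["name"] for s in streams)
    problems += [f"duplicate stream name: {name}" for name, n in counts.items() if n > 1]
    for stream in streams:
        properties = stream.get("json_schema", {}).get("properties", {})
        # Cursor and primary-key paths must start at a field declared in the schema.
        cursor = stream.get("default_cursor_field", [])
        if cursor and cursor[0] not in properties:
            problems.append(f"{stream['name']}: cursor field {cursor[0]!r} not in schema")
        for key_path in stream.get("source_defined_primary_key", []):
            if key_path and key_path[0] not in properties:
                problems.append(f"{stream['name']}: primary key {key_path!r} not in schema")
        if not stream.get("supported_sync_modes"):
            problems.append(f"{stream['name']}: no supported_sync_modes")
    return problems
```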

4. Basic Read Tests (test_core.py TestBasicRead)

  1. Record Validation

    • Checks record structure against schema
    • Validates data types and formats
    • Verifies required fields presence
    • Tests empty streams handling
  2. Stream Status

    • Validates stream status messages
    • Checks status progression (STARTED → RUNNING → COMPLETE)
    • Verifies state message format
  3. Error Handling

    • Tests failure scenarios
    • Validates error trace messages
    • Checks connector behavior with invalid configs
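The stream status progression check can be sketched as grouping STREAM_STATUS trace messages per stream and validating each sequence. This assumes the Airbyte protocol's TRACE message shape (trace.stream_status.stream_descriptor.name and trace.stream_status.status). Allowing STARTED to go directly to COMPLETE, for streams that emit no records, is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Iterable


def stream_status_sequences(messages: Iterable[dict]) -> dict[str, list[str]]:
    """Group STREAM_STATUS trace messages by stream name, preserving order."""
    sequences: dict[str, list[str]] = defaultdict(list)
    for message in messages:
        trace = message.get("trace", {}) if message.get("type") == "TRACE" else {}
        if trace.get("type") == "STREAM_STATUS":
            status = trace["stream_status"]
            sequences[status["stream_descriptor"]["name"]].append(status["status"])
    return dict(sequences)


def is_valid_progression(statuses: list[str]) -> bool:
    """STARTED first, COMPLETE last, only RUNNING in between (RUNNING may be absent)."""
    return (
        len(statuses) >= 2
        and statuses[0] == "STARTED"
        and statuses[-1] == "COMPLETE"
        and all(s == "RUNNING" for s in statuses[1:-1])
    )
```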

5. Full Refresh Tests (test_full_refresh.py)

  1. Sequential Read Validation
    • Verifies identical data between syncs
    • Validates record order consistency
    • Checks emitted_at timestamp progression
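The sequential-read check compares two full-refresh syncs stream by stream. A minimal sketch, assuming RECORD messages carry record.stream, record.data, and record.emitted_at as in the Airbyte protocol. It compares payloads in order (ignoring emitted_at), and its emitted_at rule, that every record of the second read is stamped no earlier than the last record of the first, is an assumed interpretation of "timestamp progression", not necessarily CAT's exact rule.

```python
from typing import Iterable


def records_by_stream(messages: Iterable[dict]) -> dict[str, list[dict]]:
    """Group RECORD payloads by stream name, preserving emission order."""
    grouped: dict[str, list[dict]] = {}
    for message in messages:
        if message.get("type") == "RECORD":
            record = message["record"]
            grouped.setdefault(record["stream"], []).append(record)
    return grouped


def compare_sequential_reads(first: list[dict], second: list[dict]) -> list[str]:
    problems = []
    first_records, second_records = records_by_stream(first), records_by_stream(second)
    for stream in sorted(first_records.keys() | second_records.keys()):
        a, b = first_records.get(stream, []), second_records.get(stream, [])
        # Compare data and order only; emitted_at legitimately differs between syncs.
        if [r["data"] for r in a] != [r["data"] for r in b]:
            problems.append(f"{stream}: record data or order differs between reads")
        if a and b and min(r["emitted_at"] for r in b) < max(r["emitted_at"] for r in a):
            problems.append(f"{stream}: second read has emitted_at earlier than the first read")
    return problems
```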

6. Incremental Sync Tests (test_incremental.py)

  1. State Management

    • Tests state message emission
    • Validates cursor field handling
    • Verifies state checkpoints
    • Tests abnormal state values
  2. Record Processing

    • Checks record progression
    • Validates incremental filtering
    • Tests slice management
    • Verifies data consistency
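One core incremental check is: read once, take the latest state checkpoint, read again from that state, and confirm no record falls before the checkpoint. The sketch assumes per-stream STATE messages in the Airbyte protocol shape (state.type == "STREAM", state.stream.stream_descriptor, state.stream.stream_state). It also assumes the stream_state stores the cursor under the cursor field's name, which varies by connector. Cursor values are compared with `<`, so ISO-8601 strings of a uniform format compare correctly, and boundary records equal to the checkpoint are tolerated.

```python
from typing import Any, Iterable, Optional


def latest_stream_state(messages: Iterable[dict], stream: str) -> Optional[dict]:
    """Return the last per-stream state emitted for `stream`, or None."""
    latest = None
    for message in messages:
        state = message.get("state", {}) if message.get("type") == "STATE" else {}
        if state.get("type") == "STREAM" and state["stream"]["stream_descriptor"]["name"] == stream:
            latest = state["stream"]["stream_state"]
    return latest


def records_before_checkpoint(
    messages: Iterable[dict], stream: str, cursor_field: str, checkpoint: Any
) -> list[Any]:
    """Cursor values in a follow-up read that are older than the checkpoint (should be empty)."""
    return [
        message["record"]["data"][cursor_field]
        for message in messages
        if message.get("type") == "RECORD"
        and message["record"]["stream"] == stream
        and message["record"]["data"][cursor_field] < checkpoint
    ]
```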

7. Connector Attributes Tests (test_core.py TestConnectorAttributes)

  1. Metadata Validation
    • Checks primary key definitions
    • Validates allowed hosts configuration
    • Verifies suggested streams setup
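Connector attribute checks combine connector metadata with the discovered catalog. This sketch assumes metadata.yaml has already been parsed into a dict with suggested streams under data.suggestedStreams.streams; treat that layout, and the rule that every stream must define a source_defined_primary_key, as assumptions for illustration. The allowed-hosts check is left out.

```python
def connector_attribute_problems(metadata: dict, catalog: dict) -> list[str]:
    """Cross-check metadata against the discovered catalog (sketch)."""
    problems = []
    streams = catalog.get("streams", [])
    names = {s["name"] for s in streams}
    for stream in streams:
        if not stream.get("source_defined_primary_key"):
            problems.append(f"{stream['name']}: no primary key defined")
    # Assumed metadata layout: data -> suggestedStreams -> streams.
    suggested = metadata.get("data", {}).get("suggestedStreams", {}).get("streams", [])
    problems += [f"suggested stream {n!r} not in discovered catalog" for n in suggested if n not in names]
    return problems
```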

8. Documentation Tests (test_core.py TestConnectorDocumentation)

  1. Structure Validation
    • Checks required sections presence
    • Validates documentation format
    • Verifies content templates
    • Tests link validity
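Documentation checks can start as plain text processing over the connector's Markdown docs. In the sketch below, the REQUIRED_SECTIONS list is hypothetical (the real list would come from the docs template). Links are only extracted here; actually checking their validity would need HTTP requests, which are omitted.

```python
import re

# Hypothetical required headings; the real list would come from the docs template.
REQUIRED_SECTIONS = ("Prerequisites", "Setup guide", "Supported sync modes", "Changelog")
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*#*\s*$", re.MULTILINE)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Required headings (case-insensitive) that do not appear in the document."""
    headings = {h.strip().lower() for h in HEADING.findall(markdown)}
    return [s for s in required if s.lower() not in headings]


def extract_links(markdown: str) -> list[str]:
    """Targets of inline Markdown links, to be checked separately for validity."""
    return LINK.findall(markdown)
```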

Key Implementation Considerations for the CDK:

  1. Modular Test Framework

    • Each test category should be independent
    • Support for selective test execution
    • Configurable test parameters
  2. Environment Management

    • Abstract container dependencies
    • Support for local and containerized testing
    • Flexible resource cleanup
  3. State Handling

    • Generic state management interface
    • Support for different state formats
    • Robust checkpoint management
  4. Schema Validation

    • Reusable schema validators
    • Type checking utilities
    • Format validation helpers

Migration Strategy

Phase 1: Core Validation Layer

  • Implement schema validators
  • Add record structure validation
  • Create type checking utilities
  • Port documentation tests

Phase 2: Test Infrastructure

  • Create modular test runners
  • Add configurable test parameters
  • Implement environment abstraction layer
  • Port specification tests
  • Port basic read tests

Phase 3: State Management

  • Design generic state interfaces
  • Implement checkpoint handling
  • Add state format validation
  • Port incremental sync tests
  • Port full refresh tests

Phase 4: Container Abstraction

  • Abstract Docker dependencies
  • Create flexible test runners
  • Support both local and containerized testing
  • Port connection tests

Benefits

  1. Faster test cycles for Python connectors
  2. Simplified local development
  3. Better integration with IDE tooling
  4. Reduced infrastructure requirements

Implementation Notes

  • No need for backward compatibility
  • Migrated tests should work with:
    • Declarative YAML sources (with and without custom Python components)
    • Python-Based Sources and Destinations
    • Docker-Based Sources and Destinations (Fallback for everything: Java/Kotlin/Python/etc.)
  • Support customization via YAML or Python (via the existing manifest or a new test manifest)
  • Support incremental expansion of test coverage until CAT can eventually be fully deprecated.
  • Provide clear inline docs (esp. class, file, and method-level docstrings)

/cc @aaronsteers

Related spike from a couple quarters ago:

Labels: enhancement (New feature or request)