Skip to content

End-to-End Tests for Parquet Export Pipeline and GraphQL Endpoint #572

@djwhitt

Description

@djwhitt

End-to-End Tests for Parquet Export Pipeline and GraphQL Endpoint

Problem

The Parquet export pipeline and GraphQL endpoint lack comprehensive end-to-end test coverage. Current testing includes:

  • Parquet: Only a bash script (scripts/tests/parquet/test-clickhouse-integration) that requires ClickHouse
  • GraphQL: Minimal coverage in indexing.test.ts - only block height queries and basic transaction listing

This gap creates risk for regressions and makes it difficult to validate changes to these critical components.

Scope

Parquet Export Pipeline

API Endpoints:

  • POST /ar-io/admin/export-parquet - Trigger export
  • GET /ar-io/admin/export-parquet/status - Check export status

Test Scenarios:

  1. Basic Export Flow

    • Index known transactions via queue-tx/queue-bundle
    • Trigger export with specific height range
    • Poll status until complete
    • Verify Parquet files created in output directory
    • Verify file naming convention
  2. Data Integrity

    • Export blocks/transactions/tags to Parquet
    • Use DuckDB to query the Parquet files
    • Compare row counts and field values against SQLite source
  3. Configuration Options

    • skipL1Transactions: true/false
    • skipL1Tags: true/false
    • maxFileRows file splitting
    • Different height ranges
  4. Error Handling

    • Invalid parameters (negative heights, missing outputDir)
    • Export while another is running
    • Invalid height ranges (start > end)
  5. Status Tracking

    • Verify status transitions: not_startedrunningcompleted
    • Verify error status on failure

GraphQL Endpoint

Test Scenarios:

  1. Transaction Queries

    • Single transaction by ID
    • Filter by owners
    • Filter by recipients
    • Filter by tags (name/value pairs)
    • Filter by bundledIn
    • Filter by block height range (min/max)
    • Combined filters
  2. Block Queries

    • Single block by ID
    • Filter by height range
    • Verify block fields (id, height, timestamp, previous)
  3. Pagination

    • Cursor-based navigation with first and after
    • Verify hasNextPage accuracy
    • Page size limits (10-1000 enforcement)
    • Both HEIGHT_ASC and HEIGHT_DESC sort orders
  4. Field Resolvers

    • owner.address and owner.key
    • quantity and fee (AR/winston conversion)
    • data.size and data.type
    • bundledIn.id for data items
    • signature (async fetching from parent)
  5. Data Item Relationships

    • Query data items via bundledIn filter
    • Verify parent/child relationships
    • Nested bundles

Test Data Strategy

Recommended Approach: Hybrid

Test Type Data Source
Parquet Export Real blocks via START_HEIGHT/STOP_HEIGHT + known bundles
GraphQL Blocks Real blocks via height range
GraphQL Transactions Known bundles with documented attributes
GraphQL Filters Mix of real bundles + synthetic data items for edge cases

Known Test Bundles (Already Used in Codebase)

Bundle ID Data Items Features
kJA49GtBVUWex2yiRKX1KSDbCE6I2xGicR-62_pnJ_c 19 Nested bundles (2-3 levels)
FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY 3 Mixed signature types (Arweave, Ethereum, Solana)
C7lP_aOvx4jXyFWBtJCrzTavK1gf5xfwvf5ML6I4msk 6 Standard bundle
-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ 79 Large bundle

Synthetic Data Items

For filter edge cases, use /ar-io/admin/queue-data-item to create data items with controlled attributes (owners, tags, etc.).

Pre-work Required

Document the attributes of existing test bundles:

  • Owner addresses
  • Data item IDs and their characteristics
  • Tags present on each data item
  • Signature types

Implementation

New Test Files

test/end-to-end/
├── parquet-export.test.ts       # Parquet export pipeline tests
├── graphql-transactions.test.ts # GraphQL transaction query tests
├── graphql-blocks.test.ts       # GraphQL block query tests
├── graphql-pagination.test.ts   # GraphQL pagination tests
└── utils.ts                     # Add new helper functions

New Utility Functions (in utils.ts)

// Parquet export helpers
export const triggerParquetExport = async (params: ParquetExportParams) => {...};
export const getParquetExportStatus = async () => {...};
export const waitForParquetExportComplete = async (timeout?: number) => {...};

// Parquet verification (using DuckDB)
export const queryParquetFile = async (filePath: string, sql: string) => {...};
export const verifyParquetRowCount = async (dir: string, table: string, expected: number) => {...};

// GraphQL helpers
export const fetchGql = async (query: string, variables?: object) => {...};
export const fetchGqlTransactions = async (filters: TransactionFilters) => {...};
export const fetchGqlBlocks = async (filters: BlockFilters) => {...};

Docker Compose Additions

  • Mount additional volume for Parquet output directory in tests
  • Consider optional ClickHouse service for full pipeline testing

Acceptance Criteria

Parquet Export

  • Test basic export flow with status polling
  • Test data integrity by comparing Parquet output to SQLite source
  • Test all configuration options (skipL1Transactions, skipL1Tags, maxFileRows)
  • Test error handling for invalid parameters
  • Test status transitions

GraphQL

  • Test single transaction query by ID
  • Test transaction filters (owners, recipients, tags, bundledIn, block height)
  • Test single block query by ID
  • Test block filters (height range)
  • Test pagination (first, after, hasNextPage)
  • Test both sort orders (HEIGHT_ASC, HEIGHT_DESC)
  • Test field resolvers (owner, quantity, fee, data, bundledIn, signature)
  • Test nested bundle relationships

Infrastructure

  • Document test bundle attributes
  • Add utility functions to test/end-to-end/utils.ts
  • Ensure tests run reliably in CI

Technical Notes

Parquet Verification with DuckDB

The codebase already uses duckdb-async for the export pipeline. Use the same library to verify exported files:

import { Database } from 'duckdb-async';

const db = await Database.create(':memory:');
const rows = await db.all(`SELECT COUNT(*) as count FROM '${parquetDir}/blocks/*.parquet'`);

GraphQL Query Examples

# Filter by bundledIn
query {
  transactions(bundledIn: ["FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY"]) {
    edges {
      node {
        id
        bundledIn { id }
      }
    }
  }
}

# Pagination
query {
  transactions(first: 10, after: "cursor", sort: HEIGHT_DESC) {
    pageInfo { hasNextPage }
    edges {
      cursor
      node { id }
    }
  }
}

Test Data Dependencies

Tests depend on fetching real transactions from arweave.net/trusted gateways. Consider:

  • Retry logic for transient network failures
  • Documenting which transactions are required
  • Potential caching/fixtures for offline testing (future enhancement)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions