End-to-End Tests for Parquet Export Pipeline and GraphQL Endpoint
Problem
The Parquet export pipeline and GraphQL endpoint lack comprehensive end-to-end test coverage. Current testing includes:
- Parquet: Only a bash script (`scripts/tests/parquet/test-clickhouse-integration`) that requires ClickHouse
- GraphQL: Minimal coverage in `indexing.test.ts`, limited to block height queries and basic transaction listing
This gap creates risk for regressions and makes it difficult to validate changes to these critical components.
Scope
Parquet Export Pipeline
API Endpoints:
- `POST /ar-io/admin/export-parquet` - Trigger export
- `GET /ar-io/admin/export-parquet/status` - Check export status
Test Scenarios:
- Basic Export Flow
  - Index known transactions via `queue-tx`/`queue-bundle`
  - Trigger export with specific height range
  - Poll status until complete
  - Verify Parquet files created in output directory
  - Verify file naming convention
- Data Integrity
  - Export blocks/transactions/tags to Parquet
  - Use DuckDB to query the Parquet files
  - Compare row counts and field values against SQLite source
- Configuration Options
  - `skipL1Transactions`: true/false
  - `skipL1Tags`: true/false
  - `maxFileRows` file splitting
  - Different height ranges
- Error Handling
  - Invalid parameters (negative heights, missing `outputDir`)
  - Export while another is running
  - Invalid height ranges (start > end)
- Status Tracking
  - Verify status transitions: `not_started` → `running` → `completed`
  - Verify error status on failure
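The status-tracking scenario could be exercised with a small polling helper that records each distinct status it observes, so a test can assert on the full transition sequence. The sketch below is only an illustration: `pollExportStatus` is a hypothetical name, the status getter is injected rather than tied to the real status endpoint's response shape, and `'errored'` is an assumed name for the failure state, which this issue does not specify.

```typescript
// Status names follow the transitions listed above; 'errored' is an ASSUMED
// name for the failure state and should be replaced with the real one.
type ExportStatus = 'not_started' | 'running' | 'completed' | 'errored';

// Hypothetical helper: polls `getStatus` until a terminal state is reached and
// returns the distinct statuses observed, in order.
async function pollExportStatus(
  getStatus: () => Promise<ExportStatus>,
  { intervalMs = 500, timeoutMs = 60_000 } = {},
): Promise<ExportStatus[]> {
  const seen: ExportStatus[] = [];
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const status = await getStatus();
    // Record only changes, so repeated 'running' polls collapse into one entry.
    if (seen[seen.length - 1] !== status) seen.push(status);
    if (status === 'completed' || status === 'errored') return seen;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error(
    `export did not finish within ${timeoutMs} ms; saw: ${seen.join(' -> ')}`,
  );
}
```

In a test, `getStatus` would wrap a request to `GET /ar-io/admin/export-parquet/status`. The earliest states may already have passed by the first poll, so asserting that the sequence ends in `completed` is likely more robust than requiring every intermediate state.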
GraphQL Endpoint
Test Scenarios:
- Transaction Queries
  - Single transaction by ID
  - Filter by owners
  - Filter by recipients
  - Filter by tags (name/value pairs)
  - Filter by `bundledIn`
  - Filter by block height range (`min`/`max`)
  - Combined filters
- Block Queries
  - Single block by ID
  - Filter by height range
  - Verify block fields (id, height, timestamp, previous)
- Pagination
  - Cursor-based navigation with `first` and `after`
  - Verify `hasNextPage` accuracy
  - Page size limits (10-1000 enforcement)
  - Both `HEIGHT_ASC` and `HEIGHT_DESC` sort orders
- Field Resolvers
  - `owner.address` and `owner.key`
  - `quantity` and `fee` (AR/winston conversion)
  - `data.size` and `data.type`
  - `bundledIn.id` for data items
  - `signature` (async fetching from parent)
- Data Item Relationships
  - Query data items via `bundledIn` filter
  - Verify parent/child relationships
  - Nested bundles
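For the pagination scenarios, one possible approach is a helper that walks every page with `first`/`after` and collects the IDs, so a test can check for gaps, duplicates, and a correct final `hasNextPage`. This is a minimal sketch: `collectAllIds` and the injected `fetchPage` callback are hypothetical names, and the `Page` shape mirrors only the `pageInfo`/`edges` fields used in this issue's query examples.

```typescript
// Minimal response shape, limited to the fields the pagination tests need.
interface Edge {
  cursor: string;
  node: { id: string };
}
interface Page {
  pageInfo: { hasNextPage: boolean };
  edges: Edge[];
}

// Hypothetical helper: follows cursors until `hasNextPage` is false and
// returns all node IDs in the order the server produced them.
async function collectAllIds(
  fetchPage: (first: number, after?: string) => Promise<Page>,
  first = 10,
): Promise<string[]> {
  const ids: string[] = [];
  let after: string | undefined;

  for (;;) {
    const page = await fetchPage(first, after);
    for (const edge of page.edges) {
      ids.push(edge.node.id);
      after = edge.cursor; // after the loop, holds the last cursor of the page
    }
    // Stop on the last page; the empty-page guard avoids looping forever if
    // the server reports hasNextPage: true but returns no edges.
    if (!page.pageInfo.hasNextPage || page.edges.length === 0) return ids;
  }
}
```

A test could then assert that `new Set(ids).size === ids.length` (no item repeated across pages) and that the IDs collected under `HEIGHT_ASC` are the reverse of those under `HEIGHT_DESC`, assuming ordering within a single block height is deterministic.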
Test Data Strategy
Recommended Approach: Hybrid
| Test Type | Data Source |
|---|---|
| Parquet Export | Real blocks via START_HEIGHT/STOP_HEIGHT + known bundles |
| GraphQL Blocks | Real blocks via height range |
| GraphQL Transactions | Known bundles with documented attributes |
| GraphQL Filters | Mix of real bundles + synthetic data items for edge cases |
Known Test Bundles (Already Used in Codebase)
| Bundle ID | Data Items | Features |
|---|---|---|
| `kJA49GtBVUWex2yiRKX1KSDbCE6I2xGicR-62_pnJ_c` | 19 | Nested bundles (2-3 levels) |
| `FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY` | 3 | Mixed signature types (Arweave, Ethereum, Solana) |
| `C7lP_aOvx4jXyFWBtJCrzTavK1gf5xfwvf5ML6I4msk` | 6 | Standard bundle |
| `-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ` | 79 | Large bundle |
Synthetic Data Items
For filter edge cases, use /ar-io/admin/queue-data-item to create data items with controlled attributes (owners, tags, etc.).
Pre-work Required
Document the attributes of existing test bundles:
- Owner addresses
- Data item IDs and their characteristics
- Tags present on each data item
- Signature types
Implementation
New Test Files
test/end-to-end/
├── parquet-export.test.ts # Parquet export pipeline tests
├── graphql-transactions.test.ts # GraphQL transaction query tests
├── graphql-blocks.test.ts # GraphQL block query tests
├── graphql-pagination.test.ts # GraphQL pagination tests
└── utils.ts # Add new helper functions
New Utility Functions (in utils.ts)
// Parquet export helpers
export const triggerParquetExport = async (params: ParquetExportParams) => {...};
export const getParquetExportStatus = async () => {...};
export const waitForParquetExportComplete = async (timeout?: number) => {...};
// Parquet verification (using DuckDB)
export const queryParquetFile = async (filePath: string, sql: string) => {...};
export const verifyParquetRowCount = async (dir: string, table: string, expected: number) => {...};
// GraphQL helpers
export const fetchGql = async (query: string, variables?: object) => {...};
export const fetchGqlTransactions = async (filters: TransactionFilters) => {...};
export const fetchGqlBlocks = async (filters: BlockFilters) => {...};
Docker Compose Additions
- Mount additional volume for Parquet output directory in tests
- Consider optional ClickHouse service for full pipeline testing
Acceptance Criteria
Parquet Export
- Test basic export flow with status polling
- Test data integrity by comparing Parquet output to SQLite source
- Test all configuration options (skipL1Transactions, skipL1Tags, maxFileRows)
- Test error handling for invalid parameters
- Test status transitions
GraphQL
- Test single transaction query by ID
- Test transaction filters (owners, recipients, tags, bundledIn, block height)
- Test single block query by ID
- Test block filters (height range)
- Test pagination (first, after, hasNextPage)
- Test both sort orders (HEIGHT_ASC, HEIGHT_DESC)
- Test field resolvers (owner, quantity, fee, data, bundledIn, signature)
- Test nested bundle relationships
Infrastructure
- Document test bundle attributes
- Add utility functions to `test/end-to-end/utils.ts`
- Ensure tests run reliably in CI
Technical Notes
Parquet Verification with DuckDB
The codebase already uses duckdb-async for the export pipeline. Use the same library to verify exported files:
import { Database } from 'duckdb-async';
const db = await Database.create(':memory:');
const rows = await db.all(`SELECT COUNT(*) as count FROM '${parquetDir}/blocks/*.parquet'`);
GraphQL Query Examples
# Filter by bundledIn
query {
transactions(bundledIn: ["FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY"]) {
edges {
node {
id
bundledIn { id }
}
}
}
}
# Pagination
query {
transactions(first: 10, after: "cursor", sort: HEIGHT_DESC) {
pageInfo { hasNextPage }
edges {
cursor
node { id }
}
}
}
Test Data Dependencies
Tests depend on fetching real transactions from arweave.net/trusted gateways. Consider:
- Retry logic for transient network failures
- Documenting which transactions are required
- Potential caching/fixtures for offline testing (future enhancement)
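The retry logic suggested above might look something like the following generic wrapper with exponential backoff. It is a sketch under assumptions rather than existing project code: `withRetry`, its option names, and the default values are all hypothetical.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// With the defaults, the waits between attempts are 250 ms, then 500 ms.
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 250 } = {},
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Sleep only if another attempt remains.
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }

  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

Wrapping only the calls that reach external gateways (for example, queueing a known bundle) keeps genuine assertion failures from being retried and masked.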