# End-to-End Tests for Parquet Export Pipeline and GraphQL Endpoint (#572)

## Problem

The Parquet export pipeline and GraphQL endpoint lack comprehensive end-to-end test coverage. Current testing includes:

- **Parquet**: Only a bash script (`scripts/tests/parquet/test-clickhouse-integration`) that requires ClickHouse
- **GraphQL**: Minimal coverage in `indexing.test.ts`, limited to block height queries and basic transaction listing

This gap creates risk for regressions and makes it difficult to validate changes to these critical components.

## Scope

### Parquet Export Pipeline

**API Endpoints:**
- `POST /ar-io/admin/export-parquet` - Trigger export
- `GET /ar-io/admin/export-parquet/status` - Check export status
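
A minimal sketch of how a test helper might build the trigger request. It assumes the endpoint accepts a JSON body with `outputDir`, `startHeight`, `endHeight`, `maxFileRows`, `skipL1Transactions`, and `skipL1Tags`, and that admin routes are authorized with an `Authorization: Bearer <admin key>` header. The `startHeight`/`endHeight` field names, the header scheme, and the helper name `buildParquetExportRequest` are assumptions to check against the route handler. Separating request construction from `fetch` lets the error-handling tests reuse it for deliberately invalid bodies.

```typescript
// Hypothetical parameter shape; verify field names against the export route handler.
interface ParquetExportParams {
  outputDir: string;
  startHeight: number;
  endHeight: number;
  maxFileRows?: number;
  skipL1Transactions?: boolean;
  skipL1Tags?: boolean;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds (but does not send) the POST that triggers an export.
// Assumes admin routes use Bearer-token auth with the admin API key.
function buildParquetExportRequest(
  baseUrl: string,
  adminKey: string,
  params: ParquetExportParams,
): PreparedRequest {
  return {
    url: new URL('/ar-io/admin/export-parquet', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${adminKey}`,
      },
      body: JSON.stringify(params),
    },
  };
}
```

In a test, the result would be sent with `fetch(url, init)`.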

**Test Scenarios:**

1. **Basic Export Flow**
   - Index known transactions via `queue-tx`/`queue-bundle`
   - Trigger export with specific height range
   - Poll status until complete
   - Verify Parquet files created in output directory
   - Verify file naming convention

2. **Data Integrity**
   - Export blocks/transactions/tags to Parquet
   - Use DuckDB to query the Parquet files
   - Compare row counts and field values against SQLite source

3. **Configuration Options**
   - `skipL1Transactions: true/false`
   - `skipL1Tags: true/false`
   - `maxFileRows` file splitting
   - Different height ranges

4. **Error Handling**
   - Invalid parameters (negative heights, missing `outputDir`)
   - Export while another is running
   - Invalid height ranges (start > end)

5. **Status Tracking**
   - Verify status transitions: `not_started` → `running` → `completed`
   - Verify error status on failure
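
The Status Tracking scenario can be checked without timing assumptions by recording every polled status during one export run and validating the transitions afterwards. In this sketch the first three status names come from this issue, while `errored` is an assumed name for the failure state and `isValidStatusSequence` is a hypothetical helper.

```typescript
// 'errored' is an assumed name for the failure state; confirm against the status endpoint.
type ExportStatus = 'not_started' | 'running' | 'completed' | 'errored';

// Allowed next states for a single export run. Terminal states allow no transition.
const allowedNext: Record<ExportStatus, ExportStatus[]> = {
  not_started: ['running'],
  running: ['completed', 'errored'],
  completed: [],
  errored: [],
};

// Returns true if polled statuses follow the expected lifecycle.
// Consecutive duplicates are collapsed because polling often sees 'running' many times.
function isValidStatusSequence(observed: ExportStatus[]): boolean {
  const collapsed = observed.filter((s, i) => i === 0 || s !== observed[i - 1]);
  for (let i = 1; i < collapsed.length; i++) {
    if (!allowedNext[collapsed[i - 1]].includes(collapsed[i])) return false;
  }
  return true;
}
```

The caller should still assert that the final observed status is the expected terminal one.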

### GraphQL Endpoint

**Test Scenarios:**

1. **Transaction Queries**
   - Single transaction by ID
   - Filter by owners
   - Filter by recipients
   - Filter by tags (name/value pairs)
   - Filter by `bundledIn`
   - Filter by block height range (`min`/`max`)
   - Combined filters

2. **Block Queries**
   - Single block by ID
   - Filter by height range
   - Verify block fields (id, height, timestamp, previous)

3. **Pagination**
   - Cursor-based navigation with `first` and `after`
   - Verify `hasNextPage` accuracy
   - Page size limits (10-1000 enforcement)
   - Both `HEIGHT_ASC` and `HEIGHT_DESC` sort orders

4. **Field Resolvers**
   - `owner.address` and `owner.key`
   - `quantity` and `fee` (AR/winston conversion)
   - `data.size` and `data.type`
   - `bundledIn.id` for data items
   - `signature` (async fetching from parent)

5. **Data Item Relationships**
   - Query data items via `bundledIn` filter
   - Verify parent/child relationships
   - Nested bundles
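
For the `quantity` and `fee` resolvers, comparing the `ar` and `winston` fields as exact integers avoids floating-point error and any dependence on how many decimal places the gateway prints. The sketch relies only on 1 AR = 10^12 winston; it assumes the `ar` field is a plain decimal string (no thousands separators), and the helper names are hypothetical.

```typescript
const WINSTON_PER_AR = BigInt('1000000000000'); // 1 AR = 10^12 winston

// Parses a plain decimal AR string (e.g. "0.5" or "1.000000000000") into winston
// exactly, using BigInt so no precision is lost.
function arToWinston(ar: string): bigint {
  const match = /^(\d+)(?:\.(\d{1,12}))?$/.exec(ar.trim());
  if (!match) throw new Error(`Unexpected AR amount format: ${ar}`);
  const whole = BigInt(match[1]);
  // Right-pad the fractional part to 12 digits so "0.5" becomes 500000000000 winston
  const frac = BigInt((match[2] ?? '').padEnd(12, '0'));
  return whole * WINSTON_PER_AR + frac;
}

// True when the `ar` and `winston` fields of an amount describe the same value.
function amountsAgree(amount: { ar: string; winston: string }): boolean {
  return arToWinston(amount.ar) === BigInt(amount.winston);
}
```

A resolver test could then assert `amountsAgree(node.quantity)` and `amountsAgree(node.fee)` for every returned transaction.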

## Test Data Strategy

### Recommended Approach: Hybrid

| Test Type | Data Source |
|-----------|-------------|
| Parquet Export | Real blocks via `START_HEIGHT`/`STOP_HEIGHT` + known bundles |
| GraphQL Blocks | Real blocks via height range |
| GraphQL Transactions | Known bundles with documented attributes |
| GraphQL Filters | Mix of real bundles + synthetic data items for edge cases |

### Known Test Bundles (Already Used in Codebase)

| Bundle ID | Data Items | Features |
|-----------|------------|----------|
| `kJA49GtBVUWex2yiRKX1KSDbCE6I2xGicR-62_pnJ_c` | 19 | Nested bundles (2-3 levels) |
| `FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY` | 3 | Mixed signature types (Arweave, Ethereum, Solana) |
| `C7lP_aOvx4jXyFWBtJCrzTavK1gf5xfwvf5ML6I4msk` | 6 | Standard bundle |
| `-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ` | 79 | Large bundle |

### Synthetic Data Items

For filter edge cases, use `/ar-io/admin/queue-data-item` to create data items with controlled attributes (owners, tags, etc.).

### Pre-work Required

Document the attributes of existing test bundles:
- Owner addresses
- Data item IDs and their characteristics
- Tags present on each data item
- Signature types
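
One way to capture this pre-work so tests can import it is a typed fixture module. In this sketch the interface and key names are hypothetical; the bundle IDs, data-item counts, and features are copied from the Known Test Bundles table, and the per-item fields stay empty until they are actually documented rather than being guessed.

```typescript
// Per-data-item attributes the pre-work should record. Optional until documented.
interface DataItemFixture {
  id: string;
  ownerAddress?: string;
  signatureType?: string;
  tags?: { name: string; value: string }[];
}

interface BundleFixture {
  id: string;
  expectedDataItems: number; // "Data Items" column of the Known Test Bundles table
  features: string;
  dataItems: DataItemFixture[]; // filled in as the pre-work is completed
}

// Keys are hypothetical short names; IDs, counts, and features come from the table.
const KNOWN_BUNDLES: Record<string, BundleFixture> = {
  nested: {
    id: 'kJA49GtBVUWex2yiRKX1KSDbCE6I2xGicR-62_pnJ_c',
    expectedDataItems: 19,
    features: 'Nested bundles (2-3 levels)',
    dataItems: [],
  },
  mixedSignatures: {
    id: 'FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY',
    expectedDataItems: 3,
    features: 'Mixed signature types (Arweave, Ethereum, Solana)',
    dataItems: [],
  },
  standard: {
    id: 'C7lP_aOvx4jXyFWBtJCrzTavK1gf5xfwvf5ML6I4msk',
    expectedDataItems: 6,
    features: 'Standard bundle',
    dataItems: [],
  },
  large: {
    id: '-H3KW7RKTXMg5Miq2jHx36OHSVsXBSYuE2kxgsFj6OQ',
    expectedDataItems: 79,
    features: 'Large bundle',
    dataItems: [],
  },
};

// Names of bundles whose documented data items don't yet match the expected count.
function undocumentedBundles(bundles: Record<string, BundleFixture>): string[] {
  return Object.keys(bundles).filter(
    (key) => bundles[key].dataItems.length !== bundles[key].expectedDataItems,
  );
}
```

Running `undocumentedBundles(KNOWN_BUNDLES)` gives a quick checklist of which fixtures still need documenting.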

## Implementation

### New Test Files

```
test/end-to-end/
├── parquet-export.test.ts       # Parquet export pipeline tests
├── graphql-transactions.test.ts # GraphQL transaction query tests
├── graphql-blocks.test.ts       # GraphQL block query tests
├── graphql-pagination.test.ts   # GraphQL pagination tests
└── utils.ts                     # Add new helper functions
```

### New Utility Functions (in utils.ts)

```typescript
// Parquet export helpers
export const triggerParquetExport = async (params: ParquetExportParams) => {...};
export const getParquetExportStatus = async () => {...};
export const waitForParquetExportComplete = async (timeout?: number) => {...};

// Parquet verification (using DuckDB)
export const queryParquetFile = async (filePath: string, sql: string) => {...};
export const verifyParquetRowCount = async (dir: string, table: string, expected: number) => {...};

// GraphQL helpers
export const fetchGql = async (query: string, variables?: object) => {...};
export const fetchGqlTransactions = async (filters: TransactionFilters) => {...};
export const fetchGqlBlocks = async (filters: BlockFilters) => {...};
```
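
The polling helper listed above can be sketched as follows. Unlike the listed signature, this version takes the status getter as a parameter so the loop can be unit-tested with a fake; the response shape (`status`, optional `error`) and the default timeout and interval are assumptions.

```typescript
// Assumed shape of GET /ar-io/admin/export-parquet/status; verify against the route.
interface ParquetExportStatus {
  status: string;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the export completes, fails, or the timeout elapses.
// getStatus is injected; in the e2e suite it would wrap getParquetExportStatus().
async function waitForParquetExportComplete(
  getStatus: () => Promise<ParquetExportStatus>,
  timeoutMs = 60000,
  intervalMs = 1000,
): Promise<ParquetExportStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.status === 'completed') return current;
    // Anything other than an in-progress state is treated as a failure
    if (current.status !== 'running' && current.status !== 'not_started') {
      throw new Error(
        `Parquet export ended with status '${current.status}': ${current.error ?? 'no error message'}`,
      );
    }
    if (Date.now() >= deadline) {
      throw new Error(`Parquet export did not complete within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Treating `not_started` as "keep waiting" covers the window between the POST returning and the export actually starting.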

### Docker Compose Additions

- Mount an additional volume for the Parquet output directory in tests
- Consider an optional ClickHouse service for full pipeline testing

## Acceptance Criteria

### Parquet Export
- [ ] Test basic export flow with status polling
- [ ] Test data integrity by comparing Parquet output to SQLite source
- [ ] Test all configuration options (skipL1Transactions, skipL1Tags, maxFileRows)
- [ ] Test error handling for invalid parameters
- [ ] Test status transitions

### GraphQL
- [ ] Test single transaction query by ID
- [ ] Test transaction filters (owners, recipients, tags, bundledIn, block height)
- [ ] Test single block query by ID
- [ ] Test block filters (height range)
- [ ] Test pagination (first, after, hasNextPage)
- [ ] Test both sort orders (HEIGHT_ASC, HEIGHT_DESC)
- [ ] Test field resolvers (owner, quantity, fee, data, bundledIn, signature)
- [ ] Test nested bundle relationships

### Infrastructure
- [ ] Document test bundle attributes
- [ ] Add utility functions to `test/end-to-end/utils.ts`
- [ ] Ensure tests run reliably in CI

## Technical Notes

### Parquet Verification with DuckDB

The codebase already uses `duckdb-async` for the export pipeline. Use the same library to verify exported files:

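```typescript
import { Database } from 'duckdb-async';

// parquetDir: the outputDir passed to the export request
const db = await Database.create(':memory:');
const rows = await db.all(
  `SELECT COUNT(*) AS count FROM '${parquetDir}/blocks/*.parquet'`,
);
// DuckDB returns COUNT(*) as a BigInt; convert before comparing with numbers
const blockCount = Number(rows[0].count);
await db.close();
```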
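
Once row counts have been read from both sides (DuckDB for the Parquet files, and whatever SQLite access the test harness already uses), the Data Integrity comparison reduces to a pure function. `compareRowCounts` is a hypothetical helper; it accepts both `number` and `bigint` counts. Expected counts must account for `skipL1Transactions`/`skipL1Tags` when those options are enabled.

```typescript
// Row counts keyed by table name, e.g. { blocks: 12, transactions: 340 }.
type RowCounts = Record<string, number | bigint>;

interface CountMismatch {
  table: string;
  sqlite: bigint | undefined;
  parquet: bigint | undefined;
}

// Lists every table whose Parquet row count differs from the SQLite source,
// including tables present on only one side.
function compareRowCounts(sqlite: RowCounts, parquet: RowCounts): CountMismatch[] {
  const tables = Array.from(new Set(Object.keys(sqlite).concat(Object.keys(parquet))));
  const mismatches: CountMismatch[] = [];
  for (const table of tables) {
    const s = table in sqlite ? BigInt(sqlite[table]) : undefined;
    const p = table in parquet ? BigInt(parquet[table]) : undefined;
    if (s !== p) mismatches.push({ table, sqlite: s, parquet: p });
  }
  return mismatches;
}
```

Returning the full mismatch list (rather than a boolean) makes a failing CI run show which table diverged.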

### GraphQL Query Examples

```graphql
# Filter by bundledIn
query {
  transactions(bundledIn: ["FcWiW5v28eBf5s9XAKTRiqD7dq9xX_lS5N6Xb2Y89NY"]) {
    edges {
      node {
        id
        bundledIn { id }
      }
    }
  }
}

# Pagination
query {
  transactions(first: 10, after: "cursor", sort: HEIGHT_DESC) {
    pageInfo { hasNextPage }
    edges {
      cursor
      node { id }
    }
  }
}
```
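
Pagination tests need to walk every page by feeding the last edge's `cursor` into `after`, then check `hasNextPage` and ordering on the combined result. The sketch injects the page fetcher (in the suite it would run the paginated query above through `fetchGql` with `first`, `after`, and `sort` variables); `collectAllPages` and `isSortedByHeight` are hypothetical names.

```typescript
// Minimal shape of a GraphQL connection page (transactions or blocks).
interface Page<T> {
  pageInfo: { hasNextPage: boolean };
  edges: { cursor: string; node: T }[];
}

// Follows `after` cursors until hasNextPage is false, collecting every node.
// maxPages guards against a cursor bug that never reports the last page.
async function collectAllPages<T>(
  fetchPage: (after: string | undefined) => Promise<Page<T>>,
  maxPages = 100,
): Promise<T[]> {
  const nodes: T[] = [];
  let after: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(after);
    for (const edge of page.edges) nodes.push(edge.node);
    if (!page.pageInfo.hasNextPage) return nodes;
    if (page.edges.length === 0) throw new Error('hasNextPage is true but the page has no edges');
    after = page.edges[page.edges.length - 1].cursor;
  }
  throw new Error(`Pagination did not finish within ${maxPages} pages`);
}

// Checks ordering of collected heights for HEIGHT_ASC / HEIGHT_DESC.
function isSortedByHeight(heights: number[], sort: 'HEIGHT_ASC' | 'HEIGHT_DESC'): boolean {
  return heights.every(
    (h, i) => i === 0 || (sort === 'HEIGHT_ASC' ? heights[i - 1] <= h : heights[i - 1] >= h),
  );
}
```

A useful extra assertion is that the concatenated pages equal a single page with a larger `first` for the same filters, which catches cursors that skip or repeat edges.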

### Test Data Dependencies

Tests depend on fetching real transactions from arweave.net or other trusted gateways. Consider:
- Retry logic for transient network failures
- Documenting which transactions are required
- Potential caching/fixtures for offline testing (future enhancement)
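
The retry logic suggested above could be a small generic wrapper around calls that reach arweave.net, such as the `queue-tx`/`queue-bundle` requests or direct fixture fetches. This is a sketch with illustrative defaults, not an existing utility; ideally non-transient errors (such as a 4xx response) would be rethrown immediately, which this version does not do.

```typescript
// Retries an async operation with exponential backoff between attempts.
// Defaults are illustrative; tune them to CI network conditions.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 }: { attempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Waits baseDelayMs, then 2x, 4x, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  // All attempts failed: surface the last error to the test
  throw lastError;
}
```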
