Support lazy reading of CDB64 root TX indexes from remote sources #569

@djwhitt

Description

Summary

Add support for reading CDB64 root TX index files lazily from remote sources, enabling gateways to use distributed index files without requiring local storage. This extends the existing Cdb64RootTxIndex to support local files, Arweave TX IDs, and arbitrary HTTP endpoints as index sources.

Background

The current Cdb64RootTxIndex implementation (src/discovery/cdb64-root-tx-index.ts) provides O(1) lookups of data item ID → root TX ID mappings from pre-built CDB64 files stored locally. This works well but requires:

  • Downloading entire index files before use
  • Local storage for potentially large index files
  • Manual distribution/syncing of index files

Since the existing ContiguousDataSource interface already supports range-based fetching via the region?: Region parameter, and HTTP servers commonly support Range headers, we can fetch only the bytes needed for each lookup from CDB64 files stored remotely.
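For illustration, fetching a single 16-byte hash table slot this way might look like the following. This is a hypothetical sketch, not existing code; it works against any server that honors Range headers:

```typescript
// Sketch: read one 16-byte CDB64 hash table slot via an HTTP Range request
// instead of downloading the entire index file.
function rangeHeader(offset: number, length: number): string {
  return `bytes=${offset}-${offset + length - 1}`;
}

async function fetchSlot(url: string, offset: number): Promise<Buffer> {
  const res = await fetch(url, {
    headers: { Range: rangeHeader(offset, 16) },
  });
  if (res.status !== 206) {
    // The server ignored the Range header (200) or rejected the range (416).
    throw new Error(`expected 206 Partial Content, got ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```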

Requirements

Must Have

  • ByteRangeSource abstraction: Interface for random-access byte reads that can be backed by local files, Arweave, or HTTP endpoints
  • FileByteRangeSource: Implementation using fs.FileHandle for local files (minimal overhead wrapper)
  • ContiguousDataByteRangeSource: Implementation using ContiguousDataSource.getData() with region support for Arweave
  • HttpByteRangeSource: Implementation using HTTP Range requests for arbitrary URLs (S3, CDN, dedicated servers)
  • Refactored Cdb64Reader: Use ByteRangeSource instead of direct file handle access
  • Mixed source support: Allow configuring local files, Arweave TX IDs, and HTTP URLs as index sources
  • Caching for remote sources: Cache the 4 KB header permanently and use an LRU cache for hash table regions to minimize network round trips

Should Have

  • Configurable source order: Local files first (faster), then remote sources
  • Graceful degradation: If a remote source is unavailable, continue with other sources
  • Metrics: Track cache hit rates and fetch latencies for remote sources
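The source-ordering requirement could be expressed as a simple stable partition over source configs. The config type and helper below are hypothetical sketches, not existing code:

```typescript
// Hypothetical source config union and ordering helper: local files are
// consulted before remote sources, per the requirement above.
type IndexSourceConfig =
  | { kind: 'file'; path: string }
  | { kind: 'arweave'; txId: string }
  | { kind: 'http'; url: string };

function orderSources(configs: IndexSourceConfig[]): IndexSourceConfig[] {
  // Stable partition: preserves relative order within each group.
  return [
    ...configs.filter((c) => c.kind === 'file'),
    ...configs.filter((c) => c.kind !== 'file'),
  ];
}
```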

Won't Have (for now)

  • Automatic discovery of index TX IDs (requires separate manifest/registry)
  • Write support for remote indexes (read-only)
  • Chunk-based fetching for Arweave (use HTTP Range requests via gateways)

Technical Design

ByteRangeSource Interface

interface ByteRangeSource {
  /** Read bytes at offset */
  read(offset: number, length: number): Promise<Buffer>;
  /** Total size if known (for validation) */
  getSize?(): Promise<number>;
  /** Cleanup resources */
  close?(): Promise<void>;
}

Implementations

// Local file - wraps a FileHandle from node:fs/promises
class FileByteRangeSource implements ByteRangeSource {
  constructor(private fileHandle: FileHandle) {}

  async read(offset: number, length: number): Promise<Buffer> {
    const buffer = Buffer.alloc(length);
    const { bytesRead } = await this.fileHandle.read(buffer, 0, length, offset);
    if (bytesRead !== length) {
      throw new Error(`short read: ${bytesRead}/${length} bytes at ${offset}`);
    }
    return buffer;
  }
}

// Arweave - uses existing ContiguousDataSource with region support
class ContiguousDataByteRangeSource implements ByteRangeSource {
  async read(offset: number, length: number): Promise<Buffer> {
    const result = await this.dataSource.getData({
      id: this.txId,
      region: { offset, size: length },
    });
    return streamToBuffer(result.stream);
  }
}

// HTTP - uses Range headers for arbitrary URLs (S3, CDN, etc.)
class HttpByteRangeSource implements ByteRangeSource {
  constructor(
    private httpClient: AxiosInstance, // axios-style client
    private url: string,
  ) {}

  async read(offset: number, length: number): Promise<Buffer> {
    const response = await this.httpClient.get(this.url, {
      headers: {
        Range: `bytes=${offset}-${offset + length - 1}`,
      },
      responseType: 'arraybuffer',
    });
    return Buffer.from(response.data);
  }
}

// Caching wrapper - critical for remote source performance
class CachingByteRangeSource implements ByteRangeSource {
  // Cache header permanently, LRU for hash table regions
}
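The stub above might be fleshed out along the following lines. The 4 KB block granularity, the plain-Map LRU, and the 256-block cap are illustrative assumptions, not decisions made in this issue:

```typescript
interface ByteRangeSource {
  read(offset: number, length: number): Promise<Buffer>;
}

const HEADER_SIZE = 4096; // CDB64 header, cached permanently
const BLOCK_SIZE = 4096; // cache granularity for hash table regions (assumed)

class CachingByteRangeSource implements ByteRangeSource {
  private header?: Buffer; // fetched once, then served from memory
  private blocks = new Map<number, Buffer>(); // insertion order doubles as LRU order

  constructor(
    private inner: ByteRangeSource,
    private maxBlocks = 256,
  ) {}

  async read(offset: number, length: number): Promise<Buffer> {
    // Reads entirely within the header hit the permanent cache.
    if (offset + length <= HEADER_SIZE) {
      this.header ??= await this.inner.read(0, HEADER_SIZE);
      return this.header.subarray(offset, offset + length);
    }
    const blockIdx = Math.floor(offset / BLOCK_SIZE);
    // Reads crossing a block boundary bypass the cache for simplicity.
    if (Math.floor((offset + length - 1) / BLOCK_SIZE) !== blockIdx) {
      return this.inner.read(offset, length);
    }
    let block = this.blocks.get(blockIdx);
    if (block !== undefined) {
      this.blocks.delete(blockIdx); // refresh LRU position
    } else {
      block = await this.inner.read(blockIdx * BLOCK_SIZE, BLOCK_SIZE);
      if (this.blocks.size >= this.maxBlocks) {
        // Evict the least recently used block (first key in the Map).
        this.blocks.delete(this.blocks.keys().next().value!);
      }
    }
    this.blocks.set(blockIdx, block);
    const start = offset - blockIdx * BLOCK_SIZE;
    return block.subarray(start, start + length);
  }
}
```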

CDB64 Lookup Access Pattern

Each lookup requires reading:

  1. Header (4096 bytes) - table pointers, cached permanently
  2. Hash table slots (16 bytes each) - linear probing, 1-N reads
  3. Record (16-byte header + 32-byte key + ~50-byte value) - verification + data

With caching, typical lookups would be:

  • Local file: Same as today (negligible abstraction overhead)
  • Remote (warm cache): 1-2 network requests for hash table + record
  • Remote (cold): 2-3 network requests (header + hash table + record)
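To make the read counts above concrete, the lookup flow against a ByteRangeSource can be sketched as follows. The field layout (256 header entries of 16-byte {tablePos, slotCount} pairs, 16-byte {hash, recordPos} slots, 16-byte record length headers) is an illustrative reading of the sizes listed above; the actual parsing in src/lib/cdb64.ts is authoritative:

```typescript
interface ByteRangeSource {
  read(offset: number, length: number): Promise<Buffer>;
}

// Illustrative CDB64-style lookup; each src.read() is one potential
// network round trip when the source is remote and uncached.
async function lookup(
  src: ByteRangeSource,
  key: Buffer,
  hash: bigint,
): Promise<Buffer | undefined> {
  const table = Number(hash & 0xffn);
  const entry = await src.read(table * 16, 16); // header read (cacheable)
  const tablePos = entry.readBigUInt64LE(0);
  const slotCount = entry.readBigUInt64LE(8);
  if (slotCount === 0n) return undefined;
  let slot = (hash >> 8n) % slotCount;
  for (let i = 0n; i < slotCount; i++) {
    const s = await src.read(Number(tablePos + slot * 16n), 16); // probe read
    const slotHash = s.readBigUInt64LE(0);
    const recordPos = s.readBigUInt64LE(8);
    if (recordPos === 0n) return undefined; // empty slot: key is absent
    if (slotHash === hash) {
      const rh = await src.read(Number(recordPos), 16); // record length header
      const keyLen = Number(rh.readBigUInt64LE(0));
      const valLen = Number(rh.readBigUInt64LE(8));
      const rec = await src.read(Number(recordPos) + 16, keyLen + valLen);
      if (rec.subarray(0, keyLen).equals(key)) return rec.subarray(keyLen);
    }
    slot = (slot + 1n) % slotCount; // linear probing
  }
  return undefined;
}
```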

Configuration

# Existing - local files
CDB64_ROOT_TX_INDEX_PATH=/path/to/indexes/

# New - Arweave TX IDs (comma-separated, fetched via ContiguousDataSource)
CDB64_ROOT_TX_INDEX_TX_IDS=TxId123,TxId456

# New - HTTP URLs (comma-separated, supports S3, CDN, dedicated servers)
CDB64_ROOT_TX_INDEX_URLS=https://indexes.example.com/root.cdb,https://s3.amazonaws.com/bucket/index.cdb
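On the code side, the new variables might be parsed like this. The variable names come from the proposal above; the parsing helper and its placement are assumptions, not existing code:

```typescript
// Hypothetical parsing for the new comma-separated variables; in practice
// this would live in src/config.ts alongside the existing config handling.
function parseList(value: string | undefined): string[] {
  return (value ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const CDB64_ROOT_TX_INDEX_TX_IDS = parseList(
  process.env.CDB64_ROOT_TX_INDEX_TX_IDS,
);
const CDB64_ROOT_TX_INDEX_URLS = parseList(
  process.env.CDB64_ROOT_TX_INDEX_URLS,
);
```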

Files to Modify

  • src/lib/cdb64.ts - Refactor Cdb64Reader to use ByteRangeSource
  • src/lib/byte-range-source.ts - New file with interface and implementations
  • src/discovery/cdb64-root-tx-index.ts - Support mixed local/Arweave/HTTP sources
  • src/config.ts - Add CDB64_ROOT_TX_INDEX_TX_IDS and CDB64_ROOT_TX_INDEX_URLS configs
  • src/system.ts - Wire up ContiguousDataSource for Arweave-backed indexes

Testing

  • Unit tests for ByteRangeSource implementations
  • Unit tests for refactored Cdb64Reader with mock ByteRangeSource
  • Integration tests with actual CDB64 files via all source types
  • Performance comparison: local vs remote (with/without cache)

Performance Considerations

  • Local files: Negligible overhead from abstraction (one extra function call)
  • Remote sources: Network latency dominates; caching is critical
    • Header cache: Eliminates 1 round trip per lookup
    • Hash table region cache: Reduces probing costs
    • Consider prefetching common hash table regions on initialization

Future Enhancements

  • Index manifest TX that lists all index TX IDs for automatic discovery
  • Composite indexes spanning multiple TXs with routing hints
  • Background warming of remote index caches
