Commit 9392503

Merge pull request #98 from ar-io/PE-8929-add-cdb64
docs: add CDB64 root transaction index documentation PE-8929
2 parents 516065e + 1596a80 commit 9392503

4 files changed: +241 -0 lines changed

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
---
title: "CDB64 Root Transaction Index"
description: "Configure fast O(1) lookups for data item resolution using CDB64 indexes"
---

import { Callout } from "fumadocs-ui/components/callout";
import { Tab, Tabs } from "fumadocs-ui/components/tabs";

## Overview

When your gateway receives a request for a data item (content inside an ANS-104 bundle), it needs to find the root Arweave transaction containing that data. The CDB64 index provides O(1) lookups for this mapping, enabling instant resolution of historical data items.

<Callout type="info">
**Default Behavior**: As of Release 67, CDB64 is enabled by default with no configuration required. The gateway ships with a pre-built index covering approximately 964 million data items.
</Callout>

## How It Works

The gateway checks multiple sources when resolving a data item ID to its root transaction. The order is controlled by `ROOT_TX_LOOKUP_ORDER`:

1. **db** - Your local SQLite database (fastest, but requires locally parsing ANS-104 bundles to index discovered items)
2. **gateways** - HEAD requests to other AR.IO gateways
3. **cdb** - CDB64 file-based index (O(1) lookup from local files or cached remote data)
4. **graphql** - GraphQL queries to trusted gateways

The default configuration tries each source in order until a match is found:

```bash
ROOT_TX_LOOKUP_ORDER=db,gateways,cdb,graphql
```
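
The "try each source in order until a match is found" behavior can be sketched as follows. This is an illustrative model only, not the gateway's implementation: `parse_lookup_order`, `resolve_root_tx`, and the callables in `sources` are hypothetical names, and treating a failing source as a miss is an assumption.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shape of a lookup source: data item ID in, root TX ID (or None) out.
LookupFn = Callable[[str], Optional[str]]


def parse_lookup_order(value: str) -> List[str]:
    """Split a ROOT_TX_LOOKUP_ORDER-style value into source names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def resolve_root_tx(
    data_item_id: str,
    order: List[str],
    sources: Dict[str, LookupFn],
) -> Optional[str]:
    """Return the first root TX ID any source reports, trying sources in order."""
    for name in order:
        lookup = sources.get(name)
        if lookup is None:
            continue  # source not wired up in this sketch
        try:
            root_tx_id = lookup(data_item_id)
        except Exception:
            continue  # assumption: a failing source counts as a miss
        if root_tx_id is not None:
            return root_tx_id
    return None
```

With the default order, a miss in `db` and `gateways` falls through to `cdb`, and `graphql` is consulted only if `cdb` also misses.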
31+
32+
## Default Coverage
33+
34+
The shipped CDB64 index covers:
35+
36+
- Non-AO data items (excludes `Bundler-App-Name: AO`)
37+
- Non-Redstone data items
38+
- Data items with content types
39+
- Block heights 0 through 1,820,000
40+
41+
This means most historical ArDrive, Akord, and similar application data can be resolved via the CDB64 index. The default shipped index stores partition data on Arweave, so network requests are made to fetch CDB data (with intelligent byte-range caching). For zero network latency, you can download the CDB files locally.

## Configuration Options

### Disabling CDB64

If you want to disable CDB64 lookups (not recommended), remove `cdb` from the lookup order:

```bash
ROOT_TX_LOOKUP_ORDER=db,gateways,graphql
```

### Using Custom Index Sources

You can configure custom CDB64 index sources to supplement or replace the default index:

<Tabs items={["Local File", "Local Directory", "HTTP URL", "Arweave TX", "Multiple Sources"]}>
<Tab value="Local File">
```bash
CDB64_ROOT_TX_INDEX_SOURCES=/path/to/custom-index.cdb
```
</Tab>
<Tab value="Local Directory">
```bash
# Directory containing multiple .cdb files or a partitioned index
CDB64_ROOT_TX_INDEX_SOURCES=/path/to/index-directory/
```
</Tab>
<Tab value="HTTP URL">
```bash
CDB64_ROOT_TX_INDEX_SOURCES=https://cdn.example.com/index.cdb
```
</Tab>
<Tab value="Arweave TX">
```bash
# Arweave transaction ID (43-character base64url).
# The value below is an illustrative placeholder, not a real ID.
CDB64_ROOT_TX_INDEX_SOURCES=ABC123def456xyz789ABC123def456xyz789ABC12
```
</Tab>
<Tab value="Multiple Sources">
```bash
# Sources are tried in order until a match is found
CDB64_ROOT_TX_INDEX_SOURCES=/local/index.cdb,https://cdn.example.com/index/,TxId123...
```
</Tab>
</Tabs>
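
To make the source kinds above concrete, here is a rough sketch of how a comma-separated source list could be split and classified. The rules are assumptions made for illustration (an `http(s)://` prefix means a URL, a bare 43-character base64url string means an Arweave transaction ID, anything else is a local path); `classify_source` and `parse_sources` are hypothetical helpers, and the gateway's actual parsing may differ.

```python
import re
from typing import List, Tuple

# Arweave transaction IDs are 43 base64url characters.
TX_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{43}$")


def classify_source(source: str) -> str:
    """Guess the kind of a single index source (assumed rules, see lead-in)."""
    if source.startswith(("http://", "https://")):
        return "http"
    if TX_ID_PATTERN.match(source):
        return "arweave-tx"
    return "local-path"


def parse_sources(value: str) -> List[Tuple[str, str]]:
    """Split a comma-separated source list, preserving order."""
    parts = (part.strip() for part in value.split(","))
    return [(classify_source(part), part) for part in parts if part]
```

Order is preserved because sources are tried in the order listed.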

### Remote Index Configuration

When using HTTP or Arweave-stored indexes, you can tune the caching and request behavior:

```bash
# Caching settings
CDB64_REMOTE_CACHE_MAX_REGIONS=100    # Max cached byte-range regions per source
CDB64_REMOTE_CACHE_TTL_MS=300000      # Cache TTL (5 minutes)

# Request settings
CDB64_REMOTE_REQUEST_TIMEOUT_MS=30000       # Request timeout
CDB64_REMOTE_MAX_CONCURRENT_REQUESTS=4      # Max concurrent HTTP requests

# Retrieval order for fetching CDB files from Arweave
CDB64_REMOTE_RETRIEVAL_ORDER=gateways,chunks
```
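
The two cache settings interact as a size bound and an age bound. The sketch below models that with a small LRU cache keyed by byte range; it is an illustration under assumptions, not the gateway's cache. `RegionCache` is a hypothetical class, its defaults simply mirror the values shown above, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class RegionCache:
    """LRU cache of byte-range regions with a per-entry TTL (illustrative)."""

    def __init__(
        self,
        max_regions: int = 100,
        ttl_ms: int = 300_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_regions = max_regions
        self._ttl_seconds = ttl_ms / 1000.0
        self._clock = clock
        # Maps (offset, length) -> (time stored, bytes); oldest use comes first.
        self._entries: "OrderedDict[Tuple[int, int], Tuple[float, bytes]]" = OrderedDict()

    def get(self, offset: int, length: int) -> Optional[bytes]:
        key = (offset, length)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, data = entry
        if self._clock() - stored_at > self._ttl_seconds:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, offset: int, length: int, data: bytes) -> None:
        key = (offset, length)
        self._entries[key] = (self._clock(), data)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_regions:
            self._entries.popitem(last=False)  # evict least recently used
```

Raising the region limit keeps more distinct ranges resident; raising the TTL keeps each range valid for longer before it must be fetched again.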

### File Watching

For local CDB64 directories, the gateway automatically watches for new or removed `.cdb` files:

```bash
# Enable/disable automatic reloading (default: true)
CDB64_ROOT_TX_INDEX_WATCH=true
```

When enabled, you can add new index files to the directory without restarting your gateway.

## Partitioned Indexes

Large CDB64 indexes can be split across up to 256 partition files for better manageability. Records are partitioned by the first byte of the binary data item ID, represented as a hex prefix (00-ff). A partitioned index consists of:

- `manifest.json` - Describes all partitions and their locations
- `00.cdb` through `ff.cdb` - Partition files (only populated prefixes exist)

Partitions can be stored in different locations (local files, HTTP, Arweave), allowing flexible deployment strategies.
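
As a sketch of the partitioning rule described above, the following decodes a 43-character base64url data item ID to its 32 raw bytes and formats the first byte as a two-digit hex prefix. The helper names `partition_prefix` and `partition_filename` are hypothetical and exist only for this illustration.

```python
import base64


def partition_prefix(data_item_id: str) -> str:
    """Return the two-digit hex prefix (00-ff) for a base64url data item ID."""
    # base64url IDs are written without padding; restore it before decoding.
    padded = data_item_id + "=" * (-len(data_item_id) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte ID (43 base64url characters)")
    return f"{raw[0]:02x}"


def partition_filename(data_item_id: str) -> str:
    """Name of the partition file that would hold this ID, e.g. '00.cdb'."""
    return f"{partition_prefix(data_item_id)}.cdb"
```

Because only the first byte selects the partition, a lookup needs to open at most one of the 256 possible files.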

<Tabs items={["Local Directory", "Remote Manifest", "Arweave Manifest"]}>
<Tab value="Local Directory">
```bash
# Point to directory containing manifest.json
CDB64_ROOT_TX_INDEX_SOURCES=/path/to/partitioned-index/
```
</Tab>
<Tab value="Remote Manifest">
```bash
# HTTP URL to manifest
CDB64_ROOT_TX_INDEX_SOURCES=https://cdn.example.com/index/manifest.json
```
</Tab>
<Tab value="Arweave Manifest">
```bash
# Append :manifest to the transaction ID (placeholder ID shown)
CDB64_ROOT_TX_INDEX_SOURCES=ABC123def456xyz789ABC123def456xyz789ABC12:manifest
```
</Tab>
</Tabs>

## Generating Custom Indexes

If you need to create CDB64 indexes for specific data sets, the gateway includes CLI tools:

```bash
# Generate from CSV file
./tools/generate-cdb64-root-tx-index --input data.csv --output index.cdb

# Generate partitioned index (creates manifest.json automatically)
./tools/generate-cdb64-root-tx-index --input data.csv --partitioned --output-dir ./index/

# Export from local SQLite database
./tools/export-sqlite-to-cdb64 --output index.cdb

# Verify index completeness
./tools/verify-cdb64 --index index.cdb --gateway https://arweave.net
```

The `--partitioned` flag automatically shards records by ID prefix and generates the `manifest.json` with local file locations.

For high-throughput generation, a Rust-backed tool is also available:

```bash
./tools/generate-cdb64-root-tx-index-rs --input data.csv --output index.cdb
```

## Uploading Indexes to Arweave

You can upload partitioned CDB64 indexes to Arweave for permanent, decentralized storage:

```bash
./tools/upload-cdb64-to-arweave \
  --input-dir ./partitioned-index/ \
  --wallet ./wallet.json \
  --concurrency 5
```

This tool:
1. Uploads each partition file to Arweave via Turbo
2. Resolves the bundle IDs and byte offsets for each partition
3. Updates the manifest with `arweave-bundle-item` locations

The resulting manifest can be shared with other gateway operators or uploaded to Arweave for decentralized index distribution.

## Performance Considerations

- **O(1) lookups** - Each lookup requires only 2-3 file reads regardless of index size
- **Byte-range caching** - The 4KB header is cached permanently; other regions use LRU caching
- **Lazy loading** - Partitioned indexes only open accessed partitions, reducing memory usage
- **Circuit breakers** - If CDB64 lookups fail repeatedly, the gateway automatically falls back to other sources
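
The small, fixed number of reads follows from how cdb-style files are laid out. The sketch below uses the classic cdb scheme by D. J. Bernstein (hash seeded at 5381, 256 header entries selected by the low hash byte) and assumes 16-byte header entries, which is consistent with a 4 KB header. CDB64's actual hash function and on-disk layout are not specified on this page, so treat every constant and function name here as an assumption.

```python
HEADER_ENTRIES = 256  # classic cdb: one header entry per low hash byte
ENTRY_BYTES = 16      # assumption: 8-byte table offset + 8-byte slot count
HEADER_BYTES = HEADER_ENTRIES * ENTRY_BYTES  # 4096 bytes


def cdb_hash(key: bytes) -> int:
    """Classic cdb hash: start at 5381, then h = (h * 33) XOR byte, kept to 32 bits."""
    h = 5381
    for byte in key:
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFF
    return h


def header_entry_offset(key: bytes) -> int:
    """Read 1: byte offset of the header entry that locates the key's hash table."""
    return (cdb_hash(key) & 0xFF) * ENTRY_BYTES


def first_probe_slot(key: bytes, slot_count: int) -> int:
    """Read 2: slot where probing starts inside that hash table."""
    return (cdb_hash(key) >> 8) % slot_count
```

The third read fetches the record that the matching slot points to. None of these steps depends on how many records the file holds, and the header read is the part that can be served from a permanently cached region.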

## Troubleshooting

### CDB64 lookups not working

1. Verify `cdb` is in your `ROOT_TX_LOOKUP_ORDER`
2. Check that index files exist and are readable
3. Review gateway logs for CDB64-related errors

### Slow remote index performance

1. Increase `CDB64_REMOTE_CACHE_MAX_REGIONS` for frequently accessed indexes
2. Consider downloading the index locally for best performance
3. Check network connectivity to remote sources

### Missing data items in index

The shipped index excludes AO and Redstone data items and covers only block heights 0 through 1,820,000. For data items outside that coverage, either:
- Generate a custom index covering the desired data
- Rely on other lookup sources (db, gateways, graphql)

For the complete list of CDB64 environment variables, see [Environment Variables Reference](/build/run-a-gateway/manage/environment-variables#cdb64-root-transaction-index).

content/build/run-a-gateway/manage/environment-variables.mdx

Lines changed: 18 additions & 0 deletions
@@ -63,6 +63,24 @@ The main ar.io Gateway service that handles data retrieval, indexing, and servin
 | `CHUNK_DATA_SOURCE_TYPE` | string | `fs` | Chunk data source type (fs, legacy-s3) |
 | `CHUNK_METADATA_SOURCE_TYPE` | string | `fs` | Chunk metadata source type (fs, legacy-psql) |

+### CDB64 Root Transaction Index
+
+The CDB64 index provides O(1) constant-time lookups for resolving data item IDs to their root Arweave transactions. As of Release 67, a pre-built index is enabled by default covering ~964 million records.
+
+| Variable | Type | Default | Description |
+| -------------------------------------- | ------- | ------------------------- | ---------------------------------------------------------------------------------------------------------------- |
+| `ROOT_TX_LOOKUP_ORDER` | string | `db,gateways,cdb,graphql` | Comma-separated list of root TX lookup sources. Options: db, cdb, gateways, turbo, graphql |
+| `CDB64_ROOT_TX_INDEX_SOURCES` | string | shipped manifest | Comma-separated list of CDB64 sources: local paths, directories, HTTP URLs, Arweave TX IDs, or bundle data items |
+| `CDB64_ROOT_TX_INDEX_WATCH` | boolean | `true` | Enable file watching for local CDB64 directories. New files auto-load without restart |
+| `CDB64_REMOTE_RETRIEVAL_ORDER` | string | `gateways,chunks` | Data sources for fetching remote CDB64 files. Options: gateways, chunks, tx-data |
+| `CDB64_REMOTE_CACHE_MAX_REGIONS` | number | `100` | Maximum byte-range regions to cache per remote source |
+| `CDB64_REMOTE_CACHE_TTL_MS` | number | `300000` | TTL for cached byte-range regions (5 minutes) |
+| `CDB64_REMOTE_REQUEST_TIMEOUT_MS` | number | `30000` | Request timeout for remote CDB64 sources |
+| `CDB64_REMOTE_MAX_CONCURRENT_REQUESTS` | number | `4` | Maximum concurrent HTTP requests across all remote CDB64 sources |
+| `CDB64_REMOTE_SEMAPHORE_TIMEOUT_MS` | number | `5000` | Maximum wait time for a request slot before failing |
+
+For detailed configuration and usage, see [CDB64 Root TX Index](/build/run-a-gateway/manage/cdb64).
+
 ### Indexing & Synchronization

 | Variable | Type | Default | Description |

content/build/run-a-gateway/manage/index.mdx

Lines changed: 5 additions & 0 deletions
@@ -13,6 +13,7 @@ import {
 Settings,
 Wrench,
 CreditCard,
+Search,
 } from "lucide-react";

 Master the advanced features and configurations of your ar.io Gateway. These comprehensive guides cover everything from performance optimization to content moderation, helping you run a professional-grade gateway infrastructure.
@@ -49,6 +50,10 @@ Master the advanced features and configurations of your ar.io Gateway. These com
 Configure advanced filters to efficiently process and index only the data you need, optimizing performance and resource usage.
 </Card>

+<Card title="CDB64 Root TX Index" href="/build/run-a-gateway/manage/cdb64" icon={<Search className="w-6 h-6" />}>
+Configure the CDB64 index for O(1) data item lookups. Enabled by default with ~964 million records for instant historical data resolution.
+</Card>
+
 <Card
 title="Setting Apex Domain Content"
 href="/build/run-a-gateway/manage/setting-apex-domain"

content/build/run-a-gateway/manage/meta.json

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 "ssl-certs",
 "environment-variables",
 "filters",
+"cdb64",
 "content-moderation",
 "index-snapshots",
 "setting-apex-domain",
