| 1 | +--- |
| 2 | +title: "CDB64 Root Transaction Index" |
| 3 | +description: "Configure fast O(1) lookups for data item resolution using CDB64 indexes" |
| 4 | +--- |
| 5 | + |
| 6 | +import { Callout } from "fumadocs-ui/components/callout"; |
| 7 | +import { Tab, Tabs } from "fumadocs-ui/components/tabs"; |
| 8 | + |
| 9 | +## Overview |
| 10 | + |
| 11 | +When your gateway receives a request for a data item (content inside an ANS-104 bundle), it must find the root Arweave transaction that contains that data. The CDB64 index stores the mapping from data item ID to root transaction ID and answers each lookup in O(1) time, so resolving a historical data item costs the same no matter how large the index grows. |
| 12 | + |
| 13 | +<Callout type="info"> |
| 14 | + **Default Behavior**: As of Release 67, CDB64 is enabled by default with no configuration required. The gateway ships with a pre-built index covering approximately 964 million data items. |
| 15 | +</Callout> |
| 16 | + |
| 17 | +## How It Works |
| 18 | + |
| 19 | +The gateway checks multiple sources when resolving a data item ID to its root transaction. The order is controlled by `ROOT_TX_LOOKUP_ORDER`: |
| 20 | + |
| 21 | +1. **db** - Your local SQLite database (fastest, but it only contains data items from ANS-104 bundles your gateway has parsed and indexed locally) |
| 22 | +2. **gateways** - HEAD requests to other AR.IO gateways |
| 23 | +3. **cdb** - CDB64 file-based index (O(1) lookup from local files or cached remote data) |
| 24 | +4. **graphql** - GraphQL queries to trusted gateways |
| 25 | + |
| 26 | +The default configuration tries each source in order until a match is found: |
| 27 | + |
| 28 | +```bash |
| 29 | +ROOT_TX_LOOKUP_ORDER=db,gateways,cdb,graphql |
| 30 | +``` |
| 31 | + |
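The fallback behavior described above can be sketched in a few lines of Python. This is an illustration only: `parse_lookup_order`, `resolve_root_tx`, and the `sources` mapping are hypothetical names rather than the gateway's API, and treating an error from one source as a miss is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup takes a data item ID and returns a root TX ID, or None on a miss.
Lookup = Callable[[str], Optional[str]]


def parse_lookup_order(value: str) -> List[str]:
    """Split a ROOT_TX_LOOKUP_ORDER value into an ordered list of source names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def resolve_root_tx(
    data_item_id: str, order: List[str], sources: Dict[str, Lookup]
) -> Optional[Tuple[str, str]]:
    """Try each source in order; return (source name, root TX ID) for the first hit."""
    for name in order:
        lookup = sources.get(name)
        if lookup is None:
            continue  # source named in the order but not configured
        try:
            root_tx_id = lookup(data_item_id)
        except Exception:
            continue  # assumption: a failing source counts as a miss
        if root_tx_id is not None:
            return name, root_tx_id
    return None


# Stand-in sources for illustration; real sources would query SQLite, peers, etc.
sources: Dict[str, Lookup] = {
    "db": lambda _id: None,
    "gateways": lambda _id: None,
    "cdb": lambda _id: "example-root-tx-id",
    "graphql": lambda _id: "never-reached",
}
order = parse_lookup_order("db,gateways,cdb,graphql")
result = resolve_root_tx("example-data-item-id", order, sources)
```

Because the loop stops at the first hit, sources listed later are only consulted when every earlier source misses.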
| 32 | +## Default Coverage |
| 33 | + |
| 34 | +The shipped CDB64 index covers: |
| 35 | + |
| 36 | +- Non-AO data items (excludes `Bundler-App-Name: AO`) |
| 37 | +- Non-Redstone data items |
| 38 | +- Data items with content types |
| 39 | +- Block heights 0 through 1,820,000 |
| 40 | + |
| 41 | +This means most historical ArDrive, Akord, and similar application data can be resolved via the CDB64 index. The partition files of the default index are stored on Arweave, so the gateway fetches CDB data over the network and caches the byte ranges it reads. To avoid network latency entirely, download the CDB files and point the gateway at the local copies. |
| 42 | + |
| 43 | +## Configuration Options |
| 44 | + |
| 45 | +### Disabling CDB64 |
| 46 | + |
| 47 | +If you want to disable CDB64 lookups (not recommended), remove `cdb` from the lookup order: |
| 48 | + |
| 49 | +```bash |
| 50 | +ROOT_TX_LOOKUP_ORDER=db,gateways,graphql |
| 51 | +``` |
| 52 | + |
| 53 | +### Using Custom Index Sources |
| 54 | + |
| 55 | +You can configure custom CDB64 index sources to supplement or replace the default index: |
| 56 | + |
| 57 | +<Tabs items={["Local File", "Local Directory", "HTTP URL", "Arweave TX", "Multiple Sources"]}> |
| 58 | + <Tab value="Local File"> |
| 59 | + ```bash |
| 60 | + CDB64_ROOT_TX_INDEX_SOURCES=/path/to/custom-index.cdb |
| 61 | + ``` |
| 62 | + </Tab> |
| 63 | + <Tab value="Local Directory"> |
| 64 | + ```bash |
| 65 | + # Directory containing multiple .cdb files or a partitioned index |
| 66 | + CDB64_ROOT_TX_INDEX_SOURCES=/path/to/index-directory/ |
| 67 | + ``` |
| 68 | + </Tab> |
| 69 | + <Tab value="HTTP URL"> |
| 70 | + ```bash |
| 71 | + CDB64_ROOT_TX_INDEX_SOURCES=https://cdn.example.com/index.cdb |
| 72 | + ``` |
| 73 | + </Tab> |
| 74 | + <Tab value="Arweave TX"> |
| 75 | + ```bash |
| 76 | + # 43-character base64url transaction ID |
| 77 | + CDB64_ROOT_TX_INDEX_SOURCES=ABC123def456xyz789ABC123def456xyz789ABC1234 |
| 78 | + ``` |
| 79 | + </Tab> |
| 80 | + <Tab value="Multiple Sources"> |
| 81 | + ```bash |
| 82 | + # Sources are tried in order until a match is found |
| 83 | + CDB64_ROOT_TX_INDEX_SOURCES=/local/index.cdb,https://cdn.example.com/index/,TxId123... |
| 84 | + ``` |
| 85 | + </Tab> |
| 86 | +</Tabs> |
| 87 | + |
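As a rough sketch of how a comma-separated `CDB64_ROOT_TX_INDEX_SOURCES` value could be split and classified, consider the Python below. The classification rules (URL scheme, a 43-character base64url pattern, an optional `:manifest` suffix as used for partitioned indexes, everything else a local path) and the names `classify_source` and `parse_sources` are assumptions for illustration; the gateway's own parsing may differ.

```python
import re
from typing import List, Tuple

# Arweave transaction IDs are 43 base64url characters (32 bytes).
TX_ID = re.compile(r"[A-Za-z0-9_-]{43}")


def classify_source(source: str) -> str:
    """Guess the kind of a single index source string (hypothetical rules)."""
    if source.startswith(("http://", "https://")):
        return "http"
    tx_id, sep, suffix = source.partition(":")
    if TX_ID.fullmatch(tx_id):
        if not sep:
            return "arweave-tx"
        if suffix == "manifest":
            return "arweave-manifest"
    return "local-path"  # file or directory on disk


def parse_sources(value: str) -> List[Tuple[str, str]]:
    """Split the variable on commas, keeping the configured order."""
    parts = (part.strip() for part in value.split(","))
    return [(classify_source(part), part) for part in parts if part]


example = parse_sources(
    "/local/index.cdb,https://cdn.example.com/index/," + "A" * 43
)
```

Under these rules a string that is not exactly 43 base64url characters falls through to `local-path`, so a mistyped transaction ID would be treated as a file name.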
| 88 | +### Remote Index Configuration |
| 89 | + |
| 90 | +When using HTTP or Arweave-stored indexes, you can tune the caching and request behavior: |
| 91 | + |
| 92 | +```bash |
| 93 | +# Caching settings |
| 94 | +CDB64_REMOTE_CACHE_MAX_REGIONS=100 # Max cached byte-range regions per source |
| 95 | +CDB64_REMOTE_CACHE_TTL_MS=300000 # Cache TTL (5 minutes) |
| 96 | + |
| 97 | +# Request settings |
| 98 | +CDB64_REMOTE_REQUEST_TIMEOUT_MS=30000 # Request timeout (30 seconds) |
| 99 | +CDB64_REMOTE_MAX_CONCURRENT_REQUESTS=4 # Max concurrent HTTP requests |
| 100 | + |
| 101 | +# Retrieval order for fetching CDB files from Arweave |
| 102 | +CDB64_REMOTE_RETRIEVAL_ORDER=gateways,chunks |
| 103 | +``` |
| 104 | + |
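To show how the two cache settings interact, here is a minimal Python sketch of a byte-range cache with a pinned 4 KB header, an LRU bound, and a TTL. The class name `RegionCache`, the `fetch` callable, and the choice to key regions by `(start, length)` are assumptions made for this example and are not taken from the gateway's implementation.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class RegionCache:
    """Cache byte ranges of one remote CDB64 file (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[int, int], bytes],  # hypothetical: (start, length) -> bytes
        max_regions: int = 100,              # cf. CDB64_REMOTE_CACHE_MAX_REGIONS
        ttl_ms: int = 300_000,               # cf. CDB64_REMOTE_CACHE_TTL_MS
        header_size: int = 4096,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_regions = max_regions
        self._ttl_s = ttl_ms / 1000
        self._header_size = header_size
        self._clock = clock
        self._header: Optional[bytes] = None
        self._regions: "OrderedDict[Tuple[int, int], Tuple[float, bytes]]" = OrderedDict()

    def read(self, start: int, length: int) -> bytes:
        # Reads inside the header are served from a copy that is never evicted.
        if start + length <= self._header_size:
            if self._header is None:
                self._header = self._fetch(0, self._header_size)
            return self._header[start:start + length]

        key = (start, length)
        now = self._clock()
        cached = self._regions.get(key)
        if cached is not None and cached[0] > now:
            self._regions.move_to_end(key)  # mark as most recently used
            return cached[1]

        data = self._fetch(start, length)
        self._regions[key] = (now + self._ttl_s, data)
        self._regions.move_to_end(key)
        while len(self._regions) > self._max_regions:
            self._regions.popitem(last=False)  # evict least recently used
        return data
```

A larger region limit trades memory for fewer repeated range requests, while a longer TTL only helps if the remote file does not change.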
| 105 | +### File Watching |
| 106 | + |
| 107 | +For local CDB64 directories, the gateway automatically watches for new or removed `.cdb` files: |
| 108 | + |
| 109 | +```bash |
| 110 | +# Enable/disable automatic reloading (default: true) |
| 111 | +CDB64_ROOT_TX_INDEX_WATCH=true |
| 112 | +``` |
| 113 | + |
| 114 | +When enabled, you can add new index files to the directory without restarting your gateway. |
| 115 | + |
| 116 | +## Partitioned Indexes |
| 117 | + |
| 118 | +Large CDB64 indexes can be split across up to 256 partition files for better manageability. Records are partitioned by the first byte of the binary data item ID, represented as a hex prefix (00-ff). A partitioned index consists of: |
| 119 | + |
| 120 | +- `manifest.json` - Describes all partitions and their locations |
| 121 | +- `00.cdb` through `ff.cdb` - Partition files (only populated prefixes exist) |
| 122 | + |
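The prefix rule above can be computed directly from a data item ID. The Python sketch below decodes the 43-character base64url ID to its 32 raw bytes and formats the first byte as two hex digits; the function names `partition_prefix` and `partition_file` are illustrative and not part of the gateway.

```python
import base64


def partition_prefix(data_item_id: str) -> str:
    """Return the two-character hex prefix (00-ff) for a base64url data item ID."""
    # base64url IDs omit padding, so restore it before decoding.
    padded = data_item_id + "=" * (-len(data_item_id) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:
        raise ValueError("expected a 43-character base64url ID (32 bytes)")
    return f"{raw[0]:02x}"


def partition_file(data_item_id: str) -> str:
    """Name of the partition file that would hold this ID, e.g. '40.cdb'."""
    return partition_prefix(data_item_id) + ".cdb"
```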
| 123 | +Partitions can be stored in different locations (local files, HTTP, Arweave), allowing flexible deployment strategies. |
| 124 | + |
| 125 | +<Tabs items={["Local Directory", "Remote Manifest", "Arweave Manifest"]}> |
| 126 | + <Tab value="Local Directory"> |
| 127 | + ```bash |
| 128 | + # Point to directory containing manifest.json |
| 129 | + CDB64_ROOT_TX_INDEX_SOURCES=/path/to/partitioned-index/ |
| 130 | + ``` |
| 131 | + </Tab> |
| 132 | + <Tab value="Remote Manifest"> |
| 133 | + ```bash |
| 134 | + # HTTP URL to manifest |
| 135 | + CDB64_ROOT_TX_INDEX_SOURCES=https://cdn.example.com/index/manifest.json |
| 136 | + ``` |
| 137 | + </Tab> |
| 138 | + <Tab value="Arweave Manifest"> |
| 139 | + ```bash |
| 140 | + # Append :manifest to transaction ID |
| 141 | + CDB64_ROOT_TX_INDEX_SOURCES=ABC123def456xyz789ABC123def456xyz789ABC1234:manifest |
| 142 | + ``` |
| 143 | + </Tab> |
| 144 | +</Tabs> |
| 145 | + |
| 146 | +## Generating Custom Indexes |
| 147 | + |
| 148 | +If you need to create CDB64 indexes for specific data sets, the gateway includes CLI tools: |
| 149 | + |
| 150 | +```bash |
| 151 | +# Generate from CSV file |
| 152 | +./tools/generate-cdb64-root-tx-index --input data.csv --output index.cdb |
| 153 | + |
| 154 | +# Generate partitioned index (creates manifest.json automatically) |
| 155 | +./tools/generate-cdb64-root-tx-index --input data.csv --partitioned --output-dir ./index/ |
| 156 | + |
| 157 | +# Export from local SQLite database |
| 158 | +./tools/export-sqlite-to-cdb64 --output index.cdb |
| 159 | + |
| 160 | +# Verify index completeness |
| 161 | +./tools/verify-cdb64 --index index.cdb --gateway https://arweave.net |
| 162 | +``` |
| 163 | + |
| 164 | +The `--partitioned` flag automatically shards records by ID prefix and generates the `manifest.json` with local file locations. |
| 165 | + |
| 166 | +For high-throughput generation, a Rust-backed tool is also available: |
| 167 | + |
| 168 | +```bash |
| 169 | +./tools/generate-cdb64-root-tx-index-rs --input data.csv --output index.cdb |
| 170 | +``` |
| 171 | + |
| 172 | +## Uploading Indexes to Arweave |
| 173 | + |
| 174 | +You can upload partitioned CDB64 indexes to Arweave for permanent, decentralized storage: |
| 175 | + |
| 176 | +```bash |
| 177 | +./tools/upload-cdb64-to-arweave \ |
| 178 | + --input-dir ./partitioned-index/ \ |
| 179 | + --wallet ./wallet.json \ |
| 180 | + --concurrency 5 |
| 181 | +``` |
| 182 | + |
| 183 | +This tool: |
| 184 | +1. Uploads each partition file to Arweave via Turbo |
| 185 | +2. Resolves the bundle IDs and byte offsets for each partition |
| 186 | +3. Updates the manifest with `arweave-bundle-item` locations |
| 187 | + |
| 188 | +The resulting manifest can be shared with other gateway operators or uploaded to Arweave for decentralized index distribution. |
| 189 | + |
| 190 | +## Performance Considerations |
| 191 | + |
| 192 | +- **O(1) lookups** - Each lookup requires only 2-3 file reads regardless of index size |
| 193 | +- **Byte-range caching** - The 4KB header is cached permanently; other regions use LRU caching |
| 194 | +- **Lazy loading** - Partitioned indexes only open accessed partitions, reducing memory usage |
| 195 | +- **Circuit breakers** - If CDB64 lookups fail repeatedly, the gateway automatically falls back to other sources |
| 196 | + |
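To make the 2-3 read figure concrete, here is a toy writer and reader in Python. It assumes CDB64 follows D. J. Bernstein's cdb layout with every field widened to 64 bits, which would account for a 256 × 16-byte = 4 KB header. The hash function, field order, and value encoding used by the gateway may differ, and `cdb_hash`, `build`, and `lookup` are illustrative names only.

```python
import io
import struct
from typing import Dict, Optional

HEADER_SIZE = 256 * 16  # 256 (table offset, slot count) pairs, 8 bytes each


def cdb_hash(key: bytes) -> int:
    """djb-style hash from classic cdb, masked to 64 bits (an assumption)."""
    h = 5381
    for byte in key:
        h = (((h << 5) + h) ^ byte) & 0xFFFFFFFFFFFFFFFF
    return h


def build(records: Dict[bytes, bytes]) -> bytes:
    """Write a header placeholder, the records, then 256 open-addressed tables."""
    out = io.BytesIO()
    out.write(bytes(HEADER_SIZE))
    tables = [[] for _ in range(256)]
    for key, value in records.items():
        pos = out.tell()
        out.write(struct.pack("<QQ", len(key), len(value)) + key + value)
        h = cdb_hash(key)
        tables[h & 0xFF].append((h, pos))
    header = bytearray()
    for entries in tables:
        nslots = 2 * len(entries)  # half-empty tables keep probe chains short
        slots = [(0, 0)] * nslots
        for h, pos in entries:
            i = (h >> 8) % nslots
            while slots[i][1] != 0:  # position 0 marks an empty slot
                i = (i + 1) % nslots
            slots[i] = (h, pos)
        header += struct.pack("<QQ", out.tell(), nslots)
        for h, pos in slots:
            out.write(struct.pack("<QQ", h, pos))
    data = bytearray(out.getvalue())
    data[:HEADER_SIZE] = header
    return bytes(data)


def lookup(data: bytes, key: bytes) -> Optional[bytes]:
    h = cdb_hash(key)
    # Read 1: the header entry for this key's table (cacheable, fixed 4 KB).
    table_pos, nslots = struct.unpack_from("<QQ", data, (h & 0xFF) * 16)
    if nslots == 0:
        return None
    start = (h >> 8) % nslots
    for step in range(nslots):
        # Read 2: one hash-table slot; a collision probes the next slot.
        slot = table_pos + ((start + step) % nslots) * 16
        slot_hash, rec_pos = struct.unpack_from("<QQ", data, slot)
        if rec_pos == 0:
            return None  # empty slot: the key is not in the index
        if slot_hash == h:
            # Read 3: the record itself (lengths, key, value).
            klen, vlen = struct.unpack_from("<QQ", data, rec_pos)
            body = rec_pos + 16
            if data[body:body + klen] == key:
                return data[body + klen:body + klen + vlen]
    return None
```

In this layout the number of reads depends on the probe length within one small table rather than on the total record count, which is why lookup cost stays flat as the index grows.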
| 197 | +## Troubleshooting |
| 198 | + |
| 199 | +### CDB64 lookups not working |
| 200 | + |
| 201 | +1. Verify `cdb` is in your `ROOT_TX_LOOKUP_ORDER` |
| 202 | +2. Check that index files exist and are readable |
| 203 | +3. Review gateway logs for CDB64-related errors |
| 204 | + |
| 205 | +### Slow remote index performance |
| 206 | + |
| 207 | +1. Increase `CDB64_REMOTE_CACHE_MAX_REGIONS` for frequently accessed indexes |
| 208 | +2. Consider downloading the index locally for best performance |
| 209 | +3. Check network connectivity to remote sources |
| 210 | + |
| 211 | +### Missing data items in index |
| 212 | + |
| 213 | +The default shipped index excludes AO and Redstone data items and covers only block heights up to 1,820,000. To resolve data items outside that coverage, use one of the following: |
| 214 | +- Generate a custom index covering the desired data |
| 215 | +- Rely on other lookup sources (db, gateways, graphql) |
| 216 | + |
| 217 | +For the complete list of CDB64 environment variables, see [Environment Variables Reference](/build/run-a-gateway/manage/environment-variables#cdb64-root-transaction-index). |