|
1 | 1 | # Helicone Data Export Tool |
2 | 2 |
|
3 | | -A command-line tool to export request/response data from Helicone's API. This tool allows you to fetch and export your Helicone request history in various formats, with options to include full request and response bodies. |
4 | | - |
5 | | -## Features |
6 | | - |
7 | | -- Export data in multiple formats (JSON, JSONL, CSV) |
8 | | -- Date range filtering |
9 | | -- Rate limiting and batch processing |
10 | | -- Full request/response body inclusion (optional) |
11 | | -- Automatic pagination handling |
| 3 | +A command-line tool to export request/response data from Helicone's API. It fetches your Helicone request history and writes it as JSON, JSONL, or CSV, with checkpoint-based recovery, automatic retries, and progress reporting. |
| 4 | + |
| 5 | +## ✨ Key Features |
| 6 | + |
| 7 | +### Core Features |
| 8 | +- 📦 Export data in multiple formats (JSON, JSONL, CSV) |
| 9 | +- 📅 Date range filtering |
| 10 | +- 🔄 Automatic pagination handling |
| 11 | +- 📄 Full request/response body inclusion (optional) |
| 12 | +- 🚫 Automatic filtering of large `streamed_data` fields |
| 13 | + |
| 14 | +### Advanced Features (NEW!) |
| 15 | +- 💾 **Auto-recovery from crashes** - Checkpoint system saves progress automatically |
| 16 | +- 🔁 **Retry logic with exponential backoff** - Handles transient failures gracefully |
| 17 | +- 🛑 **Graceful shutdown** - Ctrl+C saves progress for later resume |
| 18 | +- 📊 **Progress tracking** - Real-time progress bar with ETA |
| 19 | +- 🔍 **Multiple log levels** - quiet, normal, or verbose output |
| 20 | +- ✅ **Pre-flight validation** - Checks API key, permissions, and disk space |
| 21 | +- ⚡ **Configurable batch sizes and retry attempts** |
| 22 | +- 🔒 **Overwrite protection** - Prompts before overwriting existing files |
| 23 | +- 🏷️ **Property filtering** - Filter exports by custom properties |
12 | 24 |
|
13 | 25 | ## Prerequisites |
14 | 26 |
|
@@ -36,42 +48,108 @@ export HELICONE_API_KEY="your-helicone-api-key" |
36 | 48 | ts-node index.ts [options] |
37 | 49 | ``` |
38 | 50 |
|
39 | | -### Options |
| 51 | +### Core Options |
| 52 | + |
| 53 | +| Option | Description | Default | |
| 54 | +|--------|-------------|---------| |
| 55 | +| `--start-date <date>` | Start date (YYYY-MM-DD or ISO string) | 30 days ago | |
| 56 | +| `--end-date <date>` | End date (YYYY-MM-DD or ISO string) | now | |
| 57 | +| `--limit <number>` | Maximum number of records to fetch | unlimited | |
| 58 | +| `--format <format>` | Output format: json, jsonl, or csv | jsonl | |
| 59 | +| `--include-body` | Include full request/response bodies | false | |
| 60 | +| `--output, -o <path>` | Custom output file path | output.{format} | |
| 61 | +| `--property, -p <key=value>` | Filter by property (can use multiple times) | - | |
| 62 | +| `--help, -h` | Show help message and exit | - | |
| 63 | + |
| 64 | +### Advanced Options |
| 65 | + |
| 66 | +| Option | Description | Default | |
| 67 | +|--------|-------------|---------| |
| 68 | +| `--log-level <level>` | Log level: quiet, normal, or verbose | normal | |
| 69 | +| `--max-retries <number>` | Maximum retry attempts for failed requests | 5 | |
| 70 | +| `--batch-size <number>` | Batch size for API requests | 1000 | |
| 71 | +| `--clean-state` | Remove checkpoint and start fresh export | - | |
| 72 | +| `--resume` | Explicitly resume from checkpoint | - | |
40 | 73 |
|
41 | | -- `--start-date <date>`: Start date (default: 30 days ago) |
42 | | -- `--end-date <date>`: End date (default: now) |
43 | | -- `--limit <number>`: Maximum number of records to fetch |
44 | | -- `--format <format>`: Output format: json, jsonl, or csv (default: jsonl) |
45 | | -- `--include-body`: Include full request/response bodies (default: false) |
| 74 | +### Examples |
46 | 75 |
|
47 | | -### Date Format |
| 76 | +#### Basic Usage |
48 | 77 |
|
49 | | -Dates should be provided in YYYY-MM-DD format or as an ISO string. |
| 78 | +1. **Export last 30 days of data** (default behavior): |
| 79 | +```bash |
| 80 | +ts-node index.ts |
| 81 | +``` |
50 | 82 |
|
51 | | -### Examples |
| 83 | +2. **Export specific date range in CSV format**: |
| 84 | +```bash |
| 85 | +ts-node index.ts --start-date 2024-01-01 --end-date 2024-02-01 --format csv |
| 86 | +``` |
52 | 87 |
|
53 | | -1. Export last 30 days of data in JSONL format: |
| 88 | +3. **Export with full request/response bodies**: |
| 89 | +```bash |
| 90 | +ts-node index.ts --limit 100 --include-body |
| 91 | +``` |
54 | 92 |
|
| 93 | +4. **Custom output file**: |
55 | 94 | ```bash |
56 | | -ts-node index.ts |
| 95 | +ts-node index.ts --output my-export.jsonl |
57 | 96 | ``` |
58 | 97 |
|
59 | | -2. Export data for a specific date range in CSV format: |
| 98 | +5. **Filter by property** (e.g., only export LlamaCoder requests): |
| 99 | +```bash |
| 100 | +ts-node index.ts --property appname=LlamaCoder |
| 101 | +``` |
60 | 102 |
|
| 103 | +6. **Multiple property filters**: |
61 | 104 | ```bash |
62 | | -ts-node index.ts --start-date 2024-01-01 --end-date 2024-02-01 --format csv |
| 105 | +ts-node index.ts --property appname=LlamaCoder --property environment=production |
63 | 106 | ``` |
64 | 107 |
|
65 | | -3. Export limited number of records with full request/response bodies: |
| 108 | +#### Advanced Usage |
66 | 109 |
|
| 110 | +7. **Quiet mode for automation**: |
67 | 111 | ```bash |
68 | | -ts-node index.ts --limit 100 --include-body |
| 112 | +ts-node index.ts --log-level quiet --limit 10000 |
| 113 | +``` |
| 114 | + |
| 115 | +8. **Verbose logging for debugging**: |
| 116 | +```bash |
| 117 | +ts-node index.ts --log-level verbose --max-retries 10 |
69 | 118 | ``` |
70 | 119 |
|
71 | | -4. Export data in pretty-printed JSON format: |
| 120 | +9. **Large export with custom batch size**: |
| 121 | +```bash |
| 122 | +ts-node index.ts --limit 50000 --batch-size 500 |
| 123 | +``` |
72 | 124 |
|
| 125 | +10. **Clean state and start fresh**: |
73 | 126 | ```bash |
74 | | -ts-node index.ts --format json --limit 50 |
| 127 | +ts-node index.ts --clean-state |
| 128 | +``` |
| 129 | + |
| 130 | +11. **Filter by property with other options**: |
| 131 | +```bash |
| 132 | +ts-node index.ts --property appname=LlamaCoder --format csv --limit 5000 --include-body |
| 133 | +``` |
| 134 | + |
| 135 | +#### Recovery Scenarios |
| 136 | + |
| 137 | +12. **After a crash** (automatic resume prompt): |
| 138 | +```bash |
| 139 | +ts-node index.ts |
| 140 | +# Will detect checkpoint and ask: "Resume from checkpoint? (y/n)" |
| 141 | +``` |
| 142 | + |
| 143 | +13. **Force resume from checkpoint**: |
| 144 | +```bash |
| 145 | +ts-node index.ts --resume |
| 146 | +``` |
| 147 | + |
| 148 | +14. **Cancel and save progress** (during export): |
| 149 | +``` |
| 150 | +Press Ctrl+C during export |
| 151 | +# Progress is saved automatically |
| 152 | +# Run the same command again to resume |
75 | 153 | ``` |
76 | 154 |
|
77 | 155 | ## Output Formats |
@@ -101,30 +179,89 @@ Tabular format with the following columns: |
101 | 179 | - latency |
102 | 180 | - cost |
103 | 181 |
|
| 182 | +## How It Works |
| 183 | + |
| 184 | +### Auto-Recovery System |
| 185 | + |
| 186 | +The tool automatically saves checkpoints after each batch of records: |
| 187 | + |
| 188 | +1. **Checkpoint file** (`.helicone-export-state.json`) tracks: |
| 189 | + - Current offset in the export |
| 190 | + - Total records processed |
| 191 | + - Output file path |
| 192 | + - Export configuration |
| 193 | + |
| 194 | +2. **On restart**, the tool: |
| 195 | + - Detects existing checkpoint |
| 196 | + - Validates it matches current configuration |
| 197 | + - Prompts user to resume or start fresh |
| 198 | + |
| 199 | +3. **On crash/interrupt**: |
| 200 | + - Checkpoint is saved before exit |
| 201 | + - Output file is properly closed |
| 202 | + - No data loss occurs |
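
The "validates it matches current configuration" step above could look roughly like the sketch below. This is only an illustration: the `ExportConfig` and `Checkpoint` shapes, their field names, and the `canResume` helper are assumptions made for the example, not the tool's actual schema or API.

```typescript
// Hypothetical shapes: field names are assumptions, not the tool's real schema.
interface ExportConfig {
  startDate: string;
  endDate: string;
  format: "json" | "jsonl" | "csv";
  includeBody: boolean;
}

interface Checkpoint {
  offset: number; // current offset in the export
  totalProcessed: number; // total records processed so far
  outputPath: string; // file being written
  config: ExportConfig; // settings the export was started with
}

// A checkpoint is only safe to resume when it was written by an export
// with the same settings; otherwise the output file would mix two exports.
function canResume(checkpoint: Checkpoint, current: ExportConfig): boolean {
  const previous = checkpoint.config;
  return (
    previous.startDate === current.startDate &&
    previous.endDate === current.endDate &&
    previous.format === current.format &&
    previous.includeBody === current.includeBody
  );
}

// Round-trip through JSON, the way a state file on disk would be read back.
const restored: Checkpoint = JSON.parse(
  JSON.stringify({
    offset: 3000,
    totalProcessed: 3000,
    outputPath: "output.jsonl",
    config: {
      startDate: "2024-01-01",
      endDate: "2024-02-01",
      format: "jsonl",
      includeBody: false,
    },
  }),
);

// Same settings: resuming is allowed.
const sameSettings = canResume(restored, restored.config);
// Different format: the checkpoint should be rejected.
const changedFormat = canResume(restored, { ...restored.config, format: "csv" });
```

Comparing the stored configuration field by field keeps a resumed run from appending CSV rows to a JSONL file or from mixing two date ranges in one output.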
| 203 | + |
| 204 | +### Retry Logic |
| 205 | + |
| 206 | +When API requests fail, the tool automatically retries with exponential backoff: |
| 207 | + |
| 208 | +- **Attempt 1**: Wait 1 second |
| 209 | +- **Attempt 2**: Wait 2 seconds |
| 210 | +- **Attempt 3**: Wait 4 seconds |
| 211 | +- **Attempt 4**: Wait 8 seconds |
| 212 | +- **Attempt 5**: Wait 16 seconds |
| 213 | + |
| 214 | +Special handling for rate limits (429): |
| 215 | +- Respects `Retry-After` header if present |
| 216 | +- Otherwise uses exponential backoff |
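
As a rough sketch of the schedule above, the wait before each retry can be computed as shown below. The `retryDelayMs` helper and its parameters are hypothetical and are not taken from the tool's source; the sketch also assumes `Retry-After` carries a number of seconds and falls back to exponential backoff for anything else, including the HTTP-date form.

```typescript
// Hypothetical helper, not the tool's actual implementation.
// Returns the wait in milliseconds before retry number `attempt` (1-based).
function retryDelayMs(
  attempt: number,
  retryAfterHeader?: string | null,
  baseMs: number = 1000,
): number {
  // Prefer the server's Retry-After value when it is a valid number of seconds.
  if (retryAfterHeader != null && retryAfterHeader.trim() !== "") {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
  }
  // Otherwise double the delay on every attempt: 1s, 2s, 4s, 8s, 16s.
  return baseMs * Math.pow(2, attempt - 1);
}

// Delays for the default of 5 retry attempts, without a Retry-After header.
const defaultDelays = [1, 2, 3, 4, 5].map((n) => retryDelayMs(n));

// A 429 response that asks the client to wait 7 seconds.
const rateLimitedDelay = retryDelayMs(3, "7");
```

Doubling the delay keeps early retries fast while backing off quickly when the API stays unavailable.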
| 217 | + |
| 218 | +### Progress Tracking |
| 219 | + |
| 220 | +Three log levels available: |
| 221 | + |
| 222 | +- **quiet**: Only start/complete/error messages |
| 223 | +- **normal**: Progress bar with ETA and records/sec |
| 224 | +- **verbose**: Detailed logs of each API call and retry attempt |
| 225 | + |
| 226 | +Example progress bar: |
| 227 | +``` |
| 228 | +[==================> ] 62% (6,234/10,000) ETA: 3m 45s | 12.3 rec/s |
| 229 | +``` |
| 230 | + |
104 | 231 | ## Rate Limiting |
105 | 232 |
|
106 | | -The tool implements automatic rate limiting: |
| 233 | +The tool throttles its own requests and backs off when the API signals a rate limit: |
107 | 234 |
|
108 | | -- Processes records in batches of 1000 |
| 235 | +- Processes records in configurable batches (default 1000) |
109 | 236 | - Fetches signed bodies in chunks of 10 |
110 | 237 | - Adds delays between chunks to avoid API limits |
| 238 | +- Automatically handles 429 rate limit responses |
111 | 239 |
|
112 | 240 | ## Error Handling |
113 | 241 |
|
114 | | -- Validates command-line arguments |
115 | | -- Handles API errors gracefully |
116 | | -- Provides clear error messages |
117 | | -- Ensures proper cleanup of file streams |
| 242 | +Comprehensive error handling: |
| 243 | + |
| 244 | +- ✅ Pre-flight validation (API key, permissions, disk space) |
| 245 | +- ✅ Validates command-line arguments |
| 246 | +- ✅ Handles API errors with retry logic |
| 247 | +- ✅ Distinguishes retryable vs fatal errors |
| 248 | +- ✅ Provides clear, actionable error messages |
| 249 | +- ✅ Ensures proper cleanup of file streams and signal handlers |
| 250 | + |
| 251 | +## Architecture |
118 | 252 |
|
119 | | -## Development |
| 253 | +The code is structured into specialized classes: |
120 | 254 |
|
121 | | -The code is written in TypeScript and follows modern best practices: |
| 255 | +- **CheckpointManager**: Handles state persistence and recovery |
| 256 | +- **ProgressTracker**: Manages logging and progress display |
| 257 | +- **HeliconeClient**: API client with retry logic |
| 258 | +- **ExportWriter**: Handles file writing for different formats |
122 | 259 |
|
123 | | -- Strong typing |
124 | | -- Error handling |
125 | | -- Resource cleanup |
126 | | -- Rate limiting |
127 | | -- Progress tracking |
| 260 | +Benefits: |
| 261 | +- Strong TypeScript typing |
| 262 | +- Separation of concerns |
| 263 | +- Easy to test and maintain |
| 264 | +- Extensible for new features |
128 | 265 |
|
129 | 266 | ## License |
130 | 267 |
|
|