Commit 97235bf

improved export scripts
1 parent 6763e6f commit 97235bf

3 files changed

+1139
-196
lines changed

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
node_modules
2+
output.jsonl

examples/export/typescript/README.md

Lines changed: 176 additions & 39 deletions
@@ -1,14 +1,26 @@
11
# Helicone Data Export Tool
22

3-
A command-line tool to export request/response data from Helicone's API. This tool allows you to fetch and export your Helicone request history in various formats, with options to include full request and response bodies.
4-
5-
## Features
6-
7-
- Export data in multiple formats (JSON, JSONL, CSV)
8-
- Date range filtering
9-
- Rate limiting and batch processing
10-
- Full request/response body inclusion (optional)
11-
- Automatic pagination handling
3+
A robust command-line tool to export request/response data from Helicone's API. This tool allows you to fetch and export your Helicone request history in various formats, with advanced features for reliability and monitoring.
4+
5+
## ✨ Key Features
6+
7+
### Core Features
8+
- 📦 Export data in multiple formats (JSON, JSONL, CSV)
9+
- 📅 Date range filtering
10+
- 🔄 Automatic pagination handling
11+
- 📄 Full request/response body inclusion (optional)
12+
- 🚫 Automatic filtering of large `streamed_data` fields
13+
14+
### Advanced Features (NEW!)
15+
- 💾 **Auto-recovery from crashes** - Checkpoint system saves progress automatically
16+
- 🔁 **Retry logic with exponential backoff** - Handles transient failures gracefully
17+
- 🛑 **Graceful shutdown** - Ctrl+C saves progress for later resume
18+
- 📊 **Progress tracking** - Real-time progress bar with ETA
19+
- 🔍 **Multiple log levels** - quiet, normal, or verbose output
20+
- **Pre-flight validation** - Checks API key, permissions, and disk space
21+
- **Configurable batch sizes and retry attempts**
22+
- 🔒 **Overwrite protection** - Prompts before overwriting existing files
23+
- 🏷️ **Property filtering** - Filter exports by custom properties
1224

1325
## Prerequisites
1426

@@ -36,42 +48,108 @@ export HELICONE_API_KEY="your-helicone-api-key"
3648
ts-node index.ts [options]
3749
```
3850

39-
### Options
51+
### Core Options
52+
53+
| Option | Description | Default |
54+
|--------|-------------|---------|
55+
| `--start-date <date>` | Start date (YYYY-MM-DD or ISO string) | 30 days ago |
56+
| `--end-date <date>` | End date (YYYY-MM-DD or ISO string) | now |
57+
| `--limit <number>` | Maximum number of records to fetch | unlimited |
58+
| `--format <format>` | Output format: json, jsonl, or csv | jsonl |
59+
| `--include-body` | Include full request/response bodies | false |
60+
| `--output, -o <path>` | Custom output file path | output.{format} |
61+
| `--property, -p <key=value>` | Filter by property (can use multiple times) | - |
62+
| `--help, -h` | Show help message and exit | - |
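
The repeated `--property key=value` flag could be collected into a filter map roughly as follows. This is a minimal sketch: `parsePropertyFlags` is a hypothetical helper, and the tool's actual argument parser is not shown in this README.

```typescript
// Hypothetical helper (not taken from the tool's source): turns the raw values
// of repeated --property flags into a key/value filter map.
function parsePropertyFlags(values: string[]): Record<string, string> {
  const filters: Record<string, string> = {};
  for (const raw of values) {
    // Split on the first "=" only, so values may themselves contain "=".
    const idx = raw.indexOf("=");
    if (idx <= 0) {
      throw new Error(`Invalid --property value "${raw}": expected key=value`);
    }
    filters[raw.slice(0, idx)] = raw.slice(idx + 1);
  }
  return filters;
}
```

Splitting on the first `=` keeps values such as `query=a=b` intact; rejecting an empty key is an assumption about how strict the real tool is.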
63+
64+
### Advanced Options
65+
66+
| Option | Description | Default |
67+
|--------|-------------|---------|
68+
| `--log-level <level>` | Log level: quiet, normal, or verbose | normal |
69+
| `--max-retries <number>` | Maximum retry attempts for failed requests | 5 |
70+
| `--batch-size <number>` | Batch size for API requests | 1000 |
71+
| `--clean-state` | Remove checkpoint and start fresh export | - |
72+
| `--resume` | Explicitly resume from checkpoint | - |
4073

41-
- `--start-date <date>`: Start date (default: 30 days ago)
42-
- `--end-date <date>`: End date (default: now)
43-
- `--limit <number>`: Maximum number of records to fetch
44-
- `--format <format>`: Output format: json, jsonl, or csv (default: jsonl)
45-
- `--include-body`: Include full request/response bodies (default: false)
74+
### Examples
4675

47-
### Date Format
76+
#### Basic Usage
4877

49-
Dates should be provided in YYYY-MM-DD format or as an ISO string.
78+
1. **Export last 30 days of data** (default behavior):
79+
```bash
80+
ts-node index.ts
81+
```
5082

51-
### Examples
83+
2. **Export specific date range in CSV format**:
84+
```bash
85+
ts-node index.ts --start-date 2024-01-01 --end-date 2024-02-01 --format csv
86+
```
5287

53-
1. Export last 30 days of data in JSONL format:
88+
3. **Export with full request/response bodies**:
89+
```bash
90+
ts-node index.ts --limit 100 --include-body
91+
```
5492

93+
4. **Custom output file**:
5594
```bash
56-
ts-node index.ts
95+
ts-node index.ts --output my-export.jsonl
5796
```
5897

59-
2. Export data for a specific date range in CSV format:
98+
5. **Filter by property** (e.g., only export LlamaCoder requests):
99+
```bash
100+
ts-node index.ts --property appname=LlamaCoder
101+
```
60102

103+
6. **Multiple property filters**:
61104
```bash
62-
ts-node index.ts --start-date 2024-01-01 --end-date 2024-02-01 --format csv
105+
ts-node index.ts --property appname=LlamaCoder --property environment=production
63106
```
64107

65-
3. Export limited number of records with full request/response bodies:
108+
#### Advanced Usage
66109

110+
7. **Quiet mode for automation**:
67111
```bash
68-
ts-node index.ts --limit 100 --include-body
112+
ts-node index.ts --log-level quiet --limit 10000
113+
```
114+
115+
8. **Verbose logging for debugging**:
116+
```bash
117+
ts-node index.ts --log-level verbose --max-retries 10
69118
```
70119

71-
4. Export data in pretty-printed JSON format:
120+
9. **Large export with custom batch size**:
121+
```bash
122+
ts-node index.ts --limit 50000 --batch-size 500
123+
```
72124

125+
10. **Clean state and start fresh**:
73126
```bash
74-
ts-node index.ts --format json --limit 50
127+
ts-node index.ts --clean-state
128+
```
129+
130+
11. **Filter by property with other options**:
131+
```bash
132+
ts-node index.ts --property appname=LlamaCoder --format csv --limit 5000 --include-body
133+
```
134+
135+
#### Recovery Scenarios
136+
137+
12. **After a crash** (automatic resume prompt):
138+
```bash
139+
ts-node index.ts
140+
# Will detect checkpoint and ask: "Resume from checkpoint? (y/n)"
141+
```
142+
143+
13. **Force resume from checkpoint**:
144+
```bash
145+
ts-node index.ts --resume
146+
```
147+
148+
14. **Cancel and save progress** (during export):
149+
```
150+
Press Ctrl+C during export
151+
# Progress is saved automatically
152+
# Run the same command again to resume
75153
```
76154

77155
## Output Formats
@@ -101,30 +179,89 @@ Tabular format with the following columns:
101179
- latency
102180
- cost
103181

182+
## How It Works
183+
184+
### Auto-Recovery System
185+
186+
The tool automatically saves checkpoints after each batch of records:
187+
188+
1. **Checkpoint file** (`.helicone-export-state.json`) tracks:
189+
- Current offset in the export
190+
- Total records processed
191+
- Output file path
192+
- Export configuration
193+
194+
2. **On restart**, the tool:
195+
- Detects existing checkpoint
196+
- Validates it matches current configuration
197+
- Prompts user to resume or start fresh
198+
199+
3. **On crash/interrupt**:
200+
- Checkpoint is saved before exit
201+
- Output file is properly closed
202+
- No data loss occurs
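
The checkpoint contents and the "validates it matches current configuration" step described above might look like the sketch below. The field names and the `ExportConfig` shape are assumptions made for illustration; the real schema of `.helicone-export-state.json` is not documented here.

```typescript
// Assumed shapes for illustration only; not the tool's actual schema.
interface ExportConfig {
  startDate: string;
  endDate: string;
  format: "json" | "jsonl" | "csv";
  includeBody: boolean;
}

interface Checkpoint {
  offset: number;          // current offset in the export
  totalProcessed: number;  // total records processed so far
  outputPath: string;      // output file path
  config: ExportConfig;    // export configuration the checkpoint was created with
}

function serializeCheckpoint(cp: Checkpoint): string {
  return JSON.stringify(cp, null, 2);
}

function configsMatch(a: ExportConfig, b: ExportConfig): boolean {
  return (
    a.startDate === b.startDate &&
    a.endDate === b.endDate &&
    a.format === b.format &&
    a.includeBody === b.includeBody
  );
}

// Returns the checkpoint only if it parses, has the expected fields, and was
// written for the same configuration; otherwise returns null (start fresh).
function loadCheckpoint(raw: string, current: ExportConfig): Checkpoint | null {
  let parsed: any;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return null; // corrupt or truncated file: treat as "no checkpoint"
  }
  const valid =
    parsed !== null &&
    typeof parsed === "object" &&
    typeof parsed.offset === "number" &&
    typeof parsed.totalProcessed === "number" &&
    typeof parsed.outputPath === "string" &&
    parsed.config !== null &&
    typeof parsed.config === "object";
  if (!valid || !configsMatch(parsed.config, current)) return null;
  return parsed as Checkpoint;
}
```

In this sketch a mismatched or unreadable checkpoint is treated the same as a missing one; the actual tool prompts the user instead, as described above.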
203+
204+
### Retry Logic
205+
206+
When API requests fail, the tool automatically retries with exponential backoff:
207+
208+
- **Attempt 1**: Wait 1 second
209+
- **Attempt 2**: Wait 2 seconds
210+
- **Attempt 3**: Wait 4 seconds
211+
- **Attempt 4**: Wait 8 seconds
212+
- **Attempt 5**: Wait 16 seconds
213+
214+
Special handling for rate limits (429):
215+
- Respects `Retry-After` header if present
216+
- Otherwise uses exponential backoff
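
The delay schedule above corresponds to 1 second doubled on each attempt. A minimal sketch of that calculation follows; `backoffDelayMs` and `retrySchedule` are hypothetical names, and the sketch assumes `Retry-After` arrives as a number of seconds (the HTTP-date form of the header simply falls back to exponential backoff here).

```typescript
// Hypothetical helper: delay before retrying after the given failed attempt (1-based).
// If a numeric Retry-After header value is supplied (as on a 429), it takes precedence.
function backoffDelayMs(attempt: number, retryAfter?: string | null): number {
  if (retryAfter) {
    const seconds = Number(retryAfter);
    if (isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
  }
  // attempt 1 -> 1000 ms, attempt 2 -> 2000 ms, attempt 3 -> 4000 ms, ...
  return 1000 * Math.pow(2, attempt - 1);
}

// Full schedule of delays for a given --max-retries value.
function retrySchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

With the default of 5 retries, the delays sum to 31 seconds before the request is given up on, assuming no `Retry-After` header overrides them.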
217+
218+
### Progress Tracking
219+
220+
Three log levels available:
221+
222+
- **quiet**: Only start/complete/error messages
223+
- **normal**: Progress bar with ETA and records/sec
224+
- **verbose**: Detailed logs of each API call and retry attempt
225+
226+
Example progress bar:
227+
```
228+
[==================> ] 62% (6,234/10,000) ETA: 3m 45s | 12.3 rec/s
229+
```
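
A progress line in this style could be rendered as sketched below. `renderProgress` and `formatCount` are hypothetical helpers, and the ETA here is a simple estimate from the average rate so far, which may differ from how the tool computes it.

```typescript
// Hypothetical helper: formats 6234 as "6,234".
function formatCount(n: number): string {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Hypothetical helper: builds one progress line from counts and elapsed time.
function renderProgress(done: number, total: number, elapsedSec: number, width = 30): string {
  const fraction = total > 0 ? Math.min(done / total, 1) : 0;
  const percent = total > 0 ? Math.min(100, Math.floor((done * 100) / total)) : 0;

  // "=" for completed cells, ">" as the head while the bar is partially filled.
  const filled = Math.floor(fraction * width);
  const head = filled > 0 && filled < width ? ">" : "";
  const bar = "=".repeat(filled - head.length) + head + " ".repeat(width - filled);

  // Average throughput so far, and remaining time at that rate.
  const rate = elapsedSec > 0 ? done / elapsedSec : 0;
  const etaSec = rate > 0 ? Math.round(Math.max(total - done, 0) / rate) : 0;
  const eta = `${Math.floor(etaSec / 60)}m ${etaSec % 60}s`;

  return `[${bar}] ${percent}% (${formatCount(done)}/${formatCount(total)}) ETA: ${eta} | ${rate.toFixed(1)} rec/s`;
}
```

For example, `renderProgress(50, 100, 10, 10)` yields `[====>     ] 50% (50/100) ETA: 0m 10s | 5.0 rec/s`.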
230+
104231
## Rate Limiting
105232

106-
The tool implements automatic rate limiting:
233+
The tool implements intelligent rate limiting:
107234

108-
- Processes records in batches of 1000
235+
- Processes records in configurable batches (default 1000)
109236
- Fetches signed bodies in chunks of 10
110237
- Adds delays between chunks to avoid API limits
238+
- Automatically handles 429 rate limit responses
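
Fetching in chunks with a pause between them can be sketched as follows. The names `chunk` and `processInChunks` are hypothetical, and running each chunk's requests concurrently is an assumption; the README only states the chunk size of 10 and that delays are added between chunks.

```typescript
// Hypothetical helper: splits items into consecutive groups of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("chunk size must be at least 1");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: handles one chunk at a time, sleeping between chunks.
// `handler` stands in for a signed-body fetch; `sleep` is injected by the caller.
async function processInChunks<T, R>(
  items: T[],
  size: number,
  delayMs: number,
  handler: (item: T) => Promise<R>,
  sleep: (ms: number) => Promise<void>
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const group of chunk(items, size)) {
    if (!first) await sleep(delayMs); // delay only between chunks, not before the first
    first = false;
    const settled = await Promise.all(group.map((item) => handler(item)));
    results.push(...settled);
  }
  return results;
}
```

With a batch of 1000 records and chunks of 10, this would issue 100 rounds of body fetches with 99 pauses between them.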
111239

112240
## Error Handling
113241

114-
- Validates command-line arguments
115-
- Handles API errors gracefully
116-
- Provides clear error messages
117-
- Ensures proper cleanup of file streams
242+
Comprehensive error handling:
243+
244+
- ✅ Pre-flight validation (API key, permissions, disk space)
245+
- ✅ Validates command-line arguments
246+
- ✅ Handles API errors with retry logic
247+
- ✅ Distinguishes retryable vs fatal errors
248+
- ✅ Provides clear, actionable error messages
249+
- ✅ Ensures proper cleanup of file streams and signal handlers
250+
251+
## Architecture
118252

119-
## Development
253+
The code is structured into specialized classes:
120254

121-
The code is written in TypeScript and follows modern best practices:
255+
- **CheckpointManager**: Handles state persistence and recovery
256+
- **ProgressTracker**: Manages logging and progress display
257+
- **HeliconeClient**: API client with retry logic
258+
- **ExportWriter**: Handles file writing for different formats
122259

123-
- Strong typing
124-
- Error handling
125-
- Resource cleanup
126-
- Rate limiting
127-
- Progress tracking
260+
Benefits:
261+
- Strong TypeScript typing
262+
- Separation of concerns
263+
- Easy to test and maintain
264+
- Extensible for new features
128265

129266
## License
130267
