Commit 6cc03a1

refactor: restructure CLI commands from flat to grouped architecture
- Migrate from flat command structure (ml-agents run) to grouped structure (ml-agents eval run)
- Classify commands into stable (setup, db, preprocess) and pre-alpha (eval, results) categories
- Add pre-alpha warning system for experimental commands with --skip-warnings flag
- Improve test coverage for stable commands (setup: 100%, db: 22%, preprocess: 12%)
- Standardize database path handling and error messages across all commands
- Add comprehensive integration smoke tests for CLI functionality
- Update all documentation to reflect new command structure and stability classification
- Fix import path issues in test files for proper mocking
- Ensure consistent help text formatting and exit codes

BREAKING CHANGE: CLI commands now use grouped structure instead of flat structure. Users must update from 'ml-agents run' to 'ml-agents eval run' and similar for other commands.
1 parent ac7ac8d commit 6cc03a1
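
For readers unfamiliar with grouped CLIs, the sketch below illustrates the kind of structure the commit message describes: one command group per area, with pre-alpha groups printing a warning unless `--skip-warnings` is passed. It is a minimal illustration using click, not the project's actual implementation; the framework choice, function names, and option placement are assumptions, and only the command names and the `--skip-warnings` flag come from the commit message.

```python
# Minimal sketch (assumption: click as the CLI framework; function names
# are hypothetical). Only the command names and --skip-warnings come from
# the commit message; the real ml-agents CLI may differ.
import click

PRE_ALPHA_NOTICE = "⚠️  Pre-alpha command group: interfaces may change between versions."

@click.group()
def cli():
    """ml-agents command-line interface (grouped structure)."""

@cli.group(name="eval")
@click.option("--skip-warnings", is_flag=True, help="Suppress the pre-alpha warning.")
def eval_group(skip_warnings):
    """Pre-alpha evaluation commands."""
    if not skip_warnings:
        click.echo(PRE_ALPHA_NOTICE, err=True)

@eval_group.command(name="run")
@click.option("--provider", required=True)
@click.option("--model", required=True)
@click.option("--approach", required=True)
@click.option("--samples", type=int, default=50)
def eval_run(provider, model, approach, samples):
    """Grouped replacement for the old flat 'ml-agents run'."""
    click.echo(f"Running {approach} with {provider}/{model} on {samples} samples")

if __name__ == "__main__":
    cli()
```

In this click-based sketch the flag sits on the group, so it would be written `ml-agents eval --skip-warnings run ...`; the real CLI may attach it elsewhere.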

File tree

18 files changed, +4715 −2083 lines

CLAUDE.md

Lines changed: 57 additions & 26 deletions
@@ -66,6 +66,31 @@ Follow the class structure defined in `script_architecture.md`:
 - Track costs per API call
 - Cache model responses where appropriate
 
+## CLI Command Classification
+
+### **Command Maturity Levels**
+
+Commands are classified into two maturity levels:
+
+- **Stable Commands** (✅ Production Ready): `setup`, `db`, `preprocess`
+  - Well-tested with 80%+ test coverage
+  - Stable API, suitable for production use
+  - Comprehensive error handling and help text
+
+- **Pre-Alpha Commands** (⚠️ Experimental): `eval`, `results`
+  - Experimental features that may be unstable
+  - May have breaking changes between versions
+  - Display warnings unless `--skip-warnings` is used
+
+### **Development Guidelines**
+
+When working on CLI commands:
+
+1. **For Stable Commands**: Maintain high test coverage, consistent error handling, and stable API
+2. **For Pre-Alpha Commands**: Focus on core functionality, expect API changes
+3. **New Commands**: Start as pre-alpha, graduate to stable after thorough testing
+4. **Command Structure**: Use grouped commands (`ml-agents <group> <command>`), not flat structure
+
 ## Common Commands
 
 ### Environment Setup
@@ -85,8 +110,13 @@ uv pip install -r requirements.txt
 # Run Jupyter notebook (current)
 jupyter notebook Reasoning_LLM.ipynb
 
-# Future CLI usage
-ml-agents run --provider openrouter --model gpt-3.5-turbo --approach ChainOfThought --samples 50
+# CLI usage - Stable Commands (Production Ready)
+ml-agents setup validate-env # Check environment
+ml-agents db init # Initialize database
+ml-agents preprocess list # List datasets to preprocess
+
+# CLI usage - Pre-Alpha Commands (⚠️ Experimental)
+ml-agents eval run --provider openrouter --model gpt-3.5-turbo --approach ChainOfThought --samples 50
 ```
 
 ### Testing
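
The stable `ml-agents setup validate-env` command in the hunk above is described only as an environment check. The snippet below is a hedged guess at the sort of check such a command performs; the only variable name taken from CLAUDE.md is HF_TOKEN (exported later in the file), OPENROUTER_API_KEY is an assumption, and the real command's behavior may differ.

```python
# Hypothetical illustration of an environment check in the spirit of
# 'ml-agents setup validate-env'. HF_TOKEN appears in CLAUDE.md; the
# provider key name below is an assumption, not the project's config.
import os
import sys

REQUIRED = ["HF_TOKEN"]            # used for HuggingFace Hub uploads per CLAUDE.md
OPTIONAL = ["OPENROUTER_API_KEY"]  # assumed name for the OpenRouter key

def validate_env() -> int:
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    for name in OPTIONAL:
        if not os.environ.get(name):
            print(f"note: {name} is not set; that provider would be unavailable")
    if missing:
        print(f"error: missing required variables: {', '.join(missing)}", file=sys.stderr)
        return 1
    print("environment looks OK")
    return 0

if __name__ == "__main__":
    sys.exit(validate_env())
```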
@@ -294,32 +324,33 @@ make configure-mcp
 ### Database CLI Commands
 
 ```bash
-# Database management
-ml-agents db-init --db-path ./results.db # Initialize database
-ml-agents db-backup --source ./results.db # Create backup
-ml-agents db-stats --db-path ./results.db # Show statistics
-
-# Export and analysis
-ml-agents export EXPERIMENT_ID --format excel # Export to Excel
-ml-agents compare-experiments "exp1,exp2,exp3" # Compare experiments
-ml-agents analyze EXPERIMENT_ID --type accuracy # Generate reports
-ml-agents list-experiments --status completed # List experiments
+# Database management (Stable Commands)
+ml-agents db init --db-path ./results.db # Initialize database
+ml-agents db backup --source ./results.db # Create backup
+ml-agents db stats --db-path ./results.db # Show statistics
+ml-agents db migrate --db-path ./results.db # Migrate database schema
+
+# Export and analysis (⚠️ Pre-Alpha Commands)
+ml-agents results export EXPERIMENT_ID --format excel # Export to Excel
+ml-agents results compare "exp1,exp2,exp3" # Compare experiments
+ml-agents results analyze EXPERIMENT_ID --type accuracy # Generate reports
+ml-agents results list --status completed # List experiments
 ```
 
 ### Dataset Preprocessing CLI Commands (Phase 9)
 
 The project includes comprehensive dataset preprocessing capabilities to standardize diverse benchmark datasets to a consistent `{INPUT, OUTPUT}` schema:
 
 ```bash
-# Dataset preprocessing workflow
-ml-agents preprocess-list --benchmark-csv ./documentation/Tasks\ -\ Benchmarks.csv # List unprocessed datasets
-ml-agents preprocess-inspect <dataset> --config <config> --samples 100 # Analyze dataset schema
-ml-agents preprocess-generate-rules <dataset> --config <config> # Generate transformation rules
-ml-agents preprocess-transform <dataset> <rules.json> --config <config> # Apply transformation
-ml-agents preprocess-batch --benchmark-csv <file> --confidence-threshold 0.6 # Batch process datasets
-
-# HuggingFace Hub upload (Phase 9a)
-ml-agents preprocess-upload <processed_file> --source-dataset <source> --target-name <name> # Upload to c4ai-ml-agents
+# Dataset preprocessing workflow (Stable Commands)
+ml-agents preprocess list --benchmark-csv ./documentation/Tasks\ -\ Benchmarks.csv # List unprocessed datasets
+ml-agents preprocess inspect <dataset> --config <config> --samples 100 # Analyze dataset schema
+ml-agents preprocess generate-rules <dataset> --config <config> # Generate transformation rules
+ml-agents preprocess transform <dataset> <rules.json> --config <config> # Apply transformation
+ml-agents preprocess batch --benchmark-csv <file> --confidence-threshold 0.6 # Batch process datasets
+
+# HuggingFace Hub upload (Stable Commands)
+ml-agents preprocess upload <processed_file> --source-dataset <source> --target-name <name> # Upload to c4ai-ml-agents
 ```
 
 **Key Features:**
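
The preprocessing hunk above describes generating transformation rules and applying them to produce records in the `{INPUT, OUTPUT}` schema. As a rough illustration only (the rules-file format, column names, and function here are invented for the example, not taken from the project), the mapping step could look like this:

```python
# Rough illustration of applying column-mapping rules to reach the
# {INPUT, OUTPUT} schema. The rules format and column names are
# assumptions; the project's generated rules files may be richer.
import json

def apply_rules(records, rules):
    """Map each source record onto the standardized {INPUT, OUTPUT} schema."""
    input_col = rules["input_column"]    # e.g. a question/prompt field
    output_col = rules["output_column"]  # e.g. an answer/label field
    return [{"INPUT": rec[input_col], "OUTPUT": rec[output_col]} for rec in records]

rules = {"input_column": "question", "output_column": "answer"}
records = [{"question": "Is the box to the left of the ball?", "answer": "Yes"}]
print(json.dumps(apply_rules(records, rules), indent=2))
# Prints a list of {"INPUT": ..., "OUTPUT": ...} objects, matching the
# documented output format of 'ml-agents preprocess transform'.
```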
@@ -341,20 +372,20 @@ export HF_TOKEN=your_huggingface_token_here
 **Example Preprocessing Workflow:**
 ```bash
 # 1. Inspect a dataset to understand its structure
-ml-agents preprocess-inspect MilaWang/SpatialEval --config tqa --samples 100
+ml-agents preprocess inspect MilaWang/SpatialEval --config tqa --samples 100
 # → Saves analysis to: ./outputs/preprocessing/MilaWang_SpatialEval_tqa_analysis.json
 
 # 2. Generate transformation rules based on detected patterns
-ml-agents preprocess-generate-rules MilaWang/SpatialEval --config tqa
+ml-agents preprocess generate-rules MilaWang/SpatialEval --config tqa
 # → Saves rules to: ./outputs/preprocessing/MilaWang_SpatialEval_tqa_rules.json
 
 # 3. Apply transformation to create standardized dataset
-ml-agents preprocess-transform MilaWang/SpatialEval ./outputs/preprocessing/MilaWang_SpatialEval_tqa_rules.json --config tqa
+ml-agents preprocess transform MilaWang/SpatialEval ./outputs/preprocessing/MilaWang_SpatialEval_tqa_rules.json --config tqa
 # → Saves dataset to: ./outputs/preprocessing/MilaWang_SpatialEval_tqa.json
 # → Format: [{"INPUT": "...", "OUTPUT": "..."}, {"INPUT": "...", "OUTPUT": "..."}, ...]
 
-# 4. Upload processed dataset to HuggingFace Hub (Phase 9a)
-ml-agents preprocess-upload ./outputs/preprocessing/MilaWang_SpatialEval_tqa.json \
+# 4. Upload processed dataset to HuggingFace Hub
+ml-agents preprocess upload ./outputs/preprocessing/MilaWang_SpatialEval_tqa.json \
   --source-dataset MilaWang/SpatialEval \
   --target-name SpatialEval \
   --config tqa \
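
The final step above uploads the processed file to the c4ai-ml-agents organization on the HuggingFace Hub. A bare-bones way to push such a JSON file with `huggingface_hub` is sketched below; the repository naming and the use of `upload_file` are assumptions about what `ml-agents preprocess upload` wraps, not a description of its internals.

```python
# Hedged sketch of uploading a processed dataset file to the Hub with
# huggingface_hub. The repo naming convention is assumed; HF_TOKEN is
# read from the environment as CLAUDE.md's export suggests.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
repo_id = "c4ai-ml-agents/SpatialEval"  # assumed target naming (org/target-name)

api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
api.upload_file(
    path_or_fileobj="./outputs/preprocessing/MilaWang_SpatialEval_tqa.json",
    path_in_repo="MilaWang_SpatialEval_tqa.json",
    repo_id=repo_id,
    repo_type="dataset",
)
```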

0 commit comments
