Commit 6dc2690

update CLAUDE.md
1 parent 03c5b8a commit 6dc2690

1 file changed: +13 -205 lines changed

CLAUDE.md

Lines changed: 13 additions & 205 deletions
@@ -9,209 +9,6 @@ Original: https://github.com/jxtngx/claude-code-pytorch
 ## Purpose
 This document serves as a routing guide for Claude Code, directing requests to specialized agents based on task requirements. Each agent has deep expertise in their domain and collaborates with others to deliver comprehensive solutions.
 
-## Agent Directory
-
-### Data Pipeline
-- **[DatasetCurator](.claude/agents/datasets.md)**: HuggingFace dataset discovery and selection
-- **[DataEngineer](.claude/agents/dataloader.md)**: PyTorch DataLoader optimization
-- **[TransformSpecialist](.claude/agents/transforms.md)**: Data preprocessing and augmentation
-
-### Model Architecture
-- **[ModelArchitect](.claude/agents/models.md)**: HuggingFace model selection
-- **[NetworkArchitect](.claude/agents/network.md)**: Custom neural network design
-- **[TrainingOrchestrator](.claude/agents/trainer.md)**: Training loop implementation
-
-### Evaluation & Metrics
-- **[MetricsArchitect](.claude/agents/metrics.md)**: Domain-specific evaluation metrics
-- **[DomainExpert](.claude/agents/expert.md)**: Task and domain expertise
-
-### Quality Assurance
-- **[TestArchitect](.claude/agents/tests.md)**: Test-driven development specialist
-
-### Deployment & Infrastructure
-- **[CloudEngineer](.claude/agents/cloud.md)**: AWS services and API endpoints
-- **[ComputeOrchestrator](.claude/agents/compute.md)**: EC2 resource management
-- **[LocalStackEmulator](.claude/agents/localstack.md)**: Local AWS service emulation
-- **[RunnerOrchestrator](.claude/agents/runner.md)**: Training and evaluation orchestration
-- **[InterfaceDesigner](.claude/agents/frontend.md)**: Web interface development
-
-### Project Management
-- **[Supervisor](.claude/agents/supervisor.md)**: Requirements and constraints coordination
-
-## Routing Guidelines
-
-### By Task Type
-
-**Starting a New Project**
-→ Consult **Supervisor** first to establish objectives and constraints
-
-**Dataset Selection**
-→ Engage **DatasetCurator** for HuggingFace datasets
-→ Collaborate with **DomainExpert** for domain-specific requirements
-
-**Model Development**
-**ModelArchitect** for pre-trained models
-**NetworkArchitect** for custom architectures
-
-**Training Pipeline**
-**TrainingOrchestrator** for training loops
-**DataEngineer** for data loading optimization
-
-**Deployment**
-**CloudEngineer** for API development
-**ComputeOrchestrator** for EC2 provisioning
-**LocalStackEmulator** for local testing before AWS deployment
-
-### By Technical Challenge
-
-**Performance Optimization**
-**DataEngineer** + **ComputeOrchestrator**
-
-**Evaluation Design**
-**MetricsArchitect** + **DomainExpert**
-
-**Scaling Issues**
-**TrainingOrchestrator** + **ComputeOrchestrator**
-
-**Interface Development**
-**InterfaceDesigner** + **CloudEngineer**
-
-**Local Development & Testing**
-**LocalStackEmulator** + **TestArchitect**
-**LocalStackEmulator** + **CloudEngineer** for API testing
-**LocalStackEmulator** + **ComputeOrchestrator** for EC2 emulation
-
-## Test-Driven Development Workflow
-
-### TDD Principles
-This project follows strict Test-Driven Development:
-1. **Tests are written FIRST** by TestArchitect
-2. **Implementation follows tests** to ensure code meets specifications
-3. **All code must pass tests** before acceptance
-4. **Prefer torch.testing** over pytest/unittest for ML components
-
-### TDD Sequential Workflow
-1. **Supervisor** → Define requirements
-2. **TestArchitect** → Write comprehensive tests
-3. **DomainExpert** → Validate approach
-4. **DatasetCurator** → Select data (with tests)
-5. **ModelArchitect** → Choose base model (passing tests)
-6. **NetworkArchitect** → Customize architecture (TDD)
-7. **TrainingOrchestrator** → Implement training (test-driven)
-8. **CloudEngineer** → Deploy solution (with tests)
-
-## Collaboration Patterns
-
-### Test-First Development
-Every code-writing agent MUST:
-1. Request tests from TestArchitect before implementation
-2. Write code that passes the provided tests
-3. Ensure test coverage remains above 90%
-4. Collaborate with TestArchitect for edge cases
-
-### Parallel Collaboration
-- **Data Team**: TestArchitect + DatasetCurator + DataEngineer + TransformSpecialist
-- **Model Team**: TestArchitect + ModelArchitect + NetworkArchitect + TrainingOrchestrator
-- **Infrastructure Team**: TestArchitect + CloudEngineer + ComputeOrchestrator + LocalStackEmulator + RunnerOrchestrator
-- **Interface Team**: InterfaceDesigner + MetricsArchitect
-
-**Note**: TestArchitect writes tests first for each team before implementation begins
-
-## Quick Reference
-
-| Need | Primary Agent | Supporting Agents |
-|------|--------------|-------------------|
-| Write tests | TestArchitect | All code-writing agents |
-| Find datasets | DatasetCurator | TestArchitect, DomainExpert |
-| Build model | NetworkArchitect | TestArchitect, ModelArchitect |
-| Train model | TrainingOrchestrator | TestArchitect, DataEngineer |
-| Run experiments | RunnerOrchestrator | TestArchitect, TrainingOrchestrator |
-| Test locally | LocalStackEmulator | TestArchitect, CloudEngineer |
-| Deploy API | CloudEngineer | TestArchitect, ComputeOrchestrator |
-| Create UI | InterfaceDesigner | CloudEngineer |
-| Optimize performance | ComputeOrchestrator | TestArchitect, DataEngineer |
-| Define metrics | MetricsArchitect | TestArchitect, DomainExpert |
-| Manage project | Supervisor | All agents |
-
-## Code Structure
-
-### Module Ownership
-
-Each `src/` module is owned by specific agents who maintain expertise over that domain:
-
-| Module | Primary Agents | Responsibilities |
-|--------|---------------|------------------|
-| `src/data.py` | DatasetCurator, DataEngineer, TransformSpecialist | Dataset loading, DataLoader creation, transforms |
-| `src/network.py` | NetworkArchitect, ModelArchitect | Model architectures, HuggingFace integration |
-| `src/trainer.py` | TrainingOrchestrator, MetricsArchitect | Training loops, optimization, metrics |
-| `src/server.py` | CloudEngineer, InterfaceDesigner | API endpoints, model serving, web UI |
-| `src/compute.py` | ComputeOrchestrator | EC2 orchestration, distributed compute |
-| `src/runner.py` | RunnerOrchestrator | Training/evaluation pipeline management |
-
-### Module Interfaces
-
-**data.py**
-- `create_dataloaders()`: Create train/val/test dataloaders
-- `get_transforms()`: Get task-specific transforms
-- `HFDatasetWrapper`: Wrap HuggingFace datasets for PyTorch
-
-**network.py**
-- `ModelFactory`: Create models from configuration
-- `load_pretrained_model()`: Load HuggingFace models
-- Custom architectures: `CustomVisionModel`, `CustomTextModel`
-
-**trainer.py**
-- `Trainer`: Main training orchestrator
-- `TrainingConfig`: Training hyperparameters
-- Distributed training support
-
-**server.py**
-- `ModelServer`: Inference server
-- `run_server()`: Start HTTP server
-- REST endpoints: `/predict`, `/batch_predict`
-
-**runner.py**
-- EC2 instance provisioning and management
-- Distributed compute orchestration
-- Resource scaling and optimization
-
-### Development Workflow
-
-1. **Setup Environment**
-```bash
-uv pip install -e .
-pre-commit install
-```
-
-2. **Run Training**
-```bash
-python src/runner.py train --dataset cifar10 --epochs 10
-```
-
-3. **Serve Model**
-```bash
-python src/runner.py serve --model-path checkpoints/best.pt
-```
-
-4. **Code Quality**
-```bash
-ruff check src/
-black src/
-mypy src/
-```
-
-## Agent Activation
-
-To engage an agent, reference their expertise area or use direct routing:
-
-```
-"I need help with [task description]"
-→ Claude will route to appropriate agent(s)
-
-"Consult NetworkArchitect about custom attention mechanisms"
-→ Direct routing to specific agent
-```
-
 ## Agent Performance Directives
 
 ### Penalties
@@ -220,12 +17,23 @@ To engage an agent, reference their expertise area or use direct routing:
 - ignoring TDD principles
 - verbose explanations
 - code that does not follow the pytorch style set forth in the [contributing guide](https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions) and [philosophy](https://docs.pytorch.org/docs/stable/community/design.html)
-- adding AWS services outside of EC2, S3, SageMaker, and Bedrock without explicit approval from CloudEngineer
+- adding AWS services outside of EC2, S3, SageMaker, and Bedrock without explicit approval from CloudEngineer or the Human in the Loop
+- ignoring cost efficiency in AWS usage
+- ignoring security best practices in AWS usage
+- ignoring maintainability and readability in code
+- ignoring performance and scalability in code
+- ignoring testability in code
+- ignoring documentation and comments in code
+- ignoring collaboration and communication with other agents
 
 ### Rewards
 - beating project deadlines
 - achieving high test coverage
 - high code quality scores and fast diff authoring time, measured by ruff, black, mypy, and git metrics; code quality is weighted most heavily
 - clear, concise documentation and comments
 - cost savings in AWS usage
-- successful local testing with LocalStackEmulator before AWS deployment
+- successful local testing with LocalStackEmulator before AWS deployment
+
+## Agent Directory and Routing Guidelines
+
+see [team.md](team.md) for full bios and expertise areas and consult with [Supervisor](.claude/agents/supervisor.md) to coordinate multi-agent tasks
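
For readers tracing what this commit moved out of CLAUDE.md, the removed Quick Reference table amounts to a lookup from a need to a primary agent plus supporting agents. The following is a minimal sketch of that mapping, not code from the repository: the `QUICK_REFERENCE` dict and the `route()` helper are hypothetical names, and falling back to Supervisor for unknown needs is an assumption based on the new instruction to consult Supervisor for multi-agent coordination.

```python
# Hypothetical sketch: the Quick Reference table removed in this commit,
# expressed as a lookup. Agent names come from the removed table; the dict
# layout and route() are illustrative assumptions, not repository code.
QUICK_REFERENCE = {
    "write tests": ("TestArchitect", ["All code-writing agents"]),
    "find datasets": ("DatasetCurator", ["TestArchitect", "DomainExpert"]),
    "build model": ("NetworkArchitect", ["TestArchitect", "ModelArchitect"]),
    "train model": ("TrainingOrchestrator", ["TestArchitect", "DataEngineer"]),
    "run experiments": ("RunnerOrchestrator", ["TestArchitect", "TrainingOrchestrator"]),
    "test locally": ("LocalStackEmulator", ["TestArchitect", "CloudEngineer"]),
    "deploy api": ("CloudEngineer", ["TestArchitect", "ComputeOrchestrator"]),
    "create ui": ("InterfaceDesigner", ["CloudEngineer"]),
    "optimize performance": ("ComputeOrchestrator", ["TestArchitect", "DataEngineer"]),
    "define metrics": ("MetricsArchitect", ["TestArchitect", "DomainExpert"]),
    "manage project": ("Supervisor", ["All agents"]),
}


def route(need: str) -> tuple[str, list[str]]:
    """Return (primary agent, supporting agents) for a need.

    Unknown needs fall back to Supervisor (an assumption, see lead-in).
    """
    key = need.strip().lower()
    return QUICK_REFERENCE.get(key, QUICK_REFERENCE["manage project"])


if __name__ == "__main__":
    primary, supporting = route("Deploy API")
    print(primary, supporting)
```

Keeping the table as data rather than prose is one way to make the routing testable, which fits the TDD emphasis of the removed sections.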
