Commit ead7252

Dirk Petersen and claude committed
Add CLAUDE.md for AI-assisted development guidance
Added comprehensive documentation for Claude Code to understand:
- Project architecture and single-file monolithic design
- Core classes with line numbers for navigation
- Development workflow and testing procedures
- Key data flows and important file artifacts
- Release process and HPC-specific considerations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 1174258 commit ead7252

1 file changed: +214 -0 lines changed

CLAUDE.md

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Froster is a user-friendly archiving tool for teams that move data between high-cost POSIX file systems and low-cost S3-like object storage systems (AWS, GCS, Wasabi, IDrive, Ceph, Minio). It handles large-scale data archiving (hundreds of TiB to PiB), particularly on HPC systems with Slurm integration.

**Key capabilities:**
- Crawl file systems to identify archiving candidates ("hotspots")
- Archive folders to S3/Glacier with checksum verification
- Restore data from Glacier with retrieval status tracking
- Mount S3/Glacier storage via FUSE
- Slurm batch job integration for long-running operations

## Installation and Setup

### Install for development

```bash
# Clone and set up development environment
git clone https://github.com/dirkpetersen/froster.git
cd froster
python3 -m venv .venv
source .venv/bin/activate

# Install in editable mode
export LOCAL_INSTALL=true
./install.sh
```

The `install.sh` script installs:
- Froster Python package (via pip in editable mode)
- pwalk (C-based parallel file system crawler, compiled from source)
- rclone (S3 transfer tool, downloaded binary)

### Test commands

```bash
# Run basic feature tests
python3 tests/test_basic_features.py

# Run credentials tests
python3 tests/test_credentials.py

# Run all tests with unittest
python3 -m unittest discover tests/
```

## Architecture

### Single-File Monolithic Design

Froster is implemented as a **single 8000+ line Python file** (`froster/froster.py`). This design is intentional for:
- Simplified deployment on HPC systems
- Easy review by system administrators
- Reduced dependency complexity

### Core Classes

**ConfigManager** (line 127): Manages configuration using XDG Base Directory conventions
- Config location: `~/.config/froster/config.ini`
- Data location: `~/.local/share/froster/`
- AWS credentials: `~/.aws/credentials` and `~/.aws/config`
- Archive database: `~/.local/share/froster/froster-archives.json`
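
A minimal sketch of how these default locations follow from the XDG Base Directory rules. The helper name `froster_dirs` is hypothetical, and whether ConfigManager honors `XDG_CONFIG_HOME`/`XDG_DATA_HOME` overrides is an assumption; with the variables unset, the result matches the paths listed above.

```python
import os
from pathlib import Path


def froster_dirs(env=None):
    """Return (config_dir, data_dir) for Froster using XDG defaults.

    Hypothetical helper for illustration; not ConfigManager's actual code.
    Honoring XDG_CONFIG_HOME / XDG_DATA_HOME overrides is an assumption.
    """
    env = os.environ if env is None else env
    home = Path(env.get("HOME") or Path.home())
    # Per the XDG spec, unset or empty variables fall back to the defaults
    config_home = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    data_home = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return config_home / "froster", data_home / "froster"


config_dir, data_dir = froster_dirs({"HOME": "/home/alice"})
print(config_dir / "config.ini")
print(data_dir / "froster-archives.json")
```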

**AWSBoto** (line 1619): Direct AWS S3/Glacier operations using boto3
- Glacier retrieval triggering and status checking
- S3 bucket operations
- Storage class management (DEEP_ARCHIVE, GLACIER, etc.)
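
Glacier status checking usually rests on the `Restore` field that boto3's `head_object()` returns once a restore has been requested: it reads `ongoing-request="true"` while the retrieval runs and `ongoing-request="false", expiry-date="..."` once the temporary copy is available. The sketch below classifies such a response. It is a hypothetical helper, not `AWSBoto.glacier_restore_status()`, and it deliberately ignores Intelligent-Tiering's optional archive tiers.

```python
import re


def restore_state(head_response: dict) -> str:
    """Classify a boto3 head_object() response for an archived object.

    Hypothetical helper. Returns one of "not-archived", "not-requested",
    "in-progress", "restored", or "unknown".
    """
    storage_class = head_response.get("StorageClass", "STANDARD")
    if storage_class not in ("GLACIER", "DEEP_ARCHIVE"):
        # This sketch treats every other class as directly readable
        return "not-archived"
    restore = head_response.get("Restore")
    if restore is None:
        # No restore_object() request has been made for this object yet
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', restore)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "restored"
```

Against a live bucket, the input would come from `boto3.client("s3").head_object(Bucket=..., Key=...)`, and a retrieval can be started with `restore_object(Bucket=..., Key=..., RestoreRequest={"Days": 30, "GlacierJobParameters": {"Tier": "Bulk"}})`. The 30-day window and Bulk tier are illustrative choices.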

**Archiver** (line 3485): Main workflow orchestration
- File system indexing with pwalk
- Folder hotspot generation using DuckDB for CSV processing
- Small file tarring (<1 MiB files → `Froster.smallfiles.tar`)
- MD5 checksum generation and verification
- Archive metadata tracking in JSON database
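
A minimal sketch of the small-file tarring step using Python's `tarfile` module. It assumes a non-recursive scan of a single folder and the default 1 MiB cutoff. The function name is hypothetical. The sketch leaves the originals in place and does not write `Froster.allfiles.csv`.

```python
import os
import tarfile

SMALL_FILE_LIMIT = 1024 * 1024  # 1 MiB default cutoff described above


def tar_small_files(folder, tar_name="Froster.smallfiles.tar",
                    limit=SMALL_FILE_LIMIT):
    """Bundle top-level regular files smaller than `limit` into one tar.

    Hypothetical sketch: scans `folder` non-recursively, leaves the originals
    in place, and does not write Froster.allfiles.csv. Returns the names added.
    """
    with os.scandir(folder) as it:
        entries = sorted(it, key=lambda e: e.name)
    small = [
        e.name for e in entries
        if e.name != tar_name
        and e.is_file(follow_symlinks=False)
        and e.stat(follow_symlinks=False).st_size < limit
    ]
    if small:
        with tarfile.open(os.path.join(folder, tar_name), "w") as tar:
            for name in small:
                # arcname keeps member paths relative to the folder
                tar.add(os.path.join(folder, name), arcname=name)
    return small
```

Re-running the function skips the existing tar. Other small metadata files such as `.froster.md5sum` would still be bundled, so a fuller implementation would need an exclusion list.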

**Rclone** (line 5972): S3 transfer operations wrapper
- Multi-threaded upload/download via rclone
- Progress tracking and logging
- Environment-based credential passing
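
One way to pass credentials through the environment is rclone's on-the-fly remote definition. Any variable named `RCLONE_CONFIG_<REMOTE>_<OPTION>` configures a remote without an `rclone.conf` file. The sketch below assumes the Rclone wrapper works this way. The helper names, the remote name, and the default region are hypothetical.

```python
import os


def rclone_env(remote, access_key, secret_key, region="us-west-2",
               provider="AWS", storage_class="DEEP_ARCHIVE"):
    """Return a copy of os.environ that also defines an rclone S3 remote.

    rclone reads RCLONE_CONFIG_<REMOTE>_<OPTION> variables, so neither an
    rclone.conf file nor command-line secrets are needed. Hypothetical helper.
    """
    prefix = f"RCLONE_CONFIG_{remote.upper()}_"
    options = {
        "TYPE": "s3",
        "PROVIDER": provider,
        "ACCESS_KEY_ID": access_key,
        "SECRET_ACCESS_KEY": secret_key,
        "REGION": region,
        "STORAGE_CLASS": storage_class,
    }
    env = dict(os.environ)
    env.update({prefix + key: value for key, value in options.items()})
    return env


def rclone_copy_cmd(src, remote, bucket_path, transfers=8):
    """Build an rclone copy command; --transfers sets parallel file transfers."""
    # --checksum: decide what to skip by checksum and size, not modtime and size
    return ["rclone", "copy", "--checksum", "--transfers", str(transfers),
            src, f"{remote}:{bucket_path}"]
```

The two pieces combine as `subprocess.run(rclone_copy_cmd(src, "arch", "bucket/prefix"), env=rclone_env("arch", key_id, secret), check=True)`. This keeps the secret out of the process list that other users on a shared login node can see.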

**Slurm** (line 6263): Batch job submission for HPC environments
- Auto-submits long-running operations as Slurm jobs
- Job monitoring and output file generation
- Automatic re-execution on job failure
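
A minimal sketch of the kind of batch script such a submission could generate. It assumes `--cpus-per-task` and `--mem` are derived from Froster's `--cores` and `--mem` flags, and that the job re-invokes froster with `--no-slurm` so it does not resubmit itself. The helper name, defaults, and output-file pattern are hypothetical.

```python
import os
import shlex


def sbatch_script(froster_args, job_name="froster", cores=4, mem_gb=64,
                  hours=24, output_dir="~/.local/share/froster/slurm"):
    """Render a minimal Slurm batch script that re-runs a froster command.

    Hypothetical helper: names and defaults are illustrative, not Froster's.
    """
    # #SBATCH lines are not processed by a shell, so expand "~" here
    out = os.path.join(os.path.expanduser(output_dir), f"{job_name}-%j.out")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours}:00:00",
        f"#SBATCH --output={out}",
        # --no-slurm keeps the job from submitting itself to Slurm again
        shlex.join(["froster", "--no-slurm", *froster_args]),
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and running `sbatch` on it (for example `subprocess.run(["sbatch", script_path], check=True)`) submits the job. Slurm replaces `%j` in the output path with the job ID.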

**Commands** (line 6843): CLI argument parsing and subcommand dispatch
- Routes subcommands: `config`, `index`, `archive`, `delete`, `restore`, `mount`, `umount`
- Handles global flags: `--cores`, `--mem`, `--no-slurm`, `--profile`, `--debug`
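
The routing described above maps naturally onto `argparse` sub-parsers. This sketch takes the subcommand and global-flag names from the list above. The flag types, the defaults, the positional `folders` argument, and the `dispatch` helper are assumptions, not Froster's actual `Commands.parse_arguments()`.

```python
import argparse

SUBCOMMANDS = ["config", "index", "archive", "delete", "restore", "mount", "umount"]


def build_parser():
    """Hypothetical parser: global flags first, then one sub-parser per command."""
    parser = argparse.ArgumentParser(prog="froster")
    parser.add_argument("--cores", type=int, default=4)      # assumed default
    parser.add_argument("--mem", default="64G")              # assumed default
    parser.add_argument("--no-slurm", action="store_true")
    parser.add_argument("--profile", default=None)
    parser.add_argument("--debug", action="store_true")
    sub = parser.add_subparsers(dest="subcmd", required=True)
    for name in SUBCOMMANDS:
        p = sub.add_parser(name)
        if name != "config":
            # Assumed: most subcommands take zero or more folder paths
            p.add_argument("folders", nargs="*")
    return parser


def dispatch(argv, handlers):
    """Parse argv and call the handler registered for the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return handlers[args.subcmd](args)
```

Because the global flags belong to the top-level parser, they must come before the subcommand, as in `froster --debug archive /path/to/folder`.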

### Textual TUI Applications

Froster uses Textual for interactive selection interfaces:

**TableHotspots** (line 5767): Interactive folder selection from indexed hotspots
- Displays folders with size, average file size, and access/modify age
- Supports filtering with the `--older`, `--newer`, and `--larger` flags
- "Quit to CLI" generates an archive command for batch operations

**TableArchive** (line 5862): Select previously archived folders for delete/restore

**TableNIHGrants** (line 5893): Search and link NIH research grants for FAIR metadata

### Key Data Flow

1. **Index**: pwalk → CSV → DuckDB filtering → hotspots CSV → froster-archives.json
2. **Archive**: Source folder → tar small files → MD5 checksums → rclone upload → checksum verify → update JSON database
3. **Delete**: Verify checksums → delete local files → leave `Where-did-the-files-go.txt` manifest
4. **Restore**: Check Glacier status → trigger retrieval if needed → wait → download with rclone → verify checksums → untar
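
To illustrate the Index step, the sketch below rolls per-file records up into per-folder hotspot rows with the columns TableHotspots shows: size, average file size, and access/modify age. Froster performs this aggregation with DuckDB on pwalk's CSV. Plain Python is substituted here to keep the example dependency-free. The `(path, size, atime, mtime)` record layout and the 1 GiB cutoff are simplifying assumptions.

```python
import os
import time
from collections import defaultdict


def folder_hotspots(rows, min_total_bytes=1 << 30, now=None):
    """Aggregate (path, size_bytes, atime, mtime) rows into per-folder stats.

    Hypothetical stand-in for the DuckDB step; returns the largest folders first.
    """
    now = time.time() if now is None else now
    stats = defaultdict(lambda: {"bytes": 0, "files": 0, "atime": 0, "mtime": 0})
    for path, size, atime, mtime in rows:
        s = stats[os.path.dirname(path)]
        s["bytes"] += size
        s["files"] += 1
        # A folder is only as "cold" as its most recently touched file
        s["atime"] = max(s["atime"], atime)
        s["mtime"] = max(s["mtime"], mtime)
    result = []
    for folder, s in stats.items():
        if s["bytes"] < min_total_bytes:
            continue  # too small to be worth archiving
        result.append({
            "folder": folder,
            "GiB": s["bytes"] / 2**30,
            "avg_file_kib": s["bytes"] / s["files"] / 1024,
            "accessed_days": int((now - s["atime"]) // 86400),
            "modified_days": int((now - s["mtime"]) // 86400),
        })
    return sorted(result, key=lambda r: r["GiB"], reverse=True)
```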

## Common Development Tasks

### Building and testing locally

```bash
# After modifying froster/froster.py, test immediately (editable install)
froster --version
froster --info

# Test a complete workflow with dummy data
mkdir -p /tmp/test_archive
dd if=/dev/zero of=/tmp/test_archive/file1.dat bs=1M count=10
froster archive /tmp/test_archive
```
127+
128+
### Running tests
129+
130+
```bash
131+
# Single test file
132+
python3 tests/test_basic_features.py
133+
134+
# All tests
135+
python3 -m unittest discover tests/
136+
137+
# Tests require AWS credentials as environment variables
138+
export AWS_ACCESS_KEY_ID="..."
139+
export AWS_SECRET="..."
140+
```
141+
142+
### Debugging
143+
144+
Use the `--debug` flag for verbose logging:
145+
```bash
146+
froster --debug archive /path/to/folder
147+
```
148+
149+
Logs are written to `~/.local/share/froster/froster.log`. View with:
150+
```bash
151+
froster --log-print
152+
```

### Code navigation helpers

Key functions for understanding the codebase:
- `main()` (line 7907): Entry point
- `Commands.parse_arguments()`: CLI argument structure
- `Archiver._index_locally()` (line 3525): pwalk → hotspots generation
- `Archiver.do_archive()`: Main archive workflow
- `Rclone._run_rclone_command()` (line 6014): S3 transfers
- `AWSBoto.glacier_restore_status()`: Glacier retrieval checking

### Important file artifacts

**Generated by Froster during archiving:**
- `.froster.md5sum`: MD5 checksums of all files in folder
- `Froster.allfiles.csv`: Metadata for all files (including tarred files)
- `Froster.smallfiles.tar`: Archive of files < 1 MiB
- `Where-did-the-files-go.txt`: Manifest created after deletion

**Configuration files:**
- `~/.config/froster/config.ini`: User settings and profiles
- `~/.local/share/froster/froster-archives.json`: Archive operation database

## Release Process

Releases are automated via GitHub Actions:

1. Update version in `pyproject.toml`
2. Push to `main` branch
3. Create GitHub release with tag format: `v<Major>.<Minor>.<Subminor>`
4. GitHub Action builds and publishes to PyPI automatically

Versioning:
- Major: Breaking changes or major features
- Minor: Backward-compatible new functionality
- Subminor: Bug fixes or small improvements
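
A small sketch of the versioning rules above. It assumes the usual convention that a bump resets the lower-order fields to zero. `bump` and `tag_for` are hypothetical helpers, not part of the repository's release workflow.

```python
import re

# Tag format from step 3 above: v<Major>.<Minor>.<Subminor>
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump(version, part):
    """Return the next version for a 'major', 'minor', or 'subminor' bump."""
    major, minor, sub = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "subminor":
        return f"{major}.{minor}.{sub + 1}"
    raise ValueError(f"unknown part: {part}")


def tag_for(version):
    """Return the release tag for `version`, rejecting malformed versions."""
    tag = f"v{version}"
    if not TAG_RE.match(tag):
        raise ValueError(f"not a v<Major>.<Minor>.<Subminor> tag: {tag}")
    return tag
```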

## Important Considerations

**HPC-specific behaviors:**
- Auto-detects Slurm and submits long-running operations as batch jobs
- Use `--no-slurm` to force foreground execution
- Slurm outputs go to `~/.local/share/froster/slurm/`

**Checksum verification:**
- MD5 checksums are generated before upload and verified after
- Never manually delete archived folders; use `froster delete` to ensure verification
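
A minimal sketch of generating and re-checking a checksum file with `hashlib`. It assumes `.froster.md5sum` uses the `<hexdigest>  <filename>` layout that `md5sum -c` understands. Froster's exact file format, and how it verifies the uploaded copy, may differ. The function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_md5sum(folder, names, out_name=".froster.md5sum"):
    """Write '<hexdigest>  <name>' lines, the format `md5sum -c` reads."""
    with open(os.path.join(folder, out_name), "w") as out:
        for name in sorted(names):
            out.write(f"{md5_of(os.path.join(folder, name))}  {name}\n")


def verify_md5sum(folder, out_name=".froster.md5sum"):
    """Return the names whose current checksum no longer matches."""
    bad = []
    with open(os.path.join(folder, out_name)) as f:
        for line in f:
            digest, name = line.rstrip("\n").split("  ", 1)
            if md5_of(os.path.join(folder, name)) != digest:
                bad.append(name)
    return bad
```

An empty list from `verify_md5sum` is the precondition a safe delete or a completed restore would check for.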

**Small file handling:**
- Files < 1 MiB are automatically tarred into `Froster.smallfiles.tar`, which avoids Glacier's per-object metadata overhead of roughly 40 KiB
- Configure the threshold with `max_small_file_size_kib` in `~/.config/froster/config.ini`
- Disable tarring: `froster archive --no-tar`
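
A sketch of reading the threshold with `configparser`. The key name comes from the text above. The `[general]` section name and the 1024 KiB fallback (matching the 1 MiB default) are assumptions about `config.ini`, and the helper is hypothetical.

```python
import configparser


def small_file_limit_bytes(ini_text, section="general", default_kib=1024):
    """Return the small-file cutoff in bytes from config.ini text.

    Hypothetical helper; the "general" section name is an assumption.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    # fallback also covers a missing section, not just a missing key
    kib = cfg.getint(section, "max_small_file_size_kib", fallback=default_kib)
    return kib * 1024
```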

**Storage class selection:**
- Default: AWS `DEEP_ARCHIVE` (lowest storage cost; retrievals take up to 12 hours with the Standard tier or up to 48 hours with the Bulk tier)
- Other classes: GLACIER, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING
- Set during `froster config` or in config.ini

**Multiple users / shared configuration:**
- Set shared config directory during `froster config`
- Allows teams to share hotspot files and archive database
- Individual credentials remain in `~/.aws/`
