| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +Froster is a user-friendly archiving tool for teams that move data between high-cost POSIX file systems and low-cost S3-like object storage systems (AWS, GCS, Wasabi, IDrive, Ceph, Minio). It handles large-scale data archiving (hundreds of TiB to PiB), particularly on HPC systems with Slurm integration. |
| 8 | + |
| 9 | +**Key capabilities:** |
| 10 | +- Crawl file systems to identify archiving candidates ("hotspots") |
| 11 | +- Archive folders to S3/Glacier with checksum verification |
| 12 | +- Restore data from Glacier with retrieval status tracking |
| 13 | +- Mount S3/Glacier storage via FUSE |
| 14 | +- Slurm batch job integration for long-running operations |
| 15 | + |
| 16 | +## Installation and Setup |
| 17 | + |
| 18 | +### Install for development |
| 19 | + |
| 20 | +```bash |
| 21 | +# Clone and set up development environment |
| 22 | +git clone https://github.com/dirkpetersen/froster.git |
| 23 | +cd froster |
| 24 | +python3 -m venv .venv |
| 25 | +source .venv/bin/activate |
| 26 | + |
| 27 | +# Install in editable mode |
| 28 | +export LOCAL_INSTALL=true |
| 29 | +./install.sh |
| 30 | +``` |
| 31 | + |
| 32 | +The `install.sh` script installs: |
| 33 | +- Froster Python package (via pip in editable mode) |
| 34 | +- pwalk (C-based parallel file system crawler, compiled from source) |
| 35 | +- rclone (S3 transfer tool, downloaded binary) |
| 36 | + |
| 37 | +### Test commands |
| 38 | + |
| 39 | +```bash |
| 40 | +# Run basic feature tests |
| 41 | +python3 tests/test_basic_features.py |
| 42 | + |
| 43 | +# Run credentials tests |
| 44 | +python3 tests/test_credentials.py |
| 45 | + |
| 46 | +# Run all tests with unittest |
| 47 | +python3 -m unittest discover tests/ |
| 48 | +``` |
| 49 | + |
| 50 | +## Architecture |
| 51 | + |
| 52 | +### Single-File Monolithic Design |
| 53 | + |
| 54 | +Froster is implemented as a **single 8000+ line Python file** (`froster/froster.py`). The single-file design is intentional and offers: |
| 55 | +- Simplified deployment on HPC systems |
| 56 | +- Easy review by system administrators |
| 57 | +- Reduced dependency complexity |
| 58 | + |
| 59 | +### Core Classes |
| 60 | + |
| 61 | +**ConfigManager** (line 127): Manages configuration using XDG Base Directory conventions |
| 62 | +- Config location: `~/.config/froster/config.ini` |
| 63 | +- Data location: `~/.local/share/froster/` |
| 64 | +- AWS credentials: `~/.aws/credentials` and `~/.aws/config` |
| 65 | +- Archive database: `~/.local/share/froster/froster-archives.json` |
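The locations above can be resolved programmatically. The following is a minimal sketch, not Froster's own code: the helper name `froster_paths` is hypothetical, and it assumes that `XDG_CONFIG_HOME` and `XDG_DATA_HOME` override the defaults as the XDG Base Directory specification describes. `ConfigManager` may resolve these paths differently.

```python
import os
from pathlib import Path


def froster_paths(env=None, home=None):
    """Return the default file locations listed above (hypothetical helper).

    Assumption: XDG_CONFIG_HOME / XDG_DATA_HOME take precedence when set,
    otherwise ~/.config and ~/.local/share are used.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    config_home = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    data_home = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return {
        "config": config_home / "froster" / "config.ini",
        "data_dir": data_home / "froster",
        "archives_db": data_home / "froster" / "froster-archives.json",
        # AWS credentials live outside the XDG tree.
        "aws_credentials": home / ".aws" / "credentials",
        "aws_config": home / ".aws" / "config",
    }
```

Passing `env` and `home` explicitly keeps the helper easy to exercise without touching the real home directory.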
| 66 | + |
| 67 | +**AWSBoto** (line 1619): Direct AWS S3/Glacier operations using boto3 |
| 68 | +- Glacier retrieval triggering and status checking |
| 69 | +- S3 bucket operations |
| 70 | +- Storage class management (DEEP_ARCHIVE, GLACIER, etc.) |
| 71 | + |
| 72 | +**Archiver** (line 3485): Main workflow orchestration |
| 73 | +- File system indexing with pwalk |
| 74 | +- Folder hotspot generation using DuckDB for CSV processing |
| 75 | +- Small file tarring (<1 MiB files → `Froster.smallfiles.tar`) |
| 76 | +- MD5 checksum generation and verification |
| 77 | +- Archive metadata tracking in JSON database |
| 78 | + |
| 79 | +**Rclone** (line 5972): S3 transfer operations wrapper |
| 80 | +- Multi-threaded upload/download via rclone |
| 81 | +- Progress tracking and logging |
| 82 | +- Environment-based credential passing |
| 83 | + |
| 84 | +**Slurm** (line 6263): Batch job submission for HPC environments |
| 85 | +- Auto-submits long-running operations as Slurm jobs |
| 86 | +- Job monitoring and output file generation |
| 87 | +- Automatic re-execution on job failure |
| 88 | + |
| 89 | +**Commands** (line 6843): CLI argument parsing and subcommand dispatch |
| 90 | +- Routes subcommands: config, index, archive, delete, restore, mount, umount |
| 91 | +- Handles global flags: --cores, --mem, --no-slurm, --profile, --debug |
| 92 | + |
| 93 | +### Textual TUI Applications |
| 94 | + |
| 95 | +Froster uses Textual for interactive selection interfaces: |
| 96 | + |
| 97 | +**TableHotspots** (line 5767): Interactive folder selection from indexed hotspots |
| 98 | +- Displays folders with size, avg file size, access/modify age |
| 99 | +- Supports filtering by --older, --newer, --larger flags |
| 100 | +- "Quit to CLI" generates archive command for batch operations |
| 101 | + |
| 102 | +**TableArchive** (line 5862): Select previously archived folders for delete/restore |
| 103 | + |
| 104 | +**TableNIHGrants** (line 5893): Search and link NIH research grants for FAIR metadata |
| 105 | + |
| 106 | +### Key Data Flow |
| 107 | + |
| 108 | +1. **Index**: pwalk → CSV → DuckDB filtering → hotspots CSV → froster-archives.json |
| 109 | +2. **Archive**: Source folder → tar small files → MD5 checksums → rclone upload → checksum verify → update JSON database |
| 110 | +3. **Delete**: Verify checksums → delete local files → leave `Where-did-the-files-go.txt` manifest |
| 111 | +4. **Restore**: Check Glacier status → trigger retrieval if needed → wait → download with rclone → verify checksums → untar |
| 112 | + |
| 113 | +## Common Development Tasks |
| 114 | + |
| 115 | +### Building and testing locally |
| 116 | + |
| 117 | +```bash |
| 118 | +# After modifying froster/froster.py, test immediately (editable install) |
| 119 | +froster --version |
| 120 | +froster --info |
| 121 | + |
| 122 | +# Test a complete workflow with dummy data |
| 123 | +mkdir -p /tmp/test_archive |
| 124 | +dd if=/dev/zero of=/tmp/test_archive/file1.dat bs=1M count=10 |
| 125 | +froster archive /tmp/test_archive |
| 126 | +``` |
| 127 | + |
| 128 | +### Running tests |
| 129 | + |
| 130 | +```bash |
| 131 | +# Tests require AWS credentials as environment variables; export them first |
| 132 | +export AWS_ACCESS_KEY_ID="..." |
| 133 | +export AWS_SECRET="..." |
| 134 | + |
| 135 | +# Single test file |
| 136 | +python3 tests/test_basic_features.py |
| 137 | + |
| 138 | +# All tests |
| 139 | +python3 -m unittest discover tests/ |
| 140 | +``` |
| 141 | + |
| 142 | +### Debugging |
| 143 | + |
| 144 | +Use the `--debug` flag for verbose logging: |
| 145 | +```bash |
| 146 | +froster --debug archive /path/to/folder |
| 147 | +``` |
| 148 | + |
| 149 | +Logs are written to `~/.local/share/froster/froster.log`. View with: |
| 150 | +```bash |
| 151 | +froster --log-print |
| 152 | +``` |
| 153 | + |
| 154 | +### Code navigation helpers |
| 155 | + |
| 156 | +Key functions for understanding the codebase: |
| 157 | +- `main()` (line 7907): Entry point |
| 158 | +- `Commands.parse_arguments()`: CLI argument structure |
| 159 | +- `Archiver._index_locally()` (line 3525): pwalk → hotspots generation |
| 160 | +- `Archiver.do_archive()`: Main archive workflow |
| 161 | +- `Rclone._run_rclone_command()` (line 6014): S3 transfers |
| 162 | +- `AWSBoto.glacier_restore_status()`: Glacier retrieval checking |
| 163 | + |
| 164 | +### Important file artifacts |
| 165 | + |
| 166 | +**Generated by Froster during archiving:** |
| 167 | +- `.froster.md5sum`: MD5 checksums of all files in folder |
| 168 | +- `Froster.allfiles.csv`: Metadata for all files (including tarred files) |
| 169 | +- `Froster.smallfiles.tar`: Archive of files < 1 MiB |
| 170 | +- `Where-did-the-files-go.txt`: Manifest created after deletion |
| 171 | + |
| 172 | +**Configuration files:** |
| 173 | +- `~/.config/froster/config.ini`: User settings and profiles |
| 174 | +- `~/.local/share/froster/froster-archives.json`: Archive operation database |
| 175 | + |
| 176 | +## Release Process |
| 177 | + |
| 178 | +Releases are automated via GitHub Actions: |
| 179 | + |
| 180 | +1. Update version in `pyproject.toml` |
| 181 | +2. Push to `main` branch |
| 182 | +3. Create GitHub release with tag format: `v<Major>.<Minor>.<Subminor>` |
| 183 | +4. GitHub Action builds and publishes to PyPI automatically |
| 184 | + |
| 185 | +Versioning: |
| 186 | +- Major: Breaking changes or major features |
| 187 | +- Minor: Backward-compatible new functionality |
| 188 | +- Subminor: Bug fixes or small improvements |
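A release tag can be checked against the `v<Major>.<Minor>.<Subminor>` format before the GitHub release is created. This is a small sketch under the assumption that all three components are plain non-negative integers with no pre-release suffix; the function name `parse_release_tag` is hypothetical and not part of the Froster repository.

```python
import re

# Assumed shape: a literal "v" followed by three dot-separated integers.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag):
    """Return (major, minor, subminor) or raise ValueError for a malformed tag."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"tag {tag!r} does not match v<Major>.<Minor>.<Subminor>")
    return tuple(int(part) for part in match.groups())
```

Returning a tuple of integers makes tags comparable in version order, so `parse_release_tag("v0.10.0") > parse_release_tag("v0.9.9")` holds even though the strings would sort the other way.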
| 189 | + |
| 190 | +## Important Considerations |
| 191 | + |
| 192 | +**HPC-specific behaviors:** |
| 193 | +- Auto-detects Slurm and submits long-running operations as batch jobs |
| 194 | +- Use `--no-slurm` to force foreground execution |
| 195 | +- Slurm outputs go to `~/.local/share/froster/slurm/` |
| 196 | + |
| 197 | +**Checksum verification:** |
| 198 | +- MD5 checksums are generated before upload and verified after |
| 199 | +- Never manually delete archived folders; use `froster delete` so that checksums are verified before any local files are removed |
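The generate-then-verify idea can be illustrated with the standard library alone. This is a sketch, not the `Archiver` implementation: it assumes the manifest uses `md5sum`-style lines (`<hash>  <name>`), covers only the files directly inside one folder, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name=".froster.md5sum"):
    """Write one '<md5>  <name>' line per regular file in the folder."""
    folder = Path(folder)
    lines = [
        f"{md5_of_file(entry)}  {entry.name}"
        for entry in sorted(folder.iterdir())
        if entry.is_file() and entry.name != manifest_name
    ]
    manifest = folder / manifest_name
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def verify_manifest(folder, manifest_name=".froster.md5sum"):
    """Return the names of files that are missing or whose hash differs."""
    folder = Path(folder)
    mismatched = []
    for line in (folder / manifest_name).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty manifest
        expected, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or md5_of_file(target) != expected:
            mismatched.append(name)
    return mismatched
```

An empty list from `verify_manifest` is the condition under which deleting local copies would be safe in this sketch; any returned name signals a file that should be re-examined first.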
| 200 | + |
| 201 | +**Small file handling:** |
| 202 | +- Files < 1 MiB are automatically tarred, which avoids paying the roughly 40 KiB of per-object overhead that Glacier adds for every stored file |
| 203 | +- Configure threshold: `~/.config/froster/config.ini` → `max_small_file_size_kib` |
| 204 | +- Disable tarring: `froster archive --no-tar` |
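The benefit of tarring can be estimated with simple arithmetic. The sketch below uses the 1 MiB threshold and the ~40 KiB per-object figure quoted above; the function names are hypothetical, and it assumes all small files in a folder end up in a single tar object.

```python
KIB = 1024
MIB = 1024 * KIB


def split_by_size(sizes, threshold_bytes=MIB):
    """Partition a {name: size_in_bytes} mapping into (small, large) dicts.

    Files strictly below the threshold count as small, matching '< 1 MiB'.
    """
    small = {name: size for name, size in sizes.items() if size < threshold_bytes}
    large = {name: size for name, size in sizes.items() if size >= threshold_bytes}
    return small, large


def overhead_saved_bytes(small_file_count, per_object_overhead=40 * KIB):
    """Overhead avoided when N small objects are replaced by one tar object."""
    return max(small_file_count - 1, 0) * per_object_overhead
```

For a folder with 100,000 small files, `overhead_saved_bytes(100_000)` comes to roughly 3.8 GiB of billed overhead avoided, which is why tarring matters most for folders dominated by tiny files.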
| 205 | + |
| 206 | +**Storage class selection:** |
| 207 | +- Default: AWS `DEEP_ARCHIVE` (most cost-effective; retrieval typically completes within 12 hours on the Standard tier or within 48 hours on the Bulk tier) |
| 208 | +- Other classes: GLACIER, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING |
| 209 | +- Set during `froster config` or in config.ini |
| 210 | + |
| 211 | +**Multiple users / shared configuration:** |
| 212 | +- Set shared config directory during `froster config` |
| 213 | +- Allows teams to share hotspot files and archive database |
| 214 | +- Individual credentials remain in `~/.aws/` |