This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
blockcopy is a Python CLI tool for efficiently copying large files and block devices (VM devices, LVM snapshots) over the network. It uses a three-stage pipeline with hash comparison to transfer only changed blocks, making incremental copies fast. Hash computation is parallelized via ThreadPoolExecutor to avoid CPU bottlenecks on fast NVMe disks.
```bash
make check                                # Run all tests with uv
make check pytest_args="-k test_name"     # Run with additional pytest arguments
uv run pytest -s -vv tests                # Same as above, explicit
uv run pytest tests/test_checksum.py::test_checksum_file  # Run single test
make lint                                 # Run flake8
uv run flake8
uv sync                                   # Install dependencies
```

The tool is a single-file script (blockcopy.py) with three subcommands that form a pipeline connected via stdin/stdout:
```bash
blockcopy checksum /dev/destination | \
    ssh srchost blockcopy retrieve /dev/source | \
    blockcopy save /dev/destination
```
- `checksum` - Reads the destination file/device, computes a SHA3-512 hash for each 128 KB block, and outputs a binary hash stream
- `retrieve` - Reads the source file, compares hashes from stdin, and outputs only the differing blocks as a binary data stream
- `save` - Reads block data from stdin and writes it to the destination file/device
Each stage uses `ThreadPoolExecutor` with:
- 1 read worker thread (reads the file sequentially)
- N hash worker threads (compute SHA3-512 in parallel, N = min(cpu_count, 8))
- 1 send worker thread (writes the output stream)
Threads communicate via `Queue` objects. `ExceptionCollector` aggregates errors from worker threads.
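The pattern can be sketched as follows. This is an illustrative reconstruction, not blockcopy's actual code; the `ExceptionCollector` internals and the `hash_blocks` helper shown here are assumptions:

```python
import hashlib
import queue
from concurrent.futures import ThreadPoolExecutor


class ExceptionCollector:
    """Collects exceptions from worker threads so the main thread can re-raise.

    Illustrative only; blockcopy's real ExceptionCollector may differ.
    """

    def __init__(self):
        self.errors = []

    def run(self, fn, *args):
        try:
            fn(*args)
        except Exception as exc:
            self.errors.append(exc)


def hash_blocks(blocks, workers=4):
    """Hash (position, data) pairs in parallel, returning {position: digest}."""
    results = queue.Queue()
    collector = ExceptionCollector()

    def hash_one(position, data):
        results.put((position, hashlib.sha3_512(data).digest()))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for position, data in blocks:
            pool.submit(collector.run, hash_one, position, data)
    # The with-block waits for all workers; surface the first error, if any.
    if collector.errors:
        raise collector.errors[0]
    return dict(results.queue)  # safe: no workers are running anymore
```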
checksum → retrieve:
- `hash` command: 4 bytes cmd + 8 bytes position + 4 bytes size + 64 bytes SHA3-512 digest
- `rest` command: 4 bytes cmd + 8 bytes offset (signals to send remaining data beyond checksummed range)
- `done` command: 4 bytes cmd (signals completion)
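Assuming a 4-byte ASCII command tag and big-endian packing (both assumptions; only the field layout comes from this document), the `hash` command could be encoded like this:

```python
import hashlib
import struct

# Header layout (assumed big-endian): 4-byte tag, 8-byte position, 4-byte size.
HASH_HEADER = ">4sQI"


def pack_hash_command(position: int, data: bytes) -> bytes:
    """Encode one hash command for a block at `position`."""
    digest = hashlib.sha3_512(data).digest()  # always 64 bytes
    return struct.pack(HASH_HEADER, b"hash", position, len(data)) + digest


def unpack_hash_command(buf: bytes):
    """Decode a hash command back into (cmd, position, size, digest)."""
    cmd, position, size = struct.unpack(HASH_HEADER, buf[:16])
    return cmd, position, size, buf[16:16 + 64]
```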
retrieve → save:
- `data` command: 4 bytes cmd + 8 bytes position + 4 bytes size + N bytes data
- `dlzm` command: same as `data` but LZMA compressed (optional `--lzma` flag)
- `meta` command: 4 bytes cmd + 8 bytes atime_ns + 8 bytes mtime_ns + 4 bytes mode + 4 bytes uid + 4 bytes gid + 2 bytes owner_name_len + owner_name + 2 bytes group_name_len + group_name + 8 bytes total_size + 3 bytes "end"
- `done` command: 4 bytes cmd (signals completion)
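A sketch of how the save side might decode a `data` command, under the same assumed big-endian layout and 4-byte ASCII tag (assumptions not confirmed by the source):

```python
import struct

# Assumed header: 4-byte tag, 8-byte position, 4-byte payload size.
DATA_HEADER = ">4sQI"


def read_data_command(stream):
    """Read one data command from a binary stream: (cmd, position, payload)."""
    header = stream.read(struct.calcsize(DATA_HEADER))
    cmd, position, size = struct.unpack(DATA_HEADER, header)
    payload = stream.read(size)  # the block bytes to write at `position`
    return cmd, position, payload
```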
checksum:
- `--progress` - show progress info on stderr
- `--start OFFSET` - start reading from byte offset
- `--end OFFSET` - stop reading at byte offset
retrieve:
- `--lzma` - compress blocks using LZMA (sends `dlzm` instead of `data` when the compressed form is smaller)
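The `--lzma` decision described above — send `dlzm` only when compression actually wins — can be sketched as (the `choose_encoding` helper is hypothetical; the command names are from this document):

```python
import lzma


def choose_encoding(block: bytes):
    """Return (command_tag, payload): compressed only if it is smaller."""
    compressed = lzma.compress(block)
    if len(compressed) < len(block):
        return b"dlzm", compressed  # compression won, send LZMA payload
    return b"data", block           # incompressible block, send as-is
```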
save:
- `--truncate` - truncate destination file to source size (uses `received_total_size` from `meta`)
- `-t, --times` - preserve atime/mtime from source
- `-p, --perms` - preserve mode from source
- `-o, --owner` - preserve uid/owner from source
- `-g, --group` - preserve gid/group from source
- `--numeric-ids` - use numeric uid/gid instead of name lookup
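A hypothetical helper showing how the save stage could apply the preserved metadata. The function name and signature are illustrative; only the flag semantics come from this document:

```python
import os


def apply_metadata(path, atime_ns, mtime_ns, mode=None, uid=-1, gid=-1,
                   times=False, perms=False):
    """Apply received metadata to `path`, mirroring -t / -p / -o / -g."""
    if times:
        # os.utime accepts nanosecond timestamps via the `ns` keyword.
        os.utime(path, ns=(atime_ns, mtime_ns))
    if perms and mode is not None:
        os.chmod(path, mode & 0o7777)  # permission bits only
    if uid != -1 or gid != -1:
        os.chown(path, uid, gid)  # -1 leaves the respective id unchanged
```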
- Block size: 128 KB (`block_size = 128 * 1024`)
- Hash algorithm: SHA3-512 (`sha3_512`)
- Worker threads: min(cpu_count, 8)
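Putting the constants together, the checksum stage conceptually walks a file in 128 KB blocks and hashes each with SHA3-512. A simplified single-threaded sketch (the real implementation is threaded, and `iter_block_hashes` is a hypothetical name):

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # matches blockcopy's block_size


def iter_block_hashes(path):
    """Yield (offset, sha3_512_digest) for each 128 KB block of `path`."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:  # EOF; the final block may be shorter
                break
            yield offset, hashlib.sha3_512(block).digest()
            offset += len(block)
```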
When releasing a new version, update the version string in these locations:
- `blockcopy.py` - `__version__` variable
- `pyproject.toml` - `version` field in `[project]` section
- `README.md` - download URLs contain the version tag (e.g. `v0.0.2`)
- `tests/test_version.py` - version assertions in tests