Status: Final Specification - v1.0
Format Version: 1.0.0
Effective Date: 2025-12-19
Authority: Blackfall Labs
Maintainer: Magnus Victis Trent
Engrams are a universal archive format designed for long-term knowledge preservation with integrated database access. They are write-once containers that preserve semantic structure, support cryptographic verification, and allow embedded SQLite databases to be queried, without depending on external tooling or network services that may change or disappear over time.
An Engram is more than a file: it is a declaration that knowledge should survive network loss, platform collapse, and institutional decay.
This is the normative specification for the Engram immutable archive format.
Documentation:
- FORMAT-SPECIFICATION.md - Pure, implementation-agnostic binary format definition (v1.0)
- LICENSE - MIT License
FORMAT-SPECIFICATION.md is the canonical format definition:
- Pure binary structure and field layouts
- No implementation-specific details (no Rust, CLI, or tool references)
- Use this to implement the format in any language
- Normative reference for format compliance
For implementation guidance (Rust code examples, CLI workflows, operational best practices), see the reference implementation at engram-rs.
The format specification defines:
- Binary structure and field layouts
- Compression strategies (None, LZ4, Zstandard)
- Frame-based compression for large files (≥50MB)
- Local Entry Headers (LOCA) and End of Central Directory Record (ENDR)
- Virtual File System integration for embedded SQLite databases
- Cryptographic verification and encryption modes
Design principles:
- Immutable Knowledge: Engrams are write-once, read-many archives representing finalized knowledge containers
- Semantic Preservation: Data retains its full meaning and structure across technological transitions
- Offline First: Archives remain readable and queryable without any network access
- Database Integration: SQLite databases embedded within archives accept direct SQL queries at 80-90% of native performance
- Random Access: Individual files are extracted in under a millisecond through O(1) hash-indexed lookup
- Format Longevity: The binary structure uses fixed-width fields, explicit versioning, and reserved extension space for multi-decade durability
[File Header: 64 bytes fixed]
- Magic number: 0x89 'E' 'N' 'G' 0x0D 0x0A 0x1A 0x0A
- Version, central directory offset/size, entry count
- Flags (encryption mode), reserved space
[File Data Region: variable length]
├─ Local Entry 1: [LOCA header][compressed data]
├─ Local Entry 2: [LOCA header][compressed data]
└─ Local Entry N: [LOCA header][compressed data]
[Central Directory: 320 bytes per entry]
- Hash-indexed file manifest
- O(1) lookup complexity
- Path, offset, size, CRC32, compression method
[End of Central Directory: 64 bytes fixed]
- ENDR signature for validation
- Duplicate offsets for corruption detection
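As a rough illustration of the layout above, the following Python sketch checks the 8-byte magic number and computes the fixed overhead implied by the stated sizes (64-byte header, 320 bytes per central directory entry, 64-byte end record). The function names are hypothetical and are not part of any Engram tooling. The sketch deliberately reads nothing beyond the magic number, because the offsets of the remaining header fields are defined only in FORMAT-SPECIFICATION.md.

```python
# Sizes and magic bytes taken from the layout summary above.
MAGIC = bytes([0x89]) + b"ENG" + bytes([0x0D, 0x0A, 0x1A, 0x0A])
HEADER_SIZE = 64   # fixed file header
ENTRY_SIZE = 320   # one central directory entry
ENDR_SIZE = 64     # fixed end-of-central-directory record


def has_engram_magic(data: bytes) -> bool:
    """Return True if the buffer starts with the 8-byte Engram magic number."""
    return data.startswith(MAGIC)


def fixed_overhead(entry_count: int) -> int:
    """Bytes used by the header, central directory, and end record.

    Excludes the variable-length LOCA headers and the file data itself.
    """
    if entry_count < 0:
        raise ValueError("entry_count must be non-negative")
    return HEADER_SIZE + ENTRY_SIZE * entry_count + ENDR_SIZE


if __name__ == "__main__":
    fake_header = MAGIC + bytes(HEADER_SIZE - len(MAGIC))
    print(has_engram_magic(fake_header))   # True
    print(fixed_overhead(1000))            # 320128
```

Under these sizes, an archive with 1,000 entries spends 320,128 bytes on fixed structures before any LOCA headers or file data are counted.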
v1.0.0 (2025-12-19) - Production Release
Local Entry Headers (LOCA): Variable-length headers precede each file's compressed data, enabling sequential streaming reads without consulting the central directory.
End Record (ENDR): A 64-byte record at the end of the archive enables validation and locates the central directory through a backward scan.
Frame-Based Compression: Large files (≥50MB) are split into independent 64KB frames, so a requested byte range can be decompressed without decompressing the whole file.
Encryption Support: Archive-level and per-file AES-256-GCM encryption, with the mode indicated by flags in the file header.
VFS Integration: SQLite databases are queried in place, without extraction, through a Virtual File System abstraction.
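To make the frame-based compression entry above more concrete, here is a minimal Python sketch of how a reader might work out which frames cover a requested byte range. It assumes that "64KB" means 65,536 bytes, that "50MB" means 50 × 1024 × 1024 bytes, and that frames are numbered from zero over the uncompressed data. None of these details are confirmed by this document, and the function names are hypothetical.

```python
FRAME_SIZE = 64 * 1024               # assumption: 64KB = 65,536 bytes
FRAME_THRESHOLD = 50 * 1024 * 1024   # assumption: 50MB = 52,428,800 bytes


def uses_frames(uncompressed_size: int) -> bool:
    """Return True if a file of this size falls in the frame-compressed class."""
    return uncompressed_size >= FRAME_THRESHOLD


def frames_for_range(offset: int, length: int) -> range:
    """Indices of the frames overlapping bytes [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // FRAME_SIZE
    last = (offset + length - 1) // FRAME_SIZE
    return range(first, last + 1)


if __name__ == "__main__":
    # A 100-byte read at offset 1,000,000 touches a single frame.
    print(list(frames_for_range(1_000_000, 100)))   # [15]
    # A 2-byte read straddling a frame boundary touches two frames.
    print(list(frames_for_range(65_535, 2)))        # [0, 1]
```

Only the frames in the returned range would need to be decompressed, which is what makes partial reads of large files inexpensive.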
See FORMAT-SPECIFICATION.md for complete technical details.
This specification is licensed under the MIT License.
Copyright (c) 2025 Blackfall Labs
Technical Support: [email protected]
Issues: Please file an issue on the public repo
Ideal For:
- Immutable software releases and documentation snapshots
- Embedded database distribution with query-without-extraction
- Long-term knowledge preservation with format stability
- Cryptographic verification of distributed archives
- Random access to large file collections
Avoid For:
- Frequent incremental updates (use mutable formats)
- Whole-archive streaming decompression (use tar.gz)
- Maximum legacy tool compatibility (use ZIP)
- Very small file counts with simple access patterns