This document describes the exact binary layout used by the goxa archiver. It details
every byte stored inside a .goxa archive so that other implementations can both
create and parse them with perfect fidelity. Unless stated otherwise all integers
are encoded in little-endian order and the structures are packed with no padding
between fields. The format intentionally mirrors how Go structures are serialized
in memory so that parsing and generation can be done with minimal overhead.
Version 2 extends the original format with a trailer that lists block offsets and adds optional per-block checksums. Older archives remain readable but lack these features.
[Header]
[File data blocks]
[Trailer]
The header records every empty directory and file. File data follows, compressed into blocks. A trailer at the end holds a block index and a checksum.
The header begins with a fixed section:
| Offset | Size | Description |
|---|---|---|
| 0 | 4 | Magic bytes GOXA |
| 4 | 2 | Version (uint16, currently 2) |
| 6 | 4 | Feature flags (uint32) |
| 10 | 1 | Compression type |
| 11 | 1 | Checksum type |
| 12 | 1 | Checksum length in bytes |
| 13 | 4 | Block size (uint32, 0 when uncompressed) |
| 17 | 8 | Trailer offset (uint64) |
| 25 | 8 | Archive size (uint64) |
| 33 | 8 | Empty directory count (uint64) |
Immediately after this count come the empty directory entries, followed by the file count and file entries. A checksum of the header (including the trailer offset) appears after the file entries.
- Magic bytes – always the ASCII string
GOXAand used to quickly verify that a file is actually a GoXA archive. - Version – current format version. Readers should reject archives with a higher version as incompatible.
- Feature flags – a bit mask selecting optional features. See the Feature Flags table below.
- Compression type – index into the compression table. If
fNoCompressis set then file data is stored uncompressed even though this field still holds a value. - Checksum type and length – define the algorithm used for archive and optional per-file checksums. Length is stored separately so future algorithms can be supported without new format versions.
- Block size – preferred size of compressed blocks. Archives that do not use
compression set this value to
0. - Trailer offset – absolute offset to the trailer from the start of the archive. This allows a reader to jump directly to the block index when seeking within a large archive.
- Archive size – total size in bytes of the entire archive file, useful when preallocating space on extraction.
- Empty directory count – number of entries in the empty directory table that follows immediately after this header.
| Value | Algorithm |
|---|---|
| 0 | gzip |
| 1 | zstd |
| 2 | lz4 |
| 3 | s2 |
| 4 | snappy |
| 5 | brotli |
The choice of compression affects only the file data blocks. Metadata is never compressed. The algorithms listed above are chosen for their balance of speed and ratio. When decoding, readers must gracefully handle unknown values by failing with a clear error message.
| Value | Algorithm |
|---|---|
| 0 | CRC32 |
| 1 | CRC16 |
| 2 | XXHash3 |
| 3 | SHA‑256 |
| 4 | Blake3 |
Blake3 is the default when checksums are enabled because it offers strong cryptographic properties while remaining extremely fast. The checksum type is used for the header, trailer, and (when enabled) for individual file data. Readers should expect the length field to match the output size of the selected algorithm and reject mismatches.
| Flag | Value | Purpose |
|---|---|---|
fNone |
0x1 | Reserved |
fAbsolutePaths |
0x2 | Store absolute paths |
fPermissions |
0x4 | Preserve permissions |
fModDates |
0x8 | Preserve modification times |
fChecksums |
0x10 | Include checksums |
fNoCompress |
0x20 | Disable compression |
fIncludeInvis |
0x40 | Include hidden files |
fSpecialFiles |
0x80 | Archive symlinks and other special files |
fBlockChecksums |
0x100 | Store per-block checksums |
Flags may be combined.
The flags control what information is stored for each entry:
fAbsolutePaths– paths are written exactly as provided, starting with a leading/. Without this flag all stored paths are relative.fPermissions– saves the Unix permission bits for directories and files. If absent, extracted files will receive default permissions chosen by the operating system.fModDates– records the last modification time. When omitted, extraction tools typically set the current time for all files.fChecksums– adds a checksum for each file's contents. This helps detect corruption but increases archive size.fNoCompress– disables compression for all files even if a compression algorithm is specified.fIncludeInvis– includes hidden files (dot files) that would otherwise be skipped.fSpecialFiles– allows storing symbolic links and other special file types such as device nodes. Without this flag those entries are ignored.fBlockChecksums– when combined withfChecksums, adds a checksum before every compressed block. This allows detection of corruption in streaming scenarios.
[Empty Dir Count: uint64]
[Entries...]
Each entry optionally stores mode and mod time depending on the flags:
[Mode uint32?][ModTime int64?][PathLen uint16][UTF-8 Path]
Paths longer than 65,535 bytes cannot be stored.
Empty directories have no associated file data and therefore occupy no space in the data block section. They are listed only so that an exact directory tree can be recreated during extraction.
[File Count: uint64]
[Entries...]
Each file entry contains:
- Uncompressed size (
uint64) - Optional mode (
uint32) and mod time (int64) [PathLen uint16][UTF-8 Path]- Type byte (
0=file,1=symlink,2=hardlink,3=other) [LinkLen uint16][Target]for symlinks and hardlinks
entryOther records only metadata and has no file data.
File paths are likewise limited to 65,535 bytes. The size field records the
uncompressed size of the file data. For symlinks and hardlinks the link field
stores the target path as UTF‑8 text and the size value is set to zero. Other
special files (for example fifos or device nodes) use entryOther and rely on
the feature flags to indicate that such files should be created during
extraction.
For each file entry the archive stores:
- A checksum of the entire file when
fChecksumsis set. WhenfBlockChecksumsis also set, a checksum precedes each block instead. - The file data split into blocks. Each block is compressed using the selected algorithm. Without compression the block size is
0and each file is stored as one block.
Checksums appear either once per file or before every block depending on the combination of fChecksums and fBlockChecksums. They are written using the algorithm and length specified in the header. Block boundaries are independent for each file and no padding is inserted between blocks or between the checksum and the following data.
The trailer starts at the offset recorded in the header. Layout:
[Block Count uint32]
[ [Offset uint64][Size uint64] ... ]
[Trailer Checksum: checksum length from header]
Offsets are absolute from the start of the archive. The trailer checksum covers everything from the Block Count field up to the end of the last block entry.
The block index allows random access to the compressed data. Each entry records the absolute offset and compressed size of one block. When block checksums are enabled an additional checksum of the block immediately follows the block data. Readers should verify the trailer checksum before trusting any offsets.
- Directories containing files are implied; only empty directories are listed.
- Compression and checksums are applied per file.
- Special file entries contain metadata but no data.
- See
create()in the source for a working reference implementation. - File data blocks are written sequentially in the same order as file entries. The trailer exists so that a reader can efficiently locate the data for any file without scanning the entire archive.
- When block checksums are enabled, the checksum length is repeated for each block so that streams can be verified on the fly.