Commit 92aa8d3

bug fixes
1 parent 8dfff06 commit 92aa8d3

4 files changed

Lines changed: 183 additions & 10 deletions

.github/copilot-instructions.md

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# Copilot Instructions for magicnumber

## Project Overview
A Go library that identifies file types by reading byte signatures (magic numbers) rather than relying on file extensions or MIME types. Supports 90+ file format signatures across images, video, audio, archives, executables, documents, and text formats.

## Build, Test, and Lint Commands

### Using Task (preferred)
```bash
# List all available tasks
task --list-all

# Run tests
task test # Standard test run (no caching)
task testr # Run tests with race detection (slower)

# Lint and format
task lint # Run gofumpt formatter and golangci-lint

# Static analysis
task nil # Run nilaway static analysis for nil dereferences

# Dependencies
task update # Update all dependencies to latest version
task patch # Update only patch versions of dependencies

# Documentation
task doc # Generate and browse module documentation locally
```

### Direct Go commands
```bash
# Test a single package (here the module root); listing individual .go files
# usually fails to compile because they depend on the rest of the package
go test -count 1 -v .

# Test a single function
go test -run TestArchive -v -count 1 ./...

# Run with verbose output and detailed test names
go test -v -count 1 ./...

# Lint check
golangci-lint run -c .golangci.yaml
```

## High-Level Architecture

### Core Design Pattern
The library uses a **matcher pattern** where each file signature is associated with a `Matcher` function that performs byte-level validation:
- `Matcher` is a function type: `func(io.ReaderAt) bool` that checks if a reader contains a specific file signature
- `Finder` is a map of `Signature -> Matcher` that contains all detection logic
- Signatures are defined as an `iota` enum starting from `ZeroByte` (-2)
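
The three types above can be sketched roughly as follows. This is a minimal illustration, not the library's source: `ExampleGIF` and `isGIF` are hypothetical names, and placing `Unknown` at -1 is an assumption inferred from `ZeroByte` being -2.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Signature identifies a detected file type.
type Signature int

// Illustrative constants only; the real enum has roughly 90 entries.
const (
	ZeroByte   Signature = iota - 2 // -2: empty input
	Unknown                         // -1: assumed value, no matcher succeeded
	ExampleGIF                      // 0: hypothetical entry for this sketch
)

// Matcher reports whether the reader holds a given signature.
type Matcher func(io.ReaderAt) bool

// Finder maps each signature to the matcher that detects it.
type Finder map[Signature]Matcher

// isGIF is a hypothetical matcher that checks the 4-byte "GIF8" prefix.
func isGIF(r io.ReaderAt) bool {
	p := make([]byte, 4)
	n, _ := r.ReadAt(p, 0)
	return n == len(p) && bytes.Equal(p, []byte("GIF8"))
}

func main() {
	finder := Finder{ExampleGIF: isGIF}
	r := bytes.NewReader([]byte("GIF89a..."))
	fmt.Println(finder[ExampleGIF](r), ZeroByte, Unknown)
}
```

Because a `Matcher` only needs an `io.ReaderAt`, the same function works on an `*os.File`, a `*bytes.Reader`, or any other random-access source.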

### Main Components

**magicnumber.go** (core API)
- `Signature` enum: ~90 file type constants
- `Find(io.ReaderAt) Signature`: Main entry point, returns identified file type
- `MatchExt(filename, reader)`: Validates if file content matches its extension
- `New() *Finder`: Returns map of all signature matchers
- `Ext()`: Returns map of signatures to file extensions
- Helper types: `Extension`, `Finder`, `Matcher`

**Format-specific modules** (grouped by category):
- `executable.go`: DOS/Windows executables, self-extracting archives (PKLITE, PKSFX)
- `archive.go`: ZIP variants, RAR, TAR, 7z, GZip, etc. (uses PKWARE detection logic)
- `media.go`: Images (JPEG, PNG, BMP, TIFF), video (MP4, AVI, MOV), audio (MP3, WAV, FLAC, OGG)
- `cdimage.go`: CD/DVD ISO formats (ISO 9660, Nero, PowerISO, Alcohol 120)
- `text.go`: Text and document formats (UTF-8/16/32, ANSI, PDF, RTF)
- `id3.go`: ID3 tag parsing for MP3 metadata
- `synthesismusic.go`: Tracker music formats (MOD, IT, XM, MTM)

**knowns.go**: Currently empty; placeholder for additional data structures

### Detection Strategy
1. `Find()` iterates through all matchers in `Finder` map
2. For text/special cases (ANSI, plain text, XBIN), it applies secondary checks in specific order
3. Returns `Unknown` if no signature matches
4. Returns `ZeroByte` for empty files
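
The strategy above might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's `Find()`: the names `find`, `findBytes`, `prefix`, `ExamplePNG`, and `ExampleGIF` are hypothetical, and the ordered secondary checks for text formats (step 2) are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

type Signature int

const (
	ZeroByte   Signature = iota - 2
	Unknown              // assumed to follow ZeroByte
	ExamplePNG           // hypothetical
	ExampleGIF           // hypothetical
)

type Matcher func(io.ReaderAt) bool

// prefix builds a matcher that compares the leading bytes with magic.
func prefix(magic []byte) Matcher {
	return func(r io.ReaderAt) bool {
		p := make([]byte, len(magic))
		// ReadAt may return io.EOF together with a full read,
		// so only the byte count is checked here.
		n, _ := r.ReadAt(p, 0)
		return n == len(magic) && bytes.Equal(p, magic)
	}
}

var finder = map[Signature]Matcher{
	ExamplePNG: prefix([]byte("\x89PNG\r\n\x1a\n")),
	ExampleGIF: prefix([]byte("GIF8")),
}

// find mirrors the listed steps: empty input, then matchers, then Unknown.
func find(r io.ReaderAt) Signature {
	one := make([]byte, 1)
	if n, _ := r.ReadAt(one, 0); n == 0 {
		return ZeroByte
	}
	for sign, match := range finder {
		if match(r) {
			return sign
		}
	}
	return Unknown
}

// findBytes is a convenience wrapper for in-memory data.
func findBytes(data []byte) Signature { return find(bytes.NewReader(data)) }

func main() {
	fmt.Println(findBytes(nil), findBytes([]byte("GIF89a")), findBytes([]byte("hello")))
}
```

Go does not specify map iteration order, which is one reason order-sensitive checks such as the text formats need to be handled outside the plain loop.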

### Testing
- Each module has a corresponding `*_test.go` file
- Tests use actual file samples from `testdata/` directory
- Pattern: `TestSignatureName` function names (e.g., `TestArchive`, `TestMSExe`)
- Use `go test -count 1` to disable caching (important for file I/O tests)

## Performance Considerations

### Key Bottleneck: Repeated `New()` Calls
**Current Issue**: Every function that detects file types creates a new `Finder` map with all matchers:
- `Find()` calls `New()` once per file
- `MatchExt()` calls `New()` once per file
- Category functions (`Archive()`, `Image()`, `Video()`, `Document()`, etc.) in knowns.go call `New()` once each

**Impact**: Every detection call rebuilds the ~90-entry matcher map and dereferences it with `*New()`. For high-volume batch processing, this repeated allocation is wasteful.

### Optimization Opportunities
1. **Cache `New()` result** - Create Finder once at package init or module load, then reuse
   ```go
   var (
   	defaultFinder *Finder
   )

   func init() {
   	defaultFinder = New()
   }
   ```

2. **Lazy initialization** - If caching entire map isn't desired, lazy-initialize on first use

3. **Specialized matchers** - Category functions already filter to smaller lists, which is good for targeted detection
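
For option 2, one possible shape is `sync.OnceValue` (Go 1.21+), which builds the value on first use and returns the cached result afterwards. This is a sketch under assumptions: `newFinder` stands in for the library's `New()`, and the `builds` counter exists only to make the behaviour visible.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

type Signature int
type Matcher func(io.ReaderAt) bool
type Finder map[Signature]Matcher

// builds counts how often the map is constructed, for demonstration only.
var builds int

// newFinder stands in for New(); its single entry is a placeholder.
func newFinder() *Finder {
	builds++
	f := Finder{0: func(io.ReaderAt) bool { return false }}
	return &f
}

// defaultFinder builds the map on the first call and returns the
// same cached pointer on every later call.
var defaultFinder = sync.OnceValue(newFinder)

func main() {
	for i := 0; i < 3; i++ {
		_ = defaultFinder()
	}
	fmt.Println("builds:", builds)
}
```

All callers share one map with this approach, so it should be treated as read-only after construction; concurrent reads of a Go map are safe, concurrent writes are not.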

### Current Strength: Minimal Byte Reading
- Matchers only read necessary bytes (often 2-6 bytes from specific offsets)
- No full file buffering
- Good for large files or streaming scenarios

### Testing Impact
Tests use `go test -count 1` to avoid caching, ensuring fresh reads. This is important and should be preserved.

## Key Conventions

### Code Organization
- **One file per format category**, not one per signature type
- Matcher functions placed near related helpers (e.g., `Pklite()` near other DOS executables)
- Internal helper functions in same file as their public matchers (e.g., `NotASCII()` in text.go)

### Naming Conventions
- Matcher functions: PascalCase, match signature constant names (e.g., `MSExe()` for `MicrosoftExecutable`)
- Signature constants: Descriptive PascalCase (e.g., `PKWAREZip`, `MicrosoftExecutable`)
- `.String()` returns lowercase/hyphenated descriptive names for UX
- `.Title()` returns full format names (e.g., "JPEG File Interchange Format")

### Byte Reading Pattern
All matchers follow a consistent pattern for safety:
```go
func MatcherName(r io.ReaderAt) bool {
	const size = N   // number of bytes to read
	const offset = M // where to read from (usually 0)
	p := make([]byte, size)
	sr := io.NewSectionReader(r, offset, size)
	if n, err := sr.Read(p); err != nil || n < size {
		return false // handle read errors gracefully
	}
	// bytes.Equal() or bytes.Compare() or bitmask checks
	return /* validation logic */
}
```
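
A concrete instance of the template, with a non-zero offset, could look like this. `isUstar` and `isUstarBytes` are hypothetical names for illustration and are not taken from the library; the only outside fact relied on is that POSIX tar headers carry the ASCII magic `ustar` at byte offset 257.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// isUstar is a hypothetical matcher, not the library's own function.
// POSIX tar archives carry the ASCII magic "ustar" at byte offset 257.
func isUstar(r io.ReaderAt) bool {
	const size = 5
	const offset = 257
	p := make([]byte, size)
	sr := io.NewSectionReader(r, offset, size)
	if n, err := sr.Read(p); err != nil || n < size {
		return false // a short or failed read means no match
	}
	return bytes.Equal(p, []byte("ustar"))
}

// isUstarBytes wraps isUstar for in-memory data.
func isUstarBytes(data []byte) bool { return isUstar(bytes.NewReader(data)) }

func main() {
	header := make([]byte, 512) // one zeroed tar header block
	copy(header[257:], "ustar")
	fmt.Println(isUstarBytes(header), isUstarBytes([]byte("short")))
}
```

Input shorter than `offset + size` makes the section read fail, so the matcher returns `false` instead of panicking on a slice bound.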

### Testing Files
- Test files use sample files from `testdata/` directory
- Test pattern: Read file → pass to function → assert signature
- Some matchers have multiple test cases for different variants

### Linting Configuration
- Uses golangci-lint with custom config in `.golangci.yaml`
- gofumpt formatter enabled for consistent formatting
- Max complexity: 17 (cyclop), max function length: 60 lines
- Several linters disabled: exhaustive, ireturn, varnamelen, wsl variants
- Test files get leniency on complexity/function length rules

### Dependencies
- `github.com/nalgeon/be`: Minimal test assertion package (`be.Equal`, `be.Err`), used in the `*_test.go` files
- `golang.org/x/text`: Text encoding support (UTF-8/16/32 detection)
- `go.uber.org/nilaway`: Static nil pointer analysis tool

### Documentation Sources
Magic number byte values sourced from:
- Gary Kessler's File Signatures Table
- Just Solve the File Format Problem (Archive Team)
- OSDev Wiki
- Wikipedia List of File Signatures

go.mod

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 module github.com/Defacto2/magicnumber
 
-go 1.24.5
+go 1.25.6
 
 require (
 	github.com/nalgeon/be v0.3.0

magicnumber.go

Lines changed: 7 additions & 6 deletions
@@ -130,20 +130,20 @@ func (sign Signature) String() string { //nolint:funlen
 		"TIFF image",
 		"BMP image",
 		"PCX image",
-		"BMP image",
+		"ILBM image",
 		"Microsoft icon",
 		"RIPscrip",
 		"MPEG-4 video",
 		"QuickTime video",
-		"QuickTime video",
+		"QuickTime M4V video",
 		"AVI video",
 		"Windows Media video",
-		"MPEG-4 video",
+		"MPEG video",
 		"Flash video",
 		"RealPlayer video",
 		"MIDI audio",
 		"MP3 audio",
-		"ACC audio",
+		"AAC audio",
 		"Ogg audio",
 		"FLAC audio",
 		"Wave audio",
@@ -176,7 +176,7 @@ func (sign Signature) String() string { //nolint:funlen
 		"MS-DOS KWAJ",
 		"MS-DOS SZDD",
 		"MS-DOS executable",
-		"Microsoft compound fFile",
+		"Microsoft compound file",
 		"CD, ISO 9660",
 		"CD, Nero",
 		"CD, PowerISO",
@@ -189,7 +189,7 @@ func (sign Signature) String() string { //nolint:funlen
 		"UTF-32 text",
 		"ANSI text",
 		"plain text",
-		"IFF AMIM image",
+		"IFF ANIM image",
 		"IFF PBM image",
 		"XBIN binary text",
 	}[sign]
@@ -443,6 +443,7 @@ func New() *Finder { //nolint:funlen
 		UTF32Text: Utf32,
 		ElectronicArtsAnim: IffAnim,
 		PlanarBitMap: IffPBM,
+		XBinaryText: XBin,
 	}
 	return &finds
 }

magicnumber_test.go

Lines changed: 3 additions & 3 deletions
@@ -69,7 +69,7 @@ func ExampleArchive() {
 	}
 	fmt.Println(sign2)
 	// Output: Microsoft cabinet
-	// binary data
+	// binary data or text
 }
 
 func ExampleFind() {
@@ -149,8 +149,8 @@ func TestUnknowns(t *testing.T) {
 	sign, err := magicnumber.Archive(nr)
 	be.Err(t, err, nil)
 	be.Equal(t, magicnumber.Unknown, sign)
-	be.Equal(t, "binary data", sign.String())
-	be.Equal(t, "Binary data", sign.Title())
+	be.Equal(t, sign.String(), "binary data or text")
+	be.Equal(t, sign.Title(), "Binary data or binary text")
 
 	b, sign, err := magicnumber.MatchExt(emptyFile, nr)
 	be.Err(t, err, nil)
