Commit 6c60709

Merge pull request #145 from omerbenamram/ewfinfo-binary-owned
ewf: make ewfinfo reporting binary-owned
2 parents 7295954 + 3f36392 commit 6c60709

43 files changed

+7518
-832
lines changed
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
name: ewfinfo parity port
overview: Implement a Rust `ewfinfo` CLI (clap) + supporting library report/printer APIs in `crates/ewf` that match libewf’s `ewfinfo` image-metadata behavior/output (text + DFXML), with explicit TODO/`unimplemented` for any unsupported surface area (no silent fallbacks). Keep logical file outputs (`-F`/`-H`/`-B`) in the `ewfinfo` binary target (not the library); use `miette` for application-facing diagnostics while keeping library errors in `thiserror`.
todos:
  - id: ewfinfo-api
    content: Add documented `crates/ewf::ewfinfo` library module for image-metadata reports + printers (no LEF/-F/-H/-B).
    status: completed
  - id: docs-and-unit-tests
    content: Add module docs + rustdoc examples + unit tests for every new public API (include “References” with upstream source file paths).
    status: completed
  - id: ewf1-metadata
    content: Implement EWF1 metadata extraction for header values, media/ewf info, digest hashes, sessions/tracks, and acquisition errors.
    status: completed
  - id: ewf2-metadata
    content: Implement EWF2 metadata extraction (device/case tags, set-id, compression, md5/sha1 sections, etc.).
    status: completed
  - id: ewfinfo-logical-cli
    content: Implement `ewfinfo` CLI-only logical evidence outputs (`-F`/`-H`/`-B`) in the `ewfinfo` binary target (may extend `LefReader` public API minimally, but keep formatting + bodyfile semantics out of the library).
    status: completed
  - id: printers
    content: Implement text + DFXML printers for the image-metadata report that match libewf `ewfinfo` formatting.
    status: completed
  - id: cli-ewfinfo
    content: Add `ewfinfo` binary target using clap for libewf-compatible flags/conflicts and miette for user-facing errors (binary can be multi-file).
    status: completed
  - id: golden-tests
    content: Add golden-output tests for text/dfxml/file-entry/hierarchy/bodyfile, plus TODO/unimplemented tests for unsupported paths.
    status: completed
---

# Port libewf `ewfinfo` to `crates/ewf`

## Goal

- Add a **new `ewfinfo` Rust binary** (in `crates/ewf`) and the supporting **public library APIs** so we can reproduce libewf `ewfinfo` feature-for-feature:
  - Options: `-A -B -d -e -f -F -H -i -m -s -v -V -h` (per `external/libewf/manuals/ewfinfo.1` + `external/libewf/ewftools/ewfinfo.c`).
  - Output: **text** and **DFXML** with the same section structure + formatting.
- **No best-effort fallbacks**: if something isn’t implemented in Rust, we leave an explicit `TODO:` and return `unimplemented!()` / `Error::Unsupported("TODO: ...")` rather than silently degrading.
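
A minimal sketch of this rule, under stated assumptions: the plan derives the real error type with `thiserror`, while the enum below implements the traits by hand to stay dependency-free. The `Section` enum and the `section_title` helper are hypothetical names used only for illustration. The point is that an unported path yields an explicit error carrying a `TODO:` message rather than a guessed value.

```rust
use std::fmt;

/// Hypothetical library error type. The plan derives this with `thiserror`;
/// the traits are written out by hand here to avoid external crates.
#[derive(Debug, PartialEq, Eq)]
pub enum EwfInfoError {
    /// A feature that has not been ported yet. The message carries the TODO.
    Unsupported(&'static str),
}

impl fmt::Display for EwfInfoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EwfInfoError::Unsupported(what) => write!(f, "unsupported: {what}"),
        }
    }
}

impl std::error::Error for EwfInfoError {}

/// Hypothetical section kinds, for illustration only.
#[derive(Debug, Clone, Copy)]
pub enum Section {
    Digest,
    ExtendedAttributes,
}

/// Returns a title for supported sections and an explicit error otherwise,
/// instead of guessing a value or silently skipping the section.
pub fn section_title(section: Section) -> Result<&'static str, EwfInfoError> {
    match section {
        Section::Digest => Ok("Digest hash information"),
        Section::ExtendedAttributes => Err(EwfInfoError::Unsupported(
            "TODO: extended attributes are not implemented",
        )),
    }
}
```

Callers can then match on `EwfInfoError::Unsupported` or surface the message unchanged, so the missing feature stays visible to users and tests.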

## Approach (map C ewfinfo → Rust)

### 1) Create a reusable ewfinfo library module (image metadata only)

- Add a new **library** module under `crates/ewf/src/ewfinfo/` that provides a Rust-native “report + printer” API for **image metadata only**.
- **Do not** 1:1 port or “mirror” libewf’s `info_handle_t`. The `ewfinfo` **binary target** should own the clap-facing types and translate them into a small, strongly-typed library API.
- Keep the boundary sharp:
  - **Library (`crates/ewf`)**: build a structured report for EWF *image* metadata + print it (text/DFXML).
  - **Binary target (`ewfinfo`)**: owns **logical evidence** modes and outputs (`-F`/`-H`/`-B`), path separator handling, and any bodyfile semantics.
- Proposed (public) library surface (names TBD; document every `pub` item):
  - `EwfInfoReport`: data model for the sections libewf prints for images:
    - Acquisition/header values (libewf title: “Acquiry information”)
    - EWF information
    - Media information
    - Digest hash information
    - Sessions / Tracks
    - Acquisition read errors
  - `EwfInfoPrinter` (trait) + concrete printers (e.g. `TextPrinter`, `DfxmlPrinter`) with `EwfInfoPrintOptions` (date formatting, verbosity, etc.)
  - `EwfInfoBuildOptions` for report construction knobs that actually affect parsing/normalization (e.g. header decoding/codepage), **not** CLI-only options like `-s`/`-B`.
  - `EwfInfoError` (library) implemented with `thiserror`.
- **Module documentation requirements** (non-negotiable):
  - Each new module gets `//!` docs with a short compatibility statement and a “References” section that attributes upstream reference material by file path (at minimum):
    - `external/libewf/ewftools/info_handle.h`
    - `external/libewf/ewftools/ewfinfo.c`
    - `external/libewf/manuals/ewfinfo.1`
  - Include rustdoc examples (doctests) that exercise the public API surface (using existing small fixtures/builders).
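
The report/printer split proposed above could look roughly like the following sketch. It reuses the proposed names `EwfInfoReport`, `EwfInfoPrinter`, and `TextPrinter`, but the fields, the method signature, and the output layout are assumptions made for illustration; the real report covers many more sections and the real layout must match libewf.

```rust
use std::io::{self, Write};

/// Heavily trimmed, hypothetical report. The planned `EwfInfoReport` also
/// covers header values, EWF info, digests, sessions, and read errors.
#[derive(Debug, Default)]
pub struct EwfInfoReport {
    pub bytes_per_sector: u32,
    pub number_of_sectors: u64,
}

/// Printers render an already-built report; they never read the image.
pub trait EwfInfoPrinter {
    fn print(&self, report: &EwfInfoReport, out: &mut dyn Write) -> io::Result<()>;
}

/// Illustrative text printer. The line layout here is an assumption and is
/// not claimed to match libewf's output.
pub struct TextPrinter;

impl EwfInfoPrinter for TextPrinter {
    fn print(&self, report: &EwfInfoReport, out: &mut dyn Write) -> io::Result<()> {
        writeln!(out, "Media information")?;
        writeln!(out, "\tBytes per sector: {}", report.bytes_per_sector)?;
        writeln!(out, "\tNumber of sectors: {}", report.number_of_sectors)?;
        Ok(())
    }
}
```

Writing to `&mut dyn Write` keeps the printers testable against an in-memory `Vec<u8>`, while the binary can pass a locked stdout.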

### 2) Extend readers to expose the metadata ewfinfo prints

Keep the existing small summary API (`EwfInfo` in [`crates/ewf/src/info.rs`](crates/ewf/src/info.rs)) stable; add *new* APIs instead.

#### Disk images (`EwfReader`)

- Add `EwfReader::ewfinfo_report(&self, opts: &EwfInfoBuildOptions) -> Result<EwfInfoReport, EwfInfoError>`.
- Implement format-specific extraction:
  - **EWF1 (E01/S01)**: parse required sections from the already-discovered section descriptors (header/header2/volume/disk/data/hash/digest/error/session/track).
    - Header values: parse both `header` (ASCII/codepage) and `header2` (UTF-16LE) and construct the same identifier→description mapping used by `info_handle_header_values_fprint`.
    - EWF + media info fields: derive from parsed volume/disk/data structures (`sectors_per_chunk`, `bytes_per_sector`, `number_of_sectors`, `error_granularity`, `set_identifier`, compression level/method).
    - Hash values: read stored global hashes from digest/hash sections (no recomputation unless ewfinfo does so).
    - Sessions/tracks: parse ranges as start_sector/sector_count.
    - Acquisition errors: parse ranges as start_sector/sector_count.
  - **EWF2 (Ex01)**: reuse existing parsing in [`crates/ewf/src/reader.rs`](crates/ewf/src/reader.rs) (case data/device information tags) to populate the same report fields:
    - `set_id`, `compression_method`, `chunk_count`, `sectors_per_chunk`, `bytes_per_sector`, `number_of_sectors`
    - global MD5/SHA1 sections (parse from section types) to populate digest hash info
    - sessions/tracks/errors: if format doesn’t carry them, report 0 entries (matching libewf behavior for “none present”).
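
For the `header2` step, a minimal sketch of the UTF-16LE decoding could look like this. It assumes the section payload has already been located and decompressed, and the helper name `decode_utf16le` is hypothetical. Malformed input is rejected rather than decoded lossily, in line with the no-fallback rule.

```rust
/// Decodes a UTF-16LE byte buffer (the encoding the plan names for
/// `header2`) into a `String`.
///
/// Returns `None` for an odd byte count or invalid UTF-16 (for example a
/// lone surrogate). A leading byte-order mark, if present, is dropped.
pub fn decode_utf16le(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Reassemble little-endian code units from consecutive byte pairs.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Skip the BOM (U+FEFF) when the buffer starts with one.
    let body: &[u16] = match units.first().copied() {
        Some(0xFEFF) => &units[1..],
        _ => &units[..],
    };
    // Strict conversion: no replacement characters are substituted.
    String::from_utf16(body).ok()
}
```

`String::from_utf16` is used instead of `String::from_utf16_lossy` on purpose, so a damaged header surfaces as an error at the call site instead of a silently altered value.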

### 3) Implement printers for exact text + DFXML output

- Add printer modules under `crates/ewf/src/ewfinfo/`:
  - Text printer replicating:
    - section headers/footers and indentation
    - field label padding (the C code aligns to 24 columns)
    - exact section titles: “Acquiry information”, “EWF information”, “Media information”, “Digest hash information”, etc.
  - DFXML printer replicating the XML emitted by `info_handle_dfxml_*_fprint` (header/footer + element names).
- **No fallback behavior**: invalid inputs/options should be rejected early. For the **CLI**, clap should enforce as much as possible (enums, conflicts, defaults). For the **library**, return explicit `EwfInfoError::Unsupported("TODO: …")` where needed rather than silently defaulting.
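
The label padding can be sketched with Rust's width formatting. The 24-column width comes from the note above; the leading tab, the position of the colon, and the helper name `format_field` are assumptions for illustration, and the golden-output tests remain the authority on the exact layout.

```rust
/// Column width the plan attributes to libewf's label alignment.
const LABEL_WIDTH: usize = 24;

/// Formats one `label: value` row, left-aligning the label (with its colon)
/// in a fixed-width column so that values line up.
///
/// The leading tab and the colon placement are assumptions. Labels longer
/// than the column are emitted in full, never truncated.
pub fn format_field(label: &str, value: &str) -> String {
    let label_with_colon = format!("{label}:");
    format!(
        "\t{:<width$}{}",
        label_with_colon,
        value,
        width = LABEL_WIDTH
    )
}
```

For example, `format_field("Bytes per sector", "512")` yields a tab, the label padded with spaces to 24 columns, and then the value.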

### 4) Add the `ewfinfo` binary target (clap + miette) and keep logical outputs there

- Implement `ewfinfo` as a **binary target** that can be split across multiple Rust modules (prefer directory-style bin: `crates/ewf/src/bin/ewfinfo/main.rs` + submodules).
- Use **clap** to translate libewf flags/idioms into a typed CLI surface (instead of manually porting structs):
  - `#[derive(Parser)]` root + `Args`/`Subcommand` as needed.
  - `ValueEnum` / typed enums for `-f` (text/dfxml), `-d` (date format), etc.
  - conflict groups for `-e`/`-i`/`-m` (mutually exclusive), and for logical modes (`-F` vs `-H` etc.) as required.
  - rely on clap’s generated `--help`/`--version` UX while keeping short flags compatible.
- Use **miette** for user-facing diagnostics:
  - Map library `thiserror` errors into `miette::Diagnostic` at the application boundary with helpful context (`wrap_err`, filenames, option values).
- Keep **logical evidence outputs** out of the library:
  - `-F` (file entry detail), `-H` (hierarchy), `-B` (bodyfile) live in the `ewfinfo` binary target.
  - If the binary needs additional LEF accessors, add small, generic `pub` APIs to `LefReader` (document + unit test them), but keep formatting and bodyfile semantics in the binary.

### 5) Tests + documentation (unit tests first, then golden outputs)

- Add **unit tests** for every new library type/module under `crates/ewf/src/ewfinfo/` (and any new public reader accessors):
  - parsing/normalization invariants
  - section ordering and required fields presence
  - printer formatting invariants (labels, indentation, titles)
- Add **CLI unit tests** (clap `try_parse_from`) for flag conflicts/defaults and for mapping from CLI types → library options.
- Add deterministic **golden-output integration tests** in `crates/ewf/tests/` that:
  - generate small synthetic E01/Ex01/L01/Lx01 fixtures using existing writer/test helpers
  - run the Rust `ewfinfo` binary (via `std::process::Command`) and compare stdout to committed golden files for:
    - default text
    - `-f dfxml`
    - `-F` and `-H`
    - `-B` bodyfile output
- For any feature we haven’t implemented yet (e.g., extended attributes/access control entries if present in real-world files), add a test that asserts we fail with an **explicit TODO/unimplemented** marker.

## Files most likely to change

- [`crates/ewf/src/lib.rs`](crates/ewf/src/lib.rs) (export new ewfinfo APIs)
- [`crates/ewf/src/info.rs`](crates/ewf/src/info.rs) (keep as-is; add new full-metadata types elsewhere)
- [`crates/ewf/src/reader.rs`](crates/ewf/src/reader.rs) (expose/retain parsed metadata needed for ewfinfo)
- New: [`crates/ewf/src/ewfinfo/mod.rs`](crates/ewf/src/ewfinfo/mod.rs)
- New: [`crates/ewf/src/ewfinfo/print_text.rs`](crates/ewf/src/ewfinfo/print_text.rs)
- New: [`crates/ewf/src/ewfinfo/print_dfxml.rs`](crates/ewf/src/ewfinfo/print_dfxml.rs)
- New (preferred): `crates/ewf/src/bin/ewfinfo/` (binary crate modules, e.g. `main.rs`, `cli.rs`, `image.rs`, `logical.rs`, `bodyfile.rs`)

## Notes / constraints

- We’ll use libewf’s behavior/spec as reference but implement logic natively in Rust; no “silent compatibility” shims.
- Any missing surface area is left as `TODO:` + explicit `unimplemented`/`Unsupported` error (per your requirement).
- Error policy: `thiserror` in the library; `miette` at the application boundary for pretty CLI diagnostics.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
---
alwaysApply: true
---

# No best-effort code.

When committing code to this repository, AVOID using "best-effort" code. If you do not have the means to implement some part of a solution to be SPEC EXACT, prefer leaving a TODO or an unimplemented path altogether!

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ members = [
     "crates/ewf",
     "crates/ntfs",
     "crates/ntfs-explorer-gui",
+    "crates/dfxml",
 ]

 [dependencies]

crates/dfxml/Cargo.toml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
[package]
name = "dfxml"
version = "0.1.0"
edition = "2024"

[dependencies]
quick-xml = "0.38.4"
thiserror = "2"

[dev-dependencies]
tempfile = "3.23"
