
Refactor attribute parsing to borrow UTF-16LE #147

Merged
omerbenamram merged 5 commits into master from perf/borrowed-utf16-api-rebased on Jan 3, 2026


omerbenamram (Owner) commented Jan 1, 2026

Summary

  • Introduce Utf16LeStr<'a> (UTF-16LE) backed by utf16-simd, and delay UTF-16LE → UTF-8 conversion until display/serialization.
  • Refactor MFT attribute parsing to be slice-based and zero-copy where possible.
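To illustrate the borrowed-string idea, here is a minimal sketch of a type that wraps raw UTF-16LE bytes and converts to UTF-8 only when asked. It is not the PR's implementation: the field layout and the `new` constructor are assumptions, and decoding uses the standard library's `char::decode_utf16` as a stand-in for `utf16-simd`, whose API is not shown here.

```rust
use std::fmt;

/// Hypothetical stand-in for `Utf16LeStr<'a>`: borrows raw UTF-16LE bytes
/// from the entry buffer and decodes them only on demand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Utf16LeStr<'a> {
    bytes: &'a [u8],
}

impl<'a> Utf16LeStr<'a> {
    /// Borrow `bytes` without copying. Returns `None` for odd lengths,
    /// since UTF-16 code units are two bytes each. (Assumed constructor.)
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 2 == 0 {
            Some(Self { bytes })
        } else {
            None
        }
    }

    /// Iterate over the little-endian code units of the borrowed bytes.
    fn units(self) -> impl Iterator<Item = u16> + 'a {
        self.bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
    }

    /// Allocate a UTF-8 `String`; unpaired surrogates become U+FFFD.
    pub fn to_utf8_string(&self) -> String {
        char::decode_utf16(self.units())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
            .collect()
    }
}

impl fmt::Display for Utf16LeStr<'_> {
    /// Decode straight into the formatter, with no intermediate `String`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut buf = [0u8; 4];
        for r in char::decode_utf16(self.units()) {
            let c = r.unwrap_or(char::REPLACEMENT_CHARACTER);
            f.write_str(c.encode_utf8(&mut buf))?;
        }
        Ok(())
    }
}
```

Because the type is `Copy` and only holds a slice, passing names around costs nothing until something actually needs UTF-8 text.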

Notes / Breaking changes

  • Several attribute structures are now <'a> and borrow from the entry buffer.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-targets --all-features

Note

Introduces borrowed UTF-16LE strings and migrates MFT attribute parsing to slice-based, zero-copy APIs to reduce allocations and improve safety.

  • Add utf16-simd and new Utf16LeStr<'a>; delay UTF-16→UTF-8 conversion to display/serialization
  • Refactor attribute types to be borrowing (<'a>), e.g. MftAttributeHeader<'a>, FileNameAttr<'a>, AttributeListAttr<'a>, DataAttr<'a>, IndexRootAttr<'a>, RawAttribute<'a>
  • Replace stream parsers with from_slice/from_slice_at; implement MftAttributeContent::from_record and use decode_data_runs directly
  • Update iteration over attributes to operate on record slices, correctly stop at $END, and add overflow/EOF checks
  • Replace consuming into_* helpers with non-consuming as_* accessors; adjust mft_dump, csv, and path building to use borrowed names and to_utf8_string() when needed
  • Make timestamp serializers and windows_filetime_to_timestamp public; minor JSON writer callsite cleanups
  • Add/extend tests for index root parsing, empty/nonresident mapping pairs, 32-bit overflow, $STANDARD_INFORMATION 48/72-byte layouts, and end-marker handling
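As a rough sketch of the `from_slice_at` and `as_*` pattern listed above: the struct fields, the `Content` enum, and the signatures below are hypothetical, not the crate's actual definitions. The byte offsets follow the commonly documented `$FILE_NAME` layout (parent reference at 0, name length in characters at 64, namespace at 65, name at 66) and should be treated as an assumption.

```rust
/// Hypothetical borrowed view of a $FILE_NAME attribute body.
#[derive(Debug)]
pub struct FileNameAttr<'a> {
    pub parent_ref: u64,
    pub namespace: u8,
    /// Raw UTF-16LE name bytes, borrowed from the record buffer.
    pub name_utf16le: &'a [u8],
}

impl<'a> FileNameAttr<'a> {
    /// Parse from `record` starting at `offset`, without copying the name.
    /// Every read goes through `get`, so short input yields `None`
    /// instead of a panic.
    pub fn from_slice_at(record: &'a [u8], offset: usize) -> Option<Self> {
        let body = record.get(offset..)?;
        let mut parent = [0u8; 8];
        parent.copy_from_slice(body.get(0..8)?);
        let name_chars = *body.get(64)? as usize;
        let namespace = *body.get(65)?;
        // name_chars is at most 255, so this arithmetic cannot overflow.
        let name_utf16le = body.get(66..66 + name_chars * 2)?;
        Some(Self {
            parent_ref: u64::from_le_bytes(parent),
            namespace,
            name_utf16le,
        })
    }
}

/// Hypothetical content enum with a non-consuming accessor.
#[derive(Debug)]
pub enum Content<'a> {
    FileName(FileNameAttr<'a>),
    Raw(&'a [u8]),
}

impl<'a> Content<'a> {
    /// `as_*` borrows, so the caller keeps ownership of `self`;
    /// an `into_*` helper would have consumed it.
    pub fn as_file_name(&self) -> Option<&FileNameAttr<'a>> {
        match self {
            Content::FileName(attr) => Some(attr),
            _ => None,
        }
    }
}
```

With a borrowing accessor, a caller can inspect the same attribute several times (for example once for path building and once for CSV output) without cloning it.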

Written by Cursor Bugbot for commit b38da6e.

Introduce `Utf16LeStr` backed by `utf16-simd` and refactor attribute parsing to
be slice-based and zero-copy, delaying UTF-16LE → UTF-8 conversion until output.
omerbenamram force-pushed the perf/borrowed-utf16-api-rebased branch from cd29f09 to c5d2020 on January 1, 2026 23:15
Preserve the first Win32/Win32AndDos FILE_NAME attribute to match prior behavior and avoid unstable results when multiple Win32 names are present.
omerbenamram force-pushed the perf/borrowed-utf16-api-rebased branch from 90d1dd8 to 393bda4 on January 3, 2026 12:12
Stop attribute iteration when only the 4-byte 0xFFFF_FFFF terminator remains, instead of erroring with UnexpectedEof. Add regression coverage for the packed terminator case and 32-bit length overflow.
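The terminator handling described in this commit can be sketched roughly as follows. The function name, error enum, and return type are hypothetical. The sketch assumes the usual attribute header layout of a little-endian `u32` type code at offset 0 and a `u32` total length at offset 4.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IterError {
    UnexpectedEof,
    InvalidLength,
}

/// Read a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Split a record into `(type_code, attribute_bytes)` pairs.
pub fn attribute_spans(
    record: &[u8],
    first_offset: usize,
) -> Result<Vec<(u32, &[u8])>, IterError> {
    const END_MARKER: u32 = 0xFFFF_FFFF;
    let mut out = Vec::new();
    let mut offset = first_offset;
    loop {
        // Only the 4-byte type code is required at this point, so a
        // terminator packed at the very end of the record is accepted.
        let type_code = read_u32(record, offset).ok_or(IterError::UnexpectedEof)?;
        if type_code == END_MARKER {
            return Ok(out);
        }
        let len_off = offset.checked_add(4).ok_or(IterError::InvalidLength)?;
        let len = read_u32(record, len_off).ok_or(IterError::UnexpectedEof)? as usize;
        // A length shorter than the 8 bytes just read would never advance.
        if len < 8 {
            return Err(IterError::InvalidLength);
        }
        // checked_add guards against wraparound on 32-bit targets.
        let end = offset.checked_add(len).ok_or(IterError::InvalidLength)?;
        let attr = record.get(offset..end).ok_or(IterError::UnexpectedEof)?;
        out.push((type_code, attr));
        offset = end;
    }
}
```

Checking for the end marker before reading the length field is what lets a record end immediately after the 4-byte terminator.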
omerbenamram force-pushed the perf/borrowed-utf16-api-rebased branch from 393bda4 to 4bf1297 on January 3, 2026 12:12
Treat an empty mapping pairs section as an empty runlist to preserve pre-refactor behavior and avoid spurious FailedToDecodeDataRuns errors.
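For context, a minimal mapping-pairs decoder that treats empty input as an empty runlist might look like the sketch below. It is not the crate's `decode_data_runs`: the function name, `DataRun` struct, and `Option` return are hypothetical. It assumes the standard NTFS encoding, where the header byte's low nibble gives the size of the length field, the high nibble gives the size of the signed, relative offset field, and a zero header ends the list.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct DataRun {
    /// Absolute starting cluster, or `None` for a sparse run.
    pub lcn: Option<i64>,
    pub clusters: u64,
}

/// Read up to 8 little-endian bytes into a u64.
fn read_le(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, b)| acc | ((*b as u64) << (8 * i)))
}

/// Decode mapping pairs; `None` signals malformed or truncated input.
pub fn decode_runs_sketch(bytes: &[u8]) -> Option<Vec<DataRun>> {
    let mut runs = Vec::new();
    let mut pos = 0usize;
    let mut prev_lcn: i64 = 0;
    // Empty input never enters the loop, so it yields an empty runlist.
    while let Some(&header) = bytes.get(pos) {
        if header == 0 {
            break; // explicit end-of-list marker
        }
        let len_size = (header & 0x0F) as usize;
        let off_size = (header >> 4) as usize;
        if len_size == 0 || len_size > 8 || off_size > 8 {
            return None;
        }
        let len_start = pos + 1;
        let off_start = len_start + len_size;
        let next = off_start + off_size;
        let clusters = read_le(bytes.get(len_start..off_start)?);
        let off_bytes = bytes.get(off_start..next)?;
        let lcn = if off_size == 0 {
            None // no offset field: sparse run
        } else {
            // Sign-extend the offset, then apply it to the previous LCN.
            let shift = 64 - 8 * off_size as u32;
            let delta = ((read_le(off_bytes) << shift) as i64) >> shift;
            prev_lcn = prev_lcn.checked_add(delta)?;
            Some(prev_lcn)
        };
        runs.push(DataRun { lcn, clusters });
        pos = next;
    }
    Some(runs)
}
```

Returning an empty list for empty input keeps callers from having to distinguish "no runs" from "decode failure".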
Reject zero/short index entry lengths and prevent offset overflow to avoid invalid reads and non-advancing loops when parsing index nodes.
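The index-entry checks can be sketched along these lines. The names are hypothetical, and the layout is an assumption based on the commonly documented index entry header: 16 bytes, with a little-endian `u16` entry length at offset 8 and a flags byte at offset 12 whose `0x02` bit marks the last entry.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IndexError {
    Truncated,
    BadLength,
}

/// Split the entries area of an index node into per-entry slices.
pub fn index_entry_spans(node: &[u8]) -> Result<Vec<&[u8]>, IndexError> {
    const HEADER_LEN: usize = 16; // assumed fixed header size
    const FLAG_LAST: u8 = 0x02; // assumed "last entry" flag
    let mut out = Vec::new();
    let mut offset = 0usize;
    while offset < node.len() {
        let header = node
            .get(offset..)
            .filter(|rest| rest.len() >= HEADER_LEN)
            .ok_or(IndexError::Truncated)?;
        let entry_len = u16::from_le_bytes([header[8], header[9]]) as usize;
        // A zero or short length would make the loop spin in place
        // or slice inside the header, so reject it outright.
        if entry_len < HEADER_LEN {
            return Err(IndexError::BadLength);
        }
        let end = offset.checked_add(entry_len).ok_or(IndexError::BadLength)?;
        let entry = node.get(offset..end).ok_or(IndexError::Truncated)?;
        out.push(entry);
        if entry[12] & FLAG_LAST != 0 {
            break;
        }
        offset = end;
    }
    Ok(out)
}
```

Since every accepted entry is at least 16 bytes long, `offset` strictly increases and the loop terminates on any input.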
omerbenamram merged commit dc36db7 into master on Jan 3, 2026
4 checks passed