Refactor attribute parsing to borrow UTF-16LE #147
Merged
omerbenamram merged 5 commits into master on Jan 3, 2026
Introduce `Utf16LeStr` backed by `utf16-simd` and refactor attribute parsing to be slice-based and zero-copy, delaying UTF-16LE → UTF-8 conversion until output.
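To illustrate the borrow-then-convert idea, here is a minimal sketch of a borrowed UTF-16LE view. It is not the PR's actual type: it uses only the standard library rather than `utf16-simd`, and everything apart from the names `Utf16LeStr` and `to_utf8_string` (the constructor, `len_units`, the lossy handling of unpaired surrogates) is an assumption made for the example.

```rust
/// Hypothetical stand-in for the PR's `Utf16LeStr<'a>`: a view over raw
/// UTF-16LE bytes that borrows from the record buffer instead of copying.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Utf16LeStr<'a> {
    bytes: &'a [u8],
}

impl<'a> Utf16LeStr<'a> {
    /// Wraps `bytes` without copying. Returns `None` for an odd length,
    /// because every UTF-16 code unit occupies two bytes.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 2 == 0 {
            Some(Self { bytes })
        } else {
            None
        }
    }

    /// Number of UTF-16 code units in the view.
    pub fn len_units(&self) -> usize {
        self.bytes.len() / 2
    }

    /// Allocates and converts to UTF-8 only when called.
    /// Assumption: unpaired surrogates are replaced with U+FFFD.
    pub fn to_utf8_string(&self) -> String {
        let units = self
            .bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
        char::decode_utf16(units)
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
            .collect()
    }
}
```

Parsing code can pass such a view around freely (it is `Copy`), and only the output path pays for the allocation.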
cd29f09 to c5d2020
Preserve the first Win32/Win32AndDos FILE_NAME attribute to match prior behavior and avoid unstable results when multiple Win32 names are present.
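The selection rule described above can be sketched as follows. The `Namespace` values match the on-disk NTFS FILE_NAME namespace codes, but `pick_display_name` is a hypothetical helper, and the fallback to the first name of any namespace is an assumption rather than something this PR states.

```rust
/// FILE_NAME namespace codes as stored on disk in NTFS.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Namespace {
    Posix = 0,
    Win32 = 1,
    Dos = 2,
    Win32AndDos = 3,
}

/// Hypothetical helper: returns the FIRST Win32 or Win32AndDos name in
/// attribute order, so later Win32 names never override an earlier one.
/// Assumption: if no such name exists, fall back to the first name present.
pub fn pick_display_name<'a>(names: &[(Namespace, &'a str)]) -> Option<&'a str> {
    names
        .iter()
        .find(|(ns, _)| matches!(ns, Namespace::Win32 | Namespace::Win32AndDos))
        .or_else(|| names.first())
        .map(|(_, name)| *name)
}
```

Taking the first match keeps the result stable across runs: it depends only on attribute order in the record, not on iteration side effects.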
90d1dd8 to 393bda4
Stop attribute iteration when only the 4-byte 0xFFFF_FFFF terminator remains, instead of erroring with UnexpectedEof. Add regression coverage for the packed terminator case and 32-bit length overflow.
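A rough sketch of the iteration rule, under the assumption that each attribute starts with a little-endian `u32` type code followed by a little-endian `u32` total length. `walk_attributes`, `WalkError`, and the minimum length of 8 are hypothetical names and choices for this example, not the crate's API.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    UnexpectedEof,
    BadLength,
}

const END_MARKER: u32 = 0xFFFF_FFFF;

/// Reads a little-endian u32 at `off`, or `None` if it does not fit.
fn read_u32_le(buf: &[u8], off: usize) -> Option<u32> {
    let end = off.checked_add(4)?;
    let b = buf.get(off..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns `(type_code, byte_range)` for every attribute in `buf`.
pub fn walk_attributes(buf: &[u8]) -> Result<Vec<(u32, Range<usize>)>, WalkError> {
    let mut out = Vec::new();
    let mut off = 0usize;
    while off < buf.len() {
        let type_code = read_u32_le(buf, off).ok_or(WalkError::UnexpectedEof)?;
        // Check the terminator BEFORE reading the length field, so a
        // packed 4-byte end marker at the very end is not an EOF error.
        if type_code == END_MARKER {
            break;
        }
        let len = read_u32_le(buf, off + 4).ok_or(WalkError::UnexpectedEof)? as usize;
        // checked_add guards against wrap-around on 32-bit targets.
        let end = off.checked_add(len).ok_or(WalkError::BadLength)?;
        if len < 8 || end > buf.len() {
            return Err(WalkError::BadLength);
        }
        out.push((type_code, off..end));
        off = end;
    }
    Ok(out)
}
```

The key ordering is that the end marker is tested as soon as the type code is read; only a real attribute needs the remaining header bytes to be present.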
393bda4 to 4bf1297
Treat an empty mapping pairs section as an empty runlist to preserve pre-refactor behavior and avoid spurious FailedToDecodeDataRuns errors.
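For context, a simplified mapping-pairs decoder shows why an empty section naturally yields an empty runlist. This is a sketch, not the crate's `decode_data_runs`: `decode_runs` and `DataRun` are hypothetical, and the layout follows the commonly documented NTFS encoding (header byte with the length-field size in the low nibble and the offset-field size in the high nibble, `0x00` as terminator).

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct DataRun {
    /// Absolute starting cluster, or `None` for a sparse run.
    pub lcn: Option<i64>,
    /// Run length in clusters.
    pub clusters: u64,
}

/// Hypothetical decoder. Returns `None` on malformed input.
pub fn decode_runs(mut bytes: &[u8]) -> Option<Vec<DataRun>> {
    let mut runs = Vec::new();
    let mut prev_lcn: i64 = 0;
    // An empty slice never enters the loop, and a leading 0x00 header
    // breaks immediately: both produce an empty runlist, not an error.
    while let Some((&header, rest)) = bytes.split_first() {
        if header == 0 {
            break;
        }
        let len_size = (header & 0x0F) as usize;
        let off_size = (header >> 4) as usize;
        if len_size == 0 || len_size > 8 || off_size > 8 || rest.len() < len_size + off_size {
            return None;
        }
        let (len_bytes, rest) = rest.split_at(len_size);
        let (off_bytes, rest) = rest.split_at(off_size);

        // Run length: unsigned little-endian, zero-extended to 8 bytes.
        let mut len_buf = [0u8; 8];
        len_buf[..len_size].copy_from_slice(len_bytes);
        let clusters = u64::from_le_bytes(len_buf);

        let lcn = if off_size == 0 {
            None // no offset field: sparse run
        } else {
            // Offset: signed little-endian delta, sign-extended to 8 bytes.
            let fill: u8 = if off_bytes[off_size - 1] & 0x80 != 0 { 0xFF } else { 0x00 };
            let mut off_buf = [fill; 8];
            off_buf[..off_size].copy_from_slice(off_bytes);
            prev_lcn = prev_lcn.checked_add(i64::from_le_bytes(off_buf))?;
            Some(prev_lcn)
        };
        runs.push(DataRun { lcn, clusters });
        bytes = rest;
    }
    Some(runs)
}
```

With this shape, "no bytes at all" and "only a terminator" are indistinguishable to callers, which is the pre-refactor behavior the commit restores.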
Reject zero/short index entry lengths and prevent offset overflow to avoid invalid reads and non-advancing loops when parsing index nodes.
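The index-entry checks can be sketched like this. The layout is an assumption based on the commonly documented NTFS index entry header (16 bytes, entry length as a little-endian `u16` at offset 8, flags in the byte at offset 12, bit `0x02` marking the last entry); `entry_offsets` and `IndexError` are hypothetical names, not the crate's API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IndexError {
    Truncated,
    BadEntryLength,
}

/// Assumed size of the fixed index entry header.
const ENTRY_HEADER_LEN: usize = 16;
/// Assumed flag bit marking the final entry of a node.
const FLAG_LAST_ENTRY: u8 = 0x02;

/// Returns the starting offset of each entry in `node`, including the
/// entry flagged as last.
pub fn entry_offsets(node: &[u8]) -> Result<Vec<usize>, IndexError> {
    let mut offsets = Vec::new();
    let mut off = 0usize;
    loop {
        let hdr_end = off
            .checked_add(ENTRY_HEADER_LEN)
            .ok_or(IndexError::Truncated)?;
        let hdr = node.get(off..hdr_end).ok_or(IndexError::Truncated)?;
        let entry_len = u16::from_le_bytes([hdr[8], hdr[9]]) as usize;
        let flags = hdr[12];

        // A zero or too-short length would make the loop stall or reread
        // the same header, so reject it outright.
        if entry_len < ENTRY_HEADER_LEN {
            return Err(IndexError::BadEntryLength);
        }
        // Guard the offset arithmetic before trusting the next position.
        let next = off
            .checked_add(entry_len)
            .ok_or(IndexError::BadEntryLength)?;
        if next > node.len() {
            return Err(IndexError::Truncated);
        }

        offsets.push(off);
        if flags & FLAG_LAST_ENTRY != 0 {
            return Ok(offsets);
        }
        off = next;
    }
}
```

Because every accepted entry advances `off` by at least 16 bytes and `next` is bounded by the node length, the loop is guaranteed to terminate.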
Summary
- Introduce `Utf16LeStr<'a>` (UTF-16LE) backed by `utf16-simd`, and delay UTF-16LE → UTF-8 conversion until display/serialization.

Notes / Breaking changes
- … `<'a>` and borrow from the entry buffer.

Test plan
- `cargo fmt --all -- --check`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test --all-targets --all-features`

Note
Introduces borrowed UTF-16LE strings and migrates MFT attribute parsing to slice-based, zero-copy APIs to reduce allocations and improve safety.
- … `utf16-simd` and new `Utf16LeStr<'a>`; delay UTF-16 → UTF-8 conversion to display/serialization
- … (`<'a>`), e.g. `MftAttributeHeader<'a>`, `FileNameAttr<'a>`, `AttributeListAttr<'a>`, `DataAttr<'a>`, `IndexRootAttr<'a>`, `RawAttribute<'a>`
- … `from_slice`/`from_slice_at`; implement `MftAttributeContent::from_record` and use `decode_data_runs` directly
- … `$END`, and add overflow/EOF checks
- Replace `into_*` helpers with non-consuming `as_*` accessors; adjust `mft_dump`, `csv`, and path building to use borrowed names and `to_utf8_string()` when needed
- Make `windows_filetime_to_timestamp` public; minor JSON writer callsite cleanups
- … `$STANDARD_INFORMATION` 48/72-byte layouts, and end-marker handling

Written by Cursor Bugbot for commit b38da6e.