This PR introduces significant performance optimizations for filesystem operations and adds new features for file processing commands. Key improvements include parallel batch processing, shared utilities for multi-file operations, and new command options. #44
Summary
This PR introduces significant performance optimizations for filesystem operations and adds new features for file processing commands. Key improvements include parallel batch processing, shared utilities for multi-file operations, and new command options.
Changes
Performance Optimizations
- Parallel batch processing: Added `Promise.all` with configurable batch sizes (default: 100) across multiple commands:
  - `find` - Parallel directory traversal with `readdirWithFileTypes` optimization
  - `du` - Parallel size calculation with batched stat calls
  - `ls -R` - Parallel recursive listing
  - `tree` - Parallel tree building
  - `grep` - Parallel file searching
  - `glob` - Parallel glob expansion
- `readdirWithFileTypes` API: Added new filesystem interface method to get directory entries with type info, avoiding separate stat calls for type checks. Implemented in OverlayFS.
- Smart stat skipping: `find` now skips stat calls when only type info is needed (e.g., `-type f`) and full metadata isn't required.
New Features
xargs `-d` delimiter option
- `-d DELIM` option for custom input delimiter (GNU xargs extension)
- `\n`, `\t`, `\r`, `\0`, `\\`
- `find . -name '*.json' | xargs -d '\n' jq .`
jq multi-file support
- `find | xargs` pipelines
Shared batched-read utility
- `src/commands/batched-read.ts` module for parallel file reading
- `jq` and `xan cat` for efficient multi-file processing
dev:exec improvements
- `--root <path>` for OverlayFS mounting
- `--no-limit` to remove execution limits for large scripts
Bug Fixes
Test Plan
- `-d` option
Performance Results
Tested against a large filesystem (68k+ repos, 110k+ JSON files):
- `find repos -type d -name 'issues'`: 1.6s for 7,581 results
- `repos/**/*.json` glob: 1.7s for 109,915 files
- `jq` with 28k files: 4.6s (down from 9.2s with batching)
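
For reviewers who want a picture of the batching pattern listed under Performance Optimizations, here is a minimal sketch of `Promise.all` with a fixed batch size. It is not the code from this PR: `mapInBatches` and its parameters are hypothetical names chosen for illustration, and only the default of 100 comes from the description above.

```typescript
// Hypothetical helper, not taken from the PR: runs `worker` over `items`
// in fixed-size batches so that at most `batchSize` promises are in flight.
async function mapInBatches<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  batchSize = 100, // mirrors the default batch size stated in the PR
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // All calls in a batch start together; the next batch begins only
    // after every promise in this one has resolved. A single rejection
    // rejects the whole call.
    const batchResults = await Promise.all(batch.map(worker));
    results.push(...batchResults);
  }
  return results; // same order as `items`
}
```

A command would pass its per-file operation (a stat or a file read, say) as `worker`. The batch size bounds how many filesystem calls are outstanding at once, which is the trade-off the configurable setting presumably exposes.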