Description
You've mentioned a few times that you'd like to move directory traversal to rayon. Here's a working/incomplete sketch of one approach to doing that:
https://github.com/jessegrosjean/walk
Maybe this is all obvious, but it might be a useful resource for later development (and it was good learning for me). The basic design is to create a channel to use as a work queue, and then use rayon's par_bridge() to process that channel of work in parallel.
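To make the work-queue idea concrete, here is a rough, dependency-free sketch. It is not the code from the linked repository: plain std threads stand in for rayon's par_bridge(), and the names `Work`, `parallel_walk`, and the `pending` counter are made up for illustration. The counter is an assumption about how to detect the end of the walk, since the workers hold senders themselves and the channel never closes on its own.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::channel;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical message type: a directory to read, or a signal to stop.
enum Work {
    Dir(PathBuf),
    Stop,
}

/// Walk `root` with `threads` workers and return every path found (unsorted).
fn parallel_walk(root: &Path, threads: usize) -> Vec<PathBuf> {
    let threads = threads.max(1);
    let (work_tx, work_rx) = channel::<Work>();
    let (out_tx, out_rx) = channel::<PathBuf>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    // Directories queued or in progress; zero means the walk is finished.
    let pending = Arc::new(AtomicUsize::new(1));
    work_tx.send(Work::Dir(root.to_path_buf())).unwrap();

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (work_tx, work_rx) = (work_tx.clone(), Arc::clone(&work_rx));
            let (out_tx, pending) = (out_tx.clone(), Arc::clone(&pending));
            thread::spawn(move || loop {
                // Take the next item; the lock is released before processing.
                let next = work_rx.lock().unwrap().recv();
                let dir = match next {
                    Ok(Work::Dir(dir)) => dir,
                    Ok(Work::Stop) | Err(_) => break,
                };
                if let Ok(entries) = fs::read_dir(&dir) {
                    for entry in entries.flatten() {
                        let path = entry.path();
                        // file_type() does not follow symlinks, so no cycles.
                        let is_dir = entry.file_type().map(|t| t.is_dir()).unwrap_or(false);
                        if is_dir {
                            // Count the subdirectory before queueing it.
                            pending.fetch_add(1, Ordering::SeqCst);
                            work_tx.send(Work::Dir(path.clone())).unwrap();
                        }
                        out_tx.send(path).unwrap();
                    }
                }
                // This directory is done; whoever finishes the last one
                // sends one Stop per worker.
                if pending.fetch_sub(1, Ordering::SeqCst) == 1 {
                    for _ in 0..threads {
                        work_tx.send(Work::Stop).unwrap();
                    }
                }
            })
        })
        .collect();

    drop(out_tx);
    for handle in handles {
        handle.join().unwrap();
    }
    out_rx.into_iter().collect()
}
```

Calling `parallel_walk(Path::new("."), 4)` returns paths in whatever order the workers find them. With rayon, I'd expect the worker loop to become something like `rx.into_iter().par_bridge().for_each(...)`, with an equivalent way of closing the channel once the work runs out.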
Performance is comparable to the ignore crate when walking the Linux source:
par_ignore_walk time: [72.235 ms 73.115 ms 73.753 ms]
change: [-3.4721% -2.1156% -0.7586%] (p = 0.01 < 0.05)
Change within noise threshold.
rayon_walk time: [60.600 ms 60.734 ms 60.917 ms]
change: [+0.0896% +0.5440% +1.0160%] (p = 0.04 < 0.05)
Change within noise threshold.
In this run I have the ignore crate not doing any filtering, but it's still likely doing more than my rayon_walk, so I expect performance is about the same between the two even though rayon_walk is running faster. Another benefit of this design is that it's fairly straightforward to get sorted results, also computed in parallel. But that adds some complexity, so I figured I would post this without that code for now.
Lastly, you also mentioned wanting to rethink the ignore crate API. Currently it's callback based. I wonder if it would make more sense (again, if you are thinking long term about a redesign) to make it iterator based. The ignore crate would then just be responsible for generating an iterator over DirEntry results as quickly as possible. If you also wanted to do heavy processing on those entries, you could call par_bridge again on those results and do your heavy processing (such as performing a ripgrep search) there.
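A rough sketch of what I mean by iterator based, using only std so it stands alone. `walk_iter` is a hypothetical name, not anything in the ignore crate, and it yields plain PathBuf values instead of DirEntry results to keep the example short. A single background thread does the walking here; a real version would presumably walk in parallel and apply the ignore rules.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical iterator-style entry point: returns immediately and yields
/// paths as a background thread discovers them.
fn walk_iter(root: PathBuf) -> impl Iterator<Item = PathBuf> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // Explicit stack of directories still to read (depth-first).
        let mut stack = vec![root];
        while let Some(dir) = stack.pop() {
            if let Ok(entries) = fs::read_dir(&dir) {
                for entry in entries.flatten() {
                    let path = entry.path();
                    if entry.file_type().map(|t| t.is_dir()).unwrap_or(false) {
                        stack.push(path.clone());
                    }
                    // Stop walking if the consumer dropped the iterator.
                    if tx.send(path).is_err() {
                        return;
                    }
                }
            }
        }
        // `tx` is dropped here, which ends the iterator on the other side.
    });
    rx.into_iter()
}
```

The caller then decides how to consume it: `walk_iter(root).filter(...).count()` for something cheap, or, with rayon in scope, something like `walk_iter(root).par_bridge().for_each(|path| ...)` for the heavy per-entry work.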