Description
... as mentioned in #783. Also, @GaryJSA noted that #610 may be addressing something similar...
I have implemented a garbage collector that aims at substantially speeding up LittleFS. I'm starting this thread in order to figure out what would be the best way to share my code with the community. I'm currently considering creating a PR, but if the maintainers and/or the community don't want any of that, please let me know and I will simply share my code outside of this repo with whoever is interested.
General Description
The slowest part of using LittleFS is waiting for blocks to be erased, so this is what I'm trying to optimize. I create a cache that holds the state of all blocks.
A block can be:
- used
- unused
- erased
- dirty
Having this information allows the erase() callback to return success immediately when it is asked to erase a block that is already known to be clean, which saves a substantial amount of time. Unused-and-dirty blocks can be located and erased in the background, and as long as the application saves data more slowly than the garbage collector cleans up unused blocks, erase() can always return immediately.
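A minimal sketch of that fast path, not the actual implementation. The flash is simulated in RAM so the snippet runs on a host, and `gc_erase`, `flash_erase_blocking`, `clean_map` and the geometry constants are hypothetical names. In a real port this logic would sit inside the function assigned to the `erase` member of `struct lfs_config`.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT 8u   /* illustrative geometry, not a real chip */
#define BLOCK_SIZE  16u

static uint8_t flash[BLOCK_COUNT][BLOCK_SIZE];     /* RAM stand-in for the NOR flash */
static uint8_t clean_map[(BLOCK_COUNT + 7) / 8];   /* 1 bit per block: verified erased */
static unsigned slow_erases;                       /* counts real (slow) erases */

/* Stand-in for the blocking driver erase. */
static int flash_erase_blocking(uint32_t block) {
    memset(flash[block], 0xFF, BLOCK_SIZE);
    slow_erases++;
    return 0;
}

/* Hypothetical erase wrapper: skip the erase when the block is known clean. */
static int gc_erase(uint32_t block) {
    if (clean_map[block / 8] & (1u << (block % 8))) {
        return 0;   /* already verified erased, return immediately */
    }
    return flash_erase_blocking(block);   /* fall back to the slow path */
}
```

The slow path is only taken when the background cleaner has not reached the block yet, which matches the behaviour described above.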
Implementation Details
Please keep in mind that I am very new to this codebase and so my solution is completely parallel to LittleFS's code and does not involve any changes to the code that I'm so new to. That also means that my solution can probably be optimized, but more on that later.
I have created 2 bitmaps, each of which is of size (lfs->cfg->block_count + 7) / 8. This allows me to keep state information for every block in the file system. The first bitmap holds the "used" blocks. The second bitmap holds the "clean" blocks. A clean block is a block that is not currently used by LittleFS and the garbage collector has verified that the block is completely erased.
If a block is neither "used" nor "clean", it is considered "unused and dirty".
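The two-bitmap scheme and the derived third state could be sketched as follows. All names here (`block_state_t`, `bitmap_size`, `bitmap_set`, and so on) are made up for illustration and are not part of LittleFS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; none of this is LittleFS API. */
typedef enum {
    BLOCK_USED,
    BLOCK_CLEAN,
    BLOCK_UNUSED_DIRTY
} block_state_t;

/* Bytes needed for one bit per block, rounded up. */
static inline size_t bitmap_size(uint32_t block_count) {
    return ((size_t)block_count + 7) / 8;
}

static inline void bitmap_set(uint8_t *map, uint32_t block) {
    map[block / 8] |= (uint8_t)(1u << (block % 8));
}

static inline void bitmap_clear(uint8_t *map, uint32_t block) {
    map[block / 8] &= (uint8_t)~(1u << (block % 8));
}

static inline bool bitmap_test(const uint8_t *map, uint32_t block) {
    return (map[block / 8] >> (block % 8)) & 1u;
}

/* "used" wins over "clean"; neither bit set means unused and dirty. */
static inline block_state_t block_state(const uint8_t *used,
                                        const uint8_t *clean,
                                        uint32_t block) {
    if (bitmap_test(used, block)) {
        return BLOCK_USED;
    }
    if (bitmap_test(clean, block)) {
        return BLOCK_CLEAN;
    }
    return BLOCK_UNUSED_DIRTY;
}
```

With one bit per block in each map, the RAM cost is two bitmaps of `bitmap_size(block_count)` bytes each.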
Upon successful mounting of the file system, my code does the following:
- Initializes the "used" bitmap by traversing the filesystem and recording which block is used.
- Initializes the "clean" bitmap by marking all blocks as "dirty"
- Goes over all blocks of the file system. If a block is already flagged in the "used" bitmap, nothing is done. If the block is not flagged in the "used" bitmap, the block is read and verified. If the entire block reads 0xFF, that block number is flagged as clean in the "clean" bitmap.
- Next, the code looks at lfs->free.off to find out which block will be the next one that LittleFS will attempt to use. The garbage collector is instructed to start from there.
- I run the garbage collection code X times to ensure that there are at least a few verified clean blocks standing at the ready for LittleFS to use them.
- At this point I consider the file system fully initialized and allow the rest of my system to start up.
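The "clean" bitmap initialization from the steps above might look roughly like this. It is a host-side sketch with the flash simulated in RAM, and it assumes the "used" bitmap has already been filled by a filesystem traversal. `scan_clean_blocks`, `is_erased` and the geometry constants are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT 8u   /* illustrative geometry */
#define BLOCK_SIZE  16u

static uint8_t flash[BLOCK_COUNT][BLOCK_SIZE];     /* RAM stand-in for the flash */
static uint8_t used_map[(BLOCK_COUNT + 7) / 8];
static uint8_t clean_map[(BLOCK_COUNT + 7) / 8];

static bool bit_test(const uint8_t *map, uint32_t b) {
    return (map[b / 8] >> (b % 8)) & 1u;
}

static void bit_set(uint8_t *map, uint32_t b) {
    map[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* True if every byte of the buffer reads 0xFF. */
static bool is_erased(const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (data[i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* Assumes used_map is already populated. Returns the number of clean blocks found. */
static uint32_t scan_clean_blocks(void) {
    uint32_t found = 0;
    memset(clean_map, 0, sizeof clean_map);   /* start with every block "dirty" */
    for (uint32_t b = 0; b < BLOCK_COUNT; b++) {
        if (bit_test(used_map, b)) {
            continue;                         /* used blocks are never inspected */
        }
        if (is_erased(flash[b], BLOCK_SIZE)) {
            bit_set(clean_map, b);
            found++;
        }
    }
    return found;
}
```

On real hardware the read would go through the flash driver one block at a time instead of indexing a RAM array.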
The garbage collection code itself is pretty straightforward. It has a counter that represents the block to be examined. When called, the code checks if the current block is used. If so, nothing else is done and the counter is incremented and wrapped around if it exceeds the number of blocks. If the current block is not flagged in the "used" bitmap, the block is read and verified. If the block holds any non-0xFF data it is erased. Finally, the block is marked as clean in the "clean" bitmap.
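One step of that loop could be sketched as below, again with a RAM-simulated flash. `gc_step`, `gc_cursor` and `erase_count` are hypothetical names, and advancing the counter in both the used and unused cases is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT 8u   /* illustrative geometry */
#define BLOCK_SIZE  16u

static uint8_t flash[BLOCK_COUNT][BLOCK_SIZE];     /* RAM stand-in for the flash */
static uint8_t used_map[(BLOCK_COUNT + 7) / 8];
static uint8_t clean_map[(BLOCK_COUNT + 7) / 8];
static uint32_t gc_cursor;                         /* next block to examine */
static unsigned erase_count;                       /* number of real erases performed */

static bool bit_test(const uint8_t *map, uint32_t b) {
    return (map[b / 8] >> (b % 8)) & 1u;
}

static void bit_set(uint8_t *map, uint32_t b) {
    map[b / 8] |= (uint8_t)(1u << (b % 8));
}

static bool is_erased(const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (data[i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* Examine one block per call; meant to be driven by a low-priority task. */
static void gc_step(void) {
    uint32_t b = gc_cursor;
    gc_cursor = (gc_cursor + 1) % BLOCK_COUNT;     /* advance and wrap (assumed for both paths) */

    if (bit_test(used_map, b)) {
        return;                                    /* used block: nothing to do */
    }
    if (!is_erased(flash[b], BLOCK_SIZE)) {
        memset(flash[b], 0xFF, BLOCK_SIZE);        /* stand-in for the driver erase */
        erase_count++;
    }
    bit_set(clean_map, b);                         /* block is now verified erased */
}
```

Doing one block per call keeps each invocation short, so a low-priority task can spread the erase latency over idle time.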
That garbage collection function is called from a very low priority task a few times a second until it has gone through all blocks. After the initial traversal on mount(), keeping track of newly used blocks is easily handled in the write() callback: Whatever block LittleFS writes to is added to the used bitmap.
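The bookkeeping on the write path is small. A sketch, assuming a hypothetical helper `gc_note_write` that is called from the function assigned to the `prog` member of `struct lfs_config` (LittleFS's name for the write callback). Clearing the "clean" bit at the same time is an assumption of this sketch; the description above only mentions setting the "used" bit.

```c
#include <stdint.h>

#define BLOCK_COUNT 8u   /* illustrative geometry */

static uint8_t used_map[(BLOCK_COUNT + 7) / 8];
static uint8_t clean_map[(BLOCK_COUNT + 7) / 8];

/* Hypothetical hook: record that LittleFS has written to this block. */
static void gc_note_write(uint32_t block) {
    uint8_t mask = (uint8_t)(1u << (block % 8));
    used_map[block / 8] |= mask;               /* block now holds live data */
    clean_map[block / 8] &= (uint8_t)~mask;    /* assumption: no longer counts as erased */
}
```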
Every once in a while the code also traverses the file system to un-mark blocks that are no longer used. This is something I'd like to improve upon someday, especially if someone familiar with the codebase can help me in figuring out when LittleFS stops using a block. Having a callback for this will nullify the need for run-time traversal of the filesystem for the purpose of finding blocks that are no longer used.
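Until such a callback exists, the periodic refresh can be built on LittleFS's `lfs_fs_traverse()`, which invokes a callback of the form `int cb(void *data, lfs_block_t block)` for each block in use. The sketch below rebuilds the "used" bitmap into a scratch buffer and then swaps it in. `mark_block`, `refresh_begin`, `refresh_commit` and `scratch_map` are hypothetical names, and `uint32_t` stands in for `lfs_block_t` so the snippet compiles without lfs.h.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_COUNT 16u                       /* illustrative geometry */
#define MAP_BYTES   ((BLOCK_COUNT + 7) / 8)

static uint8_t used_map[MAP_BYTES];
static uint8_t scratch_map[MAP_BYTES];        /* filled during the traversal */

/* Traversal callback: mark one in-use block in the bitmap passed via data. */
static int mark_block(void *data, uint32_t block) {
    uint8_t *map = data;
    if (block >= BLOCK_COUNT) {
        return -1;                            /* out of range: report an error */
    }
    map[block / 8] |= (uint8_t)(1u << (block % 8));
    return 0;
}

/* Clear the scratch bitmap before starting a traversal. */
static void refresh_begin(void) {
    memset(scratch_map, 0, MAP_BYTES);
}

/* Replace the live bitmap, dropping blocks that are no longer referenced. */
static void refresh_commit(void) {
    memcpy(used_map, scratch_map, MAP_BYTES);
}
```

Usage would be along the lines of `refresh_begin(); lfs_fs_traverse(&lfs, mark_block, scratch_map); refresh_commit();`. Any writes that happen while the traversal runs would need to be accounted for, for example by holding the filesystem lock for the duration.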
The performance improvement of this modification is substantial. I'm using a W25Q128JV QSPI NOR flash. I used to get ~75kB/s of write speed. After implementing this code, the write speed goes up about 2.5 times. I don't remember the exact numbers, but it was very close to the actual max write speed of the chip. Of course, if one keeps writing and outruns the garbage collector, the erase() function eventually runs out of verified clean blocks and has to wait for blocks to be erased, causing the write speed to drop back down to ~75kB/s.
Questions
- First of all, are the maintainers interested at all in this? If so....
- Where should I add the code? Should I add a separate file, like maybe lfs_contrib.c, or should I add my code to lfs.c? As a side note, none of my code needs to be in lfs.c, so I can keep that file clean if this is desired. The benefit of adding my code to lfs.c is that I can integrate the allocation and freeing of the bitmaps into the usual init/deinit functions.