
Poor write performance (testing with RAM-backed implementation) #535

Open
@gtaska

Description


See also #395.

TLDR

The 8-byte buffer in lfs_bd_cmp() is a bottleneck on slower systems.

In a RAM-backed configuration, performance increased by ~40% when this buffer was increased to 256 bytes. (Note: with real flash devices you will not see this level of improvement, since performance is limited more by the flash device itself; however, testing against RAM gives a good indication of any bottlenecks that exist in the file system.)
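
For illustration, here is a simplified sketch of the read-back-compare pattern involved (not the actual littlefs source; dev_read is a stand-in for lfs_bd_read()). Every chunk costs one device read call plus one memcmp() call, so an 8-byte chunk size means a lot of per-call overhead on large writes:

```c
/* Simplified sketch of the read-back-compare pattern in lfs_bd_cmp()
 * (illustrative, not the littlefs source): data is verified in small
 * fixed-size chunks, so each chunk costs one read call plus one
 * memcmp() call.  Enlarging CMP_CHUNK amortizes that overhead. */
#include <stdint.h>
#include <string.h>

#define CMP_CHUNK 8  /* the stack buffer in lfs_bd_cmp() is 8 bytes */

typedef int (*dev_read_t)(void *ctx, uint32_t off, void *buf, uint32_t size);

static int bd_cmp_sketch(dev_read_t dev_read, void *ctx,
        uint32_t off, const void *buffer, uint32_t size) {
    const uint8_t *data = buffer;

    for (uint32_t i = 0; i < size; ) {
        uint8_t dat[CMP_CHUNK];
        uint32_t diff = size - i < sizeof(dat) ? size - i : sizeof(dat);

        int err = dev_read(ctx, off + i, dat, diff);   /* one call per chunk */
        if (err) {
            return err;
        }
        if (memcmp(dat, data + i, diff) != 0) {        /* one call per chunk */
            return 1;  /* mismatch */
        }
        i += diff;
    }
    return 0;  /* match */
}
```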

Please consider:

  • Make the size of the internal dat[] array overridable/tunable, or a member of lfs_config (see the sketch after this list)
  • If technically possible, leverage the existing pcache and rcache arrays to perform the compare, rather than creating a third array on the stack.
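
As a rough illustration of the first suggestion, something like the following could keep the current default while letting a port trade stack for speed (LFS_CMP_BUF_SIZE is a hypothetical name, not an existing littlefs option):

```c
/* Hypothetical compile-time tunable (LFS_CMP_BUF_SIZE is NOT an
 * existing littlefs option); the default preserves today's 8-byte
 * behaviour, and a port with spare stack could override it. */
#ifndef LFS_CMP_BUF_SIZE
#define LFS_CMP_BUF_SIZE 8
#endif

/* inside lfs_bd_cmp(): */
/* uint8_t dat[LFS_CMP_BUF_SIZE]; */
```

A build that can spare the stack would then pass -DLFS_CMP_BUF_SIZE=256; alternatively the size could come from a new lfs_config member chosen at mount time.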

Full background

I have been investigating littlefs performance on a Cortex-M4 processor running at 100MHz and, to rule out flash performance, implemented lfs_config->read(), lfs_config->prog(), and lfs_config->erase() to read from and write to a chunk of RAM instead (64kB, emulating 4x 256 pages per block).
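
The RAM-backed hooks looked roughly like this (sketch only: error handling is omitted and the geometry constants are illustrative rather than the exact values from my test; the callback signatures are the standard lfs_config ones):

```c
/* RAM-backed block device sketch for littlefs. */
#include <stdint.h>
#include <string.h>
#include "lfs.h"

#define RAM_BLOCK_SIZE  (16 * 1024)
#define RAM_BLOCK_COUNT 4                 /* 4 x 16kB = 64kB total */
static uint8_t ram_disk[RAM_BLOCK_SIZE * RAM_BLOCK_COUNT];

static int ram_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    (void)c;
    memcpy(buffer, &ram_disk[block * RAM_BLOCK_SIZE + off], size);
    return 0;
}

static int ram_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size) {
    (void)c;
    memcpy(&ram_disk[block * RAM_BLOCK_SIZE + off], buffer, size);
    return 0;
}

static int ram_erase(const struct lfs_config *c, lfs_block_t block) {
    (void)c;
    memset(&ram_disk[block * RAM_BLOCK_SIZE], 0xff, RAM_BLOCK_SIZE);
    return 0;
}

static int ram_sync(const struct lfs_config *c) {
    (void)c;  /* nothing to flush for RAM */
    return 0;
}
```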

I was surprised to see that even with this configuration, lfs_config->prog() performance was only ~1MB/s, indicating that there may be an inherent bottleneck in littlefs.

Part of this was due to the fact that I was using a vendor-provided precompiled version of newlib-nano, built with either PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__ defined, which resulted in the simplest (and slowest) possible implementation of memcpy() - it was only able to achieve 10MB/s of raw memcpy() throughput. (This is not directly relevant to the issue found, but I've included it in case others find this ticket - it may be worth checking your implementations of the standard libc functions.)
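
A minimal sketch of the kind of raw memcpy() throughput check I mean (millis() is a placeholder for whatever millisecond timer your platform provides, and a real test should make sure the compiler doesn't optimize the copy away):

```c
/* Rough raw-memcpy() throughput check (platform timer is assumed). */
#include <stdint.h>
#include <string.h>

extern uint32_t millis(void);  /* hypothetical millisecond timer */

static uint8_t src[8 * 1024], dst[8 * 1024];

uint32_t memcpy_kbps(void) {
    const uint32_t iterations = 1024;     /* 1024 x 8kB = 8MB copied */
    uint32_t start = millis();
    for (uint32_t i = 0; i < iterations; i++) {
        memcpy(dst, src, sizeof(src));
    }
    uint32_t elapsed = millis() - start;  /* milliseconds */
    /* kB/s = (iterations * 8kB) / (elapsed / 1000) */
    return elapsed ? (iterations * 8 * 1000) / elapsed : 0;
}
```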

With an optimized memcpy(), raw throughput increased to 79MB/s, but lfs_config->prog() performance stubbornly remained at ~1.6MB/s.

When I increased lfs_bd_cmp()'s dat[8] buffer to 256 bytes (thereby reducing the number of calls to lfs_bd_read() and memcmp() by a factor of 32), lfs_config->prog() performance increased to ~2.6MB/s. Still not great, but a worthwhile improvement.

Numbers

Write benchmark: repeat(open/create file, write 8kB in 256-byte chunks, close file, delete file)
Read benchmark: create 8kB file, repeat(open file, read 8kB in 256-byte chunks, close file), delete file
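
For reference, the write benchmark loop was essentially the following (reconstructed sketch, error handling omitted; lfs is an already-mounted lfs_t, and 128 iterations of 8kB give the 1MB totals in the tables below):

```c
/* Write benchmark sketch: open/create, write 8kB in 256-byte chunks,
 * close, delete - repeated until ~1MB has been written. */
#include <stdint.h>
#include "lfs.h"

static void write_benchmark(lfs_t *lfs) {
    static uint8_t chunk[256] = {0xa5};   /* dummy payload */

    for (int iter = 0; iter < 128; iter++) {   /* 128 x 8kB = 1MB */
        lfs_file_t file;
        lfs_file_open(lfs, &file, "bench.bin",
                LFS_O_WRONLY | LFS_O_CREAT | LFS_O_TRUNC);
        for (int off = 0; off < 8 * 1024; off += sizeof(chunk)) {
            lfs_file_write(lfs, &file, chunk, sizeof(chunk));
        }
        lfs_file_close(lfs, &file);
        lfs_remove(lfs, "bench.bin");
    }
}
```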

Read Benchmark                         timings (1MB)   littlefs throughput
newlib-nano (optimized for size)       353ms           2.8MB/s
optimized memcpy                       100ms           10MB/s

Write Benchmark                        timings (1MB)   littlefs throughput
newlib-nano (optimized for size)       1041ms          960kB/s
optimized memcpy                       634ms           1.58MB/s
lfs_bd_cmp(), dat[8->256]              417ms           2.39MB/s
no validation (best case performance)  241ms           4.15MB/s

It is worth noting that the optimized Write benchmark (417ms) is still slower than a separate 'no-validation' Write benchmark (241ms) followed by a Read benchmark (100ms). This may indicate additional inefficiencies in the lfs_config->prog() path. Given that data read via lfs_config->read() is already validated by checksum, would it be faster (and sufficient) to validate data written via lfs_config->prog() indirectly using a checksum, rather than a full memcmp()?
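
To make the checksum idea concrete, one way to read the suggestion is the sketch below (not an existing littlefs option; crc32_update() is a placeholder for whatever CRC the port provides - littlefs has its own internal lfs_crc()). It still reads the data back, but replaces the chunk-by-chunk memcmp() against the source buffer with a running checksum comparison:

```c
/* Sketch of checksum-based write validation (illustrative only). */
#include <stddef.h>
#include <stdint.h>

extern uint32_t crc32_update(uint32_t crc, const void *buf, size_t size);
typedef int (*dev_read_t)(void *ctx, uint32_t off, void *buf, uint32_t size);

static int bd_validate_by_crc(dev_read_t dev_read, void *ctx,
        uint32_t off, const void *written, uint32_t size) {
    /* checksum of what we intended to write */
    uint32_t expected = crc32_update(0xffffffff, written, size);

    /* checksum of what the device actually holds, read in 256-byte chunks */
    uint32_t actual = 0xffffffff;
    uint8_t dat[256];
    for (uint32_t i = 0; i < size; ) {
        uint32_t diff = size - i < sizeof(dat) ? size - i : sizeof(dat);
        int err = dev_read(ctx, off + i, dat, diff);
        if (err) {
            return err;
        }
        actual = crc32_update(actual, dat, diff);
        i += diff;
    }
    return (actual == expected) ? 0 : 1;  /* 1 = validation failed */
}
```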
