Conversation


@karim-manaouil karim-manaouil commented Nov 25, 2025

thpchallenge-fio from mmtests [1] is frequently used for testing Linux kernel memory fragmentation [2]. thpchallenge-fio writes a large number of 64K files inefficiently. On a system with 128GiB of RAM, it can create up to 100K files. A typical fio config looks like this:

[global]
direct=0
ioengine=sync
blocksize=4096
invalidate=0
fallocate=none
create_on_open=1

[writer]
nrfiles=98200
filesize=65536
readwrite=write
numjobs=16

While testing this on a system with 128GiB of RAM and four NVMe SSDs on RAID0 that can achieve up to 25GB/s and at least 2M IOPS, it was observed that for the initial ~20 minutes, the test was running painfully slow at only ~2MiB/s with 16 fio threads.

After some analysis, it was noticed that during this period, fio spends most of its time looking up files in, and adding them to, the file hash table, as shown in the profiler report below:

12.04%  [.] __lookup_file_hash

       fio                                             -      -
        |
         --9.94%--__lookup_file_hash
                   |
                    --7.17%--lookup_file_hash
                              generic_open_file
                              td_io_open_file
                              get_io_u
                              thread_main
                              run_threads
                              fio_backend
                              main
 5.96%  [.] strcmp@plt
       fio                                             -      -
        |
         --5.16%--__lookup_file_hash

56.38%  [.] __strcmp_evex

       libc.so.6                                       -      -
        |
        |--47.01%--__lookup_file_hash
        |          |
        |          |--35.25%--lookup_file_hash
        |          |          generic_open_file
        |          |          td_io_open_file
        |          |          get_io_u
        |          |          thread_main
        |          |          run_threads
        |          |          fio_backend
        |          |          main
        |          |
        |           --11.76%--add_file_hash
        |                     generic_open_file
        |                     td_io_open_file
        |                     get_io_u
        |                     thread_main
        |                     run_threads
        |                     fio_backend
        |                     main
        |
         --9.31%--__strcmp_evex
                   |
                    --5.31%--add_file_hash
                              generic_open_file
                              td_io_open_file
                              get_io_u
                              thread_main
                              run_threads
                              fio_backend
                              main

When the hash table is small, the threads spend a lot of time iterating through the chains and comparing filenames. Increasing the size of the hash table improved performance by roughly 20x; the slow phase now takes 1 minute or less. However, that comes at the expense of more memory.

Luckily, the hash table currently has a single global lock, and it was observed that giving each bucket its own lock achieves the same performance improvement without growing the table. Since finer-grained locking could also help other multi-threaded scenarios, this patch takes that approach.

Thus, this patch implements one fio_sem per bucket. The filename's hash determines which semaphore needs to be locked/unlocked.

There is a global filename_list that also originally used the hash_lock for protection. We assign it a bucket via flist_bucket_name.

[1] https://github.com/gormanm/mmtests/blob/master/configs/config-workload-thpchallenge-fio
[2] https://lwn.net/Articles/770235/

Signed-off-by: Karim Manaouil [email protected]

axboe (Owner) commented Nov 25, 2025

Thanks for looking into this. The commit is missing a Signed-off-by, but more importantly I think this would be a lot cleaner if you just embed the lock inside the bucket. Then you don't need two separate structs, two separate allocations, etc. You can also then just return the bucket, rather than need to pass in an index, which is then used to index both the buckets and the locks.

karim-manaouil (Author) commented

@axboe thanks for the feedback. The reason I made a separate allocation is that the hash table is just an array of linked-list heads of type struct flist_head (which is embedded inside fio_file), defined as

struct flist_head {
    struct flist_head *next, *prev;
};

It wouldn't make sense to add the locks here. We could potentially redefine the hash table as an array of

struct hash_node {
    struct flist_head *head;
    struct fio_sem *sem;
};

But imho, it's not that different from what I already did. Given that this code is unlikely to change (the last changes to filehash.c were in 2021, 2018, and 2016), I think this simple, straightforward implementation is more than enough for now.

If that's OK, I will resend with a Signed-off-by. Otherwise, I could also look into an alternative implementation. Let me know what you think.

axboe (Owner) commented Nov 25, 2025

I'll disagree with you there - I think the best way to do this would be to do a patch 1 that converts everything to using a struct hash_bucket or similar, with it just having the list in there. Then do a patch 2 on top of that that then switches to per-bucket locking. First patch is a nice cleanup, second is the actual functional change. As a bonus, patch 2 will be smaller than the all-in-one patch you have now. Patches should be as simple and basic as possible, and no simpler, and be simple building blocks. Splitting it up makes sense.

The fact that filehash hasn't seen a lot of churn is not an excuse to avoid making it more maintainable.

karim-manaouil (Author) commented

You're right. I fully agree. That actually sounds better. I'll do exactly that, then resend soon. Thank you.
