Conversation


@karim-manaouil karim-manaouil commented Nov 25, 2025

thpchallenge-fio from mmtests [1] is frequently used for testing Linux kernel memory fragmentation [2]. thpchallenge-fio writes a large number of 64K files inefficiently. On a system with 128GiB of RAM, it can create up to 100K files. A typical fio config looks like this:

[global]
direct=0
ioengine=sync
blocksize=4096
invalidate=0
fallocate=none
create_on_open=1

[writer]
nrfiles=98200
filesize=65536
readwrite=write
numjobs=16

While testing this on a system with 128GiB of RAM and four NVMe SSDs on RAID0 that can achieve up to 25GB/s and at least 2M IOPS, it was observed that for the initial ~20 minutes, the test was running painfully slow at only ~2MiB/s with 16 fio threads.

After some analysis, it was noticed that during this period, fio spends most of its time looking up files in, and adding them to, the file hash table, as shown in the profiler report below:

12.04%  [.] __lookup_file_hash

       fio                                             -      -
        |
         --9.94%--__lookup_file_hash
                   |
                    --7.17%--lookup_file_hash
                              generic_open_file
                              td_io_open_file
                              get_io_u
                              thread_main
                              run_threads
                              fio_backend
                              main
 5.96%  [.] strcmp@plt
       fio                                             -      -
        |
         --5.16%--__lookup_file_hash

56.38%  [.] __strcmp_evex

       libc.so.6                                       -      -
        |
        |--47.01%--__lookup_file_hash
        |          |
        |          |--35.25%--lookup_file_hash
        |          |          generic_open_file
        |          |          td_io_open_file
        |          |          get_io_u
        |          |          thread_main
        |          |          run_threads
        |          |          fio_backend
        |          |          main
        |          |
        |           --11.76%--add_file_hash
        |                     generic_open_file
        |                     td_io_open_file
        |                     get_io_u
        |                     thread_main
        |                     run_threads
        |                     fio_backend
        |                     main
        |
         --9.31%--__strcmp_evex
                   |
                    --5.31%--add_file_hash
                              generic_open_file
                              td_io_open_file
                              get_io_u
                              thread_main
                              run_threads
                              fio_backend
                              main

When the hash table is small, the threads spend a lot of time iterating through the chains and comparing filenames. Increasing the size of the hash table improved performance by roughly 20x; the slow phase now takes 1 minute or less. However, that comes at the expense of more memory.

Luckily, the hash table currently has a single global lock, and it was observed that giving each bucket its own lock achieves the same performance improvement without growing the table. Since finer-grained locking could also help other multi-threaded scenarios, this patch takes that approach.

Thus, this patch implements one fio_sem per bucket. The filename's hash determines which semaphore needs to be locked/unlocked.

There is a global filename_list that also originally used the hash_lock for protection. We assign it a bucket via flist_bucket_name.

[1] https://github.com/gormanm/mmtests/blob/master/configs/config-workload-thpchallenge-fio
[2] https://lwn.net/Articles/770235/

Signed-off-by: Karim Manaouil [email protected]

axboe (Owner) commented Nov 25, 2025

Thanks for looking into this. The commit is missing a Signed-off-by, but more importantly I think this would be a lot cleaner if you just embed the lock inside the bucket. Then you don't need two separate structs, two separate allocations, etc. You can also then just return the bucket, rather than need to pass in an index, which is then used to index both the buckets and the locks.

karim-manaouil (Author) commented

@axboe thanks for the feedback. The reason I made a separate allocation is that the hash table is just an array of linked-list heads of type struct flist_head (which is embedded inside fio_file), defined as

struct flist_head {
    struct flist_head *next, *prev;
};

It wouldn't make sense to add the locks here. We could potentially redefine the hash table as an array of

struct hash_node {
    struct flist_head *head;
    struct fio_sem *sem;
};

But imho, it's not that different from what I already did. Given that this code is unlikely to change (the last changes to filehash.c were in 2021, 2018, and 2016), I think this simple, straightforward implementation is more than enough for now.

If that's OK, I will resend with a Signed-off-by. Otherwise, I could also look into an alternative implementation. Let me know what you think.

axboe (Owner) commented Nov 25, 2025

I'll disagree with you there - I think the best way to do this would be to do a patch 1 that converts everything to using a struct hash_bucket or similar, with it just having the list in there. Then do a patch 2 on top of that that then switches to per-bucket locking. First patch is a nice cleanup, second is the actual functional change. As a bonus, patch 2 will be smaller than the all-in-one patch you have now. Patches should be as simple and basic as possible, and no simpler, and be simple building blocks. Splitting it up makes sense.

The fact that filehash hasn't seen a lot of churn is not an excuse to avoid making it more maintainable.

karim-manaouil (Author) commented

You're right. I fully agree. That actually sounds better. I'll do exactly that, then resend soon. Thank you.
