filehash: implement per-bucket semaphores #2019
Conversation
thpchallenge-fio from mmtests [1] is frequently used for testing Linux kernel
memory fragmentation [2]. thpchallenge-fio writes a large number of 64K files
inefficiently. On a system with 128GiB of RAM, it can create up to
100K files. A typical fio config looks like this:
[global]
direct=0
ioengine=sync
blocksize=4096
invalidate=0
fallocate=none
create_on_open=1
[writer]
nrfiles=98200
filesize=65536
readwrite=write
numjobs=16
While testing this on a system with 128GiB of RAM and four NVMe SSDs in
RAID0 that can achieve up to 25GB/s and at least 2M IOPS, it was
observed that for the initial ~20 minutes the test ran
painfully slowly, at only ~2MiB/s with 16 fio threads.
After some analysis, it was noticed that during this period fio spends
most of its time looking up files in, and adding files to, the file
hash table, as shown in the perf report below:
12.04% [.] __lookup_file_hash
fio - -
|
--9.94%--__lookup_file_hash
|
--7.17%--lookup_file_hash
generic_open_file
td_io_open_file
get_io_u
thread_main
run_threads
fio_backend
main
5.96% [.] strcmp@plt
fio - -
|
--5.16%--__lookup_file_hash
56.38% [.] __strcmp_evex
libc.so.6 - -
|
|--47.01%--__lookup_file_hash
| |
| |--35.25%--lookup_file_hash
| | generic_open_file
| | td_io_open_file
| | get_io_u
| | thread_main
| | run_threads
| | fio_backend
| | main
| |
| --11.76%--add_file_hash
| generic_open_file
| td_io_open_file
| get_io_u
| thread_main
| run_threads
| fio_backend
| main
|
--9.31%--__strcmp_evex
|
--5.31%--add_file_hash
generic_open_file
td_io_open_file
get_io_u
thread_main
run_threads
fio_backend
main
When the hash table is small, the threads spend most of their time iterating
through the buckets and comparing filenames. Increasing the size of the
hash table improved performance by 20x: the file-creation phase now takes
a minute or less. However, that comes at the expense of more memory.
Since the contention is on the hash table's single global lock, it was
observed that giving each bucket its own lock achieves the same performance
improvement without growing the table. Because this could also help other
multi-threaded scenarios, this patch takes that approach.
Thus this patch implements a fio_sem per bucket. The filename is hashed to
determine which semaphore needs to be locked/unlocked.
There is also a global filename_list that originally relied on hash_lock
for protection; it is now assigned an arbitrary bucket via flist_bucket_name.
[1] https://github.com/gormanm/mmtests/blob/master/configs/config-workload-thpchallenge-fio
[2] https://lwn.net/Articles/770235/
Thanks for looking into this. The commit is missing a Signed-off-by, but more importantly, I think this would be a lot cleaner if you just embed the lock inside the bucket. Then you don't need two separate structs, two separate allocations, etc. You can also then just return the bucket, rather than needing to pass in an index which is then used to index both the buckets and the locks.
@axboe thanks for the feedback. The reason I made a separate allocation is that the hash table is just a pointer array of linked list heads; it wouldn't make sense to add the locks there. We could potentially redefine the hash table as an array of structs that embed the lock, but imho it's not that different from what I already did. Given that this code is unlikely to change (the last changes to filehash.c were in 2021, 2018, and 2016), I think this simple, straightforward implementation is more than enough for now. If that's ok, I will resend with a Signed-off-by. Otherwise, I could also look into an alternative implementation. Let me know what you think.
I'll disagree with you there - I think the best way to do this would be a patch 1 that converts everything to using a bucket struct with the lock embedded in it. The fact that filehash hasn't seen a lot of churn is not an excuse to avoid making it more maintainable.
You're right. I fully agree. That actually sounds better. I'll do exactly that, then resend soon. Thank you. |
Signed-off-by: Karim Manaouil [email protected]