Skip to content

perf(dracut-functions.sh): optimise get_persistent_dev() #2003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Oct 14, 2022

99base install() goes from like

real    0m0.350s
user    0m0.250s
sys     0m0.119s

to like

real    0m0.216s
user    0m0.162s
sys     0m0.076s

on my system.

Microbenchmarks show (appx. an easy-average case):

$ hyperfine -S'bash --norc' 'gget_persistent_dev /dev/zd0' 'mget_persistent_dev /dev/zd0' 'old_get_persistent_dev /dev/zd0'
Benchmark 1: gget_persistent_dev /dev/zd0
  Time (mean ± σ):       7.2 ms ±   0.3 ms    [User: 6.7 ms, System: 4.0 ms]
  Range (min … max):     5.3 ms …   8.4 ms    307 runs

Benchmark 2: mget_persistent_dev /dev/zd0
  Time (mean ± σ):       7.1 ms ±   0.6 ms    [User: 6.2 ms, System: 3.1 ms]
  Range (min … max):     4.1 ms …   7.9 ms    301 runs

Benchmark 3: old_get_persistent_dev /dev/zd0
  Time (mean ± σ):      22.3 ms ±   1.4 ms    [User: 19.1 ms, System: 5.3 ms]
  Range (min … max):    19.7 ms …  27.3 ms    127 runs

Summary
  'mget_persistent_dev /dev/zd0' ran
    1.01 ± 0.10 times faster than 'gget_persistent_dev /dev/zd0'
    3.14 ± 0.34 times faster than 'old_get_persistent_dev /dev/zd0'

For best case (this is the first device in the glob):

$ hyperfine -S'bash --norc' 'gget_persistent_dev /dev/zd272p1' 'mget_persistent_dev /dev/zd272p1' 'old_get_persistent_dev /dev/zd272p1'
+ hyperfine --style=basic '-Sbash --norc' 'gget_persistent_dev /dev/zd272p1' 'mget_persistent_dev /dev/zd272p1' 'old_get_persistent_dev /dev/zd272p1'
Benchmark 1: gget_persistent_dev /dev/zd272p1
  Time (mean ± σ):       6.5 ms ±   0.8 ms    [User: 6.1 ms, System: 3.8 ms]
  Range (min … max):     4.4 ms …   7.9 ms    284 runs

Benchmark 2: mget_persistent_dev /dev/zd272p1
  Time (mean ± σ):       6.9 ms ±   0.5 ms    [User: 5.8 ms, System: 3.2 ms]
  Range (min … max):     4.5 ms …   8.1 ms    299 runs

Benchmark 3: old_get_persistent_dev /dev/zd272p1
  Time (mean ± σ):       8.4 ms ±   0.7 ms    [User: 7.0 ms, System: 2.1 ms]
  Range (min … max):     6.1 ms …   9.6 ms    246 runs

Summary
  'gget_persistent_dev /dev/zd272p1' ran
    1.06 ± 0.15 times faster than 'mget_persistent_dev /dev/zd272p1'
    1.28 ± 0.19 times faster than 'old_get_persistent_dev /dev/zd272p1'

And for worst case (/dev/zero doesn't have a corresponding device blockdev in the glob):

$ hyperfine -S'bash --norc' 'gget_persistent_dev /dev/zero' 'mget_persistent_dev /dev/zero' 'old_get_persistent_dev /dev/zero'
Benchmark 1: gget_persistent_dev /dev/zero
  Time (mean ± σ):       7.0 ms ±   0.7 ms    [User: 6.5 ms, System: 4.0 ms]
  Range (min … max):     5.0 ms …   8.5 ms    269 runs

Benchmark 2: mget_persistent_dev /dev/zero
  Time (mean ± σ):       6.7 ms ±   0.9 ms    [User: 6.0 ms, System: 2.8 ms]
  Range (min … max):     4.8 ms …   8.8 ms    275 runs

Benchmark 3: old_get_persistent_dev /dev/zero
  Time (mean ± σ):     791.9 ms ±  44.6 ms    [User: 646.9 ms, System: 215.4 ms]
  Range (min … max):   723.1 ms … 842.1 ms    10 runs

Summary
  'mget_persistent_dev /dev/zero' ran
    1.04 ± 0.17 times faster than 'gget_persistent_dev /dev/zero'
  117.86 ± 17.18 times faster than 'old_get_persistent_dev /dev/zero'

Or, in short:

  • 20 -> 6 ms
  • 8 -> 6 ms
  • 800 -> 7 ms

In the cases that are run in the 99base setup, I saw baseline speeds of 40-50ms, and it's run like 3 times there for me. If you have a less weird (non-root-on-ZFS) or more weird (wide btrfs) setup you'll likely see a better improvement.

get_persistent_dev() is only used in module-setup.shs

Checklist

  • I have tested it locally
  • I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

@Conan-Kudo Conan-Kudo enabled auto-merge (rebase) October 17, 2022 21:30
auto-merge was automatically disabled October 22, 2022 14:08

Head branch was pushed to by a user without write access

@nabijaczleweli
Copy link
Contributor Author

Admittedly, it only happened when I extended the stat cmdline to 175x its original size on my mid-sized system (lsblk is 31 devices with grep -v p and 92 without), but I had been worried about just blowing the stat environment block size.

The contenders are:

sget_persistent_dev() {
    local _dev _pol
    _dev=$(stat -L -c $'%t:%T' "$1" 2> /dev/null)
    [ -z "$_dev" ] && return
    if [[ -n $persistent_policy ]]; then
        _pol="/dev/disk/${persistent_policy}/*"
    else
        _pol=
    fi
    # shellcheck disable=SC2086
    stat -L -c $'%n\t%t:%T' \
        $_pol \
        /dev/mapper/* \
        /dev/disk/by-uuid/* \
        /dev/disk/by-label/* \
        /dev/disk/by-partuuid/* \
        /dev/disk/by-partlabel/* \
        /dev/disk/by-id/* \
        /dev/disk/by-path/* 2> /dev/null \
        | awk -v_dev="$_dev" -vdfl="$1" '$0 ~ ("\t" _dev "$") {--NF; print; dfl=0; exit}  END {if(dfl) print dfl}'
}

pget_persistent_dev() {
    local _dev _pol
    _dev=$(stat -L -c $'%t:%T' "$1" 2> /dev/null)
    [ -z "$_dev" ] && return
    if [[ -n $persistent_policy ]]; then
        _pol="/dev/disk/${persistent_policy}/*"
    else
        _pol=
    fi
    # shellcheck disable=SC2086
    printf '%s\0' \
        $_pol \
        /dev/mapper/* \
        /dev/disk/by-uuid/* \
        /dev/disk/by-label/* \
        /dev/disk/by-partuuid/* \
        /dev/disk/by-partlabel/* \
        /dev/disk/by-id/* \
        /dev/disk/by-path/* \
        | xargs -0 stat -L -c $'%n\t%t:%T' 2> /dev/null \
        | awk -v_dev="$_dev" -vdfl="$1" '$0 ~ ("\t" _dev "$") {--NF; print; dfl=0; exit}  END {if(dfl) print dfl}'
}

fget_persistent_dev() {
    local _dev _pol
    _dev=$(stat -L -c $'%t:%T' "$1" 2> /dev/null)
    [ -z "$_dev" ] && return
    if [[ -n $persistent_policy ]]; then
        _pol="/dev/disk/${persistent_policy}"
    else
        _pol=
    fi
    # shellcheck disable=SC2086
    find \
        $_pol \
        /dev/mapper \
        /dev/disk/by-uuid \
        /dev/disk/by-label \
        /dev/disk/by-partuuid \
        /dev/disk/by-partlabel \
        /dev/disk/by-id \
        /dev/disk/by-path \
        -print0  \
        | xargs -0 stat -L -c $'%n\t%t:%T' 2> /dev/null \
        | awk -v_dev="$_dev" -vdfl="$1" '$0 ~ ("\t" _dev "$") {--NF; print; dfl=0; exit}  END {if(dfl) print dfl}'
}

And the results:

$ hyperfine -S'bash --norc' \
> 'sget_persistent_dev /dev/zd0' \
> 'pget_persistent_dev /dev/zd0' \
> 'fget_persistent_dev /dev/zd0'
Benchmark 1: sget_persistent_dev /dev/zd0
  Time (mean ± σ):       6.9 ms ±   0.7 ms    [User: 5.9 ms, System: 3.0 ms]
  Range (min … max):     4.5 ms …   7.9 ms    289 runs

Benchmark 2: pget_persistent_dev /dev/zd0
  Time (mean ± σ):       7.8 ms ±   0.8 ms    [User: 8.1 ms, System: 3.3 ms]
  Range (min … max):     5.4 ms …   9.8 ms    276 runs

Benchmark 3: fget_persistent_dev /dev/zd0
  Time (mean ± σ):       7.3 ms ±   1.0 ms    [User: 7.1 ms, System: 3.7 ms]
  Range (min … max):     5.1 ms …   8.9 ms    306 runs

Summary
  'sget_persistent_dev /dev/zd0' ran
    1.06 ± 0.18 times faster than 'fget_persistent_dev /dev/zd0'
    1.14 ± 0.17 times faster than 'pget_persistent_dev /dev/zd0'
$ hyperfine -S'bash --norc' \
> 'sget_persistent_dev /dev/zd272p1' \
> 'pget_persistent_dev /dev/zd272p1' \
> 'fget_persistent_dev /dev/zd272p1'
Benchmark 1: sget_persistent_dev /dev/zd272p1
  Time (mean ± σ):       6.7 ms ±   1.1 ms    [User: 5.8 ms, System: 3.0 ms]
  Range (min … max):     4.3 ms …   8.1 ms    289 runs

Benchmark 2: pget_persistent_dev /dev/zd272p1
  Time (mean ± σ):       6.7 ms ±   0.8 ms    [User: 7.6 ms, System: 2.4 ms]
  Range (min … max):     5.1 ms …   8.4 ms    373 runs

Benchmark 3: fget_persistent_dev /dev/zd272p1
  Time (mean ± σ):       6.7 ms ±   1.0 ms    [User: 6.5 ms, System: 3.7 ms]
  Range (min … max):     5.1 ms …   8.9 ms    296 runs

Summary
  'sget_persistent_dev /dev/zd272p1' ran
    1.00 ± 0.20 times faster than 'pget_persistent_dev /dev/zd272p1'
    1.00 ± 0.22 times faster than 'fget_persistent_dev /dev/zd272p1'
$ hyperfine -S'bash --norc' \
> 'sget_persistent_dev /dev/zero' \
> 'pget_persistent_dev /dev/zero' \
> 'fget_persistent_dev /dev/zero'
Benchmark 1: sget_persistent_dev /dev/zero
  Time (mean ± σ):       7.3 ms ±   0.8 ms    [User: 6.1 ms, System: 3.2 ms]
  Range (min … max):     5.4 ms …   8.7 ms    374 runs

Benchmark 2: pget_persistent_dev /dev/zero
  Time (mean ± σ):       7.0 ms ±   0.7 ms    [User: 7.4 ms, System: 2.8 ms]
  Range (min … max):     5.9 ms …   9.2 ms    344 runs

Benchmark 3: fget_persistent_dev /dev/zero
  Time (mean ± σ):       7.9 ms ±   0.9 ms    [User: 7.6 ms, System: 3.7 ms]
  Range (min … max):     5.8 ms …  10.3 ms    329 runs

Summary
  'pget_persistent_dev /dev/zero' ran
    1.04 ± 0.15 times faster than 'sget_persistent_dev /dev/zero'
    1.12 ± 0.17 times faster than 'fget_persistent_dev /dev/zero'

Which I'd classify as "who the hell knows" because, really, who does, it's all roughly within jitter, so I've rebased and updated to use printf because I like it more and it avoids an exec.

This shaves ~40ms (from ~50ms; average case, huge variance up to
700ms on the old one), and makes 99base go from
  real    0m0.350s
  user    0m0.250s
  sys     0m0.119s
to
  real    0m0.216s
  user    0m0.162s
  sys     0m0.076s
@stale
Copy link

stale bot commented Dec 3, 2022

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Dec 3, 2022
@nabijaczleweli
Copy link
Contributor Author

world's stinkiest bot 2022 (real)

@stale stale bot removed the stale communication is stuck label Dec 3, 2022
@stale
Copy link

stale bot commented Jan 20, 2023

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Jan 20, 2023
@nabijaczleweli
Copy link
Contributor Author

so true botstie

@stale stale bot removed the stale communication is stuck label Jan 20, 2023
@stale
Copy link

stale bot commented Feb 19, 2023

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

@stale stale bot added the stale communication is stuck label Feb 19, 2023
@nabijaczleweli
Copy link
Contributor Author

oh no, cringe (dutch)

@LaszloGombos LaszloGombos added enhancement Issue adding new functionality and removed stale communication is stuck labels Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issue adding new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants