
Contributor

ytsssun commented Aug 23, 2025

Issue number:

Closes #637

Description of changes:

This PR optimizes boot time by decoupling the network stack initialization from the DATA partition, allowing network services to start in parallel with local filesystem operations.

Boot sequence change overview

[Figure: boot sequence, panels "Before" and "After"]

Direct PRIVATE Partition Mount

  • Added a direct mount point, /.bottlerocket, for the PRIVATE partition that bypasses the dependency on the DATA partition
  • Created a bind mount from /.bottlerocket to /var/lib/bottlerocket for backward compatibility
  • Network services now use /.bottlerocket/netdog directly instead of /var/lib/netdog
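For illustration, a minimal sketch of what the /.bottlerocket mount unit could look like, assembled from the unit fragments quoted in the review thread below. The Description= line is an assumption, and so is the What= path (derived by unescaping the dev-disk-by\x2dpartlabel-BOTTLEROCKET\x2dPRIVATE.device unit name). Other settings the real unit may carry, such as DefaultDependencies=, are not shown in the thread. The file name must be the systemd-escaped form of the path, as produced by systemd-escape --path --suffix=mount /.bottlerocket.

```ini
# Sketch of the PRIVATE partition mount unit for /.bottlerocket.
[Unit]
# Assumed description.
Description=Bottlerocket PRIVATE partition
# Ordering and requirements as quoted in the review thread.
Before=local-fs.target umount.target
After=dev-disk-by\x2dpartlabel-BOTTLEROCKET\x2dPRIVATE.device selinux-policy-files.service
Requires=dev-disk-by\x2dpartlabel-BOTTLEROCKET\x2dPRIVATE.device
Wants=selinux-policy-files.service

[Mount]
# Assumed device path, unescaped from the .device unit name above.
What=/dev/disk/by-partlabel/BOTTLEROCKET-PRIVATE
Where=/.bottlerocket
Type=ext4
Options=defaults,nosuid,nodev,noexec,noatime,private,context=system_u:object_r:private_t:s0
```

Nothing in this unit refers to /var, so it can be mounted without waiting for the DATA partition.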

Netdog Updates

  • Updated path constants to use /.bottlerocket/netdog instead of /var/lib/netdog
  • Dropped default dependencies for generate-network-config.service and added /.bottlerocket dependencies to it
  • Created /etc/sysctl.d via an ExecStart entry in write-network-status.service
  • Dropped default dependencies for write-network-status.service and explicitly added Before=network-online.target to it
  • Removed dependency on systemd-tmpfiles for directory creation
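A sketch of how the write-network-status.service changes above might fit together, assuming the unit is Type=oneshot (systemd only accepts more than one ExecStart= line for oneshot services). The [Unit] lines follow the review thread and use the reviewer's suggested RequiresMountsFor= form. The Description= text and the netdog command line are hypothetical placeholders, because this PR does not show them.

```ini
# Sketch of write-network-status.service; see assumptions above.
[Unit]
# Hypothetical description.
Description=Write network status
DefaultDependencies=no
RefuseManualStart=true
RefuseManualStop=true
# Pulls in and orders after the mounts needed for both paths.
RequiresMountsFor=/.bottlerocket /run/netdog
After=systemd-networkd-wait-online.service systemd-resolved.service
Requires=systemd-networkd-wait-online.service systemd-resolved.service
# Explicit ordering so the unit completes before network-online.target.
Before=early-boot-config.service network-online.target

[Service]
# Assumed: oneshot, so the two ExecStart= lines run in order.
Type=oneshot
# Create the sysctl drop-in directory instead of relying on systemd-tmpfiles.
ExecStart=/usr/bin/mkdir -p /etc/sysctl.d
# Hypothetical netdog invocation; the real command is not shown in this PR.
ExecStart=/usr/bin/netdog write-network-status
```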

D-Bus and Network Service Improvements

  • Added DefaultDependencies=no to dbus.socket to allow early D-Bus initialization
  • Made network-pre.target require dbus-broker.service to ensure D-Bus is ready before network services start
  • Disabled PrivateTmp for dbus-broker.service to remove dependency on systemd-tmpfiles and local-fs.target
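The three D-Bus changes could be expressed as unit drop-ins roughly as follows. The drop-in file paths are hypothetical, and the After= ordering on network-pre.target is an assumption: Requires= by itself does not order units, so the target only waits for D-Bus if After= is also set. The dbus-broker.service lines mirror the diff shown later in the review thread.

```ini
# Hypothetical drop-in: dbus.socket.d/early.conf
[Unit]
# Let the socket start before sysinit/basic ordering would normally allow.
DefaultDependencies=no

# Hypothetical drop-in: dbus-broker.service.d/early.conf
[Unit]
DefaultDependencies=no
# Ensure the dbus user is created before starting dbus-broker.
After=dbus.socket systemd-sysusers.service

[Service]
# PrivateTmp=yes implicitly adds dependencies on the /tmp and /var/tmp
# mounts; turning it off drops the link to systemd-tmpfiles and
# local-fs.target described above.
PrivateTmp=no

# Hypothetical drop-in: network-pre.target.d/dbus.conf
[Unit]
# Ensure D-Bus is fully initialized before network services start.
Requires=dbus-broker.service
# Assumed: ordering so the target waits until dbus-broker has started.
After=dbus-broker.service
```

Because systemd-networkd.service is ordered after network-pre.target, it transitively waits for dbus-broker.service with this arrangement.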

Testing done [WIP]:


Functional Testing

  • Tested using a metal-dev instance to confirm that net.toml injected into the PRIVATE partition can still be read and that
    write-network-status.service can write to the /.bottlerocket/netdog path without issues
  • Verified the node booted properly with all network functionality intact
  • Used systemd-analyze plot to generate the boot plot and confirmed the decoupling of the network stack and DATA
    partition
  • Verified that network initialization and local-fs (DATA) now occur in parallel
  • Confirmed network configuration and state files are properly persisted across reboots
  • Validated that all existing network functionality is maintained with the new directory structure

Boot-time Testing [WIP]

  • Initial tests on m5.xlarge instances show ~987ms improvement in network-online.target timing and ~817ms improvement in configured.target timing
  • Additional performance testing across a wider range of instance types is in progress
  • We are also internally running a larger sample of boots to better measure this improvement with statistical analysis (in progress)

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

ytsssun changed the title from "Network Stack: Implement Boot Time Optimization by Removing DATA Partition Dependency" to "Network stack boot time optimization: decouple from DATA partition" Aug 23, 2025
ytsssun force-pushed the boot-time-network-stack branch 2 times, most recently from 4f90c47 to c601e5b, August 25, 2025 06:13
ytsssun force-pushed the boot-time-network-stack branch from c601e5b to b948d56, September 3, 2025 22:12
Contributor Author

ytsssun commented Sep 3, 2025

Pushed a new commit to address a race condition we noticed during testing:

The systemd-networkd service started too early and sometimes failed to get the network interface status from D-Bus because D-Bus was not yet ready.

Added a change to initialize D-Bus earlier and to enforce a dependency on dbus-broker.service in network-pre.target.

ytsssun requested a review from bcressey September 3, 2025 23:29
ytsssun force-pushed the boot-time-network-stack branch 4 times, most recently from 5c4c9b9 to a9b4687, September 16, 2025 23:53
Contributor Author

ytsssun commented Sep 16, 2025

Synced with @bcressey: we are adopting a different implementation here. Instead of using overlayfs for /.bottlerocket/netdog to give netdog access to the files on the PRIVATE partition, we will just grant netdog api_exec_t to allow access to private_t.

Pushed code changes for that.

ytsssun force-pushed the boot-time-network-stack branch from a9b4687 to 33a8fe6, September 17, 2025 20:39
Contributor Author

ytsssun commented Sep 17, 2025

Pushed a minor change to add /etc/sysctl.d to ReadWritePaths= in write-network-status.service; otherwise the service fails with an error like:

Failed to write sysctl config to '/etc/sysctl.d/90-primary_interface.conf': Read-only file system (os error 30)
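A minimal sketch of such a setting as a drop-in, assuming the unit uses file system sandboxing (for example ProtectSystem=full or ProtectSystem=strict) that makes /etc read-only, which matches the "Read-only file system" error above. The drop-in file name is hypothetical.

```ini
# Hypothetical drop-in: write-network-status.service.d/sysctl.conf
[Service]
# Exempt /etc/sysctl.d from the read-only view of the file system that
# the sandboxing options give this unit. Without a "-" prefix, a listed
# path that does not exist makes the unit fail, so the directory has to
# exist by the time the unit's processes are set up.
ReadWritePaths=/etc/sysctl.d
```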

ytsssun force-pushed the boot-time-network-stack branch 3 times, most recently from 85fd745 to 94a18a4, September 30, 2025 23:21
Contributor Author

ytsssun commented Sep 30, 2025

Pushed a change to use /run/netdog for the network status files. Netdog is already configured to access the /run/netdog path, and the status files are removed on each boot anyway, which matches the lifetime of /run/netdog.

Comment on lines 11 to 13
RequiresMountsFor=/.bottlerocket
After=systemd-networkd-wait-online.service systemd-resolved.service run-netdog.mount prepare-sysctl.service
Requires=systemd-networkd-wait-online.service systemd-resolved.service run-netdog.mount prepare-sysctl.service
Contributor

To avoid the change to run-netdog.mount:

Suggested change
RequiresMountsFor=/.bottlerocket
After=systemd-networkd-wait-online.service systemd-resolved.service run-netdog.mount prepare-sysctl.service
Requires=systemd-networkd-wait-online.service systemd-resolved.service run-netdog.mount prepare-sysctl.service
RequiresMountsFor=/.bottlerocket /run/netdog
After=systemd-networkd-wait-online.service systemd-resolved.service prepare-sysctl.service
Requires=systemd-networkd-wait-online.service systemd-resolved.service prepare-sysctl.service

I would probably also just add the mkdir -p /etc/sysctl.d as another ExecStart command, rather than adding another unit that systemd has to do the bookkeeping for.
The run-netdog.mount change is a little odd because that unit is already WantedBy network-pre.target, which is reached before any of these.

Contributor Author

I think /run/netdog gets mounted pretty early in systemd's ordering anyway. This is just to call out the dependency explicitly, since we now use /run/netdog for the status files. IMO it does not hurt. Thoughts?

z /var/lib/systemd/random-seed 600 root root -
R /var/lib/systemd/linger
D /var/lib/systemd/linger 0700 root root -
d /etc/sysctl.d 0700 root root -
Contributor

I'm inclined to keep this for now, and just have any units that need this directory early arrange to create it for themselves.

Right now corndog also needs this directory, and doesn't create it, and it's conceptually weird for corndog to depend on netdog (or related units) for this.

It's outside the scope of this change, but I'd like to explore a better mechanism, like an etc.target, where we can collect all the units needed to populate /etc and order almost everything after it, to stop the madness of every early unit needing to depend on selinux-policy-files.service or run the risk of subtle breakage.

prepare-sysctl.service seems like a similar dependency where more units may end up needing it over time.
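To make the etc.target idea concrete, here is a hypothetical sketch; nothing like it exists in this PR. Only selinux-policy-files.service is named in the thread as a unit that populates /etc, so the dependency list is illustrative.

```ini
# etc.target: hypothetical target collecting the units that populate /etc.
[Unit]
Description=Populated /etc (hypothetical)
DefaultDependencies=no
# Each unit that writes into /etc would be listed here.
Wants=selinux-policy-files.service
After=selinux-policy-files.service

# An early consumer such as corndog or netdog would then only need:
#   [Unit]
#   Wants=etc.target
#   After=etc.target
```

The benefit is that early units depend on a single target instead of each tracking every unit that writes into /etc.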

Before=local-fs.target umount.target
After=dev-disk-by\x2dpartlabel-BOTTLEROCKET\x2dPRIVATE.device selinux-policy-files.service
Requires=dev-disk-by\x2dpartlabel-BOTTLEROCKET\x2dPRIVATE.device
Wants=selinux-policy-files.service
Contributor

Why not make this a Requires= dependency as well?

Contributor Author

Type=ext4
Options=defaults,nosuid,nodev,noexec,noatime,private,context=system_u:object_r:private_t:s0
Type=none
Options=rbind,rshared
Contributor

Do you have to make this rshared? Why not just rprivate?

Contributor Author

Linking @bcressey's comments from #638 (comment):

Most of these options aren't valid for bind mounts because they share the same filesystem superblock.
Generally it's useful to set up the recursive options so mounts are propagated in both locations.
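A sketch of the corresponding bind mount unit, using the Type=none and rbind,rshared options shown above. The RequiresMountsFor= line and the Description= text are assumptions. systemd adds Requires= and After= on mount units for the parent directories of Where= (such as /var) automatically, so only the bind source needs to be stated.

```ini
# var-lib-bottlerocket.mount: the file name is the escaped form of Where=.
[Unit]
Description=Bind mount of /.bottlerocket at /var/lib/bottlerocket
# Assumed: wait for the PRIVATE partition mount that provides the source.
RequiresMountsFor=/.bottlerocket

[Mount]
What=/.bottlerocket
Where=/var/lib/bottlerocket
Type=none
# Recursive bind with shared propagation, so later mounts under either
# path appear in both locations.
Options=rbind,rshared
```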

Create a direct mount point (/.bottlerocket) for the PRIVATE partition that
bypasses the dependency on the DATA partition. Previously, /var/lib/bottlerocket
was backed by the PRIVATE partition but required DATA partition availability
through the /var -> /local/var bind mount chain.

This change maintains backward compatibility by adding a bind mount from
/.bottlerocket to /var/lib/bottlerocket, allowing network services to
initialize in parallel with local-fs operations.

Signed-off-by: Yutong Sun <[email protected]>
- Update path constants to use /run/netdog instead of /var/lib/netdog.
- Remove OVERRIDE_NET_CONFIG_FILE as it's no longer needed.

Signed-off-by: Yutong Sun <[email protected]>
- Drop dependency on tmpfiles.
- Drop default dependencies and update generate-network-config.service with /.bottlerocket dependencies.
- Create /etc/sysctl.d with ExecStart entry for write-network-status.service.
- Drop default dependencies for write-network-status.service.
- Explicitly add Before=network-online.target for write-network-status.service.

Signed-off-by: Yutong Sun <[email protected]>
Fix race condition where network services attempt to use D-Bus functionality
before the D-Bus broker is fully initialized:

1. Add DefaultDependencies=no to dbus.socket to allow D-Bus to start earlier
   in the boot process
2. Make network-pre.target require dbus-broker.service to ensure D-Bus is
   ready before network services start
3. Disable PrivateTmp for dbus-broker.service to remove dependency on
   systemd-tmpfiles and local-fs.target

Signed-off-by: Yutong Sun <[email protected]>
Migrator depends on /var/lib/bottlerocket/. Previously that path was the mount
point for the PRIVATE partition; now that /.bottlerocket is the mount point and
/var/lib/bottlerocket is a bind mount, an explicit dependency is necessary.

Signed-off-by: Yutong Sun <[email protected]>
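One way to express such a dependency is RequiresMountsFor=. This is a sketch only: the commit message does not say which directive was used, and the drop-in path and the migrator.service unit name are hypothetical.

```ini
# Hypothetical drop-in: migrator.service.d/mounts.conf
[Unit]
# Adds Requires= and After= on the mount units needed to access this path:
# the /var/lib/bottlerocket bind mount and the mounts for its parent
# directories.
RequiresMountsFor=/var/lib/bottlerocket
```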
ytsssun force-pushed the boot-time-network-stack branch from 94a18a4 to 8c34907, October 1, 2025 21:21
RefuseManualStart=true
RefuseManualStop=true
Before=early-boot-config.service
Before=early-boot-config.service network-online.target
Contributor Author

I had to add this explicit Before= ordering; otherwise systemd ordered write-network-status.service after network-online.target:

network-online.target @1.979s
└─systemd-networkd-wait-online.service @1.254s +722ms
  └─systemd-networkd.service @1.189s +61ms
    └─network-pre.target @1.186s
...

With this change, the write-network-status.service happens before the network-online.target.

network-online.target @1.788s
└─write-network-status.service @1.753s +33ms      <-- listed as dependency
  └─systemd-resolved.service @2.669s +217ms
    └─systemd-sysctl.service @753ms +22ms
...

Contributor Author

ytsssun commented Oct 1, 2025

Pushed changes to address comments from @bcressey and @arnaldo2792.

ytsssun requested a review from bcressey October 1, 2025 21:47
ytsssun requested a review from arnaldo2792 October 1, 2025 21:47
Contributor

bcressey commented Oct 3, 2025

Nice!

ytsssun marked this pull request as ready for review October 3, 2025 21:03
DefaultDependencies=false
After=dbus.socket
# Ensure the dbus user is created before starting dbus-broker
After=dbus.socket systemd-sysusers.service
Contributor

You will have to make the same change to whippet.service now that I merged #661 😅

@@ -1,9 +1,12 @@
[Unit]
Contributor

nit: the commit message only explains what was done, not why. In another commit in the series you used a list and explained in each item why you made the change. You should follow the same convention (or, even better, favor a narrative over a bullet list).

@@ -0,0 +1,10 @@
[Unit]
# Ensure D-Bus is fully initialized before network services start
Contributor

nit: missing . at the end of this sentence


Development

Successfully merging this pull request may close these issues.

Network stack boot time optimization: decouple from DATA partition

3 participants