Binary patching rust hot-reloading, sub-second rebuilds, independent server/client hot-reload #3797

Draft
wants to merge 48 commits into
base: main

Conversation

jkelleyrtp
Member

@jkelleyrtp jkelleyrtp commented Feb 25, 2025

Inlines the work from https://github.com/jkelleyrtp/ipbp to bring pure rust hot-reloading to Dioxus.

fast_reload.mp4

The approach we're taking works across all platforms though each will require some bespoke logic. The object crate is thankfully generic over mac/win/linux/wasm, though we need to handle system linkers differently.

This change also enables dx to operate as a faster linker allowing sub-second (in many cases, sub 200ms) incremental rebuilds.

Todo:

  • Add logic to the devtools types and generic integration
  • Wire up desktop
  • Rework existing hot-reload engine to be properly compatible
  • Remove old binaries
  • Wire up iOS
  • Wire up macOS
  • Wire up Android
  • Wire up Linux
  • Wire up wasm
  • Wire up windows
  • Wire up server
  • clean up app/server impl (support more than 2 exe-s in prep for dioxus.json)
  • fix integration with old hot-reload engine

Notes:

This unfortunately brings a very large refactor to the build system since we need to persist app bundles while allowing new builds to be "merged" into them. I ended up flattening BuildRequest + Bundle together and Runner + Builder together since we need knowledge of previous bundles and currently running processes to get patching to work properly.

@jkelleyrtp jkelleyrtp changed the title Binary patching rust hot-reloading Binary patching rust hot-reloading, sub-second rebuilds Feb 25, 2025
@jkelleyrtp jkelleyrtp changed the title Binary patching rust hot-reloading, sub-second rebuilds Binary patching rust hot-reloading, sub-second rebuilds, independent server/client hot-reload Feb 25, 2025
@jkelleyrtp
Member Author

jkelleyrtp commented Mar 18, 2025

progress update

I've migrated everything over from ipbp so now anyone should be able to run the demos in macOS/iOS. Going to add linux + android support next.

I've been tinkering with the syntax for subsecond a bit and am generally happy now with the API. You can wrap any closure with ::call() and that closure is now "hot":

pub fn launch() {
    loop {
        std::thread::sleep(std::time::Duration::from_secs(1));
        subsecond::call(|| tick());
    }
}

fn tick() {
    println!("boom!");
}

If you need more granular control over "hot" functions, use ::current(closure), which gives you a HotFn with extra flags and methods for running a callback. It also lets you run FnOnce closures, which ::call() currently does not support.

::call() taking an FnMut is meant to provide an "unwind" point that our assembly-diffing logic can bounce up to by emitting panics. This is meant to support cases where you might add a field to a struct and need to "rebuild" the app from a higher checkpoint (aka re-instancing).

For example, a TUI app with some state:

struct App {
    should_exit: bool,
    temperatures: Vec<u8>,
}

might implement a "run" method that calls subsecond:

    fn run(&mut self, terminal: &mut DefaultTerminal) -> Result<()> {
        while !self.should_exit {
            subsecond::call(|| self.tick(terminal))?;
        }
        Ok(())
    }
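The "unwind point" that ::call() provides in a loop like this one can be sketched with only the standard library. This is not subsecond's implementation: `HotReloadRequested` and `call_with_unwind_point` are hypothetical names, and a real system would swap in the patched code before retrying. It only shows how a panic with a known payload can bounce control back up to a checkpoint.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical marker payload: a panic carrying this value means
/// "unwind to the nearest checkpoint and run the closure again".
pub struct HotReloadRequested;

/// Runs `f`, re-running it whenever it unwinds with `HotReloadRequested`.
/// Any other panic is propagated to the caller unchanged.
pub fn call_with_unwind_point<T>(mut f: impl FnMut() -> T) -> T {
    loop {
        match panic::catch_unwind(AssertUnwindSafe(|| f())) {
            // Normal return: hand the value back to the caller.
            Ok(value) => return value,
            // Our marker: loop around and call the closure again.
            Err(payload) if payload.is::<HotReloadRequested>() => continue,
            // Anything else is a genuine panic, so keep unwinding.
            Err(payload) => panic::resume_unwind(payload),
        }
    }
}
```

Taking an FnMut is what makes the retry possible: an FnOnce would already be consumed by the first attempt.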

If the struct's size/layout change, then we want to rebuild the app from scratch. Alternatively, we could somehow migrate it, which is out of scope for this PR, but implementations can be found in libraries like dexterous. We might end up taking an approach that unwinds the stack to the app's constructor and then copies it to a new size/layout, merging the new fields in. TODO on what this should look like.
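As a rough illustration of the "did the layout change?" check, here is a minimal sketch that compares size and alignment between two versions of a struct. `LayoutFingerprint`, `needs_reinstance`, `AppV1` and `AppV2` are all made up for this example, and the PR's actual detection works on object files rather than on types at runtime. Matching size and alignment also does not prove the fields are in the same places, so this is a necessary check rather than a sufficient one.

```rust
use std::alloc::Layout;

/// Hypothetical fingerprint of a type's memory layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LayoutFingerprint {
    pub size: usize,
    pub align: usize,
}

impl LayoutFingerprint {
    /// Captures the size and alignment of `T` as compiled right now.
    pub fn of<T>() -> Self {
        let layout = Layout::new::<T>();
        Self { size: layout.size(), align: layout.align() }
    }
}

/// If the fingerprint differs, old instances cannot be reused in place
/// and the app has to be rebuilt from a higher checkpoint.
pub fn needs_reinstance(old: LayoutFingerprint, new: LayoutFingerprint) -> bool {
    old != new
}

/// The TUI state before the edit.
#[allow(dead_code)]
pub struct AppV1 {
    should_exit: bool,
    temperatures: Vec<u8>,
}

/// The same state after adding a (hypothetical) field.
#[allow(dead_code)]
pub struct AppV2 {
    should_exit: bool,
    temperatures: Vec<u8>,
    humidities: Vec<u8>,
}
```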

Here's a video of the tui_demo in the subsecond_harness crate:

subsecond-tui.mp4

runtime integration

Originally I wanted to use LLDB to drive the patching system - and we still might need to for proper "patching" - but I ran into a bunch of segfaults and LLDB crashes when we sigstopped the program in the middle of malloc/free. Apparently there's a large list of things you cannot do when a program is sigstopped, and using allocators is one of them. We could look into using a dedicated bump allocator and continue using LLDB, but for now I have an adapter built on websockets. We might end up migrating to a shared-memory system so that the HOST and DUT can share the patch table freely. The challenge with the LLDB and shared-memory approaches is that they're not very portable, whereas websockets seem to be available literally everywhere.
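To make "patch table" a bit more concrete, here is a minimal sketch of one possible shape: a map from a function's original address to the address of its newest patched version. `PatchTable`, `apply` and `resolve` are hypothetical names, and the real table format and transport in this PR may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical patch table: original function address -> patched address.
#[derive(Debug, Default)]
pub struct PatchTable {
    map: HashMap<u64, u64>,
}

impl PatchTable {
    /// Merges a freshly received patch. Later patches override earlier ones.
    pub fn apply(&mut self, entries: impl IntoIterator<Item = (u64, u64)>) {
        self.map.extend(entries);
    }

    /// Returns the address to jump to for `original`.
    /// Unpatched functions resolve to themselves.
    pub fn resolve(&self, original: u64) -> u64 {
        self.map.get(&original).copied().unwrap_or(original)
    }
}
```

Because the table is plain data, it could be sent over a websocket message or placed in shared memory without changing the lookup side.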

zero-link / thinlink

One cool thing that spun out of this work is "zerolink" (thinlink maybe?): our new approach for drastically speeding up rust compile times by automatically using dynamic linking. This is super useful for tests, benchmarks, and general development since we can automatically split your workspace crates from your "true" dependencies and skip linking your dependencies on every build.

This means you can turn up opt levels and keep debug symbols (two things that generally slow down builds): you pay a one-time cost and then continuously dynamically link your incremental object files against the dependencies dylib. Most OSes support a dyld_cache equivalent that keeps your dependencies.dylib memory-mapped and cached between invocations, which also greatly speeds up launch times.

ZeroLink isn't really an "incremental linker" per se, but it behaves like one thanks to Rust's incremental compile system. In spirit it's very similar to marking a crate as a dylib crate in your crate graph (see bevy/dynamic) but it doesn't require you to change any of your crates and it supports WASM.

dx is standalone

I wanted to use zerolink with non-dioxus projects, so this PR also makes dx a standalone rust runner. You can dx run your project and dioxus does not need to be part of your crate graph for it to work. This lets us bootstrap dx by running dx with itself, which makes it easy to update the TUI without fully rebuilding the CLI.

wasm work

WASM does not support dynamic linking so we need to mess with the binaries ourselves. Fortunately this is as simple as linking the deps together to a relocatable object file, lifting the symbols into the export table, and recording the element segments.

When the patches load they need two things

  • addresses within the ifunc table for ifuncs
  • imports from the main module

Unfortunately, the wasm-bindgen pass runs ::gc, so I don't think there's any cool combination of flags we can use against wasm-ld to do this for us automatically. However, all the work we put into wasm_split really comes in handy.

What's left

There are three avenues of work left here:

  • Propagating the change graph through the HotFn points
  • More platform support (windows, wasm, server_fn)
  • Bugs (better handling of statics, destructors, renaming symbols, changing signatures, and dioxus integration like Global)

I expect Windows + WASM to take the longest to get proper support and will prioritize that over propagating the change graph. Dioxus can function properly without a sophisticated change graph, but other libraries will want the richer detail available.

@DrewRidley

Awesome work here! I might recommend adding .arg("-Zcodegen-backend=cranelift") as an optional user-facing argument when hot reloading.

I found on my M3 Pro Macbook it brings down the average times from ~600ms to ~300ms. The backend ships as a cargo component now so it should be a drop in replacement for desktop or possibly mobile platforms.

@jkelleyrtp
Member Author

jkelleyrtp commented Mar 19, 2025

Awesome work here! I might recommend adding .arg("-Zcodegen-backend=cranelift") as an optional user-facing argument when hot reloading.

I found on my M3 Pro Macbook it brings down the average times from ~600ms to ~300ms. The backend ships as a cargo component now so it should be a drop in replacement for desktop or possibly mobile platforms.

Wow that's incredible!

On my M1 I've been getting around 900ms on the dioxus harness with default dev profile and then 500-600 with the subsecond-dev profile:

[profile.subsecond-dev]
inherits = "dev"
debug = 0
strip = "debuginfo"

I'll add the cranelift backend option and then report back. In the interim you can check to see if that profile speeds up your cranelift builds at all.
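For reference, an opt-in switch along these lines could be sketched as follows. `build_command` is a hypothetical helper, not code from dx. Passing the flag through RUSTFLAGS is an assumption on my part: it requires a nightly toolchain with the cranelift codegen component installed, and it replaces any RUSTFLAGS the user already had set.

```rust
use std::process::Command;

/// Hypothetical helper: builds the cargo invocation for a hot-reload rebuild.
/// `profile` would be something like "subsecond-dev".
pub fn build_command(profile: &str, use_cranelift: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").arg("--profile").arg(profile);
    if use_cranelift {
        // Assumption: nightly toolchain with the cranelift backend available.
        cmd.env("RUSTFLAGS", "-Zcodegen-backend=cranelift");
    }
    cmd
}
```

Keeping it behind a boolean means the default LLVM path is untouched for anyone who has not installed the backend.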

I did some profiling of rustc and about 100-300ms is spent copying incremental artifacts on disk. That's pretty substantial given the whole process is like 500ms. Hopefully this is improved here:

rust-lang/rust#128320

I would like to see that time drop to 0ms at some point and then we'd basically have "blink and you miss it" hotpatching.

@DrewRidley

DrewRidley commented Mar 19, 2025

I tried the profile, and with or without it, it's consistently ~300ms on my Mac. When doing a self-profile I noticed that register allocation takes a huge portion of the total time spent.

I discovered this (https://docs.wasmtime.dev/api/cranelift_codegen/settings/enum.RegallocAlgorithm.html) which might help if it's been backported to codegen_clif as a flag or option.

That seemed to be a fluke in testing and actually the remaining time is mostly incremental cache related file IO. Not sure how much can be done about that.

Regardless, this is super exciting work, let me know if there's any other way I can help.

@jkelleyrtp
Member Author

jkelleyrtp commented Mar 22, 2025

I switched to a slightly modified approach (lower level, faster, more reliable, more complex).

This is implemented to work around a number of very challenging android issues

  • pointer tagging
  • mte
  • linker namespaces
  • read/write permissions

Since this is more flexible it should work across linux and windows (android and linux are the same). Last target is wasm.

Here's the android demo:

hotpatch-android.mp4

iOS:

ios-binarypatch.mp4

@DogeDark
Member

DogeDark commented Mar 26, 2025

I'm trying this out on Linux Mint rustc 1.85.1 and am encountering a few issues:

  • Had to comment out tests at the end of packages/subsecond/subsecond-cli-support/src/wasm.rs since I don't have the files that are include_bytes!

Desktop app had these issues:

  • asset files weren't included
  • asset! macros panic on rebuilds (hot patches?).
  • When it successfully builds, it writes the patch and outputs "symbol not found _Unwind_Resume" and "symbol not found memcpy", and then the build fails with "build panicked, not yet implemented".

Trying wasm, the first build fails in rust-lld with "note: rust-lld: error: unknown argument: -Wl,--whole-archive,-Wl,--no-gc-sections,-Wl,--export-all" and a bunch of warnings of the form "warning x archive member y is neither wasm object file nor llvm bitcode".

Edit: looks like wasm might not have been ready yet.

we were supposedly 'leaking' the devtools message but not actually.

that is fixed.

also sometimes we placed table bases on overlapping edges causing segfaults. that's fixed
@jkelleyrtp
Member Author

jkelleyrtp commented Mar 27, 2025

there were some issues with wasm that have now been fixed.

It should be possible to run the mini-cli I built for the harness with

RUST_LOG=info cargo run --package subsecond-cli -- --target wasm32-unknown-unknown

currently we aren't running the manganis step on the patch to register new assets. we need to do that. I haven't tried it with asset!() at all so I guess there might be other panics too, potentially related to hashes changing.

Also, hitting the todo!() when patching on x86 is expected right now, since I've only tinkered with ld64 enough to know what args to pass to it:

let res = match target.architecture {
    // usually just ld64 - uses your `cc`
    target_lexicon::Architecture::Aarch64(_) => {
        // todo: we should throw out symbols that we don't need and/or assemble them manually
        Command::new("cc")
            .args(object_files)
            .arg("-Wl,-dylib")
            // .arg("-Wl,-undefined,dynamic_lookup")
            // .arg("-Wl,-export_dynamic")
            .arg("-arch")
            .arg("arm64")
            .arg("-o")
            .arg(&output_location)
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .output()
            .await?
    }
    target_lexicon::Architecture::Wasm32 => {
        let table_base = 2000 * (aslr_reference + 1);
        let global_base = (((aslr_reference) * (65536 * 3)) + (2097152)) as i32;
        tracing::info!(
            "using aslr of table: {} and global: {}",
            table_base,
            global_base
        );
        Command::new(wasm_ld().await.unwrap())
            .args(object_files)
            .arg("--import-memory")
            .arg("--import-table")
            .arg("--growable-table")
            .arg("--export")
            .arg("main")
            .arg("--export-all")
            // .arg("--export=__heap_base")
            // .arg("--export=__data_end")
            // .arg("--allow-undefined")
            // .arg("--unresolved-symbols=ignore-all")
            // .arg("--relocatable")
            // .arg("-z")
            // .arg("stack-size=1048576")
            .arg("--stack-first")
            .arg("--allow-undefined")
            .arg("--no-demangle")
            .arg("--no-entry")
            .arg("--emit-relocs")
            .arg(format!("--table-base={}", table_base))
            .arg(format!("--global-base={}", global_base))
            .arg("-o")
            .arg(&output_location)
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .output()
            .await?
    }
    _ => todo!(),
};
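To spell out the base arithmetic in the Wasm32 arm, here is the same formula pulled into a standalone function with a few worked values. `patch_bases` and the constant names are mine, and treating `aslr_reference` as a `u64` patch counter is an assumption, since the snippet does not show its type. Each patch gets its own window of 2000 table slots and three 64 KiB wasm pages of data, starting at 2 MiB.

```rust
/// Table slots reserved per patch (the `2000` in the snippet above).
pub const TABLE_SLOTS_PER_PATCH: u64 = 2000;
/// Bytes of data reserved per patch: three 64 KiB wasm pages.
pub const DATA_BYTES_PER_PATCH: u64 = 65536 * 3;
/// Where the first patch's data starts: 2 MiB.
pub const DATA_START: u64 = 2_097_152;

/// Returns `(table_base, global_base)` for a given patch index,
/// mirroring the expressions used for `--table-base` and `--global-base`.
pub fn patch_bases(aslr_reference: u64) -> (u64, i32) {
    let table_base = TABLE_SLOTS_PER_PATCH * (aslr_reference + 1);
    let global_base = (aslr_reference * DATA_BYTES_PER_PATCH + DATA_START) as i32;
    (table_base, global_base)
}
```

For patch indices 0, 1 and 2 this yields table bases 2000, 4000 and 6000 and global bases 2097152, 2293760 and 2490368, so consecutive patches never share a table slot or a data page as long as each stays within its window.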

I'm assuming the error about memcpy is due to a similar reason (not passing linker flags for x86 at all).

Though I think some of these are actually fixed in dx itself, even though wasm isn't properly integrated yet.

There were some pretty bad bugs in wasm that seem to be squashed now. Our state-preservation engine is naive right now (it invalidates every hotpatch call), so we need to wire it up to the object file diffing logic from here:

fn diff(&self) -> Result<ObjectDiffResult<'_>> {
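As a sketch of what "wiring up" the diff could buy us, assume the diff can be reduced to a map from symbol name to a hash of that symbol's contents (an assumption for illustration; `ObjectDiffResult` likely carries more than that). Then only the symbols that differ need their state invalidated, and everything else can be preserved. `changed_symbols` is a hypothetical helper.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given per-symbol content hashes from the old and new object files,
/// returns every symbol that was added, removed, or modified.
pub fn changed_symbols(
    old: &BTreeMap<String, u64>,
    new: &BTreeMap<String, u64>,
) -> BTreeSet<String> {
    let mut changed = BTreeSet::new();
    // Added or modified: present in `new` with a missing or different old hash.
    for (name, hash) in new {
        if old.get(name) != Some(hash) {
            changed.insert(name.clone());
        }
    }
    // Removed: present in `old` but gone from `new`.
    for name in old.keys() {
        if !new.contains_key(name) {
            changed.insert(name.clone());
        }
    }
    changed
}
```

A hot call site would then be invalidated only if the function it wraps (or something that function depends on) appears in the returned set.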

I was hardcoding the wasm linker before which I shouldn't be doing anymore. The fact that those args are being rejected seems to be a mismatch in linker or us incorrectly generating the stub object file (probably due to a panic or early bail).

Here's a little video of wasm now with all its bugs fixed:

hotpatch-wasm-complete.mp4

@jkelleyrtp jkelleyrtp marked this pull request as ready for review March 29, 2025 01:19
@jkelleyrtp jkelleyrtp requested a review from a team as a code owner March 29, 2025 01:19
@jkelleyrtp jkelleyrtp marked this pull request as draft March 29, 2025 01:20
3 participants