rbspy is a little complicated. I want other people to be able to contribute to it easily, so here is an architecture document to help you understand how it works.
Here’s what happens when you run rbspy snapshot --pid $PID. This is the simplest subcommand (it takes a
PID and gets you the current stack trace from that PID), and if you understand how snapshot works
you can relatively easily understand how the rest of the rbspy subcommands work as well.
The implementation of the snapshot function in main.rs is really simple. The goal of this
document is to explain how that code works behind the scenes.
let snap = recorder::snapshot(pid, lock_process, force_version)?;
println!("{}", snap);

Our first goal is to create a struct (RubySpy) which we can call .get_stack_trace() on to get a
stack trace. This struct contains a PID, a function, and the address in the target process of the
current thread. The initialization code is somewhat complicated but has a simple interface: you give
it a PID, and it returns a struct that you can call .get_stack_trace() on:
let spy = RubySpy::new(pid, None)?;
let lock_process = false;
spy.get_stack_trace(lock_process)

Here's what happens when you call RubySpy::new(pid, None).
Step 1: Find the Ruby version of the process. The code to do this is in a function called
get_ruby_version.
Step 2: Find the address of the ruby_current_thread global variable. This address is the
starting point for getting a stack trace from our Ruby process -- we start there every time. How we do
this depends on 2 things -- whether the Ruby process we’re profiling has symbols, and the Ruby
version (in 2.5.0+ there are some small differences).
If there are symbols, we find the address of the current thread using the symbol table
(the current_thread_address_location_symbol_table function). This is pretty straightforward: we look
up ruby_current_thread or ruby_current_execution_context_ptr, depending on the Ruby version.
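
Here's a rough sketch of that lookup, to show the shape of the idea rather than rbspy's actual code. The function names, the HashMap standing in for a parsed symbol table, and the "load base plus offset" arithmetic are all assumptions made for illustration. The 2.5.0 cutoff is taken from the text above.

```rust
use std::collections::HashMap;

/// Pick the global symbol to look up for a given (major, minor, patch) version.
/// Assumption: the name changes at 2.5.0, as the text above suggests.
fn current_thread_symbol(version: (u32, u32, u32)) -> &'static str {
    if version >= (2, 5, 0) {
        "ruby_current_execution_context_ptr"
    } else {
        "ruby_current_thread"
    }
}

/// Hypothetical lookup: `symbols` maps symbol names to offsets inside the
/// binary, and `load_base` is where that binary is mapped in the target process.
fn current_thread_address(
    symbols: &HashMap<String, usize>,
    load_base: usize,
    version: (u32, u32, u32),
) -> Option<usize> {
    symbols
        .get(current_thread_symbol(version))
        .map(|offset| load_base + offset)
}

fn main() {
    let mut symbols = HashMap::new();
    symbols.insert("ruby_current_thread".to_string(), 0x1000);
    let addr = current_thread_address(&symbols, 0x5000_0000, (2, 4, 1));
    println!("{:x?}", addr);
}
```

If the symbol is missing (for example, a stripped binary), the lookup returns None, which is the case where the .bss heuristic would take over.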
If there aren’t symbols, instead we use a heuristic
(current_thread_address_location_search_bss) where we search through the .bss section of our
binary’s memory for something that plausibly looks like the address of the current thread. This
assumes that the address we want is in the .bss section somewhere. How this works:
- Find the address of the .bss section and read it from memory.
- Cast the .bss section to an array of usize (so an array of addresses).
- Iterate through that array and for every address run the is_maybe_thread function on that address. is_maybe_thread is a Ruby-version-specific function (we compile a different version of this function for every Ruby version). We'll explain this later.
- Return an address if is_maybe_thread returns true for any of them. Otherwise abort.
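
The steps above can be sketched like this. It's a simplified illustration, not the real current_thread_address_location_search_bss: the search_bss name, its signature, and the closure standing in for is_maybe_thread are assumptions. The real check needs to read the target process's memory, which this sketch skips.

```rust
use std::mem::size_of;

/// Scan a copy of the `.bss` section one pointer-sized word at a time.
/// `bss` holds the section's bytes as read from the target process,
/// `bss_start` is the address the section is loaded at in that process,
/// and `is_maybe_thread` stands in for the version-specific check.
/// Returns the address (in the target) of the first word that passes.
fn search_bss<F>(bss: &[u8], bss_start: usize, is_maybe_thread: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    bss.chunks_exact(size_of::<usize>())
        .enumerate()
        .find_map(|(i, chunk)| {
            // Reinterpret the bytes as a native-endian usize (a candidate address).
            let mut buf = [0u8; size_of::<usize>()];
            buf.copy_from_slice(chunk);
            let candidate = usize::from_ne_bytes(buf);
            if is_maybe_thread(candidate) {
                Some(bss_start + i * size_of::<usize>())
            } else {
                None
            }
        })
}

fn main() {
    // Fake .bss contents: three words, the middle one "looks like" a thread.
    let words: [usize; 3] = [0, 0xdead_beef, 42];
    let bss: Vec<u8> = words.iter().flat_map(|w| w.to_ne_bytes()).collect();
    let found = search_bss(&bss, 0x1000, |candidate| candidate == 0xdead_beef);
    println!("{:x?}", found);
}
```

Returning Option leaves the "otherwise abort" decision to the caller, which keeps the scan itself easy to test.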
Step 3: Get the right stack_trace function. We compile 30+ different functions to get
stack traces (we'll explain this later). The code to decide which function to use is basically a huge
match statement (see supported_ruby_versions.rs), depending on the Ruby version.
pub fn get(v: &str) -> Result<RubyVersion> {
    match v {
        ...
        "3.3.0" => Ok(RubyVersion {
            semver_version: Version::new(3, 3, 0),
            get_execution_context_fn: super::ruby_version::ruby_3_3_0::get_execution_context,
            get_stack_trace_fn: super::ruby_version::ruby_3_3_0::get_stack_trace,
            is_maybe_thread_fn: super::ruby_version::ruby_3_3_0::is_maybe_thread,
        }),
        ...
    }
}

Step 4: Return the RubySpy struct.
Now we're done! We return our RubySpy struct.
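
To tie the steps together, here's a toy version of the end result: a struct holding a PID, a function pointer chosen by Ruby version, and the current thread address. Everything in it is a stand-in. The function signature, the two fake per-version functions, and passing the version and address in as parameters (instead of detecting them as in steps 1 and 2) are assumptions made to keep the sketch self-contained.

```rust
/// Illustrative signature only: (pid, current thread address) -> frames.
type GetStackTraceFn = fn(u32, usize) -> Vec<String>;

// Stand-ins for the per-version functions that rbspy generates.
fn get_stack_trace_2_4(_pid: u32, _addr: usize) -> Vec<String> {
    vec!["frame read with 2.4 layouts".to_string()]
}

fn get_stack_trace_3_3(_pid: u32, _addr: usize) -> Vec<String> {
    vec!["frame read with 3.3 layouts".to_string()]
}

/// Step 3 in miniature: map a version string to the matching function.
fn select_stack_trace_fn(version: &str) -> Result<GetStackTraceFn, String> {
    let f: GetStackTraceFn = match version {
        "2.4.0" => get_stack_trace_2_4,
        "3.3.0" => get_stack_trace_3_3,
        other => return Err(format!("unsupported Ruby version: {}", other)),
    };
    Ok(f)
}

struct RubySpy {
    pid: u32,
    current_thread_addr: usize,
    get_stack_trace_fn: GetStackTraceFn,
}

impl RubySpy {
    /// Hypothetical constructor: version and address are passed in here,
    /// whereas the real code discovers them from the process.
    fn new(pid: u32, version: &str, current_thread_addr: usize) -> Result<Self, String> {
        Ok(RubySpy {
            pid,
            current_thread_addr,
            get_stack_trace_fn: select_stack_trace_fn(version)?,
        })
    }

    fn get_stack_trace(&self) -> Vec<String> {
        (self.get_stack_trace_fn)(self.pid, self.current_thread_addr)
    }
}

fn main() {
    let spy = RubySpy::new(1234, "3.3.0", 0x7f00_0000).expect("supported version");
    println!("{:?}", spy.get_stack_trace());
}
```

Storing a plain function pointer means the version decision is made once at initialization, and every later call goes straight to the right code.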
Once we've initialized, all that remains is calling the get_stack_trace function. How does that function
work?
Like we said before -- we compile a different version of the code to get stack traces for every Ruby version. This is because every Ruby version has slightly different struct layouts.
The Ruby structs are defined in a ruby-bindings crate. All the code in that crate is autogenerated
by bindgen in xtask/src/bindgen.rs.
These functions are defined through a bunch of macros (4 different macros, for different ranges of
Ruby versions) which implement get_stack_trace for every Ruby version. Each one uses the struct
definitions for the right Ruby version.
There's a lot of code in ruby_version.rs but this is the core of how it works. First, it defines a
$ruby_version module and inside that module uses bindings::$ruby_version which includes all the
required struct definitions for that Ruby version.
Then it includes more macros which together make up the body of that module. This is because
some functions are the same across all Ruby versions (like get_cfps) and some are different
(like get_stack_frame which changes frequently because the way Ruby organizes that code changes a
lot).
macro_rules! ruby_version_v_2_0_to_2_2(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use bindings::$ruby_version::*;
            ...
            get_stack_trace!(rb_thread_struct);
            get_execution_context_from_thread!(rb_thread_struct);
            rstring_as_array_1_9_1!();
            get_ruby_string_1_9_1!();
            get_cfps!();
            get_pos!(rb_iseq_struct);
            get_lineno_2_0_0!();
            get_stack_frame_2_0_0!();
            stack_field_1_9_0!();
            get_thread_id_1_9_0!();
            get_cfunc_name_unsupported!();
        }
    )
);

Several of rbspy's core functions, such as interpreting Ruby strings and identifying C functions, were ported directly from gdb scripts in the official Ruby repository or other community repositories.
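
To make the macro-per-version pattern from ruby_version.rs concrete, here's a tiny self-contained imitation of it. None of these names come from rbspy: the Frame struct, the two fake bindings modules, and the macros are invented for illustration. The point is only to show how one shared helper macro and one version-specific macro can be combined into a module per Ruby version.

```rust
// Hypothetical per-version "bindings": same struct name, different layouts.
mod bindings {
    pub mod ruby_2_0_0 {
        pub struct Frame { pub lineno: u32 }
    }
    pub mod ruby_3_3_0 {
        pub struct Frame { pub pad: u64, pub lineno: u32 }
    }
}

// Shared across versions: only relies on a field called `lineno`.
macro_rules! get_lineno {
    () => {
        pub fn get_lineno(frame: &Frame) -> u32 {
            frame.lineno
        }
    };
}

// Version-specific pieces: each one matches a different struct layout.
macro_rules! describe_frame_v2 {
    () => {
        pub fn describe_frame(frame: &Frame) -> String {
            format!("line {}", get_lineno(frame))
        }
    };
}

macro_rules! describe_frame_v3 {
    () => {
        pub fn describe_frame(frame: &Frame) -> String {
            format!("line {} (pad {})", get_lineno(frame), frame.pad)
        }
    };
}

// One macro per range of versions; each builds a module from the pieces.
macro_rules! ruby_version_v2 {
    ($ruby_version:ident) => {
        pub mod $ruby_version {
            use super::bindings::$ruby_version::*;
            get_lineno!();
            describe_frame_v2!();
        }
    };
}

macro_rules! ruby_version_v3 {
    ($ruby_version:ident) => {
        pub mod $ruby_version {
            use super::bindings::$ruby_version::*;
            get_lineno!();
            describe_frame_v3!();
        }
    };
}

ruby_version_v2!(ruby_2_0_0);
ruby_version_v3!(ruby_3_3_0);

fn main() {
    let old = bindings::ruby_2_0_0::Frame { lineno: 10 };
    let new = bindings::ruby_3_3_0::Frame { pad: 0, lineno: 20 };
    println!("{}", ruby_2_0_0::describe_frame(&old));
    println!("{}", ruby_3_3_0::describe_frame(&new));
}
```

Because get_lineno is expanded inside each module, the same source text is compiled once per version against that version's Frame, which is the reason for using macros rather than ordinary functions here.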