libssh2 issues with Ruby profilers -- would a workaround PR be acceptable? #959

Open
@ivoanjo

Hello there :)

I work on Datadog's Ruby profiler, and I arrived here while investigating a customer issue: they were using the rugged gem, which breaks in combination with our profiler because of a libssh2 bug.

The TL;DR is that libssh2 doesn't yet correctly handle the interruption of system calls caused by profilers that use unix signals, and thus having such a profiler running breaks network calls using rugged.
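To make "correctly handle the interruption of system calls" concrete, here is a minimal sketch of the usual pattern: when a blocking call such as `poll()` fails with `EINTR` because a signal arrived, retry it with whatever time is left instead of reporting an error. This is not libssh2's code; `wait_readable` and `now_ms` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Monotonic clock in milliseconds, used to track the overall deadline. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: wait until poll() reports an event on fd.
 * Returns 1 if poll() reported an event, 0 on timeout, and -1 on a
 * real error. An EINTR caused by a signal (for example SIGPROF from a
 * profiler) is not treated as an error: the wait is resumed with the
 * remaining time.
 */
static int wait_readable(int fd, int timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    for (;;) {
        long long remaining = deadline - now_ms();
        if (remaining < 0)
            remaining = 0;

        int rc = poll(&pfd, 1, (int)remaining);
        if (rc > 0)
            return 1;   /* an event was reported on fd */
        if (rc == 0)
            return 0;   /* timed out */
        if (errno != EINTR)
            return -1;  /* genuine failure */
        /* Interrupted by a signal: loop and retry with the time left. */
    }
}
```

Tracking a deadline rather than restarting with the full timeout matters here: a profiler may deliver signals every few milliseconds, so a naive restart might never let the timeout expire.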

This issue is not specific to the Datadog Ruby profiler 😭; the equally great stackprof gem is unfortunately affected too:

require 'rugged'
require 'stackprof'

puts "Cloning..."
creds = Rugged::Credentials::SshKey.new(username: 'git', publickey: '...', privatekey: '...')

StackProf.run(mode: :wall, out: 'tmp/stackprof-cpu-myapp.dump') do
  Rugged::Repository.clone_at('ssh://[email protected]/libgit2/rugged.git', '/tmp/some-directory', credentials: creds)
end

puts "Cloned!"

This produces the following output:

Cloning...
Traceback (most recent call last):
	3: from repro-rugged.rb:8:in `<main>'
	2: from repro-rugged.rb:8:in `run'
	1: from repro-rugged.rb:9:in `block in <main>'
repro-rugged.rb:9:in `clone_at': remote rejected authentication: Error waiting on socket (Rugged::SshError)

...but removing StackProf makes the clone work fine -- it's not actually an authentication error, it's the system call interruption at work.

OK, so why am I reporting this here when I've already reported it to the libssh2 developers as well?

Since it's common to use a system libssh2 with rugged, even if a fix to libssh2 were released today, it would take months or years to reach Linux distros.

Furthermore, it's hard to detect from Ruby code if rugged is linked with a broken libssh2, because while rugged provides a Rugged.libgit2_version, there's no corresponding API to probe the libssh2 version (at least I didn't find one in libgit2 directly or rugged).

Currently, when rugged is detected, the Datadog profiler needs to fall back to an alternative code path that yields lower-quality data.
I would love for this to not be the case!

Thus my question is: Would you consider accepting a pull request that wraps the rugged calls that can trigger network operations (I think it's only clone/fetch/pull/push/submodule stuff?) with a helper that temporarily blocks signal delivery (for SIGPROF and SIGALRM) during that call?

Something similar to:

 static VALUE rb_git_repo_clone_at(int argc, VALUE *argv, VALUE klass)
 {
        VALUE url, local_path, rb_options_hash;
        git_clone_options options = GIT_CLONE_OPTIONS_INIT;
        struct rugged_remote_cb_payload remote_payload = { Qnil, Qnil, Qnil, Qnil, Qnil, Qnil, Qnil, 0 };
        git_repository *repo;
        int error;
 
        rb_scan_args(argc, argv, "21", &url, &local_path, &rb_options_hash);
        Check_Type(url, T_STRING);
        FilePathValue(local_path);
 
        parse_clone_options(&options, rb_options_hash, &remote_payload);
 
+        block_profiling_signals();
        error = git_clone(&repo, StringValueCStr(url), StringValueCStr(local_path), &options);
+        unblock_profiling_signals();
 
        if (RTEST(remote_payload.exception))
                rb_jump_tag(remote_payload.exception);
        rugged_exception_check(error);
 
        return rugged_repo_new(klass, repo);
 }

This would make rugged work great under stackprof and the Datadog profiler. As a bonus, we could detect a rugged version that includes the workaround and stop falling back to the alternative code path that yields lower-quality data.
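For reference, a minimal sketch of what the two helpers from the diff might look like, assuming `pthread_sigmask` is used to block SIGPROF and SIGALRM on the calling thread only. The function names come from the diff above; the thread-local `saved_mask` variable and the `int` return values are assumptions of this sketch, not existing rugged code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Signal mask that was in effect on this thread before blocking
 * (assumption of this sketch: block/unblock calls are not nested). */
static _Thread_local sigset_t saved_mask;

/* Block the signals commonly used by sampling profilers on the calling
 * thread, remembering the previous mask. Returns 0 on success or an
 * error number, as pthread_sigmask does. */
static int block_profiling_signals(void)
{
    sigset_t profiling;

    sigemptyset(&profiling);
    sigaddset(&profiling, SIGPROF);
    sigaddset(&profiling, SIGALRM);

    return pthread_sigmask(SIG_BLOCK, &profiling, &saved_mask);
}

/* Restore the mask saved by block_profiling_signals(). */
static int unblock_profiling_signals(void)
{
    return pthread_sigmask(SIG_SETMASK, &saved_mask, NULL);
}
```

Restoring the saved mask, rather than unconditionally unblocking both signals, keeps any mask the application had already set up intact. Because the mask applies per thread, process-directed signals that arrive during the call can still be delivered to other threads that have not blocked them.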

Thoughts? I'm shamelessly tagging @tenderlove here since you're also a maintainer of stackprof ;)
