libssh2 issues with Ruby profilers -- would a workaround PR be acceptable?

Hello there :)

I work on [Datadog's Ruby profiler](https://github.com/datadog/dd-trace-rb/) and I arrived here while investigating a customer issue: they were using the rugged gem, which breaks [in combination with our profiler](https://github.com/DataDog/dd-trace-rb/issues/2721) due to a [libssh2 bug](https://github.com/libssh2/libssh2/issues/955).

The TL;DR is that libssh2 doesn't yet correctly handle system calls being interrupted by the Unix signals that some profilers use, so having such a profiler running breaks network calls made through rugged.

This issue is not specific to the Datadog Ruby profiler :sob:; the excellent [stackprof](https://github.com/tmm1/stackprof) profiler gem is unfortunately affected too. For example, this script:

```ruby
require 'rugged'
require 'stackprof'

puts "Cloning..."
creds = Rugged::Credentials::SshKey.new(username: 'git', publickey: '...', privatekey: '...')

StackProf.run(mode: :wall, out: 'tmp/stackprof-cpu-myapp.dump') do
  Rugged::Repository.clone_at('ssh://git@github.com/libgit2/rugged.git', '/tmp/some-directory', credentials: creds)
end

puts "Cloned!"
```

produces this output:

```
Cloning...
Traceback (most recent call last):
	3: from repro-rugged.rb:8:in `<main>'
	2: from repro-rugged.rb:8:in `run'
	1: from repro-rugged.rb:9:in `block in <main>'
repro-rugged.rb:9:in `clone_at': remote rejected authentication: Error waiting on socket (Rugged::SshError)
```

...but removing `StackProf` makes the clone work fine -- it's not actually an authentication error, it's the system call interruption at work.

**Ok so why am I double-reporting this issue when I've already reported it to the libssh2 developers as well?**

Since it's common to use a system libssh2 with rugged, even if a fix to libssh2 were released today, it would take months or years to reach Linux distros.

Furthermore, it's hard to detect from Ruby code if rugged is linked with a broken libssh2, because while rugged provides a `Rugged.libgit2_version`, there's no corresponding API to probe the libssh2 version (at least I didn't find one in libgit2 directly or rugged).

Currently, when rugged is detected, the Datadog profiler needs to [fall back to an alternative code path that yields lower-quality data](https://github.com/DataDog/dd-trace-rb/pull/2741).
I would love for this to not be the case!

**Thus my question is**: Would you consider accepting a pull request that wraps the rugged calls that can trigger network operations (I think it's only clone/fetch/pull/push/submodule stuff?) with a helper that temporarily blocks signal delivery (for `SIGPROF` and `SIGALRM`) during that call?

Something similar to:

```diff
 static VALUE rb_git_repo_clone_at(int argc, VALUE *argv, VALUE klass)
 {
        VALUE url, local_path, rb_options_hash;
        git_clone_options options = GIT_CLONE_OPTIONS_INIT;
        struct rugged_remote_cb_payload remote_payload = { Qnil, Qnil, Qnil, Qnil, Qnil, Qnil, Qnil, 0 };
        git_repository *repo;
        int error;
 
        rb_scan_args(argc, argv, "21", &url, &local_path, &rb_options_hash);
        Check_Type(url, T_STRING);
        FilePathValue(local_path);
 
        parse_clone_options(&options, rb_options_hash, &remote_payload);
 
+        block_profiling_signals();
        error = git_clone(&repo, StringValueCStr(url), StringValueCStr(local_path), &options);
+        unblock_profiling_signals();
 
        if (RTEST(remote_payload.exception))
                rb_jump_tag(remote_payload.exception);
        rugged_exception_check(error);
 
        return rugged_repo_new(klass, repo);
 }
```

This would make rugged work well under both stackprof and the Datadog profiler. As a bonus, we could detect the fixed rugged version and avoid the [fallback to an alternative code path that yields lower-quality data](https://github.com/DataDog/dd-trace-rb/pull/2741) that we currently have.

Thoughts? I'm shamelessly tagging @tenderlove here since you're also a maintainer of stackprof ;)

