Stop generating more methods than necessary #293

Open
wants to merge 12 commits into
base: main

Conversation

paracycle
Member

@paracycle paracycle commented Apr 23, 2021

Motivation

We used to treat methods that didn't have a source location the same as methods that explicitly were not defined in the current gem. That resulted in Tapioca creating more method definitions than necessary.

We would only skip method generation for a method if the constant it was on was an ignored type (i.e. a built-in type), so that we wouldn't keep redefining methods for built-in types. However, for all other types, especially types that come from other gems, we would just keep on generating all the methods regardless of whether they were defined by this gem or not.

Moreover, the source location check was happening at the wrong location, before unwrapping the method signature. Thus, many methods with signatures would not be generated when the previous problem was fixed, since our code would see them as being defined in Sorbet runtime.

Implementation

The fix is to return a more fine-grained result from method_in_gem? which signals yes/no/don't-have-source-location. Based on that we can skip generating don't-have-source-location cases if they are for built-in types and totally ignore the methods that have a source location and are definitely not defined in the current gem.
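A minimal sketch of that three-way result, under stated assumptions: the helper names `method_in_gem` and `generate_method?`, the `gem_root` prefix check, and the `true`/`false`/`nil` encoding are hypothetical illustrations and not Tapioca's actual code.

```ruby
# Hypothetical three-way check: true (in this gem), false (elsewhere),
# nil (no source location, e.g. a C-defined method).
def method_in_gem(source_location, gem_root)
  return nil if source_location.nil?

  file, _line = source_location
  file.start_with?(gem_root)
end

# Decide whether to emit a method definition into this gem's RBI.
def generate_method?(source_location, gem_root, builtin_constant:)
  case method_in_gem(source_location, gem_root)
  when true  then true               # defined by this gem: always generate
  when false then false              # defined elsewhere: never generate
  when nil   then !builtin_constant  # unknown: skip only for built-in types
  end
end
```

The point of the third state is that "no source location" no longer collapses into "not in this gem", so the two cases can be handled differently.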

Additionally, if we try to unwrap the method signature and we get an exception, that means the signature block raised an error. If we continue with the method as is, the source location checks would think the method definition does not belong in the gem (since the method is still wrapped), and we would thus skip the method generation. To avoid that, the signature_for method is now raising a custom exception to signal that exceptional case, so that we can at least continue generating "a" method definition.
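The fallback described above could look roughly like the following. This is a sketch only: the `SignatureBlockError` class, the callable `sig_block` stand-in for signature unwrapping, and the `compile_method` helper are assumptions made for illustration and do not reproduce the real `signature_for`.

```ruby
# Hypothetical error type signalling that a signature block raised.
class SignatureBlockError < StandardError; end

# Stand-in for signature unwrapping: `sig_block` is any callable that
# builds the signature and may raise while doing so.
def signature_for(sig_block)
  sig_block.call
rescue StandardError => e
  raise SignatureBlockError, "signature block raised: #{e.message}"
end

# Emit a typed definition when possible, and fall back to a bare one
# instead of dropping the method when the signature block fails.
def compile_method(name, sig_block)
  signature = signature_for(sig_block)
  "sig { #{signature} }\ndef #{name}; end"
rescue SignatureBlockError
  "def #{name}; end"
end
```

Using a dedicated exception class lets the caller tell "the signature block is broken" apart from any other failure, which is what allows generation to continue with an untyped definition.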

Tests

Updated existing tests.

@paracycle paracycle requested a review from a team April 23, 2021 19:26
@paracycle paracycle marked this pull request as draft May 12, 2021 18:58
@Morriar Morriar self-assigned this Jan 25, 2022
@Morriar Morriar added the enhancement New feature or request label Jan 25, 2022
@Morriar Morriar added this to the Tapioca 1.0 milestone Feb 10, 2022
@Morriar Morriar removed this from the Tapioca 1.0 milestone Jan 18, 2023
@paracycle paracycle force-pushed the uk-method-generation-improvements branch from 593bfa7 to da53327 Compare August 9, 2024 16:47
@paracycle paracycle marked this pull request as ready for review August 9, 2024 16:51
@mutecipher
Contributor

giphy

@paracycle paracycle force-pushed the uk-method-generation-improvements branch from da53327 to 0d40ba9 Compare August 9, 2024 17:07
@paracycle
Member Author

paracycle commented Aug 9, 2024

Unfortunately, while the CI is green, this work is still not complete, since the implementation ends up removing too many methods from generated RBI files.

In general there are 2 ways in which "gem B" can add methods to a constant Foo from "gem A":

  1. "gem B" reopens Foo and adds a method to it:
    # gem A
    class Foo
      def foo
      end
    end
    
    # gem B
    class Foo
      def bar
      end
    end
    This PR correctly attributes Foo#foo to "gem A" and Foo#bar to "gem B", and they are generated in the respective gem's RBI file. So, this use-case works properly today against this PR.
  2. "gem B" does not reopen the Foo constant from "gem A" but defines methods on Foo in response to some mechanism triggered by the loading of constant Foo. For example:
    # gem A
    class Foo
      extend ModuleFromB
    
      def foo
      end
    
      add_methods_to_me :bar
    end
    
    # gem B
    module ModuleFromB
      def add_methods_to_me(*names)
        names.each do |name|
          define_method(name) { 42 }
        end
      end
    end
    In this case, the method Foo#bar will exist on Foo at runtime, but its source location will point to the file in "gem B". Thus, this PR will filter that method from the RBI file of "gem A". But, the method will not be generated in "gem B"'s RBI file either, since the Foo constant is never visited in "gem B". Thus, Tapioca ends up missing the definition of Foo#bar altogether.
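The misattribution in the second case can be reproduced in a single script. The sketch below simulates the two gems by evaluating each snippet with an explicit file name; the paths `gem_a/lib/foo.rb` and `gem_b/lib/module_from_b.rb` are made up for the demonstration.

```ruby
gem_b_source = <<~RUBY
  module ModuleFromB
    def add_methods_to_me(*names)
      names.each do |name|
        define_method(name) { 42 }
      end
    end
  end
RUBY

gem_a_source = <<~RUBY
  class Foo
    extend ModuleFromB

    def foo
    end

    add_methods_to_me :bar
  end
RUBY

# Evaluate each snippet as if it lived in its own gem file.
eval(gem_b_source, TOPLEVEL_BINDING, "gem_b/lib/module_from_b.rb", 1)
eval(gem_a_source, TOPLEVEL_BINDING, "gem_a/lib/foo.rb", 1)

foo_file, _foo_line = Foo.instance_method(:foo).source_location
bar_file, _bar_line = Foo.instance_method(:bar).source_location

puts "Foo#foo is located in #{foo_file}"
puts "Foo#bar is located in #{bar_file}"
```

Both methods live on `Foo`, yet `Foo#bar` reports the "gem B" file because the block passed to `define_method` was written there.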

To do this properly, we need to attribute such dynamic method definitions to the file require that triggered them, just like how we did with mixin tracking.

Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.
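To illustrate why intercepting `define_method` alone falls short, here is a small sketch. The `DefineMethodSpy` module and the `Target` class are hypothetical names used only for this demonstration.

```ruby
# Hypothetical spy that records every define_method call it sees.
module DefineMethodSpy
  CALLS = []

  def define_method(name, *args, &block)
    CALLS << name
    super
  end
end

# Prepending onto Module places the spy in front of Module#define_method.
Module.prepend(DefineMethodSpy)

class Target; end

# Goes through define_method, so the spy sees it.
Target.define_method(:via_define_method) { 1 }

# Uses `def` inside an evaluated string, so the spy never sees it.
Target.class_eval("def via_class_eval; 2; end")

puts DefineMethodSpy::CALLS.include?(:via_define_method)
puts DefineMethodSpy::CALLS.include?(:via_class_eval)
```

Both methods end up on `Target`, but only the first one passes through the intercepted call.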

@egiurleo
Contributor

egiurleo commented Aug 9, 2024

I've taken a stab at fixing the problem above. I have a failing test demonstrating Ufuk's use case and some half-working code. Here's my WIP PR, I'll return to it next week.

@amomchilov
Contributor

> Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.

Does hooking method_added solve that? @paracycle

@paracycle
Member Author

> > Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.
>
> Does hooking method_added solve that? @paracycle

That is exactly what @egiurleo's PR in the comment above is doing, but in general I am afraid it will be brittle: it only takes one constant forgetting to call super from its own method_added implementation, and yours never gets called.
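The brittleness of a `method_added` hook can be shown with a short sketch. `DefinitionTracker`, `Polite`, and `Swallower` are hypothetical names; this is not the tracker from the linked PR.

```ruby
# Hypothetical tracker that records every method_added notification.
module DefinitionTracker
  SEEN = []

  def method_added(name)
    SEEN << [self, name]
    super
  end
end

# Install the hook in front of Module#method_added for all classes.
Module.prepend(DefinitionTracker)

class Polite
  def visible; end
end

class Swallower
  # Overrides the hook without calling super, so the tracker is bypassed.
  def self.method_added(name); end

  def hidden; end
end

puts DefinitionTracker::SEEN.include?([Polite, :visible])
puts DefinitionTracker::SEEN.include?([Swallower, :hidden])
```

`Polite#visible` is recorded, while `Swallower#hidden` is silently missed because the class-level override sits earlier in the lookup chain and never delegates.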

@paracycle paracycle force-pushed the uk-method-generation-improvements branch from 330b100 to 862f422 Compare August 16, 2024 23:03
Contributor

@amomchilov amomchilov left a comment


Fantastic work!

@paracycle paracycle force-pushed the uk-method-generation-improvements branch from d2329a9 to 7b05810 Compare January 16, 2025 20:40
@paracycle paracycle force-pushed the uk-method-generation-improvements branch 2 times, most recently from ab997ca to bdb8849 Compare January 28, 2025 22:20
Member

@vinistock vinistock left a comment


This is great. Two questions:

  1. Did we test this in Core? Does it pass type checking afterwards?
  2. Do you know if there is a performance hit or memory increase for tracking the methods?

@paracycle
Member Author

> This is great. Two questions:

Thanks for the review.

> 1. Did we test this in Core? Does it pass type checking afterwards?

I triggered the build pipeline for testing against Core now.

> 2. Do you know if there is a performance hit or memory increase for tracking the methods?

There will definitely be a performance and memory hit, but I have no idea how much. We do extra operations for every method definition and we store their locations, etc, so this is certainly not free.

@paracycle
Member Author

Ok, tested this against Core. The build pipeline was giving a lot of errors. I debugged against Core and found the problem and pushed a fix on this branch.

I also timed bin/tapioca gem --all for both the version of Tapioca on Core and this branch:

The current Tapioca version on Core:

bin/tapioca gem --all  2803.99s user 89.11s system 380% cpu 12:40.88 total

This branch:

bin/tapioca gem --all  2826.87s user 93.45s system 392% cpu 12:23.26 total

So, it seems like we use about the same amount of time to generate all gem RBIs on Core.

I am not sure if the timing diff is real, but if this branch is indeed faster, then I assume that would be because we collect all method definitions once when loading all the gems, and then we always check against that lookup table for any method. That somehow might be faster than looking up the source location for each method as we come across it.

paracycle and others added 10 commits January 30, 2025 22:36
We used to treat methods that didn't have a source location the same as
methods that explicitly were not defined in the current gem. That
resulted in Tapioca creating more method definitions than necessary.

We would only skip method generation for a method if the constant it was
on was an ignored type (i.e. a built-in type), so that we wouldn't keep
redefining methods for built-in types. However, for all other types,
especially types that come from other gems, we would just keep on
generating all the methods regardless of whether they were defined by
this gem or not.

Moreover, the source location check was happening at the wrong location,
before unwrapping the method signature. Thus, many methods with
signatures would not be generated when the previous problem was fixed,
since our code would see them as being defined in Sorbet runtime.

The fix is to return a more fine-grained result from `method_in_gem?`
which signals yes/no/don't-have-source-location. Based on that we can
skip generating don't-have-source-location cases if they are for
built-in types and totally ignore the methods that have a source
location and are definitely not defined in the current gem.

Additionally, if we try to unwrap the method signature and we get an
exception, that means the signature block raised an error. If we
continue with the method as is, the source location checks would think
the method definition does not belong in the gem (since the method is
still wrapped), and we would thus skip the method generation. To avoid
that, the `signature_for` method is now raising a custom exception to
signal that exceptional case, so that we can at least continue
generating "a" method definition.
This commit adds a `MethodDefinition` tracker which properly attributes method definitions to the gem that triggers them. This is useful for tracking which gem is responsible for a method definition, and is used in the `Tapioca::Gem::Pipeline` to check if a method should be generated for the given gem or not.
By tracking line numbers for method definitions, we can generate more accurate source locations for methods in RBI files. This will help with tools that rely on source locations, such as Ruby LSP.
When we are dealing with C-extension/native methods, we have been relying on the `source_location` method returning `nil` for those methods. When we switched to using the method definition tracker, we ended up breaking this and started generating method definitions for all the stdlib gem constant methods.

This commit tries to catch the same case by looking at the previous frame and seeing if it is a `require`, which means that something was required but there are no more Ruby frames, which implies that the attribution should be to a C-extension/native method.
This commit does a few things:

1. It adds the ability to track method definitions for singleton methods.
2. Fixes a problem with aliased methods. Aliased methods point to the same `Method` object, so we can't use the method object as a key in a hash. But even indexing method definitions keyed by the owner class (the `self` at the time of `method_added`) doesn't work; it fails in an interesting way. In that case, the method object would be the same, but the method definition would be different. For example, `alias_method :foo, :bar` would cause the lookup of the `foo` method to return the definition of the `bar` method. That would cause us to fail to generate the `foo` method for a gem if `bar` was defined in another gem.
3. We were storing a single definition for a method, but that proved insufficient. For example, if a method was defined in a gem and then redefined in application code, we would only store the definition from the application location, and never be able to attribute the method to the gem it was originally defined in. That led us to not generate that method for the gem. Instead, we now store all definitions of a method in the `MethodDefinition` tracker. When looking up a method definition, we now look up all definitions. If there are no definitions, that means the method is probably a C-method, so we return `nil`. If there are definitions, we see if we can find one that matches the gem location. If we are able to find a definition that matches the gem location, we return that definition. If not, we check for `(eval)` locations, in which case we include the method but can't return a source location. If all of that fails, we return `false` to signal that we couldn't find a definition.
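The alias problem from item 2 can be observed directly with plain reflection. In this sketch the `Widget` class and the two file paths are invented, and the two gems are simulated by evaluating each snippet with an explicit file name.

```ruby
# "gem A" defines the original method.
eval(<<~RUBY, TOPLEVEL_BINDING, "gem_a/lib/widget.rb", 1)
  class Widget
    def bar
      :bar
    end
  end
RUBY

# "gem B" reopens the class and only adds an alias.
eval(<<~RUBY, TOPLEVEL_BINDING, "gem_b/lib/widget_ext.rb", 1)
  class Widget
    alias_method :foo, :bar
  end
RUBY

foo = Widget.instance_method(:foo)
alias_file, _alias_line = foo.source_location

puts "name:          #{foo.name}"
puts "original_name: #{foo.original_name}"
puts "location file: #{alias_file}"
```

Although the alias was created in the "gem B" file, reflection on `foo` reports the original name `bar` and the "gem A" location, which is why a location-based lookup would not attribute `foo` to "gem B".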

This new logic is used both in `Tapioca::Gem::Listeners::Methods` and `Tapioca::Gem::Listeners::SourceLocation` listeners. The former uses it to check if the method should be included in the gem RBI, and the latter uses it to add more correct source location information in the comments.
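The lookup order described in item 3 might be sketched as follows. The `Definition` struct, the `lookup_definition` helper, the `gem_root` prefix match, and the `:no_location` marker for the eval case are all assumptions for illustration; the commit message does not say what the real code returns in that case.

```ruby
# Hypothetical record of one observed definition of a method.
Definition = Struct.new(:file, :line)

# Returns:
#   nil          - no definitions recorded (probably a C-defined method)
#   Definition   - a definition located inside the gem
#   :no_location - defined via eval; include the method without a location
#   false        - defined, but only outside the gem
def lookup_definition(definitions, gem_root)
  return nil if definitions.empty?

  in_gem = definitions.find { |d| d.file.start_with?(gem_root) }
  return in_gem if in_gem

  # Eval'd code reports a pseudo-path such as "(eval)" or, on newer
  # Rubies, "(eval at file:line)".
  return :no_location if definitions.any? { |d| d.file.start_with?("(eval") }

  false
end
```

Keeping every recorded definition, rather than only the latest one, is what lets a method that was later redefined in application code still be attributed to the gem that first defined it.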
By checking the same flag that Sorbet runtime uses to skip method_added handling, we can avoid processing method redefinitions that are triggered by Sorbet runtime.
@paracycle paracycle force-pushed the uk-method-generation-improvements branch from a6c6d35 to d7f0155 Compare January 30, 2025 20:47
Member

@vinistock vinistock left a comment


That's good news. It might use more memory, but with the duration being the same, we should be good.

@paracycle paracycle force-pushed the uk-method-generation-improvements branch from d7f0155 to d745655 Compare January 30, 2025 21:12
8 participants