Skip to content

Conversation

@liam923
Copy link
Contributor

@liam923 liam923 commented Sep 4, 2025

This PR fixes a bug reported by a few users. In the Jane Street internal codebase, there's a directory in which short-paths was frequently taking extremely long amounts of time (ex: 40 mins), making Merlin unusable. It turns out this was due to an exponential algorithm in short-paths. There is a comment in the PR diff that explains the issue and the fix.

This is a somewhat low-information PR, in that I have a very poor understanding of Merlin's short-paths algorithm. As a result, I am not confident that this fix is a) the best way of achieving this bugfix and b) sufficient, in that there isn't a similar issue in another related part of the algorithm. But I am confident that this PR fixes the issue at hand without breaking any existing behavior, and I don't want to sink too much time into this issue, so I don't want to let perfect be the enemy of good.

@lpw25
Copy link
Contributor

lpw25 commented Sep 5, 2025

I'm not particularly keen on this solution as it is very ad-hoc. I'd rather try to avoid the call to Graph.find_module within Module.find_module when it isn't useful. I think that a simple way to do this is to add Module.is_defined and Graph.is_module_defined that look something like:

let is_defined t =
    match t with
    | Definition _ -> true
    | Declaration _ -> false

and

let is_module_defined t id =
  match Ident_map.find id t.modules with
  | exception Not_found -> false
  | md -> Module.is_defined md

then in Module.normalize you can change the declaration case to something like:

    | Definition { sort = Sort.Declared id } | Declaration { id; _ } ->
        if not (Graph.is_defined root id) then normalize_loop root t
        else begin
          match Graph.find_type root (raw_path t) with
          | exception Not_found -> normalize_loop root t
          | t -> normalize_loop root t
        end

along with similar changes for types, class types and module types.

Some context to help understand the fix: short-paths is able to deal with modules first being declared without any details and then later having an actual definition added. The normalize function is trying to detect the case where the Module.t value has come from a declaration that has since been defined and so should normalize to the new value instead. It was doing that by just re-looking up the path, which will find the definition if there is one. Instead it can directly check whether the id of the declaration has now been replaced by a proper definition.

@liam923
Copy link
Contributor Author

liam923 commented Sep 5, 2025

I'm still having difficulty coming up with a small repro of the original issue that can go in a regression test, but I've verified that these changes fix the original issue.

@liam923 liam923 merged commit c90da2e into main Sep 17, 2025
1 check passed
@liam923 liam923 deleted the exponential-short-paths branch September 17, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants