@suratkhan
Contributor

This PR solves #1559. It refactors the Revs implementation to improve performance by:

  • Pre-building an index of Rev structs from commit trees.
  • Eliminating repeated checkout_tree calls.
  • Using binary search for efficient find_version lookups.
  • Supporting iteration over a cached, sorted list of versions.
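The index-and-search idea in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Rev` here is a simplified stand-in for the real struct, versions are modeled as `(major, minor, patch)` tuples, the names `build_index`, `find_version`, and `sample_revs` are hypothetical, and the sample data is made up.

```rust
/// Simplified, hypothetical stand-in for the PR's `Rev` struct.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Rev {
    /// Package version as (major, minor, patch).
    version: (u64, u64, u64),
    /// Toolchain channel associated with that version.
    channel: String,
}

/// Build the index once: sort by version and drop duplicate versions so
/// that later lookups can use binary search.
fn build_index(mut revs: Vec<Rev>) -> Vec<Rev> {
    revs.sort_by_key(|rev| rev.version);
    revs.dedup_by_key(|rev| rev.version);
    revs
}

/// Look up an exact version in the sorted index in O(log n) comparisons.
fn find_version(index: &[Rev], target: (u64, u64, u64)) -> Option<&Rev> {
    index
        .binary_search_by_key(&target, |rev| rev.version)
        .ok()
        .map(|i| &index[i])
}

/// Made-up sample data, for illustration only.
fn sample_revs() -> Vec<Rev> {
    vec![
        Rev { version: (0, 1, 3), channel: "channel-c".to_owned() },
        Rev { version: (0, 1, 1), channel: "channel-a".to_owned() },
        Rev { version: (0, 1, 2), channel: "channel-b".to_owned() },
    ]
}

fn main() {
    let index = build_index(sample_revs());
    // Iterating over the cached index yields versions in ascending order.
    for rev in &index {
        println!("{:?} -> {}", rev.version, rev.channel);
    }
    let found = find_version(&index, (0, 1, 2)).map(|rev| rev.channel.as_str());
    println!("{found:?}");
}
```

The trade-off in this design is that the whole index must be built before the first lookup, which only pays off when many lookups follow.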

@suratkhan requested a review from smoelius as a code owner on April 7, 2025 07:46
@CLAassistant

CLAassistant commented Apr 7, 2025

CLA assistant check
All committers have signed the CLA.

@suratkhan
Contributor Author

@smoelius, here’s a summary of the changes I made in this PR:

  • Refactored Revs to pre-index all Rev structs at initialization, reducing repeated repository traversal.
  • Replaced checkout_tree logic with direct blob access and parsing for Cargo.toml and rust-toolchain.
  • Improved version lookup speed by introducing binary search in find_version.
  • Implemented a proper RevIter for forward iteration over all versions.
  • Maintained existing functionality, output format, and test coverage.

@smoelius
Collaborator

smoelius commented Apr 7, 2025

Thanks, @suratkhan! I'll do my best to review this sometime this week.

Collaborator

@smoelius left a comment

I left some comments below. I think they may require significant changes, so I'm going to pause here and let you reflect on them.

I really appreciate you working on this.

@suratkhan
Contributor Author

Hello @smoelius, I have applied the changes you requested in your comments. Here is a summary of what I did:

  1. Refactored revs.rs:
    • Replaced the full commit history scan and upfront version indexing with a "lazy-loading" approach. Commits are loaded in batches as needed.
    • Implemented binary search over commits (instead of versions) to efficiently find the revision corresponding to a specific --rust-version.
    • Added a dedicated find_latest_rev function that scans backwards from HEAD for the latest version (used when --rust-version is omitted).
    • Removed the old build_version_index logic and the RevIter struct.
  2. Updated mod.rs:
    • Modified upgrade_package to call the new revs.find_version() and revs.find_latest_rev() methods.

Collaborator

@smoelius left a comment

You've put a lot of work into this! Thank you for that.

I hope you can forgive me: I'm going to have to look at this a few times because of its complexity.

Thanks a lot for working on this!

  rust_toolchain_backup
      .disable()
-     .with_context(|| "Could not disable `Cargo.toml` backup")?;
+     .with_context(|| "Could not disable `rust-toolchain` backup")?;
Collaborator

🤦 Would you mind creating a separate PR with this fix?

}
} else {
None // Cargo.toml not found
};
Collaborator

Could you please rewrite this using if_chain?

Here is an example:

if_chain! {
    if let Some(prev_rev) = prev_rev;
    if prev_rev.version != curr_rev.version;
    then {
        self.curr_rev = Some(curr_rev);
        return Ok(Some(prev_rev));
    }
}

"nightly".to_string() // Blob not found, fallback
}
} else {
"nightly".to_string() // rust-toolchain not found, fallback
Collaborator

What is your reason for falling back to nightly? My inclination is to fail or return an error if a toolchain channel cannot be found.
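As a rough sketch of the alternative suggested here, a missing channel can be turned into an error instead of being replaced by a default. The helper name `require_channel` and its parameters are hypothetical, and plain `String` errors are used only to keep the example free of dependencies; the surrounding code appears to use `with_context`-style error handling instead.

```rust
/// Hypothetical helper: report a missing toolchain channel as an error
/// rather than silently substituting "nightly".
fn require_channel(channel: Option<String>, commit: &str) -> Result<String, String> {
    channel.ok_or_else(|| format!("could not find a toolchain channel at commit {commit}"))
}

fn main() {
    // The commit id below is a made-up placeholder.
    match require_channel(None, "abc123") {
        Ok(channel) => println!("channel: {channel}"),
        Err(error) => println!("error: {error}"),
    }
}
```

With this shape, the caller decides what to do about a commit without a channel, and a wrong toolchain is never selected by accident.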

};

// Only proceed if version was found
if let Some(version_str) = version {
Collaborator

It would probably be better to return early.
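One way to return early, sketched with `let ... else` (available since Rust 1.65). The function `summarize` and its return type are hypothetical and only mirror the shape of the fragment under review.

```rust
/// Hypothetical function shaped like the fragment under review: `version`
/// is the version string that was found, if any.
fn summarize(version: Option<&str>) -> Result<Option<String>, String> {
    // Return early when no version was found, so the main path below
    // does not need to be nested inside an `if let`.
    let Some(version_str) = version else {
        return Ok(None);
    };

    let trimmed = version_str.trim();
    if trimmed.is_empty() {
        return Err("version string is empty".to_owned());
    }
    Ok(Some(format!("version {trimmed}")))
}

fn main() {
    println!("{:?}", summarize(None));
    println!("{:?}", summarize(Some(" 0.1.2 ")));
}
```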

.context("Failed to create revision walker")?;
revwalk.push_head().context("Failed to push HEAD")?;
revwalk
.set_sorting(Sort::TIME)
Collaborator

Suggested change
- .set_sorting(Sort::TIME)
+ .set_sorting(Sort::TOPOLOGICAL)

}
})()
.transpose()
}
Collaborator

Rather than implement the binary search manually, would it be possible to use Vec::binary_search or one of its variants? https://doc.rust-lang.org/std/vec/struct.Vec.html#method.binary_search
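A minimal sketch of what that might look like using `partition_point`, one of the standard library's binary-search variants on slices. It assumes the commits are ordered oldest to newest and that versions never decrease along that order; the `Commit` struct, the function names, and the sample data are hypothetical.

```rust
/// Hypothetical, simplified view of a commit: an id and the package
/// version found at that commit.
#[derive(Debug, PartialEq, Eq)]
struct Commit {
    id: &'static str,
    version: (u64, u64, u64),
}

/// Assumes `commits` is ordered oldest to newest and that versions never
/// decrease along that order. Returns the first commit at which `target`
/// became the active version, or `None` if no commit has that version.
fn first_commit_with_version(
    commits: &[Commit],
    target: (u64, u64, u64),
) -> Option<&Commit> {
    // Index of the first commit whose version is not less than `target`.
    let i = commits.partition_point(|commit| commit.version < target);
    commits.get(i).filter(|commit| commit.version == target)
}

/// Made-up sample history, for illustration only.
fn sample_commits() -> Vec<Commit> {
    vec![
        Commit { id: "c0", version: (0, 1, 1) },
        Commit { id: "c1", version: (0, 1, 1) },
        Commit { id: "c2", version: (0, 1, 2) },
        Commit { id: "c3", version: (0, 1, 2) },
        Commit { id: "c4", version: (0, 1, 4) },
    ]
}

fn main() {
    let commits = sample_commits();
    let found = first_commit_with_version(&commits, (0, 1, 2));
    println!("{:?}", found.map(|commit| commit.id));
}
```

Because `partition_point` evaluates the predicate only O(log n) times, the predicate could in principle perform the per-commit version lookup lazily. A fallible lookup would need its error carried out of the closure separately, since the predicate must return a plain `bool`.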

channel: "nightly-2022-06-30".to_owned(),
oid: Oid::from_str("0cb0f7636851f9fcc57085cf80197a2ef6db098f").unwrap(),
},
// Add more examples if needed, ensure OIDs are correct
Collaborator

I am fine expanding the list of Revs used for testing, but could we keep the original ones?


// The found version might be *newer* than the example if the example OID
// is not the *exact* commit the version bumped. The search finds the
// commit where the target version *became active*.
Collaborator

Was there a bug in the existing EXAMPLES?

})
.unwrap()
.unwrap();
assert_eq!(rev, *example);
Collaborator

Rather than scrap this test, could it just be rewritten to use find_version?

let result = revs.find_version(ancient_version).unwrap();
// Depending on history, this might find the oldest known version or None if history starts
// later. If it finds *a* version, it should be the oldest one available.
if let Some(rev) = result {
Collaborator

I like the idea of having a test that finds the earliest commit find_version possibly can!

But this check is kind of weird.

From what I can tell, the earliest commit in the rust-clippy repository is this one: rust-lang/rust-clippy@3722627

This appears to be the earliest commit with a rust-toolchain file: rust-lang/rust-clippy@40151d9

And this appears to be the earliest commit with a rust-toolchain file with a modern format, e.g., a channel: rust-lang/rust-clippy@f03edfd

Would it be possible to have the test find one of those commits, or some commit close thereto, and handle it appropriately?

@suratkhan
Contributor Author

Hi @smoelius, good to see you again! I hope you're doing well.

Thanks for the detailed review! I worked on this about two weeks ago, so I'll need to take another look to refresh my memory on what I did and why. Hopefully, I remember everything. I'll try my best to address the feedback and get things done within this week.

In the meantime, here (#1606) is the PR that fixes the error messages.

@suratkhan force-pushed the binary-search-version-revs branch from 6ce58ad to 94d9cd7 on May 20, 2025 06:31
@suratkhan
Contributor Author

Hi @smoelius,

Sorry I’ve been inactive for the last couple of weeks—I was on vacation. I’ve just started back this week and made the changes you requested. After rebasing onto master and committing, I’m now seeing some new CI failures. Could you take a look and let me know whether they’re related to my changes, and if so how I should address them?

Thanks!

@smoelius
Collaborator

> I’m now seeing some new CI failures. Could you take a look and let me know whether they’re related to my changes, and if so how I should address them?

Sorry, @suratkhan. The answer is not immediately obvious to me. I will look at this more closely later on today.

} else {
return Ok(None);
}
};
Collaborator

I think the problem is here. It looks like this code is reading the entire rust-toolchain file and treating it as the toolchain channel. I think rust-toolchain files used to work that way, but no longer. I recommend using the toolchain_channel function to get the toolchain channel.
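To illustrate the difference described here, the sketch below contrasts treating the whole file as the channel with extracting only the `channel` value. It is a naive, dependency-free illustration, not a TOML parser and not the repository's `toolchain_channel` function; `channel_from_contents` is a hypothetical name, and the sample file contents are made up.

```rust
/// Naive, illustrative extraction of the `channel` value from the contents
/// of a TOML-format `rust-toolchain` file. It does not handle comments,
/// escapes, or other TOML details; real code should use a proper parser
/// or an existing helper.
fn channel_from_contents(contents: &str) -> Option<String> {
    contents.lines().find_map(|line| {
        let (key, value) = line.split_once('=')?;
        if key.trim() != "channel" {
            return None;
        }
        Some(value.trim().trim_matches('"').to_owned())
    })
}

/// Made-up file contents, for illustration only.
const SAMPLE: &str = r#"[toolchain]
channel = "nightly-2022-06-30"
components = ["rustc-dev"]
"#;

fn main() {
    // Treating the entire file as the channel yields a multi-line string.
    println!("whole file: {:?}", SAMPLE.trim());
    // Extracting the field yields just the channel name.
    println!("channel: {:?}", channel_from_contents(SAMPLE));
}
```

Note that this sketch returns `None` for the older format in which the file contains only a bare channel name, so a caller would have to decide how to treat that case.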

@suratkhan
Contributor Author

Hi @smoelius,

I tried to address the issue as you mentioned, but unfortunately it didn’t work out. I'm not very familiar with the usage of the toolchain_channel function. When you have some time, could you please provide a bit more detail or guidance on how to use it?

Thank you!

@smoelius
Collaborator

Sorry for not being more specific. This is the function I was referring to:

pub fn toolchain_channel(path: &Path) -> Result<String> {

You call it with the path of a rust-toolchain or rust-toolchain.toml file, and it extracts the channel field.

Here are two examples where it is used:

@suratkhan force-pushed the binary-search-version-revs branch from 5b2c9a7 to 3ab708c on May 21, 2025 13:40
@smoelius
Collaborator

Hi, @suratkhan. Are you still working on this?
