Contributor (Author):
Looks like checks are hitting an unrelated type error:
Collaborator:
@MattGPT-ai this is caused by a new mypy version and affects a deprecated class. I just fixed it in #3613. If you update this branch to current master, the error should disappear.
Contributor (Author):
Awesome, that worked, checks passed!
Contributor (Author):
We have been using this change successfully in our fork for about a month now. It has been a major speed improvement, especially when evaluation sets are large!
…gather functions to distributed utils. This works by using a DistributedSampler to allocate samples across GPU processes, then aggregates all the predictions and losses from all processes before running the evaluation. Uses broadcast to ensure all processes return the same valid result.
Flair now supports multi-GPU training, but not evaluation. This means that the work of n-1 GPUs is wasted at evaluation time, which can dramatically reduce the benefit of multi-GPU training if your eval set is considerable in size. Even worse, I believe it can be slower than single-GPU evaluation, because the CPU portions of the evaluation code are repeated n times while sharing the same CPU and memory resources.
This PR implements multi-GPU acceleration for `evaluate` in the `Classifier`, `TextRegressor`, and `TextPairRegressor` model types. It uses the `DistributedSampler` to split the eval set between the GPUs, runs predictions on each, and aggregates the inference results across processes before the metrics are calculated in the main process and returned.
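To make the split-then-aggregate flow concrete, here is a minimal single-process sketch in plain Python. It is not the code from this PR: `shard_indices`, `predict`, and `evaluate_sharded` are hypothetical names, the loop over ranks stands in for separate GPU processes, and the list of per-rank results stands in for whatever a gather call would return. The index assignment imitates what PyTorch's `DistributedSampler` does with `shuffle=False` and `drop_last=False`, as far as I understand it: round-robin, padded with repeated samples so every rank gets the same count.

```python
import math


def shard_indices(n_samples, world_size, rank):
    """Indices one rank evaluates: round-robin, padded to equal length per rank."""
    if n_samples == 0:
        return []
    per_rank = math.ceil(n_samples / world_size)
    total = per_rank * world_size
    indices = list(range(n_samples))
    # Pad by repeating indices from the start until every rank has per_rank items.
    indices = (indices * math.ceil(total / n_samples))[:total]
    return indices[rank:total:world_size]


def predict(sample):
    # Hypothetical stand-in for model inference.
    return sample % 2


def evaluate_sharded(samples, labels, world_size):
    # Each "rank" predicts only on its own shard (simulated sequentially here).
    gathered = []
    for rank in range(world_size):
        shard = shard_indices(len(samples), world_size, rank)
        gathered.append([(i, predict(samples[i])) for i in shard])

    # "Main process": merge the gathered shards. Keying by sample index drops
    # the duplicates introduced by padding, so no sample is counted twice.
    merged = {}
    for part in gathered:
        for i, pred in part:
            merged[i] = pred

    correct = sum(merged[i] == labels[i] for i in range(len(samples)))
    return correct / len(samples)


samples = list(range(5))
labels = [0, 1, 0, 1, 1]
accuracy_two_ranks = evaluate_sharded(samples, labels, world_size=2)
accuracy_one_rank = evaluate_sharded(samples, labels, world_size=1)
```

The point of the sketch is the padding: with 5 samples on 2 ranks, one sample is evaluated twice, so the aggregation step has to deduplicate before computing metrics, otherwise the sharded result would differ from the single-process one. In a real multi-process run the final metrics would also need to be shared with the non-main ranks so that every process returns the same result.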