Add BestRQ pretraining #873
base: main
Conversation
@final
@dataclass
class BestRQOutput:
    """Holds the output of a w2v-BERT model."""
w2v-BERT -> BestRQ
@final
@dataclass
class BestRQLoss:
    """Holds the loss of a w2v-BERT model."""
w2v-BERT -> BestRQ
spatial_span_len: int = 10,
max_spatial_mask_prob: float = 0.0,
min_num_spatial_mask_spans: int = 2,
mask_overlap_strategy: str = "no", # remove_masks (fs2 default), add_masks_jc (additive, v1), add_masks_mike (additive, v2), roll (Alex's suggestion)
The comment here is not useful for people outside of Arrival. Could you describe what each strategy does?
spatial_mask_span_len: int = 10
"""The length of each spatial mask span that is applied over features."""

max_spatial_mask_prob: float = 0.0
Can you add "Best-RQ does not have spatial masking during pre-training, and spatial masking is only used optionally during ASR fine-tuning to mimic SpecAugment."?
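For readers unfamiliar with the term, here is a minimal sketch of what a SpecAugment-style spatial (feature-dimension) mask does: it blanks the same spans of consecutive feature channels in every time frame. The function name `apply_spatial_mask`, its arguments, and the fixed span count are assumptions made for illustration; the PR's actual span selection (driven by `max_spatial_mask_prob` and `min_num_spatial_mask_spans`) is not reproduced here.

```python
import random


def apply_spatial_mask(features, span_len, num_spans, rng, mask_value=0.0):
    """Mask `num_spans` spans of `span_len` consecutive feature channels.

    `features` is a list of frames, each a list of channel values.
    Spans are drawn uniformly and may overlap. Hypothetical helper,
    not the PR's implementation.
    """
    num_channels = len(features[0])
    span_len = min(span_len, num_channels)

    # Pick the masked channel indices once; they apply to every frame.
    masked = set()
    for _ in range(num_spans):
        start = rng.randint(0, num_channels - span_len)
        masked.update(range(start, start + span_len))

    # Build a new feature matrix, leaving the input untouched.
    masked_features = [
        [mask_value if c in masked else v for c, v in enumerate(frame)]
        for frame in features
    ]
    return masked_features, sorted(masked)
```

The same channel indices are masked in all frames, which is what distinguishes a spatial mask from the temporal mask applied along the time axis.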
What does this PR do? Please describe:
This is a draft implementation of the BestRQ algorithm from https://arxiv.org/pdf/2202.01855
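As a rough illustration of the core idea in the linked paper, the sketch below shows a random-projection quantizer: a frozen random matrix projects each speech frame, and the training target is the index of the nearest entry in a frozen random codebook, with both sides L2-normalized. The class name `RandomProjectionQuantizer`, its arguments, and the pure-Python list representation are assumptions for illustration only and do not reflect this PR's code.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


class RandomProjectionQuantizer:
    """Frozen random projection plus frozen random codebook (hypothetical)."""

    def __init__(self, input_dim, code_dim, num_codes, seed=0):
        rng = random.Random(seed)
        # Projection matrix: Xavier-uniform-style init, never trained.
        bound = math.sqrt(6.0 / (input_dim + code_dim))
        self.projection = [
            [rng.uniform(-bound, bound) for _ in range(input_dim)]
            for _ in range(code_dim)
        ]
        # Codebook: standard-normal entries, L2-normalized, never trained.
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
            for _ in range(num_codes)
        ]

    def quantize(self, frame):
        """Return the index of the codebook entry nearest the projected frame."""
        projected = l2_normalize(
            [sum(w * x for w, x in zip(row, frame)) for row in self.projection]
        )

        def sq_dist(code):
            return sum((c - p) ** 2 for c, p in zip(code, projected))

        return min(
            range(len(self.codebook)), key=lambda i: sq_dist(self.codebook[i])
        )
```

During pre-training the encoder sees masked input and is trained to predict these indices at the masked positions; the quantizer itself receives no gradient updates.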
Fixes #{issue number}
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: