Custom modeling for training #801
Merged
Commits (74), all authored by michaelbenayoun:

b131882  [WIP] modeling
447cadd  [WIP] modeling
c4107ba  [WIP] modeling
93ce12b  [WIP] from_pretrained
7410117  [WIP] from_pretrained
eef5e0e  [WIP] from_pretrained
f67a31f  Incomplete styling
53f6900  Support flash_attention_v2
54819bc  Merge branch 'main' into custom_modeling_introduction
3cd352c  Support for GQA QKV
bdc65e0  [WIP] test
f5d0214  [WIP] test
c137748  [WIP] test
f4f0d8d  WIP
7fb574b  WIP
20245a8  Refactor
a8b247c  [WIP] save_pretrained
22974a0  [WIP] save_pretrained
7d80a8c  Merge branch 'main' into custom_modeling_introduction
a329249  [WIP]
6b486b4  Fix
88ae7ea  Fix
91119e7  Gradient checkpointing
39e5002  Merge branch 'main' into custom_modeling_introduction
8b55f4d  Styling
8d057ab  [WIP] consolidate
3557f3c  [WIP] consolidate
e8752c2  [WIP] consolidate
5e78a7a  styling
7bf94d6  Cleanup
bc31a51  Refactor
3ce97cc  Cleanup
8ac7420  Fix
3e0f2d9  Merge branch 'main' into custom_modeling_introduction
546bd14  Disable PP tests since it is broken
c38a87e  Fix import
73e3e0d  Fixes
52ea95e  Fixes
efc11c4  Fixes
b9e8dc6  Fixes
574afb2  Fixes
2936011  Fixes
226b4c7  Fixes
abd42e0  Fixes
a631c14  Add independant Llama implementation from Transformers
3dbc63d  Remove fake support for cache
b164f5d  Raising an error if intermediate size is not divisible by tp size
5a84a5b  Add the CustomModule class to explicitly mark which submodules need t…
2c05d46  Remove transformers code that we do not use
218aa46  [WIP] from_pretrained in details
549cec2  [WIP] from_pretrained in details
8f40973  [WIP] from_pretrained in details
e05c477  from_pretrained done
9f7eea3  Add comment explaining what transformation_utils.py is about
3fae083  Add comment explaining what transformation_utils.py is about
aad7cad  Change sharding
7b37f12  Remove sharding.py
50df135  Restore sharding.py, this can be removed in the Granite PR
f2a8023  Combime parallel linear tests
109571c  Test with bigger sequence length
a542978  Remove duplicate flash attention test
be5c2db  Add recompute causal mask option
a1fa3c0  Remove _tp_plan and _pp_plan
0fd9f7c  Remove commented code
10f82b4  Remove from_tf and from_flax artifacts
7c45001  Fix comparison
0d8abe6  Tiny changes
f567b75  Add overfitting test
edcd8f0  Fix tests using all NCs
78123e7  Styling
7f00b7d  Remove tests from the former approach
d92f62e  Styling
6ef46db  Remove tests from the former approach
1746f76  Remove tests from the former approach
Filter by extension
Conversations
Does this work with Granite?
It should, since it is defined in optimum/neuron/models/training. The reason we skip the rest is that with custom modeling we do not need lazy loading or anything, so the optimizer is already created with the proper parameters.
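For illustration, a minimal sketch of what this means in practice, assuming a Llama training class is exposed under optimum.neuron.models.training (the class name, the from_pretrained signature, and the checkpoint ID below are assumptions, not the confirmed API): because the custom modeling path materializes the weights directly at load time, the optimizer can be built from the real parameters right away, with no lazy-loading step in between.

```python
# Hypothetical sketch only: the import path matches the module mentioned above,
# but the class name, from_pretrained signature, and checkpoint are assumptions.
import torch

from optimum.neuron.models.training import LlamaForCausalLM  # assumed class name

# With custom modeling, the weights are materialized (and sharded) directly
# when the model is loaded, so there is no lazy-loading indirection.
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed checkpoint

# The optimizer therefore sees the proper parameters immediately, which is why
# the lazy-loading handling can be skipped for this code path.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```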