Modified OpenELM.py, mincpm.py, gpt2.py, gpt_bigcode.py, internlm2.py code to make it work with mypy #61
Open: ParamThakkar123 wants to merge 24 commits into ml-explore:main from ParamThakkar123:main
Changes from 19 commits
Commits
e35846a Made OpenELM mypy compatible (ParamThakkar123)
d3cc4a6 Made gpt2.py mypy compatible (ParamThakkar123)
d4771ff Add gpt_bigcode, internlm2 (ParamThakkar123)
030e46f Added minicpm.py (ParamThakkar123)
660b76a Updates (ParamThakkar123)
e3f7d44 Added fixes to llama (ParamThakkar123)
c1f7ee4 Updates to server.py and trainer.py (ParamThakkar123)
3eccd0c Formatted (ParamThakkar123)
78b77ba Modifief merge and utils (ParamThakkar123)
81f52d1 Fixed utils.py (ParamThakkar123)
de4724e Made more files mypy compatible (ParamThakkar123)
f64bd6c Updates (ParamThakkar123)
751b68a Further updates (ParamThakkar123)
ffd840e Fixed isort errors (ParamThakkar123)
c45e119 All changes made (ParamThakkar123)
af3acc2 reformatted (ParamThakkar123)
325e571 Merge branch 'main' into main (ParamThakkar123)
3ede87c Fixes (ParamThakkar123)
979fce8 Merge branch 'main' of https://github.com/ParamThakkar123/mlx-lm (ParamThakkar123)
ad9d5eb resolved merge conflicts (ParamThakkar123)
114368c Update mlx_lm/server.py (ParamThakkar123)
61bc924 Updated code as per requested changes (ParamThakkar123)
252ea11 Fixed merge conflicts (ParamThakkar123)
d4df424 Formats (ParamThakkar123)
@@ -38,7 +38,7 @@ class ModelArgs(BaseModelArgs):
     max_position_embeddings: int = 2048
     rms_norm_eps: float = 1e-6
     rope_theta: float = 10000.0
-    rope_scaling: Dict = None
+    rope_scaling: Dict[Any, Any] = {}
     attention_bias: bool = False
 
 
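The new `rope_scaling` default deserves a second look. The diff does not show how `ModelArgs` is declared, but if it is a dataclass (an assumption here), a bare `{}` default is rejected when the class is created. A minimal sketch of that failure and of two alternatives that also satisfy a type checker follows; `BrokenArgs` and `ExampleArgs` are hypothetical stand-ins, not classes from the PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# A bare mutable default is rejected by the dataclass machinery
# at class-creation time, before any instance exists.
try:

    @dataclass
    class BrokenArgs:  # hypothetical, mirrors the new default in the diff
        rope_scaling: Dict[Any, Any] = {}

except ValueError as err:
    error_message = str(err)


@dataclass
class ExampleArgs:  # hypothetical stand-in for ModelArgs
    rope_theta: float = 10000.0
    # Option 1: keep None as the "not configured" value, typed explicitly.
    rope_scaling: Optional[Dict[str, Any]] = None
    # Option 2: give every instance its own empty dict.
    extra_options: Dict[str, Any] = field(default_factory=dict)


first, second = ExampleArgs(), ExampleArgs()
first.extra_options["factor"] = 2.0  # does not leak into `second`
```

The two options are not interchangeable: code that tests `rope_scaling is not None` takes a different branch when the default becomes an empty dict, so switching between them changes behavior rather than only the annotations.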
@@ -189,6 +189,7 @@ def __init__(self, config: ModelArgs):
                 ]
                 if key in self.config.rope_scaling
             }
+
             self.rope = DeepseekV2YarnRotaryEmbedding(
                 dim=self.qk_rope_head_dim,
                 max_position_embeddings=self.max_position_embeddings,
@@ -197,54 +198,13 @@ def __init__(self, config: ModelArgs):
                 **rope_kwargs,
             )
 
-    def __call__(
-        self,
-        x: mx.array,
-        mask: Optional[mx.array] = None,
-        cache: Optional[Any] = None,
-    ) -> mx.array:
-        B, L, D = x.shape
-
-        if self.q_lora_rank is None:
-            q = self.q_proj(x)
-        else:
-            q = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(x)))
-
-        q = q.reshape(B, L, self.num_heads, self.q_head_dim).transpose(0, 2, 1, 3)
-        q_nope, q_pe = mx.split(q, [self.qk_nope_head_dim], axis=-1)
-        compressed_kv = self.kv_a_proj_with_mqa(x)
-        compressed_kv, k_pe = mx.split(compressed_kv, [self.kv_lora_rank], axis=-1)
-        k_pe = k_pe.reshape(B, L, 1, self.qk_rope_head_dim).transpose(0, 2, 1, 3)
-        kv = self.kv_b_proj(self.kv_a_layernorm(compressed_kv))
-        kv = kv.reshape(B, L, self.num_heads, -1).transpose(0, 2, 1, 3)
-
-        k_nope, values = mx.split(kv, [self.qk_nope_head_dim], axis=-1)
-
-        if cache is not None:
-            q_pe = self.rope(q_pe, cache.offset)
-            k_pe = self.rope(k_pe, cache.offset)
-            k_pe = mx.repeat(k_pe, self.num_heads, axis=1)
-            keys, values = cache.update_and_fetch(
-                mx.concatenate([k_nope, k_pe], axis=-1), values
-            )
-        else:
-            q_pe = self.rope(q_pe)
-            k_pe = self.rope(k_pe)
-            k_pe = mx.repeat(k_pe, self.num_heads, axis=1)
-            keys = mx.concatenate([k_nope, k_pe], axis=-1)
-
-        queries = mx.concatenate([q_nope, q_pe], axis=-1)
-
-        output = scaled_dot_product_attention(
-            queries, keys, values, cache=cache, scale=self.scale, mask=mask
-        )
-        output = output.transpose(0, 2, 1, 3).reshape(B, L, -1)
-        return self.o_proj(output)
-
 
 class DeepseekV2MLP(nn.Module):
     def __init__(
-        self, config: ModelArgs, hidden_size: int = None, intermediate_size: int = None
+        self,
+        config: ModelArgs,
+        hidden_size: Optional[int] = None,
+        intermediate_size: Optional[int] = None,
     ):
         super().__init__()
         self.config = config

Review comment (on the removed __call__ method): What's happening here?
Reply: Not sure how it got deleted. It wasn't intentional; it probably happened while fixing merge conflicts. Just guessing.
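The `DeepseekV2MLP` signature change above replaces `int = None` with `Optional[int] = None`, which is the form mypy accepts when implicit Optional is disabled. The diff does not show how the constructor uses these arguments, so the sketch below assumes a common fall-back-to-config pattern; `ExampleConfig` and `resolve_sizes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExampleConfig:  # hypothetical stand-in for ModelArgs
    hidden_size: int = 4096
    intermediate_size: int = 11008


def resolve_sizes(
    config: ExampleConfig,
    hidden_size: Optional[int] = None,
    intermediate_size: Optional[int] = None,
) -> Tuple[int, int]:
    # Explicit None checks narrow Optional[int] to int, so the return
    # type is a plain pair of ints with no casts needed.
    resolved_hidden = config.hidden_size if hidden_size is None else hidden_size
    resolved_intermediate = (
        config.intermediate_size if intermediate_size is None else intermediate_size
    )
    return resolved_hidden, resolved_intermediate


default_sizes = resolve_sizes(ExampleConfig())
override_sizes = resolve_sizes(ExampleConfig(), intermediate_size=1408)
```

This kind of change only touches annotations, so the runtime behavior of existing callers stays the same.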
Review comment: This doesn't look like the same behavior. I'm just wondering what motivated the change here and if this code is still working? Or maybe it wasn't working before?
Reply: It gave some mypy errors initially. Making this change fixed them.
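One way to answer the reviewer's question for a type-driven edit is to compare the old and new versions on a handful of inputs before merging. The page does not show which change the comment refers to, so the functions below are hypothetical and only sketch the idea.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def untyped_pick(value, default=None):
    # Hypothetical "before" version with no annotations.
    return default if value is None else value


def typed_pick(value: Optional[int], default: Optional[int] = None) -> Optional[int]:
    # Hypothetical "after" version: annotations added, logic untouched.
    return default if value is None else value


def same_behavior(
    old: Callable[..., Any],
    new: Callable[..., Any],
    cases: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True when both callables agree on every argument tuple."""
    return all(old(*args) == new(*args) for args in cases)


# Include falsy but valid values such as 0, which are easy to break
# when a None check is rewritten as a truthiness check.
CASES = [(None, None), (None, 3), (0, 3), (5, None)]
matches = same_behavior(untyped_pick, typed_pick, CASES)
```

A check like this cannot prove two versions are equivalent, but a mismatch is a quick signal that the edit did more than quiet the type checker.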