Add Phi4 #2197

krammnic · 2024-12-21T23:55:01Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Model request. Phi4 #2190

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

I did not change any public API
I have added an example to docs or docstrings

pytorch-bot · 2024-12-21T23:55:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2197

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4f38c14 with merge base b3964af ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

krammnic · 2024-12-22T06:16:52Z

Should we wait until Phi-4 will be on HF?

krammnic · 2024-12-22T07:54:34Z

We require on changes in only tokenizer actually, as we already used full Attention in Phi3 without sliding one.

krammnic · 2024-12-24T09:11:36Z

I assume that I need to do run with Phi4

krammnic · 2024-12-25T08:00:12Z

@joecummings It seems to me that you haven't gone to holidays) Maybe you can give me some comments about this PR?

joecummings · 2024-12-25T13:58:56Z

@joecummings It seems to me that you haven't gone to holidays) Maybe you can give me some comments about this PR?

Haha yes I'm still (somewhat) here. I asked and it looks like the Phi4 team is planning on fixing some license issues with Hugging Face and should have the model on the Hub soon. So eventually the true test will be to grab the official model from Hugging Face and do a forward pass; however, if you want to iron out any potential discrepancies right away, I'd just grab one of the unofficial uploads like this for your testing.

Happy holidays to you @krammnic - been a pleasure working with you on torchtune this year!

krammnic · 2024-12-25T14:25:31Z

@joecummings Thanks for the comments!) Will do some runs with this then

ebsmothers · 2025-01-10T00:31:41Z

Hi @krammnic just checking in on this PR. I saw the model is on Hugging Face (as of yesterday I believe). Have you done a parity check with their model? And is this ready for review? If so let me know and we can take a look

krammnic · 2025-01-10T23:55:14Z

Hi @krammnic just checking in on this PR. I saw the model is on Hugging Face (as of yesterday I believe). Have you done a parity check with their model? And is this ready for review? If so let me know and we can take a look

Hi, will run tests today and will ping you when it will be ready for review!

insop · 2025-01-11T01:19:10Z

recipes/configs/phi4/evaluation.yaml

+# Config for EleutherEvalRecipe in eleuther_eval.py
+#
+# To launch, run the following command:
+#    tune run eleuther_eval --config phi3/evaluation


insop · 2025-01-11T01:19:31Z

recipes/configs/phi4/evaluation.yaml

+# Checkpointer
+checkpointer:
+  _component_: torchtune.training.FullModelHFCheckpointer
+  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct


/Phi-3/Phi-4

insop · 2025-01-11T01:20:13Z

recipes/configs/phi4/evaluation.yaml

+  ]
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: PHI3_MINI


/PHI3_MINI/PHI4_MINI

insop · 2025-01-11T01:20:33Z

recipes/configs/phi4/evaluation.yaml

+# Tokenizer
+tokenizer:
+  _component_: torchtune.models.phi3.phi3_mini_tokenizer
+  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model


/torchtune.models.phi3.phi3_mini_tokenizer/torchtune.models.phi4.phi4_mini_tokenizer

/Phi-3/Phi-4

insop · 2025-01-11T01:21:02Z

recipes/configs/phi4/mini_full.yaml

@@ -0,0 +1,105 @@
+# Config for multi-device full finetuning in full_finetune_distributed.py
+# using a Phi3 Mini 4K Instruct


insop · 2025-01-11T01:21:36Z

recipes/configs/phi4/mini_full_low_memory.yaml

@@ -0,0 +1,106 @@
+# Config for single device full finetuning in full_finetune_single_device.py
+# using a Phi3 Mini 4K Instruct


search all Phi3 and replace with Phi4

Thank you for the comments! I'm still fixing naming actually. Will ping you when it will be ready!

krammnic · 2025-01-11T20:40:00Z

Probably forward is now working (I get OOM cause my cards a busy with some experiments). There is pretty weird point that I had to set num_heads = 20 which is twice less then real num_heads (I assume that it is feature of torchtune?). Also, there is some inconsistency with naming. Official description is: Phi-4 small language model but probably we can't name it "small".

krammnic · 2025-01-11T20:56:44Z

No, I can't do forward both for num_heads = 20 and num_heads=40:

For 20 I get:
size mismatch for layers.39._checkpoint_wrapped_module.attn.q_proj.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
For 40 (original value) I get:

        size mismatch for layers.39._checkpoint_wrapped_module.attn.k_proj.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).
        size mismatch for layers.39._checkpoint_wrapped_module.attn.v_proj.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([1280, 5120]).```

Similar issue for each layer. Took params directly from config.json. Am I missing something?

krammnic · 2025-01-11T21:01:51Z

Hardcoding like this fixes the issue:

 q_proj=nn.Linear(embed_dim, 2560, bias=False),
 k_proj=nn.Linear(embed_dim, 2560, bias=False),
 v_proj=nn.Linear(embed_dim, 2560, bias=False),

Probably we should revise formulas especially for phi4.

krammnic · 2025-01-11T21:06:43Z

Nit: For all configs should change tokenizer field

krammnic · 2025-01-11T23:06:47Z

Getting RuntimeError: shape '[2, 308, 40, 128]' is invalid for input of size 1576960 0%| Probably from same reason

krammnic · 2025-01-11T23:35:28Z

For num_heads=40, num_kv_heads=10, embed_dim=5120. Let's calculate:

head_dim = 5120 / 40 = 128

Already here:

 q_proj=nn.Linear(embed_dim, num_heads * head_dim, bias=False),
 k_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),
 v_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),

Already have a problem here as it will be not 2560 in all cases, but 5120, 1280, 1280. Assume that we "hardcoded" in a way that I have shown earlier. But then we get same problem here:

 q_per_kv = self.num_heads // self.num_kv_heads
 q = q.view(b, s_x, self.num_kv_heads * q_per_kv, self.head_dim)

Error:
RuntimeError: shape '[2, 308, 40, 128]' is invalid for input of size 1576960

Part of config.json for reference:

  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 16384,
  "model_type": "phi3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 10,
  "original_max_position_embeddings": 16384,

So, the product should be twice less. Am I missing something? (I hope I have not miscalculated). Something weird is behind this problem. Will try to work out it asap. @ebsmothers I'm not really sure if it fixable without touching phi3 model or creating separate model for phi4.

krammnic · 2025-01-12T22:46:23Z

Oh, and also I assume that we first of all need to speak about this... #2212

felipemello1

just left some comments, not a review yet. Will come back to it.

felipemello1 · 2025-01-13T19:38:48Z

recipes/configs/phi3/evaluation.yaml

folder is phi3, but args are phi4

Yep, good point

Bumping this comment. Please go through both Phi-3 and Phi-4 eval files to make sure they contain the correct model references

felipemello1 · 2025-01-13T19:40:50Z

recipes/configs/phi4/evaluation.yaml

It seems that you made a copy from phi3, but made the changes in phi3/evaluation, instead of here

Yeah I think these two eval files need to be swapped

felipemello1 · 2025-01-13T19:42:03Z

recipes/configs/phi4/mini_full.yaml

n00b question: Is "mini" the right nomenclature? Or do they have a family of model sizes like phi4_7b, phi4_13B, etc?

Pretty arguing moment, in the description of Phi4 it is "mini model" in real life it is not

I wonder if we should drop the mini and just stick to model sizes, since its more informative. @ebsmothers , any thoughts?

Yeah seems like they are mostly using model sizes instead of "mini" in public docs, so maybe let's go with 14B instead of mini?

felipemello1 · 2025-01-13T19:43:42Z

recipes/configs/phi4/mini_full.yaml

+  ]
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: PHI3_MINI


n00b question: Are there are differences between PHI3 and PHI4? Even if there arent, should we update the model_type for clarity? I believe that this is used in the checkpointer to map the HF format to torchtune format.

According to tech report there is difference in tokenizer and in attention in such way that it is not touching us. But some observations that I made upper might get us to different conclusion

i am tempted to say that even if PHI3_MINI == PHI4_MINI, every model should have its own nomenclature, so there is less cognitive load for the user. @ebsmothers , what do you think?

For now I would stick with the precedent we've set, which is to only use a new model type when the arch changes. This is what we do for the Llama family, where we have LLAMA3, LLAMA3_2, but not LLAMA3_1 or LLAMA3_3. I do agree with your point though @felipemello1 -- we can consider the renaming in a follow-up (at that time I would also probably drop the MINI from Phi model names too)

felipemello1 · 2025-01-13T19:45:32Z

recipes/configs/phi4/mini_full_low_memory.yaml

I think that this was already the naming convention for Phi3, but we should probably add "single_device" to the config name.

Phi3 uses low_memory. Personally I would like to change full_low_memory -> full_single_device across the board, but again would prioritize consistency with Phi3 in this PR.

felipemello1 · 2025-01-13T19:47:17Z

recipes/configs/phi4/mini_lora.yaml

+# You can add specific overrides through the command line. For example
+# to override the checkpointer directory while launching training
+# you can run:
+#   tune run --nnodes 1 --nproc_per_node 2 lora_finetune_distributed --config phi4/mini_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>


we can probably remove --nnodes 1. We dont usually add it to distributed configs. Same goes to the other dist configs

Good catch!

we are working on providing better support for multi node. @joecummings , I am thinking that if you add some documentation about how to configure it (+ good error logging if we dont have have), i would be happy to bulk change every config to add --nnodes 1

Bumping this comment as well

felipemello1

Thanks for this PR @krammnic. Left some questions and minor bug fixes, e.g. phi3 -> phi4.

I personally don't feel comfortable approving this PR without a minimal forward pass comparison vs HF.

Ideally, we should be running evals to see if it matches HF: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=phi-4

Here is the script I used when i was checking llama 3.2. I don't think its ideal either because it doesn't test all of the tokenizer special tokens: https://gist.github.com/felipemello1/55ec8cdcf625b42c1542813c3f2ebf65

torchtune/_recipe_registry.py

felipemello1 · 2025-01-14T14:07:59Z

recipes/configs/phi4/mini_full.yaml

I wonder if we should drop the mini and just stick to model sizes, since its more informative. @ebsmothers , any thoughts?

felipemello1 · 2025-01-14T14:09:41Z

recipes/configs/phi4/mini_full.yaml

+  ]
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: PHI3_MINI


i am tempted to say that even if PHI3_MINI == PHI4_MINI, every model should have its own nomenclature, so there is less cognitive load for the user. @ebsmothers , what do you think?

felipemello1 · 2025-01-14T14:11:04Z

recipes/configs/phi4/mini_lora.yaml

+# You can add specific overrides through the command line. For example
+# to override the checkpointer directory while launching training
+# you can run:
+#   tune run --nnodes 1 --nproc_per_node 2 lora_finetune_distributed --config phi4/mini_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>


we are working on providing better support for multi node. @joecummings , I am thinking that if you add some documentation about how to configure it (+ good error logging if we dont have have), i would be happy to bulk change every config to add --nnodes 1

torchtune/models/phi4/_model_builders.py

felipemello1 · 2025-01-14T16:34:48Z

torchtune/models/phi4/_tokenizer.py

+        self.eos_id = self.special_tokens["<|endoftext|>"]
+        self.bos_id = self.special_tokens["<|endoftext|>"]
+        self.pad_id = self.special_tokens["<|endoftext|>"]


according to Daniel Hans, this is not correct: https://x.com/danielhanchen/status/1877781452818968615. Do you mind taking a look?

I also seems weird that begin of sentence (bos) would be "endoftext"

This is straight from the HF repo, and is also what the underlying tokenizer class (GPT2Tokenizer) uses as defaults. We would need some confirmation from the Phi team that these are incorrect

So maybe we apply all Unsloth fixes at this point?

There is pretty comprehensive report about all this "incorrect" things with fixes

I think the changes will be approved and merged. Daniel and the Microsoft team are discussing the fixes here: https://huggingface.co/microsoft/phi-4/discussions/21.

Go ahead and do the fixes

felipemello1 · 2025-01-14T16:41:34Z

tests/torchtune/models/phi4/test_phi4_tokenizer.py

+        # m.model is a pretrained Sentencepiece model using the following command:
+        # spm.SentencePieceTrainer.train('--input=<TRAIN_FILE> --model_prefix=m --vocab_size=2000')


I believe that this comment is wrong, since it uses tiktoken

Yes, definitely will change all examples like this

Yeah you can copy this comment pointing to the script on how the toy tiktoken tokenizer was trained

torchtune/models/phi4/_tokenizer.py

felipemello1 · 2025-01-14T17:13:45Z

torchtune/models/phi4/_tokenizer.py

+
+            >>> # tokenize_messages encodes messages separately and concats
+            >>> tokenizer.tokenize_messages(messages)[0]
+            [1, 1788, 2643, 13, 1792, 9508, 13, 465, 22137, 2933, 2]


is this example still accurate for phi4?

felipemello1 · 2025-01-14T17:17:32Z

torchtune/models/phi4/_tokenizer.py

+            if message.role == "user":
+                tokenized_messages.append(self.special_tokens["<|im_start|>"])
+                encoded = self.encode(
+                    "system",


n00b question: is it supposed to be "system" for all of them?

RdoubleA

I don't see anywhere where phi4 is referred as phi4-mini. On HF it seems to be only called Phi4. My preference would be to match the commonly accepted name and drop the mini, so we don't confuse folks with Phi4 vs Phi4 Mini.

Would also like to see detailed testing in the PR summary. Specifically, on:

Comparing tokenizer and model forward against HF implementation
Loss curves
Running generation and eval on the model to ensure it gets reasonable outputs

RdoubleA · 2025-01-14T18:49:17Z

torchtune/models/phi4/_tokenizer.py

+
+        self.prompt_template = prompt_template
+
+        self.tt_model = TikTokenBaseTokenizer(


GPT2Tokenizer is probably closer to TikToken than SentencePiece (someone who's more knowledgeable can correct me), but I'm not sure if this will create the correct token map. You would need to test against the HF version of the tokenizer on the same text.

This is probably highlighting our issue with converting from HF tokenizers as mentioned in #2212. We'll need to think of a better long-term solution here.

RdoubleA · 2025-01-14T18:50:33Z

torchtune/models/phi4/_tokenizer.py

+                mask.append(message.masked)
+
+            # Add special tokens
+            if message.role == "user":


I would make this a separate method _tokenize_header and then pass in the role into self.encode to avoid all the if else statements

RdoubleA · 2025-01-14T18:51:44Z

torchtune/models/phi4/_model_builders.py

+    Returns:
+        TransformerDecoder: Instantiation of Phi4 Mini 16K Instruct Model
+    """
+    return phi3(


so there's no architectural difference between phi4 and phi3?

No real difference! See technical report. (We do not support sliding attention)

krammnic · 2025-01-14T20:10:08Z

For num_heads=40, num_kv_heads=10, embed_dim=5120. Let's calculate:

head_dim = 5120 / 40 = 128

Already here:
 q_proj=nn.Linear(embed_dim, num_heads * head_dim, bias=False),
 k_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),
 v_proj=nn.Linear(embed_dim, num_kv_heads * head_dim, bias=False),
Already have a problem here as it will be not 2560 in all cases, but 5120, 1280, 1280. Assume that we "hardcoded" in a way that I have shown earlier. But then we get same problem here:
 q_per_kv = self.num_heads // self.num_kv_heads
 q = q.view(b, s_x, self.num_kv_heads * q_per_kv, self.head_dim)
Error: RuntimeError: shape '[2, 308, 40, 128]' is invalid for input of size 1576960

Part of config.json for reference:
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 16384,
  "model_type": "phi3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 10,
  "original_max_position_embeddings": 16384,
So, the product should be twice less. Am I missing something? (I hope I have not miscalculated). Something weird is behind this problem. Will try to work out it asap. @ebsmothers I'm not really sure if it fixable without touching phi3 model or creating separate model for phi4.

Only point about architecture is this. And this is not align with tech report actually

Co-authored-by: Felipe Mello <[email protected]>

krammnic · 2025-02-10T12:52:37Z

@ebsmothers @SalmanMohammadi Have done probably almost all fixes that were required (I need to still cover Phi4Tokenizer with test about special tokens). You can check it out

krammnic · 2025-02-10T18:42:09Z

@ebsmothers Probably done, except the
merge conflict. I'm not sure how to resolve it without write access

ebsmothers

OK just a few more small comments! Let's also make sure everything runs on the final version before merging (I can help with this)

ebsmothers · 2025-02-11T00:42:03Z

torchtune/modules/transforms/tokenizers/_gpt2.py

+    def _convert_id_to_token(self, index: int) -> str:
+        return self.decoder.get(index)
+
+    def decode(self, tokens: list) -> List[str]:


nit: put this method after encode (not blocking to land though)

ebsmothers · 2025-02-11T00:42:19Z

torchtune/modules/transforms/tokenizers/_gpt2.py

+            tokens (list): List of the integers, which represent encoded tokens.
+
+        Returns:
+            Decoced text.


nit

Suggested change

Decoced text.

Decoded text.

ebsmothers · 2025-02-11T00:43:28Z

recipes/configs/phi3/evaluation.yaml

  checkpoint_files: [
    model-00001-of-00002.safetensors,
    model-00002-of-00002.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
-  model_type: PHI3_MINI
+  model_type: PHI4_MINI


I think this is still not correct? (Same for L28)

ebsmothers · 2025-02-11T00:47:22Z

tests/assets/tiny_vocab.json

@@ -0,0 +1,6 @@
+{


I believe this file can be removed now?

ebsmothers · 2025-02-11T00:49:17Z

torchtune/models/phi4/_model_builders.py

+def phi4_14b_tokenizer(vocab_path: str = None, merges_path: str = None, path: str = None, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[_TemplateType] = None) -> Phi4MiniTokenizer:
+    """Phi4 (14B) tokenizer.
+    Args:
+        path (str): Path to the tiktoken tokenizer model.


I think the docstring needs to be updated? (Also still not sure why this isn't linting on CI)

ebsmothers · 2025-02-11T00:50:29Z

torchtune/models/phi4/_tokenizer.py

+
+    @property
+    def vocab_size(self):
+        return self.tt_model.vocab_size


I think this needs to be updated

Suggested change

return self.tt_model.vocab_size

return self.tokenizer_model.vocab_size

ebsmothers · 2025-02-11T00:50:39Z

torchtune/models/phi4/_tokenizer.py

+                continue
+            else:
+                ids_for_decode.append(token_id)
+        return self.tt_model.decode(ids_for_decode)


This too

Suggested change

return self.tt_model.decode(ids_for_decode)

return self.tokenizer_model.decode(ids_for_decode)

ebsmothers · 2025-02-11T00:53:13Z

torchtune/modules/transforms/tokenizers/_gpt2.py

+    Args:
+        vocab_path (str): Path to vocab.json file.
+        merges_path (str): Path to merges.txt file.
+        errors (str): Paradigm to follow when decoding.


What are the acceptable values here? Looking at the docstring it's not clear to me how this is used

ebsmothers · 2025-02-11T00:54:43Z

torchtune/modules/transforms/tokenizers/_gpt2.py

+        return bpe_tokens
+
+    def _convert_token_to_id(self, token: str) -> int:
+        return self.encoder.get(token, self.encoder.get(self.unk_id))


nit but can probably just cache self.unk_token = self.encoder.get(self.unk_id) during __init__ or something so you don't have to call it repeatedly

krammnic · 2025-02-11T08:39:17Z

Done with all requested fixes

ebsmothers

Thanks @krammnic for adding Phi4 support to torchtune! Really excited to be able to support this

krammnic added 2 commits December 22, 2024 02:20

Add Phi4 support

1a43259

Add Phi4

3630908

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 21, 2024

krammnic changed the title ~~[WIP] Add Phi4~~ Add Phi4 Dec 24, 2024

insop reviewed Jan 11, 2025

View reviewed changes

krammnic force-pushed the main branch from 751a75f to 3630908 Compare January 11, 2025 19:14

krammnic added 2 commits January 11, 2025 22:24

fix names

18f8bc5

More fixes. Able to do forward

e69a77c

ebsmothers mentioned this pull request Jan 13, 2025

Support for phi3 small and medium? #1013

Closed

felipemello1 reviewed Jan 13, 2025

View reviewed changes

felipemello1 reviewed Jan 14, 2025

View reviewed changes

RdoubleA reviewed Jan 14, 2025

View reviewed changes

Update torchtune/models/phi4/_tokenizer.py

a94b742

Co-authored-by: Felipe Mello <[email protected]>

fix test

55d7ae0

krammnic force-pushed the main branch from 1e64ff8 to 55d7ae0 Compare February 10, 2025 18:04

Mark Obozov added 6 commits February 10, 2025 21:09

phi4 -> phi4_14b

e7b43d6

resolve conflict

b4de41d

resolve conflict

4440768

update __init__

d39e717

update __init__

54d477d

update __init__

0be4b8e

ebsmothers and others added 4 commits February 10, 2025 11:37

Merge branch 'main' into main

ad8562e

add GPT2BaseTokenizer in transforms/tokenizers/__init__.py + fix lint

518a769

fix imports

e29aca6

fix __init__ and namings

d533355

ebsmothers reviewed Feb 11, 2025

View reviewed changes

Mark Obozov added 5 commits February 11, 2025 11:30

swap encode decode

012f433

correct eval recipe

ebcd1d6

fix docstring

d4435b0

remove useless argument

7f5ccd8

nit: unk token

36eeaa8

Mark Obozov added 6 commits February 11, 2025 17:35

fixes tokenizer

af5a824

fix gpt2tokenizer test

2002f50

fix lora config

01ac202

renamings

6003044

fix phi4 drop eos + test

7aea0ca

recipe registry

4f38c14

ebsmothers approved these changes Feb 11, 2025

View reviewed changes

ebsmothers merged commit c09f9b6 into pytorch:main Feb 11, 2025
17 checks passed

bogdansalyp mentioned this pull request Feb 13, 2025

Test PR InflectionAI/torchtune#1

Closed

		@@ -0,0 +1,105 @@
		# Config for multi-device full finetuning in full_finetune_distributed.py
		# using a Phi3 Mini 4K Instruct

		@@ -0,0 +1,106 @@
		# Config for single device full finetuning in full_finetune_single_device.py
		# using a Phi3 Mini 4K Instruct

		# m.model is a pretrained Sentencepiece model using the following command:
		# spm.SentencePieceTrainer.train('--input=<TRAIN_FILE> --model_prefix=m --vocab_size=2000')


		self.prompt_template = prompt_template

		self.tt_model = TikTokenBaseTokenizer(

	return self.tt_model.vocab_size
	return self.tokenizer_model.vocab_size

	return self.tt_model.decode(ids_for_decode)
	return self.tokenizer_model.decode(ids_for_decode)

Add Phi4 #2197

Add Phi4 #2197

Conversation

krammnic commented Dec 21, 2024

Context

Changelog

Test plan

UX

pytorch-bot bot commented Dec 21, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2197

✅ No Failures

krammnic commented Dec 22, 2024

krammnic commented Dec 22, 2024

krammnic commented Dec 24, 2024

krammnic commented Dec 25, 2024

joecummings commented Dec 25, 2024

krammnic commented Dec 25, 2024

ebsmothers commented Jan 10, 2025

krammnic commented Jan 10, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

krammnic commented Jan 11, 2025

krammnic commented Jan 11, 2025

krammnic commented Jan 11, 2025

krammnic commented Jan 11, 2025

krammnic commented Jan 11, 2025

krammnic commented Jan 11, 2025 • edited Loading

krammnic commented Jan 12, 2025

felipemello1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

felipemello1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RdoubleA Jan 14, 2025 • edited Loading

Choose a reason for hiding this comment

krammnic Jan 14, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RdoubleA left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pytorch-bot bot commented Dec 21, 2024 •

edited

Loading

krammnic commented Jan 11, 2025 •

edited

Loading

RdoubleA Jan 14, 2025 •

edited

Loading

krammnic Jan 14, 2025 •

edited

Loading

RdoubleA left a comment •

edited

Loading

krammnic commented Feb 10, 2025 •

edited

Loading