Add Phi4 #2197
base: main
Changes from all commits
1a43259
3630908
18f8bc5
e69a77c
a94b742
1d03294
bdf478f
Reviewer comment: It seems that you made a copy from phi3, but made the changes in phi3/evaluation instead of here.

Reply: Yeah, I think these two eval files need to be swapped.
New file: phi4 evaluation config

@@ -0,0 +1,44 @@
# Config for EleutherEvalRecipe in eleuther_eval.py
#
# To launch, run the following command:
#   tune run eleuther_eval --config phi3/evaluation

Reviewer comment: /phi3/phi4

output_dir: ./ # Not needed

# Model Arguments
model:
  _component_: torchtune.models.phi3.phi3_mini

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct

Reviewer comment: /Phi-3/Phi-4

  checkpoint_files: [
    model-00001-of-00002.safetensors,
    model-00002-of-00002.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI

Reviewer comment: /PHI3_MINI/PHI4_MINI

resume_from_checkpoint: False

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model

Reviewer comment: /torchtune.models.phi3.phi3_mini_tokenizer/torchtune.models.phi4.phi4_mini_tokenizer

Reviewer comment: /Phi-3/Phi-4

  max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 1234 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["truthfulqa_mc2"]
limit: null
max_seq_length: 4096
batch_size: 8
enable_kv_cache: True

# Quantization specific args
quantizer: null

Reviewer comment: n00b question: Is "mini" the right nomenclature? Or do they have a family of model sizes like phi4_7b, phi4_13B, etc.?

Reply: Arguable point: in the description, Phi4 is a "mini model"; in real life it is not.

Reply: I wonder if we should drop the "mini" and just stick to model sizes, since it's more informative. @ebsmothers, any thoughts?

Reply: Yeah, it seems like they are mostly using model sizes instead of "mini" in public docs, so maybe let's go with 14B instead of mini?
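For reference, here is a sketch of the evaluation config with the substitutions above applied. It assumes the phi4_mini builders added in this PR and the /tmp/phi-4 six-shard checkpoint used by the other configs in this diff, and it keeps model_type: PHI3_MINI per the precedent argument further down; none of these names are final until the mini-vs-14B question is settled.

# Sketch only: phi4 evaluation config with the reviewers' renames applied
# To launch:
#   tune run eleuther_eval --config phi4/evaluation

output_dir: ./ # Not needed

model:
  _component_: torchtune.models.phi4.phi4_mini # builder name from this PR; may become a 14B name

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00006.safetensors,
    model-00002-of-00006.safetensors,
    model-00003-of-00006.safetensors,
    model-00004-of-00006.safetensors,
    model-00005-of-00006.safetensors,
    model-00006-of-00006.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI # arch unchanged from phi3; see the model_type discussion below
resume_from_checkpoint: False

tokenizer:
  _component_: torchtune.models.phi4.phi4_mini_tokenizer # builder name from this PR
  path: /tmp/phi-4/tokenizer.model
  max_seq_len: null

# (device, dtype, seed, and the EleutherAI eval args are unchanged from the draft above)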
New file: phi4 full finetune (distributed) config

@@ -0,0 +1,109 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using the Phi4 16K Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# Run this config on 4 GPUs using the following:
#   tune run --nproc_per_node 4 full_finetune_distributed --config phi4/mini_full
#
# You can add specific overrides through the command line. For example,
# to override the checkpointer directory while launching training
# you can run:
#   tune run --nproc_per_node 4 full_finetune_distributed --config phi4/mini_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# Single device full finetuning requires more memory optimizations. It's
# best to use mini_full_low_memory.yaml for those cases.

output_dir: /tmp/torchtune/phi4_mini/full # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
  _component_: torchtune.models.phi4.phi4_mini

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi4.phi4_mini_tokenizer
  path: /tmp/phi-4/tokenizer.model
  max_seq_len: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00006.safetensors,
    model-00002-of-00006.safetensors,
    model-00003-of-00006.safetensors,
    model-00004-of-00006.safetensors,
    model-00005-of-00006.safetensors,
    model-00006-of-00006.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI

Reviewer comment: n00b question: Are there any differences between PHI3 and PHI4? Even if there aren't, should we update the model_type for clarity? I believe this is used in the checkpointer to map the HF format to the torchtune format.

Reply: According to the tech report there are differences in the tokenizer and in the attention, but in a way that does not affect us. Some observations I made above might lead us to a different conclusion, though.

Reply: I am tempted to say that even if PHI3_MINI == PHI4_MINI, every model should have its own nomenclature, so there is less cognitive load for the user. @ebsmothers, what do you think?

Reply: For now I would stick with the precedent we've set, which is to only use a new model type when the arch changes. This is what we do for the Llama family, where we have …
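For context on that precedent: as far as I recall, the existing Llama configs reuse one model type across architecturally-compatible releases, roughly like the sketch below (paths and values from memory of the torchtune repo, so treat them as illustrative rather than authoritative):

# recipes/configs/llama3/8B_full.yaml (illustrative)
checkpointer:
  model_type: LLAMA3

# recipes/configs/llama3_1/8B_full.yaml (illustrative) — same arch, same mapping
checkpointer:
  model_type: LLAMA3

Since model_type only selects the HF-to-torchtune weight-key mapping in the checkpointer, a new value is warranted only when that mapping actually changes.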
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 8 # Use to increase effective batch size
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  lr: 5e-6
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory
optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: False # True reduces memory
dtype: bf16

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True

# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # Trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
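A quick note on the defaults above: the effective batch size is batch_size × gradient_accumulation_steps × number of GPUs, i.e. 2 × 8 × 4 = 64 samples per optimizer step. Any of these knobs can be overridden at launch with the tune CLI's key=value syntax; for example (hypothetical values, touching only keys defined in this config; note that packed datasets need a concrete tokenizer.max_seq_len):

tune run --nproc_per_node 4 full_finetune_distributed --config phi4/mini_full \
  compile=True \
  dataset.packed=True tokenizer.max_seq_len=4096 \
  gradient_accumulation_steps=4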
Reviewer comment: I think this was already the naming convention for Phi3, but we should probably add "single_device" to the config name.

Reply: Phi3 uses …
New file: phi4 full finetune low-memory (single device) config

@@ -0,0 +1,110 @@
# Config for single device full finetuning in full_finetune_single_device.py
# using the Phi4 16K Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# The default config uses an optimizer from bitsandbytes. If you do not have it installed,
# you can install it with
#   pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
#   tune run full_finetune_single_device --config phi4/mini_full_low_memory
#
# You can add specific overrides through the command line. For example,
# to override the checkpointer directory while launching training
# you can run:
#   tune run full_finetune_single_device --config phi4/mini_full_low_memory checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.

output_dir: /tmp/torchtune/phi4_mini/full_low_memory # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
  _component_: torchtune.models.phi4.phi4_mini

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi4.phi4_mini_tokenizer
  path: /tmp/phi-4/tokenizer.model
  max_seq_len: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00006.safetensors,
    model-00002-of-00006.safetensors,
    model-00003-of-00006.safetensors,
    model-00004-of-00006.safetensors,
    model-00005-of-00006.safetensors,
    model-00006-of-00006.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 1 # Use to increase effective batch size
optimizer:
  _component_: bitsandbytes.optim.PagedAdamW
  lr: 5e-6
optimizer_in_bwd: True # True saves memory. Requires gradient_accumulation_steps=1
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: True # True reduces memory
dtype: bf16

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True

# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # Trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
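Since this recipe pulls its optimizer from bitsandbytes, a quick sanity check before launching can save a failed run (standard pip/python, nothing torchtune-specific); without it, a missing install would only surface at optimizer instantiation:

pip install bitsandbytes
python -c "import bitsandbytes as bnb; print(bnb.__version__)"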
Reviewer comment: Make sure this matches the format of other directories.
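For reference, the layout this presumably needs to match, assuming the new configs mirror the existing phi3 convention under recipes/configs/ (the phi4 filenames are the ones used in this PR and may still change with the mini-vs-14B decision):

recipes/configs/
  phi3/
    evaluation.yaml
    mini_full.yaml
    mini_full_low_memory.yaml
    ...
  phi4/
    evaluation.yaml
    mini_full.yaml            # or 14B_full.yaml if size-based naming wins
    mini_full_low_memory.yaml # or 14B_full_low_memory.yaml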