Add Phi4 #2197

Merged · 51 commits · Feb 11, 2025

Commits
1a43259  Add Phi4 support (krammnic, Dec 21, 2024)
3630908  Add Phi4 (krammnic, Dec 21, 2024)
18f8bc5  fix names (krammnic, Jan 11, 2025)
e69a77c  More fixes. Able to do forward (krammnic, Jan 11, 2025)
a94b742  Update torchtune/models/phi4/_tokenizer.py (krammnic, Jan 14, 2025)
1d03294  Update torchtune/_recipe_registry.py (krammnic, Jan 14, 2025)
bdf478f  Update torchtune/models/phi4/_model_builders.py (krammnic, Jan 14, 2025)
78cd1e6  more fixes (Feb 2, 2025)
d8b2ea3  nit SPM -> TikToken (Feb 2, 2025)
3d55e55  fixed tokenizer + fix model loading problem (credits: ebsmothers) (Feb 8, 2025)
7ee22b6  remove useless comments (Feb 8, 2025)
e515f06  gpt2 tokenizer (Feb 8, 2025)
d1cae68  gpt2 tokenizer (Feb 8, 2025)
3c1780d  fixed configs (krammnic, Feb 8, 2025)
18c0033  fix docstring in tokenizer (krammnic, Feb 8, 2025)
fc1d2db  fix lint and docstrings (krammnic, Feb 8, 2025)
99a1ce5  fix lint and docstrings (krammnic, Feb 8, 2025)
ce626a4  cover gpt2 tokenizer with test (krammnic, Feb 8, 2025)
e3768ee  fix lint (krammnic, Feb 8, 2025)
c84c74c  fix phi4tokenizer tests (krammnic, Feb 8, 2025)
cbc5ca1  fix tests (krammnic, Feb 8, 2025)
dc64290  Update torchtune/models/phi4/_model_builders.py (krammnic, Feb 10, 2025)
46bede4  Update torchtune/models/phi4/_model_builders.py (krammnic, Feb 10, 2025)
cc36700  Update torchtune/modules/tokenizers/_gpt2.py (krammnic, Feb 10, 2025)
c9a483c  fix eval configs (Feb 10, 2025)
c1b6394  remove nnodes from configs (Feb 10, 2025)
47dd749  naming fixes (Feb 10, 2025)
146cac3  fix lint (Feb 10, 2025)
6e50261  fixes (Feb 10, 2025)
55d7ae0  fix test (Feb 10, 2025)
e7b43d6  phi4 -> phi4_14b (Feb 10, 2025)
b4de41d  resolve conflict (Feb 10, 2025)
4440768  resolve conflict (Feb 10, 2025)
d39e717  update __init__ (Feb 10, 2025)
54d477d  update __init__ (Feb 10, 2025)
0be4b8e  update __init__ (Feb 10, 2025)
ad8562e  Merge branch 'main' into main (ebsmothers, Feb 10, 2025)
518a769  add GPT2BaseTokenizer in transforms/tokenizers/__init__.py + fix lint (Feb 10, 2025)
e29aca6  fix imports (Feb 10, 2025)
d533355  fix __init__ and namings (Feb 10, 2025)
012f433  swap encode decode (Feb 11, 2025)
ebcd1d6  correct eval recipe (Feb 11, 2025)
d4435b0  fix docstring (Feb 11, 2025)
7f5ccd8  remove useless argument (Feb 11, 2025)
36eeaa8  nit: unk token (Feb 11, 2025)
af5a824  fixes tokenizer (Feb 11, 2025)
2002f50  fix gpt2tokenizer test (Feb 11, 2025)
01ac202  fix lora config (Feb 11, 2025)
6003044  renamings (Feb 11, 2025)
7aea0ca  fix phi4 drop eos + test (Feb 11, 2025)
4f38c14  recipe registry (Feb 11, 2025)
4 changes: 2 additions & 2 deletions recipes/configs/phi3/evaluation.yaml
Contributor:
folder is phi3, but args are phi4

Contributor (Author):
Yep, good point

Contributor:
Bumping this comment. Please go through both Phi-3 and Phi-4 eval files to make sure they contain the correct model references

@@ -12,7 +12,7 @@ model:

 # Checkpointer
 checkpointer:
   _component_: torchtune.training.FullModelHFCheckpointer
-  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
+  checkpoint_dir: /tmp/phi-3
   checkpoint_files: [
     model-00001-of-00002.safetensors,
     model-00002-of-00002.safetensors
@@ -25,7 +25,7 @@
 resume_from_checkpoint: False

 # Tokenizer
 tokenizer:
   _component_: torchtune.models.phi3.phi3_mini_tokenizer
-  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
+  path: /tmp/phi-3/tokenizer.model
   max_seq_len: null

 # Environment
45 changes: 45 additions & 0 deletions recipes/configs/phi4/evaluation.yaml
Contributor:
It seems that you copied this from phi3, but made the changes in phi3/evaluation instead of here

Contributor:
Yeah I think these two eval files need to be swapped

@@ -0,0 +1,45 @@
# Config for EleutherEvalRecipe in eleuther_eval.py
#
# To launch, run the following command:
# tune run eleuther_eval --config phi4/evaluation

output_dir: ./ # Not needed

# Model Arguments
model:
  _component_: torchtune.models.phi4.phi4_14b

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00002.safetensors,
    model-00002-of-00002.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI
Contributor:
/PHI3_MINI/PHI4_MINI

resume_from_checkpoint: False

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi4.phi4_14b_tokenizer
  vocab_path: /tmp/phi-4/vocab.json
  merges_path: /tmp/phi-4/merges.txt
  max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 1234 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["truthfulqa_mc2"]
limit: null
max_seq_length: 4096
batch_size: 8
enable_kv_cache: True

# Quantization specific args
quantizer: null
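
For reference, here is a minimal sketch of how the tokenizer builder this config points at might be used directly, assuming the builder signature shown in the YAML above and torchtune's usual encode/decode interface (the sample text and keyword arguments are illustrative, not taken from this PR):

# Hedged sketch: constructing the Phi-4 tokenizer the eval config references.
# Assumes the kwargs shown in the YAML (vocab_path, merges_path, max_seq_len)
# and torchtune's usual encode/decode interface.
from torchtune.models.phi4 import phi4_14b_tokenizer

tokenizer = phi4_14b_tokenizer(
    vocab_path="/tmp/phi-4/vocab.json",
    merges_path="/tmp/phi-4/merges.txt",
    max_seq_len=None,
)

# Round-trip a sample string through the tokenizer.
token_ids = tokenizer.encode("Hello, Phi-4!", add_bos=True, add_eos=True)
print(token_ids)
print(tokenizer.decode(token_ids))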
110 changes: 110 additions & 0 deletions recipes/configs/phi4/full.yaml
@@ -0,0 +1,110 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using a Phi-4 16K Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# Run this config on 4 GPUs using the following:
# tune run --nproc_per_node 4 full_finetune_distributed --config phi4/full
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nproc_per_node 4 full_finetune_distributed --config phi4/full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# Single device full finetuning requires more memory optimizations. It's
# best to use full_low_memory.yaml for those cases

output_dir: /tmp/torchtune/phi-4/full # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
  _component_: torchtune.models.phi4.phi4_14b

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi4.phi4_14b_tokenizer
  vocab_path: /tmp/phi-4/vocab.json
  merges_path: /tmp/phi-4/merges.txt
  max_seq_len: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00006.safetensors,
    model-00002-of-00006.safetensors,
    model-00003-of-00006.safetensors,
    model-00004-of-00006.safetensors,
    model-00005-of-00006.safetensors,
    model-00006-of-00006.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 8 # Use to increase effective batch size
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  lr: 5e-6
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory
optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: False # True reduces memory
dtype: bf16

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True


# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # Trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
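
The schedule comment above maps the config keys onto `torch.profiler.schedule` arguments. As a hedged sketch, the equivalent hand-written profiler setup would look roughly like this (the training-step body is a placeholder):

# Sketch of the torch.profiler setup the profiler block above describes,
# using the documented mapping wait_steps -> wait, warmup_steps -> warmup,
# active_steps -> active, num_cycles -> repeat.
from torch.profiler import ProfilerActivity, profile, schedule

prof_schedule = schedule(wait=5, warmup=3, active=2, repeat=1)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=prof_schedule,
    record_shapes=True,
    profile_memory=False,
    with_stack=False,
    with_flops=False,
) as prof:
    for step in range(10):  # wait + warmup + active = one 10-step cycle
        # ... run one training step here ...
        prof.step()  # advance the profiler schedule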
111 changes: 111 additions & 0 deletions recipes/configs/phi4/full_low_memory.yaml
@@ -0,0 +1,111 @@
# Config for single device full finetuning in full_finetune_single_device.py
# using a Phi-4 16K Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# The default config uses an optimizer from bitsandbytes. If you do not have it installed,
# you can install it with
# pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
# tune run full_finetune_single_device --config phi4/full_low_memory
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run full_finetune_single_device --config phi4/full_low_memory checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on single device.

output_dir: /tmp/torchtune/phi-4/full_low_memory # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
  _component_: torchtune.models.phi4.phi4_14b

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi4.phi4_14b_tokenizer
  vocab_path: /tmp/phi-4/vocab.json
  merges_path: /tmp/phi-4/merges.txt
  max_seq_len: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/phi-4
  checkpoint_files: [
    model-00001-of-00006.safetensors,
    model-00002-of-00006.safetensors,
    model-00003-of-00006.safetensors,
    model-00004-of-00006.safetensors,
    model-00005-of-00006.safetensors,
    model-00006-of-00006.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: PHI3_MINI
resume_from_checkpoint: False

# Dataset
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
  packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 1 # Use to increase effective batch size
optimizer:
  _component_: bitsandbytes.optim.PagedAdamW
  lr: 5e-6
optimizer_in_bwd: True # True saves memory. Requires gradient_accumulation_steps=1
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: True # True reduces memory
dtype: bf16

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True


# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # Trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
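
As a rough illustration of how the `_component_` fields in these configs become Python objects, here is a sketch using OmegaConf together with torchtune's `config.instantiate` utility, as torchtune recipes typically do (assuming that utility's usual behavior; the paths are the placeholders used in the config):

# Hedged sketch: instantiating objects from this config the way torchtune
# recipes typically do. config.instantiate builds the object named by
# _component_ and passes the remaining keys through as kwargs.
from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.load("recipes/configs/phi4/full_low_memory.yaml")

model = config.instantiate(cfg.model)          # -> torchtune.models.phi4.phi4_14b()
tokenizer = config.instantiate(cfg.tokenizer)  # -> phi4_14b_tokenizer(vocab_path=..., ...)
optimizer = config.instantiate(cfg.optimizer, model.parameters())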