Commit c07d6c3

Add experiment config
1 parent fdda7be commit c07d6c3

3 files changed: 67 additions, 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ make test-full
 ```bash
 uv run hf download songlab/gpn-animal-promoter-dataset --repo-type dataset --local-dir data/gpn-animal-promoter-dataset
 uv run hf download gonzalobenegas/Angiosperm_16_genomes_sharded --repo-type dataset --local-dir data/gonzalobenegas/Angiosperm_16_genomes_sharded
+uv run hf download gonzalobenegas/genomes-v2-genome_set-animals-intervals-v1_512_256 --repo-type dataset --local-dir data/gonzalobenegas/genomes-v2-genome_set-animals-intervals-v1_512_256
 ```
 
 ## How to run
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# @package _global_
+
+# to execute this experiment run:
+# python glm_experiments/train.py experiment=clm_transformer_base_new_dataset
+
+defaults:
+  - override /data: gpn_animal_promoter
+  - override /model: clm_transformer_base
+  - override /trainer: gpn_animal_promoter
+
+logger:
+  wandb:
+    name: experiment-clm-transformer-base-new-dataset
+    tags: ["experiment", "clm", "transformer", "base", "new-dataset"]
+
+data:
+  _target_: glm_experiments.data.lm_datamodule.CLMDataModule
+  dataset_name: data/gonzalobenegas/genomes-v2-genome_set-animals-intervals-v1_512_256
+  per_device_batch_size: 256
+  data_augmentation: false
+
+model:
+  scheduler:
+    _target_: transformers.get_cosine_with_min_lr_schedule_with_warmup
+    _partial_: true
+    num_warmup_steps: 2000
+    num_training_steps: ${trainer.max_steps}
+    min_lr_rate: 0.1 # Decay to 10% of max lr
+
+trainer:
+  max_steps: 20000
+  log_every_n_steps: 1000
+  val_check_interval: 1000
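
Review note: the `model.scheduler` block above asks for linear warmup over 2000 steps followed by cosine decay to 10% of the peak learning rate at `trainer.max_steps` (20000). Because `_partial_: true` is set, Hydra builds a `functools.partial`, so the optimizer is expected to be supplied later, presumably by the model's optimizer setup. The following is a minimal pure-Python sketch of the multiplier this schedule should produce at a given step, assuming a single half-cosine cycle. `lr_multiplier` is a hypothetical helper written for illustration, not the `transformers` implementation, and the clamp after the last step is an assumption added here.

```python
import math


def lr_multiplier(step, num_warmup_steps=2000, num_training_steps=20000, min_lr_rate=0.1):
    """Fraction of the peak learning rate at a given optimizer step (illustrative only)."""
    # Linear warmup from 0 to 1 over the first num_warmup_steps steps.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # Progress through the decay phase, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    progress = min(progress, 1.0)  # assumption: hold the floor after the final step
    # Half-cosine from 1 down to 0, then rescaled so the floor is min_lr_rate.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr_rate + (1.0 - min_lr_rate) * cosine


if __name__ == "__main__":
    # Steps chosen to line up with val_check_interval: 1000 in the config.
    for step in (0, 1000, 2000, 11000, 20000):
        print(step, round(lr_multiplier(step), 4))
```

Under these assumptions the multiplier reaches 1.0 exactly when warmup ends and bottoms out at `min_lr_rate` on the last training step, which matches the inline comment "Decay to 10% of max lr".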
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# @package _global_
+
+# to execute this experiment run:
+# python glm_experiments/train.py experiment=clm_transformer_base_new_dataset
+
+defaults:
+  - override /data: gpn_animal_promoter
+  - override /model: clm_transformer_base
+  - override /trainer: gpn_animal_promoter
+
+logger:
+  wandb:
+    name: experiment-clm-transformer-base-new-dataset
+    tags: ["experiment", "clm", "transformer", "base", "new-dataset", "vertebrates"]
+
+data:
+  _target_: glm_experiments.data.lm_datamodule.CLMDataModule
+  dataset_name: data/gonzalobenegas/genomes-v2-genome_set-vertebrates-intervals-v1_512_256
+  per_device_batch_size: 256
+  data_augmentation: false
+
+model:
+  scheduler:
+    _target_: transformers.get_cosine_with_min_lr_schedule_with_warmup
+    _partial_: true
+    num_warmup_steps: 2000
+    num_training_steps: ${trainer.max_steps}
+    min_lr_rate: 0.1 # Decay to 10% of max lr
+
+trainer:
+  max_steps: 20000
+  log_every_n_steps: 1000
+  val_check_interval: 1000
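
Review note: the two experiment configs added in this commit are nearly identical. A quick way to confirm what differs is to flatten each config to dotted keys and compare. The sketch below hand-copies only the `logger` and `data` sections from the two files (the remaining sections are the same in both), and `flatten` and `changed_keys` are hypothetical helpers written for this note, not part of the repository.

```python
_MISSING = object()  # sentinel so an absent key differs from an explicit None


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {'a.b.c': value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_keys(a, b):
    """Return the sorted dotted keys whose values differ between two configs."""
    fa, fb = flatten(a), flatten(b)
    return sorted(k for k in fa.keys() | fb.keys()
                  if fa.get(k, _MISSING) != fb.get(k, _MISSING))


# Hand-copied subsets of the two configs in this commit.
animals = {
    "logger": {"wandb": {
        "name": "experiment-clm-transformer-base-new-dataset",
        "tags": ["experiment", "clm", "transformer", "base", "new-dataset"],
    }},
    "data": {
        "dataset_name": "data/gonzalobenegas/genomes-v2-genome_set-animals-intervals-v1_512_256",
        "per_device_batch_size": 256,
    },
}
vertebrates = {
    "logger": {"wandb": {
        "name": "experiment-clm-transformer-base-new-dataset",
        "tags": ["experiment", "clm", "transformer", "base", "new-dataset", "vertebrates"],
    }},
    "data": {
        "dataset_name": "data/gonzalobenegas/genomes-v2-genome_set-vertebrates-intervals-v1_512_256",
        "per_device_batch_size": 256,
    },
}

if __name__ == "__main__":
    print(changed_keys(animals, vertebrates))
```

Only `data.dataset_name` and `logger.wandb.tags` differ. `logger.wandb.name` and the `experiment=` value in the header comment are the same in both files, so the two runs would share a W&B run name unless one is overridden at launch.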
