.github/workflows/.cpu_ci_on_pr.yml (+3 -1)
@@ -1,3 +1,5 @@
+# This file is hidden (.cpu_ci_on_pr.yml) to minimize the number of runner minutes consumed.
+
 name: "Pull Request CPU Tests"
 
 on:
@@ -7,7 +9,7 @@ on:
 
 jobs:
   run-tests:
-    runs-on: [ 'test', 'self-hosted' ]
+    runs-on: ubuntu-22.04 # ubuntu-latest currently points to ubuntu-22.04, but 24.04 is in beta; recommend testing on 24.04 and then changing, instead of using ubuntu-latest
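With this change applied, the top of the workflow reads approximately as follows. This is a sketch of the merged result: only the lines visible in the diff are certain, the trigger block is elided, and the indentation of `run-tests` is assumed.

```yaml
# This file is hidden (.cpu_ci_on_pr.yml) to minimize the number of runner minutes consumed.

name: "Pull Request CPU Tests"

# ...trigger configuration not shown in the diff...

jobs:
  run-tests:
    runs-on: ubuntu-22.04  # pinned instead of ubuntu-latest, per the comment in the diff
```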
README.md (+48 -16)
@@ -15,9 +15,21 @@ GPT-NeoX leverages many of the same features and technologies as the popular Meg
 * Cutting edge architectural innovations including rotary and alibi positional embeddings, parallel feedforward attention layers, and flash attention.
 * Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 \& 2
 * Curriculum Learning
-* Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, logging via [WandB](https://wandb.ai/site), and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
+* Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, experiment monitoring via [WandB](https://wandb.ai/site)/[Comet](https://www.comet.com/site/)/TensorBoard, and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
 
 ## News
+**[9/9/2024]** We now support preference learning via [DPO](https://arxiv.org/abs/2305.18290), [KTO](https://arxiv.org/abs/2402.01306), and reward modeling
+
+**[9/9/2024]** We now support integration with [Comet ML](https://www.comet.com/site/), a machine learning monitoring platform
+
+**[5/21/2024]** We now support [RWKV](https://www.rwkv.com/) with pipeline parallelism! See the PRs for [RWKV](https://github.com/EleutherAI/gpt-neox/pull/1198) and [RWKV+pipeline](https://github.com/EleutherAI/gpt-neox/pull/1221)
+
+**[3/21/2024]** We now support Mixture-of-Experts (MoE)
+
+**[3/17/2024]** We now support AMD MI250X GPUs
+
+**[3/15/2024]** We now support [Mamba](https://github.com/state-spaces/mamba) with tensor parallelism! See [the PR](https://github.com/EleutherAI/gpt-neox/pull/1184)
+
 **[8/10/2023]** We now support checkpointing with AWS S3! Activate with the `s3_path` config option (for more detail, see [the PR](https://github.com/EleutherAI/gpt-neox/pull/1010))
 
 **[9/20/2023]** As of https://github.com/EleutherAI/gpt-neox/pull/1035, we have deprecated Flash Attention 0.x and 1.x, and migrated support to Flash Attention 2.x. We don't believe this will cause problems, but if you have a specific use-case that requires old flash support using the latest GPT-NeoX, please raise an issue.
@@ -88,14 +100,15 @@ Prior to 3/9/2023, GPT-NeoX relied on [DeeperSpeed](https://github.com/EleutherA
 
 ### Host Setup
 
-First make sure you are in an environment with Python 3.8 with an appropriate version of PyTorch 1.8 or later installed. **Note:** Some of the libraries that GPT-NeoX depends on have not been updated to be compatible with Python 3.10+. Python 3.9 appears to work, but this codebase has been developed and tested for Python 3.8.
+This codebase has primarily been developed and tested for Python 3.8-3.10 and PyTorch 1.8-2.0. This is not a strict requirement, and other versions and combinations of libraries may work.
 
 To install the remaining basic dependencies, run:
 
 ```bash
 pip install -r requirements/requirements.txt
 pip install -r requirements/requirements-wandb.txt # optional, if logging using WandB
 pip install -r requirements/requirements-tensorboard.txt # optional, if logging via tensorboard
+pip install -r requirements/requirements-comet.txt # optional, if logging via Comet
 ```
 
 from the repository root.
@@ -294,7 +307,7 @@ You can then run any job you want from inside the container.
 Concerns when running for a long time or in detached mode include
 - You will have to terminate the container manually when you are no longer using it
 - If you want processes to continue running when your shell session ends, you will need to background them.
-- If you then want logging, you will have to make sure to pipe logs to disk or set up wandb.
+- If you then want logging, you will have to make sure to pipe logs to disk, and set up wandb and/or Comet logging.
 
 If you prefer to run the prebuilt container image from dockerhub, you can run the docker compose commands with ```-f docker-compose-dockerhub.yml``` instead, e.g.,
@@ -457,7 +470,7 @@ You can pass in an arbitrary number of configs which will all be merged at runti
 
 You can also optionally pass in a config prefix, which will assume all your configs are in the same folder and append that prefix to their path.
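The merge-at-runtime behavior described above can be sketched as follows. This is a hypothetical illustration, not the actual GPT-NeoX config loader: the `deep_merge` helper and the sample keys are invented for the example.

```python
# Hypothetical sketch of merging several config dicts, with later
# configs overriding earlier ones (NOT the actual NeoX implementation).
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; `override` wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = value  # scalar or new key: take the override
    return merged

# e.g. a model config merged with a machine-local config
base = {"train_batch_size": 32, "optimizer": {"type": "Adam", "lr": 1e-4}}
local = {"optimizer": {"lr": 2e-4}}
config = deep_merge(base, local)
print(config)  # the nested "lr" is overridden, everything else is kept
```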
@@ -574,15 +587,28 @@ To convert from a Hugging Face model into a NeoX-loadable, run `tools/ckpts/conv
 
 # Monitoring
 
-In addition to storing logs locally, we provide built-in support for two popular experiment monitoring frameworks: [Weights & Biases](https://wandb.ai/site) and [TensorBoard](https://www.tensorflow.org/tensorboard/)
+In addition to storing logs locally, we provide built-in support for three popular experiment monitoring frameworks: [Weights & Biases](https://wandb.ai/site), [TensorBoard](https://www.tensorflow.org/tensorboard/), and [Comet](https://www.comet.com/site)
 
 ## Weights and Biases
 
-EleutherAI is currently using [Weights & Biases to record our experiments](https://wandb.ai/eleutherai/neox). If you are logged into Weights & Biases on your machine—you can do this by executing `wandb login`—your runs will automatically be recorded. There are two optional fields associated with Weights & Biases: <code><var>wandb_group</var></code> allows you to name the run group and <code><var>wandb_team</var></code> allows you to assign your runs to an organization or team account.
+[Weights & Biases](https://wandb.ai/eleutherai/neox) is a machine learning monitoring platform. To use wandb to monitor your gpt-neox experiments:
+1. Create an account at https://wandb.ai/site to generate your API key.
+2. Log into Weights & Biases on your machine by executing `wandb login`; your runs will then be recorded automatically.
+3. Dependencies required for wandb monitoring can be found in and installed from `./requirements/requirements-wandb.txt`.
+4. There are two optional fields associated with Weights & Biases: <code><var>wandb_group</var></code> allows you to name the run group and <code><var>wandb_team</var></code> allows you to assign your runs to an organization or team account. An example config is provided in `./configs/local_setup_wandb.yml`.
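The optional fields from step 4 might be set in a config fragment like the following. This is a sketch only: `use_wandb` is assumed to be the enable flag, and the group/team values are placeholders; the maintained example lives in `./configs/local_setup_wandb.yml`.

```yaml
# Sketch of a wandb config fragment; "use_wandb" is an assumed enable flag,
# values are placeholders (see ./configs/local_setup_wandb.yml).
{
  "use_wandb": true,
  "wandb_group": "my-experiment-group",
  "wandb_team": "my-org"
}
```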
 
 ## TensorBoard
 
-We also support using TensorBoard via the <code><var>tensorboard-dir</var></code> field. Dependencies required for TensorBoard monitoring can be found in and installed from `./requirements/requirements-tensorboard.txt`.
+We support using TensorBoard via the <code><var>tensorboard-dir</var></code> field. Dependencies required for TensorBoard monitoring can be found in and installed from `./requirements/requirements-tensorboard.txt`.
+
+## Comet
+
+[Comet](https://www.comet.com/site) is a machine learning monitoring platform. To use Comet to monitor your gpt-neox experiments:
+1. Create an account at https://www.comet.com/login to generate your API key.
+2. Once generated, link your API key at runtime by running `comet login` or by setting `export COMET_API_KEY=<your-key-here>`.
+3. Install `comet_ml` and its dependency libraries via `pip install -r requirements/requirements-comet.txt`.
+4. Enable Comet with `use_comet: True`. You can also customize where data is logged with `comet_workspace` and `comet_project`. A full example config with Comet enabled is provided in `configs/local_setup_comet.yml`.
+5. Run your experiment, and monitor metrics in the Comet workspace that you passed!
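The Comet-related fields from step 4 might look like this as a config fragment. The workspace and project values are placeholders; `configs/local_setup_comet.yml` is the full example referenced above.

```yaml
# Sketch of the Comet fields from step 4; values are placeholders
# (see configs/local_setup_comet.yml for the maintained example).
{
  "use_comet": true,
  "comet_workspace": "my-workspace",
  "comet_project": "gpt-neox-runs"
}
```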
 
 # Running on multi-node
@@ -594,7 +620,9 @@ We support profiling with Nsight Systems, the PyTorch Profiler, and PyTorch Memo
 
 ## Nsight Systems Profiling
 
-To use the Nsight Systems profiling, set config options `profile`, `profile_step_start`, and `profile_step_stop`. Launch training with:
+To use Nsight Systems profiling, set config options `profile`, `profile_step_start`, and `profile_step_stop` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
 The generated output file can then be viewed with the Nsight Systems GUI:
 
-![nsight-prof](images/nsight_profiling.png)
+![nsight-prof](images/nsight_profiling.png)
 
 ## PyTorch Profiling
 
-To use the built-in PyTorch profiler, set config options `profile`, `profile_step_start`, and `profile_step_stop`.
+To use the built-in PyTorch profiler, set config options `profile`, `profile_step_start`, and `profile_step_stop` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
 
 The PyTorch profiler will save traces to your `tensorboard` log directory. You can view these traces within
 TensorBoard by following the steps [here](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html).
 
-![torch-prof](images/torch_profiling.png)
+![torch-prof](images/torch_profiling.png)
 
 ## PyTorch Memory Profiling
 
-To use PyTorch Memory Profiling, set config options `memory_profiling` and `memory_profiling_path`.
+To use PyTorch Memory Profiling, set config options `memory_profiling` and `memory_profiling_path` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
 
-![mem-prof](images/memory_profiling.png)
+![mem-prof](images/memory_profiling.png)
 
 View the generated profile with the [memory_viz.py](https://github.com/pytorch/pytorch/blob/main/torch/cuda/_memory_viz.py) script. Run with:
@@ -677,7 +705,7 @@ The following publications by other research groups use this library:
 The following models were trained using this library:
 
 ### English LLMs
-- EleutherAI's [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Pythia (70M through 13B)](https://github.com/EleutherAI/pythia), and [LLeMMA (34B)](https://arxiv.org/abs/2310.10631)
+- EleutherAI's [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b) and [Pythia (70M through 13B)](https://github.com/EleutherAI/pythia)
 - Rinna Co.'s [japanese-gpt-neox-3.6b](https://huggingface.co/rinna/japanese-gpt-neox-3.6b) (Japanese) and [bilingual-gpt-neox-4b](https://huggingface.co/rinna/bilingual-gpt-neox-4b) (English / Japanese)
 - CyberAgent's [Open-CLM (125M through 7B)](https://huggingface.co/cyberagent/open-calm-7b) (Japanese)
 - The Hungarian Research Centre for Linguistics's [PULI GPTrio (6.7B)](https://huggingface.co/NYTK/PULI-GPTrio) (Hungarian / English / Chinese)
 - The University of Tokyo's [weblab-10b](https://huggingface.co/Kojima777/weblab-10b) and [weblab-10b-instruct](https://huggingface.co/Kojima777/weblab-10b-instruction-sft) (Japanese)