This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

support pytorch lightning 1.7 #196

Open: wants to merge 38 commits into base: main

Commits (38 total; the diff below shows changes from 9 of the commits)
d9e6ea7  update lightning (JiahaoYao, Aug 11, 2022)
5992c73  change gpu to cuda (JiahaoYao, Aug 12, 2022)
63bb622  accelerator is successfully removed (JiahaoYao, Aug 12, 2022)
61a3775  remove the accelerator (JiahaoYao, Aug 12, 2022)
db2f6ca  progressive bar (JiahaoYao, Aug 12, 2022)
282b993  adding the transform (JiahaoYao, Aug 12, 2022)
fad6be9  change the max step to be -1 (JiahaoYao, Aug 12, 2022)
868688d  fix ci (JiahaoYao, Aug 12, 2022)
612d08f  checkpoint (JiahaoYao, Aug 12, 2022)
302f047  checkpoint_callback (JiahaoYao, Aug 12, 2022)
5ba8227  adding the version (JiahaoYao, Aug 13, 2022)
878c4b9  fix 'MNISTDataModule' object has no attribute 'train_transforms' issue (JiahaoYao, Aug 15, 2022)
7eb7c83  fix the issue (JiahaoYao, Aug 15, 2022)
d063b41  nit (JiahaoYao, Aug 15, 2022)
bb5f8ff  remove progress bar (JiahaoYao, Aug 15, 2022)
966d819  remove accelerator (JiahaoYao, Aug 15, 2022)
b3bc93d  adding remote back (JiahaoYao, Aug 15, 2022)
c3f5ce8  update bolts (JiahaoYao, Aug 15, 2022)
e6e3817  kit start (JiahaoYao, Aug 16, 2022)
d15f24c  split the testing (JiahaoYao, Aug 18, 2022)
8bfdb20  Merge remote-tracking branch 'upstream/main' into rlt_1.7_0811 (JiahaoYao, Aug 18, 2022)
6bb1acd  put it back (JiahaoYao, Aug 18, 2022)
d868524  fix the ci here (JiahaoYao, Aug 18, 2022)
f7ed645  test memory (JiahaoYao, Aug 18, 2022)
1c63f48  update the cpu number to 6 (JiahaoYao, Aug 18, 2022)
55e514e  adding the pip list (JiahaoYao, Aug 19, 2022)
06b84c9  adding this (JiahaoYao, Aug 19, 2022)
c607f0d  adding the debug (JiahaoYao, Aug 19, 2022)
645fed2  Merge remote-tracking branch 'upstream/main' into rlt_1.7_0811 (JiahaoYao, Sep 29, 2022)
81fb6a4  update the lightning version (JiahaoYao, Sep 29, 2022)
ecd9fac  rerun the ci test (JiahaoYao, Sep 29, 2022)
98ea680  adding the signature pack (JiahaoYao, Oct 3, 2022)
fc630a6  new line (JiahaoYao, Oct 3, 2022)
b0d4cd1  import lib breaks due to https://stackoverflow.com/questions/7392956… (JiahaoYao, Oct 3, 2022)
665e8fd  fix the lint (JiahaoYao, Oct 3, 2022)
3868cff  switch (JiahaoYao, Oct 3, 2022)
a277600  import failure (JiahaoYao, Oct 3, 2022)
c08a8f5  nit (JiahaoYao, Oct 3, 2022)
README.md (2 changes: 1 addition & 1 deletion)
@@ -93,7 +93,7 @@ ray.init("ray://<head_node_host>:10001")
 ```
 Now you can run your training script on the laptop, but have it execute as if your laptop has all the resources of the cluster essentially providing you with an **infinite laptop**.

-**Note:** When using with Ray Client, you must disable checkpointing and logging for your Trainer by setting `checkpoint_callback` and `logger` to `False`.
+**Note:** When using with Ray Client, you must disable checkpointing and logging for your Trainer by setting `enable_checkpointing` and `logger` to `False`.

 ## Horovod Strategy on Ray
 Or if you prefer to use Horovod as the distributed training protocol, use the `HorovodRayStrategy` instead.
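For context, a minimal sketch of what the updated note implies on PL 1.7: the old `checkpoint_callback` Trainer argument is gone, and `enable_checkpointing` takes its place. The worker counts and `MyLightningModule` here are placeholders, not values from this PR.

```python
import ray
import pytorch_lightning as pl
from ray_lightning import RayStrategy

ray.init("ray://<head_node_host>:10001")  # connect through Ray Client

# With Ray Client, checkpointing and logging must both be disabled.
trainer = pl.Trainer(
    max_epochs=10,
    enable_checkpointing=False,  # replaces `checkpoint_callback` in PL 1.7
    logger=False,
    strategy=RayStrategy(num_workers=4, use_gpu=False),
)
# trainer.fit(MyLightningModule())  # hypothetical module
```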
ray_lightning/accelerators/delayed_gpu_accelerator.py (4 changes: 2 additions & 2 deletions)
@@ -16,10 +16,10 @@
 import torch

 from pytorch_lightning.accelerators import Accelerator,\
-    GPUAccelerator
+    CUDAAccelerator


-class _GPUAccelerator(GPUAccelerator):
+class _GPUAccelerator(CUDAAccelerator):
     """Accelerator for GPU devices.

     adapted from:
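The rename tracks upstream: PL 1.7 introduced `CUDAAccelerator` as the new name for `GPUAccelerator`. A hedged compatibility sketch (not part of this PR) that subclasses whichever name is importable:

```python
# Compatibility shim sketch: prefer the PL 1.7 name, fall back to the
# pre-1.7 name. Assumes only that one of the two imports succeeds.
try:
    from pytorch_lightning.accelerators import CUDAAccelerator as _BaseAccelerator
except ImportError:  # pytorch-lightning < 1.7
    from pytorch_lightning.accelerators import GPUAccelerator as _BaseAccelerator


class _GPUAccelerator(_BaseAccelerator):
    """Delayed GPU accelerator built on whichever base class is available."""
```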
ray_lightning/examples/ray_ddp_sharded_example.py (4 changes: 3 additions & 1 deletion)
@@ -57,6 +57,8 @@ def download_data():
         num_workers=num_workers, use_gpu=use_gpu, init_hook=download_data)

     dm = MNISTDataModule(data_dir, batch_size=batch_size)
+    dm.train_transforms = None
+    dm.val_transforms = None

     model = ImageGPT(
         embed_dim=embed_dim, layers=16, heads=4, vocab_size=32, num_pixels=28)
@@ -130,4 +132,4 @@ def download_data():
         batch_size=args.batch_size,
         embed_dim=args.embed_dim,
         max_epochs=args.num_epochs,
-        max_steps=None)
+        max_steps=-1)
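Two PL 1.7 migrations appear in this file. First, `LightningDataModule` no longer carries the `train_transforms`/`val_transforms` properties (the source of the AttributeError fixed in commit 878c4b9), so the example sets them as plain attributes. Second, `Trainer` now rejects `max_steps=None`; `-1` is the sentinel for "no step limit". A small sketch, assuming the bolts `MNISTDataModule`:

```python
import pytorch_lightning as pl
from pl_bolts.datamodules import MNISTDataModule

dm = MNISTDataModule("./data", batch_size=32)
dm.train_transforms = None  # plain attributes now; the old PL properties are gone
dm.val_transforms = None

trainer = pl.Trainer(max_epochs=5, max_steps=-1)  # -1, not None, means unlimited steps
```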
ray_lightning/examples/ray_ddp_tune.py (4 changes: 3 additions & 1 deletion)
@@ -31,11 +31,13 @@ def download_data():
     trainer = pl.Trainer(
         max_epochs=num_epochs,
         callbacks=callbacks,
-        progress_bar_refresh_rate=0,
+        enable_progress_bar=False,
         strategy=RayStrategy(
             num_workers=num_workers, use_gpu=use_gpu, init_hook=download_data))
     dm = MNISTDataModule(
         data_dir=data_dir, num_workers=1, batch_size=config["batch_size"])
+    dm.train_transforms = None
+    dm.val_transforms = None
     trainer.fit(model, dm)


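The same pattern applies to the progress bar: `progress_bar_refresh_rate` was removed from `Trainer` in PL 1.7. Disabling the bar is now the boolean `enable_progress_bar`, and a custom refresh rate moves onto the `TQDMProgressBar` callback. A brief sketch of both forms:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import TQDMProgressBar

silent_trainer = pl.Trainer(enable_progress_bar=False)  # old: progress_bar_refresh_rate=0
custom_trainer = pl.Trainer(callbacks=[TQDMProgressBar(refresh_rate=20)])
```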
ray_lightning/ray_ddp.py (5 changes: 0 additions & 5 deletions)
@@ -283,8 +283,3 @@ def distributed_sampler_kwargs(self):
     def _is_single_process_single_device(self):
         """Return True if the process is single process and single device."""
         return True
-
-    def teardown(self) -> None:
-        """Teardown the workers and pytorch DDP connections."""
-        self.accelerator = None
-        super().teardown()
ray_lightning/ray_horovod.py (1 change: 0 additions & 1 deletion)
@@ -127,7 +127,6 @@ def world_size(self) -> int:
     def teardown(self) -> None:
         """Teardown the strategy."""
         self.join()
-        self.accelerator = None
         super().teardown()

     @property
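Both teardown changes drop the `self.accelerator = None` assignment. A plausible reason (inferred, not stated in the PR): in PL 1.7 the base `Strategy.teardown()` asserts the accelerator is still attached and tears it down itself, so clearing the reference before calling `super().teardown()` would fail. A sketch of the resulting override shape, with a hypothetical strategy name:

```python
from pytorch_lightning.strategies import Strategy


class _SketchRayStrategy(Strategy):  # hypothetical; abstract methods omitted
    def teardown(self) -> None:
        """Run strategy-specific cleanup, then defer to the base class."""
        # e.g. join workers here, as ray_horovod.py does with self.join()
        super().teardown()  # PL 1.7 base class tears down the accelerator itself
```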
requirements-test.txt (2 changes: 1 addition & 1 deletion)
@@ -4,7 +4,7 @@ flake8-comprehensions
 flake8-quotes
 yapf==0.23.0
 pytest
-pytorch-lightning==1.6.4
+pytorch-lightning==1.7.1
 lightning-bolts==0.3.3
 ray[tune]
 torch==1.12.0
setup.py (4 changes: 2 additions & 2 deletions)
@@ -3,10 +3,10 @@
 setup(
     name="ray_lightning",
     packages=find_packages(where=".", include="ray_lightning*"),
-    version="0.3.0",
+    version="0.4.0",
     author="Ray Team",
     description="Ray distributed strategies for Pytorch Lightning.",
     long_description="Custom Pytorch Lightning distributed strategies "
     "built on top of distributed computing framework Ray.",
     url="https://github.com/ray-project/ray_lightning_accelerators",
-    install_requires=["pytorch-lightning==1.6.*", "ray"])
+    install_requires=["pytorch-lightning==1.7.*", "ray"])