[ROCm] add support for ROCm/HIP device #6086

jeffdaily · 2023-09-08T16:21:34Z

To build for ROCm:

./helpers/hipify.sh
mkdir build
cd build
cmake -DUSE_ROCM=1 ..

CUDA source files are hipified in-place using the helper script before running cmake. The "cuda" device is re-used for rocm, so device=cuda will work the same for rocm builds.

Summary of changes:

CMakeLists.txt ROCm updates, also replace glob with explicit file list
support both warpSize 32 and 64
helpers/hipify.sh script added
.gitignore to ignore generated hip source files *.prehip
disable compiler warnings
move __device__ template function PercentileDevice into header
bug fixes for __host__ __define__

- CMakeLists.txt ROCm updates, also replace glob with explicit file list - initial warpSize interop changes - helpers/hipify.sh script added - .gitignore to ignore generated hip source files

- disable compiler warnings - move PercentileDevice __device__ template function into header - bug fixes for __host__ __define__ and __HIP__ preprocessor symbols

jameslamb

Thanks for your interest in LightGBM. Since I'm not aware of any prior conversation in this project about adding support like this, we have some questions before spending time supporting this.

what is ROCm/HIP? Where can we read to learn more?
what is the value of this addition to LightGBM's users? What does this offer that the OpenCL-based and CUDA-based builds of LightGBM don't already offer?
- this project's OpenCL-based GPU build is already struggling from a severe lack of maintenance... I'm very skeptical of taking on a third GPU build
how might we test this? What types of devices should we expect to be supported?

shiyu1994 · 2023-09-08T16:52:52Z

@jeffdaily Thank you, this is very exciting! @jameslamb ROCm is the counterpart of CUDA for AMD GPU. I don't have any prior discussion with @jeffdaily about this. But it is very exciting if we can enlarge the devices supported by LightGBM.

jeffdaily · 2023-09-08T17:22:59Z

Apologies for coming out of nowhere with this. We use LightGBM; the OpenCL-based 'gpu' device already works on our AMD GPUs. But we were curious if we could get better performance if we ported the 'cuda' device to AMD GPUs. This started as a proof of concept, but it seemed useful to share even in its current state.

Using the GPU-Tutorial, here are my results on our MI210.

what is evaluated	CPU	GPU/OpenCL	"cuda" but really ROCm
correctness	auc : 0.821268 18.547533 seconds	auc : 0.821268 20.386780 seconds	auc : 0.821268 9.049307 seconds
speed objective=binary metric=auc	22.604444 seconds	18.028674 seconds	7.787303 seconds
speed objective=regression_l2 metric=l2	18.961535 seconds	14.491217 seconds	7.871302 seconds

jeffdaily · 2023-09-08T17:55:11Z

what is ROCm/HIP? Where can we read to learn more?

https://rocm.docs.amd.com/en/latest/rocm.html

what is the value of this addition to LightGBM's users? What does this offer that the OpenCL-based and CUDA-based builds of LightGBM don't already offer?

See the perf results from the comment above.

this project's OpenCL-based GPU build is already struggling from a severe lack of maintenance... I'm very skeptical of taking on a third GPU build

how might we test this? What types of devices should we expect to be supported?

Here is the current list of supported AMD GPUs.

To test this, you'll need to run on one of the supported AMD GPUs. How is the cuda device currently tested?

ibustany · 2023-09-08T19:20:35Z

Thank you and kudos Jeff!
This work has been much needed!
Best regards,
Ismail

jameslamb · 2023-09-08T20:03:09Z

To test this, you'll need to run on one of the supported AMD GPUs. How is the cuda device currently tested?

We run a VM in Azure with a Tesla V100 on it, and schedule jobs onto it via GitHub Actions.

example build link: https://github.com/microsoft/LightGBM/actions/runs/6123938185/job/16622920873#step:5:34-51
configuration:

LightGBM/.github/workflows/cuda.yml

Line 25 in 04b66e0

runs-on: [self-hosted, linux]

Are you aware of any free CI service supporting AMD GPUs? Otherwise, since I see you work for AMD and since merging this might further AMD's interests... would AMD maybe be willing to fund testing resources for this project? Maybe that's something you and @shiyu1994 (the only maintainer here who's employed by Microsoft) could coordinate?

jeffdaily · 2023-09-08T20:09:14Z

Are you aware of any free CI service supporting AMD GPUs? Otherwise, since I see you work for AMD and since merging this might further AMD's interests... would AMD maybe be willing to fund testing resources for this project? Maybe that's something you and @shiyu1994 (the only maintainer here who's employed by Microsoft) could coordinate?

Microsoft does have an AMD GPU deployment. I'm aware of it being used for onnxruntime CI purposes. I wonder if some of those resources could be used here? @shiyu1994?

jeffdaily · 2023-09-08T22:45:42Z

Noting that the only CI failure currently is not related to my changes. It seems to be a perhaps temporary environment setup issue for that job.

shiyu1994 · 2023-09-13T03:34:28Z

I have access to some AMD MI100 GPUs. But we still need separate budget for an agent with an AMD GPU if we want to test automatically in ci. Do you think it is acceptable if I run the tests for AMD GPU offline without an additional agent for ci? Given that the code for GPU version is shared by both CUDA and ROCm. @jameslamb @guolinke @jeffdaily.

jameslamb · 2023-09-13T14:10:08Z

Do you think it is acceptable if I run the tests for AMD GPU offline without an additional agent for ci?

If you feel confident in these changes based on that, and you think the added complexity in the CUDA code is worth it, that's fine with me. I'll defer to your opinion.

But without a CI job, there's a high risk that future refactorings will break this support again.

dismissing

jameslamb · 2023-09-13T14:12:53Z

I dismissed my review, so that it doesn't block merging. My initial questions have been answered, thanks very much for those links and all that information!

@shiyu1994 and @guolinke seem excited about this addition... that's good enough for me 😊

I'll defer to them to review the code, as I know very little about CUDA.

shiyu1994 · 2023-10-08T15:26:20Z

@jeffdaily Thanks for the great work! I'll review this in the next few days.

StrikerRUS

@shiyu1994 Thank you for considering my comments! I think that two of them were not addressed by your recent refactoring. Please check #6086 (comment) and #6086 (comment).

shiyu1994 · 2025-06-10T03:45:43Z

@shiyu1994 Thank you for considering my comments! I think that two of them were not addressed by your recent refactoring. Please check #6086 (comment) and #6086 (comment).

Thank you for the very careful check. I've done the fixes in 28d4648. Could you please review it again? @StrikerRUS

StrikerRUS

@shiyu1994

I've done the fixes in 28d4648.

I believe that right commit would be 6732b79.

I left my last suggestion for this PR below.

StrikerRUS · 2025-06-15T19:51:30Z

src/objective/cuda/cuda_rank_objective.cu

    __shared__ double shared_rho[SHARED_MEMORY_SIZE];
    // assert that warpSize == 32
-    __shared__ double shared_buffer[32];
+    __shared__ double shared_buffer[WARPSIZE];


I guess it should be 1024 / WARPSIZE similarly to L530 in this file.

Done via d4676d9.
Please check.
Thanks for the finding!

shiyu1994 · 2025-06-17T06:57:03Z

@StrikerRUS Thanks for your review. Could you please check this PR again? If there's no other problems, let's merge this.

shiyu1994 · 2025-06-17T09:16:10Z

/AzurePipelines run

StrikerRUS

@jeffdaily Thank you so much for proposing this PR!

And thanks a lot @shiyu1994 for finishing it!

StrikerRUS · 2025-06-22T19:20:46Z

@jameslamb Could you please refresh your blocking review?

#6086 (review)

StrikerRUS · 2025-07-14T17:35:01Z

Kindly ping @jameslamb for unblocking the merge of this PR

jameslamb · 2025-07-14T17:46:51Z

Sorry for the delay, I will try to look tonight (about 8 hours from now in my timezone).

Has the most recent version of this PR been tested on a machine with an AMD GPU? I know @shiyu1994 had mentioned doing that (#6086 (comment)) but I don't see any comments on this PR saying that anyone has tested this.

My plan for reviewing was to provision a machine from AWS with an AMD GPU and try cloning this branch, building, and testing. I am OK merging this without CI set up (as it seems that it will be difficult to do that), to make it easier for other people to test... but we should at least see the latest version build and pass tests manually once before this is merged.

getting out of the way

jameslamb · 2025-07-18T05:38:24Z

I'm very sorry, I haven't been able to test this and won't be able to for a while. I've removed my blocking review so @StrikerRUS you can merge this if you are confident enough in it.

Sorry for delaying this.

StrikerRUS · 2025-07-21T15:14:02Z

@jameslamb No problem at all!
I like your idea about manual testing. But unfortunately I don't have any access to AMD GPU and not able to get any cloud one.

jeffdaily · 2025-07-21T15:47:25Z

https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html

shiyu1994 · 2025-07-24T03:21:07Z

Sorry, I merged this before reading the above comments...
But given that the CI tests have been passed, this PR should not influence the current functionality of the repo.
Let's find a way to build a CI test for AMD version, maybe use the developer resource linked by @jeffdaily.

jameslamb · 2025-07-26T01:58:05Z

Ok thanks, yes please. It would be good to check that it at least minimally works before releasing. Otherwise I think it's very likely to be broken by future development.

StrikerRUS · 2025-07-28T12:11:40Z

@jeffdaily Thank a lot for the provided link! I registered there but unfortunately I'm not able to add funds to my account from Russia. Maybe someone else will be able to do this or we may try to reuse any AMD runner from another Microsoft open-source project.

Previously microsoft#6086 added ROCm support but after numerous rebases it lost critical changes. This PR restores the ROCm build. There are many source file changes but most were automated using the following: ```bash for f in `grep -rl '#ifdef USE_CUDA'` do sed -i 's@#ifdef USE_CUDA@#if defined(USE_CUDA) || defined(USE_ROCM)@g' $f done for f in `grep -rl '#endif // USE_CUDA'` do sed -i 's@#endif // USE_CUDA@#endif // USE_CUDA || USE_ROCM@g' $f done ```

jeffdaily added 4 commits September 7, 2023 20:48

[ROCm] add support for ROCm/HIP

01ff268

- CMakeLists.txt ROCm updates, also replace glob with explicit file list - initial warpSize interop changes - helpers/hipify.sh script added - .gitignore to ignore generated hip source files

more rocm updates

ac966a5

- disable compiler warnings - move PercentileDevice __device__ template function into header - bug fixes for __host__ __define__ and __HIP__ preprocessor symbols

more bug fixes

6ae6432

warp 32 vs 64 updates

5b87bcd

jeffdaily requested review from guolinke, jameslamb, jmoralez and shiyu1994 as code owners September 8, 2023 16:21

jameslamb previously requested changes Sep 8, 2023

View reviewed changes

jameslamb added in progress feature labels Sep 8, 2023

lint fixes

3ad89f8

missing device_index variable

7b8b6a0

accidental inclusion of hip headers

62aa30b

jeffdaily and others added 3 commits September 11, 2023 16:21

copyright notice compliance

bb27c55

Merge branch 'master' into rocm2

58ace9c

Merge branch 'master' into rocm2

0bc1cfb

Merge branch 'master' into rocm2

a7c9653

StrikerRUS requested changes Apr 13, 2025

View reviewed changes

shiyu1994 and others added 2 commits April 16, 2025 12:40

Merge branch 'master' into rocm2

c66b8f3

Merge branch 'master' into rocm2

8591248

StrikerRUS mentioned this pull request Apr 20, 2025

[CUDA] Multi-GPU for CUDA Version #6138

Open

shiyu1994 and others added 3 commits June 10, 2025 11:38

Merge branch 'master' into rocm2

8b1c95a

use WARPSIZE

6732b79

Merge branch 'rocm2' of https://github.com/jeffdaily/LightGBM into HEAD

28d4648

StrikerRUS reviewed Jun 15, 2025

View reviewed changes

shiyu1994 added 2 commits June 17, 2025 14:42

Merge branch 'master' into rocm2

9a19de6

fix share buffer size

d4676d9

shiyu1994 added a commit that referenced this pull request Jun 17, 2025

remove WARPSIZE before #6086 is merged

c29082e

StrikerRUS removed the awaiting review label Jun 22, 2025

StrikerRUS approved these changes Jun 22, 2025

View reviewed changes

jameslamb mentioned this pull request Jul 7, 2025

[RFC] new GPU issue labels #6962

Closed

shiyu1994 merged commit a0fde1b into microsoft:master Jul 24, 2025
51 checks passed

jeffdaily mentioned this pull request Sep 22, 2025

[ROCm] add ROCm support (pt. 2) #7039

Open

[ROCm] add support for ROCm/HIP device #6086

[ROCm] add support for ROCm/HIP device #6086

Uh oh!

Conversation

jeffdaily commented Sep 8, 2023

Uh oh!

jameslamb left a comment

Choose a reason for hiding this comment

Uh oh!

shiyu1994 commented Sep 8, 2023

Uh oh!

jeffdaily commented Sep 8, 2023

Uh oh!

jeffdaily commented Sep 8, 2023

Uh oh!

ibustany commented Sep 8, 2023

Uh oh!

jameslamb commented Sep 8, 2023

Uh oh!

jeffdaily commented Sep 8, 2023

Uh oh!

jeffdaily commented Sep 8, 2023

Uh oh!

shiyu1994 commented Sep 13, 2023

Uh oh!

jameslamb commented Sep 13, 2023

Uh oh!

jameslamb commented Sep 13, 2023

Uh oh!

shiyu1994 commented Oct 8, 2023

Uh oh!

StrikerRUS left a comment

Choose a reason for hiding this comment

Uh oh!

shiyu1994 commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

StrikerRUS left a comment

Choose a reason for hiding this comment

Uh oh!

StrikerRUS Jun 15, 2025

Choose a reason for hiding this comment

Uh oh!

shiyu1994 Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

shiyu1994 commented Jun 17, 2025

Uh oh!

shiyu1994 commented Jun 17, 2025

Uh oh!

StrikerRUS left a comment

Choose a reason for hiding this comment

Uh oh!

StrikerRUS commented Jun 22, 2025

Uh oh!

StrikerRUS commented Jul 14, 2025

Uh oh!

jameslamb commented Jul 14, 2025

Uh oh!

jameslamb commented Jul 18, 2025

Uh oh!

StrikerRUS commented Jul 21, 2025

Uh oh!

jeffdaily commented Jul 21, 2025

Uh oh!

Uh oh!

shiyu1994 commented Jul 24, 2025

Uh oh!

jameslamb commented Jul 26, 2025

Uh oh!

StrikerRUS commented Jul 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

shiyu1994 commented Jun 10, 2025 •

edited

Loading