
[CI] Enable Java test in CI workflow #805


Open: wants to merge 45 commits into branch-25.06 (base branch)


@rhdong (Member) commented Apr 3, 2025

No description provided.

@rhdong requested review from a team as code owners April 3, 2025 22:04
@rhdong requested a review from jameslamb April 3, 2025 22:04
github-actions bot added the ci label Apr 3, 2025
@rhdong added the feature request (New feature or request) and non-breaking (Introduces a non-breaking change) labels Apr 3, 2025
@rhdong requested a review from cjnolet April 3, 2025 22:04
@rhdong requested a review from raydouglass April 3, 2025 22:07
@jameslamb (Member)

@rhdong could you please put this PR into draft until you're ready for reviews? That'd reduce the notifications reviewers are getting, and help them understand when it's time to come review.

@rhdong (Member Author) commented Apr 4, 2025

> @rhdong could you please put this PR into draft until you're ready for reviews? That'd reduce the notifications reviewers are getting, and help them understand when it's time to come review.

Thanks for the reminder! I’ve marked the PR as draft now.

@rhdong closed this Apr 4, 2025
@rhdong reopened this Apr 4, 2025
@rhdong marked this pull request as draft April 4, 2025 15:02

copy-pr-bot (bot) commented Apr 4, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rhdong (Member Author) commented Apr 4, 2025

/ok to test

@rhdong (Member Author) commented Apr 7, 2025

/ok to test

@rhdong (Member Author) commented Apr 7, 2025

/ok to test

trap "EXITCODE=1" ERR
set +e

rapids-logger "Run Java build and tests"

Member commented on the lines above:

Does this actually run tests that require a GPU? If yes, can this be changed so that only the PR and test workflows run tests?

If it doesn't run tests, then it would be good to update this line and also to switch the node type for the build workflow from gpu-l4-latest-1 to a CPU runner.
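The reviewer's question amounts to whether one script should both build and test. As a minimal sketch only (not how the cuVS workflows are actually configured), a script could decide at runtime whether to execute the tests, using Maven's standard -DskipTests property. The helper name maven_args_for_runner is hypothetical, and using nvidia-smi -L as the GPU probe is an assumption.

```bash
#!/usr/bin/env bash
# Sketch (assumption): build everywhere, but only execute Java tests when a
# GPU is visible. maven_args_for_runner is a hypothetical helper name.
set -euo pipefail

# Print the Maven arguments to use. Argument 1 optionally forces the GPU
# decision ("yes"/"no"); without it, probe for a working nvidia-smi.
maven_args_for_runner() {
  local has_gpu="${1:-}"
  if [[ -z "${has_gpu}" ]]; then
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
      has_gpu="yes"
    else
      has_gpu="no"
    fi
  fi
  if [[ "${has_gpu}" == "yes" ]]; then
    echo "clean verify"
  else
    # -DskipTests compiles the test sources but does not run them (Surefire).
    echo "clean verify -DskipTests"
  fi
}
```

A job would then run mvn $(maven_args_for_runner), relying on word splitting of the unquoted substitution. Moving the build workflow to a CPU runner, as the reviewer suggests, is the configuration-level alternative.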

Member Author replied:

It will run tests.

@rhdong requested a review from raydouglass April 22, 2025 19:39

copy-pr-bot (bot) commented Apr 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rhdong (Member Author) commented Apr 22, 2025

/ok to test

copy-pr-bot (bot) commented Apr 22, 2025

> /ok to test

@rhdong, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@rhdong (Member Author) commented Apr 22, 2025

/ok to test 79a477e

@narangvivek10 commented Apr 24, 2025

The reason for the Java build failure is that the script could not find jextract, which is needed to generate the Panama bindings before the Java build and tests run. The CI servers need to have jextract preinstalled from https://jdk.java.net/jextract.

@cjnolet @rhdong

[Screenshot from 2025-04-24 12-59-56]

@chatman (Contributor) commented Apr 24, 2025

@narangvivek10 @rhdong @cjnolet I've committed a fix [0] to download jextract automatically if it is not already installed. The reason for doing this is that jextract doesn't have a .deb or apt package for Ubuntu, so the download of jextract needs to be scripted anyway.

[0] - 570fa2a in #831
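A minimal sketch of the "download jextract if it is not already installed" idea, assuming curl and tar are available on the runner. This is not the code committed in 570fa2a: the function ensure_jextract and the JEXTRACT_URL variable are hypothetical, the real archive URL must be taken from https://jdk.java.net/jextract, and the archive is assumed to unpack to a single top-level directory (the CI log in this thread shows jextract-22/).

```bash
#!/usr/bin/env bash
# Sketch (assumption): reuse jextract from PATH or a previous install,
# otherwise download and unpack it. ensure_jextract and JEXTRACT_URL are
# hypothetical names; pick the real archive from https://jdk.java.net/jextract
set -euo pipefail

# Print the path of a usable jextract binary, installing into $1 if needed.
ensure_jextract() {
  local install_dir="$1"
  local existing
  if existing="$(command -v jextract)"; then
    echo "${existing}"
    return 0
  fi
  if [[ -x "${install_dir}/bin/jextract" ]]; then
    echo "${install_dir}/bin/jextract"
    return 0
  fi
  if [[ -z "${JEXTRACT_URL:-}" ]]; then
    echo "jextract not found and JEXTRACT_URL is unset" >&2
    return 1
  fi
  mkdir -p "${install_dir}"
  # Drop the archive's top-level directory so bin/ lands in install_dir.
  curl -fsSL "${JEXTRACT_URL}" | tar -xz -C "${install_dir}" --strip-components=1
  if [[ ! -x "${install_dir}/bin/jextract" ]]; then
    echo "download did not produce ${install_dir}/bin/jextract" >&2
    return 1
  fi
  echo "${install_dir}/bin/jextract"
}
```

Printing the path instead of modifying PATH keeps the function side-effect free for callers that only need to know which binary to invoke.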

@rhdong (Member Author) commented Apr 24, 2025

/ok to test 04f0bba

@chatman (Contributor) commented Apr 24, 2025

@narangvivek10 The jextract process failed due to:

c_api.h:19:10: error: 'cuda_runtime.h' file not found
fatal: Unexpected exception org.openjdk.jextract.clang.TypeLayoutError: Invalid. segment: org.openjdk.jextract.clang.Type@cc813a2e, fieldName: n_probes occurred
Jextract encountered issues (returned value 5)
Bindings generation did not complete normally (returned value 5)
Forcing this build process to abort

Any ideas where cuda_runtime.h can be found?

@chatman (Contributor) commented Apr 24, 2025

@rhdong We have attempted to find the CUDA_HOME directory as:
CUDA_HOME=$(which nvcc | cut -d/ -f-4)

We then tried to add the $CUDA_HOME/include directory to the include paths. Any ideas whether this was the problem, and is there a better way?
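A likely reason the cut approach misfires: cut -d/ -f-4 keeps only the first three path components, which happens to match /usr/local/cuda/bin/nvcc but truncates a conda path such as /opt/conda/envs/test/bin/nvcc to /opt/conda/envs. A sketch comparing it with stripping the trailing bin/nvcc via dirname, assuming nvcc sits in a bin/ directory directly under its install prefix:

```bash
#!/usr/bin/env bash
# Sketch: `cut -d/ -f-4` assumes a fixed depth (/usr/local/cuda), while
# stripping bin/nvcc with dirname works for any install prefix.
set -euo pipefail

cuda_home_by_cut() { echo "$1" | cut -d/ -f-4; }
cuda_home_by_dirname() { dirname "$(dirname "$1")"; }

for nvcc in /usr/local/cuda/bin/nvcc /opt/conda/envs/test/bin/nvcc; do
  echo "${nvcc}: cut -> $(cuda_home_by_cut "${nvcc}"), dirname -> $(cuda_home_by_dirname "${nvcc}")"
done
```

In the script this would become CUDA_HOME="$(dirname "$(dirname "$(command -v nvcc)")")". Even with the right prefix, a conda environment may keep the CUDA headers under targets/<triple>/include rather than include/.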

@chatman (Contributor) commented Apr 24, 2025

Also, I see the following:

2025-04-24T16:03:41.9414077Z Forcing this build process to abort
2025-04-24T16:03:41.9513171Z 
2025-04-24T16:03:41.9516434Z RAPIDS logger » [04/24/25 16:03:41]
2025-04-24T16:03:41.9517878Z ┌─────────────────────────────────────────────────────────────────────────────┐
2025-04-24T16:03:41.9519658Z |    Initial Java build & test failed. Retrying with 'mvn clean verify -X'    |
2025-04-24T16:03:41.9521203Z └─────────────────────────────────────────────────────────────────────────────┘
2025-04-24T16:03:41.9522013Z 

I think this retry is not necessary, and not correct either, since the failure here happened in a step before Maven is even invoked (the failure is in the generate-bindings.sh file). Because of this retry, the logs are polluted with many "symbol not found" errors from Maven, which masks the original problem: the Panama bindings were not properly generated.
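One way to keep the original failure visible is to stop before Maven when binding generation fails, and to retry with -X only when Maven itself failed. A minimal sketch: GENERATE_BINDINGS and MVN are hypothetical hooks (expanded unquoted on purpose so a command with arguments, or a stub, can be substituted), and the ./generate-bindings.sh path is an assumption.

```bash
#!/usr/bin/env bash
# Sketch (assumption): abort before Maven if the Panama bindings were not
# generated; retry Maven with -X only when Maven itself failed.
# GENERATE_BINDINGS and MVN are hypothetical hooks, expanded unquoted on
# purpose so that a command plus arguments can be substituted.
set -uo pipefail

GENERATE_BINDINGS="${GENERATE_BINDINGS:-./generate-bindings.sh}"
MVN="${MVN:-mvn}"

java_build_and_test() {
  # Running Maven after a bindings failure would only add unrelated
  # "symbol not found" noise, so report the real failure and stop.
  if ! ${GENERATE_BINDINGS}; then
    echo "Panama bindings were not generated; not running Maven" >&2
    return 1
  fi
  if ${MVN} clean verify; then
    return 0
  fi
  echo "Initial Java build & test failed. Retrying with '${MVN} clean verify -X'" >&2
  ${MVN} clean verify -X
}
```

In the CI script the call would follow the existing trap "EXITCODE=1" ERR and set +e lines, so a non-zero return from java_build_and_test is recorded in EXITCODE instead of stopping the script.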

@chatman (Contributor) commented Apr 24, 2025

@rhdong I've made the following changes:

  • Debug printing of the CUDA_HOME variable and of the contents of $CUDA_HOME/include
  • If there's no include directory inside CUDA_HOME, fall back to CUDA_HOME=/usr/local/cuda

https://github.com/rapidsai/cuvs/pull/831/files/570fa2a7a792b39cb70c4ff1232661481ba8ecaa..306229d29b0123bc7f6e72adca6e7d155047f528

I'm hoping it will make things work. Can you please merge that and retest here?
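The fallback in the second bullet can be sketched as follows. This is an illustration under stated assumptions, not the code in #831: resolve_cuda_home and the FALLBACK_CUDA_HOME override are hypothetical names. Debug output goes to stderr so that the resolved path is the only thing printed on stdout.

```bash
#!/usr/bin/env bash
# Sketch (assumption): keep CUDA_HOME only if it has an include/ directory,
# otherwise fall back to /usr/local/cuda. resolve_cuda_home and
# FALLBACK_CUDA_HOME are hypothetical names.
set -euo pipefail

resolve_cuda_home() {
  local candidate="${1:-}"
  local fallback="${FALLBACK_CUDA_HOME:-/usr/local/cuda}"
  echo "CUDA_HOME candidate: '${candidate}'" >&2
  if [[ -n "${candidate}" && -d "${candidate}/include" ]]; then
    ls "${candidate}/include" >&2   # debug listing of the include directory
    echo "${candidate}"
  elif [[ -d "${fallback}/include" ]]; then
    echo "No include/ under '${candidate}', falling back to ${fallback}" >&2
    echo "${fallback}"
  else
    echo "No usable CUDA include directory found" >&2
    return 1
  fi
}
```

Usage: CUDA_HOME="$(resolve_cuda_home "${CUDA_HOME:-}")". Failing loudly when neither location has headers surfaces the problem before jextract reports a missing cuda_runtime.h.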

@rhdong (Member Author) commented Apr 24, 2025

> @rhdong We have attempted to find the CUDA_HOME directory as: CUDA_HOME=$(which nvcc | cut -d/ -f-4)
>
> And then tried to add the $CUDA_HOME/include dir to the include paths. Any ideas if this was the problem and is there a better way?

Hi @chatman @narangvivek10, the Docker image is rapidsai/ci-conda:latest, and the CUDA headers are installed when the conda environment named test is created. In my local experiment, cuda_runtime.h is at /opt/conda/envs/test/targets/x86_64-linux/include/cuda_runtime.h. I fixed this in the top commit, and now a new error comes up:

jextract-22/bin/jextract.ps1
jextract downloaded to /cuvs/java/jextract-22
common.h:21:10: error: 'dlpack/dlpack.h' file not found
fatal: Unexpected exception org.openjdk.jextract.clang.TypeLayoutError: Invalid. segment: org.openjdk.jextract.clang.Type@1c99c732, fieldName: addr occurred
Jextract encountered issues (returned value 5)
Bindings generation did not complete normally (returned value 5)
Forcing this build process to abort

RAPIDS logger » [04/24/25 20:26:13]
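Both jextract failures in this thread are missing include paths. A minimal sketch that, for each header jextract must resolve, searches a conda prefix for the first directory containing it and emits a matching -I pair (jextract's include-directory option). The function name is hypothetical, and the assumption that dlpack/dlpack.h lives under <prefix>/include should be verified in the rapidsai/ci-conda image.

```bash
#!/usr/bin/env bash
# Sketch (assumption): for each header jextract needs, find the first
# candidate directory under a conda prefix that contains it and print a
# "-I <dir>" pair. jextract_include_flags is a hypothetical name.
set -euo pipefail
shopt -s nullglob

jextract_include_flags() {
  local prefix="$1" header dir found
  shift
  for header in "$@"; do
    found=""
    # targets/<triple>/include is where the CUDA headers were found in the
    # conda env; the glob avoids hard-coding x86_64-linux.
    for dir in "${prefix}"/targets/*/include "${prefix}/include"; do
      if [[ -f "${dir}/${header}" ]]; then
        found="${dir}"
        break
      fi
    done
    if [[ -z "${found}" ]]; then
      echo "header not found under ${prefix}: ${header}" >&2
      return 1
    fi
    printf -- '-I\n%s\n' "${found}"
  done
}
```

For example, mapfile -t flags < <(jextract_include_flags "${CONDA_PREFIX}" cuda_runtime.h dlpack/dlpack.h) collects the flags into an array that can be spliced into the jextract invocation as "${flags[@]}".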

@rhdong (Member Author) commented Apr 24, 2025

> I think this retrying is not necessary, and not correct either, since here the failure was in a step even before Maven is invoked (failure is in the generate-bindings.sh file). Due to this retry, the logs are polluted with a lot of symbol not found issues via Maven, and it masks the original problem that the Panama bindings were not properly generated.

Yeah, I agree. The main goal of the retry was to debug the HNSW issue (which has been resolved). The retry only happens when the tests fail. We can remove it at the end.

@rhdong (Member Author) commented Apr 24, 2025

/ok to test cb5d1ba

Labels: ci, feature request (New feature or request), non-breaking (Introduces a non-breaking change)
Projects: Status: In Progress
5 participants