jdmpapin
Contributor

@jdmpapin jdmpapin commented Oct 8, 2025

...within each compilation.

There will be at least two data structures that are sensitive to the set of permanent loaders. These could potentially allow the precise set of permanent loaders to vary between the client and server, but the reasoning needed to determine whether certain kinds of variation are allowed gets hairy very quickly. It's much simpler if there is only one set of permanent loaders that will be used for all purposes on both the client and the server.

This change also simplifies reasoning about which permanent loaders the server already has. The client no longer attempts to keep track of that at all. Instead, that information is purely in the purview of the server, where it's easy to maintain consistency.
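The idea described above can be sketched roughly as follows. This is a minimal illustration, not the OpenJ9 implementation: `LoaderId`, `CompilationSketch`, and `ServerSessionSketch` are hypothetical names, and the way the client's set reaches the server is assumed to be a plain list carried with the compilation request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class loader identity (not an OpenJ9 type).
using LoaderId = std::uint64_t;

// Hypothetical compilation object: its set of permanent loaders is fixed
// when the compilation starts and is used for every purpose afterwards.
class CompilationSketch {
public:
    explicit CompilationSketch(std::vector<LoaderId> permanentLoaders)
        : _permanentLoaders(std::move(permanentLoaders)) {}

    const std::vector<LoaderId>& permanentLoaders() const { return _permanentLoaders; }

    bool isPermanent(LoaderId loader) const {
        return std::find(_permanentLoaders.begin(), _permanentLoaders.end(), loader)
            != _permanentLoaders.end();
    }

private:
    std::vector<LoaderId> _permanentLoaders;
};

// Hypothetical server-side state for one client. Only the server records
// which permanent loaders it has seen; the client keeps no such record.
class ServerSessionSketch {
public:
    // Assumption: the client's set arrives with the compilation request.
    CompilationSketch startCompilation(const std::vector<LoaderId>& fromClient) {
        _known.insert(fromClient.begin(), fromClient.end());
        // The server-side compilation uses exactly the client's set,
        // even if the server already knows about more loaders.
        return CompilationSketch(fromClient);
    }

    std::size_t knownLoaderCount() const { return _known.size(); }

private:
    std::set<LoaderId> _known;
};
```

With this shape, a compilation that started on the client before a new loader became permanent never sees that loader on the server, even when a later compilation has already told the server about it.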


Additionally:

  • Use comp->permanentLoaders() for client retained methods analysis

@jdmpapin jdmpapin requested a review from dsouzai as a code owner October 8, 2025 22:51
@jdmpapin jdmpapin added the comp:jitserver Artifacts related to JIT-as-a-Service project label Oct 8, 2025
@jdmpapin jdmpapin requested review from mpirvu and removed request for dsouzai October 8, 2025 22:52
@jdmpapin
Contributor Author

jdmpapin commented Oct 8, 2025

@mpirvu, could you please review?

@mpirvu mpirvu self-assigned this Oct 9, 2025
The client retained methods analysis previously avoided getting the
permanent loaders from the compilation object, to guard against a
scenario where the server side of the compilation was aware of more
permanent loaders than the client side.

That scenario is no longer possible because a compilation on the server
now limits itself to the same set of permanent loaders that the
corresponding compilation is aware of on the client. It's no longer
necessary for J9::RetainedMethodSet to allow the caller to specify the
permanent loaders on creation.
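
A rough sketch of the resulting shape, under stated assumptions: `Comp` and `RetainedSetSketch` are hypothetical stand-ins, not the real TR::Compilation or J9::RetainedMethodSet, and the only detail taken from this PR is that the permanent loaders are read from the compilation object through `permanentLoaders()` rather than passed in by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical loader identity (not an OpenJ9 type).
using LoaderId = std::uint64_t;

// Hypothetical stand-in for the compilation object.
class Comp {
public:
    explicit Comp(std::vector<LoaderId> loaders) : _loaders(std::move(loaders)) {}
    const std::vector<LoaderId>& permanentLoaders() const { return _loaders; }

private:
    std::vector<LoaderId> _loaders;
};

// Hypothetical retained-method-set sketch. The constructor takes only the
// compilation; there is no separate permanent-loaders parameter, so the set
// cannot disagree with what the rest of the compilation sees.
class RetainedSetSketch {
public:
    explicit RetainedSetSketch(const Comp* comp) : _comp(comp) {
        assert(comp != nullptr);
    }

    // True if the loader is in the compilation's permanent set.
    bool loaderIsPermanent(LoaderId loader) const {
        const std::vector<LoaderId>& loaders = _comp->permanentLoaders();
        return std::find(loaders.begin(), loaders.end(), loader) != loaders.end();
    }

private:
    const Comp* _comp;
};
```

Because the set is always derived from the compilation, the client and server sides of the same compilation answer `loaderIsPermanent` identically as long as both compilations hold the same loaders.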
@jdmpapin jdmpapin force-pushed the jitserver-permanent-loaders-consistency branch from 48e2439 to 03efb95 Compare October 9, 2025 14:34
@jdmpapin
Contributor Author

jdmpapin commented Oct 9, 2025

Pushed a small correction to a comment

Contributor

@mpirvu mpirvu left a comment

LGTM

@mpirvu
Contributor

mpirvu commented Oct 14, 2025

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk21

@jdmpapin
Contributor Author

jdmpapin commented Oct 15, 2025

Checks have finished. Rundown of the problems...

On all platforms: Missing #VECTOR API output and timeouts in vector tests (#20995 (comment)). Since posting that, I've been seeing the same problems in the vector API tests every time I look at a sanity.openjdk job for JDK21 with JITServer.

On AArch64: java/lang/Thread/virtual/Collectable.java: timeout: #18463

On x86-64: java/util/concurrent/ConcurrentHashMap/ConcurrentAssociateTest.java: timeout: #14538 (comment)

On AArch64: java/lang/Math/HypotTests.java: NPE from com.sun.tools.javac.code.Flags.asFlagSet: #22771

There were a bunch of other instances of NPE from within javac code on x86-64. They don't have the same stack trace or detail message, but I'll be pretty surprised if there are multiple separate reasons why NPE has started popping up in javac all of a sudden. #22771 was opened today.

A bit more concerning...

x86-64: java/util/stream/StreamBuilderTest.java: multiple occurrences each of IllegalStateException: called wrong accept method and ClassCastException: java.lang.Integer incompatible with java.lang.Double. This doesn't seem to be a known issue (even a past one).

That said, I struggle to see any exception as a realistic failure mode for this PR. The change in the set of permanent loaders should affect probably about one compilation per process, and in that compilation it should only influence the set of bond assumptions we might create. The exception is if dontInlineUnloadableMethods is somehow set, which would mean that we're recompiling a JIT body that was invalidated due to unloading even before bootstrapping has finished. Even then, the change in permanent loaders would only influence inlining within that compilation.

Most concerning...

PPC64LE: There were a number of crashes in handleServerMessage(). Most of them had this stack:

_ZN11TR_J9VMBase10getMethodsEP19TR_OpaqueClassBlock.localalias+0x34
_ZN16JITServerHelpers22packRemoteROMClassInfoB5cxx11EP7J9ClassP10J9VMThreadP9TR_Memoryb+0xac
_ZL19handleServerMessagePN9JITServer12ClientStreamEP7TR_J9VMRNS_11MessageTypeE+0xbe84

but one had this stack instead:

instanceOfOrCheckCastNoCacheUpdate+0xc (0x000073898FCA448C [libj9jit29.so+0xcf448c])
_ZN11TR_J9VMBase34instanceOfOrCheckCastNoCacheUpdateEP7J9ClassS1_+0x20
_ZL19handleServerMessagePN9JITServer12ClientStreamEP7TR_J9VMRNS_11MessageTypeE+0x1bbc

For the moment I'll take a closer look at the crashes. Please let me know what you think about the other failures.
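
For reading the frames above, the mangled names can be decoded with the Itanium C++ ABI demangler. The helper below is a small sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; `demangleFrame` is a hypothetical name, and stripping the `+0x...` offset and the `.localalias` suffix is this sketch's own handling of the frame format shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: turns one stack frame such as "_ZN3Foo3barEv+0x34"
// into a readable name. Frames that are not mangled are returned without
// the offset and otherwise unchanged.
std::string demangleFrame(std::string symbol) {
    // Drop the "+0x..." offset, if present.
    const std::string::size_type plus = symbol.find('+');
    if (plus != std::string::npos) symbol.erase(plus);

    // Drop compiler-generated suffixes such as ".localalias".
    const std::string::size_type dot = symbol.find('.');
    if (dot != std::string::npos) symbol.erase(dot);

    // Only names starting with "_Z" use the Itanium mangling scheme.
    if (symbol.compare(0, 2, "_Z") != 0) return symbol;

    int status = 0;
    char* demangled = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) return symbol;

    std::string result(demangled);
    std::free(demangled); // the buffer is allocated with malloc
    return result;
}
```

Applied to the top frame of the first stack, this yields `TR_J9VMBase::getMethods(TR_OpaqueClassBlock*)`.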

@mpirvu
Contributor

mpirvu commented Oct 15, 2025

I would prefer to have eclipse-omr/omr#7983 and #22768 delivered first, then a rebase and another round of tests.

@jdmpapin
Contributor Author

I did look at that, but based on the PR description I thought that the relocations would only cause compilation failures. But I could be wrong. And I should be moving eclipse-omr/omr#7983 along regardless. So that plan sounds fine to me.

@mpirvu
Contributor

mpirvu commented Oct 15, 2025

I have verified that the fix in eclipse-omr/omr#7983 clears a functional JITServer test failure with Java 11 that started when #22530 was delivered. Since the changes in #22530 are not JDK version specific, it's possible that all JDKs are affected in various ways. I would rather eliminate #22530 as the source of potential failures by merging eclipse-omr/omr#7983 first.

@mpirvu
Contributor

mpirvu commented Oct 16, 2025

jenkins sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk21 depends eclipse/omr#master

@jdmpapin
Contributor Author

Jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk21 depends eclipse/omr#master

@jdmpapin
Contributor Author

jdmpapin commented Oct 16, 2025

The sanity.openjdk jobs that have finished so far (AArch64, PPC64LE, x86-64) all hit #20995 as expected

The only other problem so far is that on AArch64, testList_2 failed due to an infra problem. It had already passed the same six tests that passed in testList_2 on each of PPC64LE and x86-64. I checked, and the partition into test lists is consistent across platforms, so the AArch64 testList_2 passed all of the tests that we can expect it to pass given #20995. There were just a few vector API tests left to run.

Still waiting for sanity.openjdk on z, and sanity.functional on z and x86-64

(edited to correct the referenced issue number)

@mpirvu
Contributor

mpirvu commented Oct 17, 2025

The failures in sanity.functional on xlinux are all JFR tests and are not related to the changes in this PR.
E.g.:

13:40:11  Running command: "/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/jdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer  -XX:StartFlightRecording -Dibm.java9.forceCommonCleanerShutdown=true -Xint -Xcheck:memory --add-exports java.base/com.ibm.oti.vm=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jfr/jfr.jar org.openj9.test.WorkLoad 10 100 10
13:40:11  Time spent starting: 63 milliseconds
14:30:21  ***[TEST INFO 2025/10/16 18:30:06] ProcessKiller detected a timeout after 3000000 milliseconds!***
...............
14:30:28  Testing: test jfr system process - approx 30 seconds
14:30:28  Test start time: 2025/10/16 18:30:28 Coordinated Universal Time
14:30:28  Running command: /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_x86-64_linux_jit_Personal_testList_0/jdkbinary/j2sdk-image/bin/jfr print --xml --events "SystemProcess" defaultJ9recording.jfr
14:30:28  Time spent starting: 3 milliseconds
14:30:29  Time spent executing: 546 milliseconds
14:30:29  Test result: FAILED
14:30:29  Output from test:
14:30:29   [OUT] <?xml version="1.0" encoding="UTF-8"?>
14:30:29   [OUT] <recording xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
14:30:29   [OUT]   <events>
14:30:29   [OUT]   </events>
14:30:29   [OUT] </recording>
14:30:29  >> Required condition was found: [Output match: http://www.w3.org/2001/XMLSchema-instance]
14:30:29  >> Success condition was not found: [Output match: jdk.SystemProcess]
14:30:29  >> Required condition was not found: [Output match: pid]
14:30:29  >> Required condition was not found: [Output match: commandLine]
14:30:29  >> Failure condition was not found: [Output match: jfr print: could not read recording]

The failures from sanity.openjdk on zlinux are in the vector API tests, and the vector API currently does not work with JITServer.

All in all, this PR has not introduced any new failures, so it's good to be merged.

@mpirvu mpirvu merged commit 4e352c0 into eclipse-openj9:master Oct 17, 2025
9 of 15 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in JIT as a Service Oct 17, 2025
Labels

comp:jit comp:jitserver Artifacts related to JIT-as-a-Service project

Projects

Status: Done
