Skip to content

Conversation

@kdnilsen
Copy link
Contributor

@kdnilsen kdnilsen commented Dec 22, 2025

This is a proof of concept implementation that passed tests long ago. It has not been updated. I am creating this DRAFT pull request to allow me to study the changes that had been implemented at the time. I plan to divide the contributions of this branch into three distinct pull requets:

  1. phase-accounting
  2. adaptive-evac-ratio
  3. worker-surge

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Warnings

 ⚠️ Patch contains a binary file (src/java.base/share/classes/jdk/internal/icu/impl/data/icudt76b/nfc.nrm)
 ⚠️ Patch contains a binary file (src/java.base/share/classes/jdk/internal/icu/impl/data/icudt76b/nfkc.nrm)
 ⚠️ Patch contains a binary file (src/java.base/share/classes/jdk/internal/icu/impl/data/icudt76b/ubidi.icu)
 ⚠️ Patch contains a binary file (src/java.base/share/classes/jdk/internal/icu/impl/data/icudt76b/uprops.icu)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/BorderLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/FlowLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridBagLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridBagLayout-2.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridLayout-1.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/java/awt/doc-files/GridLayout-2.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/javax/swing/beaninfo/images/JAppletColor16.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/javax/swing/beaninfo/images/JAppletColor32.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/javax/swing/beaninfo/images/JAppletMono16.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/javax/swing/beaninfo/images/JAppletMono32.gif)
 ⚠️ Patch contains a binary file (src/java.desktop/share/classes/javax/swing/doc-files/JRootPane-1.gif)
 ⚠️ Patch contains a binary file (test/hotspot/jtreg/runtime/ClassFile/JsrRewritingTestCase.jar)
 ⚠️ Patch contains a binary file (test/hotspot/jtreg/runtime/ClassFile/testcase.jar)
 ⚠️ Patch contains a binary file (test/jdk/javax/net/ssl/HttpsURLConnection/crisubn.jks)
 ⚠️ Patch contains a binary file (test/jdk/javax/net/ssl/HttpsURLConnection/trusted.jks)
 ⚠️ Patch contains a binary file (test/jdk/javax/rmi/ssl/keystore)
 ⚠️ Patch contains a binary file (test/jdk/javax/rmi/ssl/truststore)
 ⚠️ Patch contains a binary file (test/jdk/jdk/internal/loader/URLClassPath/testclasses.jar)
 ⚠️ Patch contains a binary file (test/jdk/sun/net/www/protocol/https/HttpsClient/dnsstore)
 ⚠️ Patch contains a binary file (test/jdk/sun/net/www/protocol/https/HttpsClient/ipstore)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28955/head:pull/28955
$ git checkout pull/28955

Update a local copy of the PR:
$ git checkout pull/28955
$ git pull https://git.openjdk.org/jdk.git pull/28955/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28955

View PR using the GUI difftool:
$ git pr show -t 28955

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28955.diff

This enables mixed evacuations to be more productive more quickly.
Before this commit, release build made reference to a variable that
is only present in debug builds.
At init update refs, we transfer remaining memory from Collector and
OldCollector partitions in order to allow more mutator allocations
to succeed during update-refs.  A conditional transfer is not reliable
because soft_max_capacity() constraints may cause the transfer to fail.
At the end of each GC cycle, recompute the live memory in each
mixed-evac candidate region.
Code compiles but lots of jtreg failures...
Now passes hotspot jtreg tests
With the ability to surge old gc workers, we can reduce the default
minimum time slide.  Under maximum surge, it will be as if we are
able to complete 150 ms of work within this 50 ms time slice.
Highlights:
1. Exclude surged phase metrics from history-based predictions of future
   phase times.
2. Attenuate surge levels.  Be more willing to surge in small ways, less
   willing to surge in big ways.
3. Decrease minimum time slice for old to 80 ms from 100 ms.  But we
   allow surging of old within this 80 ms, unlike when 100 ms time slice
   was established as default.
4. Do not predict_gc_time() in should_surge_phase() unless the phase is
   mark.
Previous configuration appears to be stable.  But we're still observing
suboptimal behavior.  In particular, on the Extremem medium workload, we
see three back-to-back old collections, where each is delayed until the
previous old has finished its mixed evacuations and the
coalesce-and-fill efforts that follow it.  Independently, a study of
specjbb 2015 suggests that G1 and ZGC are both much more relaxed about
how much they allow Old-gen to expand without any effort to collect it.
And this works well for their specjbb scores.  So we want to be a bit
more lazy about old collections.

  ShenandoahIgnoreOldGrowthBelowPercentage defaults to 40 (not 10)
  ShenandoahDoNotIgnoreGrowthAfterYoungCycles defaults to 500 (not 50)
  ShenandoahOldGrowthPercent defaults to 25 (not 12.5)

On the same Extremem medium workload, we observe that when we are trying
"desperately" to make progress on both old-gen and young-gen collections
concurrently, the old is being starved and not making quick enough
progress because it only gets 80 ms time slices between young cycles and
it is not surging its own efforts.  So I want to allow old to surge more
quickly (not attenuate its surges), and I think it best to allow young
to run more quickly when it is under duress (by not attenuating its
surges).  The intention is that if young can run more quickly, there
will be an increase of idle time between young cycles, during which old
can get more than its minimal time slice.

The hope is that the combination of allowing the important and urgent
old collection to start earlier (because it doesn't have to wait for
a previous unnecessary and redundant old collection to finish) and
more aggressive surging of workers once we start old will allow old
to finish in a more timely and effective manner.
Be less eager to promote by shared allocation.  Note that failure to
promote is "not" harmless because we have not reserved for evacuation of
these objects in young reserve.  But it is slightly easier (usually) to
redirect an evacuation to an existing mutator region than to flip an
empty mutator region.
This configuration has demonstrated good results on the Extremem medium
workload for one run.  Will be testing more runs momentarily.

1. Use non-conservative (without adjustment by stdev) gc time prediction
   for OldEvacRatio computations.  Otherwise, the predictions are so
   conservative as to "scare" the heuristic away from attempting mixed
   evacuations.  We still use conservative calculations for triggering
   GC and for deciding whether to surge.
2. Calculate live at last old mark using SATB protocol for live at start
   of last mark so that we exclude all the floating garbage created
   during concurrent old marking.
3. Use adaptive old-gc time slice, starting with 150 ms and increasing
   by 50 ms after every 4 old-gen mark increments up to 10% of young
   cycle time..
4. Feed anticipated promotion reserve into calculations of OldEvacRatio.
5. Add a new parameter ShenandoanMinimumOldGrowthHeapPercent, default
   value 5.  This allows greater old-gen triggering sensitivity when
   old-gen is very large, without requiring excessive old triggers
   during initialization (while old-gen may be small).
kdnilsen added 10 commits May 5, 2025 15:29
A behavior manifest in this development branch is that there is greater
variation in collection set evacuation workloads.  When we happen to
have very large evacuation workloads, more of the evacuation effort is
handled by mutator threads, causing the available reserves to be
scattered between a larger number of GCLABs and PLABs.  The end results
is that we are much more likely to experience OOM during evac.

Plan to restore original default values when we implement a new
OOM-during-evac protocol that does not require upgrade to Full GC.
@kdnilsen kdnilsen marked this pull request as draft December 22, 2025 14:47
#include "gc/shenandoah/shenandoahSharedVariables.hpp"
#include "memory/allocation.hpp"
#include "utilities/numberSeq.hpp"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes belong to phase-accounting

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 22, 2025

👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

virtual bool is_diagnostic() { return false; }
virtual bool is_experimental() { return false; }

virtual void record_phase_duration(ShenandoahMajorGCPhase p, double x, double duration) override;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes belong to phase-accounting

@openjdk
Copy link

openjdk bot commented Dec 22, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Dec 22, 2025

@kdnilsen this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout adaptive-evac-with-surge
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch hotspot-gc [email protected] shenandoah [email protected] labels Dec 22, 2025
@openjdk
Copy link

openjdk bot commented Dec 22, 2025

@kdnilsen The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

virtual void choose_collection_set_from_regiondata(ShenandoahCollectionSet* cset,
RegionData* data, size_t size,
size_t actual_free);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is declared by shenandoahHeuristics, overrides in shenandoahAdaptiveHeuristics. ShenandoahAdaptiveHeuristics uses this to set the start time for the mark phase.

Add override declaration.

This change belongs to phase-accounting

return _phase_stats[ShenandoahMajorGCPhase::_final_roots].predict_at_without_stdev((double) anticipated_pip_words);
}

uint ShenandoahYoungHeuristics::should_surge_phase(ShenandoahMajorGCPhase phase, double now) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This belongs in worker-surge
worker-surge depends on phase-accounting

return MIN3(evac_slack_spiking, evac_slack_avg, evac_min_threshold);
}

void ShenandoahYoungHeuristics::adjust_old_evac_ratio(size_t old_cset_regions, size_t young_cset_regions,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code belongs in adjust-old-evac-ratio


ShenandoahYoungHeuristics::ShenandoahYoungHeuristics(ShenandoahYoungGeneration* generation)
: ShenandoahGenerationalHeuristics(generation) {
: ShenandoahGenerationalHeuristics(generation),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new initializers belong to phase-accounting

size_t cur_young_garbage = add_preselected_regions_to_collection_set(cset, data, size);

choose_young_collection_set(cset, data, size, actual_free, cur_young_garbage);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most of this code can be discarded. Need to carefully read through the changes.

I believe I want to preserve only the following, as part of phase-accounting:

  • Rather than expediting mixed evacuations, use the phase-accounting prediction of GC time to trigger mixed evacuation at the appropriate time.
  • Enhance this subsequently, when we have adaptive-old-evac-ratio

// Checks that an old cycle has run for at least ShenandoahMinimumOldTimeMs before allowing a young cycle.
if (ShenandoahMinimumOldTimeMs > 0) {
#undef KELVIN_START_YOUNG
#ifdef KELVIN_START_YOUNG
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some code here that relates to adjustable minimum time for old marking. Make this part of the worker-surge. (Increasing the time slice for old may depend on being able to surge the workers during the "short-changed young" time slice.

if (ShenandoahAdaptiveHeuristics::should_start_gc()) {
return true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical of this code. Test and probably delete. (There have been lots of improvements to old triggering heuristics since this proof of concept was last demonstrated.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-gc [email protected] merge-conflict Pull request has merge conflict with target branch shenandoah [email protected]

Development

Successfully merging this pull request may close these issues.

1 participant