Migrate NodeHealthAPI to JAX-RS, flip delegation so V2 owns logic, add replication lag monitoring docs#22

Draft
Copilot wants to merge 44 commits into main from
copilot/migrate-node-health-api

Copilot AI commented Feb 22, 2026

Migrates org.apache.solr.handler.admin.api.NodeHealthAPI from the homegrown @EndPoint annotation to standard JAX-RS annotations, following the established V2 API pattern. Includes a full inversion of the delegation model, enum serialization fix, and new reference-guide coverage of maxGenerationLag monitoring.

NodeHealthAPI → JAX-RS

  • NodeHealthApi interface added to solr/api with @Path("/node/health"), @GET, @Operation (tag: node)
  • NodeHealthResponse model added with status (NodeStatus enum), message, num_cores_unhealthy fields
  • NodeHealth implements NodeHealthApi via JerseyResource; registered through HealthCheckHandler.getJerseyResources()
  • OpenAPI generation produces NodeApi.Healthcheck SolrJ request class

Delegation inverted: V2 owns logic, V1 bridges

Previously NodeHealthAPI delegated to HealthCheckHandler. Now reversed:

  • NodeHealth owns all business logic: cloud-mode check (ZK liveness, live-nodes, requireHealthyCores), legacy-mode replication-lag check (isWithinGenerationLag, findUnhealthyCores), UNHEALTHY_STATES — all using NodeHealthResponse/NodeStatus throughout, no NamedList in business logic
  • HealthCheckHandler is now a thin V1 bridge: extracts params, calls new NodeHealth(coreContainer).healthcheck(requireHealthyCores, maxGenerationLag), squashes result via V2ApiUtils
  • findUnhealthyCores moved to NodeHealth as a public static utility; HealthCheckHandler retains a @Deprecated delegation shim
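The inversion described above can be pictured with a minimal, dependency-free sketch. It is not the code from this PR: the `connected` flag is a hypothetical stand-in for the `CoreContainer` state, the `FAILURE` constant and the message text are assumptions, and a plain `Map` stands in for the `NamedList` that `V2ApiUtils` would produce. The real cloud-mode and replication-lag checks are elided.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DelegationSketch {
  // Only OK is named in the PR text; FAILURE is an assumed second constant.
  enum NodeStatus { OK, FAILURE }

  /** Typed response, loosely modeled on the NodeHealthResponse fields listed above. */
  static final class NodeHealthResponse {
    NodeStatus status;
    String message;
  }

  /** V2 side: owns the decision logic and only deals in typed objects. */
  static final class NodeHealth {
    private final boolean connected; // hypothetical stand-in for CoreContainer state

    NodeHealth(boolean connected) {
      this.connected = connected;
    }

    // The two parameters are accepted but ignored here; the real checks are elided.
    NodeHealthResponse healthcheck(Boolean requireHealthyCores, Integer maxGenerationLag) {
      NodeHealthResponse rsp = new NodeHealthResponse();
      if (connected) {
        rsp.status = NodeStatus.OK;
      } else {
        rsp.status = NodeStatus.FAILURE;
        rsp.message = "node is not connected"; // illustrative wording only
      }
      return rsp;
    }
  }

  /** V1 side: extracts params, delegates, then flattens the typed result. */
  static final class HealthCheckHandler {
    private final boolean connected;

    HealthCheckHandler(boolean connected) {
      this.connected = connected;
    }

    Map<String, Object> handle(Map<String, String> params) {
      String cores = params.get("requireHealthyCores");
      String lag = params.get("maxGenerationLag");
      Boolean requireHealthyCores = cores == null ? null : Boolean.valueOf(cores);
      Integer maxGenerationLag = lag == null ? null : Integer.valueOf(lag);

      NodeHealthResponse rsp =
          new NodeHealth(connected).healthcheck(requireHealthyCores, maxGenerationLag);

      // Flatten to a loosely typed map, as a V1 response would be.
      Map<String, Object> flat = new LinkedHashMap<>();
      flat.put("status", rsp.status.name());
      if (rsp.message != null) {
        flat.put("message", rsp.message);
      }
      return flat;
    }
  }
}
```

The point of the shape is that the V1 handler contains no health logic of its own, so a behavior change only has to be made in one place.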

Enum serialization fix (Utils.getReflectWriter)

NodeStatus.OK was serialized as "org.apache.…NodeStatus:OK" through the NamedList/javabin path because enums have no @JsonProperty fields and fell through to the string-representation fallback. Added an early-exit for Enum instances that returns ((Enum<?>) o).name() so V1 consumers (e.g., HealthCheckHandlerTest comparing against "OK") continue to work.
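As a rough illustration of the early exit, the sketch below uses a simplified converter in place of `Utils.getReflectWriter`. The method name `toSerializable` and the `FAILURE` constant are assumptions made for the example; only the `((Enum<?>) o).name()` expression comes from the description above.

```java
final class EnumWriteSketch {
  // Stand-in for the NodeStatus enum; FAILURE is an assumed second constant.
  enum NodeStatus { OK, FAILURE }

  // Simplified stand-in for a reflection-based writer: enums are handled
  // first so they are emitted as their bare constant name.
  static Object toSerializable(Object o) {
    if (o instanceof Enum<?>) {
      return ((Enum<?>) o).name();
    }
    // Everything else is passed through unchanged in this sketch.
    return o;
  }

  public static void main(String[] args) {
    System.out.println(toSerializable(NodeStatus.OK));
  }
}
```

With this check in place a V1 consumer comparing against the string "OK" sees the plain constant name rather than a class-qualified representation.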

Bug fix in isWithinGenerationLag

Pre-existing logic error: the condition generationDiff < maxGenerationLag was inverted — healthy cores were flagged as lagging. Corrected to > maxGenerationLag; return values adjusted to true = within acceptable lag, false = lagging too far. Also fixed missing slf4j format arguments in the negative-diff warn call.
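The corrected comparison can be sketched as follows. This is an illustration under assumptions, not the method from the PR: the parameter names and the definition of `generationDiff` as leader generation minus follower generation are guesses, and treating a negative difference as "within lag" after a warning is a choice made for the sketch.

```java
final class GenerationLagSketch {
  // Returns true when the follower is within the acceptable lag, false when it lags too far.
  static boolean isWithinGenerationLag(
      String coreName, long leaderGeneration, long followerGeneration, int maxGenerationLag) {
    long generationDiff = leaderGeneration - followerGeneration;
    if (generationDiff < 0) {
      // Follower appears ahead of the leader: unexpected, so warn with both values supplied.
      System.err.printf("core:[%s], generation lag:[%d] is negative%n", coreName, generationDiff);
      return true; // assumption for this sketch: not treated as lagging
    }
    // Lagging only when the difference exceeds the allowed maximum.
    return !(generationDiff > maxGenerationLag);
  }
}
```

Under this reading a difference exactly equal to `maxGenerationLag` still counts as healthy, because only values greater than the limit are flagged.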

Tests

  • NodeHealthAPITest — mock-based unit tests for cloud and legacy paths
  • NodeHealthAPITest2 — integration tests using real CoreContainer (no mocks)
  • HealthCheckHandlerTest updated to call NodeHealth.findUnhealthyCores() directly

Reference guide

user-managed-index-replication.adoc — new == Monitoring Follower Replication Lag section:

  • Explains maxGenerationLag semantics and omission behavior
  • V1 + V2 API examples with tabbed layout
  • Success/failure JSON response examples
  • Warning against maxGenerationLag=0 in production
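For orientation, the sketch below builds the two request URIs such examples would target. The `/node/health` path comes from the `@Path` annotation described above; the `/api` prefix, the V1 path `/solr/admin/info/health`, and the host and port are assumptions based on common Solr conventions and are not stated in this PR. The helper names are hypothetical.

```java
import java.net.URI;

final class HealthUriSketch {
  // V1 endpoint (assumed path), with maxGenerationLag as an optional query parameter.
  static URI v1(String baseUrl, Integer maxGenerationLag) {
    return URI.create(baseUrl + "/solr/admin/info/health" + query(maxGenerationLag));
  }

  // V2 endpoint: assumed /api prefix plus the /node/health path from the annotation.
  static URI v2(String baseUrl, Integer maxGenerationLag) {
    return URI.create(baseUrl + "/api/node/health" + query(maxGenerationLag));
  }

  // Omitting the parameter leaves the query string empty.
  private static String query(Integer maxGenerationLag) {
    return maxGenerationLag == null ? "" : "?maxGenerationLag=" + maxGenerationLag;
  }

  public static void main(String[] args) {
    System.out.println(v1("http://localhost:8983", 2));
    System.out.println(v2("http://localhost:8983", 2));
  }
}
```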

implicit-requesthandlers.adoc — Health handler entry:

  • Removed inaccurate "available only in SolrCloud mode" qualifier
  • Added inline descriptions of requireHealthyCores (cloud) and maxGenerationLag (legacy) with cross-reference to new monitoring section

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
Copilot AI changed the title [WIP] Migrate NodeHealthAPI to JAX-RS annotations Migrate NodeHealthAPI from homegrown @EndPoint to JAX-RS annotations Feb 22, 2026
Copilot AI requested a review from epugh February 22, 2026 13:26
Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
Copilot AI changed the title Migrate NodeHealthAPI from homegrown @EndPoint to JAX-RS annotations Migrate NodeHealthAPI from homegrown @EndPoint to JAX-RS; add mock-free integration tests Feb 22, 2026
…ef guide

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
Copilot AI changed the title Migrate NodeHealthAPI from homegrown @EndPoint to JAX-RS; add mock-free integration tests Migrate NodeHealthAPI from homegrown @EndPoint to JAX-RS; add ref guide link Feb 22, 2026
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 28, 2026
@epugh
Owner

epugh commented Feb 28, 2026

Superseded by apache#4171 — retargeted to the upstream repo.

@epugh epugh closed this Feb 28, 2026
@github-actions github-actions bot removed the docker label Mar 11, 2026
epugh and others added 11 commits March 12, 2026 15:37
I used Claude for this regression test and I don't love how verbose it is. I tried a mock approach first and it was worse.
…he#4170)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* startSolr don't specify temp dir
* newCollection don't specify collection1
* getSolrClient don't specify collection1
* withConfigSet use Path if possible

org.apache.solr.SolrTestCaseJ4.getFile should return an absolute file to reduce ambiguity
…wn (apache#4220)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…pache#4196)

Renames bench/MiniClusterState.java to bench/SolrBenchState.java, and flattens its structure, which had an inner class. Two lifecycle methods containing "miniCluster" in the name were replaced with "solr" to be generic, and I improved javadocs slightly.

This is a preparatory refactoring step on a short journey to solr/benchmark supporting multiple backends (not just MiniSolrCloudCluster). 

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…rted (apache#4224)

This is mostly for tests. It makes sure a replica cannot be elected leader for a very short time while all nodes are shutting down.
epugh and others added 13 commits March 18, 2026 09:24
- NodeHealthApi: add @QueryParam("maxGenerationLag") Integer maxGenerationLag
  with @Parameter description to healthcheck()
- NodeHealth: update healthcheck() to accept and forward maxGenerationLag;
  remove now-redundant checkNodeHealth() bridge method
- HealthCheckHandler: call healthcheck() directly (no more checkNodeHealth())
- NodeApi (generated SolrJ): regenerated - Healthcheck gains setMaxGenerationLag()
  setter and includes the param in getParams()
- NodeHealthStandaloneTest: remove FIXME; test negative-maxGenerationLag via
  the real V2 HTTP path using NodeApi.Healthcheck.setMaxGenerationLag(-1)

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
- Add new "Monitoring Follower Replication Lag" section to
  user-managed-index-replication.adoc with V1+V2 API examples,
  example responses (success and failure), and a warning about
  using maxGenerationLag=0 in production.
- Update implicit-requesthandlers.adoc Health entry: remove the
  inaccurate "available only in SolrCloud mode" qualifier; add a
  concise description of both SolrCloud params (requireHealthyCores)
  and legacy-mode params (maxGenerationLag) with a cross-reference
  to the new monitoring section.

Co-authored-by: epugh <22395+epugh@users.noreply.github.com>

6 participants