
[SPARK-56630][INFRA] Surface javadoc crash culprit in unidoc failure output #55548

Draft
cloud-fan wants to merge 4 commits into apache:master from cloud-fan:javadoc-error-reporting

Conversation

@cloud-fan

What changes were proposed in this pull request?

Adds a diagnostic banner to the unidoc step in docs/_plugins/build_api_docs.rb. When build/sbt unidoc fails, the script now scans the captured sbt output and prints a framed summary that names the <Class>.html file javadoc was generating when it died, the inferred source class to audit, and a one-paragraph hint about the usual scaladoc triggers.

Implementation:

  • stream_and_capture tees sbt output to both stdout and target/unidoc-build.log (Ruby-only, no shell pipefail reliance).
  • diagnose_unidoc_failure finds the last Generating .../<Class>.html... line before javadoc exited with exit code N and prints a culprit-pointer banner. ANSI colour codes are stripped before regex matching.
  • When the failure mode doesn't match the mid-HTML-crash pattern (e.g. scaladoc failure, sbt env issue), the banner says so and points back to the full log.
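The tee step in the first bullet could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes `IO.popen` with stderr merged into stdout, and the signature and return value shown here are hypothetical.

```ruby
require "fileutils"

# Hypothetical sketch: run `cmd`, echo every output line to stdout, and write
# the same line to `log_path`. Returns [captured_output, success_flag].
def stream_and_capture(cmd, log_path)
  FileUtils.mkdir_p(File.dirname(log_path))
  captured = +""
  File.open(log_path, "w") do |log|
    # Merge the child's stderr into its stdout so the log keeps the original
    # interleaving of [info] and [error] lines.
    IO.popen(cmd, err: [:child, :out]) do |io|
      io.each_line do |line|
        $stdout.print(line)
        log.print(line)
        captured << line
      end
    end
  end
  # IO.popen's block form waits for the child and sets $? when it returns.
  [captured, $?.success?]
end
```

Because the exit status comes from `$?` rather than from a shell pipeline such as `sbt ... | tee`, a failing command is still reported as a failure without relying on `pipefail`.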

Why are the changes needed?

Today, when javadoc hard-exits during unidoc HTML generation -- typically because of a specific scaladoc construct (e.g. wiki-style [[Class]] links or backtick-inline code refs) in an exposed Scala source -- the failing PR's CI log shows ~100 [error] lines on target/java/... files. Those errors are benign: they're genjavadoc-emitted Java stubs (static public abstract R apply(T1, T2, T3, T4)) that every PR produces, and javadoc always complains about them but normally still finishes. They are not the cause of the failure.

The actual signal is the last Generating .../<Class>.html... line before javadoc exited with exit code 1, which a developer has to find by hand in a multi-thousand-line log. The error reporting does not differentiate the benign noise from the real crash, so the failure consistently looks like it's "in" ErrorInfo.java / LexicalThreadLocal.java / similar, when it's actually in a Scala source that none of those names point to.
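Locating that signal programmatically could look roughly like the sketch below. The helper names, the regexes, and the assumption that the class path begins at `org/apache/` are illustrative guesses, not the PR's actual implementation.

```ruby
# Hypothetical patterns; the exact sbt/javadoc log format is an assumption.
ANSI_ESCAPE     = /\e\[[0-9;]*[A-Za-z]/
GENERATING_LINE = /Generating\s+(\S+\.html)\.\.\./
JAVADOC_EXIT    = /javadoc exited with exit code (\d+)/

# Returns the path from the last "Generating ....html..." line seen before
# the javadoc exit line, or nil when the log does not match that pattern.
def find_crash_html(log_text)
  last_html = nil
  log_text.each_line do |raw|
    line = raw.gsub(ANSI_ESCAPE, "") # strip colour codes before matching
    if (m = line.match(GENERATING_LINE))
      last_html = m[1]
    elsif line.match?(JAVADOC_EXIT)
      return last_html
    end
  end
  nil
end

# Turns ".../org/apache/spark/Foo.Bar.html" into "org.apache.spark.Foo.Bar".
# Assumes the package path starts at "org/apache/".
def infer_source_class(html_path)
  relative = html_path[%r{org/apache/.*\z}] || html_path
  relative.sub(/\.html\z/, "").tr("/", ".")
end
```

When `find_crash_html` returns `nil`, a caller would fall back to a generic message that points at the full log instead of naming a class.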

A recent example: PR #51419 hit this exact misdirection -- the log was full of errors on common/utils/target/java/... stubs, but the real culprit was a doc comment in CatalogV2Implicits.IdentifierHelper that triggered a hard exit during HTML generation. The diagnostic in this PR would have named that class directly.

Does this PR introduce any user-facing change?

No. CI-only output change visible in the unidoc step of the doc-gen job.

How was this patch tested?

  • Dry-ran the parser logic against the captured failing log from PR #51419 ([SPARK-52729][SQL] Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2 catalogs) -- it correctly extracts org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html as the crash class.
  • The second commit on this branch (DO NOT MERGE: break a docstring to validate the unidoc diagnostic) intentionally reintroduces the same [[...]]+backtick-inline scaladoc pattern in CatalogV2Implicits.IdentifierHelper.asTableIdentifierOpt so that this PR's CI run actually exercises the new path. Once the banner fires and names that class on the failing CI run, that commit will be dropped from this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Unidoc currently fails with ~100 `[error]` lines on genjavadoc-generated
Java stubs under `target/java/...` (private Scala case-class `apply`
methods that produce invalid `static public abstract R apply(T1, ...)`
Java). These errors are benign -- every PR emits them -- but they
overshadow the real cause when javadoc hard-exits mid-stream on specific
doc-comment content. The actual crash signal is the last
`Generating .../<Class>.html...` line before
`javadoc exited with exit code 1`, which a developer has to hunt for by
hand in multi-thousand-line CI logs.

Tee sbt output to `target/unidoc-build.log` and, on failure, print a
framed banner with:
  - the HTML file javadoc was generating when it died,
  - the inferred source class to audit,
  - a one-paragraph hint about the usual scaladoc triggers
    (wiki-style `[[...]]` links, inline-backtick code refs),
  - an explicit note that the `[error]` lines on `target/java/...`
    stubs are not the cause.

Heuristic only; when the log doesn't match the mid-HTML-crash pattern
(e.g. scaladoc failure, sbt env issue) the banner says so and points
back to the full log above.

Co-authored-by: Isaac

Intentionally reintroduces the scaladoc pattern that hard-exited
javadoc on PR apache#51419 (wiki-style [[TableIdentifier]] /
[[toQualifiedNameParts]] refs plus backtick-inline `Seq[String]`)
in CatalogV2Implicits.IdentifierHelper. CI should fail at the unidoc
step and the new diagnostic banner should name this class as the
culprit. Drop this commit before merging.

Co-authored-by: Isaac
@cloud-fan

The bait commit on this branch (a4b30e83a3d, DO NOT MERGE: break a docstring) successfully exercised the new diagnostic in CI on the latest run. Captured banner from the failing Run / Documentation generation job (log):

==============================================================================
Unidoc failed -- diagnostic summary
==============================================================================

  Javadoc crashed while generating: org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html
  Likely culprit: doc comment in org.apache.spark.sql.connector.catalog.CatalogV2Implicits.IdentifierHelper

  Javadoc can hard-exit (not just warn) on specific scaladoc
  patterns once they have been passed through genjavadoc --
  wiki-style `[[Class]]` / `[[method]]` links or inline-backticked
  code refs in the Scala source for the class above are common
  triggers. Start by auditing any recent doc-string changes in
  that source file.

  NOTE: the '[error]' lines above on files under
  target/java/... are benign genjavadoc stubs -- every PR
  emits them and they do not cause the exit. Ignore them.
==============================================================================

The banner correctly named CatalogV2Implicits.IdentifierHelper — the exact class the bait commit broke — instead of leaving the reader to scroll past ~100 benign [error] lines on target/java/... stubs. Reverting the bait commit next.
