Add OSS-Fuzz harnesses for HLO parser and proto deserialization #42055
ricaskew wants to merge 1 commit into
Companion OSS-Fuzz integration PR: google/oss-fuzz#15464

@GleasonK are you a good person to look at this? I am curious if we even want a fuzzer on HLO, since we mostly just care about the subset of HLO that JAX emits (AFAIK).

Two thoughts on this. The text parser's fuzz value comes specifically from exploring inputs outside the JAX-emitted subset: malformed and edge-case inputs are where parser bugs tend to live, and PR #37766's recursion-depth limit is a recent example of that class. An in-repo harness also gives the team a regression signal for future grammar or recursion changes. Separately, the proto harness covers a distinct surface ...

Isn't the reason bugs still live there because they are never exercised by real users or real code? ;) |
From the OSS-Fuzz perspective (google/oss-fuzz#15464), we'd be happy to integrate this but are waiting for the maintainers to be on board.
Add OSS-Fuzz harnesses for HLO parser and proto deserialization
This PR adds two libFuzzer harnesses to xla/fuzz/ as part of an
OSS-Fuzz integration for openxla/xla. A companion PR in google/oss-fuzz
will be opened concurrently to wire the build and register the project.
The HLO text-format parser and proto deserialization path were selected
as the primary fuzzing surfaces because they accept arbitrary external
input, have significant parsing complexity, and currently have no
OSS-Fuzz coverage.
Harnesses
hlo_parser_fuzz: exercises xla::ParseAndReturnUnverifiedModule against arbitrary text input bytes via absl::string_view. Targets the HLO text-format parser surface.
hlo_proto_fuzz: exercises xla::HloModule::CreateFromProto against arbitrary byte sequences. Stage 1 deserializes raw bytes into HloModuleProto via ParseFromArray; stage 2 converts the proto into an HloModule. Includes an explicit size guard against integer overflow on the ParseFromArray size argument (guards size > INT_MAX before the cast to int, implemented via std::numeric_limits<int>::max() from <limits>).
Build
Both harnesses build via Bazel under //xla/fuzz/. The BUILD file uses cc_binary with fuzz_target and nobuilder tags following standard OSS-Fuzz practice. Dependencies are minimal and correctly scoped per target.
The OSS-Fuzz build uses XLA's hermetic LLVM18 toolchain which dynamic-links libc++. The companion oss-fuzz PR resolves packaging by embedding --linkopt=-Wl,-rpath,$ORIGIN and copying the required libc++ shared objects into $OUT/ at build time.
Testing
Both binaries were smoke-tested inside the OSS-Fuzz Docker base image:
hlo_parser_fuzz: 100 runs, zero crashes, 2,371 coverage PCs with growth into xla::HloLexer and related parser code paths
hlo_proto_fuzz: 100 runs, zero crashes, 819 coverage PCs with growth into xla::HloModuleConfig and protobuf TcParser family
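
For reviewers who want to see the shape of the size guard described for hlo_proto_fuzz, here is a minimal, self-contained sketch. It is not the code from this PR: the helper name FitsInInt is hypothetical, and the XLA and protobuf calls are left as comments so the snippet compiles without the XLA tree. It only illustrates rejecting inputs larger than INT_MAX before the size_t-to-int cast that ParseFromArray requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the PR): true when `size` can be cast to int
// without overflow, i.e. size <= INT_MAX.
bool FitsInInt(size_t size) {
  return size <= static_cast<size_t>(std::numeric_limits<int>::max());
}

// Sketch of a libFuzzer entry point following the two-stage flow described
// in the PR. The real harness would use XLA and protobuf types here.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Guard: ParseFromArray takes an int size, so oversized inputs are skipped.
  if (!FitsInInt(size)) return 0;
  const int len = static_cast<int>(size);

  // Stage 1 (assumed shape): parse `data`/`len` into an HloModuleProto via
  //   proto.ParseFromArray(data, len), returning 0 on failure.
  // Stage 2 (assumed shape): pass the proto to xla::HloModule::CreateFromProto
  //   and discard the result; errors are expected for malformed input.
  (void)data;
  (void)len;
  return 0;
}
```

Returning 0 on every path keeps rejected inputs from being reported as failures; only crashes, sanitizer findings, or timeouts surface as bugs.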