Refine triaging logic, improve web crash details and add better "TP/FP" indicator #656

Open
@DavidKorczynski

Description


The "Bug" column in the benchmark results has some odd values, which makes its meaning difficult to understand. I also think the name "Bug" is too vague in this context. The column is supposed to indicate whether the bug is in the driver code or the project code.

Consider the following result: https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-09-29-655-d-cov-103-all/benchmark/output-htslib-sam_index_build2/index.html

The two results that crash have:

  • Triaging --> Driver for both
  • Diagnosis --> Semantic vs. non-semantic
  • Crashes --> True for both

Both are clearly false positives, but "Bug" is set to opposite values for the two.

Bug is defined in the report template as:

<td style="color: {{ 'red' if sample.result.crashes and not sample.result.is_semantic_error else 'black' }}">{{ sample.result.crashes and not sample.result.is_semantic_error }}</td>

i.e. sample.result.crashes and not sample.result.is_semantic_error

So if there is no semantic error and the sample crashes, Bug becomes True and is colored red.
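
To make the inconsistency concrete, the template expression can be restated as a small Python function. This is only a sketch: `SampleResult` and `is_bug` are hypothetical names, the `crashes` and `is_semantic_error` fields are taken from the template expression above, and the `triage` field is an assumption standing in for the Triaging column.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Hypothetical stand-in for the fields the report template reads."""
    crashes: bool
    is_semantic_error: bool
    triage: str  # assumed to hold the Triaging verdict, e.g. 'Driver'


def is_bug(result: SampleResult) -> bool:
    """Mirrors the template expression; note that triage is ignored."""
    return result.crashes and not result.is_semantic_error


# Two crashing samples that are both triaged as driver issues.
semantic = SampleResult(crashes=True, is_semantic_error=True, triage='Driver')
non_semantic = SampleResult(crashes=True, is_semantic_error=False, triage='Driver')

print(is_bug(semantic))      # False
print(is_bug(non_semantic))  # True, although triage says 'Driver'
```

Because the expression never looks at the triage verdict, two samples with the same triage result can end up with opposite Bug values, which matches what the report shows.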

Based on the definition in the template, I think True means the bug is considered a valid bug. The color coding confuses me a bit, though: my intuition would be to color it green if it is a true positive.

I think we should make several improvements here:

  1. Rename Bug to something more descriptive
  2. Move the classification logic into the core rather than keeping it in the web app itself (i.e. include sample.result.crashes and not sample.result.is_semantic_error in the core)
  3. Include the LLM-based triage verdict in the conclusion of whether a bug is a TP or FP
  4. Include all semantic validations in the crash triaging logic, and show them all in the UI. The logic starting here:
    symptom = SemanticCheckResult.extract_symptom(fuzzlog)
    crash_stacks = self._parse_stacks_from_libfuzzer_logs(lines)
    crash_func = self._parse_func_from_stacks(project_name, crash_stacks)
    crash_info = SemanticCheckResult.extract_crash_info(fuzzlog)
    only includes a single semantic validation. However, the semantic checks are not mutually exclusive, and in many cases it would be good to know all of them, e.g. "the crash is a NULL-deref (
    if symptom == 'null-deref':
    ), happens in the first iterations of the run (
    if lastround is None or lastround <= EARLY_FUZZING_ROUND_THRESHOLD:
    ), and occurs in a trace close to the harness (
    if len(crash_stacks) > 0:
    )".
  5. Based on the above improvements, come up with a new definition of "True Positive vs False Positive", e.g. based on a more fine-grained scoring system
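
As a rough illustration of items 4 and 5, the sketch below evaluates every check independently and then combines them into a score. Everything here is hypothetical: `CrashFindings`, `collect_findings`, `false_positive_score`, the weights, and the value given to `EARLY_FUZZING_ROUND_THRESHOLD` are assumptions made for the example, not the project's actual code or constants.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; the real constant lives in the project.
EARLY_FUZZING_ROUND_THRESHOLD = 3


@dataclass
class CrashFindings:
    """Hypothetical record holding every semantic check, not just the first."""
    null_deref: bool
    early_crash: bool
    has_crash_stack: bool  # mirrors the len(crash_stacks) > 0 check
    llm_says_driver: bool


def collect_findings(symptom: str, lastround: Optional[int],
                     crash_stacks: list, llm_triage: str) -> CrashFindings:
    """Evaluates all checks independently instead of stopping at one."""
    return CrashFindings(
        null_deref=symptom == 'null-deref',
        early_crash=(lastround is None
                     or lastround <= EARLY_FUZZING_ROUND_THRESHOLD),
        has_crash_stack=len(crash_stacks) > 0,
        llm_says_driver=llm_triage.lower() == 'driver',
    )


def false_positive_score(f: CrashFindings) -> int:
    """Sums illustrative weights; a higher score suggests a false positive."""
    weights = {
        'null_deref': 1,
        'early_crash': 2,
        'has_crash_stack': 1,
        'llm_says_driver': 3,
    }
    return sum(w for name, w in weights.items() if getattr(f, name))


findings = collect_findings('null-deref', 1, [['frame0', 'frame1']], 'Driver')
print(findings)
print(false_positive_score(findings))  # 7
```

Keeping each finding as a separate field means the UI could show all of them, and the TP/FP verdict becomes a threshold on the score rather than a single boolean expression in the template.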
