
docs: add Config Explorer / NeuralNav collaboration analysis and SIG proposal#104

Draft
anfredette wants to merge 3 commits into redhat-et:main from anfredette:config-exlorer-eval

Conversation


@anfredette anfredette commented Mar 3, 2026

Docs Include

  • Comparative technical evaluation of config_explorer and NeuralNav
  • Integration proposal analyzing complementary capabilities and synergy opportunities
  • Draft GitHub issue for proposing collaboration to SIG Benchmarking

Context

These documents support an upcoming proposal to the llm-d SIG Benchmarking
community to contribute NeuralNav and explore integration with Config Explorer.

NOTES:

  • This PR is created for discussion purposes and may or may not be merged into the repo.
  • CONFIG_EXPLORER_EVALUATION.md was produced almost entirely by Claude Code for my own info. I didn't spend any time trying to make it suitable for public consumption, and it will likely not be merged.

Summary by CodeRabbit

  • Documentation
    • Added integration proposal documentation detailing project combination strategy.
    • Added comparative technical evaluation documentation with integration models and feasibility assessment.
    • Added benchmarking ecosystem contribution proposal with collaboration structure.

Structured technical evaluation covering functional comparison,
architectural analysis, overlap/complementarity assessment, and
integration feasibility between llm-d-benchmark's config_explorer
and NeuralNav. Proposes offline synthetic benchmark generation as
the recommended integration path.

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Andre Fredette <afredette@redhat.com>
Makes the case for merging both projects into a single upstream
tool in the llm-d organization. Covers strategic rationale,
unified architecture, phased integration roadmap, and risk
analysis with mitigations.

Signed-off-by: Andre Fredette <afredette@redhat.com>

coderabbitai bot commented Mar 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef4b740d-bb05-43c7-b497-d4464e9d35fc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Three new documentation files propose and evaluate the integration of config_explorer and NeuralNav projects. The documents include architectural comparisons, integration strategies, feasibility assessments, and governance models for combining the two tools into a unified upstream project or ecosystem collaboration.

Changes

Cohort: Integration Proposal & Evaluation Documentation
Files: docs/CE_NN_INTEGRATION_PROPOSAL.md, docs/CONFIG_EXPLORER_EVALUATION.md, docs/SIG_BENCHMARKING_ISSUE.md
Summary: Three new proposal documents detailing the integration of config_explorer and NeuralNav. Includes technical evaluation comparing functional capabilities, data models, and APIs; integration rationale covering guided workflows, evaluation expansion, and ranking; feasibility assessment with multiple integration models (offline generation, loose coupling, full merge); risk analysis; and governance proposals for SIG Benchmarking collaboration. No code or API changes introduced.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 passed

  • Title check — ✅ Passed. The title accurately and concisely describes the main changes: adding documentation files that present a collaborative analysis between Config Explorer and NeuralNav, plus a proposal to the SIG Benchmarking community.
  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage, so the check was skipped.
  • Description Check — ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/CE_NN_INTEGRATION_PROPOSAL.md (1)

36-53: Make comparative capability claims auditable.

The capability matrix uses definitive “None/Shared” overlap labels, but doesn’t point to evidence boundaries (code refs, benchmark artifacts, or date/commit scope). For a governance-facing doc, this makes technical conclusions hard to validate and maintain over time.

A compact “Evidence & scope” note under the table (sources + snapshot date) would make the claims reviewable.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` around lines 36 - 53, Add an "Evidence &
scope" note immediately under the capability matrix that lists the data sources
and snapshot scope used to label overlaps (e.g., code references, benchmark
artifact names, and commit hashes or dates) and a short statement of the review
cutoff date; explicitly map ambiguous labels ("None"/"Shared") to the evidence
(for example: "None — verified by absence of features in config_explorer repo at
commit abc123 and no benchmarking artifacts; Shared — both projects reference
GPU cost lookup data source X dated 2025-01-15"). Ensure the note is compact
(1–3 lines) and includes at least one pointer to code/benchmark artifacts and
the snapshot date so the claims become auditable and maintainable.

ℹ️ Review info

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between ad7c395 and 1bfb3bb.

📒 Files selected for processing (3)
  • docs/CE_NN_INTEGRATION_PROPOSAL.md
  • docs/CONFIG_EXPLORER_EVALUATION.md
  • docs/SIG_BENCHMARKING_ISSUE.md

@@ -0,0 +1,103 @@
# Proposal: Unifying config_explorer and NeuralNav into a Single Upstream Project


⚠️ Potential issue | 🟠 Major

Align this document’s strategy with the SIG proposal narrative.

This file positions a single unified upstream project, while docs/SIG_BENCHMARKING_ISSUE.md explicitly proposes collaboration with both tools remaining independent. That strategic mismatch will confuse stakeholders during SIG review and weakens the proposal package coherence.

Consider adding a short “Positioning” section near Line 1 that clearly states whether this is:

  1. the preferred direction, or
  2. an alternative to the independent-collaboration path in the SIG issue.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 15-16, 61-67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` at line 1, The proposal currently
presents a unified upstream project but conflicts with the alternative
independent-collaboration approach in docs/SIG_BENCHMARKING_ISSUE.md; add a
short "Positioning" section at the top of CE_NN_INTEGRATION_PROPOSAL.md (near
the title) that explicitly states whether the document advocates (1) the
preferred direction of a single unified upstream project or (2) an alternative
option to pursue independent collaboration between config_explorer and
NeuralNav, and briefly summarize the rationale and implications for SIG review;
also update the corresponding paragraphs around lines referenced (15-16 and
61-67) to reflect this positioning so the narrative is consistent with
SIG_BENCHMARKING_ISSUE.md.

Comment on lines +331 to +334
1. **Accuracy**: For (model, GPU, traffic_profile) combos where NeuralNav has real GuideLLM benchmarks, the roofline estimates agree within 30% on TTFT, ITL, and E2E latency for the majority (>60%) of tested combos.
2. **Coverage expansion**: The synthetic generator produces valid estimates for at least 3 model+GPU combos that NeuralNav currently cannot recommend.
3. **End-to-end flow**: Synthetic benchmarks loaded into NeuralNav are scored, ranked, and displayed with an "Estimated" indicator -- no existing recommendation quality degraded.



⚠️ Potential issue | 🟠 Major

POC acceptance thresholds are too loose for ranking-quality decisions.

Proceeding when latency estimates are within 30% for just >60% of combos (and still allowing 30–50% error with caveats) risks introducing noisy synthetic data into recommendation ranking. That can degrade trust in “best” outputs.

Please tighten go/no-go criteria (e.g., stricter error bands for p95-sensitive metrics and minimum pass rates per traffic profile), not only aggregate pass rates.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 373-375

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CONFIG_EXPLORER_EVALUATION.md` around lines 331 - 334, Tighten the POC
acceptance thresholds in the "Accuracy" criterion: change the aggregate
pass-rate requirement from ">60% of tested combos within 30%" to a stricter rule
such as "≥80% of tested (model, GPU, traffic_profile) combos within ±20%" and
add per-traffic_profile minimums (e.g., each traffic_profile must meet ≥75% pass
rate). For p95-sensitive metrics (TTFT_p95, ITL_p95, E2E_p95) tighten the
allowable error band to ±15% and require these metrics individually to pass the
threshold (not just aggregate). Remove or narrow the caveat allowing 30–50%
error—replace with a hard fail threshold (e.g., any metric >25% error flagged as
fail) and require documented acceptance justification for any exceptions. Ensure
these changes apply alongside the "Coverage expansion" and "End-to-end flow"
checks so synthetic benchmarks only proceed when both aggregate and
per-profile/p95 conditions are met.

@anfredette anfredette marked this pull request as draft March 3, 2026 23:45
Draft issue proposing contributing NeuralNav to the llm-d SIG
Benchmarking ecosystem and enabling collaboration with Config Explorer.

Signed-off-by: Andre Fredette <afredette@redhat.com>
@anfredette anfredette force-pushed the config-exlorer-eval branch from 1bfb3bb to d8f55f3 Compare March 4, 2026 23:23

Labels

None yet


2 participants