docs: add Config Explorer / NeuralNav collaboration analysis and SIG proposal#104
anfredette wants to merge 3 commits into redhat-et:main
Conversation
Structured technical evaluation covering functional comparison, architectural analysis, overlap/complementarity assessment, and integration feasibility between llm-d-benchmark's config_explorer and NeuralNav. Proposes offline synthetic benchmark generation as the recommended integration path. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Andre Fredette <afredette@redhat.com>
Makes the case for merging both projects into a single upstream tool in the llm-d organization. Covers strategic rationale, unified architecture, phased integration roadmap, and risk analysis with mitigations. Signed-off-by: Andre Fredette <afredette@redhat.com>
Important: Review skipped — draft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI. Review profile: CHILL. Plan: Pro.
📝 Walkthrough
Three new documentation files propose and evaluate the integration of the config_explorer and NeuralNav projects. The documents include architectural comparisons, integration strategies, feasibility assessments, and governance models for combining the two tools into a unified upstream project or an ecosystem collaboration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/CE_NN_INTEGRATION_PROPOSAL.md (1)
36-53: Make comparative capability claims auditable. The capability matrix uses definitive "None/Shared" overlap labels, but doesn't point to evidence boundaries (code refs, benchmark artifacts, or date/commit scope). For a governance-facing doc, this makes technical conclusions hard to validate and maintain over time.
A compact “Evidence & scope” note under the table (sources + snapshot date) would make the claims reviewable.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` around lines 36 - 53, Add an "Evidence & scope" note immediately under the capability matrix that lists the data sources and snapshot scope used to label overlaps (e.g., code references, benchmark artifact names, and commit hashes or dates) and a short statement of the review cutoff date; explicitly map ambiguous labels ("None"/"Shared") to the evidence (for example: "None — verified by absence of features in config_explorer repo at commit abc123 and no benchmarking artifacts; Shared — both projects reference GPU cost lookup data source X dated 2025-01-15"). Ensure the note is compact (1–3 lines) and includes at least one pointer to code/benchmark artifacts and the snapshot date so the claims become auditable and maintainable.
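As a concrete illustration of the suggested note, a sketch of what could sit under the capability matrix follows (the commit hash and dates are placeholders to be filled in by the author, not actual references):

```markdown
> **Evidence & scope.** Overlap labels reflect a review of the config_explorer
> and NeuralNav repositories as of commit <sha> on <YYYY-MM-DD>. "None" =
> feature absent from the reviewed tree at that commit; "Shared" = both
> projects reference the same artifact (e.g., a common GPU cost lookup data
> source dated <YYYY-MM-DD>). Claims should be re-verified after that date.
```

Keeping the note to a few lines, as the prompt suggests, preserves readability while making each label traceable.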
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- docs/CE_NN_INTEGRATION_PROPOSAL.md
- docs/CONFIG_EXPLORER_EVALUATION.md
- docs/SIG_BENCHMARKING_ISSUE.md
@@ -0,0 +1,103 @@
# Proposal: Unifying config_explorer and NeuralNav into a Single Upstream Project
Align this document’s strategy with the SIG proposal narrative.
This file positions a single unified upstream project, while docs/SIG_BENCHMARKING_ISSUE.md explicitly proposes collaboration with both tools remaining independent. That strategic mismatch will confuse stakeholders during SIG review and weakens the coherence of the proposal package.
Consider adding a short “Positioning” section near Line 1 that clearly states whether this is:
- the preferred direction, or
- an alternative to the independent-collaboration path in the SIG issue.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 15-16, 61-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` at line 1, The proposal currently
presents a unified upstream project but conflicts with the alternative
independent-collaboration approach in docs/SIG_BENCHMARKING_ISSUE.md; add a
short "Positioning" section at the top of CE_NN_INTEGRATION_PROPOSAL.md (near
the title) that explicitly states whether the document advocates (1) the
preferred direction of a single unified upstream project or (2) an alternative
option to pursue independent collaboration between config_explorer and
NeuralNav, and briefly summarize the rationale and implications for SIG review;
also update the corresponding paragraphs around lines referenced (15-16 and
61-67) to reflect this positioning so the narrative is consistent with
SIG_BENCHMARKING_ISSUE.md.
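One hypothetical shape for such a section (the wording below is illustrative only, not proposed text from either document):

```markdown
## Positioning

This document describes the *unified upstream project* option. It is one of
two paths under SIG review; the alternative -- independent collaboration
between config_explorer and NeuralNav -- is described in
docs/SIG_BENCHMARKING_ISSUE.md. Readers should treat the two documents as
complementary options for the SIG to weigh, not a single recommendation.
```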
1. **Accuracy**: For (model, GPU, traffic_profile) combos where NeuralNav has real GuideLLM benchmarks, the roofline estimates agree within 30% on TTFT, ITL, and E2E latency for the majority (>60%) of tested combos.
2. **Coverage expansion**: The synthetic generator produces valid estimates for at least 3 model+GPU combos that NeuralNav currently cannot recommend.
3. **End-to-end flow**: Synthetic benchmarks loaded into NeuralNav are scored, ranked, and displayed with an "Estimated" indicator -- no existing recommendation quality degraded.
POC acceptance thresholds are too loose for ranking-quality decisions.
Proceeding when latency estimates are within 30% for just >60% of combos (and still allowing 30–50% error with caveats) risks introducing noisy synthetic data into recommendation ranking. That can degrade trust in “best” outputs.
Please tighten go/no-go criteria (e.g., stricter error bands for p95-sensitive metrics and minimum pass rates per traffic profile), not only aggregate pass rates.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 373-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/CONFIG_EXPLORER_EVALUATION.md` around lines 331 - 334, Tighten the POC
acceptance thresholds in the "Accuracy" criterion: change the aggregate
pass-rate requirement from ">60% of tested combos within 30%" to a stricter rule
such as "≥80% of tested (model, GPU, traffic_profile) combos within ±20%" and
add per-traffic_profile minimums (e.g., each traffic_profile must meet ≥75% pass
rate). For p95-sensitive metrics (TTFT_p95, ITL_p95, E2E_p95) tighten the
allowable error band to ±15% and require these metrics individually to pass the
threshold (not just aggregate). Remove or narrow the caveat allowing 30–50%
error—replace with a hard fail threshold (e.g., any metric >25% error flagged as
fail) and require documented acceptance justification for any exceptions. Ensure
these changes apply alongside the "Coverage expansion" and "End-to-end flow"
checks so synthetic benchmarks only proceed when both aggregate and
per-profile/p95 conditions are met.
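The tightened go/no-go logic the prompt describes can be sketched as a small gating check. The metric names, thresholds, and result-record shape below are illustrative assumptions drawn from the review comment, not code from either project:

```python
from collections import defaultdict

# Tail-latency metrics get a tighter error band (±15%) and must pass
# individually, per the review's tightened criteria.
P95_METRICS = {"TTFT_p95", "ITL_p95", "E2E_p95"}

def within_band(estimate, measured, band):
    """True if the relative error of estimate vs. measured is <= band."""
    if measured == 0:
        return estimate == 0
    return abs(estimate - measured) / abs(measured) <= band

def poc_gate(results):
    """Gate synthetic-benchmark acceptance.

    results: iterable of dicts with keys 'profile', 'metric',
    'estimate', 'measured' (one record per combo + metric).
    Returns True only when the aggregate, per-profile, and p95
    conditions all hold.
    """
    all_pass = []
    per_profile = defaultdict(list)
    p95_pass = []
    for r in results:
        band = 0.15 if r["metric"] in P95_METRICS else 0.20
        ok = within_band(r["estimate"], r["measured"], band)
        all_pass.append(ok)
        per_profile[r["profile"]].append(ok)
        if r["metric"] in P95_METRICS:
            p95_pass.append(ok)
    if not all_pass:
        return False
    aggregate_ok = sum(all_pass) / len(all_pass) >= 0.80          # >=80% within band
    profile_ok = all(sum(v) / len(v) >= 0.75 for v in per_profile.values())
    p95_ok = all(p95_pass)  # every p95-sensitive record must pass on its own
    return aggregate_ok and profile_ok and p95_ok
```

Evaluating all three conditions together means a single noisy traffic profile or tail-latency outlier blocks the synthetic data from entering the ranking, which is the failure mode the review is guarding against.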
Draft issue proposing contributing NeuralNav to the llm-d SIG Benchmarking ecosystem and enabling collaboration with Config Explorer. Signed-off-by: Andre Fredette <afredette@redhat.com>
Force-pushed from 1bfb3bb to d8f55f3.
Context
These documents support an upcoming proposal to the llm-d SIG Benchmarking
community to contribute NeuralNav and explore integration with Config Explorer.