docs: add Config Explorer / NeuralNav collaboration analysis and SIG proposal#104
anfredette wants to merge 3 commits into redhat-et:main
Conversation
Structured technical evaluation covering functional comparison, architectural analysis, overlap/complementarity assessment, and integration feasibility between llm-d-benchmark's config_explorer and NeuralNav. Proposes offline synthetic benchmark generation as the recommended integration path. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Andre Fredette <afredette@redhat.com>
Makes the case for merging both projects into a single upstream tool in the llm-d organization. Covers strategic rationale, unified architecture, phased integration roadmap, and risk analysis with mitigations. Signed-off-by: Andre Fredette <afredette@redhat.com>
Important: Review skipped — draft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI. Review profile: CHILL. Plan: Pro.
📝 Walkthrough
Three new documentation files propose and evaluate the integration of the config_explorer and NeuralNav projects. The documents include architectural comparisons, integration strategies, feasibility assessments, and governance models for combining the two tools into a unified upstream project or an ecosystem collaboration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/CE_NN_INTEGRATION_PROPOSAL.md (1)
36-53: Make comparative capability claims auditable. The capability matrix uses definitive "None/Shared" overlap labels, but doesn't point to evidence boundaries (code refs, benchmark artifacts, or date/commit scope). For a governance-facing doc, this makes technical conclusions hard to validate and maintain over time.
A compact “Evidence & scope” note under the table (sources + snapshot date) would make the claims reviewable.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` around lines 36 - 53, Add an "Evidence & scope" note immediately under the capability matrix that lists the data sources and snapshot scope used to label overlaps (e.g., code references, benchmark artifact names, and commit hashes or dates) and a short statement of the review cutoff date; explicitly map ambiguous labels ("None"/"Shared") to the evidence (for example: "None — verified by absence of features in config_explorer repo at commit abc123 and no benchmarking artifacts; Shared — both projects reference GPU cost lookup data source X dated 2025-01-15"). Ensure the note is compact (1–3 lines) and includes at least one pointer to code/benchmark artifacts and the snapshot date so the claims become auditable and maintainable.
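As a concrete illustration of the suggested note, a sketch of what could sit under the capability matrix follows (the commit hash and dates are placeholders to be filled in by the author, not actual references):

```markdown
> **Evidence & scope.** Overlap labels reflect a review of the config_explorer
> and NeuralNav repositories as of commit <sha> on <YYYY-MM-DD>. "None" =
> feature absent from the reviewed tree at that commit; "Shared" = both
> projects reference the same artifact (e.g., a common GPU cost lookup data
> source dated <YYYY-MM-DD>). Claims should be re-verified after that date.
```

Keeping the note to a few lines, as the prompt suggests, preserves readability while making each label traceable.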
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
- docs/CE_NN_INTEGRATION_PROPOSAL.md
- docs/CONFIG_EXPLORER_EVALUATION.md
- docs/SIG_BENCHMARKING_ISSUE.md
@@ -0,0 +1,103 @@
# Proposal: Unifying config_explorer and NeuralNav into a Single Upstream Project
Align this document’s strategy with the SIG proposal narrative.
This file positions a single unified upstream project, while docs/SIG_BENCHMARKING_ISSUE.md explicitly proposes collaboration with both tools remaining independent. That strategic mismatch will confuse stakeholders during SIG review and weakens the coherence of the proposal package.
Consider adding a short “Positioning” section near Line 1 that clearly states whether this is:
- the preferred direction, or
- an alternative to the independent-collaboration path in the SIG issue.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 15-16, 61-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/CE_NN_INTEGRATION_PROPOSAL.md` at line 1, The proposal currently
presents a unified upstream project but conflicts with the alternative
independent-collaboration approach in docs/SIG_BENCHMARKING_ISSUE.md; add a
short "Positioning" section at the top of CE_NN_INTEGRATION_PROPOSAL.md (near
the title) that explicitly states whether the document advocates (1) the
preferred direction of a single unified upstream project or (2) an alternative
option to pursue independent collaboration between config_explorer and
NeuralNav, and briefly summarize the rationale and implications for SIG review;
also update the corresponding paragraphs around lines referenced (15-16 and
61-67) to reflect this positioning so the narrative is consistent with
SIG_BENCHMARKING_ISSUE.md.
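One hypothetical shape for such a section (the wording below is illustrative only, not proposed text from either document):

```markdown
## Positioning

This document describes the *unified upstream project* option. It is one of
two paths under SIG review; the alternative -- independent collaboration
between config_explorer and NeuralNav -- is described in
docs/SIG_BENCHMARKING_ISSUE.md. Readers should treat the two documents as
complementary options for the SIG to weigh, not a single recommendation.
```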
1. **Accuracy**: For (model, GPU, traffic_profile) combos where NeuralNav has real GuideLLM benchmarks, the roofline estimates agree within 30% on TTFT, ITL, and E2E latency for the majority (>60%) of tested combos.
2. **Coverage expansion**: The synthetic generator produces valid estimates for at least 3 model+GPU combos that NeuralNav currently cannot recommend.
3. **End-to-end flow**: Synthetic benchmarks loaded into NeuralNav are scored, ranked, and displayed with an "Estimated" indicator -- no existing recommendation quality degraded.
POC acceptance thresholds are too loose for ranking-quality decisions.
Proceeding when latency estimates are within 30% for just >60% of combos (and still allowing 30–50% error with caveats) risks introducing noisy synthetic data into recommendation ranking. That can degrade trust in “best” outputs.
Please tighten go/no-go criteria (e.g., stricter error bands for p95-sensitive metrics and minimum pass rates per traffic profile), not only aggregate pass rates.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 373-375
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/CONFIG_EXPLORER_EVALUATION.md` around lines 331 - 334, Tighten the POC
acceptance thresholds in the "Accuracy" criterion: change the aggregate
pass-rate requirement from ">60% of tested combos within 30%" to a stricter rule
such as "≥80% of tested (model, GPU, traffic_profile) combos within ±20%" and
add per-traffic_profile minimums (e.g., each traffic_profile must meet ≥75% pass
rate). For p95-sensitive metrics (TTFT_p95, ITL_p95, E2E_p95) tighten the
allowable error band to ±15% and require these metrics individually to pass the
threshold (not just aggregate). Remove or narrow the caveat allowing 30–50%
error—replace with a hard fail threshold (e.g., any metric >25% error flagged as
fail) and require documented acceptance justification for any exceptions. Ensure
these changes apply alongside the "Coverage expansion" and "End-to-end flow"
checks so synthetic benchmarks only proceed when both aggregate and
per-profile/p95 conditions are met.
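The tightened go/no-go logic the prompt describes can be sketched as a small gating check. The metric names, thresholds, and result-record shape below are illustrative assumptions drawn from the review comment, not code from either project:

```python
from collections import defaultdict

# Tail-latency metrics get a tighter error band (±15%) and must pass
# individually, per the review's tightened criteria.
P95_METRICS = {"TTFT_p95", "ITL_p95", "E2E_p95"}

def within_band(estimate, measured, band):
    """True if the relative error of estimate vs. measured is <= band."""
    if measured == 0:
        return estimate == 0
    return abs(estimate - measured) / abs(measured) <= band

def poc_gate(results):
    """Gate synthetic-benchmark acceptance.

    results: iterable of dicts with keys 'profile', 'metric',
    'estimate', 'measured' (one record per combo + metric).
    Returns True only when the aggregate, per-profile, and p95
    conditions all hold.
    """
    all_pass = []
    per_profile = defaultdict(list)
    p95_pass = []
    for r in results:
        band = 0.15 if r["metric"] in P95_METRICS else 0.20
        ok = within_band(r["estimate"], r["measured"], band)
        all_pass.append(ok)
        per_profile[r["profile"]].append(ok)
        if r["metric"] in P95_METRICS:
            p95_pass.append(ok)
    if not all_pass:
        return False
    aggregate_ok = sum(all_pass) / len(all_pass) >= 0.80          # >=80% within band
    profile_ok = all(sum(v) / len(v) >= 0.75 for v in per_profile.values())
    p95_ok = all(p95_pass)  # every p95-sensitive record must pass on its own
    return aggregate_ok and profile_ok and p95_ok
```

Evaluating all three conditions together means a single noisy traffic profile or tail-latency outlier blocks the synthetic data from entering the ranking, which is the failure mode the review is guarding against.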
Draft issue proposing contributing NeuralNav to the llm-d SIG Benchmarking ecosystem and enabling collaboration with Config Explorer. Signed-off-by: Andre Fredette <afredette@redhat.com>
Force-pushed from 1bfb3bb to d8f55f3.
Context
These documents support an upcoming proposal to the llm-d SIG Benchmarking
community to contribute NeuralNav and explore integration with Config Explorer.