RPOPC-1317: Extract STREAMS benchmark version from CSV metadata by grdumas · Pull Request #49 · redhat-performance/chronicler

grdumas · 2026-06-16T02:58:28Z

Summary

Fix StreamsProcessor to extract the STREAMS benchmark version from CSV metadata comments instead of incorrectly using the wrapper version.

Problem

StreamsProcessor was using the wrapper version (e.g., "v2.8") for test.version instead of the actual STREAMS benchmark version (e.g., "5.10") found in CSV metadata.

Acceptance Criteria

Extract benchmark version from CSV metadata comments during _parse_streams_csv()
Use regex pattern to match streams_version_# followed by version number
Store extracted version in self._benchmark_version
Override build_test_info() to use the extracted benchmark version instead of wrapper version
test.version contains the benchmark version (e.g., "5.10") not the wrapper version (e.g., "v2.8")

Changes

Added _benchmark_version instance variable to StreamsProcessor
Extract version from # streams_version_# X.Y comments in _parse_streams_csv()
Use first occurrence if multiple version comments present
Override build_test_info() to prioritize benchmark version over wrapper version
Fall back to wrapper version when no benchmark version found
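
As a rough illustration of the extraction described above, here is a minimal sketch. It reuses the regex quoted later in this review; the function name `extract_streams_version` and the sample data row are assumptions made for the example and are not taken from the chronicler source.

```python
import re
from typing import Iterable, Optional

# Pattern as quoted in the review of this PR; \S+ captures the version token.
VERSION_RE = re.compile(r"streams_version_#\s+(\S+)")


def extract_streams_version(lines: Iterable[str]) -> Optional[str]:
    """Return the version from the first matching comment line, else None.

    Hypothetical helper: the real logic lives inside _parse_streams_csv().
    """
    for line in lines:
        if not line.lstrip().startswith("#"):
            continue  # only metadata comment lines are considered
        match = VERSION_RE.search(line)
        if match:
            return match.group(1)  # first occurrence wins
    return None


sample = [
    "# streams_version_# 5.10",
    "# streams_version_# 5.11",  # ignored: a version was already found
    "Copy,1000",                 # hypothetical data row
]
assert extract_streams_version(sample) == "5.10"
assert extract_streams_version(["Copy,1000"]) is None
```

Returning `None` when no comment matches leaves the fallback decision to the caller, which is where the wrapper version is known.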

Testing

Unit tests added for version extraction (5 new tests)
All existing tests passing (273 total tests)
Edge cases covered: whitespace variations, missing version, multiple versions, different formats

- Add test for extracting version from CSV comment
- Add test for whitespace variations
- Add test for fallback to wrapper version when missing
- Add test for using first occurrence when multiple versions
- Add test for different version number formats

Part of RPOPC-1317. Tests currently fail (RED phase).

- Add _benchmark_version instance variable to store extracted version
- Extract version from '# streams_version_# X.Y' CSV comment in _parse_streams_csv()
- Use first occurrence if multiple version comments present
- Override build_test_info() to use benchmark version for test.version
- Fall back to wrapper version when benchmark version not found

Implements RPOPC-1317. All tests passing (GREEN phase).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

coderabbitai · 2026-06-16T02:58:40Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 2ced592c-1521-4fc1-9fae-a1e319afcff3

📥 Commits

Reviewing files that changed from the base of the PR and between 2a64221 and 6d78954.

📒 Files selected for processing (2)

src/chronicler/processors/streams_processor.py
tests/test_streams_version_extraction.py

📝 Walkthrough

StreamsProcessor gains an __init__ that initializes _benchmark_version and a build_test_info() override that populates TestInfo.version with a version string parsed from # streams_version_# <version> comment lines in results_streams.csv. A new test module covers extraction, whitespace tolerance, fallback, first-occurrence, and multiple version string formats.

Changes

STREAMS benchmark version extraction

Layer / File(s)	Summary
Version extraction and `build_test_info` override `src/chronicler/processors/streams_processor.py`	`__init__` initializes `_benchmark_version = None`; the CSV parsing loop detects `# streams_version_# ...` comment lines via regex and sets `_benchmark_version` on the first match only; `build_test_info()` calls the base implementation and substitutes `TestInfo.version` with the extracted version while keeping `wrapper_version` from the parent.
Unit tests for version extraction `tests/test_streams_version_extraction.py`	Adds `_write_csv` helper and five test functions covering: basic comment extraction, whitespace variations, fallback to wrapper version, first-occurrence-only behavior, and acceptance of `x.y`, `x.y.z`, `vX.Y`, and `YYYY.X` version formats.
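
To make the whitespace and format coverage concrete, the following sketch runs the regex quoted in this review against a few comment lines. The specific version strings are invented for illustration; only the format families (`x.y`, `x.y.z`, `vX.Y`, `YYYY.X`) come from the summary above.

```python
import re

# Pattern as quoted in the review of this PR.
VERSION_RE = re.compile(r"streams_version_#\s+(\S+)")

# Hypothetical comment lines, one per format family listed in the table.
cases = {
    "# streams_version_# 5.10": "5.10",              # x.y
    "#   streams_version_#    5.10.1  ": "5.10.1",   # x.y.z, extra spaces
    "# streams_version_#\tv5.10": "v5.10",           # vX.Y, tab separator
    "# streams_version_# 2024.1": "2024.1",          # YYYY.X
}

for line, expected in cases.items():
    match = VERSION_RE.search(line)
    assert match is not None and match.group(1) == expected

# \s+ needs at least one whitespace character, so a version glued to the
# marker is not matched by this pattern.
assert VERSION_RE.search("# streams_version_#5.10") is None
```

Because the capture group is `\S+`, the pattern takes any non-whitespace token as the version and does not validate that it looks like a version number.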

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: extracting STREAMS benchmark version from CSV metadata, which is the primary focus of the changeset.
Description check	✅ Passed	The description is directly related to the changeset, providing a comprehensive summary of the problem, solution, acceptance criteria, and testing approach for extracting benchmark versions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

grdumas

PR Review: RPOPC-1317: Extract STREAMS benchmark version from CSV metadata

Summary

This PR successfully addresses the version conflation issue for the STREAMS benchmark by correctly parsing its version from the CSV metadata comments and utilizing the newly established build_test_info() override pattern.

Critical Issues (MUST FIX)

None found.

Security Delta

None found. No security-sensitive code was removed or weakened.

Major Issues (SHOULD FIX)

None found.

Minor Issues (NICE TO HAVE)

None found.

Nitpicks (OPTIONAL)

None found.

Positive Notes

Pattern Adoption: Excellent use of the build_test_info() override pattern established in the base processor. The fallback logic self._benchmark_version or base_info.version is clean and defensive.
Edge Case Handling: The regex r'streams_version_#\s+(\S+)' correctly handles arbitrary whitespace variations, and the condition if self._benchmark_version is None: safely ensures that only the first version comment in the file is captured.
Testing: The 5 new unit tests are comprehensive, covering variations in whitespace, missing versions, multiple version strings, and different version formats.
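
The override-with-fallback pattern praised above might look roughly like the sketch below. `SketchTestInfo` and `BaseProcessor` are hypothetical stand-ins written for this example; the real `TestInfo` and base processor in chronicler may have different fields and constructors.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchTestInfo:
    """Hypothetical stand-in for chronicler's TestInfo."""
    version: str
    wrapper_version: str


class BaseProcessor:
    """Hypothetical stand-in for the real base processor."""

    def __init__(self, wrapper_version: str) -> None:
        self._wrapper_version = wrapper_version

    def build_test_info(self) -> SketchTestInfo:
        # Base behaviour assumed here: version defaults to the wrapper version.
        return SketchTestInfo(
            version=self._wrapper_version,
            wrapper_version=self._wrapper_version,
        )


class StreamsProcessor(BaseProcessor):
    def __init__(self, wrapper_version: str) -> None:
        super().__init__(wrapper_version)
        self._benchmark_version: Optional[str] = None

    def build_test_info(self) -> SketchTestInfo:
        base_info = super().build_test_info()
        # Prefer the extracted benchmark version; keep wrapper_version as-is.
        return replace(
            base_info,
            version=self._benchmark_version or base_info.version,
        )


processor = StreamsProcessor("v2.8")
assert processor.build_test_info().version == "v2.8"  # fallback path

processor._benchmark_version = "5.10"  # as if set during CSV parsing
info = processor.build_test_info()
assert (info.version, info.wrapper_version) == ("5.10", "v2.8")
```

Using `or` means an empty string is treated the same as `None`, so an empty extracted version would also fall back to the wrapper version.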

Overall Assessment

Status: APPROVE
Reasoning: The code cleanly extracts the correct benchmark version without impacting the existing wrapper version assignment. It handles the parsing robustly and includes a strong suite of tests.
Next Steps: Ready to merge.

Reviewed by: Gemini Pro via automated code review

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_streams_version_extraction.py (1)
23-109: ⚡ Quick win

Add a regression test for processor reuse across multiple parses.

Current tests don’t exercise calling parse_runs() twice on the same StreamsProcessor instance (first CSV has streams_version_#, second CSV omits it). That scenario should assert test.version falls back correctly on the second parse and does not retain stale benchmark version.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_streams_version_extraction.py` around lines 23 - 109, Add a new
regression test function to verify that the StreamsProcessor correctly handles
multiple sequential parse_runs() calls without retaining stale state. Create a
test that instantiates a StreamsProcessor once, calls parse_runs with a CSV
containing a streams_version_# comment (e.g., "5.10"), then calls parse_runs
again with a different CSV that omits the version comment, and verifies that
build_test_info() returns the fallback wrapper version (not the stale "5.10"
from the first parse). This ensures that the processor resets its benchmark
version appropriately on subsequent parses when the comment is absent.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/chronicler/processors/streams_processor.py`:
- Around line 223-227: The `_benchmark_version` attribute in the
`StreamsProcessor` class is captured only once per processor instance and never
reset before a new parse, causing stale benchmark version metadata to persist
across multiple input parses when the same processor instance is reused. Reset
`_benchmark_version` to None at the beginning of the parse method (or whichever
method initiates a new parse operation) to ensure each parse starts with a clean
state and captures the correct version for the current input.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 8a31cf16-7453-44eb-902e-ebee28b4d7dd

📥 Commits

Reviewing files that changed from the base of the PR and between fe9c9db and 2a64221.

📒 Files selected for processing (2)

src/chronicler/processors/streams_processor.py
tests/test_streams_version_extraction.py

Add test to verify _benchmark_version doesn't leak between parse_runs() calls when reusing the same processor instance. Currently fails (RED phase) - demonstrates the bug where second parse incorrectly retains first parse's version. Addresses review feedback on PR #49.

Reset self._benchmark_version to None at the start of parse_runs() to prevent state leakage when the same processor instance is reused across multiple parses.

Without this fix, if a processor parsed CSV1 (with version) then CSV2 (without version), CSV2 would incorrectly inherit CSV1's benchmark version instead of falling back to wrapper version.

Addresses review feedback on PR #49.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

grdumas · 2026-06-16T03:05:14Z

PR Update: Addressed Review Feedback

What was done

Added regression test for processor state leakage (commit 10ae8d3)
- Test verifies _benchmark_version doesn't leak between parse_runs() calls when reusing the same processor instance
- First parse with version "5.10", second parse without version should fall back to wrapper "v2.8"
- Test initially failed (RED phase), confirming the bug
Reset benchmark version state at start of each parse (commit 6d78954)
- Added self._benchmark_version = None at the beginning of parse_runs()
- Ensures clean state for each parse operation
- Prevents stale version metadata from leaking across multiple parses

What was not done

None - all requested changes were implemented.

Why this approach

The state leakage bug was subtle because all existing tests created fresh StreamsProcessor instances for each test case, never exercising the reuse scenario. In production, if the same processor instance processes multiple result files, the second file would incorrectly inherit the first file's benchmark version instead of extracting its own or falling back to the wrapper version.

Resetting _benchmark_version at the start of parse_runs() is the minimal fix that:

Ensures each parse starts with clean state
Follows the principle that parse_runs() is the entry point for processing a new result
Has zero performance impact
Maintains backward compatibility
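
The reuse scenario and the reset can be sketched as follows. `StreamsProcessorSketch` is a simplified, hypothetical class: the real `parse_runs()` takes different inputs, and the `version()` helper exists only to keep the example short.

```python
import re
from typing import List, Optional


class StreamsProcessorSketch:
    """Hypothetical, simplified processor used only to show the reset."""

    _VERSION_RE = re.compile(r"streams_version_#\s+(\S+)")

    def __init__(self, wrapper_version: str) -> None:
        self._wrapper_version = wrapper_version
        self._benchmark_version: Optional[str] = None

    def parse_runs(self, csv_lines: List[str]) -> None:
        # Reset first so a previous parse cannot leak its version.
        self._benchmark_version = None
        for line in csv_lines:
            match = self._VERSION_RE.search(line)
            if match and self._benchmark_version is None:
                self._benchmark_version = match.group(1)

    def version(self) -> str:
        # Same fallback rule as build_test_info() in the PR.
        return self._benchmark_version or self._wrapper_version


proc = StreamsProcessorSketch("v2.8")

proc.parse_runs(["# streams_version_# 5.10", "Copy,1000"])  # hypothetical rows
assert proc.version() == "5.10"

proc.parse_runs(["Copy,1000"])  # second file has no version comment
assert proc.version() == "v2.8"  # falls back instead of keeping "5.10"
```

Removing the reset line in `parse_runs()` makes the final assertion fail with `'5.10' == 'v2.8'`, which mirrors the failure the regression test reported before the fix.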

Verification

All 274 tests pass (6 version extraction tests + 7 other STREAMS tests + 261 other tests).

Before fix:

test_streams_version_resets_between_parses FAILED
AssertionError: Second parse should fall back to wrapper version, not retain stale '5.10'
assert '5.10' == 'v2.8'

After fix:

tests/test_streams_version_extraction.py::test_streams_version_resets_between_parses PASSED

The PR is now ready for re-review.

Responded by: Claude Sonnet 4.5 via automated workflow

grdumas

PR Review: RPOPC-1317: Extract STREAMS benchmark version from CSV metadata

Summary

This update cleanly addresses the potential issue of state leakage when a single StreamsProcessor instance processes multiple distinct test runs sequentially.

Critical Issues (MUST FIX)

None found.

Major Issues (SHOULD FIX)

None found.

Minor Issues (NICE TO HAVE)

None found.

Nitpicks (OPTIONAL)

None found.

Positive Notes

State Management: Resetting self._benchmark_version = None at the top of parse_runs() is a great catch and an excellent practice for defensive programming, ensuring no stale data leaks between processing tasks.
Testing: The new test, test_streams_version_resets_between_parses, flawlessly proves the regression is avoided. Testing stateful behaviors like this is critical for long-running batch systems.

Overall Assessment

Status: APPROVE
Reasoning: The core implementation remains robust, and the state-leakage fix ensures safety during batch processing. Test coverage remains excellent and all 274 tests pass.
Next Steps: Ready to merge.

Reviewed by: Gemini Pro via automated code review

grdumas

LGTM

Agent VM and others added 2 commits June 15, 2026 22:55

grdumas self-assigned this Jun 16, 2026

grdumas added the bug Something isn't working label Jun 16, 2026

grdumas commented Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

Agent VM and others added 2 commits June 15, 2026 23:03

grdumas commented Jun 16, 2026

grdumas merged commit 1614097 into main Jun 16, 2026
2 checks passed

grdumas deleted the fix/RPOPC-1317-streams-version-extraction branch June 16, 2026 03:08

This was referenced Jun 16, 2026

RPOPC-1318: Move FIO benchmark version to test.version #50

Merged

RPOPC-1320: Fix remaining 7 processors version extraction #52

Merged

STREAMS processor does not extract benchmark version from CSV comments #42

Closed
