
chore: refactor e2e tests#183

Merged
asamal4 merged 1 commit into lightspeed-core:main from VladimirKadlec:e2etests on Mar 10, 2026

Conversation


@VladimirKadlec VladimirKadlec commented Mar 6, 2026

Description

Refactor e2e tests using pytest.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude, CodeRabbit

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Summary by CodeRabbit

  • Tests

    • Added comprehensive end-to-end integration tests covering full evaluation flows, API connectivity, response enrichment, metrics verification, and runtime validation.
    • Test data updated to force live API calls for richer validation.
  • Chores

    • Simplified CI to run a single centralized end-to-end test target.
    • Updated test runner config to skip integration tests by default and added a timeout dev dependency for reliability.


coderabbitai Bot commented Mar 6, 2026

Walkthrough

Consolidates CI e2e orchestration into a single Make target, adds pytest-timeout, adjusts pytest defaults to exclude integration tests, and introduces new end-to-end integration tests plus a small test-data tweak to force API calls.

Changes

  • Workflow (.github/workflows/e2e_tests.yaml): Removed matrix-based variant handling and per-variant steps; replaced with a single centralized execution calling make e2e_tests, and fixed the LSC image name.
  • Make target (Makefile): Added an e2e_tests target (depends on install-deps-test) that runs uv run pytest tests/integration -v -m integration.
  • Test config & deps (pyproject.toml, pytest.ini): Added pytest-timeout>=2.4.0 to dev deps; changed the pytest section header to [pytest], removed --disable-warnings, and added -m "not integration" to the default addopts.
  • Integration tests & data (tests/integration/test_full_evaluation.py, tests/integration/test_evaluation_data.yaml): Added a new comprehensive integration test module (fixtures, connectivity checks, full evaluation flows for query/streaming endpoints) and modified the integration test YAML to include response: null on the first turn to force API invocation.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Integration Test
    participant ConfigLoader as Config Loader
    participant EvalFramework as Evaluation Framework
    participant LSAPI as Lightspeed Stack API
    participant OpenAI as OpenAI API

    Test->>ConfigLoader: load system config (query/streaming)
    ConfigLoader-->>Test: return config
    Test->>EvalFramework: load evaluation data (YAML)
    EvalFramework-->>Test: return evaluation data
    Test->>EvalFramework: evaluate(config, eval_data)
    EvalFramework->>LSAPI: send request(s) to endpoint
    LSAPI->>OpenAI: forward requests (uses OPENAI_API_KEY)
    OpenAI-->>LSAPI: return responses & token usage
    LSAPI-->>EvalFramework: return enriched responses
    EvalFramework-->>Test: return EvaluationResult (metrics, scores)
    Test->>Test: assert results, metrics, thresholds, token usage

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'chore: refactor e2e tests' accurately reflects the main objective of the PR, which is to refactor end-to-end tests from a workflow-based approach to a pytest-based framework.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@VladimirKadlec VladimirKadlec marked this pull request as ready for review March 6, 2026 15:06

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
Makefile (1)

62-63: Add e2e_tests to .PHONY.

Without this, a file or directory named e2e_tests will cause the recipe to be skipped. Since this is a public automation target, it should be declared phony like help and test.

♻️ Proposed fix
-.PHONY: help test
+.PHONY: help test e2e_tests
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 62 - 63, The Makefile defines the e2e_tests target but
doesn't list it in the .PHONY declaration, so add "e2e_tests" to the existing
.PHONY target (the line that currently contains targets like "help" and "test")
to mark it as phony; edit the .PHONY entry to include e2e_tests (ensuring
spacing/formatting matches the current .PHONY line) so the e2e_tests recipe
always runs regardless of any file or directory named e2e_tests.
pyproject.toml (1)

70-70: Please confirm the direct pytest-timeout dependency is intentional.

This repo previously kept the timeout = 300 config without declaring pytest-timeout, so adding it here changes the dev dependency policy for every --group dev sync. If that maintainer decision has not changed, this line should stay out of pyproject.toml.

Based on learnings: "In the lightspeed-evaluation project, the pytest.ini timeout configuration is intentionally configured without adding pytest-timeout as a declared dependency, as confirmed by the project maintainer."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 70, Confirm whether adding the direct dev dependency
"pytest-timeout>=2.4.0" is intentional; if it is not, remove the
"pytest-timeout>=2.4.0" entry from pyproject.toml so the repo continues to rely
on the existing pytest.ini "timeout = 300" configuration without declaring the
plugin, and if it is intentional, leave the line but update project docs or
dev-dependency policy to record the decision and reasoning for declaring
pytest-timeout explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Line 105: The workflow currently invokes the PR-controlled target "make
e2e_tests" under the "pull_request_target" context which allows the PR's
Makefile to execute with secrets (e.g. OPENAI_API_KEY) — change this by either
replacing the "make e2e_tests" step with explicit, inline test commands that run
against the trusted checkout/ref (so nothing from the PR Makefile executes), or
move the job to run on "pull_request" instead so the PR head is checked out
without exposing secrets; update the step that calls "make e2e_tests" and any
related prerequisites to reference the trusted commands or change the workflow
trigger accordingly.

In `@tests/integration/test_full_evaluation.py`:
- Line 1: Remove the file-level "# pylint: disable" and all "# type: ignore"
usages and fix the root issues: move the nested DataValidator class to module
scope (rename if needed) so it’s importable/testable, create a shared helper
function (e.g., load_validate_evaluate or evaluate_fixture) to encapsulate the
repeated load/validate/evaluate sequence used across tests to reduce local
variable churn, and replace any suppressed comparisons by asserting non-None for
score and threshold then assigning them to locals and comparing those locals
(e.g., assert score is not None; s = score; assert threshold is not None; t =
threshold; assert s >= t) instead of using type/lint ignores; update tests that
referenced the old nested class or inlined flow to use the module-level
DataValidator and the new helper.
- Around line 27-34: The check_api_available function currently probes "/" and
requires a 200 which can produce false failures; change it to request
"/v1/models" (the same endpoint the workflow waits on) and treat the service as
available for any non-5xx response (e.g., status_code < 500) so 200/401/404
still count as up; also broaden the exception catch to httpx.HTTPError (or
include RequestError/Timeout) to avoid uncaught errors. Update the logic inside
check_api_available accordingly and keep the same timeout variable and return
boolean semantics.
- Around line 173-175: The test currently checks relevancy_result.response with
`is not None`, which is ineffective because EvaluationResult.response defaults
to "" (empty string); update the assertions to verify a real API-enrichment
predicate such as a non-empty string (e.g., assert relevancy_result.response !=
"" or assert bool(relevancy_result.response)) and similarly replace the other
`is not None` checks (lines ~237-243) so the test selects the API-enriched
record instead of the first result; refer to the relevancy_result variable and
the EvaluationResult.response field when making these changes.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 24713dcf-7a54-4f39-aed9-590845801e58

📥 Commits

Reviewing files that changed from the base of the PR and between adc2689 and 41ba64d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • .github/workflows/e2e_tests.yaml
  • Makefile
  • pyproject.toml
  • pytest.ini
  • tests/integration/test_evaluation_data.yaml
  • tests/integration/test_full_evaluation.py


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/e2e_tests.yaml (1)

11-18: ⚠️ Potential issue | 🟠 Major

matrix.mode is undefined; job name and env variable will be empty.

The matrix only defines lsc_image_path, but lines 15 and 18 reference matrix.mode which doesn't exist. This will result in an empty string in the job name and LSC_IMAGE_NAME environment variable.

🐛 Proposed fix

Either remove the mode reference if it's no longer needed:

     strategy:
       matrix:
         lsc_image_path: ["quay.io/lightspeed-core/lightspeed-stack:latest"]
     
-    name: "E2E Lightspeed Evaluation Test, mode: ${{ matrix.mode }}"
+    name: "E2E Lightspeed Evaluation Test"

     env:
-      LSC_IMAGE_NAME: "lightspeed-stack-test-mode-${{ matrix.mode }}"
+      LSC_IMAGE_NAME: "lightspeed-stack-test"

Or add the mode back to the matrix if per-variant execution is still needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml around lines 11 - 18, The workflow
references an undefined matrix variable matrix.mode in the job name ("name") and
the environment variable LSC_IMAGE_NAME; either restore a mode entry in the
strategy.matrix (e.g., add a "mode" array alongside lsc_image_path) so
matrix.mode resolves, or remove the ${{ matrix.mode }} interpolation from the
"name" and LSC_IMAGE_NAME env declaration so they no longer reference
matrix.mode; update the "name" field and the env LSC_IMAGE_NAME accordingly.
♻️ Duplicate comments (2)
tests/integration/test_full_evaluation.py (1)

1-1: 🛠️ Refactor suggestion | 🟠 Major

Drop the lint suppressions and fix the underlying causes.

The file-level # pylint: disable at line 1 and method-level suppressions at line 81 and # type: ignore at line 170 violate the repository's coding guidelines. Consider:

  • Extract DataValidator import to module level or use a helper to reduce locals
  • For the type comparison at line 170, use explicit non-None assertions before the comparison

As per coding guidelines: **/*.py: Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead.


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_full_evaluation.py` at line 1, Remove the file-level
pylint disable and the method-level suppressions; instead move the DataValidator
import out of the local scope into module-level imports (or add a small helper
function that returns the validator) to avoid redefined-outer-name/locals
warnings, and replace the blind `# type: ignore` used around the type comparison
by asserting the value is not None before comparing (use explicit runtime checks
or typing.cast/assertIsNotNone equivalents) so the type-checker and pylint are
satisfied; update any affected test functions (those referencing DataValidator
and the comparison) to use the module-level import/helper and the non-None
assertion.
.github/workflows/e2e_tests.yaml (1)

105-105: ⚠️ Potential issue | 🔴 Critical

Security risk: PR-controlled make target runs under pull_request_target with secrets.

The workflow checks out untrusted PR code (lines 26, 30) and then runs make e2e_tests with OPENAI_API_KEY available. A malicious fork PR could modify the Makefile to exfiltrate the secret.

Consider either:

  1. Switching to pull_request trigger (secrets won't be available to forks, but tests won't run on fork PRs)
  2. Keeping test commands inline rather than delegating to Makefile
  3. Using a two-workflow approach where approval gates the secret-using job
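Option 1 could be sketched like this (an illustrative fragment only; the job name, step layout, and secret name are assumptions about the repo's workflow, not its actual contents):

```yaml
on:
  pull_request:  # fork PRs run without access to repository secrets

jobs:
  e2e_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # checks out the PR merge commit
      - name: Run e2e tests
        run: make e2e_tests
        env:
          # Empty on forked-PR runs under the pull_request trigger, so a
          # malicious Makefile would have nothing to exfiltrate.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

The trade-off the comment notes still applies: under `pull_request`, fork PRs simply cannot run the secret-dependent tests, which is why the two-workflow approval-gated approach exists as an alternative.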


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/e2e_tests.yaml at line 105, The workflow currently runs
the Makefile target make e2e_tests under a pull_request_target trigger with
OPENAI_API_KEY available, which lets untrusted PR code modify the Makefile and
exfiltrate secrets; fix by either (A) switching the workflow trigger from
pull_request_target to pull_request so secrets aren’t exposed to forked PRs, (B)
inlining the e2e commands directly in the job instead of invoking make e2e_tests
so the checked-out PR repo cannot alter secret-using commands, or (C) split into
two workflows/jobs where the job that uses OPENAI_API_KEY (the job that runs
e2e_tests) is gated/approved or runs only after a trusted merge to prevent
PR-controlled Makefile execution—apply the chosen change to the job that invokes
make e2e_tests and ensure OPENAI_API_KEY is not accessible to untrusted runs.
🧹 Nitpick comments (2)
tests/integration/test_full_evaluation.py (2)

122-128: Consider moving DataValidator import to module level.

The inline import at lines 122 and 225 contributes to the high local variable count that necessitates the too-many-locals suppression. Moving this import to the module level would help address the underlying lint issue.

♻️ Proposed fix at module level

Add to the imports section (around line 23):

from lightspeed_evaluation.core.system import DataValidator

Then remove the inline imports at lines 122 and 225.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_full_evaluation.py` around lines 122 - 128, Move the
inline import of DataValidator to the module-level imports: add "from
lightspeed_evaluation.core.system import DataValidator" near the top of the test
module and remove the two inline imports that instantiate DataValidator (the
ones before creating validator/evaluation_data at the two locations currently
importing DataValidator inline). This reduces local variable count and lets you
remove the related too-many-locals suppression while preserving the existing
DataValidator usage when creating validator and calling
validator.load_evaluation_data.

168-175: Redundant assertion at line 174.

Line 168 already asserts relevancy_result.response.strip() is truthy, which implies the response exists and is non-empty. The assertion at line 174 (response is not None) is redundant since EvaluationResult.response defaults to "" (never None).

♻️ Proposed fix
         # Verify score is above threshold
         assert relevancy_result.response.strip(), "Should have response from API"
         assert (
             relevancy_result.score >= relevancy_result.threshold  # type: ignore
         ), f"Score {relevancy_result.score} should be >= threshold {relevancy_result.threshold}"

-        # Verify we got a response from the API
-        assert relevancy_result.response is not None, "Should have response from API"
         assert relevancy_result.query == "What is the capital of France?"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_full_evaluation.py` around lines 168 - 175, Remove the
redundant assertion that checks relevancy_result.response is not None because
earlier assert relevancy_result.response.strip() already guarantees a non-empty
string and EvaluationResult.response defaults to ""; delete the line containing
the redundant assertion and keep the existing strip() check and the
query/assertion on relevancy_result.query to validate behavior of
relevancy_result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fe639636-4f4c-452c-be16-b5619fb0af36

📥 Commits

Reviewing files that changed from the base of the PR and between 41ba64d and 188d64b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • .github/workflows/e2e_tests.yaml
  • Makefile
  • pyproject.toml
  • pytest.ini
  • tests/integration/test_evaluation_data.yaml
  • tests/integration/test_full_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/test_evaluation_data.yaml
  • Makefile
  • pyproject.toml


@asamal4 asamal4 left a comment


Thank you !!


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
Makefile (1)

62-64: Consider adding help text for the new target.

Other Makefile targets include ## comment for the help system (e.g., test: install-deps-test ## Execute tests with Pytest). Adding similar documentation would ensure make help shows this new target.

📝 Suggested change
-e2e_tests: install-deps-test
+e2e_tests: install-deps-test ## Execute e2e integration tests with Pytest
 	uv run pytest tests/integration -v -m integration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 62 - 64, The new Makefile target e2e_tests lacks the
help annotation used by the project's help system; update the e2e_tests target
(symbol: e2e_tests) to include a help comment using the same pattern as other
targets (add a "## ..." comment on the target line or immediately after it,
e.g., "## Run end-to-end integration tests") so it appears in make help, keeping
the existing dependency (install-deps-test) and command (uv run pytest
tests/integration -v -m integration) unchanged.
tests/integration/test_full_evaluation.py (1)

168-175: Remove the redundant is not None assertion.

Line 174's assertion relevancy_result.response is not None is redundant after Line 168's assert relevancy_result.response.strip(). Since EvaluationResult.response defaults to "" (not None), the is not None check always passes and doesn't add value.

♻️ Suggested simplification
         # Verify we got a response from the API
-        assert relevancy_result.response.strip(), "Should have response from API"
-        assert (
-            relevancy_result.score >= relevancy_result.threshold  # type: ignore
-        ), f"Score {relevancy_result.score} should be >= threshold {relevancy_result.threshold}"
-
-        # Verify we got a response from the API
-        assert relevancy_result.response is not None, "Should have response from API"
+        assert relevancy_result.response.strip(), "Should have non-empty response from API"
+        assert relevancy_result.score is not None and relevancy_result.threshold is not None, (
+            "Score and threshold should be set"
+        )
+        assert (
+            relevancy_result.score >= relevancy_result.threshold
+        ), f"Score {relevancy_result.score} should be >= threshold {relevancy_result.threshold}"
         assert relevancy_result.query == "What is the capital of France?"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_full_evaluation.py` around lines 168 - 175, Remove the
redundant assertion that checks relevancy_result.response is not None; the
earlier assertion using relevancy_result.response.strip() already guarantees a
non-empty string (EvaluationResult.response defaults to ""), so delete the
assert relevancy_result.response is not None line and keep the existing strip()
check and other assertions (e.g., relevancy_result.query and score comparisons)
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1850b091-f917-44c6-a042-d552c57f1daf

📥 Commits

Reviewing files that changed from the base of the PR and between 188d64b and 21fff16.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • .github/workflows/e2e_tests.yaml
  • Makefile
  • pyproject.toml
  • pytest.ini
  • tests/integration/test_evaluation_data.yaml
  • tests/integration/test_full_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • pytest.ini


@asamal4 asamal4 left a comment


LGTM

@asamal4 asamal4 merged commit 34cbae4 into lightspeed-core:main Mar 10, 2026
17 checks passed
@VladimirKadlec VladimirKadlec deleted the e2etests branch March 10, 2026 10:40
emac-E pushed a commit to emac-E/lightspeed-evaluation that referenced this pull request Apr 10, 2026