chore: add prerelease tests and skill by leonardmq · Pull Request #1424 · Kiln-AI/Kiln

leonardmq · 2026-05-20T18:42:10Z

What does this PR do?

We have paid tests (@pytest.mark.paid), but we rarely run them because there are too many. This PR adds a new marker, @pytest.mark.prerelease, and applies it to a few select existing tests plus some new ones, so we can double-check provider integrations before a release. It also adds a Skill that orchestrates the prerelease checks and updates them automatically when needed, for example when some of the models go stale.

Changes:

Adds --runprerelease pytest flag + @pytest.mark.prerelease marker; runs a curated subset of paid tests (~100 cases, down from ~2300 if we'd blanket-tagged all the paid tests) covering vertex live, OpenAI/Claude/Gemini/Groq/Bedrock smokes, structured output, embeddings, streaming, extraction, reranker, thinking levels, prompt caching.
Centralized model whitelist in libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py; fan-out tests (embeddings/extraction/thinking) get sibling *_prerelease_smoke variants that iterate only over the whitelist, while the full-fan-out tests stay @pytest.mark.paid.
New .agents/skills/kiln-prerelease-check skill: runs checks.sh + --runprerelease, sweeps whitelist & hardcoded slugs for deprecated/provider-rejected/newer-sibling cases (mandatory every run), verifies any swap with run-old-then-run-new, writes a timestamped report to .prerelease/<ts>/REPORT.md.
.prerelease/ added to .gitignore; no prod-path code touched.
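
As a rough illustration of how the flag and marker could be wired up, here is a minimal conftest.py sketch. The PR's actual conftest is not shown in this description, so the `--runpaid` flag, the `should_skip` helper, and the precedence between the two flags are assumptions made for the example.

```python
# conftest.py (sketch only; the real Kiln conftest may be organised differently)


def should_skip(markers: set[str], run_paid: bool, run_prerelease: bool) -> bool:
    """Decide whether a collected test is skipped under the given flags."""
    costs_money = bool(markers & {"paid", "prerelease"})
    if not costs_money:
        return False  # free tests always run
    if run_paid:
        return False  # assumed: --runpaid enables every paid test
    # Otherwise only the curated subset runs, and only when asked for.
    return not (run_prerelease and "prerelease" in markers)


def pytest_addoption(parser):
    parser.addoption("--runpaid", action="store_true", default=False,
                     help="run tests marked paid (assumed flag name)")
    parser.addoption("--runprerelease", action="store_true", default=False,
                     help="run the curated prerelease subset")


def pytest_configure(config):
    # Register the markers so pytest does not warn about unknown marks.
    config.addinivalue_line("markers", "paid: test calls a paid provider API")
    config.addinivalue_line("markers", "prerelease: curated prerelease smoke test")


def pytest_collection_modifyitems(config, items):
    import pytest  # local import keeps should_skip importable without pytest

    run_paid = config.getoption("--runpaid")
    run_prerelease = config.getoption("--runprerelease")
    skip = pytest.mark.skip(reason="needs --runpaid or --runprerelease")
    for item in items:
        markers = {mark.name for mark in item.iter_markers()}
        if should_skip(markers, run_paid, run_prerelease):
            item.add_marker(skip)
```

Keeping the decision in a pure function makes the skip rules easy to unit test without spinning up a pytest session.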

Checklists

Tests have been run locally and passed
New tests have been added to any work in /lib

coderabbitai · 2026-05-20T18:43:18Z

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'chore: add prerelease tests and skill' accurately and concisely describes the main changes: adding a prerelease test marker/flag and a new orchestration Skill.
Description check	✅ Passed	The PR description covers all required template sections: 'What does this PR do?' with detailed changes, Related Issues (not applicable), CLA confirmation, and both checklist items marked complete.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-actions · 2026-05-20T18:44:53Z

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py (100%)

Summary

Total: 7 lines
Missing: 0 lines
Coverage: 100%

gemini-code-assist

Code Review

This pull request introduces a new automated pre-release check skill for Kiln, which integrates standard CI checks with a curated suite of live-API smoke tests. Key additions include a new prerelease pytest marker, a --runprerelease flag, and a whitelist of representative models to ensure cost-effective and focused verification. Review feedback correctly identified a typo in a model name within the whitelist and recommended more robust shell command patterns for environment variable handling and git history auditing to prevent potential failures in edge cases.

gemini-code-assist · 2026-05-20T18:55:45Z

+# reasoning channel; the test verifies the reasoning content channel is
+# populated (or absent for "none").
+PRERELEASE_THINKING_MODELS: list[tuple[str, str, str]] = [
+    (ModelProviderName.openai.value, "gpt_o4_mini_low", "low"),


The model name gpt_o4_mini_low appears to be a typo. It seems to have swapped 'o' and '4' and incorrectly includes the thinking level in the model name. Based on other entries, it should likely be gpt_4o_mini.

Suggested change

-    (ModelProviderName.openai.value, "gpt_o4_mini_low", "low"),
+    (ModelProviderName.openai.value, "gpt_4o_mini", "low"),
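
Whether a whitelist slug is a typo can be checked mechanically by comparing every entry against the model registry. The sketch below assumes such a registry is available as a set of (provider, model name) pairs. `KNOWN_MODELS` is stand-in data, `unknown_entries` is a hypothetical helper, and provider names are plain strings rather than `ModelProviderName` values so the example stays self-contained.

```python
# Stand-in registry: the real model list lives in Kiln and is not shown in this thread.
KNOWN_MODELS: set[tuple[str, str]] = {
    ("openai", "gpt_4o_mini"),
    ("openai", "gpt_o4_mini_low"),
}

# Same shape as the whitelist in the diff: (provider, model name, thinking level).
PRERELEASE_THINKING_MODELS: list[tuple[str, str, str]] = [
    ("openai", "gpt_o4_mini_low", "low"),
]


def unknown_entries(
    whitelist: list[tuple[str, str, str]],
    known: set[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Return whitelist entries whose (provider, model) pair is not registered."""
    return [
        (provider, model, level)
        for provider, model, level in whitelist
        if (provider, model) not in known
    ]


def test_whitelist_entries_are_registered():
    # Cheap guard: needs no API calls, so it can run on every commit.
    assert unknown_entries(PRERELEASE_THINKING_MODELS, KNOWN_MODELS) == []
```

A guard like this would flag a misspelled slug at collection time instead of during a paid run.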

gemini-code-assist · 2026-05-20T18:55:45Z

+- **Network access is required.** Every phase here makes real API calls or hits remote model-list endpoints. If you are running this skill inside a sandboxed Bash session, request `required_permissions: ["all"]` for the test commands.
+- **Env vars:** Source `.env` before running anything that needs API keys:
+  ```bash
+  export $(grep -v '^#' .env | xargs)


The command export $(grep -v '^#' .env | xargs) is not robust. It will fail to parse environment variable values that contain spaces if they are not quoted (e.g., VAR=some value). xargs splits input by spaces by default, which can lead to incorrect exports.

A safer way to export variables from a .env file is to process it line by line. Consider using a while read loop for more reliable parsing:

while IFS= read -r line; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  export "$line"
done < .env
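
The same line-by-line idea can be sketched in Python for cases where the test run is launched from a script. This is an illustration, not a full dotenv parser: `parse_env_lines` and `load_env_file` are hypothetical helpers, and stripping one pair of surrounding quotes is an assumption about how the `.env` file is written.

```python
import os
from typing import Iterable


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines, keeping spaces inside values intact."""
    env: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)  # split once: values may contain '='
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env_file(path: str = ".env") -> None:
    """Export every parsed variable into this process's environment."""
    with open(path, encoding="utf-8") as fh:
        os.environ.update(parse_env_lines(fh))
```

Variables loaded this way are visible to the current Python process and any subprocesses it starts, but not to the parent shell.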

gemini-code-assist · 2026-05-20T18:55:45Z

+git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- \
+  $(awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u) | sort -u | \
+  head -50 > "${OUT}/recent_paid_files.txt"


This command can fail with an "Argument list too long" error if there are many files in paid_only.txt. It's safer to pipe the file list to xargs to handle a large number of files gracefully.

Consider this more robust alternative:

awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u | \
  xargs git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- | \
  sort -u | head -50 > "${OUT}/recent_paid_files.txt"
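
What xargs does here is split a long path list into several command invocations. A minimal Python sketch of the same batching follows, assuming it runs inside the repository. The `batches` and `recently_touched` helpers and the 100,000-character budget are illustrative choices, not part of the skill.

```python
import subprocess
from typing import Iterable, Iterator


def batches(paths: Iterable[str], max_chars: int = 100_000) -> Iterator[list[str]]:
    """Yield groups of paths whose combined length stays within max_chars."""
    batch: list[str] = []
    size = 0
    for path in paths:
        cost = len(path) + 1  # +1 for the separator between arguments
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(path)
        size += cost
    if batch:
        yield batch


def recently_touched(paths: Iterable[str], since: str = "3 months ago") -> list[str]:
    """Files added or modified since `since`, using one git call per batch."""
    seen: set[str] = set()
    for batch in batches(sorted(set(paths))):
        out = subprocess.run(
            ["git", "log", f"--since={since}", "--diff-filter=AM",
             "--name-only", "--pretty=format:", "--", *batch],
            check=True, capture_output=True, text=True,
        ).stdout
        seen.update(line for line in out.splitlines() if line)
    return sorted(seen)[:50]  # mirrors `sort -u | head -50`
```

Because results from all batches are merged before sorting and truncating, the output does not depend on where the batch boundaries fall.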

coderabbitai

🧹 Nitpick comments (1)

.agents/skills/kiln-prerelease-check/SKILL.md (1)
1-432: 💤 Low value

Consider standardizing hyphenation of "prerelease" throughout the document.

The document mixes "pre-release" (e.g., in the title "Kiln Pre-release Check") and "prerelease" (e.g., in code references like @pytest.mark.prerelease). While the unhyphenated form is necessary for code identifiers, consider standardizing on one variant for prose text to improve consistency.

Suggestion: Use "prerelease" consistently in prose to match the technical terminology used in code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/kiln-prerelease-check/SKILL.md around lines 1 - 432, The
document mixes "pre-release" and "prerelease" in prose; update all
human-readable occurrences of "pre-release" or "pre release" to the single
unhyphenated form "prerelease" for consistency with code identifiers (do not
change any code symbols like `@pytest.mark.prerelease` or filenames). Search for
and replace the title "Kiln Pre-release Check", headings, and all prose
instances (e.g., "pre-release verification", "pre-release test", "pre-release
set") to "Kiln Prerelease Check", "prerelease verification", "prerelease test",
etc., while preserving backtick-quoted code identifiers and examples exactly as
they are.
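
The replacement described in this prompt can be sketched as a small script that rewrites prose while leaving code spans alone. This is an assumption-laden illustration: `normalize_prerelease` is a hypothetical helper, and it only recognises backtick-delimited inline code and fenced blocks, not indented code blocks.

```python
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example
# Fenced blocks are tried first, then inline code spans; both are left untouched.
CODE_SPAN = re.compile("(" + FENCE + ".*?" + FENCE + r"|`[^`\n]*`)", re.DOTALL)
HYPHENATED = re.compile(r"\b([Pp])re[- ]release\b")


def normalize_prerelease(markdown: str) -> str:
    """Rewrite 'pre-release' / 'pre release' to 'prerelease' outside code."""
    parts = CODE_SPAN.split(markdown)
    # With one capture group, even indices are prose and odd indices are code.
    for i in range(0, len(parts), 2):
        parts[i] = HYPHENATED.sub(r"\1rerelease", parts[i])
    return "".join(parts)
```

The capital letter is preserved through the capture group, so a title such as "Kiln Pre-release Check" becomes "Kiln Prerelease Check".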

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 514c8bd1-d48a-450e-a498-0a4211eccec7

📥 Commits

Reviewing files that changed from the base of the PR and between 0448d84 and 6f1e4cf.

📒 Files selected for processing (5)

.agents/skills/kiln-prerelease-check/SKILL.md
libs/core/kiln_ai/adapters/model_adapters/test_prompt_caching.py
libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py
libs/core/kiln_ai/adapters/test_prompt_adaptors.py

leonardmq added 2 commits May 21, 2026 02:12

chore: add prerelease tests and skill

40b72f2

chore: update skill

0448d84

leonardmq marked this pull request as draft May 20, 2026 18:47

gemini-code-assist Bot reviewed May 20, 2026

refine and update slugs

6f1e4cf

leonardmq requested review from scosman and tawnymanticore May 25, 2026 19:38

leonardmq marked this pull request as ready for review May 25, 2026 19:38

coderabbitai Bot reviewed May 25, 2026

chiang-daniel approved these changes May 25, 2026

leonardmq wants to merge 3 commits into main from leonard/kil-666-ops-prerelease-skill

2 participants
