chore: add prerelease tests and skill#1424
Conversation
|
<review_stack_artifact> </review_stack_artifact> 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📊 Coverage ReportOverall Coverage: 92% Diff: origin/main...HEAD
Summary
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new automated pre-release check skill for Kiln, which integrates standard CI checks with a curated suite of live-API smoke tests. Key additions include a new prerelease pytest marker, a --runprerelease flag, and a whitelist of representative models to ensure cost-effective and focused verification. Review feedback correctly identified a typo in a model name within the whitelist and recommended more robust shell command patterns for environment variable handling and git history auditing to prevent potential failures in edge cases.
| # reasoning channel; the test verifies the reasoning content channel is | ||
| # populated (or absent for "none"). | ||
| PRERELEASE_THINKING_MODELS: list[tuple[str, str, str]] = [ | ||
| (ModelProviderName.openai.value, "gpt_o4_mini_low", "low"), |
There was a problem hiding this comment.
The model name gpt_o4_mini_low appears to be a typo. It seems to have swapped 'o' and '4' and incorrectly includes the thinking level in the model name. Based on other entries, it should likely be gpt_4o_mini.
| (ModelProviderName.openai.value, "gpt_o4_mini_low", "low"), | |
| (ModelProviderName.openai.value, "gpt_4o_mini", "low"), |
| - **Network access is required.** Every phase here makes real API calls or hits remote model-list endpoints. If you are running this skill inside a sandboxed Bash session, request `required_permissions: ["all"]` for the test commands. | ||
| - **Env vars:** Source `.env` before running anything that needs API keys: | ||
| ```bash | ||
| export $(grep -v '^#' .env | xargs) |
There was a problem hiding this comment.
The command export $(grep -v '^#' .env | xargs) is not robust. It will fail to parse environment variable values that contain spaces if they are not quoted (e.g., VAR=some value). xargs splits input by spaces by default, which can lead to incorrect exports.
A safer way to export variables from a .env file is to process it line by line. Consider using a while read loop for more reliable parsing:
while IFS= read -r line; do
[[ -z "$line" || "$line" =~ ^# ]] && continue
export "$line"
done < .env| git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- \ | ||
| $(awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u) | sort -u | \ | ||
| head -50 > "${OUT}/recent_paid_files.txt" |
There was a problem hiding this comment.
This command can fail with an "Argument list too long" error if there are many files in paid_only.txt. It's safer to pipe the file list to xargs to handle a large number of files gracefully.
Consider this more robust alternative:
awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u | \
xargs git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- | \
sort -u | head -50 > "${OUT}/recent_paid_files.txt"There was a problem hiding this comment.
🧹 Nitpick comments (1)
.agents/skills/kiln-prerelease-check/SKILL.md (1)
1-432: 💤 Low valueConsider standardizing hyphenation of "prerelease" throughout the document.
The document mixes "pre-release" (e.g., in the title "Kiln Pre-release Check") and "prerelease" (e.g., in code references like
@pytest.mark.prerelease). While the unhyphenated form is necessary for code identifiers, consider standardizing on one variant for prose text to improve consistency.Suggestion: Use "prerelease" consistently in prose to match the technical terminology used in code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/kiln-prerelease-check/SKILL.md around lines 1 - 432, The document mixes "pre-release" and "prerelease" in prose; update all human-readable occurrences of "pre-release" or "pre release" to the single unhyphenated form "prerelease" for consistency with code identifiers (do not change any code symbols like `@pytest.mark.prerelease` or filenames). Search for and replace the title "Kiln Pre-release Check", headings, and all prose instances (e.g., "pre-release verification", "pre-release test", "pre-release set") to "Kiln Prerelease Check", "prerelease verification", "prerelease test", etc., while preserving backtick-quoted code identifiers and examples exactly as they are.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.agents/skills/kiln-prerelease-check/SKILL.md:
- Around line 1-432: The document mixes "pre-release" and "prerelease" in prose;
update all human-readable occurrences of "pre-release" or "pre release" to the
single unhyphenated form "prerelease" for consistency with code identifiers (do
not change any code symbols like `@pytest.mark.prerelease` or filenames). Search
for and replace the title "Kiln Pre-release Check", headings, and all prose
instances (e.g., "pre-release verification", "pre-release test", "pre-release
set") to "Kiln Prerelease Check", "prerelease verification", "prerelease test",
etc., while preserving backtick-quoted code identifiers and examples exactly as
they are.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 514c8bd1-d48a-450e-a498-0a4211eccec7
📒 Files selected for processing (5)
.agents/skills/kiln-prerelease-check/SKILL.mdlibs/core/kiln_ai/adapters/model_adapters/test_prompt_caching.pylibs/core/kiln_ai/adapters/model_adapters/test_structured_output.pylibs/core/kiln_ai/adapters/pytest_prerelease_whitelist.pylibs/core/kiln_ai/adapters/test_prompt_adaptors.py
What does this PR do?
We have paid tests (
@pytest.mark.paid), but we don't usually run them as there are too many. This PR adds a new tag@pytest.mark.prereleaseand marks a few select tests and new ones for double checking integration with providers prior to release. Also adds a Skill to orchestrate the prerelease checks and auto-update the prerelease checks if some of the models go stale, etc.Changes:
--runprereleasepytest flag +@pytest.mark.prereleasemarker; runs a curated subset of paid tests (~100 cases, down from ~2300 if we'd blanket-tagged all the paid tests) covering vertex live, OpenAI/Claude/Gemini/Groq/Bedrock smokes, structured output, embeddings, streaming, extraction, reranker, thinking levels, prompt caching.libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py; fan-out tests (embeddings/extraction/thinking) get sibling*_prerelease_smokevariants that iterate only over the whitelist, while the full-fan-out tests stay@pytest.mark.paid..agents/skills/kiln-prerelease-checkskill: runschecks.sh+--runprerelease, sweeps whitelist & hardcoded slugs for deprecated/provider-rejected/newer-sibling cases (mandatory every run), verifies any swap with run-old-then-run-new, writes a timestamped report to.prerelease/<ts>/REPORT.md..prerelease/added to.gitignore; no prod-path code touched.Checklists