Skip to content

chore: add prerelease tests and skill#1424

Open
leonardmq wants to merge 3 commits into
mainfrom
leonard/kil-666-ops-prerelease-skill
Open

chore: add prerelease tests and skill#1424
leonardmq wants to merge 3 commits into
mainfrom
leonard/kil-666-ops-prerelease-skill

Conversation

@leonardmq
Copy link
Copy Markdown
Collaborator

What does this PR do?

We have paid tests (@pytest.mark.paid), but we don't usually run them as there are too many. This PR adds a new tag @pytest.mark.prerelease and marks a few select tests and new ones for double checking integration with providers prior to release. Also adds a Skill to orchestrate the prerelease checks and auto-update the prerelease checks if some of the models go stale, etc.

Changes:

  • Adds --runprerelease pytest flag + @pytest.mark.prerelease marker; runs a curated subset of paid tests (~100 cases, down from ~2300 if we'd blanket-tagged all the paid tests) covering vertex live, OpenAI/Claude/Gemini/Groq/Bedrock smokes, structured output, embeddings, streaming, extraction, reranker, thinking levels, prompt caching.
  • Centralized model whitelist in libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py; fan-out tests (embeddings/extraction/thinking) get sibling *_prerelease_smoke variants that iterate only over the whitelist, while the full-fan-out tests stay @pytest.mark.paid.
  • New .agents/skills/kiln-prerelease-check skill: runs checks.sh + --runprerelease, sweeps whitelist & hardcoded slugs for deprecated/provider-rejected/newer-sibling cases (mandatory every run), verifies any swap with run-old-then-run-new, writes a timestamped report to .prerelease/<ts>/REPORT.md.
  • .prerelease/ added to .gitignore; no prod-path code touched.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

<review_stack_artifact>

</review_stack_artifact>

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'chore: add prerelease tests and skill' accurately and concisely describes the main changes: adding a prerelease test marker/flag and a new orchestration Skill.
Description check ✅ Passed The PR description covers all required template sections: 'What does this PR do?' with detailed changes, Related Issues (not applicable), CLA confirmation, and both checklist items marked complete.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch leonard/kil-666-ops-prerelease-skill

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 20, 2026

📊 Coverage Report

Overall Coverage: 92%

Diff: origin/main...HEAD

  • libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py (100%)

Summary

  • Total: 7 lines
  • Missing: 0 lines
  • Coverage: 100%

@leonardmq leonardmq marked this pull request as draft May 20, 2026 18:47
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new automated pre-release check skill for Kiln, which integrates standard CI checks with a curated suite of live-API smoke tests. Key additions include a new prerelease pytest marker, a --runprerelease flag, and a whitelist of representative models to ensure cost-effective and focused verification. Review feedback correctly identified a typo in a model name within the whitelist and recommended more robust shell command patterns for environment variable handling and git history auditing to prevent potential failures in edge cases.

# reasoning channel; the test verifies the reasoning content channel is
# populated (or absent for "none").
PRERELEASE_THINKING_MODELS: list[tuple[str, str, str]] = [
(ModelProviderName.openai.value, "gpt_o4_mini_low", "low"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The model name gpt_o4_mini_low appears to be a typo. It seems to have swapped 'o' and '4' and incorrectly includes the thinking level in the model name. Based on other entries, it should likely be gpt_4o_mini.

Suggested change
(ModelProviderName.openai.value, "gpt_o4_mini_low", "low"),
(ModelProviderName.openai.value, "gpt_4o_mini", "low"),

- **Network access is required.** Every phase here makes real API calls or hits remote model-list endpoints. If you are running this skill inside a sandboxed Bash session, request `required_permissions: ["all"]` for the test commands.
- **Env vars:** Source `.env` before running anything that needs API keys:
```bash
export $(grep -v '^#' .env | xargs)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The command export $(grep -v '^#' .env | xargs) is not robust. It will fail to parse environment variable values that contain spaces if they are not quoted (e.g., VAR=some value). xargs splits input by spaces by default, which can lead to incorrect exports.

A safer way to export variables from a .env file is to process it line by line. Consider using a while read loop for more reliable parsing:

while IFS= read -r line; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue
  export "$line"
done < .env

Comment on lines +272 to +274
git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- \
$(awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u) | sort -u | \
head -50 > "${OUT}/recent_paid_files.txt"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This command can fail with an "Argument list too long" error if there are many files in paid_only.txt. It's safer to pipe the file list to xargs to handle a large number of files gracefully.

Consider this more robust alternative:

awk -F: '{print $1}' "${OUT}/paid_only.txt" | sort -u | \
  xargs git log --since="3 months ago" --diff-filter=AM --name-only --pretty=format: -- | \
  sort -u | head -50 > "${OUT}/recent_paid_files.txt"

@leonardmq leonardmq marked this pull request as ready for review May 25, 2026 19:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.agents/skills/kiln-prerelease-check/SKILL.md (1)

1-432: 💤 Low value

Consider standardizing hyphenation of "prerelease" throughout the document.

The document mixes "pre-release" (e.g., in the title "Kiln Pre-release Check") and "prerelease" (e.g., in code references like @pytest.mark.prerelease). While the unhyphenated form is necessary for code identifiers, consider standardizing on one variant for prose text to improve consistency.

Suggestion: Use "prerelease" consistently in prose to match the technical terminology used in code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/kiln-prerelease-check/SKILL.md around lines 1 - 432, The
document mixes "pre-release" and "prerelease" in prose; update all
human-readable occurrences of "pre-release" or "pre release" to the single
unhyphenated form "prerelease" for consistency with code identifiers (do not
change any code symbols like `@pytest.mark.prerelease` or filenames). Search for
and replace the title "Kiln Pre-release Check", headings, and all prose
instances (e.g., "pre-release verification", "pre-release test", "pre-release
set") to "Kiln Prerelease Check", "prerelease verification", "prerelease test",
etc., while preserving backtick-quoted code identifiers and examples exactly as
they are.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.agents/skills/kiln-prerelease-check/SKILL.md:
- Around line 1-432: The document mixes "pre-release" and "prerelease" in prose;
update all human-readable occurrences of "pre-release" or "pre release" to the
single unhyphenated form "prerelease" for consistency with code identifiers (do
not change any code symbols like `@pytest.mark.prerelease` or filenames). Search
for and replace the title "Kiln Pre-release Check", headings, and all prose
instances (e.g., "pre-release verification", "pre-release test", "pre-release
set") to "Kiln Prerelease Check", "prerelease verification", "prerelease test",
etc., while preserving backtick-quoted code identifiers and examples exactly as
they are.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 514c8bd1-d48a-450e-a498-0a4211eccec7

📥 Commits

Reviewing files that changed from the base of the PR and between 0448d84 and 6f1e4cf.

📒 Files selected for processing (5)
  • .agents/skills/kiln-prerelease-check/SKILL.md
  • libs/core/kiln_ai/adapters/model_adapters/test_prompt_caching.py
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
  • libs/core/kiln_ai/adapters/pytest_prerelease_whitelist.py
  • libs/core/kiln_ai/adapters/test_prompt_adaptors.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants