Closed
72 commits
c6f26e4
Fix Codex agent-mode CLI invocation
lukekim Mar 3, 2026
ad61e31
Fix YAML indentation for Codex run step
lukekim Mar 3, 2026
5981899
Stabilize file codex test output handling
lukekim Mar 3, 2026
7709b96
Fix json_strings expected output and failure logging
lukekim Mar 3, 2026
e3caf69
Stabilize non-infra codex workflow checks
lukekim Mar 3, 2026
5a6dedc
Reduce flaky log-only codex failures
lukekim Mar 3, 2026
b353580
Make codex output checks semantic for CI
lukekim Mar 3, 2026
8b96c63
Stabilize arrow workflow log checks
lukekim Mar 3, 2026
050d3f8
Relax arrow sample row matching
lukekim Mar 3, 2026
218a428
Harden reusable codex CI and fix failing workflow wrappers
lukekim Mar 3, 2026
b8acd82
Fix remaining non-infra codex workflow blockers
lukekim Mar 3, 2026
cbedd8a
Improve reusable install reliability and vectors credential handling
lukekim Mar 3, 2026
7314a39
Bootstrap python tooling for search codex workflow
lukekim Mar 3, 2026
255aba6
Precreate search venv before codex test
lukekim Mar 3, 2026
7a8e51e
Relax search chat validation for transient responses
lukekim Mar 3, 2026
2c5530c
Install search requirements for non-venv python
lukekim Mar 3, 2026
3716d13
Use search venv on PATH for codex run
lukekim Mar 3, 2026
0d06644
Harden credential placeholder and CLI compatibility handling
lukekim Mar 3, 2026
a8a9524
Improve LLM workflow resilience for creds/readiness
lukekim Mar 3, 2026
72b1b18
Preserve virtualenv shell semantics in CI prompt
lukekim Mar 3, 2026
74d3fc6
Install uv for openai-sdk codex workflow
lukekim Mar 3, 2026
d7544c8
Handle unavailable code interpreter in responses-api test
lukekim Mar 3, 2026
9175309
Install Python tooling for openai-responses-api
lukekim Mar 3, 2026
4c89d86
Skip responses-api client step on schema mismatch
lukekim Mar 3, 2026
aa327b1
Install uv for openai-responses-api workflow
lukekim Mar 3, 2026
9dd15a1
Handle transient runtime readiness across LLM workflows
lukekim Mar 3, 2026
7ebf408
Refine remaining codex wrappers for semantic and readiness drift
lukekim Mar 3, 2026
5fc46c4
Skip warmup-only and dependent checks in remaining codex tests
lukekim Mar 3, 2026
f6d6353
Make llm-judge pass on allowed eval-endpoint skips
lukekim Mar 3, 2026
1c79cd0
Refine remaining zero-success codex wrappers
lukekim Mar 3, 2026
367e7ec
Stabilize remaining codex recipe wrappers
lukekim Mar 3, 2026
79b5d5f
Handle remaining CI edge cases in codex wrappers
lukekim Mar 3, 2026
993bd00
Unblock evals and vector-search CI edge cases
lukekim Mar 3, 2026
8ada9ce
Skip dependent checks for remaining unstable recipe steps
lukekim Mar 3, 2026
5b83b2a
Handle spicepod mutation drift in remaining wrappers
lukekim Mar 4, 2026
5fd64eb
Enforce strict recipe output validation in codex tests
lukekim Mar 4, 2026
55fdaec
Enforce strict no-skip policy across test wrappers
lukekim Mar 4, 2026
9347a76
Stabilize recipe startup output expectations
lukekim Mar 4, 2026
cdfbdfc
Reduce nondeterministic doc drift in codex recipe tests
lukekim Mar 4, 2026
4fdb1ee
Restore concrete recipe logs and outputs
lukekim Mar 4, 2026
9489ab7
Update recipe outputs from latest codex CI logs
lukekim Mar 4, 2026
2ec1998
Stabilize codex recipe docs against volatile outputs
lukekim Mar 4, 2026
a5a3e5e
Refine recipe expectations for codex CI strict runs
lukekim Mar 4, 2026
2160383
Reduce codex test brittleness in recipe docs
lukekim Mar 4, 2026
72a8ea9
Fix remaining codex README mismatches from latest runs
lukekim Mar 4, 2026
f2bc372
Address remaining codex failures in 5 recipes
lukekim Mar 4, 2026
9c6ed46
Mitigate remaining codex mismatches in localpod mcp scylladb search
lukekim Mar 4, 2026
bca24d8
Add readiness gates for MCP tools and vector search
lukekim Mar 4, 2026
a46bd67
Stabilize MCP and search readiness steps
lukekim Mar 4, 2026
ba25425
Relax MCP child spicepod expected output
lukekim Mar 4, 2026
0ecd764
Remove volatile MCP tool list and bound search warmup loop
lukekim Mar 4, 2026
320264c
Relax MCP SQL output and retry search query step
lukekim Mar 5, 2026
6feb23e
Remove volatile MCP chat output and relax child search expectation
lukekim Mar 5, 2026
70fd91c
Silence transient search timeout output in basic vector step
lukekim Mar 5, 2026
5be4002
Add full-text search warmup and retry loop
lukekim Mar 5, 2026
0a12407
Remove unstable child runtime search section
lukekim Mar 5, 2026
6f93741
Fix remaining mismatches in ai hashed localpod scylladb
lukekim Mar 5, 2026
1875dd2
Remove unstable localpod replication update walkthrough
lukekim Mar 5, 2026
6e7373f
Stabilize AI startup text and OpenAI client setup
lukekim Mar 6, 2026
b538832
Update Codex model version to gpt-5.4 in reusable workflow
lukekim Mar 6, 2026
c9f07af
Reduce codex output brittleness across recipe docs
lukekim Mar 6, 2026
25f8be4
Stabilize file cleanup and views query output examples
lukekim Mar 6, 2026
fecc4d8
Use uv run for openai-sdk client setup
lukekim Mar 6, 2026
c3d2890
Relax scylladb sample query output expectation
lukekim Mar 6, 2026
a185aa3
Stabilize ai startup and localpod sql transcript
lukekim Mar 6, 2026
6847a85
Fix AI WHERE-clause example to use existing dataset
lukekim Mar 6, 2026
b12994a
Replace ai WHERE example with projection example
lukekim Mar 6, 2026
a112478
Address latest ai openai views codex mismatches
lukekim Mar 6, 2026
af694d4
Relax MCP tools endpoint expectation
lukekim Mar 6, 2026
1526537
Add sample scripts for querying Spice Cloud via cURL and Spice CLI
lukekim Mar 16, 2026
9634a31
Refactor client SDK samples to support environment variables for API …
lukekim Mar 17, 2026
a1d7da7
Standardize string quotes in index_cloud.mjs for consistency
lukekim Mar 17, 2026
128 changes: 77 additions & 51 deletions .github/actions/parse-llm-test-result/action.yml
@@ -50,6 +50,26 @@ runs:
 # Also get just the last 50 lines for focused matching (LLM typically puts result at end)
 TAIL_OUTPUT=$(tail -50 "$OUTPUT_FILE" | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' ' ')

+# Require explicit final verdict line
+FINAL_LINE=$(awk 'NF{line=$0} END{print line}' "$OUTPUT_FILE" | tr -d '\r')
+FINAL_LINE_LOWER=$(echo "$FINAL_LINE" | tr '[:upper:]' '[:lower:]')
+
+if [ "$FINAL_LINE" = "TEST PASSED" ]; then
+  RESULT="passed"
+  REASON=""
+elif echo "$FINAL_LINE_LOWER" | grep -q '^test failed:'; then
+  RESULT="failed"
+  REASON="${FINAL_LINE#TEST FAILED: }"
+
+  # Normalize output mismatch reasons to explicit recipe-update wording
+  if echo "$FINAL_LINE_LOWER" | grep -Eq 'mismatch|did not match|does not match|unexpected output|output differs'; then
+    REASON="Spice output does not match the recipe documentation; recipe needs updating. ${REASON}"
+  fi
+else
+  RESULT="error"
+  REASON="Missing strict final verdict line. Last line must be exactly 'TEST PASSED' or start with 'TEST FAILED:'."
+fi
+
 # === FAIL PATTERNS (check first - fail takes priority) ===
 # These patterns indicate test failure
 FAIL_PATTERNS=(
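
The hunk above adds a strict check on the last non-empty line of the LLM output. The following is a minimal standalone sketch of that check, assuming it is wrapped in a function so it can be exercised outside the action; `parse_verdict` and its `result|reason` output format are hypothetical and not part of the PR. As in the hunk, the failure prefix is detected case-insensitively, while `${final_line#TEST FAILED: }` strips only the exact uppercase prefix followed by a space.

```shell
# Hypothetical helper (not part of the PR): mirrors the strict final-verdict check.
# Prints "result|reason" for the given output file.
parse_verdict() {
  local output_file="$1"
  local final_line final_line_lower

  # Last non-empty line of the file, with any carriage return removed.
  final_line=$(awk 'NF{line=$0} END{print line}' "$output_file" | tr -d '\r')
  final_line_lower=$(printf '%s' "$final_line" | tr '[:upper:]' '[:lower:]')

  if [ "$final_line" = "TEST PASSED" ]; then
    printf 'passed|\n'
  elif printf '%s' "$final_line_lower" | grep -q '^test failed:'; then
    # Only the exact uppercase prefix is stripped from the reason.
    printf 'failed|%s\n' "${final_line#TEST FAILED: }"
  else
    printf 'error|missing strict final verdict line\n'
  fi
}
```

For example, a file whose last non-empty line is `TEST FAILED: output mismatch in step 3` yields `failed|output mismatch in step 3`, while a file ending in free-form text such as `all good, test passed` yields `error|missing strict final verdict line`.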
@@ -84,66 +104,71 @@ runs:
   "timeout"
 )

-RESULT=""
-REASON=""
+# If strict final verdict already set, skip fallback heuristics.
+if [ "$RESULT" = "passed" ] || [ "$RESULT" = "failed" ] || [ "$RESULT" = "error" ]; then
+  :
+else
+  RESULT=""
+  REASON=""

-# Check for explicit fail patterns in the tail (most relevant)
-for pattern in "${FAIL_PATTERNS[@]}"; do
-  if echo "$TAIL_OUTPUT" | grep -qi "$pattern"; then
-    RESULT="failed"
-    # Try to extract the reason after "failed:" or similar
-    REASON=$(echo "$TAIL_OUTPUT" | grep -oi "failed[: ]*[^.]*" | tail -1 || echo "Test failed")
-    break
-  fi
-done
-
-# If no fail found, check for error patterns
-if [ -z "$RESULT" ]; then
-  for pattern in "${ERROR_PATTERNS[@]}"; do
+  # Check for explicit fail patterns in the tail (most relevant)
+  for pattern in "${FAIL_PATTERNS[@]}"; do
     if echo "$TAIL_OUTPUT" | grep -qi "$pattern"; then
-      RESULT="error"
-      REASON="Test inconclusive or timed out"
+      RESULT="failed"
+      # Try to extract the reason after "failed:" or similar
+      REASON=$(echo "$TAIL_OUTPUT" | grep -oi "failed[: ]*[^.]*" | tail -1 || echo "Test failed")
       break
     fi
   done
-fi

-# If no fail/error found, check for pass patterns
-if [ -z "$RESULT" ]; then
-  for pattern in "${PASS_PATTERNS[@]}"; do
-    if echo "$TAIL_OUTPUT" | grep -qi "$pattern"; then
-      RESULT="passed"
-      REASON=""
-      break
-    fi
-  done
-fi
+  # If no fail found, check for error patterns
+  if [ -z "$RESULT" ]; then
+    for pattern in "${ERROR_PATTERNS[@]}"; do
+      if echo "$TAIL_OUTPUT" | grep -qi "$pattern"; then
+        RESULT="error"
+        REASON="Test inconclusive or timed out"
+        break
+      fi
+    done
+  fi

-# If still no result, check the full output as fallback
-if [ -z "$RESULT" ]; then
-  for pattern in "${FAIL_PATTERNS[@]}"; do
-    if echo "$NORMALIZED_OUTPUT" | grep -qi "$pattern"; then
-      RESULT="failed"
-      REASON="Test failed (detected in full output)"
-      break
-    fi
-  done
-fi
+  # If no fail/error found, check for pass patterns
+  if [ -z "$RESULT" ]; then
+    for pattern in "${PASS_PATTERNS[@]}"; do
+      if echo "$TAIL_OUTPUT" | grep -qi "$pattern"; then
+        RESULT="passed"
+        REASON=""
+        break
+      fi
+    done
+  fi

-if [ -z "$RESULT" ]; then
-  for pattern in "${PASS_PATTERNS[@]}"; do
-    if echo "$NORMALIZED_OUTPUT" | grep -qi "$pattern"; then
-      RESULT="passed"
-      REASON=""
-      break
-    fi
-  done
-fi
+  # If still no result, check the full output as fallback
+  if [ -z "$RESULT" ]; then
+    for pattern in "${FAIL_PATTERNS[@]}"; do
+      if echo "$NORMALIZED_OUTPUT" | grep -qi "$pattern"; then
+        RESULT="failed"
+        REASON="Test failed (detected in full output)"
+        break
+      fi
+    done
+  fi

-# Default to error if we couldn't determine result
-if [ -z "$RESULT" ]; then
-  RESULT="error"
-  REASON="Could not determine test result from LLM output"
+  if [ -z "$RESULT" ]; then
+    for pattern in "${PASS_PATTERNS[@]}"; do
+      if echo "$NORMALIZED_OUTPUT" | grep -qi "$pattern"; then
+        RESULT="passed"
+        REASON=""
+        break
+      fi
+    done
+  fi
+
+  # Default to error if we couldn't determine result
+  if [ -z "$RESULT" ]; then
+    RESULT="error"
+    REASON="Could not determine test result from LLM output"
+  fi
 fi

 # Output results
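
Read on its own, the guard added in this hunk looks always true once the strict check from the previous hunk has run, because that check assigns `passed`, `failed`, or `error` on every path. If so, the fallback heuristics under `else` would not execute. The sketch below only illustrates the priority order those heuristics apply if they are reached: fail, then error, then pass on the tail, followed by fail, then pass on the full output. The function names and the pattern lists are hypothetical; the PR's real arrays are only partly visible in the diff (`"timeout"` is the one entry shown).

```shell
# Returns 0 if TEXT contains any of the given patterns (case-insensitive).
matches_any() {
  local text="$1" pattern
  shift
  for pattern in "$@"; do
    if printf '%s' "$text" | grep -qi -- "$pattern"; then
      return 0
    fi
  done
  return 1
}

# Hypothetical pattern lists standing in for FAIL_PATTERNS, ERROR_PATTERNS, PASS_PATTERNS.
has_fail()  { matches_any "$1" "test failed" "failed:"; }
has_error() { matches_any "$1" "timeout"; }
has_pass()  { matches_any "$1" "test passed"; }

# classify_fallback TAIL_OUTPUT FULL_OUTPUT
# Applies the same priority order as the fallback branch in the hunk.
classify_fallback() {
  local tail_output="$1" full_output="$2"
  if has_fail "$tail_output"; then
    echo failed
  elif has_error "$tail_output"; then
    echo error
  elif has_pass "$tail_output"; then
    echo passed
  elif has_fail "$full_output"; then
    echo failed
  elif has_pass "$full_output"; then
    echo passed
  else
    echo error
  fi
}
```

Because fail patterns are checked first, a tail that mentions both a pass and a failure is classified as `failed`.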
@@ -154,6 +179,7 @@ runs:
 if [ "$DEBUG" = "true" ]; then
   echo "--- Parse LLM Test Result Debug ---"
   echo "File size: $FILE_SIZE bytes"
+  echo "Final line: $FINAL_LINE"
   echo "Result: $RESULT"
   echo "Reason: $REASON"
   echo ""
1 change: 1 addition & 0 deletions .github/workflows/codex-test-ai.yml
@@ -29,5 +29,6 @@ jobs:
     This recipe demonstrates the ai() SQL function for LLM integration.
     Requires SPICE_OPENAI_API_KEY to be set in .env file.
     Test with simple ai() queries as shown in the README.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
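
The instruction added to each workflow pairs with the reason normalization in `parse-llm-test-result`: when the final `TEST FAILED:` line mentions a mismatch, the reason is prefixed with recipe-update wording. A minimal sketch of that step, assuming it is isolated in a function; `normalize_reason` is a hypothetical name and not part of the PR:

```shell
# Hypothetical helper: derives REASON from a final "TEST FAILED: ..." line,
# using the same regex and prefix text as the action's mismatch normalization.
normalize_reason() {
  local final_line="$1"
  local lower reason

  lower=$(printf '%s' "$final_line" | tr '[:upper:]' '[:lower:]')
  reason="${final_line#TEST FAILED: }"

  if printf '%s' "$lower" | grep -Eq 'mismatch|did not match|does not match|unexpected output|output differs'; then
    reason="Spice output does not match the recipe documentation; recipe needs updating. ${reason}"
  fi

  printf '%s\n' "$reason"
}
```

A line such as `TEST FAILED: spice run exited with code 1` passes through unchanged apart from the stripped prefix, since none of the mismatch phrases occur in it.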
1 change: 1 addition & 0 deletions .github/workflows/codex-test-arrow.yml
@@ -27,5 +27,6 @@ jobs:
   requires_secrets: false
   additional_instructions: |
     This recipe demonstrates using Apache Arrow as an in-memory acceleration engine.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-deepseek.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates using DeepSeek models.
     Requires DeepSeek API key.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-duckdb-connector.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates using DuckDB as a data source.
     The DuckDB database file may need to be created or downloaded.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-evals.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates language model evaluations.
     Requires SPICE_OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error, config error, or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-file.yml
@@ -29,5 +29,6 @@ jobs:
     This recipe demonstrates using local files (Parquet, CSV, Markdown) as data sources.
     You may need to download sample data files as shown in the README.
     Test the Parquet file example first as it's the simplest.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-hashed-partitioning.yml
@@ -27,5 +27,6 @@ jobs:
   requires_secrets: false
   additional_instructions: |
     This recipe demonstrates hashed partitioning with DuckDB.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-http.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates using HTTP endpoints as data sources.
     Uses public APIs that don't require authentication.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-llm-judge.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates LLM as a judge for evaluations.
     Requires SPICE_OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-llm-memory.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates persistent memory for language models.
     Requires SPICE_OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
2 changes: 2 additions & 0 deletions .github/workflows/codex-test-localpod.yml
@@ -27,5 +27,7 @@ jobs:
   requires_secrets: false
   additional_instructions: |
     This recipe demonstrates using local pod configurations.
+    After running `./generate_data.sh`, wait up to 30 seconds for the documented refresh to apply before concluding failure on `SELECT COUNT(*) FROM local_time_series;`.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
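
The localpod instruction asks the agent to wait up to 30 seconds for the refresh before declaring failure. One way to express such a bounded wait is a small retry helper; this is a sketch only, and `wait_until` and the `row_count_matches` check mentioned below are hypothetical names, not something the PR or the recipe defines.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 on success, 1 if the deadline passes first.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))

  while ! "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

Assuming a user-supplied `row_count_matches` function that runs the documented `SELECT COUNT(*) FROM local_time_series;` query and compares the result, the call would be `wait_until 30 2 row_count_matches`, with a non-zero exit status treated as a failed step.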
1 change: 1 addition & 0 deletions .github/workflows/codex-test-mcp.yml
@@ -27,5 +27,6 @@ jobs:
   requires_secrets: false
   additional_instructions: |
     This recipe demonstrates Model Context Protocol integration.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-models-openai.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates using OpenAI LLM and embedding models.
     Requires SPICE_OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-openai-responses-api.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates OpenAI's Responses API with Spice.
     Requires OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
1 change: 1 addition & 0 deletions .github/workflows/codex-test-openai-sdk.yml
@@ -28,5 +28,6 @@ jobs:
   additional_instructions: |
     This recipe demonstrates using the OpenAI SDK with Spice-hosted models.
     Requires OPENAI_API_KEY.
+    Do not skip steps; any Spice/runtime error or output mismatch must fail and indicates the recipe needs updating.
 secrets:
   OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}