Print ci commands #11497

Open
jkwak-work wants to merge 21 commits into shader-slang:master from jkwak-work:print-ci-commands

@jkwak-work
Collaborator

@jkwak-work jkwak-work commented Jun 6, 2026

It turned out that we haven't been running all of the mtl (Metal) tests on macOS.
This PR removes the confusing test-category option and enables the mtl tests in the macOS test jobs.

Summary

CI maintenance for how slang-test is invoked in GitHub Actions, plus a related
slang-test option-parsing fix surfaced along the way. Changes:

1. Print the fully-resolved slang-test command

The conditional bash that assembles slang_test_args is echoed as source in the
log (the if [[ ... ]]; then ... lines), but the final argument list is never
printed — so it's impossible to tell which branches fired or which arguments
slang-test actually ran with. Print the resolved command with printf %q
(shell-quoted, copy-pasteable, prefixed with + like set -x) immediately
before each invocation in ci-slang-test.yml, ci-slang-sanitizer.yml, and
ci-slang-test-container.yml.
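
The echo described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual step: the helper name print_resolved_command and the sample argument values are assumptions made for the example; only printf %q, the + prefix, and the slang_test_args array name come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical argument list; the real workflows assemble slang_test_args
# from conditional branches. The second element contains a space on purpose.
slang_test_args=(-api "cpu+llvm" "tests/my dir")

# Print a command line in shell-quoted form, prefixed with "+" like `set -x`.
# (print_resolved_command is an illustrative name, not part of the PR.)
print_resolved_command() {
    printf '+'
    printf ' %q' "$@"   # the format is reused once per argument
    printf '\n'
}

print_resolved_command ./slang-test "${slang_test_args[@]}"
```

Because each argument is quoted with %q, an argument containing whitespace stays a single word when the printed line is pasted back into a shell. If slang_test_args were empty, "${slang_test_args[@]}" would expand to nothing; bash 3.2 rejects that expansion only under set -u, which matches the reviewer note about the default Actions shell options.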

2. Stop passing -category; use slang-test's default

Remove the test-category input/usage everywhere so slang-test runs without
-category and falls back to its default category (full). Drops the input
definitions, the hardcoded -category full in the container workflow, all
test-category: values passed from ci.yml / ci-nightly-sanitizer.yml, and
the unused test-category matrix entries in the falcor/regression/perf/cts/release
workflows.

3. Select a single -api expression per job (macOS → Metal only)

slang-test honors only one -api option, so the workflow now picks exactly one
expression by branch: cpu-only → cpu+llvm; macOS → mtl (Metal is the only GPU
API on macOS); gpu-api-only → the GPU API list; otherwise all. On macOS,
(cpu)/(llvm) variants are skipped (covered by the CPU-only Linux tier) while
no-API tests still run (their requiredFlags==0). -api all matches slang-test's
default, so non-macOS behavior is unchanged.
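
The branch order in item 3 might look like the sketch below. The function name select_api_expr and its positional parameters are hypothetical stand-ins for the workflow inputs; the GPU API list is the one quoted later in the automated review of this PR.

```shell
#!/usr/bin/env bash
# Pick exactly one -api expression. Arguments (all illustrative):
#   $1 = cpu-only flag ("true"/"false")
#   $2 = os name (e.g. "macos", "linux", "windows")
#   $3 = gpu-api-only flag ("true"/"false")
select_api_expr() {
    local cpu_only="$1" os="$2" gpu_api_only="$3"
    if [[ "$cpu_only" == "true" ]]; then
        echo "cpu+llvm"
    elif [[ "$os" == "macos" ]]; then
        echo "mtl"                          # Metal is the only GPU API on macOS
    elif [[ "$gpu_api_only" == "true" ]]; then
        echo "vk+cuda+dx11+dx12+mtl+wgpu"
    else
        echo "all"                          # same as slang-test's default
    fi
}

api_expr="$(select_api_expr false macos false)"
slang_test_args=(-api "$api_expr")          # appended once, never twice
printf '%s\n' "${slang_test_args[*]}"
```

Using a single if/elif/else chain makes the precedence explicit: cpu-only wins over the macOS rule, which in turn wins over gpu-api-only.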

4. slang-test: error on a -api that silently discards an earlier one

RenderApiUtil::parseApiFlags only preserves the previous -api result when the
next expression begins with an operator (+/-); a name-leading expression
(e.g. cpu, vk, all) resets the API set from scratch, so -api vk -api cpu
silently dropped vk with no diagnostic. slang-test now rejects a second-or-later
-api whose expression does not begin with + or -, pointing the user at the
two valid forms. Operator-leading accumulation (-api all -api -vk) still works.
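
The acceptance rule in item 4 can be modeled in a few lines of shell. This is only a model of the rule as described here; the real check is C++ code in tools/slang-test/options.cpp, and the function name check_api_sequence and the error wording below are assumptions.

```shell
#!/usr/bin/env bash
# Given the expressions of successive -api options, in order, return 0 if the
# sequence is accepted and 1 if a later expression would reset the API set.
check_api_sequence() {
    local seen=0 expr
    for expr in "$@"; do
        # A second-or-later expression must start with '+' or '-'.
        if [[ $seen -eq 1 && "$expr" != [+-]* ]]; then
            echo "error: repeated -api '$expr' must begin with '+' or '-'" >&2
            return 1
        fi
        seen=1
    done
    return 0
}

check_api_sequence all -vk && echo "accepted: -api all -api -vk"
check_api_sequence vk cpu 2>/dev/null || echo "rejected: -api vk -api cpu"
```

The first expression is always accepted, whether it leads with a name or an operator; only later name-leading expressions are refused, which mirrors the intentionally narrow guard described in the notes to reviewers.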

Notes to the reviewers

  • printf %q verified compatible with macOS bash 3.2.57; empty-array expansion is
    safe because the Actions default shell uses -e -o pipefail, not set -u.
  • The three smoke jobs (macOS-debug, linux-debug-aarch64, linux-release-aarch64)
    now run the full suite since full is slang-test's default category.
  • The -api guard is intentionally narrow: it fires only on the name-leading
    case that silently overrides; operator-leading expressions still accumulate.

jkwak-work and others added 19 commits June 5, 2026 18:11
The conditional logic that assembles slang_test_args only shows the bash
source in the log, not the final arguments, making it hard to tell which
branches fired. Print the resolved command with printf %q before each
invocation so the exact arguments are visible and copy-pasteable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove the test-category input/usage from the reusable test workflows so
slang-test runs without -category and falls back to its default category
(full). Drops the input definitions in ci-slang-test.yml and
ci-slang-sanitizer.yml, the hardcoded -category full in the container
workflow, all test-category: values passed from ci.yml and
ci-nightly-sanitizer.yml, and the unused test-category matrix entries in
the falcor/regression/perf/cts/release workflows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
macOS has no native Vulkan backend, so exclude the vk API on macOS test
jobs; Vulkan-only test variants are filtered out by the test runner
rather than run and failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address review feedback: separate the printf command-echo block from the
slang-test invocation with a blank line for readability.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
slang-test honors only one -api option; a second occurrence silently
overrides the first. Build one API expression from the cpu-only,
gpu-api-only, and macOS-no-vk restrictions and append it once, folding
the macOS vk exclusion into the existing expression (all-vk on its own,
or -vk subtracted from a cpu/gpu restriction) instead of adding a second
-api.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restructure the API selection into one explicit if/elif/else that picks
exactly one expression: cpu-only -> cpu+llvm; gpu-api-only -> the GPU API
list (without vk on macOS); otherwise all (all-vk on macOS). macOS has no
native Vulkan, so its expressions exclude vk; -api all matches slang-test's
default of all APIs enabled.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
slang-test parses each -api expression relative to the previous result,
but RenderApiUtil::parseApiFlags only keeps that previous value when the
expression begins with an operator ('+' or '-'). A name-leading expression
resets the API set from scratch, so a second -api like 'cpu' after '-api vk'
silently dropped the earlier selection with no diagnostic.

Reject a second-or-later -api whose expression does not begin with '+' or
'-', pointing the user at the two valid forms (combine into one expression,
or lead with an operator to accumulate). Operator-leading expressions such
as '-api all -api -vk' still accumulate as before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
-synthesizedTestApi feeds RenderApiUtil::parseApiFlags the same way as
-api, so a name-leading expression after an earlier -synthesizedTestApi
silently discards it. Reject that case symmetrically, tracking the two
options independently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The script required Bash 4 only for readarray and the |& pipe operator.
Replace readarray with a read_lines helper (a portable readarray -t
emulation), swap |& for 2>&1 |, and lower the version gate to 3.2 so the
formatter runs on a stock macOS shell without 'brew install bash'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ted-target

This neural diagnostic test fails in CI; tracked as a separate issue. Add
all four target variants (-target hlsl/wgsl/glsl/cpp) to
expected-failure-github.txt to unblock CI for now.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Only the base (-target hlsl) variant fails on macOS; the .1/.2/.3
(wgsl/glsl/cpp) variants pass. Drop the over-broad entries so those
variants keep reporting real status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dropping -category means the GitHub-hosted aarch64 jobs (no GPU) now run
the full suite, which includes gfx-unit-test-tool Vulkan tests that fail
to create a Vulkan instance without a GPU. Add the four that were not yet
covered to expected-failure-no-gpu.txt, alongside the existing Vulkan
entries. The list is only applied to full-gpu-tests=false jobs, so GPU
runners still expect these to pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…macOS

This dynamic-dispatch test is flaky on the macOS test server: the two
existential-typed constant buffers' outputs come back swapped, so the
FileCheck for 15/16 fails. It passes locally and on the macOS debug job.
Add it to expected-failure-github.txt to unblock CI; the ordering root
cause (from shader-slang#10393) is tracked as a separate issue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… test

The macOS failure is intermittent (passes locally and on the macOS debug
job), so it does not belong in expected-failure-github.txt, which is for
consistent failures. Handle it as a flaky test via CI rerun instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expand the -api help to document the expression syntax (names reset the
set; +/- add/remove) and enumerate every available keyword: all, none,
vk (vulkan), dx11 (d3d11), dx12 (d3d12), mtl (metal), cuda, cpu, llvm,
wgpu (webgpu). Also note -synthesizedTestApi accepts the same expression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Metal is the only GPU API available on macOS, so select -api mtl for the
macOS jobs instead of all-vk. (cpu)/(llvm) variants are skipped here and
covered by the CPU-only Linux tier; no-API tests still run because their
requiredFlags==0. This also stops running CPU-variant GPU-dispatch tests
on the macOS test server.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jkwak-work jkwak-work requested a review from a team as a code owner June 6, 2026 05:22
@jkwak-work jkwak-work requested review from bmillsNV and removed request for a team June 6, 2026 05:22
@jkwak-work jkwak-work self-assigned this Jun 6, 2026
@jkwak-work jkwak-work added the pr: non-breaking PRs without breaking changes label Jun 6, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 15361651-a3d1-4c60-9f93-8915962d5ed7

📥 Commits

Reviewing files that changed from the base of the PR and between 7e23e74 and 556ccf8.

📒 Files selected for processing (3)
  • .github/workflows/ci-slang-test.yml
  • extras/formatting.sh
  • tests/expected-failure-aarch64.txt

📝 Walkthrough

This PR removes test-category inputs/usages from CI workflows, adds logging that prints fully-resolved slang-test commands before execution, relaxes the formatting script Bash requirement to 3.2+ with a read_lines helper, and adds validation for repeated API filter options in the test binary.

Changes

Test Category Removal and Command Logging

| Layer | File(s) | Summary |
| --- | --- | --- |
| Reusable workflow input removal and sanitizer job update | .github/workflows/ci-slang-sanitizer.yml, .github/workflows/ci-nightly-sanitizer.yml | Removes the test-category workflow input and its usages from sanitizer workflows; updates the nightly sanitizer invocation/comments. |
| Add pre-execution test command logging | .github/workflows/ci-slang-sanitizer.yml, .github/workflows/ci-slang-test-container.yml, .github/workflows/ci-slang-test.yml | Prints fully-resolved slang-test commands (shell-escaped) to CI logs before executing the test binary; removes -category full from container invocations. |
| Refactor slang_test_args and aarch64 handling | .github/workflows/ci-slang-test.yml | Compute a single api_expr appended once as -api "$api_expr", append an aarch64-specific expected-failure list when applicable, and remove test-category input usage. |
| Remove test-category from main CI workflow jobs | .github/workflows/ci.yml | Drops test-category from multiple test job with: blocks (Linux x86_64 CPU-only, macOS debug/release, Linux ARM64 debug/release, Windows GPU debug/release), relying on existing flags like full-gpu-tests and cpu-only. |
| Remove test-category from workflow matrix configurations | .github/workflows/compile-regression-test.yml, .github/workflows/falcor-compiler-perf-test.yml, .github/workflows/falcor-test.yml, .github/workflows/release.yml, .github/workflows/vk-gl-cts-nightly.yml | Remove test-category from matrix include entries; one matrix item now uses full-gpu-tests: false instead. |

Bash 3.2 Compatibility for Formatting Script

| Layer | File(s) | Summary |
| --- | --- | --- |
| Update Bash version requirement and add read_lines helper | extras/formatting.sh | Relax Bash requirement to 3.2+ and add read_lines() to populate the global files array from stdin while preserving final no-newline lines. |
| Update formatting functions to use read_lines | extras/formatting.sh | Replace readarray -t usages with read_lines across file-discovery functions and change ` |

Test API Filter Option Validation and Tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Add help text and repeated-option validation for API filters | tools/slang-test/options.cpp | Expand -api and -synthesizedTestApi help text and add parse-state tracking to reject name-leading expressions when a prior corresponding option has been seen. |
| AArch64 expected failures and diagnostic update | tests/expected-failure-aarch64.txt, tests/neural/network-parameter-layout-converter-unsupported-target.slang | Add aarch64 expected-failure Vulkan test identifiers and update a diagnostic test's HLSL profile from cs_6_0 to cs_6_6. |

Suggested reviewers

  • bmillsNV
  • jkiviluoto-nv
  • expipiplus1
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Print ci commands' is vague and does not adequately summarize the main change of removing the confusing test-category option and enabling mtl tests on macOS, which is the core objective. | Consider a more descriptive title like 'CI: Remove test-category option and enable macOS mtl tests' or 'CI: Drop test-category, add command logging, select single -api per job'. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description is well-related to the changeset, providing detailed explanations of the four main changes, rationale, and notes to reviewers about compatibility and behavior impacts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions[bot]

This comment was marked as outdated.

@jkwak-work
Collaborator Author

For the failing aarch64 tests, I think we need to introduce a new file, tests/expected-failure-aarch64, and silence them for this PR.

Dropping -category means the GitHub-hosted aarch64 jobs run the full suite
with -api all, which includes gfx-unit-test-tool Vulkan tests that fail to
create a Vulkan instance on the aarch64 runners. Add a dedicated
tests/expected-failure-aarch64.txt, applied only when platform == aarch64,
listing the four affected tests. The underlying aarch64 Vulkan behavior is
tracked as a separate follow-up issue.

All callers pass the literal name 'files', so the dynamic-name indirection
that motivated eval is unnecessary. Populate the global 'files' array
directly, which is Bash 3.2 compatible (no readarray/mapfile or namerefs)
and avoids eval entirely.
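
A Bash 3.2-compatible helper along the lines this commit describes could look like the sketch below. It is not the code from extras/formatting.sh; only the name read_lines, the global files array, and the "preserve a final line without a trailing newline" behavior are taken from this page, and the sample paths are made up.

```shell
#!/usr/bin/env bash
# Emulate `readarray -t files` without Bash 4: fill the global array `files`
# with one element per line of stdin, stripping the trailing newlines.
read_lines() {
    files=()
    local line
    # `read` returns non-zero on a final line that lacks a newline, but it
    # still stores that text in $line, so the -n test keeps the last line.
    while IFS= read -r line || [[ -n "$line" ]]; do
        files+=("$line")
    done
}

# Hypothetical input; the last entry deliberately has no trailing newline.
read_lines < <(printf 'src/a.cpp\nsrc/b c.h\nsrc/last.cpp')
printf '%s\n' "${#files[@]}"
```

The input is supplied through redirection rather than a pipe on purpose: in `producer | read_lines` the function would run in a subshell, and the caller would never see the populated files array.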
@jkwak-work
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Jun 6, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Collaborator Author

@jkwak-work jkwak-work left a comment

Looks good to me

Contributor

@github-actions github-actions Bot left a comment

Verdict: 🟡 Has issues — 2 gap(s)

CI-infrastructure refactor that drops -category (slang-test default = full), selects a single -api per job, and prints the resolved command via printf %q. Adds a new apiOptionSeen / synthesizedApiOptionSeen guard in tools/slang-test/options.cpp that turns silent override of repeated -api into a hard error. Also bumps extras/formatting.sh to support bash 3.2 and bumps the HLSL profile in one neural diagnostic test from cs_6_0 → cs_6_6 (verified necessary because descriptor_handle requires _sm_6_6 on HLSL per slang-capabilities.capdef:1425).

Changes Overview

slang-test option parser (tools/slang-test/options.cpp)

  • New: rejects a second-or-later -api (and -synthesizedTestApi) whose expression begins with a name (e.g. -api vk -api cpu), since RenderApiUtil::parseApiFlags would silently discard the earlier value. Operator-leading expressions (+cuda, -vk) still accumulate. Help text rewritten to enumerate keywords and operator semantics.

CI workflows (.github/workflows/ci.yml, ci-slang-test.yml, ci-slang-test-container.yml, ci-slang-sanitizer.yml, ci-nightly-sanitizer.yml, plus compile-regression-test.yml, falcor-*.yml, release.yml, vk-gl-cts-nightly.yml)

  • Removed the test-category workflow input/usage everywhere; slang-test falls back to its default full. Smoke tiers (test-macos-debug-clang-aarch64, test-linux-debug-gcc-aarch64, test-linux-release-gcc-aarch64) widen from smoke to full.
  • ci-slang-test.yml collapses the previous chain of two -api options into a single expression: cpu-only → cpu+llvm; os == macos → mtl; gpu-api-only → vk+cuda+dx11+dx12+mtl+wgpu; otherwise all.
  • Adds printf '+ %q' … immediately before each slang-test invocation in three workflows so CI logs show the fully-resolved command line.

New aarch64 expected-failure list (tests/expected-failure-aarch64.txt)

  • Silences four gfx-unit-test-tool/*Vulkan.internal tests that fail to create a Vulkan instance on the aarch64 GitHub-hosted runners. Wired in under inputs.platform == aarch64 in ci-slang-test.yml.

Build script (extras/formatting.sh)

  • Lower minimum bash from 4 to 3.2 (macOS default /bin/bash). Adds a read_lines function emulating readarray -t files. Replaces |& (bash 4) with 2>&1 | (bash 3.2-compatible) in two clang-format pipelines.

Test profile bump (tests/neural/network-parameter-layout-converter-unsupported-target.slang)

  • HLSL diagnostic test profile bumped cs_6_0 → cs_6_6 so the descriptor_handle capability (needed by RWStructuredBuffer<T>.Handle) is satisfied and the __target_switch default-arm static_assert is the diagnostic that fires.

Findings (2 total)

| Severity | Location | Finding |
| --- | --- | --- |
| 🟡 Gap | tools/slang-test/options.cpp:~447,~486 | New -api / -synthesizedTestApi rejection path is a user-visible CLI behavior change with no regression test; the duplicated guard is also a future-drift hazard. |
| 🟡 Gap | tests/expected-failure-aarch64.txt + ci.yml smoke→full conversions | smoke→full widening covers Linux aarch64 only: test-macos-debug-clang-aarch64 has no tests/expected-failure-macos*.txt safety net; the new aarch64 file also lacks a tracking-issue reference. |

Comment thread tools/slang-test/options.cpp
Comment thread tests/expected-failure-aarch64.txt