
Commit fd273a7

chore: add SynapseML local setup skill
## Summary

Add a project-scoped SynapseML agent skill that diagnoses local toolchain state, selects JDK 11 for sbt commands, runs a safe local Spark smoke test, and flags live-service tests before agents run them.

## Prompting Intent

The engineer asked the agent to create a skill that helps any future agent get SynapseML working locally after the PR 2556 review exposed a local Java 21 and Scala 2.12 compiler-bridge failure. The engineer also asked the agent to create a PR for the skill addition before continuing the original external PR review.

## Linked Sources

- User request in current session: create a skill that will help any agent get SynapseML working locally.
- Follow-up user request in current session: create a PR for that skill addition and continue using it to review PR 2556.
- Existing project-scoped skill convention: `.agents/skills/code-review/SKILL.md`.
- Local validation output: `doctor_status=ok`, the JDK 11 dry run selected `JAVA_HOME`, the smoke test passed, and the Azure Search tests were flagged `review_required`.

## Rationale

A project-scoped SynapseML skill keeps local setup guidance in the repository, where future agents need it. The scripts take explicit parameters rather than relying on session state, force JDK 11 for Scala 2.12 sbt commands, and include a live-service guard so agents do not accidentally create or delete Azure Search resources while validating changes.
1 parent b4ead5e commit fd273a7

6 files changed

Lines changed: 423 additions & 0 deletions


Lines changed in `SKILL.md`: 91 additions & 0 deletions
````markdown
---
name: synapseml-local-setup
description: Set up and validate SynapseML locally in WSL or Linux. Use when an agent needs SynapseML working locally, runs sbt compile/test, sees Java 21, Scala 2.12 compiler-bridge, bad constant pool index, Spark, or local validation failures.
compatibility: Linux/WSL with bash, git, rg, sbt, and JDK 11 installed. Designed for the SynapseML repo.
---

# SynapseML Local Setup

Use this skill before any local SynapseML build, compile, or test validation.

## Important

- Always use an explicit SynapseML repo path.
- Do not run SynapseML sbt commands with the machine's default Java 21. Use JDK 11:
  `/usr/lib/jvm/java-11-openjdk-amd64`
- Under Java 21, sbt can fail with `bad constant pool index: 0` while building the Scala 2.12 `compiler-bridge_2.12`, before any project code compiles.
- Compile commands do not touch Azure resources. Some cognitive service tests create, write, list, or delete real Azure resources: inspect those tests before running them, and ask for approval if live resources are involved.

## Workflow

### 1. Diagnose the repo and toolchain

Run [scripts/synapseml-doctor.sh](scripts/synapseml-doctor.sh):

```bash
scripts/synapseml-doctor.sh --repo <synapseml-repo>
```

Capture:

- Git branch and dirty state.
- Default Java version.
- JDK 11 availability.
- sbt version and SynapseML Scala/Spark versions.

### 2. Compile with JDK 11

Run [scripts/synapseml-sbt.sh](scripts/synapseml-sbt.sh):

```bash
scripts/synapseml-sbt.sh --repo <synapseml-repo> -- cognitive/Test/compile
```

Expected result:

- The sbt welcome line reports Java 11.
- `core` and `cognitive` main/test classes compile.
- The command finishes with `[success]`.

### 3. Run a safe local smoke test

Run [scripts/synapseml-smoke-test.sh](scripts/synapseml-smoke-test.sh):

```bash
scripts/synapseml-smoke-test.sh --repo <synapseml-repo>
```

Expected result:

- One local Spark test runs.
- The output includes `All tests passed.`

### 4. Inspect PR-specific tests before running them

Before running service tests, run [scripts/check-live-service-tests.sh](scripts/check-live-service-tests.sh):

```bash
scripts/check-live-service-tests.sh --path <test-file-or-directory>
```

If it reports live-service hooks (`live_service_status=review_required`, exit status 1), ask the user before running that suite. Do not create or delete Azure Search indexes just to test a PR.

### 5. Run targeted tests only after safety review

Use the JDK 11 wrapper for any targeted sbt command:

```bash
scripts/synapseml-sbt.sh --repo <synapseml-repo> -- '<module>/testOnly <SuiteName> -- -z "<test filter>"'
```

If tests fail before compiling project code, load [references/troubleshooting.md](references/troubleshooting.md).

## Known-good baseline

On 2026-05-05, this setup was validated with:

- JDK: `/usr/lib/jvm/java-11-openjdk-amd64`
- Java: `openjdk version "11.0.30"`
- SynapseML: Scala `2.12.17`, Spark `3.5.0`, sbt `1.10.11`
- Compile: `sbt 'cognitive/Test/compile'` succeeded.
- Smoke test: `UDFTransformerSuite` filtered test succeeded.
````
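Step 1 of the skill asks the agent to capture the branch, dirty state, and toolchain details. Because `synapseml-doctor.sh` prints most of its findings as `key=value` lines, a caller can pull single fields out of the report. The sketch below assumes only that output format; `doctor_field` is a hypothetical helper name and is not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the value of the first
# `key=value` line on stdin whose key is exactly $1. Lines in the
# `label: value` style that the doctor also prints are ignored.
doctor_field() {
  local key="$1" line
  # `|| [[ -n "$line" ]]` also handles a final line with no trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" == "$key="* ]]; then
      printf '%s\n' "${line#"$key="}"
      return 0
    fi
  done
  return 1
}

# Example usage, run from the skill directory:
#   report="$(scripts/synapseml-doctor.sh --repo "$repo")"
#   [[ "$(doctor_field doctor_status <<<"$report")" == ok ]] || exit 1
#   echo "branch=$(doctor_field git_branch <<<"$report")"
#   echo "dirty=$(doctor_field git_dirty_count <<<"$report")"
```

Matching on the exact `key=` prefix keeps a field such as `repo` from also matching `repo_status`.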
Lines changed in `references/troubleshooting.md`: 52 additions & 0 deletions
````markdown
# SynapseML Local Setup Troubleshooting

## Java 21 compiler bridge failure

Failure signature:

```text
Non-compiled module 'compiler-bridge_2.12' for Scala 2.12.17. Compiling...
bad constant pool index: 0
while compiling: <no file>
library version: version 2.12.17
compiler version: version 2.12.17
```

Cause: sbt is running on the machine's default Java 21, and Scala 2.12.17 predates Java 21 support (Scala's JDK compatibility table lists 2.12.18 as the first 2.12 release for JDK 21), so compiling `compiler-bridge_2.12` fails.

Fix:

```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

Then rerun sbt.

## Known-good validation

This was validated on 2026-05-05 with JDK 11 selected as above:

```bash
sbt 'cognitive/Test/compile'
sbt 'core/testOnly com.microsoft.azure.synapse.ml.stages.UDFTransformerSuite -- -z "Apply inputCol after inputCols error"'
```

Results:

- `cognitive/Test/compile` passed in 242 seconds.
- The filtered `UDFTransformerSuite` smoke test passed in under 10 seconds after compilation.

## External service test safety

Azure Search tests can create and delete real indexes. Search for live-service hooks before running them (`scripts/check-live-service-tests.sh` checks a broader pattern):

```bash
rg -n "beforeAll\\(|afterEach\\(|SearchIndex\\.createIfNoneExists|AzureSearchWriter\\.write\\(|AzureSearchWriter\\.stream\\(|getExisting\\(|deleteIndex" <test-file-or-dir>
```

If there are matches, inspect the suite and ask the user before running it.

## Python notes

SynapseML Python wrappers are generated from Scala. Do not edit generated files under `target/`. Run `sbt codegen` when the wrappers need to be regenerated.
````
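The Java 21 failure only appears after sbt has started compiling the compiler bridge. A pre-flight check could stop earlier by reading the major version from `java -version` before sbt starts. This is a sketch under stated assumptions: `java_major` and `require_java_major` are hypothetical helpers, and the parser assumes the usual `... version "X.Y.Z"` line, mapping legacy `1.8.0_...` strings to 8.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of this commit).

# Print the Java major version found in `java -version` text.
# "11.0.30" -> 11, "21" -> 21, "17-ea" -> 17, legacy "1.8.0_392" -> 8.
java_major() {
  local text="$1" re='version "([^"]+)"' v
  [[ "$text" =~ $re ]] || return 1
  v="${BASH_REMATCH[1]}"
  if [[ "$v" == 1.* ]]; then
    v="${v#1.}"   # legacy numbering: drop the leading "1."
  fi
  # Keep everything before the first '.', '_', '+', or '-'.
  printf '%s\n' "${v%%[._+-]*}"
}

# Fail fast unless the `java` on PATH has the wanted major version.
require_java_major() {
  local want="$1" have
  if ! have="$(java_major "$(java -version 2>&1)")"; then
    echo "ERROR could not read a version from java -version" >&2
    return 1
  fi
  if [[ "$have" != "$want" ]]; then
    echo "ERROR java major version is $have, expected $want; set JAVA_HOME and PATH first" >&2
    return 1
  fi
}

# Example, after the export lines in the Fix section:
#   require_java_major 11 && sbt 'cognitive/Test/compile'
```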
Lines changed in `scripts/check-live-service-tests.sh`: 55 additions & 0 deletions
```bash
#!/usr/bin/env bash
# SYNOPSIS
#   Detect SynapseML test files that appear to call live external services.

set -euo pipefail

usage() {
  cat <<'EOF'
Usage: check-live-service-tests.sh --path <test-file-or-directory>

Searches for common live-service hooks in SynapseML tests. If matches are found,
ask the user before running the suite.
EOF
}

target=""

while [[ $# -gt 0 ]]; do
  case "$1" in
    --path)
      [[ $# -ge 2 ]] || { echo "Missing value for --path." >&2; exit 2; }
      target="$2"
      shift 2
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 2
      ;;
  esac
done

if [[ -z "$target" ]]; then
  echo "Missing --path <test-file-or-directory>." >&2
  usage >&2
  exit 2
fi

if [[ ! -e "$target" ]]; then
  echo "Path does not exist: $target" >&2
  exit 2
fi

pattern='beforeAll\(|afterAll\(|afterEach\(|SearchIndex\.createIfNoneExists|AzureSearchWriter\.write\(|AzureSearchWriter\.stream\(|getExisting\(|deleteIndex|OpenAIEmbedding\(|CognitiveServices|Secrets\.|sys\.env\.getOrElse'

# rg exits 0 on a match, 1 on no match, and 2 on errors; a missing rg exits 127.
# Capture the status so an rg failure is never reported as "no hooks found".
rg_status=0
rg -n "$pattern" "$target" || rg_status=$?

case "$rg_status" in
  0)
    echo "live_service_status=review_required"
    echo "Do not run this suite without explicit user approval if it creates or mutates external resources."
    exit 1
    ;;
  1)
    echo "live_service_status=no_common_hooks_found"
    ;;
  *)
    echo "ERROR rg exited with status $rg_status; treat the suite as unreviewed." >&2
    exit 2
    ;;
esac
```
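`check-live-service-tests.sh` reports its verdict through the exit status: 0 when no common hooks are found, 1 when review is required, and 2 for usage or path errors. Because status 1 is an expected outcome rather than a crash, a caller running under `set -e` has to capture the status instead of letting it abort. A minimal sketch of such a caller; `live_check_action` is a hypothetical name, and the action words are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this commit): run a live-service check
# command, send its own output to stderr, and print one action word on stdout.
# Capturing the status with `|| status=$?` keeps `set -e` from aborting on the
# expected status 1.
live_check_action() {
  local status=0
  "$@" >&2 || status=$?
  case "$status" in
    0) echo "run" ;;       # no common live-service hooks found
    1) echo "ask_user" ;;  # hooks found: get explicit user approval first
    *) echo "error" ;;     # usage, path, or other failure: do not run the suite
  esac
}

# Example, run from the skill directory:
#   action="$(live_check_action scripts/check-live-service-tests.sh --path "$suite_file")"
#   [[ "$action" == run ]] || { echo "stop: $action" >&2; exit 1; }
```

Sending the matched lines to stderr keeps them visible to the user while stdout carries only the decision.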
Lines changed in `scripts/synapseml-doctor.sh`: 93 additions & 0 deletions
```bash
#!/usr/bin/env bash
# SYNOPSIS
#   Diagnose whether a SynapseML repo is ready for local sbt validation.

set -euo pipefail

usage() {
  cat <<'EOF'
Usage: synapseml-doctor.sh --repo <path> [--jdk <java-home>]

Checks repo shape, git state, default Java, JDK 11 availability, sbt, and
SynapseML Scala/Spark versions. Does not compile or run tests.
EOF
}

repo=""
jdk="/usr/lib/jvm/java-11-openjdk-amd64"

while [[ $# -gt 0 ]]; do
  case "$1" in
    --repo)
      [[ $# -ge 2 ]] || { echo "ERROR missing value for --repo." >&2; exit 2; }
      repo="$2"
      shift 2
      ;;
    --jdk)
      [[ $# -ge 2 ]] || { echo "ERROR missing value for --jdk." >&2; exit 2; }
      jdk="$2"
      shift 2
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 2
      ;;
  esac
done

if [[ -z "$repo" ]]; then
  echo "ERROR repo path is required. Pass --repo <synapseml-repo>." >&2
  exit 2
fi

if [[ ! -d "$repo" ]]; then
  echo "ERROR repo path does not exist: $repo" >&2
  exit 2
fi

if [[ ! -f "$repo/build.sbt" || ! -f "$repo/project/build.properties" ]]; then
  echo "ERROR path does not look like a SynapseML sbt repo: $repo" >&2
  exit 2
fi

echo "repo=$repo"
echo "repo_status=ok"

# Git state: short HEAD, current branch, and number of changed/untracked paths.
if git -C "$repo" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "git_head=$(git -C "$repo" rev-parse --short HEAD)"
  echo "git_branch=$(git -C "$repo" branch --show-current || true)"
  dirty_count="$(git -C "$repo" status --short | wc -l | tr -d ' ')"
  echo "git_dirty_count=$dirty_count"
else
  echo "git_status=not-a-git-worktree"
fi

# The default java on PATH (often Java 21 on this setup).
echo "default_java=$(command -v java || true)"
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | sed 's/^/default_java_version: /'
fi

# The JDK that sbt should use; missing JDK 11 is fatal.
if [[ -x "$jdk/bin/java" ]]; then
  echo "jdk11=$jdk"
  "$jdk/bin/java" -version 2>&1 | sed 's/^/jdk11_version: /'
else
  echo "ERROR jdk11_missing=$jdk/bin/java" >&2
  echo "Install JDK 11 or pass --jdk <java-home>." >&2
  exit 3
fi

if command -v sbt >/dev/null 2>&1; then
  echo "sbt=$(command -v sbt)"
  sbt --script-version 2>/dev/null | sed 's/^/sbt_runner_version: /' || true
else
  echo "ERROR sbt_missing=true" >&2
  exit 3
fi

# Build versions as declared in the repo.
sed -n '1,20p' "$repo/project/build.properties" | sed 's/^/build_properties: /'
rg -n 'scalaVersion|sparkVersion' "$repo/build.sbt" | sed 's/^/build_sbt: /' || true

echo "doctor_status=ok"
```
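The doctor and the sbt wrapper both default to `/usr/lib/jvm/java-11-openjdk-amd64` and otherwise expect `--jdk <java-home>`. On a host where JDK 11 is installed under a different directory name (for example an `arm64` package), a caller could discover a candidate by asking each installed JDK for its version. This sketch assumes JDKs are laid out as `<root>/<name>/bin/java`, as Debian and Ubuntu packages do under `/usr/lib/jvm`; `find_jdk_home` is a hypothetical helper, not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the first JDK home under
# $1 (default /usr/lib/jvm) whose bin/java reports major version $2 (default 11).
find_jdk_home() {
  local root="${1:-/usr/lib/jvm}" want="${2:-11}"
  local re='version "([^"]+)"' home out v
  for home in "$root"/*/; do
    home="${home%/}"
    [[ -x "$home/bin/java" ]] || continue
    out="$("$home/bin/java" -version 2>&1)" || continue
    [[ "$out" =~ $re ]] || continue
    v="${BASH_REMATCH[1]}"
    if [[ "$v" == 1.* ]]; then
      v="${v#1.}"   # legacy numbering such as 1.8.0_392
    fi
    if [[ "${v%%[._+-]*}" == "$want" ]]; then
      printf '%s\n' "$home"
      return 0
    fi
  done
  return 1
}

# Example: pass the discovered home to the doctor (run from the skill directory).
#   jdk="$(find_jdk_home /usr/lib/jvm 11)" &&
#     scripts/synapseml-doctor.sh --repo "$repo" --jdk "$jdk"
```

Checking the reported version, rather than trusting the directory name, also catches symlinks such as `default-java` that may point at a newer JDK.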
Lines changed in `scripts/synapseml-sbt.sh`: 84 additions & 0 deletions
```bash
#!/usr/bin/env bash
# SYNOPSIS
#   Run SynapseML sbt commands with JDK 11 by default.

set -euo pipefail

usage() {
  cat <<'EOF'
Usage: synapseml-sbt.sh --repo <path> [--jdk <java-home>] [--dry-run] -- <sbt-arg> [<sbt-arg> ...]

Runs sbt in a SynapseML checkout with JAVA_HOME set to JDK 11 by default.
EOF
}

repo=""
jdk="/usr/lib/jvm/java-11-openjdk-amd64"
dry_run=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --repo)
      [[ $# -ge 2 ]] || { echo "Missing value for --repo." >&2; exit 2; }
      repo="$2"
      shift 2
      ;;
    --jdk)
      [[ $# -ge 2 ]] || { echo "Missing value for --jdk." >&2; exit 2; }
      jdk="$2"
      shift 2
      ;;
    --dry-run)
      dry_run=1
      shift
      ;;
    --)
      shift
      break
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument before --: $1" >&2
      usage >&2
      exit 2
      ;;
  esac
done

if [[ -z "$repo" ]]; then
  echo "Missing --repo <path>." >&2
  usage >&2
  exit 2
fi

if [[ $# -eq 0 ]]; then
  echo "Missing sbt command after --." >&2
  usage >&2
  exit 2
fi

if [[ ! -f "$repo/build.sbt" || ! -f "$repo/project/build.properties" ]]; then
  echo "Path does not look like a SynapseML sbt repo: $repo" >&2
  exit 2
fi

if [[ ! -x "$jdk/bin/java" ]]; then
  echo "JDK java executable not found: $jdk/bin/java" >&2
  exit 2
fi

# Put the selected JDK first so both sbt and any forked JVMs use it.
export JAVA_HOME="$jdk"
export PATH="$JAVA_HOME/bin:$PATH"

echo "repo=$repo"
echo "JAVA_HOME=$JAVA_HOME"
java -version 2>&1 | sed 's/^/java: /'
echo "sbt_args=$*"

# --dry-run stops after reporting the selected JDK, without starting sbt.
if [[ "$dry_run" -eq 1 ]]; then
  exit 0
fi

# Each remaining argument is passed to sbt unchanged; sbt runs each as a separate command.
cd "$repo"
exec sbt "$@"
```
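The wrapper forwards every argument after `--` to sbt unchanged, and sbt runs each batch-mode argument as a separate command. That is why step 5 of `SKILL.md` passes the whole `testOnly ... -- -z "..."` invocation as one quoted argument. A hypothetical helper, `sbt_test_only`, could assemble that single argument in the same shape as the known-good smoke test command; it is a sketch, not part of this commit, and it refuses filters containing double quotes rather than guessing at sbt's escaping rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): build one sbt command string
#   <module>/testOnly <suite> [-- -z "<filter>"]
# to pass as a single argument after `--` to synapseml-sbt.sh.
sbt_test_only() {
  local module="${1:-}" suite="${2:-}" filter="${3:-}"
  if [[ -z "$module" || -z "$suite" ]]; then
    echo "usage: sbt_test_only <module> <suite> [filter]" >&2
    return 2
  fi
  if [[ -z "$filter" ]]; then
    printf '%s/testOnly %s\n' "$module" "$suite"
  elif [[ "$filter" == *'"'* ]]; then
    # No attempt is made to escape quotes for sbt; refuse instead.
    echo "filter must not contain double quotes" >&2
    return 2
  else
    printf '%s/testOnly %s -- -z "%s"\n' "$module" "$suite" "$filter"
  fi
}

# Example: the known-good smoke test routed through the JDK 11 wrapper.
#   cmd="$(sbt_test_only core com.microsoft.azure.synapse.ml.stages.UDFTransformerSuite 'Apply inputCol after inputCols error')"
#   scripts/synapseml-sbt.sh --repo "$repo" -- "$cmd"
```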
