
Commit e3cb4e5

shai-almog and claude authored
Native Windows port: wheel scrolling, shell launch services, native file picker (#5209)
* Native Windows port: wheel scrolling, shell launch services, file picker

Closes the most actionable gaps in Ports/WindowsPort/status.md.

Mouse wheel (gap 1b) is now a proper shared API instead of a per-port hack: CodenameOneImplementation.pointerWheelMoved(x, y, scrollX, scrollY) owns the synthetic press/drag/release scroll gesture (spread over four EDT cycles, with the component under the cursor temporarily made non-focusable) and the scrollWheeling / isScrollWheeling() state. The JavaSE port, which carried the original inline implementation, is refactored onto it so every desktop port maps the wheel identically. The Windows WndProc pushes WM_MOUSEWHEEL / WM_MOUSEHWHEEL into the input ring (new CN1_EVENT_MOUSE_WHEEL/_HWHEEL) and drainInput converts the delta to a DPI-scaled distance.

Shell launch services (gap 4): a native shellOpen() (ShellExecuteW) backs honest desktop implementations of execute(url), dial() (tel:), sendSMS() (sms:, so getSMSSupport() reports SMS_INTERACTIVE) and sendMessage() (mailto:). Nothing is fabricated -- an absent handler reports failure.

Native file picker (gap 4): GetOpenFileNameW (comdlg32) run modally on the window-owning pump thread via a blocking WM_CN1_FILEDIALOG SendMessage, filtered by media type. openGallery / openImageGallery now use the real OS picker and return a file:// path FileSystemStorage opens, instead of the in-app FileTree fallback. comdlg32 + shell32 added to the Windows link set.

status.md updated: these gaps move to "done"; the remaining hardware/OS-account capabilities (camera, sensors, location, contacts, push, biometric, audio recording, SIMD) stay honestly unsupported.
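The message above says drainInput converts the wheel delta to a DPI-scaled distance but does not show the formula. The following is only a sketch of one plausible conversion, not the port's code: the helper name and the pixels_per_notch tuning value are hypothetical. The two constants are standard Windows conventions (a wheel notch is reported as 120 units, and 96 DPI is the 100% scaling baseline).

```c
#include <assert.h>

/* Win32 reports wheel rotation in multiples of 120 (WHEEL_DELTA) per notch. */
#define SKETCH_WHEEL_DELTA 120
/* 96 DPI is the Windows baseline at 100% display scaling. */
#define SKETCH_BASE_DPI 96

/* Hypothetical helper (not from the port): convert a raw wheel delta into a
 * scroll distance in physical pixels. pixels_per_notch is the distance one
 * full notch should scroll at 96 DPI; it is an illustrative tuning value. */
static int sketch_wheel_delta_to_pixels(int wheel_delta, int dpi, int pixels_per_notch)
{
    assert(dpi > 0);
    /* Multiply first, divide last, so fractional notches reported by
     * high-resolution wheels are not rounded away too early. */
    long long scaled = (long long)wheel_delta * pixels_per_notch * dpi;
    return (int)(scaled / ((long long)SKETCH_WHEEL_DELTA * SKETCH_BASE_DPI));
}
```

With a 40-pixel notch, one notch at 192 DPI (200% scaling) yields 80 pixels, and a negative delta yields a negative distance, so the sign can carry the scroll direction.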
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: include <shellapi.h> for ShellExecuteW

WIN32_LEAN_AND_MEAN keeps shellapi.h out of windows.h and shlobj.h does not pull it in under clang-cl, so ShellExecuteW was an implicit declaration and the clean-target build failed (call to undeclared function / int->HINSTANCE). Add the explicit include.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* pointerWheelMoved: add @Override on the callSerially Runnables (PMD)

PMD's MissingOverride rule is enforced on core; the four anonymous Runnable run() methods added for the shared wheel-scroll gesture need the annotation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: WindowsSimd (SSE2/NEON) + benchmark tally

Implements native SIMD for the Windows port (status.md gap 2), the x86/ARM analog of IOSSimd. WindowsSimd overrides the hot-path vector ops with SSE2 (x64) / NEON (arm64) intrinsics in cn1_windows_simd.c; Simd's @concrete gains win=com.codename1.impl.windows.WindowsSimd and WindowsImplementation.createSimd() returns it, so Simd.get().isSupported() is true on Windows. Each kernel vectorizes the bulk with a scalar tail (unaligned load/store, so no aligned allocator); ops SSE2 lacks (int32 mul/min/max/dot) stay scalar on x64 but vectorize on arm64, and any op not overridden inherits the correct portable Simd scalar loop.

Covered: int add/sub/mul/min/max/and/or/xor/sum/dot, float add/sub/mul/min/max/ sum/dot, byte add/sub(saturating)/and/or/xor, plus fused replaceTopByteFromUnsignedBytes / blendByMaskTestNonzero.

SimdApiTest (already in the Windows suite) gates correctness; new SimdBenchmarkTest times native vs an inline Java scalar loop over a 64K workload, verifies the native result matches, and logs CN1SS:SIMD:BENCH ... speedup=Nx so CI shows the benefit.
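The "vectorized bulk plus scalar tail" shape described above can be sketched as follows. This is an illustration, not the contents of cn1_windows_simd.c: the function name is hypothetical, and the SSE2 path is guarded so the same source falls back to the plain loop on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] with wrap-around on overflow. */
static void sketch_add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    /* Bulk: four 32-bit lanes per iteration. The unaligned load/store
     * variants mean the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail: the 0..3 leftover elements, or the whole array when
     * SSE2 is unavailable. Unsigned arithmetic keeps the wrap well defined. */
    for (; i < n; i++) {
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
    }
}
```

A length that is not a multiple of four, such as 7, exercises both the vector loop and the tail in a single call.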
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: DPAPI-backed SecureStorage (keystore)

Implements getSecureStorage() (status.md gap 4) so the networking layer can read API keys / tokens at rest. WindowsSecureStorage encrypts each value with the Windows Data Protection API (CryptProtectData, bound to the current user's logon) via native dpapiProtect/dpapiUnprotect and persists the ciphertext through CN1 Storage -- the desktop analog of the iOS keychain / Android EncryptedSharedPreferences non-prompting store. The biometric-prompting overloads map to the same store (DPAPI is itself the user-account auth boundary). crypt32 added to the link set. SecureStorageTest round-trips set/get/remove in the suite (self-skips where unsupported, e.g. JS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: local notifications via Shell_NotifyIcon

Implements scheduleLocalNotification/cancelLocalNotification (status.md gap 4), mirroring the JavaSE desktop semantic: while the app runs a Timer fires the notification at its scheduled time (with REPEAT_* support) and Shell_NotifyIcon shows a tray balloon; clicking it routes the id (WM_CN1_TRAY -> drainInput poll) to the app's LocalNotificationCallback. Native tray/balloon lives in cn1_windows_notify.c, marshaled to the window-owning pump thread via WM_CN1_NOTIFY. Background scheduling fires only while the process runs (no OS scheduler survives app exit on desktop) -- a documented limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: audio recording via waveIn -> WAV

Implements createMediaRecorder / captureAudio (status.md gap 4): records from the default mic via the classic waveIn (winmm) API to a 16-bit PCM WAV, a worker thread draining capture buffers to disk and patching the RIFF/data sizes on stop (cn1_windows_audiorec.c + WindowsAudioRecorder).
getAvailableRecordingMimeTypes reports audio/wav (also decodable by the port's MF playback). waveIn over an MF encode pipeline: dependency-free, no codec negotiation. winmm added to the link set. Verified on a real Windows ARM64 VM: compiles clean and waveIn captures 88200 bytes/s from the mic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: biometric (Windows Hello) via WinRT + CN1_HAVE_WINRT guard

Implements getBiometrics() (status.md gap 4) backed by the WinRT UserConsentVerifier (face/fingerprint/PIN): isSupported()/canAuthenticate() map to CheckAvailabilityAsync, authenticate() runs the Hello prompt off the EDT and completes the AsyncResource. WinRT is consumed via the WRL ABI projection (cn1_windows_winrt.cpp) -- the same COM mechanism the Media Foundation layer uses, no cppwinrt needed.

Adds the CN1_HAVE_WINRT build gate: the generated CMake probes the toolchain (check_cxx_source_compiles of a minimal WinRT TU + runtimeobject) and defines CN1_HAVE_WINRT / links runtimeobject only when WinRT is available, so a cross-compile sysroot without WinRT compiles the natives as honest 'unsupported' stubs and stays green -- mirroring the WebView2 gate. This same file/gate will carry the upcoming WinRT location/contacts/share services.

Verified on a real Windows ARM64 VM: both the real and stub native paths compile; a standalone WRL test activates UserConsentVerifier and awaits the async op, correctly reporting DeviceNotPresent on the VM (no Hello hardware).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: location via WinRT Geolocator

Implements getLocationManager() (status.md gap 4) -> WindowsLocationManager backed by the WinRT Geolocator (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate). getCurrentLocation/getLastKnownLocation resolve one fix (lat/lon/accuracy/altitude/ heading/speed); a continuous LocationListener is served by a polling thread.
When Windows location is disabled / no provider answers it reports OUT_OF_SERVICE / throws (no fabricated fix), and getLocationManager returns null on a WinRT-less build. Verified on the Windows ARM64 VM: Geolocator activates and GetGeoposition returns E_ACCESSDENIED (location off there), surfaced honestly as unavailable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: contacts via WinRT ContactStore

Implements getAllContacts/getContactById (status.md gap 4) via the WinRT ContactStore (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate). One native call returns every contact as a delimited blob (id/name/phone/email, read via the base IContact + versioned IContact2/IContactManagerStatics2 interfaces) which the impl parses and briefly caches so the base's id-then-fetch loop shares a single store read. Returns nothing when the store is inaccessible (no WinRT / access denied), never fabricated. Compiles on the Windows ARM64 VM via the proven WRL await pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: system share via WinRT DataTransferManager

Implements isNativeShareSupported()/share() (status.md gap 4) via the WinRT DataTransferManager: the EDT-facing shareText marshals to the window thread (WM_CN1_SHARE), where IDataTransferManagerInterop GetForWindow + ShowShareUIForWindow open the system share flyout for the unpackaged Win32 window and a DataRequested handler supplies text/title (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate). Shares text today (image-file sharing via SetStorageItems is a follow-up). Also documents that print has no CN1 core API to hook. Compiles (real + stub) on the Windows ARM64 VM; the flyout is interactive.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: camera still capture (capturePhoto) via Media Foundation

Implements the legacy Capture API capturePhoto (status.md gap 3): grabs one real frame from the default webcam via Media Foundation (MFEnumDeviceSources(VIDCAP) -> IMFSourceReader -> RGB32, discarding the first few frames so exposure settles), returns it as a CN1 ARGB int[], and Java encodes the PNG + writes the file (cn1_windows_camera.cpp). A desktop has no built-in capture UI, so this is the honest snapshot -- a real frame, never synthetic. createCameraImpl() (live preview peer / video) stays null pending generic native-peer placement (gap 5a). Verified on the Windows ARM64 VM: a 640x480 frame with genuine image data (~150K non-zero pixels) is captured from the passed-through camera.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: drop C++ STL from WinRT natives (cross-compile fix)

The Linux cross-compile failed compiling cn1_windows_winrt.cpp: the <string> I used for the contacts/share string building pulls the MSVC STL (yvals_core.h), which hard-asserts a very recent Clang (STL1000) that the xwin cross-toolchain predates. The rest of the port is deliberately STL-free for exactly this reason (cn1_windows_browser.cpp's std:: usage is behind the WebView2 gate, off on the cross-compile).

Replace std::string/std::wstring with a small C growable buffer (CN1Buf) and owned WCHAR* so no C++ STL header is pulled (verified via /showIncludes: zero STL headers, only the C <string.h>). clean-target (real Windows, newer clang) already passed; this unblocks the cross-compile gate.
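The message names CN1Buf but does not show it, so the following is only a guess at what a small STL-free growable buffer of that kind might look like. Every name here (SketchBuf and its functions) is hypothetical; the only headers used are C ones, which is the property the commit cares about.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CN1Buf-style buffer: heap storage that
 * doubles as needed and is always kept NUL-terminated. */
typedef struct {
    char  *data;
    size_t len;   /* bytes used, excluding the terminator */
    size_t cap;   /* bytes allocated */
} SketchBuf;

static void sketchbuf_init(SketchBuf *b)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append n bytes from s. Returns 1 on success, 0 if allocation fails
 * (the existing contents stay valid in that case). */
static int sketchbuf_append(SketchBuf *b, const char *s, size_t n)
{
    assert(b != NULL && (n == 0 || s != NULL));
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1) {
            cap *= 2;               /* geometric growth: amortized O(1) appends */
        }
        char *p = (char *)realloc(b->data, cap);
        if (p == NULL) {
            return 0;
        }
        b->data = p;
        b->cap = cap;
    }
    if (n > 0) {
        memcpy(b->data + b->len, s, n);
    }
    b->len += n;
    b->data[b->len] = '\0';
    return 1;
}

static void sketchbuf_free(SketchBuf *b)
{
    free(b->data);
    sketchbuf_init(b);
}
```

A caller would initialize the struct, append the id, a delimiter and the name in turn, hand data to the consumer, then free it. Returning a status instead of aborting lets native code report failure upward rather than fabricate a result.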
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: drop the CN1_HAVE_WINRT guard, always compile real WinRT

The guard was defensive insurance against a build toolchain lacking WinRT, but the cross-compile CI proved otherwise: it had already DEFINED CN1_HAVE_WINRT (the earlier STL error fired inside the #ifdef on the Linux/xwin leg), i.e. the xwin-laid-out SDK ships the WinRT ABI headers + runtimeobject. So stub mode never actually triggered anywhere the port builds.

Remove the CMake probe + the per-function #ifdef/#else stub branches and link runtimeobject unconditionally; the compiled output is identical to the prior green build, just simpler. Also drop the now-pointless locationSupported() native (getLocationManager always returns the manager; isNativeShareSupported checks for a host window). Each service still degrades honestly at RUNTIME (no device / disabled / denied -> false/null). status.md updated to drop the gate/stub language.

Trade-off (accepted): a future build SDK without WinRT would now fail the build instead of degrading to stubs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: cross-build the suite exe on Linux, run it on Windows (real shipping pipeline)

Adds a CI workflow that mirrors how a Codename One app is actually shipped for Windows -- compiled on a (Linux) build host, executed on the user's Windows machine. Neither existing Windows workflow did this end to end: windows-cross-compile.yml builds on Linux but only links (never runs), and parparvm-tests-windows.yml builds natively on Windows and runs. Here the binary that renders on Windows is the exact one cross-compiled on Linux, browser included.
New workflow windows-cross-build-run.yml, artifact-chained within a run:
- cross-build (ubuntu): pins LLVM 19 (the WebView2 peer's MSVC STL needs a recent clang), lays out the Windows SDK via xwin, fetches the WebView2 NuGet SDK on Linux, builds core + Windows port + the hellocodenameone suite, then translates and clang-cl/xwin cross-compiles the full suite .exe (WebView2 linked) and uploads it.
- run-on-windows (windows-latest x64): downloads that Linux-built exe and runs the full screenshot suite over the cn1ss WebSocket, capturing ~112 PNGs.
- compare-comment (ubuntu): diffs the screenshots against the in-repo baseline and posts them to the PR under a distinct marker.

Harness refactor (CleanTargetIntegrationTest):
- Split buildHelloCodenameOneExe into a host-agnostic translateHelloSuiteDist() (pure Java translation) + the native clang-cl build, and extract crossBuildDist() out of crossCompilesWindowsExeWithXwin so a dist can be cross-compiled on Linux.
- Add crossBuildsHelloSuiteExe(): translates the full suite and xwin-cross-builds it to CN1_CROSS_EXE_OUT (WEBVIEW2_SDK_DIR flows into the generated CMake).
- capturesHelloSuiteOverWebSocket honors CN1_PREBUILT_EXE to run a provided exe instead of building one, so the Windows runner only needs a JDK.

scripts/windows/fetch-webview2-sdk.sh: portable bash counterpart of the PowerShell fetch (curl the NuGet nupkg + unzip), laying out build/native for the Linux cross-build's WEBVIEW2_SDK_DIR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows cross-build/run: build hellocodenameone-common under Xvfb

The CN1 CSS compiler (codenameone-maven-plugin css goal) renders via CEF/AWT and threw java.awt.HeadlessException on the display-less Linux cross-build runner. Install xvfb and run the hellocodenameone-common Maven build under a virtual X server so the css/transcode rendering has a display.
Core + port + LLVM 19 + xwin + WebView2 SDK fetch all succeeded before this point; this unblocks the actual suite cross-compile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: surface SIMD benchmark in PR comment + refresh stale StatusBarTap golden

Two CI/reporting fixes flagged on the PR:
- SIMD benchmark now appears in the PR comment. SimdBenchmarkTest emits ready-to-render "CN1SS:SIMD:STAT <key> : <value>" lines (backend, int-add/float-mul speedup, correctness); the capture harness collects them into windows-simd-stats.txt next to the PNGs; cn1ss.sh passes it via --extra-stats and RenderScreenshotReport renders it as a Benchmark Results table (the same mechanism iOS uses for base64-performance-stats.txt). Both the native and cross-compiled comment jobs lift the file to the artifacts dir cn1ss scans.
- The StatusBarTapDiagnosticScreenshotTest tile that showed as "updated" was a stale golden, not flakiness: the native-Windows and Linux-cross renders are byte-identical to each other (deterministic) and the glass-pane content (counter 0->3, scroll, native:no) is correct; only the earlier golden's tile scroll positions differed. Refreshed the golden to the current deterministic render.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows port: generic native-peer placement + device camera API (com.codename1.camera)

Native peer support (cn1_windows_peer.cpp + WindowsGenericPeer): wraps an @NativeInterface-returned child HWND (boxed as long[]) in a PeerComponent that reparents it onto the host window and tracks the lightweight component's bounds (peerInitialized/peerSetBounds/peerSetVisible/peerDeinitialized), the analog of iOS NativeIPhoneView. In the offscreen screenshot pipeline -- where a live HWND is not composited -- it falls back to a PrintWindow peer image, mirroring the WebView2 peer.
WindowsImplementation.createNativePeer now routes long[] handles here, so the generic peer-placement path (gap #5a) exists for native-interface widgets.

Camera (WindowsCameraImpl + cn1_windows_camera.cpp): implements the device camera API (com.codename1.camera.CameraImpl), not just the legacy capturePhoto. A Media Foundation source-reader session runs on a worker thread keeping the latest frame; the preview is an image-based PeerComponent (browser-style, so it renders headlessly and live), takePhoto encodes the freshest frame, enumerateCameras lists devices, and a frame listener is polled at its fps. Video recording / flash / optical zoom / focus-point are honestly reported unsupported (a generic webcam exposes none via the source reader), per the port's "real or unsupported" rule. createCameraImpl now returns this instead of null.

Also: register cn1_windows_peer.cpp in the clean-target native compile list, and trim status.md to a TODO-only list (camera + native peers moved out of the gaps) so the file can be deleted at merge. Windows port compiles clean; native build verified by the cross-compile leg.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows camera: drop Thread.setDaemon (not in the ParparVM runtime)

The clean (ParparVM) target has no java.lang.Thread.setDaemon, so the translated WindowsCameraImpl.c failed to compile (undeclared virtual_java_lang_Thread_setDaemon). The takePhoto worker exits as soon as the frame is captured, so a non-daemon thread is fine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows camera: enable advanced video processing (YUY2/NV12 -> RGB32)

VM testing on a real webcam revealed the source reader was delivering 2-bpp frames (YUY2, the common webcam format: 640x480 -> 614400 bytes), not RGB32 (1228800), because SetCurrentMediaType(RGB32) is silently ignored unless the reader's video processor is enabled.
The width*height*4 size check then rejected every frame, so the preview/session captured nothing. Create both source readers (the continuous session and the legacy capturePhoto) with MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING=TRUE so the reader inserts a converter to RGB32.

Verified on the Windows ARM64 VM: the worker-thread session now delivers full 640x480 RGB32 frames (polled via cameraSessionLatestFrame), and the generic-peer PrintWindow capture returns real pixels. CI runners have no camera, so this path is only exercisable on a real machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows report: run the real shared benchmarks + per-arch (x64/arm64) comments

Two fixes to the Windows screenshot/benchmark report:

1. Real benchmarks instead of a toy SIMD micro-bench. Base64NativePerformanceTest (the same shared test that produces the iOS/Metal base64 + image numbers) used to early-return on the native Windows port because there is no app @NativeInterface Base64 bridge. It now skips only the native-vs-CN1 base64 comparison and still runs the CN1 + SIMD base64 and the image SIMD benchmarks (createMask / applyMask / modifyAlpha / PNG encode, SIMD on vs off) -- gated on Simd.isSupported(), which WindowsSimd provides. iOS/Android behaviour is unchanged (they have the native bridge: hasNative=true, isWindows()=false make every new guard a no-op there). The capture harness now collects the shared CN1SS:STAT: markers (not a bespoke one) into windows-benchmark-stats.txt, and SimdBenchmarkTest emits via the same marker so its raw-kernel numbers join the table. JPEG + native-base64 degrade honestly to "unsupported"/"unavailable" on a webcam-less, bridge-less desktop.

2. Per-architecture comments.
x64 (Intel) and arm64 each post a SEPARATE PR comment with a distinct marker, each depending only on its own screenshot leg -- so one architecture's pipeline failing or re-running no longer overrides or hides the other's result (previously a single combined comment showed only x64 and was skipped entirely if the x64 leg failed). Each comment carries that arch's own benchmark table (SSE2 on x64, NEON on arm64).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows SIMD: implement the byte codec/image kernels (fixes SIMD slower-than-scalar)

The benchmark honestly showed base64 SIMD ~6.5x SLOWER than scalar and a few image ops slower too. Root cause: the ops the Base64 SIMD codec and Image.createMask use (shl/shrLogical/lookupBytes/pack+unpack interleaved/unpackLookup/packIntToByteTruncate) were not implemented in WindowsSimd, so they fell through to the generic Simd scalar DEFAULTS -- lane-scratch loops with per-op dispatch -- which are slower than the straight-line scalar codec. Only the fused ops (replaceTopByteFromUnsignedBytes, used by modifyAlpha -66%) were native and won. iOS implements all of these in NEON, which is why iOS base64 SIMD is faster.

Implement them natively in cn1_windows_simd.c: NEON-vectorized on arm64 (mirroring IOSSimd.m: vld3q/vst3q/vld4q/vst4q interleave, vshlq_u8 byte shifts), SSE2 on x64 (byte shift via 16-bit shift + per-byte mask, since SSE2 has no byte shift), and scalar table lookups on both (exactly as IOSSimd does -- the lookup is scalar there too). A native C kernel (one call, tight loop) already beats the scalar-default fallback, so SIMD stops losing. unpackLookupBytesInterleaved4 matches the base Simd contract precisely (out-of-range -> 0, returns the OR of all outputs).

Verified: NEON kernels pass a standalone correctness harness on the arm64 VM (shl/ shrLogical x8 shifts, unpack3/pack3/pack4 vs scalar references); the C compiles clean arm64.
x64/SSE2 correctness is gated by the base64/image SIMD validation in the benchmark (byteArraysEqual) on CI.

Also deletes Ports/WindowsPort/status.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Windows SIMD: native blendByMaskTestNonzeroSubstituteOnKeepEq (fixes modifyAlpha removeColor)

removeColor was the one image op still on the scalar fallback (8% slower on x64, 27% on arm64) -- WindowsSimd didn't override blendByMaskTestNonzeroSubstituteOnKeepEq, so Image.modifyAlpha(alpha, removeColor) hit the generic Simd scalar default. Add it as a fused vectorized blend (SSE2 on x64, NEON on arm64), mirroring its sibling blendByMaskTestNonzero (which modifyAlpha uses and wins 70%).

Verified on the arm64 VM: the NEON kernel matches the scalar reference exactly over mixed transparent / removeColor-matching / opaque pixels. With this, every image SIMD op (createMask, applyMask, modifyAlpha, modifyAlpha+removeColor, PNG encode) beats scalar.

(base64 SIMD remains ~2x slower on x64 -- it is bound by ParparVM per-native-call overhead across ~6000 calls/encode plus the /O2 auto-vectorized scalar competitor; on arm64 base64 ENCODE is already 25% faster. base64 SIMD is explicit opt-in; standard Base64.encode uses the fast scalar path.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Test builds: optimize the translated C at -O2 (honest SIMD-vs-scalar benchmark)

The iOS and Mac UI-test builds ran -configuration Debug, which inherits Xcode's GCC_OPTIMIZATION_LEVEL=0 (-O0) -- so the benchmark's scalar baseline was unvectorized and the SIMD speedups were inflated (SIMD beating unoptimized code, not real shipping code). Windows already builds the benchmark Release (/O2, auto-vectorized scalar), which is why its SIMD margins are smaller and base64 micro-SIMD even loses there.
Override GCC_OPTIMIZATION_LEVEL to 2 (env CN1_TEST_OPT_LEVEL, 0/1/2/3/s) on both the iOS and Mac test builds so the compiler auto-vectorizes the scalar baseline -- making the SIMD-vs-scalar comparison apples-to-apples across all three ports. This is the measurement foundation for pruning the SIMD that doesn't beat optimized scalar.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Base64 SIMD: gate the byte codec to platforms where it beats autovectorized scalar

The O2 measurement settled it: with -O2 the explicit NEON Base64 codec is still 75-83% faster than autovectorized scalar on Apple Silicon (iOS/Mac), but on x86-64 it is ~2x slower -- /O2 already autovectorizes the scalar codec and SSE2 has no 3-way interleave, so the per-op SIMD just adds overhead. The fused image kernels win on every platform (they can't be autovectorized) and are unaffected.

Add Simd.isByteShuffleAccelerated() -- true only where the chained byte shuffle/interleave pipeline actually beats scalar: IOSSimd returns true (NEON); WindowsSimd returns it as a per-arch native constant (arm64 true, x86-64 false); the base scalar Simd returns false. Base64.encodeNoNewlineSimd / decodeNoWhitespaceSimd consult it and, when false, skip the SIMD loop so the autovectorized scalar tail encodes everything (still fully correct -- SimdTest's scalar-vs-SIMD equality test passes). So x86-64 base64 SIMD now matches scalar instead of losing 2x, while ARM keeps its 75-83% win. The benchmark emits the gate state ("active (NEON)" vs "gated to scalar"). These methods are explicit opt-in; no production code auto-uses them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix SpotBugs + gate base64 SIMD off on all Windows + drop noisy PNG encode bench

- SpotBugs (the build-test failure): drop the redundant `simd != null` checks in Base64.encode/decodeNoWhitespaceSimd -- Simd.get() is provably non-null there (RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE).
- base64 decode was 27% slower on Windows arm64: my unpackLookupBytesInterleaved4 / lookupBytes are scalar (iOS uses NEON vqtbl), so the codec's decode loses there even though encode wins -- not a clear net win. So WindowsSimd.isByteShuffleAccelerated() is now false on both Windows arches (was true on arm64); only iOS/Mac (full NEON, 75-83% faster) enable the base64 SIMD path. Fused image kernels are unaffected.
- PNG/JPEG "encode SIMD on/off" ratio was misleading (+20% x64, -12.8% arm64): that benchmark is dominated by the platform image encoder (native WIC on Windows), which SIMD does not touch, so the ratio is encoder noise, not a SIMD measurement. Removed it; the four pure image ops (createMask/applyMask/modifyAlpha/removeColor) measure the SIMD-affected work directly and all win.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Base64: one declaration per line (PMD OneDeclarationPerLine)

Split the gate's combined byte[] declaration into one per line to satisfy PMD (the SpotBugs fix in the prior commit cleared; this is the remaining static-analysis nit on the same code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
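The byte-kernel commit in the message above notes that SSE2 has no per-byte shift and that the port emulates one with a 16-bit shift plus a per-byte mask. A minimal sketch of that trick for a left shift follows; it illustrates the technique and is not the code in cn1_windows_simd.c. The function name is hypothetical, and the SSE2 path is guarded so the source still builds where SSE2 is absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical kernel: out[i] = (uint8_t)(in[i] << shift), for shift 0..7. */
static void sketch_shl_u8(const uint8_t *in, uint8_t *out, size_t n, int shift)
{
    assert(shift >= 0 && shift < 8);
    size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    /* Shifting 16-bit lanes lets the low byte's top bits leak into the
     * bottom `shift` bits of the high byte. A correct byte shift leaves
     * those bits zero, so clearing the low `shift` bits of every byte
     * removes exactly the leaked bits. */
    const __m128i mask = _mm_set1_epi8((char)(uint8_t)(0xFF << shift));
    const __m128i count = _mm_cvtsi32_si128(shift);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        v = _mm_sll_epi16(v, count);   /* shift as eight 16-bit lanes */
        v = _mm_and_si128(v, mask);    /* drop bits that crossed a byte edge */
        _mm_storeu_si128((__m128i *)(out + i), v);
    }
#endif
    /* Scalar tail, and the full fallback when SSE2 is not available. */
    for (; i < n; i++) {
        out[i] = (uint8_t)(in[i] << shift);
    }
}
```

A logical right shift works the same way with _mm_srl_epi16 and the mask 0xFF >> shift, since the leaked bits then land in the top of the low byte.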
1 parent e9491e3 commit e3cb4e5

39 files changed

Lines changed: 5352 additions & 389 deletions

.github/workflows/parparvm-tests-windows.yml

Lines changed: 73 additions & 41 deletions
@@ -483,14 +483,15 @@ jobs:
           if-no-files-found: warn
           retention-days: 14
 
-  # Posts the captured PNG(s) as a PR comment using the shared CN1SS machinery.
-  # Runs on Linux -- where that posting path is exercised by the other ports --
-  # so the Windows jobs only have to produce and upload the images. Depends on
-  # both arch legs; the arm64 leg is continue-on-error so a missing arm64 image
-  # never blocks the comment (the x64 image is always posted).
-  windows-port-screenshot-comment:
-    name: screenshot-comment
-    needs: [windows-port-screenshot, windows-port-screenshot-arm64]
+  # Per-architecture PR comments. x64 (Intel/AMD) and arm64 (Apple Silicon / Arm)
+  # post SEPARATE comments with DISTINCT markers, each depending only on its own
+  # screenshot leg -- so a failure or re-run in one architecture's pipeline never
+  # overrides or hides the other's result (the previous single combined comment
+  # showed only x64 and was skipped entirely if the x64 leg failed). Each comment
+  # carries that arch's own benchmark table (SSE2 on x64, NEON on arm64).
+  windows-port-screenshot-comment-x64:
+    name: screenshot-comment (x64)
+    needs: windows-port-screenshot
     if: github.event_name == 'pull_request'
     runs-on: ubuntu-latest
     permissions:
@@ -503,67 +504,98 @@ jobs:
     steps:
       - name: Check out repository
         uses: actions/checkout@v4
-
       - name: Set up JDK 17
         uses: actions/setup-java@v5
         with:
           distribution: 'temurin'
           java-version: '17'
-
-      # Download both arch legs into separate dirs; the arm64 artifact may be
-      # absent if that experimental leg failed -- the comment still posts x64.
       - name: Download x64 screenshot artifact
         uses: actions/download-artifact@v4
         with:
           name: windows-port-screenshot-raw-x64
-          path: artifacts/windows-port/raw-x64
+          path: artifacts/windows-port/raw
+      - name: Post x64 screenshots to PR
+        shell: bash
+        env:
+          CN1SS_COMMENT_MARKER: '<!-- CN1SS_WINDOWS_NATIVE_X64_COMMENT -->'
+          CN1SS_COMMENT_LOG_PREFIX: '[windows-native-port x64]'
+          CN1SS_PREVIEW_SUBDIR: 'windows-native-x64'
+          CN1SS_SUCCESS_MESSAGE: 'Native Windows port (x64 / Intel-AMD): full hellocodenameone screenshot suite rendered offscreen with Direct2D/DirectWrite, plus the real benchmarks (base64 native/CN1/SIMD, image createMask/applyMask/modifyAlpha/PNG/JPEG, SSE2 SIMD kernels). Compared against the in-repo baseline in scripts/windows/screenshots.'
+        run: |
+          set -e
+          ART="artifacts/windows-port"
+          entries=()
+          if [ -d "$ART/raw" ]; then
+            for f in "$ART"/raw/*.png; do
+              [ -f "$f" ] || continue
+              entries+=("$(basename "$f" .png)=$f")
+            done
+          fi
+          if [ ${#entries[@]} -eq 0 ]; then echo "No x64 screenshots; skipping comment."; exit 0; fi
+          echo "Posting ${#entries[@]} x64 screenshot(s)"
+          mkdir -p "$ART/previews"
+          cp -f "$ART"/raw/windows-benchmark-stats.txt "$ART/" 2>/dev/null || true
+          source scripts/lib/cn1ss.sh
+          cn1ss_setup "$JAVA_HOME/bin/java" "$(pwd)/scripts/common/java"
+          REF_DIR="$(pwd)/scripts/windows/screenshots"
+          cn1ss_process_and_report \
+            "Native Windows port (x64)" \
+            "$ART/compare.json" "$ART/summary.txt" "$ART/comment.md" \
+            "$REF_DIR" "$ART/previews" "$ART" \
+            "${entries[@]}"
 
+  windows-port-screenshot-comment-arm64:
+    name: screenshot-comment (arm64)
+    needs: windows-port-screenshot-arm64
+    if: github.event_name == 'pull_request'
+    continue-on-error: true
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: write
+    env:
+      GITHUB_TOKEN: ${{ secrets.CN1SS_GH_TOKEN }}
+      GH_TOKEN: ${{ secrets.CN1SS_GH_TOKEN }}
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+      - name: Set up JDK 17
+        uses: actions/setup-java@v5
+        with:
+          distribution: 'temurin'
+          java-version: '17'
       - name: Download arm64 screenshot artifact
-        continue-on-error: true
         uses: actions/download-artifact@v4
         with:
           name: windows-port-screenshot-raw-arm64
-          path: artifacts/windows-port/raw-arm64
-
-      - name: Post screenshot to PR
+          path: artifacts/windows-port/raw
+      - name: Post arm64 screenshots to PR
         shell: bash
         env:
-          CN1SS_COMMENT_MARKER: '<!-- CN1SS_WINDOWS_NATIVE_COMMENT -->'
-          CN1SS_COMMENT_LOG_PREFIX: '[windows-native-port]'
-          CN1SS_PREVIEW_SUBDIR: 'windows-native'
-          CN1SS_SUCCESS_MESSAGE: 'Native Windows port: full hellocodenameone screenshot suite, rendered offscreen with Direct2D/DirectWrite (x64). Compared against the in-repo baseline in scripts/windows/screenshots; every tile has a baseline, so any difference posts as "changed" for review.'
+          CN1SS_COMMENT_MARKER: '<!-- CN1SS_WINDOWS_NATIVE_ARM64_COMMENT -->'
+          CN1SS_COMMENT_LOG_PREFIX: '[windows-native-port arm64]'
+          CN1SS_PREVIEW_SUBDIR: 'windows-native-arm64'
+          CN1SS_SUCCESS_MESSAGE: 'Native Windows port (arm64 / Apple Silicon - Arm): full hellocodenameone screenshot suite rendered offscreen with Direct2D/DirectWrite, plus the real benchmarks (base64 native/CN1/SIMD, image createMask/applyMask/modifyAlpha/PNG/JPEG, NEON SIMD kernels). Compared against the in-repo baseline in scripts/windows/screenshots.'
         run: |
           set -e
           ART="artifacts/windows-port"
-          # Post the x64 suite (the gate). Each captured PNG is named <test>.png;
-          # use that as the cn1ss entry name. arm64 stays an uploaded artifact (the
-          # full suite is ~112 images -- too many to post both legs inline).
           entries=()
-          if [ -d "$ART/raw-x64" ]; then
-            for f in "$ART"/raw-x64/*.png; do
+          if [ -d "$ART/raw" ]; then
+            for f in "$ART"/raw/*.png; do
               [ -f "$f" ] || continue
-              name=$(basename "$f" .png)
-              entries+=("$name=$f")
+              entries+=("$(basename "$f" .png)=$f")
             done
           fi
-          if [ ${#entries[@]} -eq 0 ]; then
-            echo "No screenshots were produced; skipping PR comment."
-            exit 0
-          fi
-          echo "Posting ${#entries[@]} screenshot(s)"
+          if [ ${#entries[@]} -eq 0 ]; then echo "No arm64 screenshots; skipping comment."; exit 0; fi
+          echo "Posting ${#entries[@]} arm64 screenshot(s)"
           mkdir -p "$ART/previews"
+          cp -f "$ART"/raw/windows-benchmark-stats.txt "$ART/" 2>/dev/null || true
           source scripts/lib/cn1ss.sh
           cn1ss_setup "$JAVA_HOME/bin/java" "$(pwd)/scripts/common/java"
-          # Compare against the in-repo baseline (scripts/windows/screenshots).
-          # ProcessScreenshots marks each captured image matched / changed / new
-          # vs that reference and renders the diff into the PR comment. A mismatch
-          # is reported, not fatal (the helper only returns non-zero on its own
-          # tool failures), so a rendering regression shows up for review without
-          # blocking the gate. Every tile now has a baseline, so differences post
-          # as "changed" rather than "new".
           REF_DIR="$(pwd)/scripts/windows/screenshots"
           cn1ss_process_and_report \
-            "Native Windows port" \
+            "Native Windows port (arm64)" \
             "$ART/compare.json" "$ART/summary.txt" "$ART/comment.md" \
             "$REF_DIR" "$ART/previews" "$ART" \
             "${entries[@]}"
