Commit e3cb4e5
Native Windows port: wheel scrolling, shell launch services, native file picker (#5209)
* Native Windows port: wheel scrolling, shell launch services, file picker
Closes the most actionable gaps in Ports/WindowsPort/status.md.
Mouse wheel (gap 1b) is now a proper shared API instead of a per-port hack:
CodenameOneImplementation.pointerWheelMoved(x, y, scrollX, scrollY) owns the
synthetic press/drag/release scroll gesture (spread over four EDT cycles, with
the component under the cursor temporarily made non-focusable) and the
scrollWheeling / isScrollWheeling() state. The JavaSE port, which carried the
original inline implementation, is refactored onto it so every desktop port maps
the wheel identically. The Windows WndProc pushes WM_MOUSEWHEEL / WM_MOUSEHWHEEL
into the input ring (new CN1_EVENT_MOUSE_WHEEL/_HWHEEL) and drainInput converts
the delta to a DPI-scaled distance.
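As a rough illustration of the delta-to-distance step, the sketch below converts a raw wheel delta into a DPI-scaled pixel distance. WHEEL_DELTA (120) and the 96 DPI baseline are standard Win32 values; the class name, the method, and the pixelsPerNotch parameter are assumptions made for this example, not the port's actual code.

```java
/** Hypothetical helper; the real conversion lives in the port's drainInput. */
final class WheelDeltaSketch {
    static final int WHEEL_DELTA = 120; // one wheel notch, per the Win32 headers
    static final int BASE_DPI = 96;     // DPI at which one logical pixel is one physical pixel

    /** Converts a WM_MOUSEWHEEL delta to pixels, scaled for the monitor DPI. */
    static int toScrollPixels(int wheelDelta, int dpi, int pixelsPerNotch) {
        float notches = (float) wheelDelta / WHEEL_DELTA; // may be fractional on precision wheels
        return Math.round(notches * pixelsPerNotch * dpi / BASE_DPI);
    }

    public static void main(String[] args) {
        System.out.println(toScrollPixels(120, 96, 48));   // prints 48
        System.out.println(toScrollPixels(-240, 192, 48)); // prints -192
    }
}
```

The sign of the result carries the scroll direction, so the same helper could serve both the vertical and the horizontal wheel event.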
Shell launch services (gap 4): a native shellOpen() (ShellExecuteW) backs honest
desktop implementations of execute(url), dial() (tel:), sendSMS() (sms:, so
getSMSSupport() reports SMS_INTERACTIVE) and sendMessage() (mailto:). Nothing is
fabricated -- an absent handler reports failure.
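A minimal sketch of how the URI schemes named above could be assembled before being handed to a shell-open call. The class and method names are hypothetical and the percent-encoder is a simplified stand-in; the only Win32 fact relied on is that ShellExecuteW signals success with a return value greater than 32.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical URI builders; not the port's actual implementation. */
final class ShellUriSketch {
    static String dialUri(String phone) {
        return "tel:" + phone.replaceAll("\\s+", "");
    }

    static String smsUri(String phone) {
        return "sms:" + phone.replaceAll("\\s+", "");
    }

    static String mailtoUri(String to, String subject) {
        return "mailto:" + to + "?subject=" + percentEncode(subject);
    }

    /** Percent-encodes everything except RFC 3986 unreserved characters. */
    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    /** ShellExecuteW reports success with a value greater than 32. */
    static boolean launchSucceeded(long shellExecuteResult) {
        return shellExecuteResult > 32;
    }
}
```

Returning the boolean unchanged to the caller is one way to keep the "absent handler reports failure" behaviour visible at the Java level.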
Native file picker (gap 4): GetOpenFileNameW (comdlg32) run modally on the
window-owning pump thread via a blocking WM_CN1_FILEDIALOG SendMessage, filtered
by media type. openGallery / openImageGallery now use the real OS picker and
return a file:// path FileSystemStorage opens, instead of the in-app FileTree
fallback. comdlg32 + shell32 added to the Windows link set.
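For readers unfamiliar with the comdlg32 filter format: GetOpenFileNameW takes its filter as label/pattern string pairs separated by NUL characters and terminated by a double NUL. The sketch below builds such a string; the class name and the example extension list are assumptions, and the mapping the port actually uses per media type is not shown in this message.

```java
/** Hypothetical builder for an OPENFILENAMEW lpstrFilter string. */
final class FileDialogFilterSketch {
    /**
     * Produces "label NUL pattern;pattern NUL NUL", the layout comdlg32 expects
     * for a single filter entry.
     */
    static String buildFilter(String label, String... patterns) {
        return label + '\0' + String.join(";", patterns) + "\0\0";
    }

    /** Example mapping for an image gallery; the extension list is illustrative only. */
    static String imageFilter() {
        return buildFilter("Images", "*.png", "*.jpg", "*.jpeg", "*.gif", "*.bmp");
    }
}
```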
status.md updated: these gaps move to "done"; the remaining hardware/OS-account
capabilities (camera, sensors, location, contacts, push, biometric, audio
recording, SIMD) stay honestly unsupported.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: include <shellapi.h> for ShellExecuteW
WIN32_LEAN_AND_MEAN keeps shellapi.h out of windows.h and shlobj.h does not
pull it in under clang-cl, so ShellExecuteW was an implicit declaration and the
clean-target build failed (call to undeclared function / int->HINSTANCE). Add
the explicit include.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* pointerWheelMoved: add @Override on the callSerially Runnables (PMD)
PMD's MissingOverride rule is enforced on core; the four anonymous Runnable
run() methods added for the shared wheel-scroll gesture need the annotation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: WindowsSimd (SSE2/NEON) + benchmark tally
Implements native SIMD for the Windows port (status.md gap 2), the x86/ARM analog
of IOSSimd. WindowsSimd overrides the hot-path vector ops with SSE2 (x64) / NEON
(arm64) intrinsics in cn1_windows_simd.c; Simd's @concrete gains
win=com.codename1.impl.windows.WindowsSimd and WindowsImplementation.createSimd()
returns it, so Simd.get().isSupported() is true on Windows. Each kernel vectorizes
the bulk with a scalar tail (unaligned load/store, so no aligned allocator); ops
SSE2 lacks (int32 mul/min/max/dot) stay scalar on x64 but vectorize on arm64, and
any op not overridden inherits the correct portable Simd scalar loop.
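The "vectorized bulk plus scalar tail" shape can be sketched in plain Java as below. This is only a structural illustration: the four-wide inner block stands in for one 128-bit register of int32 lanes, no real intrinsics are involved, and all names are hypothetical.

```java
import java.util.Arrays;

/** Structural sketch of a bulk + scalar-tail kernel; not real SIMD. */
final class BulkTailSketch {
    static final int LANES = 4; // four int32 lanes fit in one 128-bit register

    /** Plain scalar reference loop. */
    static void addScalar(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    /** Processes whole blocks of LANES elements, then finishes the remainder one by one. */
    static void addBlocked(int[] a, int[] b, int[] out) {
        int n = out.length;
        int bulkEnd = n - (n % LANES);
        int i = 0;
        for (; i < bulkEnd; i += LANES) {
            // A native kernel would do one unaligned load, one add, one unaligned store here.
            out[i] = a[i] + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        for (; i < n; i++) { // scalar tail: at most LANES - 1 elements
            out[i] = a[i] + b[i];
        }
    }

    /** Checks the blocked kernel against the scalar reference for a given length. */
    static boolean matchesScalar(int n) {
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i * 3;
            b[i] = 7 - i;
        }
        int[] expected = new int[n];
        int[] actual = new int[n];
        addScalar(a, b, expected);
        addBlocked(a, b, actual);
        return Arrays.equals(expected, actual);
    }
}
```

Checking lengths that are not a multiple of the lane count is what exercises the tail path.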
Covered: int add/sub/mul/min/max/and/or/xor/sum/dot, float add/sub/mul/min/max/
sum/dot, byte add/sub(saturating)/and/or/xor, plus fused
replaceTopByteFromUnsignedBytes / blendByMaskTestNonzero.
SimdApiTest (already in the Windows suite) gates correctness; new SimdBenchmarkTest
times native vs an inline Java scalar loop over a 64K workload, verifies the native
result matches, and logs CN1SS:SIMD:BENCH ... speedup=Nx so CI shows the benefit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: DPAPI-backed SecureStorage (keystore)
Implements getSecureStorage() (status.md gap 4) so the networking layer can read
API keys / tokens at rest. WindowsSecureStorage encrypts each value with the
Windows Data Protection API (CryptProtectData, bound to the current user's logon)
via native dpapiProtect/dpapiUnprotect and persists the ciphertext through CN1
Storage -- the desktop analog of the iOS keychain / Android
EncryptedSharedPreferences non-prompting store. The biometric-prompting overloads
map to the same store (DPAPI is itself the user-account auth boundary). crypt32
added to the link set. SecureStorageTest round-trips set/get/remove in the suite
(self-skips where unsupported, e.g. JS).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: local notifications via Shell_NotifyIcon
Implements scheduleLocalNotification/cancelLocalNotification (status.md gap 4),
mirroring the JavaSE desktop semantic: while the app runs a Timer fires the
notification at its scheduled time (with REPEAT_* support) and Shell_NotifyIcon
shows a tray balloon; clicking it routes the id (WM_CN1_TRAY -> drainInput poll)
to the app's LocalNotificationCallback. Native tray/balloon lives in
cn1_windows_notify.c, marshaled to the window-owning pump thread via WM_CN1_NOTIFY.
Background scheduling fires only while the process runs (no OS scheduler survives
app exit on desktop) -- a documented limitation.
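One way to think about the in-process repeat handling is as a "next fire time" computation, sketched below. The class, the method, and the use of -1 for "nothing left to fire" are assumptions for illustration; the port's Timer wiring and its REPEAT_* mapping are not reproduced here.

```java
/** Hypothetical scheduling helper for an in-process notification timer. */
final class RepeatScheduleSketch {
    /**
     * Returns the first fire time strictly after {@code now}, or -1 when a
     * non-repeating notification has already fired.
     *
     * @param firstFire    scheduled time of the first delivery (epoch millis)
     * @param periodMillis repeat interval, or 0 for no repeat
     * @param now          current time (epoch millis)
     */
    static long nextFireTime(long firstFire, long periodMillis, long now) {
        if (now < firstFire) {
            return firstFire;
        }
        if (periodMillis <= 0) {
            return -1; // one-shot notification that is already due or delivered
        }
        long elapsedPeriods = (now - firstFire) / periodMillis;
        return firstFire + (elapsedPeriods + 1) * periodMillis;
    }
}
```

Because the computation depends only on wall-clock values, nothing is delivered while the process is not running, which matches the limitation stated above.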
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: audio recording via waveIn -> WAV
Implements createMediaRecorder / captureAudio (status.md gap 4): records from the
default mic via the classic waveIn (winmm) API to a 16-bit PCM WAV, a worker
thread draining capture buffers to disk and patching the RIFF/data sizes on stop
(cn1_windows_audiorec.c + WindowsAudioRecorder). getAvailableRecordingMimeTypes
reports audio/wav (also decodable by the port's MF playback). waveIn was chosen over an MF
encode pipeline because it is dependency-free and needs no codec negotiation. winmm added to the link
set. Verified on a real Windows ARM64 VM: compiles clean and waveIn captures
88200 bytes/s from the mic.
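The "patch the RIFF/data sizes on stop" step follows from the canonical 44-byte PCM WAV header, sketched below in Java. The class and method names are hypothetical; the header layout itself is the standard one. Note that 88200 bytes/s is what 16-bit mono at 44100 Hz produces, although the message does not state the exact capture format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Sketch of writing and later patching a canonical 44-byte PCM WAV header. */
final class WavHeaderSketch {
    static final int HEADER_SIZE = 44;

    static byte[] header(int sampleRate, int channels, int bitsPerSample, int dataBytes) {
        int blockAlign = channels * bitsPerSample / 8;
        int byteRate = sampleRate * blockAlign;
        ByteBuffer b = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataBytes);              // RIFF chunk size: file size minus 8
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                          // fmt chunk size for PCM
        b.putShort((short) 1);                 // format tag 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataBytes);                   // data chunk size
        return b.array();
    }

    /** Rewrites the two size fields once the final amount of captured data is known. */
    static void patchSizes(byte[] header, int dataBytes) {
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(4, 36 + dataBytes);
        b.putInt(40, dataBytes);
    }

    /** Reads a little-endian int from the header, for inspection. */
    static int intAt(byte[] header, int offset) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }
}
```

A recorder can write the header with a data size of 0 at start, stream capture buffers after it, and call a patch step like the one above when recording stops.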
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: biometric (Windows Hello) via WinRT + CN1_HAVE_WINRT guard
Implements getBiometrics() (status.md gap 4) backed by the WinRT
UserConsentVerifier (face/fingerprint/PIN): isSupported()/canAuthenticate() map
to CheckAvailabilityAsync, authenticate() runs the Hello prompt off the EDT and
completes the AsyncResource. WinRT is consumed via the WRL ABI projection
(cn1_windows_winrt.cpp) -- the same COM mechanism the Media Foundation layer uses,
no cppwinrt needed.
Adds the CN1_HAVE_WINRT build gate: the generated CMake probes the toolchain
(check_cxx_source_compiles of a minimal WinRT TU + runtimeobject) and defines
CN1_HAVE_WINRT / links runtimeobject only when WinRT is available, so a
cross-compile sysroot without WinRT compiles the natives as honest 'unsupported'
stubs and stays green -- mirroring the WebView2 gate. This same file/gate will
carry the upcoming WinRT location/contacts/share services.
Verified on a real Windows ARM64 VM: both the real and stub native paths compile;
a standalone WRL test activates UserConsentVerifier and awaits the async op,
correctly reporting DeviceNotPresent on the VM (no Hello hardware).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: location via WinRT Geolocator
Implements getLocationManager() (status.md gap 4) -> WindowsLocationManager backed
by the WinRT Geolocator (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate).
getCurrentLocation/getLastKnownLocation resolve one fix (lat/lon/accuracy/altitude/
heading/speed); a continuous LocationListener is served by a polling thread. When
Windows location is disabled / no provider answers it reports OUT_OF_SERVICE /
throws (no fabricated fix), and getLocationManager returns null on a WinRT-less
build. Verified on the Windows ARM64 VM: Geolocator activates and GetGeoposition
returns E_ACCESSDENIED (location off there), surfaced honestly as unavailable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: contacts via WinRT ContactStore
Implements getAllContacts/getContactById (status.md gap 4) via the WinRT
ContactStore (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate). One native call
returns every contact as a delimited blob (id/name/phone/email, read via the base
IContact + versioned IContact2/IContactManagerStatics2 interfaces) which the impl
parses and briefly caches so the base's id-then-fetch loop shares a single store
read. Returns nothing when the store is inaccessible (no WinRT / access denied),
never fabricated. Compiles on the Windows ARM64 VM via the proven WRL await
pattern.
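A minimal sketch of parsing a delimited contacts blob on the Java side. The delimiters (newline between records, tab between fields), the field order, and all names are assumptions chosen for the example; the message only says the blob is delimited and carries id/name/phone/email.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a delimited contacts blob. */
final class ContactBlobSketch {
    static final class Contact {
        final String id;
        final String name;
        final String phone;
        final String email;

        Contact(String id, String name, String phone, String email) {
            this.id = id;
            this.name = name;
            this.phone = phone;
            this.email = email;
        }
    }

    /** Returns an empty list for a null or empty blob, so an inaccessible store yields nothing. */
    static List<Contact> parse(String blob) {
        List<Contact> out = new ArrayList<>();
        if (blob == null || blob.isEmpty()) {
            return out;
        }
        for (String record : blob.split("\n")) {
            if (record.isEmpty()) {
                continue;
            }
            String[] f = record.split("\t", -1); // -1 keeps trailing empty fields
            if (f.length < 4) {
                continue; // skip malformed records rather than guessing
            }
            out.add(new Contact(f[0], f[1], f[2], f[3]));
        }
        return out;
    }
}
```

Keeping the parsed list around for a short time would let an id-then-fetch loop reuse one native read, as described above.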
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: system share via WinRT DataTransferManager
Implements isNativeShareSupported()/share() (status.md gap 4) via the WinRT
DataTransferManager: the EDT-facing shareText marshals to the window thread
(WM_CN1_SHARE), where IDataTransferManagerInterop GetForWindow +
ShowShareUIForWindow open the system share flyout for the unpackaged Win32 window
and a DataRequested handler supplies text/title (cn1_windows_winrt.cpp, same
CN1_HAVE_WINRT gate). Shares text today (image-file sharing via SetStorageItems
is a follow-up). Also documents that print has no CN1 core API to hook. Compiles
(real + stub) on the Windows ARM64 VM; the flyout is interactive.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: camera still capture (capturePhoto) via Media Foundation
Implements the legacy Capture API capturePhoto (status.md gap 3): grabs one real
frame from the default webcam via Media Foundation (MFEnumDeviceSources(VIDCAP) ->
IMFSourceReader -> RGB32, discarding the first few frames so exposure settles),
returns it as a CN1 ARGB int[], and Java encodes the PNG + writes the file
(cn1_windows_camera.cpp). A desktop has no built-in capture UI, so this is the
honest snapshot -- a real frame, never synthetic. createCameraImpl() (live preview
peer / video) stays null pending generic native-peer placement (gap 5a).
Verified on the Windows ARM64 VM: a 640x480 frame with genuine image data (~150K
non-zero pixels) is captured from the passed-through camera.
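To illustrate the RGB32-to-ARGB step: Media Foundation's RGB32 stores each pixel as B, G, R, X bytes in memory, so packing into an opaque ARGB int is a matter of reordering. The sketch below does this in Java purely for illustration (the port does the conversion natively); the names are hypothetical, and stride padding and bottom-up row order are deliberately ignored.

```java
/** Sketch: pack tightly laid out RGB32 (B,G,R,X) bytes into opaque ARGB ints. */
final class Rgb32ToArgbSketch {
    static int[] toArgb(byte[] bgrx, int width, int height) {
        if (bgrx.length != width * height * 4) {
            throw new IllegalArgumentException("expected " + (width * height * 4)
                    + " bytes, got " + bgrx.length);
        }
        int[] argb = new int[width * height];
        for (int p = 0, i = 0; p < argb.length; p++, i += 4) {
            int b = bgrx[i] & 0xFF;
            int g = bgrx[i + 1] & 0xFF;
            int r = bgrx[i + 2] & 0xFF;
            // The fourth byte is padding in RGB32, so alpha is forced to opaque.
            argb[p] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
        return argb;
    }
}
```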
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: drop C++ STL from WinRT natives (cross-compile fix)
The Linux cross-compile failed compiling cn1_windows_winrt.cpp: the <string> I
used for the contacts/share string building pulls the MSVC STL (yvals_core.h),
which hard-asserts a very recent Clang (STL1000) that the xwin cross-toolchain
predates. The rest of the port is deliberately STL-free for exactly this reason
(cn1_windows_browser.cpp's std:: usage is behind the WebView2 gate, off on the
cross-compile). Replace std::string/std::wstring with a small C growable buffer
(CN1Buf) and owned WCHAR* so no C++ STL header is pulled (verified via
/showIncludes: zero STL headers, only the C <string.h>). clean-target (real
Windows, newer clang) already passed; this unblocks the cross-compile gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: drop the CN1_HAVE_WINRT guard, always compile real WinRT
The guard was defensive insurance against a build toolchain lacking WinRT, but the
cross-compile CI proved otherwise: it had already DEFINED CN1_HAVE_WINRT (the
earlier STL error fired inside the #ifdef on the Linux/xwin leg), i.e. the
xwin-laid-out SDK ships the WinRT ABI headers + runtimeobject. So stub mode never
actually triggered anywhere the port builds.
Remove the CMake probe + the per-function #ifdef/#else stub branches and link
runtimeobject unconditionally; the compiled output is identical to the prior green
build, just simpler. Also drop the now-pointless locationSupported() native
(getLocationManager always returns the manager; isNativeShareSupported checks for
a host window). Each service still degrades honestly at RUNTIME (no device /
disabled / denied -> false/null). status.md updated to drop the gate/stub language.
Trade-off (accepted): a future build SDK without WinRT would now fail the build
instead of degrading to stubs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: cross-build the suite exe on Linux, run it on Windows (real shipping pipeline)
Adds a CI workflow that mirrors how a Codename One app is actually shipped for
Windows -- compiled on a (Linux) build host, executed on the user's Windows
machine. Neither existing Windows workflow did this end to end:
windows-cross-compile.yml builds on Linux but only links (never runs), and
parparvm-tests-windows.yml builds natively on Windows and runs. Here the binary
that renders on Windows is the exact one cross-compiled on Linux, browser
included.
New workflow windows-cross-build-run.yml, artifact-chained within a run:
- cross-build (ubuntu): pins LLVM 19 (the WebView2 peer's MSVC STL needs a recent
clang), lays out the Windows SDK via xwin, fetches the WebView2 NuGet SDK on
Linux, builds core + Windows port + the hellocodenameone suite, then translates
and clang-cl/xwin cross-compiles the full suite .exe (WebView2 linked) and
uploads it.
- run-on-windows (windows-latest x64): downloads that Linux-built exe and runs the
full screenshot suite over the cn1ss WebSocket, capturing ~112 PNGs.
- compare-comment (ubuntu): diffs the screenshots against the in-repo baseline
and posts them to the PR under a distinct marker.
Harness refactor (CleanTargetIntegrationTest):
- Split buildHelloCodenameOneExe into a host-agnostic translateHelloSuiteDist()
(pure Java translation) + the native clang-cl build, and extract crossBuildDist()
out of crossCompilesWindowsExeWithXwin so a dist can be cross-compiled on Linux.
- Add crossBuildsHelloSuiteExe(): translates the full suite and xwin-cross-builds
it to CN1_CROSS_EXE_OUT (WEBVIEW2_SDK_DIR flows into the generated CMake).
- capturesHelloSuiteOverWebSocket honors CN1_PREBUILT_EXE to run a provided exe
instead of building one, so the Windows runner only needs a JDK.
scripts/windows/fetch-webview2-sdk.sh: portable bash counterpart of the PowerShell
fetch (curl the NuGet nupkg + unzip), laying out build/native for the Linux
cross-build's WEBVIEW2_SDK_DIR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows cross-build/run: build hellocodenameone-common under Xvfb
The CN1 CSS compiler (codenameone-maven-plugin css goal) renders via CEF/AWT and
threw java.awt.HeadlessException on the display-less Linux cross-build runner.
Install xvfb and run the hellocodenameone-common Maven build under a virtual X
server so the css/transcode rendering has a display. Core + port + LLVM 19 + xwin
+ WebView2 SDK fetch all succeeded before this point; this unblocks the actual
suite cross-compile.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: surface SIMD benchmark in PR comment + refresh stale StatusBarTap golden
Two CI/reporting fixes flagged on the PR:
- SIMD benchmark now appears in the PR comment. SimdBenchmarkTest emits ready-to-render
"CN1SS:SIMD:STAT <key> : <value>" lines (backend, int-add/float-mul speedup, correctness);
the capture harness collects them into windows-simd-stats.txt next to the PNGs; cn1ss.sh
passes it via --extra-stats and RenderScreenshotReport renders it as a Benchmark Results
table (the same mechanism iOS uses for base64-performance-stats.txt). Both the native and
cross-compiled comment jobs lift the file to the artifacts dir cn1ss scans.
- The StatusBarTapDiagnosticScreenshotTest tile that showed as "updated" was a stale golden,
not flakiness: the native-Windows and Linux-cross renders are byte-identical to each other
(deterministic) and the glass-pane content (counter 0->3, scroll, native:no) is correct;
only the earlier golden's tile scroll positions differed. Refreshed the golden to the
current deterministic render.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows port: generic native-peer placement + device camera API (com.codename1.camera)
Native peer support (cn1_windows_peer.cpp + WindowsGenericPeer): wraps an
@NativeInterface-returned child HWND (boxed as long[]) in a PeerComponent that
reparents it onto the host window and tracks the lightweight component's bounds
(peerInitialized/peerSetBounds/peerSetVisible/peerDeinitialized), the analog of iOS
NativeIPhoneView. In the offscreen screenshot pipeline -- where a live HWND is not
composited -- it falls back to a PrintWindow peer image, mirroring the WebView2 peer.
WindowsImplementation.createNativePeer now routes long[] handles here, so the generic
peer-placement path (gap #5a) exists for native-interface widgets.
Camera (WindowsCameraImpl + cn1_windows_camera.cpp): implements the device camera
API (com.codename1.camera.CameraImpl), not just the legacy capturePhoto. A Media
Foundation source-reader session runs on a worker thread keeping the latest frame;
the preview is an image-based PeerComponent (browser-style, so it renders headlessly
and live), takePhoto encodes the freshest frame, enumerateCameras lists devices, and
a frame listener is polled at its fps. Video recording / flash / optical zoom /
focus-point are honestly reported unsupported (a generic webcam exposes none via the
source reader), per the port's "real or unsupported" rule. createCameraImpl now
returns this instead of null.
Also: register cn1_windows_peer.cpp in the clean-target native compile list, and trim
status.md to a TODO-only list (camera + native peers moved out of the gaps) so the
file can be deleted at merge. Windows port compiles clean; native build verified by
the cross-compile leg.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows camera: drop Thread.setDaemon (not in the ParparVM runtime)
The clean (ParparVM) target has no java.lang.Thread.setDaemon, so the translated
WindowsCameraImpl.c failed to compile (undeclared virtual_java_lang_Thread_setDaemon).
The takePhoto worker exits as soon as the frame is captured, so a non-daemon thread
is fine.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows camera: enable advanced video processing (YUY2/NV12 -> RGB32)
VM testing on a real webcam revealed the source reader was delivering 2-byte-per-pixel frames
(YUY2, the common webcam format: 640x480 -> 614400 bytes), not RGB32 (1228800),
because SetCurrentMediaType(RGB32) is silently ignored unless the reader's video
processor is enabled. The width*height*4 size check then rejected every frame, so
the preview/session captured nothing.
Create both source readers (the continuous session and the legacy capturePhoto)
with MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING=TRUE so the reader inserts a
converter to RGB32. Verified on the Windows ARM64 VM: the worker-thread session now
delivers full 640x480 RGB32 frames (polled via cameraSessionLatestFrame), and the
generic-peer PrintWindow capture returns real pixels. CI runners have no camera, so
this path is only exercisable on a real machine.
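The size arithmetic behind the rejected frames can be checked with a few lines of Java. The helper below is a hypothetical diagnostic, not part of the port; it classifies a buffer by length using the standard per-pixel sizes of RGB32 (4 bytes), YUY2 (2 bytes), and NV12 (1.5 bytes).

```java
/** Hypothetical diagnostic: guess a frame's pixel format from its byte length. */
final class FrameSizeSketch {
    static String classify(int width, int height, int byteLength) {
        int pixels = width * height;
        if (byteLength == pixels * 4) {
            return "RGB32";
        }
        if (byteLength == pixels * 2) {
            return "YUY2";
        }
        if (byteLength == pixels * 3 / 2) {
            return "NV12";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify(640, 480, 1228800)); // prints RGB32
        System.out.println(classify(640, 480, 614400));  // prints YUY2
    }
}
```

A strict width*height*4 check accepts only the first case, which is why every YUY2 frame was dropped before the video processor was enabled.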
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows report: run the real shared benchmarks + per-arch (x64/arm64) comments
Two fixes to the Windows screenshot/benchmark report:
1. Real benchmarks instead of a toy SIMD micro-bench. Base64NativePerformanceTest
(the same shared test that produces the iOS/Metal base64 + image numbers) used to
early-return on the native Windows port because there is no app @NativeInterface
Base64 bridge. It now skips only the native-vs-CN1 base64 comparison and still runs
the CN1 + SIMD base64 and the image SIMD benchmarks (createMask / applyMask /
modifyAlpha / PNG encode, SIMD on vs off) -- gated on Simd.isSupported(), which
WindowsSimd provides. iOS/Android behaviour is unchanged (they have the native
bridge: hasNative=true, isWindows()=false make every new guard a no-op there). The
capture harness now collects the shared CN1SS:STAT: markers (not a bespoke one)
into windows-benchmark-stats.txt, and SimdBenchmarkTest emits via the same marker
so its raw-kernel numbers join the table. JPEG + native-base64 degrade honestly to
"unsupported"/"unavailable" on a webcam-less, bridge-less desktop.
2. Per-architecture comments. x64 (Intel) and arm64 each post a SEPARATE PR comment
with a distinct marker, each depending only on its own screenshot leg -- so one
architecture's pipeline failing or re-running no longer overrides or hides the
other's result (previously a single combined comment showed only x64 and was
skipped entirely if the x64 leg failed). Each comment carries that arch's own
benchmark table (SSE2 on x64, NEON on arm64).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows SIMD: implement the byte codec/image kernels (fixes SIMD slower-than-scalar)
The benchmark honestly showed base64 SIMD ~6.5x SLOWER than scalar and a few image
ops slower too. Root cause: the ops the Base64 SIMD codec and Image.createMask use
(shl/shrLogical/lookupBytes/pack+unpack interleaved/unpackLookup/packIntToByteTruncate)
were not implemented in WindowsSimd, so they fell through to the generic Simd scalar
DEFAULTS -- lane-scratch loops with per-op dispatch -- which are slower than the
straight-line scalar codec. Only the fused ops (replaceTopByteFromUnsignedBytes,
used by modifyAlpha -66%) were native and won. iOS implements all of these in NEON,
which is why iOS base64 SIMD is faster.
Implement them natively in cn1_windows_simd.c: NEON-vectorized on arm64 (mirroring
IOSSimd.m: vld3q/vst3q/vld4q/vst4q interleave, vshlq_u8 byte shifts), SSE2 on x64
(byte shift via 16-bit shift + per-byte mask, since SSE2 has no byte shift), and
scalar table lookups on both (exactly as IOSSimd does -- the lookup is scalar there
too). A native C kernel (one call, tight loop) already beats the scalar-default
fallback, so SIMD stops losing. unpackLookupBytesInterleaved4 matches the base Simd
contract precisely (out-of-range -> 0, returns the OR of all outputs).
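The "16-bit shift plus per-byte mask" trick can be demonstrated on a single 16-bit lane in plain Java. This is an emulation for illustration, with hypothetical names; it models two packed bytes in one lane rather than a full SSE2 register.

```java
/** Emulates a per-byte left shift built from a 16-bit shift and a byte mask. */
final class ByteShiftSketch {
    /** Shifts both bytes packed in lane16 left by n (0..7) using one 16-bit shift. */
    static int shlBytesIn16(int lane16, int n) {
        int shifted = (lane16 << n) & 0xFFFF; // the 16-bit lane shift
        int byteMask = (0xFF << n) & 0xFF;    // zeroes the low n bits of each byte
        int mask16 = (byteMask << 8) | byteMask;
        // Bits that leaked from the low byte into the high byte land in the
        // high byte's low n bits, which the mask clears.
        return shifted & mask16;
    }

    /** Reference: shift each byte independently. */
    static int shlBytesReference(int lane16, int n) {
        int lo = ((lane16 & 0xFF) << n) & 0xFF;
        int hi = (((lane16 >>> 8) & 0xFF) << n) & 0xFF;
        return (hi << 8) | lo;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(shlBytesIn16(0x81FF, 1))); // prints 2fe
    }
}
```

A logical right shift works the same way with the mask built from 0xFF >>> n instead.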
Verified: NEON kernels pass a standalone correctness harness on the arm64 VM (shl/
shrLogical x8 shifts, unpack3/pack3/pack4 vs scalar references); the C compiles clean
arm64. x64/SSE2 correctness is gated by the base64/image SIMD validation in the
benchmark (byteArraysEqual) on CI. Also deletes Ports/WindowsPort/status.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Windows SIMD: native blendByMaskTestNonzeroSubstituteOnKeepEq (fixes modifyAlpha removeColor)
removeColor was the one image op still on the scalar fallback (8% slower on x64,
27% on arm64) -- WindowsSimd didn't override blendByMaskTestNonzeroSubstituteOnKeepEq,
so Image.modifyAlpha(alpha, removeColor) hit the generic Simd scalar default. Add it
as a fused vectorized blend (SSE2 on x64, NEON on arm64), mirroring its sibling
blendByMaskTestNonzero (which modifyAlpha uses and wins 70%). Verified on the arm64
VM: the NEON kernel matches the scalar reference exactly over mixed transparent /
removeColor-matching / opaque pixels. With this, every image SIMD op (createMask,
applyMask, modifyAlpha, modifyAlpha+removeColor, PNG encode) beats scalar.
(base64 SIMD remains ~2x slower on x64 -- it is bound by ParparVM per-native-call
overhead across ~6000 calls/encode plus the /O2 auto-vectorized scalar competitor;
on arm64 base64 ENCODE is already 25% faster. base64 SIMD is explicit opt-in;
standard Base64.encode uses the fast scalar path.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Test builds: optimize the translated C at -O2 (honest SIMD-vs-scalar benchmark)
The iOS and Mac UI-test builds ran -configuration Debug, which inherits Xcode's
GCC_OPTIMIZATION_LEVEL=0 (-O0) -- so the benchmark's scalar baseline was unvectorized
and the SIMD speedups were inflated (SIMD beating unoptimized code, not real shipping
code). Windows already builds the benchmark Release (/O2, auto-vectorized scalar),
which is why its SIMD margins are smaller and base64 micro-SIMD even loses there.
Override GCC_OPTIMIZATION_LEVEL to 2 (env CN1_TEST_OPT_LEVEL, 0/1/2/3/s) on both the
iOS and Mac test builds so the compiler auto-vectorizes the scalar baseline -- making
the SIMD-vs-scalar comparison apples-to-apples across all three ports. This is the
measurement foundation for pruning the SIMD that doesn't beat optimized scalar.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Base64 SIMD: gate the byte codec to platforms where it beats autovectorized scalar
The O2 measurement settled it: with -O2 the explicit NEON Base64 codec is still
75-83% faster than autovectorized scalar on Apple Silicon (iOS/Mac), but on x86-64
it is ~2x slower -- /O2 already autovectorizes the scalar codec and SSE2 has no
3-way interleave, so the per-op SIMD just adds overhead. The fused image kernels
win on every platform (they can't be autovectorized) and are unaffected.
Add Simd.isByteShuffleAccelerated() -- true only where the chained byte
shuffle/interleave pipeline actually beats scalar: IOSSimd returns true (NEON);
WindowsSimd returns it as a per-arch native constant (arm64 true, x86-64 false);
the base scalar Simd returns false. Base64.encodeNoNewlineSimd / decodeNoWhitespaceSimd
consult it and, when false, skip the SIMD loop so the autovectorized scalar tail
encodes everything (still fully correct -- SimdTest's scalar-vs-SIMD equality test
passes). So x86-64 base64 SIMD now matches scalar instead of losing 2x, while ARM
keeps its 75-83% win. The benchmark emits the gate state ("active (NEON)" vs "gated
to scalar"). These methods are explicit opt-in; no production code auto-uses them.
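The gating logic can be sketched as deciding how much of the input the SIMD loop is allowed to consume. Only the name isByteShuffleAccelerated() comes from this change; the interface, the helper, and the 48-byte block size are assumptions for illustration.

```java
/** Sketch of gating a SIMD loop behind a capability query. */
final class ByteShuffleGateSketch {
    /** Stand-in for the Simd capability query described above. */
    interface SimdBackend {
        boolean isByteShuffleAccelerated();
    }

    static final int BLOCK = 48; // hypothetical number of input bytes per SIMD iteration

    /**
     * Returns how many leading bytes the SIMD loop should handle. When the gate is
     * off the answer is 0, so the scalar tail encodes the whole input.
     */
    static int simdPrefixLength(SimdBackend simd, int inputLength) {
        if (!simd.isByteShuffleAccelerated()) {
            return 0;
        }
        return inputLength - (inputLength % BLOCK);
    }
}
```

Because the scalar tail already has to handle any leftover bytes, turning the gate off changes performance but not output.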
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix SpotBugs + gate base64 SIMD off on all Windows + drop noisy PNG encode bench
- SpotBugs (the build-test failure): drop the redundant `simd != null` checks in
Base64.encode/decodeNoWhitespaceSimd -- Simd.get() is provably non-null there
(RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE).
- base64 decode was 27% slower on Windows arm64: my unpackLookupBytesInterleaved4 /
lookupBytes are scalar (iOS uses NEON vqtbl), so the codec's decode loses there
even though encode wins -- not a clear net win. So WindowsSimd.isByteShuffleAccelerated()
is now false on both Windows arches (was true on arm64); only iOS/Mac (full NEON,
75-83% faster) enable the base64 SIMD path. Fused image kernels are unaffected.
- PNG/JPEG "encode SIMD on/off" ratio was misleading (+20% x64, -12.8% arm64): that
benchmark is dominated by the platform image encoder (native WIC on Windows), which
SIMD does not touch, so the ratio is encoder noise, not a SIMD measurement. Removed
it; the four pure image ops (createMask/applyMask/modifyAlpha/removeColor) measure
the SIMD-affected work directly and all win.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Base64: one declaration per line (PMD OneDeclarationPerLine)
Split the gate's combined byte[] declaration into one per line to satisfy PMD
(the SpotBugs fix in the prior commit cleared; this is the remaining static-analysis
nit on the same code).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent e9491e3 commit e3cb4e5
39 files changed
Lines changed: 5352 additions & 389 deletions
File tree
- .github/workflows
- CodenameOne/src/com/codename1
- impl
- util
- Ports
- JavaSE/src/com/codename1/impl/javase
- WindowsPort
- nativeSources
- src/com/codename1/impl/windows
- iOSPort/src/com/codename1/impl/ios
- scripts
- common/java
- hellocodenameone/common/src/main/java/com/codenameone/examples/hellocodenameone/tests
- lib
- windows
- screenshots
- vm
- ByteCodeTranslator/src/com/codename1/tools/translator
- tests/src/test/java/com/codename1/tools/translator