QVAC-18741 feat[api]: add standalone image upscaling support to the SDK#1990
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…Error for upscale mode mismatch
simon-iribarren
left a comment
There was a problem hiding this comment.
Both medium findings addressed cleanly in f83a5e5f:
- TSDoc on
upscale()— full block with@param/@returns/@throws/@example, and the critical disambiguating sentence pinningoutputs.length === 1regardless ofrepeats.@examplenow usesrepeats: 2so the multi-pass case is concrete. repeatssemantics — pinned consistently in the client TSDoc, the schema.describe()(now: "only the final image is emitted (outputs.length === 1)"), and the server-side comment inops/upscale.ts. No remaining ambiguity at any layer.
CI on this commit: check (sdk), validate-pr, changes, authorize, resolve-config all green; build running, desktop/iOS/Android device-farm smoke jobs queued behind it. The merge gate keeps tier1 approval + green smoke as separate requirements, so the desktop diffusion-standalone-upscaler-x4 is still the runtime signal we'll want to see before the merge button goes green.
The low / nit findings from my earlier comment (schema-level .refine() for model_src-required-in-diffusion-mode, missing examples/diffusion-standalone-upscale.ts, z.infer → z.input callout in the PR body, PNG-magic check in validateStandaloneUpscale, retrofit diffusion.ts's client with the same StreamEndedError discipline you added here) all remain non-blocking — happy to file follow-up tickets if useful.
Nice refactor overall — the mode discriminator is a much cleaner shape than the first iteration.
QVAC E2E —
|
QVAC E2E —
|
|
/review |
🎯 What problem does this PR solve?
ESRGAN upscalers were only reachable as a post-step inside a diffusion (txt2img / img2img) pipeline; consumers had no way to feed an arbitrary PNG/JPEG into the SDK and get an upscaled image back.
📝 How does it solve it?
The existing
sdcpp-generationplugin gains a standalone-upscale path viamodelConfig.mode = "upscale". Consumers keep usingmodelType: "diffusion", and the SDK reuses one logger namespace, one addon package, and one canonical model type for both image generation and standalone upscaling.mode: "diffusion" | "upscale"tosdcppConfigSchema. Whenmode === "upscale":resolveConfigshort-circuits — no auxiliary encoders/VAEs are downloaded; the primarymodelSrcIS the ESRGAN file.createModelinstantiatesEsrganUpscalerfrom@qvac/diffusion-cppinstead ofImgStableDiffusion.tile_size,direct,offload_params_to_cpu,threads) are honored;upscaler.model_srcis ignored.upscale({ modelId, image, repeats? })client API inclient/api/upscale.ts, returning{ outputs, stats }promises (mirrors the diffusion client shape).upscaleStreamRPC handler that emits base64-encoded PNG chunks and resolves stats ondone. Wired intohandler-registry.ts,handlers/index.ts, and therequestSchema/responseSchemadiscriminated unions inschemas/common.ts.ModelLoadFailedErrorif the caller setsmodelConfig.upscalerbut forgetsmodel_src, instead of letting the native addon error mid-load.upscale()against a model that wasn't loaded withmode: "upscale"raisesModelOperationNotSupportedErrorupfront (no nativeTypeErrorpropagation).@qvac/diffusion-cpp^0.6.0→^0.7.0(forEsrganUpscaler).🧪 How was it tested?
diffusion-standalone-upscaler-x4e2e test intests-qvac/tests/diffusion-tests.ts: upscales the 64×64small-64.jpgfixture and asserts the output PNG IHDR reads 256×256 (validating both therepeats: 1path and the model's native 4× scale factor).test/unit/sdcpp-plugin.test.ts: new branches assert themode: "upscale"discriminator, the optional-model_srcshape, and that the diffusion-mode upscale forwarding path is unchanged. 67/67 tests pass locally (bun run test:unit).🔌 API Changes
New public surface (purely additive — no signatures changed):
upscale()client functionUpscaleClientParams,UpscaleStreamResponse,UpscaleStatstypes (re-exported from@qvac/sdk)mode: "diffusion" | "upscale"field onsdcppConfigSchema(optional, defaults to"diffusion")