
QVAC-18741 feat[api]: add standalone image upscaling support to the SDK #1990

Merged
gianni-cor merged 10 commits into tetherto:main from maxim-smotrov:feature/sdk-diffusion-upscaler-standalone on May 12, 2026

@maxim-smotrov (Contributor) commented May 11, 2026

🎯 What problem does this PR solve?

ESRGAN upscalers were only reachable as a post-step inside a diffusion (txt2img / img2img) pipeline; consumers had no way to feed an arbitrary PNG/JPEG into the SDK and get an upscaled image back.

📝 How does it solve it?

The existing sdcpp-generation plugin gains a standalone-upscale path via modelConfig.mode = "upscale". Consumers keep using modelType: "diffusion", and the SDK reuses one logger namespace, one addon package, and one canonical model type for both image generation and standalone upscaling.

  • Adds mode: "diffusion" | "upscale" to sdcppConfigSchema. When mode === "upscale":
    • resolveConfig short-circuits — no auxiliary encoders/VAEs are downloaded; the primary modelSrc IS the ESRGAN file.
    • createModel instantiates EsrganUpscaler from @qvac/diffusion-cpp instead of ImgStableDiffusion.
    • Only the upscaler tuning fields (tile_size, direct, offload_params_to_cpu, threads) are honored; upscaler.model_src is ignored.
  • Adds an upscale({ modelId, image, repeats? }) client API in client/api/upscale.ts that returns { outputs, stats }, where both fields are promises (mirroring the diffusion client's shape).
  • Adds upscaleStream RPC handler that emits base64-encoded PNG chunks and resolves stats on done. Wired into handler-registry.ts, handlers/index.ts, and the requestSchema / responseSchema discriminated unions in schemas/common.ts.
  • Diffusion-mode load fails fast with a structured ModelLoadFailedError if the caller sets modelConfig.upscaler but forgets model_src, instead of letting the native addon error mid-load.
  • Calling upscale() against a model that wasn't loaded with mode: "upscale" raises ModelOperationNotSupportedError upfront (no native TypeError propagation).
  • Bumps @qvac/diffusion-cpp from ^0.6.0 to ^0.7.0 (for EsrganUpscaler).
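The mode branching described in the list above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: SdcppConfig, LoadPlan, planLoad, and assertCanUpscale are illustrative names, and the two error classes are local stand-ins that only borrow the names of the SDK's ModelLoadFailedError and ModelOperationNotSupportedError. It models the decision logic only and does not call @qvac/diffusion-cpp.

```typescript
// Hypothetical shapes for illustration only; not the SDK's real types.
type Mode = "diffusion" | "upscale";

interface UpscalerTuning {
  model_src?: string; // only meaningful in diffusion mode
  tile_size?: number;
  direct?: boolean;
  offload_params_to_cpu?: boolean;
  threads?: number;
}

interface SdcppConfig {
  mode?: Mode; // optional; treated as "diffusion" when absent
  upscaler?: UpscalerTuning;
}

type LoadPlan =
  | { backend: "EsrganUpscaler"; downloadAuxiliary: false; tuning: Omit<UpscalerTuning, "model_src"> }
  | { backend: "ImgStableDiffusion"; downloadAuxiliary: true; upscalerSrc?: string };

// Local stand-ins named after the SDK's error types.
class ModelLoadFailedError extends Error {}
class ModelOperationNotSupportedError extends Error {}

function planLoad(config: SdcppConfig): LoadPlan {
  const mode: Mode = config.mode ?? "diffusion";
  if (mode === "upscale") {
    // Standalone upscale: the primary model file is the ESRGAN weights, so no
    // auxiliary encoders/VAEs are planned and upscaler.model_src is dropped.
    const u: UpscalerTuning = config.upscaler ?? {};
    const tuning = {
      tile_size: u.tile_size,
      direct: u.direct,
      offload_params_to_cpu: u.offload_params_to_cpu,
      threads: u.threads,
    };
    return { backend: "EsrganUpscaler", downloadAuxiliary: false, tuning };
  }
  // Diffusion mode: fail fast, before any native load, if an upscaler is
  // configured without the model file it needs.
  if (config.upscaler !== undefined && !config.upscaler.model_src) {
    throw new ModelLoadFailedError("modelConfig.upscaler requires model_src in diffusion mode");
  }
  return { backend: "ImgStableDiffusion", downloadAuxiliary: true, upscalerSrc: config.upscaler?.model_src };
}

// Guard applied before an upscale request would reach the native addon.
function assertCanUpscale(loadedMode: Mode): void {
  if (loadedMode !== "upscale") {
    throw new ModelOperationNotSupportedError('upscale() requires a model loaded with mode: "upscale"');
  }
}
```

Returning a plan object instead of constructing a backend keeps the sketch testable without native weights; the real plugin performs these checks inside resolveConfig and createModel.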

🧪 How was it tested?

  • New diffusion-standalone-upscaler-x4 e2e test in tests-qvac/tests/diffusion-tests.ts: upscales the 64×64 small-64.jpg fixture and asserts the output PNG IHDR reads 256×256 (validating both the repeats: 1 path and the model's native 4× scale factor).
  • Unit coverage in test/unit/sdcpp-plugin.test.ts: new branches assert the mode: "upscale" discriminator, the optional-model_src shape, and that the diffusion-mode upscale forwarding path is unchanged. 67/67 tests pass locally (bun run test:unit).
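The e2e assertion works because the IHDR chunk must come first in every PNG, so the output dimensions sit at fixed byte offsets. A minimal sketch of such a reader follows; readPngSize is a hypothetical helper, not the code in tests-qvac/tests/diffusion-tests.ts. For the 64×64 fixture and a 4× model at repeats: 1, the expected reading is 256×256.

```typescript
// PNG layout: 8-byte signature, then the IHDR chunk (4-byte length,
// 4-byte type "IHDR", then width and height as big-endian uint32).
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR_TYPE = [0x49, 0x48, 0x44, 0x52]; // ASCII "IHDR"

// Hypothetical helper: returns the pixel dimensions stored in a PNG header.
function readPngSize(bytes: Uint8Array): { width: number; height: number } {
  if (bytes.length < 24) throw new Error("buffer too short for a PNG IHDR chunk");
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("missing PNG signature");
  }
  for (let i = 0; i < IHDR_TYPE.length; i++) {
    if (bytes[12 + i] !== IHDR_TYPE[i]) throw new Error("first chunk is not IHDR");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, which matches PNG byte order.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```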

🔌 API Changes

New public surface (purely additive — no signatures changed):

  • upscale() client function
  • UpscaleClientParams, UpscaleStreamResponse, UpscaleStats types (re-exported from @qvac/sdk)
  • New mode: "diffusion" | "upscale" field on sdcppConfigSchema (optional, defaults to "diffusion")

```typescript
import { readFile } from "node:fs/promises";
import { upscale, loadModel, REALESRGAN_X4PLUS_ANIME_6B } from "@qvac/sdk";

// Input image bytes (PNG or JPEG); "input.png" is a placeholder path.
const pngBytes = new Uint8Array(await readFile("input.png"));

// Load the ESRGAN model in standalone-upscale mode.
const modelId = await loadModel(REALESRGAN_X4PLUS_ANIME_6B, {
  modelType: "diffusion",
  modelConfig: {
    mode: "upscale",
    upscaler: { tile_size: 128 }, // optional tuning
  },
});

// Run an upscale job. upscale() returns { outputs, stats }, both promises.
const { outputs, stats } = upscale({
  modelId,
  image: pngBytes, // Uint8Array (PNG/JPEG)
  repeats: 1,      // optional, defaults to 1; each pass multiplies dims by the model's native scale factor
});

// outputs holds exactly one image (the final pass), regardless of repeats.
const [upscaledPng] = await outputs;
console.log(await stats); // { upscaleMs, totalUpscaleMs, width, height, totalPixels, repeats, ... }
```

@simon-iribarren (Contributor) left a comment

Both medium findings addressed cleanly in f83a5e5f:

  • TSDoc on upscale() — full block with @param / @returns / @throws / @example, and the critical disambiguating sentence pinning outputs.length === 1 regardless of repeats. @example now uses repeats: 2 so the multi-pass case is concrete.
  • repeats semantics — pinned consistently in the client TSDoc, the schema .describe() (now: "only the final image is emitted (outputs.length === 1)"), and the server-side comment in ops/upscale.ts. No remaining ambiguity at any layer.
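The repeats contract pinned above (each pass multiplies both dimensions by the model's native scale factor, and only the final image is emitted) reduces to simple arithmetic. The sketch below uses a hypothetical expectedUpscaledSize helper; rejecting non-positive or non-integer repeats is an assumption of the sketch, not documented SDK behavior.

```typescript
// Hypothetical helper: output size after `repeats` passes, where every pass
// multiplies both dimensions by the model's native scale factor.
function expectedUpscaledSize(
  width: number,
  height: number,
  scale: number,
  repeats = 1,
): { width: number; height: number } {
  // Assumption of this sketch: repeats must be a positive integer.
  if (!Number.isInteger(repeats) || repeats < 1) {
    throw new RangeError("repeats must be a positive integer");
  }
  const factor = scale ** repeats;
  return { width: width * factor, height: height * factor };
}
```

With a 4× model and a 64×64 input, repeats: 1 gives 256×256 and repeats: 2 gives 1024×1024; in both cases outputs.length === 1, because intermediate passes are not emitted.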

CI on this commit: check (sdk), validate-pr, changes, authorize, and resolve-config are all green; build is running, with the desktop/iOS/Android device-farm smoke jobs queued behind it. The merge gate treats tier1 approval and green smoke as separate requirements, so the desktop diffusion-standalone-upscaler-x4 run is still the runtime signal we'll want to see before the merge button goes green.

The low / nit findings from my earlier comment (a schema-level .refine() for model_src-required-in-diffusion-mode, the missing examples/diffusion-standalone-upscale.ts, a z.infer → z.input callout in the PR body, a PNG-magic check in validateStandaloneUpscale, and retrofitting diffusion.ts's client with the same StreamEndedError discipline you added here) all remain non-blocking; happy to file follow-up tickets if useful.
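The PNG-magic follow-up could look roughly like the sketch below. Because upscale() accepts PNG or JPEG input, the sketch checks both signatures: the 8-byte PNG signature, and the JPEG SOI marker FF D8 followed by the FF that starts the next marker. sniffImageKind and assertSupportedImage are hypothetical names; this is not the SDK's validateStandaloneUpscale.

```typescript
type ImageKind = "png" | "jpeg";

const PNG_MAGIC = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical sniffing helper: identifies the container from its leading bytes.
function sniffImageKind(bytes: Uint8Array): ImageKind | null {
  if (bytes.length >= PNG_MAGIC.length && PNG_MAGIC.every((b, i) => bytes[i] === b)) {
    return "png";
  }
  // JPEG: SOI marker (FF D8) immediately followed by another marker (FF ..).
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  return null;
}

// Hypothetical validation step: reject anything that is not PNG or JPEG up front.
function assertSupportedImage(bytes: Uint8Array): ImageKind {
  const kind = sniffImageKind(bytes);
  if (kind === null) throw new TypeError("image must be PNG or JPEG bytes");
  return kind;
}
```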

Nice refactor overall — the mode discriminator is a much cleaner shape than the first iteration.

@maxim-smotrov added and removed the test-e2e-smoke label (Triggers smoke e2e test suite [Currently SDK-only]) on May 12, 2026
github-actions Bot commented May 12, 2026

QVAC E2E — ios ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

github-actions Bot commented May 12, 2026

QVAC E2E — windows — ✅ all tests passed (91/91, 516s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions Bot commented May 12, 2026

QVAC E2E — android ⚠️ no results

Config: suite=smoke · filter=(none) · exclude=(none)
View run

The test job did not produce a results artifact. Check the run for job-level failures.

github-actions Bot commented May 12, 2026

QVAC E2E — linux — ✅ all tests passed (91/91, 357s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

github-actions Bot commented May 12, 2026

QVAC E2E — macos — ✅ all tests passed (91/91, 342s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@gianni-cor
Contributor

/review

@gianni-cor merged commit 8b442aa into tetherto:main on May 12, 2026
12 of 13 checks passed

Labels

test-e2e-smoke Triggers smoke e2e test suite [Currently SDK-only]


4 participants