
Commit e2e8fdd

Merge branch 'main' into lstein/fix/zit-qwen3-choice
# Conflicts:
#	invokeai/frontend/web/src/app/store/middleware/listenerMiddleware/listeners/modelSelected.test.ts
2 parents 477df8a + 75f1992 commit e2e8fdd

34 files changed

Lines changed: 130890 additions & 304 deletions


docs/src/content/docs/features/External Models/alibabacloud.mdx

Lines changed: 3 additions & 3 deletions
@@ -39,16 +39,16 @@ DashScope has separate international (`dashscope-intl.aliyuncs.com`) and China (
 | **Qwen Image 2.0 Pro** | txt2img | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | Best quality, 2K output, excellent bilingual text. |
 | **Qwen Image 2.0** | txt2img | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | Faster / cheaper 2K sibling of 2.0 Pro. |
 | **Qwen Image Max** | txt2img | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | High quality at ~1.3K native size. |
-| **Qwen Image Edit Max** | txt2img + reference images | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | Image editing with industrial / geometric reasoning. Accepts up to 3 reference images. |
+| **Qwen Image Edit Max** | txt2img (with reference images) | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | Reference-image-driven generation with industrial / geometric reasoning. Accepts up to 14 reference images. |
 | **Wan 2.6 Text-to-Image** | txt2img | 1:1, 4:3, 3:4, 16:9, 9:16 | up to 4 | Photorealistic T2I at 1K. |
 
-All models support **seed**. Negative prompts are not currently plumbed through to DashScope, so the negative prompt input is ignored for these providers.
+All models support **seed**. Negative prompts are not currently plumbed through to DashScope, so the negative prompt input is ignored for these providers. None of the Alibaba Cloud models support img2img (denoising-strength edits) or inpaint (mask-based edits) in Invoke today.
 
 ## Tips
 
 <Steps>
 1. Bilingual prompts. Qwen Image is unusually good at rendering Chinese text and mixed-language prompts — it's a strong choice when your prompt or desired output contains non-Latin script.
-2. Editing is only supported by Qwen Image Edit Max. Provide up to 3 reference images via the reference-images panel; masks and denoising strength are not supported for this provider.
+2. Reference-image input is only accepted by Qwen Image Edit Max — provide images via the reference-images panel. Masks and denoising strength are not supported for any Alibaba Cloud model.
 3. Batching is capped at 4 images per request. Larger batches are split across multiple API calls.
 4. Costs vary per model — Qwen Image 2.0 Pro is the most expensive, Qwen Image 2.0 the cheapest of the 2.0 family. Check Alibaba Cloud's pricing page before running large batches.
 </Steps>
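The batching tip on this page ("Larger batches are split across multiple API calls") implies a simple chunking rule. Below is a minimal sketch, assuming each call is filled up to the provider's per-request cap and any remainder goes into one final call; `split_into_requests` is a hypothetical helper, not Invoke's implementation.

```python
def split_into_requests(total_images: int, per_request_cap: int) -> list[int]:
    """Split a requested batch into per-API-call sizes, each at most per_request_cap.

    Hypothetical helper: assumes calls are filled greedily, remainder last.
    """
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    if per_request_cap < 1:
        raise ValueError("per_request_cap must be at least 1")
    full_calls, remainder = divmod(total_images, per_request_cap)
    sizes = [per_request_cap] * full_calls
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # A batch of 10 on a provider capped at 4 images per request.
    print(split_into_requests(10, 4))  # [4, 4, 2]
```

The same rule applies to the 10-per-request cap that the OpenAI page gives for the GPT Image family.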

docs/src/content/docs/features/External Models/gemini.mdx

Lines changed: 5 additions & 7 deletions
@@ -31,15 +31,13 @@ Restart Invoke for the change to take effect.
 
 | Model | Modes | Reference Images | Notes |
 | --- | --- | --- | --- |
-| **Gemini 2.5 Flash Image** | txt2img, img2img, inpaint | Yes | 10 aspect ratios, fixed per-ratio resolutions. |
-| **Gemini 3 Pro Image Preview** | txt2img, img2img, inpaint | Up to 14 (6 object + 5 character) | 1K / 2K / 4K resolution presets. |
-| **Gemini 3.1 Flash Image Preview** | txt2img, img2img, inpaint | Up to 14 (10 object + 4 character) | 512 / 1K / 2K / 4K resolution presets. |
+| **Gemini 2.5 Flash Image** | txt2img | Yes | 10 aspect ratios, fixed per-ratio resolutions. |
+| **Gemini 3 Pro Image Preview** | txt2img | Up to 14 (6 object + 5 character) | 1K / 2K / 4K resolution presets. |
+| **Gemini 3.1 Flash Image Preview** | txt2img | Up to 14 (10 object + 4 character) | 512 / 1K / 2K / 4K resolution presets. |
 
-All Gemini models are single-image-per-request — batch size is fixed at 1. To generate multiple variations, queue multiple invocations.
-
-## Provider-Specific Options
+Reference-image input is used to condition generation but counts as txt2img — neither img2img (denoising strength) nor inpaint (mask) is supported for Gemini.
 
-Gemini exposes a **temperature** control in the parameters panel. Lower values make outputs more deterministic, higher values increase variability.
+All Gemini models are single-image-per-request — batch size is fixed at 1. To generate multiple variations, queue multiple invocations.
 
 ## Tips
 
docs/src/content/docs/features/External Models/index.mdx

Lines changed: 4 additions & 2 deletions
@@ -13,7 +13,9 @@ External models appear in the model picker alongside locally installed models. G
 ## Supported Providers
 
 - [Google Gemini](/features/external-models/gemini/) — Gemini 2.5 Flash Image, Gemini 3 Pro Image Preview, Gemini 3.1 Flash Image Preview
-- [OpenAI](/features/external-models/openai/) — GPT Image 1 / 1.5 / 1-mini, DALL·E 3, DALL·E 2
+- [OpenAI](/features/external-models/openai/) — GPT Image 1 / 1.5 / 1-mini, DALL·E 3
+- [BytePlus Seedream](/features/external-models/seedream/) — Seedream 5.0, 5.0 Lite, 4.5, 4.0
+- [Alibaba Cloud DashScope](/features/external-models/alibabacloud/) — Qwen Image 2.0 / 2.0 Pro / Max / Edit Max, Wan 2.6 T2I
 
 ## Configuring API Keys
 
@@ -44,7 +46,7 @@ Once installed, external models show up everywhere a model can be selected. Choo
 
 Each external model declares its own **capabilities** — for example:
 
-- Which generation modes it supports (`txt2img`, `img2img`, `inpaint`).
+- Which generation modes it supports (`txt2img`, `img2img`). Inpainting is not currently supported by any external provider.
 - Whether it accepts reference images, and how many.
 - Which aspect ratios and resolutions it allows.
 - Whether it supports a negative prompt, seed, or batch size > 1.
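The capability list above maps naturally onto a per-model record that a request can be checked against. A minimal sketch, assuming hypothetical field names (`ExternalModelCapabilities` and `check_request` are illustrative, not Invoke's schema). The example values come from the Qwen Image Edit Max row and notes on the Alibaba Cloud page: txt2img only, five aspect ratios, up to 14 reference images, negative prompt ignored, seed supported, up to 4 images per request.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalModelCapabilities:
    """Hypothetical per-model capability record (field names are illustrative)."""

    modes: frozenset[str]
    aspect_ratios: frozenset[str]
    max_reference_images: int
    supports_negative_prompt: bool
    supports_seed: bool
    max_images_per_request: int


def check_request(
    caps: ExternalModelCapabilities,
    mode: str,
    aspect_ratio: str,
    reference_images: int = 0,
    negative_prompt: str | None = None,
) -> list[str]:
    """Return a list of problems; an empty list means the request fits the model."""
    problems: list[str] = []
    if mode not in caps.modes:
        problems.append(f"mode {mode!r} not supported")
    if aspect_ratio not in caps.aspect_ratios:
        problems.append(f"aspect ratio {aspect_ratio!r} not supported")
    if reference_images > caps.max_reference_images:
        problems.append(
            f"{reference_images} reference images exceeds the limit of {caps.max_reference_images}"
        )
    if negative_prompt and not caps.supports_negative_prompt:
        # The providers' docs say unsupported negative prompts are ignored, not rejected.
        problems.append("negative prompt would be ignored")
    return problems


# Example values taken from the Qwen Image Edit Max entry on the Alibaba Cloud page.
QWEN_IMAGE_EDIT_MAX = ExternalModelCapabilities(
    modes=frozenset({"txt2img"}),
    aspect_ratios=frozenset({"1:1", "4:3", "3:4", "16:9", "9:16"}),
    max_reference_images=14,
    supports_negative_prompt=False,
    supports_seed=True,
    max_images_per_request=4,
)
```

Returning a list of problems, rather than raising on the first one, lets a UI surface every mismatch at once.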

docs/src/content/docs/features/External Models/openai.mdx

Lines changed: 11 additions & 6 deletions
@@ -4,7 +4,11 @@ title: OpenAI
 
 import { Steps } from '@astrojs/starlight/components'
 
-Invoke supports OpenAI's image generation models — both the GPT Image family and the older DALL·E models — through the OpenAI API.
+Invoke supports OpenAI's image generation models — the GPT Image family and DALL·E 3 — through the OpenAI API.
+
+:::note[DALL·E 2 removed]
+DALL·E 2 was deprecated by OpenAI and is scheduled for shutdown on 2026-05-12. It is no longer offered as a starter model in Invoke.
+:::
 
 ## Getting an API Key
 
@@ -31,11 +35,12 @@ Restart Invoke for the change to take effect.
 
 | Model | Modes | Aspect Ratios | Batch | Notes |
 | --- | --- | --- | --- | --- |
-| **GPT Image 1.5** | txt2img, img2img, inpaint | 1:1, 3:2, 2:3 | up to 10 | Fastest and cheapest GPT Image model. |
-| **GPT Image 1** | txt2img, img2img, inpaint | 1:1, 3:2, 2:3 | up to 10 | Highest quality of the GPT Image family. |
-| **GPT Image 1 Mini** | txt2img, img2img, inpaint | 1:1, 3:2, 2:3 | up to 10 | ~80% cheaper than GPT Image 1. |
+| **GPT Image 1.5** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | Fastest and cheapest GPT Image model. |
+| **GPT Image 1** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | Highest quality of the GPT Image family. |
+| **GPT Image 1 Mini** | txt2img, img2img | 1:1, 3:2, 2:3 | up to 10 | ~80% cheaper than GPT Image 1. |
 | **DALL·E 3** | txt2img only | 1:1, 7:4, 4:7 | 1 | No reference-image / edit support. |
-| **DALL·E 2** | txt2img, img2img, inpaint | 1:1 | up to 10 | Square only. |
+
+Inpainting (mask-based editing) is not currently supported for any OpenAI model in Invoke. img2img on the GPT Image family routes through the `/v1/images/edits` endpoint without a mask.
 
 ## Provider-Specific Options
 
@@ -54,7 +59,7 @@ DALL·E 2 and DALL·E 3 do not expose these options.
 ## Tips
 
 <Steps>
-1. Batching on GPT Image and DALL·E 2 tops out at 10 per request. Larger batches are split into multiple API calls.
+1. Batching on GPT Image tops out at 10 per request. Larger batches are split into multiple API calls.
 2. Costs can climb quickly with high-quality GPT Image generations. Start with GPT Image 1 Mini when iterating on prompts.
 3. Rate limits from OpenAI surface as failed invocations — retry after a short wait.
 </Steps>

invokeai/app/invocations/anima_model_loader.py

Lines changed: 4 additions & 20 deletions
@@ -9,15 +9,10 @@
 from invokeai.app.invocations.model import (
 ModelIdentifierField,
 Qwen3EncoderField,
-T5EncoderField,
 TransformerField,
 VAEField,
 )
 from invokeai.app.services.shared.invocation_context import InvocationContext
-from invokeai.app.util.t5_model_identifier import (
-preprocess_t5_encoder_model_identifier,
-preprocess_t5_tokenizer_model_identifier,
-)
 from invokeai.backend.model_manager.taxonomy import BaseModelType, ModelType, SubModelType
 
 
@@ -28,15 +23,14 @@ class AnimaModelLoaderOutput(BaseInvocationOutput):
 transformer: TransformerField = OutputField(description=FieldDescriptions.transformer, title="Transformer")
 qwen3_encoder: Qwen3EncoderField = OutputField(description=FieldDescriptions.qwen3_encoder, title="Qwen3 Encoder")
 vae: VAEField = OutputField(description=FieldDescriptions.vae, title="VAE")
-t5_encoder: T5EncoderField = OutputField(description=FieldDescriptions.t5_encoder, title="T5 Encoder")
 
 
 @invocation(
 "anima_model_loader",
 title="Main Model - Anima",
 tags=["model", "anima"],
 category="model",
-version="1.3.0",
+version="1.4.0",
 classification=Classification.Prototype,
 )
 class AnimaModelLoaderInvocation(BaseInvocation):
@@ -46,7 +40,9 @@ class AnimaModelLoaderInvocation(BaseInvocation):
 - Transformer: Cosmos Predict2 DiT + LLM Adapter (from single-file checkpoint)
 - Qwen3 Encoder: Qwen3 0.6B (standalone single-file)
 - VAE: AutoencoderKLQwenImage / Wan 2.1 VAE (standalone single-file or FLUX VAE)
-- T5 Encoder: T5-XXL model (only the tokenizer submodel is used, for LLM Adapter token IDs)
+
+The T5-XXL tokenizer needed for LLM Adapter token IDs is bundled in the package,
+so no T5-XXL encoder model needs to be installed.
 """
 
 model: ModelIdentifierField = InputField(
@@ -72,13 +68,6 @@ class AnimaModelLoaderInvocation(BaseInvocation):
 title="Qwen3 Encoder",
 )
 
-t5_encoder_model: ModelIdentifierField = InputField(
-description="T5-XXL encoder model. The tokenizer submodel is used for Anima text encoding.",
-input=Input.Direct,
-ui_model_type=ModelType.T5Encoder,
-title="T5 Encoder",
-)
-
 def invoke(self, context: InvocationContext) -> AnimaModelLoaderOutput:
 # Transformer always comes from the main model
 transformer = self.model.model_copy(update={"submodel_type": SubModelType.Transformer})
@@ -90,13 +79,8 @@ def invoke(self, context: InvocationContext) -> AnimaModelLoaderOutput:
 qwen3_tokenizer = self.qwen3_encoder_model.model_copy(update={"submodel_type": SubModelType.Tokenizer})
 qwen3_encoder = self.qwen3_encoder_model.model_copy(update={"submodel_type": SubModelType.TextEncoder})
 
-# T5 Encoder (only tokenizer submodel is used by Anima)
-t5_tokenizer = preprocess_t5_tokenizer_model_identifier(self.t5_encoder_model)
-t5_encoder = preprocess_t5_encoder_model_identifier(self.t5_encoder_model)
-
 return AnimaModelLoaderOutput(
 transformer=TransformerField(transformer=transformer, loras=[]),
 qwen3_encoder=Qwen3EncoderField(tokenizer=qwen3_tokenizer, text_encoder=qwen3_encoder),
 vae=VAEField(vae=vae),
-t5_encoder=T5EncoderField(tokenizer=t5_tokenizer, text_encoder=t5_encoder, loras=[]),
 )

invokeai/app/invocations/anima_text_encoder.py

Lines changed: 14 additions & 19 deletions
@@ -28,9 +28,10 @@
 TensorField,
 UIComponent,
 )
-from invokeai.app.invocations.model import Qwen3EncoderField, T5EncoderField
+from invokeai.app.invocations.model import Qwen3EncoderField
 from invokeai.app.invocations.primitives import AnimaConditioningOutput
 from invokeai.app.services.shared.invocation_context import InvocationContext
+from invokeai.backend.anima.t5_tokenizer import load_bundled_t5_tokenizer
 from invokeai.backend.patches.layer_patcher import LayerPatcher
 from invokeai.backend.patches.lora_conversions.anima_lora_constants import ANIMA_LORA_QWEN3_PREFIX
 from invokeai.backend.patches.model_patch_raw import ModelPatchRaw
@@ -56,13 +57,13 @@
 title="Prompt - Anima",
 tags=["prompt", "conditioning", "anima"],
 category="conditioning",
-version="1.3.0",
+version="1.4.0",
 classification=Classification.Prototype,
 )
 class AnimaTextEncoderInvocation(BaseInvocation):
 """Encodes and preps a prompt for an Anima image.
 
-Uses Qwen3 0.6B for hidden state extraction and T5-XXL tokenizer for
+Uses Qwen3 0.6B for hidden state extraction and a bundled T5-XXL tokenizer for
 token IDs (no T5 model weights needed). Both are combined by the
 LLM Adapter inside the Anima transformer during denoising.
 """
@@ -73,11 +74,6 @@ class AnimaTextEncoderInvocation(BaseInvocation):
 description=FieldDescriptions.qwen3_encoder,
 input=Input.Connection,
 )
-t5_encoder: T5EncoderField = InputField(
-title="T5 Encoder",
-description=FieldDescriptions.t5_encoder,
-input=Input.Connection,
-)
 mask: TensorField | None = InputField(
 default=None,
 description="A mask defining the region that this conditioning prompt applies to.",
@@ -193,18 +189,17 @@ def _encode_prompt(
 # Use last hidden state — only real tokens, no padding
 qwen3_embeds = outputs.hidden_states[-1][0] # Shape: (seq_len, 1024)
 
-# --- Step 2: Tokenize with T5-XXL tokenizer (IDs only, no model) ---
+# --- Step 2: Tokenize with bundled T5-XXL tokenizer (IDs only, no model) ---
 context.util.signal_progress("Tokenizing with T5-XXL")
-t5_tokenizer_info = context.models.load(self.t5_encoder.tokenizer)
-with t5_tokenizer_info.model_on_device() as (_, t5_tokenizer):
-t5_tokens = t5_tokenizer(
-prompt,
-padding=False,
-truncation=True,
-max_length=T5_MAX_SEQ_LEN,
-return_tensors="pt",
-)
-t5xxl_ids = t5_tokens.input_ids[0] # Shape: (seq_len,)
+t5_tokenizer = load_bundled_t5_tokenizer()
+t5_tokens = t5_tokenizer(
+prompt,
+padding=False,
+truncation=True,
+max_length=T5_MAX_SEQ_LEN,
+return_tensors="pt",
+)
+t5xxl_ids = t5_tokens.input_ids[0] # Shape: (seq_len,)
 
 return qwen3_embeds, t5xxl_ids, None
 
invokeai/app/services/model_install/model_install_default.py

Lines changed: 26 additions & 1 deletion
@@ -112,6 +112,8 @@ def __init__(
 self._stop_event = threading.Event()
 self._downloads_changed_event = threading.Event()
 self._install_completed_event = threading.Event()
+self._restore_completed_event = threading.Event()
+self._restore_completed_event.set()
 self._download_queue = download_queue
 self._download_cache: Dict[int, ModelInstallJob] = {}
 self._running = False
@@ -264,16 +266,23 @@ def _restore_incomplete_installs(self) -> None:
 self._safe_rmtree(job._install_tmpdir, self._logger)
 
 def _restore_incomplete_installs_async(self) -> None:
+self._restore_completed_event.clear()
+
 def _run() -> None:
 try:
 self._logger.info("Restoring incomplete installs")
 self._restore_incomplete_installs()
 self._logger.info("Finished restoring incomplete installs")
 except Exception as e:
 self._logger.error(f"Failed to restore incomplete installs: {e}")
+finally:
+self._restore_completed_event.set()
 
 threading.Thread(target=_run, daemon=True).start()
 
+def _wait_for_restore_complete(self) -> None:
+self._restore_completed_event.wait()
+
 def _resume_remote_download(self, job: ModelInstallJob) -> None:
 job.status = InstallStatus.WAITING
 if job.download_parts:
@@ -459,6 +468,8 @@ def heuristic_import(
 return self.import_model(source_obj, config)
 
 def import_model(self, source: ModelSource, config: Optional[ModelRecordChanges] = None) -> ModelInstallJob: # noqa D102
+self._wait_for_restore_complete()
+
 similar_jobs = [x for x in self.list_jobs() if x.source == source and not x.in_terminal_state]
 if similar_jobs:
 self._logger.warning(f"There is already an active install job for {source}. Not enqueuing.")
@@ -506,6 +517,8 @@ def wait_for_job(self, job: ModelInstallJob, timeout: int = 0) -> ModelInstallJo
 
 def wait_for_installs(self, timeout: int = 0) -> List[ModelInstallJob]: # noqa D102
 """Block until all installation jobs are done."""
+self._wait_for_restore_complete()
+
 start = time.time()
 while len(self._download_cache) > 0:
 if self._downloads_changed_event.wait(timeout=0.25): # in case we miss an event
@@ -762,7 +775,7 @@ def _remote_files_from_source(
 except ValueError:
 pass
 
-return [RemoteModelFile(url=source.url, path=Path("."), size=0)], None
+return [RemoteModelFile(url=self._normalize_huggingface_blob_url(source.url), path=Path("."), size=0)], None
 
 raise Exception(f"No files associated with {source}")
 
@@ -1488,3 +1501,15 @@ def get_fetcher_from_url(url: str) -> Type[ModelMetadataFetchBase]:
 if re.match(r"^https?://huggingface.co/[^/]+/[^/]+$", url.lower()):
 return HuggingFaceMetadataFetch
 raise ValueError(f"Unsupported model source: '{url}'")
+
+@staticmethod
+def _normalize_huggingface_blob_url(url: AnyHttpUrl) -> Url:
+"""Convert Hugging Face file page URLs to direct download URLs."""
+return Url(
+re.sub(
+r"^(https?://huggingface\.co/[^/]+/[^/]+)/blob/([^?#]+)([?#].*)?$",
+r"\1/resolve/\2\3",
+str(url),
+flags=re.IGNORECASE,
+)
+)
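The `_restore_completed_event` changes follow a common `threading.Event` gate pattern: the event starts set, is cleared just before the background restore starts, is set again in a `finally` clause, and `import_model` / `wait_for_installs` wait on it. Below is a minimal, self-contained sketch of that pattern, assuming a `time.sleep` stands in for the real restore work; `RestoreGate` and its methods are hypothetical names, not the service's API.

```python
from __future__ import annotations

import threading
import time


class RestoreGate:
    """Hypothetical stand-in for the install service's restore/import interplay."""

    def __init__(self) -> None:
        self._restore_completed = threading.Event()
        # Start in the "set" state so imports never block when no restore is running.
        self._restore_completed.set()
        self.imported: list[str] = []
        self.errors: list[str] = []

    def restore_async(self, seconds: float, fail: bool = False) -> threading.Thread:
        # Clear *before* starting the worker so no caller can slip past the gate.
        self._restore_completed.clear()

        def _run() -> None:
            try:
                time.sleep(seconds)  # placeholder for the real restore work
                if fail:
                    raise RuntimeError("simulated restore failure")
            except Exception as e:  # log-and-continue, as the service does
                self.errors.append(str(e))
            finally:
                # Always reopen the gate, even after a failure.
                self._restore_completed.set()

        worker = threading.Thread(target=_run, daemon=True)
        worker.start()
        return worker

    def import_model(self, source: str) -> None:
        # Counterpart of _wait_for_restore_complete(): block until no restore is in flight.
        self._restore_completed.wait()
        self.imported.append(source)
```

Because `wait()` is called without a timeout, an import blocks for as long as the restore takes; the `finally` clause is what keeps a failed restore from blocking imports forever.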
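`_normalize_huggingface_blob_url` rewrites Hugging Face file-page URLs (`/blob/`) into direct-download URLs (`/resolve/`). The sketch below reuses the committed pattern, replacement, and flag on a plain `str`, assuming that dropping the pydantic `Url` wrapper does not change the rewrite itself; `normalize_hf_blob_url` and the `org/repo` URLs are hypothetical.

```python
import re

# Same pattern, replacement, and flag as the committed method.
_HF_BLOB_RE = re.compile(
    r"^(https?://huggingface\.co/[^/]+/[^/]+)/blob/([^?#]+)([?#].*)?$",
    flags=re.IGNORECASE,
)


def normalize_hf_blob_url(url: str) -> str:
    """Turn .../<org>/<repo>/blob/<rev>/<path> into .../<org>/<repo>/resolve/<rev>/<path>.

    Any query string or fragment is kept; non-matching URLs are returned unchanged.
    """
    # An unmatched optional group 3 is substituted as "" by re.sub (Python 3.5+).
    return _HF_BLOB_RE.sub(r"\1/resolve/\2\3", url)


if __name__ == "__main__":
    print(normalize_hf_blob_url("https://huggingface.co/org/repo/blob/main/model.safetensors"))
    # https://huggingface.co/org/repo/resolve/main/model.safetensors
```

Because the first group requires exactly two path segments (`<org>/<repo>`) before `/blob/`, URLs that are already `/resolve/` links, or that point at other hosts, pass through untouched.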
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""Bundled T5-XXL tokenizer for Anima.
+
+Anima tokenizes the prompt with the T5-XXL tokenizer to produce token IDs that
+index the LLM Adapter's learned embedding table. Only the tokenizer is needed —
+never the 9GB T5-XXL weights — so the tokenizer is vendored in the package as a
+self-contained fast tokenizer (tokenizer.json), avoiding both the large download
+and the sentencepiece runtime path.
+"""
+
+from functools import lru_cache
+from pathlib import Path
+
+from transformers import T5TokenizerFast
+
+# Size of the LLM Adapter's token embedding table (T5 v1.1 vocab incl. 100 sentinel
+# extra_id tokens). Token IDs must stay within this range.
+ANIMA_T5_VOCAB_SIZE = 32128
+
+_TOKENIZER_DIR = Path(__file__).parent / "tokenizer"
+
+
+@lru_cache(maxsize=1)
+def load_bundled_t5_tokenizer() -> T5TokenizerFast:
+"""Load the vendored T5-XXL fast tokenizer. Result is cached for the process."""
+return T5TokenizerFast.from_pretrained(_TOKENIZER_DIR)
