
# HLX Quirks

## Candle Qwen3.5: tiny metadata mismatches can destroy coherence

Date: 2026-04-10

Area: `candle-transformers/src/models/quantized_qwen35.rs`

Qwen3.5 is a hybrid architecture: most layers are Gated DeltaNet / SSM layers, with full transformer attention only every fourth layer. That makes it unusually sensitive to small constants in the SSM path. A value that looks like harmless numerical tolerance can be applied 18 times in a 24-layer 0.8B model before the residual stream reaches the output head.
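A quick sanity check on this layout makes the "18 times" figure concrete. The sketch below assumes (hypothetically) that the full-attention layers are the every-fourth positions in a 0-indexed 24-layer stack; the exact positions depend on the model config and should be read from metadata, not assumed.

```rust
// Sketch: count layer types under the hybrid layout described above.
// ASSUMPTION: full attention sits at every fourth layer; the rest are
// Gated DeltaNet / SSM layers. Real positions come from the config.
fn is_full_attention(layer_idx: usize) -> bool {
    (layer_idx + 1) % 4 == 0 // e.g. layers 3, 7, 11, ... (0-indexed)
}

fn main() {
    let n_layers = 24;
    let ssm = (0..n_layers).filter(|&i| !is_full_attention(i)).count();
    let attn = n_layers - ssm;
    // 18 SSM layers and 6 attention layers: the SSM norm epsilon is
    // applied 18 times before the residual stream reaches the head.
    println!("ssm={ssm} attn={attn}");
}
```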

The bug we finally cracked was an SSM gated RMSNorm epsilon mismatch. The implementation hardcoded `1e-5`; the GGUF metadata reports `qwen35.attention.layer_norm_rms_epsilon = 1e-6`. With `1e-5`, the model loaded and produced grammatical text, but factual anchors were weak. The raw prompt `The capital of France is` greedily emitted `the`; after threading the metadata epsilon into `ssm_norm`, the same probe emitted `Paris`.
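The shape of the fix can be sketched in a few lines. This is illustrative, not the actual Candle code: the `HashMap` stands in for whatever metadata view the GGUF loader exposes, and the function name is hypothetical. The key string is the one quoted above.

```rust
use std::collections::HashMap;

// Sketch: read the norm epsilon from model metadata instead of
// hardcoding it. Failing loudly on a missing key is deliberate:
// a silent fallback to a guess like 1e-5 is exactly the bug above.
fn rms_eps(metadata: &HashMap<String, f64>) -> Result<f64, String> {
    metadata
        .get("qwen35.attention.layer_norm_rms_epsilon")
        .copied()
        .ok_or_else(|| "missing layer_norm_rms_epsilon in metadata".to_string())
}
```

With this model's metadata the lookup yields `1e-6`; a hardcoded default of `1e-5` would have masked the mismatch entirely.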

Important false leads that were ruled out:

- RoPE decode position drift: traced the offset against KV-cache length; they matched.
- RoPE off-by-one: forced a negative shift; the failure mode did not change.
- Multi-token prefill vs token-by-token decode: chunk size 1 did not restore coherence.
- ChatML/tokenizer mismatch: special tokens were recognized; prompt formatting changed symptoms but did not fix the core bug.
- Qwen3.5 full-attention z-gate: the gate was already present.
- GQA repeat ordering: Candle's helper matched the HF repeat-interleave ordering on a toy check.
- `1 + weight` RMSNorm convention: made outputs worse for this GGUF.
- Full-attention fp32 softmax: closer to HF eager attention but did not fix the failing probe.

The lesson is strict: do not hardcode dimensions, head counts, grouping factors, epsilons, norm conventions, RoPE dimensions, or layer layout assumptions in model ports. Pull them from GGUF metadata or the official config, and treat any fallback inference from tensor shape as suspect. In Qwen3.5 specifically, `blk.0.attn_qkv.weight` is an SSM projection, not a full-attention QKV tensor, so it must not be used to infer transformer KV-head geometry.

For future upstream work, keep the debug loop honest:

- Use one-token probes for a fast first-token regression signal.
- Keep raw prompts separate from chat-wrapper prompts.
- Confirm improvements on factual anchors like `The capital of France is`.
- Do not mark chat-wrapper behavior fixed just because the core model path improved.
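The one-token probe in the first item needs nothing beyond a greedy argmax over the first-token logits. A minimal sketch (the model forward pass and tokenizer decode are elided; the function name is illustrative):

```rust
// Sketch: greedy selection for a one-token probe. Feed the raw prompt
// (e.g. "The capital of France is") through the model, take the argmax
// of the final logits, and check that the top token decodes to the
// expected anchor ("Paris" in the probe described above).
fn greedy_token(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .expect("logits must be non-empty")
}
```

Because the probe stops at one token, it runs in a single forward pass and gives an immediate pass/fail signal after each candidate fix.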

## Reserved keywords that look like valid identifiers

Date: 2026-04-14

Several common English words are reserved keywords in HLX and will cause a `PARSE_ERROR` if used as function names, parameter names, or variable names. Known reserved keywords:

As parameter names: `fn`, `default`, and `inner` each trigger an "Expected parameter name or ')', found " parse error.

As function names: `match`, `find`, and `split`, discovered when `regex.hlx` exported functions with these names. Rename with a module prefix (e.g. `regex_match`, `regex_find`, `regex_split`).

General rule: if a name is a keyword in Rust, there is a good chance it is also reserved in HLX. When naming functions or parameters, prefer module-prefixed names (`limits_kind` rather than `kind`) to avoid collisions with the flat namespace and the reserved word list.

The full reserved keyword list is not formally documented; treat any unexpected `PARSE_ERROR` on a function or parameter name as a reserved keyword collision and rename.