
fix: add MLX weight remapping for openai_privacy_filter / nemotron architecture #58

Merged
maziyarpanahi merged 2 commits into maziyarpanahi:master from kswanjitsu:fix/mlx-opf-weight-remapping on May 27, 2026

@kswanjitsu


Problem

The openai_privacy_filter model family (e.g. OpenMed/privacy-filter-nemotron) cannot be loaded via create_mlx_pipeline because its HuggingFace weight namespace does not match the MLX model's namespace. All 140 parameters are rejected with ValueError: Received 140 parameters not in model.

Root cause

The existing remap_key() function handles BERT/RoBERTa/DeBERTa/DistilBERT/ELECTRA architectures but has no case for openai_privacy_filter. The OPF architecture differs from all others:

  • Different top-level prefix (score.* → unembedding.*)
  • Different layer container (model.layers.N.* → block.N.*)
  • QKV fusion: HF has separate q/k/v_proj, MLX has a single attn.qkv
  • Different sub-key names (input_layernorm → attn.norm, router → gate)
  • RMSNorm uses scale not weight
  • Expert weights already in [E, in, out] format (no transpose needed)
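To make the naming differences above concrete, here is a minimal sketch of the rename step only. It is not the code from this PR: `remap_opf_key` and the rule table are hypothetical names, and the patterns are inferred from the mappings listed in this description. The q/k/v_proj keys need fusion rather than a rename, so they are deliberately not handled here.

```python
import re

# Hypothetical rename rules, tried in order: (regex on the HF key, MLX template).
# Inferred from the PR description, not copied from the actual converter.
_OPF_RULES = [
    (r"^score\.(.+)$", r"unembedding.\1"),
    (r"^model\.layers\.(\d+)\.input_layernorm\.weight$", r"block.\1.attn.norm.scale"),
    (r"^model\.layers\.(\d+)\.mlp\.router\.(.+)$", r"block.\1.mlp.gate.\2"),
    (r"^model\.layers\.(\d+)\.mlp\.experts\.gate_up_proj$", r"block.\1.mlp.swiglu.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.experts\.down_proj$", r"block.\1.mlp.out.weight"),
    # Fallback: only swap the layer container prefix.
    (r"^model\.layers\.(\d+)\.(.+)$", r"block.\1.\2"),
]


def remap_opf_key(hf_key: str) -> str:
    """Return the MLX-style name for one HF key, or the key unchanged if no rule matches."""
    for pattern, template in _OPF_RULES:
        if re.match(pattern, hf_key):
            return re.sub(pattern, template, hf_key)
    return hf_key


if __name__ == "__main__":
    for key in ("score.weight", "model.layers.3.input_layernorm.weight"):
        print(key, "->", remap_opf_key(key))
```

Specific rules come before the generic `model.layers.N.*` fallback so that renames such as RMSNorm `weight` to `scale` are not shadowed by the prefix swap.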

Fix

Add _convert_opf_weights() and wire into convert_weights() for model types openai-privacy-filter, privacy-filter-nemotron, nemotron-privacy-filter. Also set classifier_bias: True in the MLX config when score.bias is present.
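The wiring described above could look roughly like the sketch below. The helper names `is_opf_model` and `with_classifier_bias` are hypothetical and only illustrate the two decisions: which model types route to the OPF converter, and when the config gains `classifier_bias`. The real `convert_weights()` signature is not shown in this PR text, so it is not reproduced here.

```python
# Model types that should be routed to the OPF converter (taken from the PR text).
OPF_MODEL_TYPES = {
    "openai-privacy-filter",
    "privacy-filter-nemotron",
    "nemotron-privacy-filter",
}


def is_opf_model(model_type: str) -> bool:
    """Hypothetical dispatch check for the OPF conversion path."""
    return model_type in OPF_MODEL_TYPES


def with_classifier_bias(config: dict, hf_keys) -> dict:
    """Return a copy of the config with classifier_bias=True if score.bias exists.

    The input config is left unmodified.
    """
    updated = dict(config)
    if "score.bias" in hf_keys:
        updated["classifier_bias"] = True
    return updated


if __name__ == "__main__":
    print(is_opf_model("privacy-filter-nemotron"))
    print(with_classifier_bias({}, ["score.weight", "score.bias"]))
```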

Testing

Verified on OpenMed/privacy-filter-nemotron — model now loads and runs inference correctly:

pipe = create_mlx_pipeline("OpenMed/privacy-filter-nemotron")
result = pipe("Patient John Smith DOB 01/01/1980")
# Returns: first_name=John, last_name=Smith, date=01/01/1980

🤖 Generated with Claude Code

…) architecture

The openai_privacy_filter model family (including privacy-filter-nemotron)
uses a different weight namespace than other OpenMed MLX models. Without
explicit conversion, all 140 parameters are rejected as "not in model".

Changes:
- Add _convert_opf_weights() to handle the HF → MLX namespace mapping:
  - score.* → unembedding.*
  - model.layers.N.* → block.N.*
  - Separate q/k/v_proj → fused attn.qkv (QKV fusion via concatenation)
  - input_layernorm.weight → attn.norm.scale (RMSNorm rename)
  - mlp.router.* → mlp.gate.*
  - mlp.experts.gate_up_proj → mlp.swiglu.weight (no transpose — HF stores [E,in,out])
  - mlp.experts.down_proj → mlp.out.weight (no transpose)
- Set classifier_bias=True in config when score.bias is present in the
  HF state dict, so the MLX model allocates the unembedding bias parameter.
- Wire _convert_opf_weights() into convert_weights() for model_type values
  "openai-privacy-filter", "privacy-filter-nemotron", "nemotron-privacy-filter".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@maziyarpanahi maziyarpanahi self-requested a review May 25, 2026 21:10
@maziyarpanahi

Owner

Thanks again for chasing this down. I pushed one follow-up commit onto this PR so it now carries the PR #57 fix plus the OPF/Nemotron remapping in one place.

What changed in the follow-up:

  • BF16 tensors are now cast to explicit float32 before .numpy(). This is still needed because PyTorch CPU cannot expose torch.bfloat16 tensors through NumPy. I used t.float() rather than t.to(float), because Python float promotes to float64 in PyTorch and would unnecessarily increase memory/artifact size.
  • The OPF/privacy-filter converter now validates the converted MLX key set and tensor shapes before returning weights. That means a partial mapping, for example missing one Q/K/V tensor, fails early instead of producing a broken artifact.
  • Added focused regression tests for BF16 conversion, OPF HF→MLX remapping, QKV fusion order, and partial-QKV rejection.
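As an illustration of the fail-early validation mentioned above, a minimal sketch might compare the converted key set and shapes against what the MLX model expects. This is an assumption about the general shape of such a check, not the follow-up commit's code; `validate_converted` and both of its arguments are hypothetical, and shapes are passed as plain tuples to keep the example dependency-free.

```python
def validate_converted(converted_shapes: dict, expected_shapes: dict) -> None:
    """Raise ValueError if the key sets differ or any tensor shape differs.

    Both arguments map parameter name -> shape tuple (hypothetical layout).
    """
    missing = sorted(set(expected_shapes) - set(converted_shapes))
    unexpected = sorted(set(converted_shapes) - set(expected_shapes))
    if missing or unexpected:
        raise ValueError(
            f"converted key set mismatch: missing={missing}, unexpected={unexpected}"
        )
    for name, expected in expected_shapes.items():
        actual = tuple(converted_shapes[name])
        if actual != tuple(expected):
            raise ValueError(
                f"shape mismatch for {name}: got {actual}, expected {tuple(expected)}"
            )


if __name__ == "__main__":
    expected = {"unembedding.weight": (4, 8), "block.0.attn.norm.scale": (8,)}
    validate_converted(dict(expected), expected)  # passes silently
    try:
        validate_converted({"unembedding.weight": (4, 8)}, expected)
    except ValueError as err:
        print("rejected:", err)
```

Checking keys before shapes means a partial mapping is reported as a missing parameter rather than as a confusing shape error further down.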

The reason this failed for local conversion but not for the existing OpenMed demo is that the demo uses the already-exported OpenMed/privacy-filter-nemotron-mlx artifact. That artifact is already in the OpenMed MLX runtime layout. The broken path was only raw HF → MLX conversion from OpenMed/privacy-filter-nemotron, where the source checkpoint uses model.layers.*, score.*, separate q/k/v_proj, RMSNorm weight, and MoE expert tensors.
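For the separate q/k/v_proj tensors in that raw HF checkpoint, the fusion step can be sketched as a concatenation that refuses partial input. Several things here are assumptions rather than facts from the PR: the `fuse_qkv` name, the `self_attn` key prefix, the Q, K, V concatenation order, and the `[out_features, in_features]` layout of each projection (the usual PyTorch linear layout).

```python
import numpy as np


def fuse_qkv(layer_weights: dict, prefix: str) -> np.ndarray:
    """Concatenate q/k/v projection weights along the output axis.

    Assumes each weight is [out_features, in_features] and that the fused
    layout is Q, then K, then V. Raises KeyError if any of the three is missing.
    """
    parts = []
    for name in ("q_proj", "k_proj", "v_proj"):
        key = f"{prefix}.{name}.weight"
        if key not in layer_weights:
            raise KeyError(f"cannot fuse QKV: missing {key}")
        parts.append(layer_weights[key])
    return np.concatenate(parts, axis=0)


if __name__ == "__main__":
    prefix = "model.layers.0.self_attn"  # hypothetical key prefix
    weights = {
        f"{prefix}.q_proj.weight": np.full((4, 8), 1.0),
        f"{prefix}.k_proj.weight": np.full((2, 8), 2.0),
        f"{prefix}.v_proj.weight": np.full((2, 8), 3.0),
    }
    print(fuse_qkv(weights, prefix).shape)
```

Filling each tensor with a distinct constant, as in the usage above, is a cheap way to test that the fused matrix keeps the intended Q/K/V order.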

I also ran the full suite locally on the updated PR branch: 1153 passed, 1 skipped.

@maziyarpanahi maziyarpanahi self-assigned this May 25, 2026
@maziyarpanahi maziyarpanahi merged commit ace2598 into maziyarpanahi:master May 27, 2026
13 checks passed
maziyarpanahi added a commit that referenced this pull request May 27, 2026
fix: add MLX weight remapping for openai_privacy_filter / nemotron architecture