fix: add MLX weight remapping for openai_privacy_filter / nemotron architecture by kswanjitsu · Pull Request #58 · maziyarpanahi/openmed

kswanjitsu · 2026-05-25T14:26:40Z

Problem

The openai_privacy_filter model family (e.g. OpenMed/privacy-filter-nemotron) cannot be loaded via create_mlx_pipeline because its HuggingFace weight namespace does not match the MLX model's namespace. All 140 parameters are rejected with ValueError: Received 140 parameters not in model.

Root cause

The existing remap_key() function handles BERT/RoBERTa/DeBERTa/DistilBERT/ELECTRA architectures but has no case for openai_privacy_filter. The OPF architecture differs from all others:

Different top-level prefix (score.* → unembedding.*)
Different layer container (model.layers.N.* → block.N.*)
QKV fusion: HF has separate q/k/v_proj, MLX has single attn.qkv
Different sub-key names (input_layernorm → attn.norm, router → gate)
RMSNorm uses scale not weight
Expert weights already in [E, in, out] format (no transpose needed)
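
As a rough illustration of the pure renames listed above (QKV fusion aside), a key-level remap could look like the sketch below. The function name remap_opf_key and the pass-through behaviour for unrecognised keys are assumptions made for this example; they are not taken from the PR's _convert_opf_weights() implementation, which likely handles more sub-keys.

```python
import re

# Matches HF layer keys such as "model.layers.3.input_layernorm.weight".
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)$")


def remap_opf_key(hf_key: str) -> str:
    """Hypothetical helper: rename one HF key to the MLX namespace."""
    # Top-level classifier head: score.* -> unembedding.*
    if hf_key.startswith("score."):
        return "unembedding." + hf_key[len("score."):]

    match = _LAYER_RE.match(hf_key)
    if match is None:
        # Assumption: keys outside the known patterns pass through unchanged.
        return hf_key

    layer_idx, rest = match.groups()
    if rest == "input_layernorm.weight":
        # RMSNorm parameter is called "scale" on the MLX side.
        rest = "attn.norm.scale"
    elif rest.startswith("mlp.router."):
        rest = "mlp.gate." + rest[len("mlp.router."):]

    # Layer container: model.layers.N.* -> block.N.*
    return f"block.{layer_idx}.{rest}"
```

Keeping each rename as a small, explicit branch makes it straightforward to unit-test the renames one at a time.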

Fix

Add _convert_opf_weights() and wire into convert_weights() for model types openai-privacy-filter, privacy-filter-nemotron, nemotron-privacy-filter. Also set classifier_bias: True in the MLX config when score.bias is present.
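
A minimal sketch of the two decisions described in the fix, assuming the config and state dict are plain dictionaries. The names OPF_MODEL_TYPES, is_opf_model, and build_mlx_config are hypothetical and exist only for this example; the actual wiring inside convert_weights() may differ.

```python
# Model types that should be routed to the OPF converter (values from the PR text).
OPF_MODEL_TYPES = frozenset(
    {
        "openai-privacy-filter",
        "privacy-filter-nemotron",
        "nemotron-privacy-filter",
    }
)


def is_opf_model(model_type: str) -> bool:
    """Hypothetical check for whether the OPF converter should be used."""
    return model_type in OPF_MODEL_TYPES


def build_mlx_config(hf_config: dict, hf_state_dict: dict) -> dict:
    """Hypothetical helper: copy the config and flag the classifier bias."""
    mlx_config = dict(hf_config)
    # Only request a bias parameter when the checkpoint actually ships one,
    # so the MLX model allocates a matching unembedding bias.
    if "score.bias" in hf_state_dict:
        mlx_config["classifier_bias"] = True
    return mlx_config
```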

Testing

Verified on OpenMed/privacy-filter-nemotron — model now loads and runs inference correctly:

pipe = create_mlx_pipeline("OpenMed/privacy-filter-nemotron")
result = pipe("Patient John Smith DOB 01/01/1980")
# Returns: first_name=John, last_name=Smith, date=01/01/1980

🤖 Generated with Claude Code

…) architecture

The openai_privacy_filter model family (including privacy-filter-nemotron) uses a different weight namespace than other OpenMed MLX models. Without explicit conversion, all 140 parameters are rejected as "not in model".

Changes:
- Add _convert_opf_weights() to handle the HF → MLX namespace mapping:
  - score.* → unembedding.*
  - model.layers.N.* → block.N.*
  - Separate q/k/v_proj → fused attn.qkv (QKV fusion via concatenation)
  - input_layernorm.weight → attn.norm.scale (RMSNorm rename)
  - mlp.router.* → mlp.gate.*
  - mlp.experts.gate_up_proj → mlp.swiglu.weight (no transpose; HF stores [E,in,out])
  - mlp.experts.down_proj → mlp.out.weight (no transpose)
- Set classifier_bias=True in config when score.bias is present in the HF state dict, so the MLX model allocates the unembedding bias parameter.
- Wire _convert_opf_weights() into convert_weights() for model_type values "openai-privacy-filter", "privacy-filter-nemotron", "nemotron-privacy-filter".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
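
To make the "QKV fusion via concatenation" step concrete, here is a small NumPy sketch. It assumes the projections are stacked in q, k, v order along the output (first) axis; both the order and the axis are assumptions for illustration, and fuse_qkv is a hypothetical name rather than a function from the PR.

```python
import numpy as np


def fuse_qkv(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stack separate q/k/v tensors into one fused tensor.

    Works for 2-D weights of shape [out, in] and for 1-D biases of shape [out].
    """
    # All trailing dimensions (the input dimension for weights) must agree.
    if not (q.shape[1:] == k.shape[1:] == v.shape[1:]):
        raise ValueError(
            f"incompatible q/k/v shapes: {q.shape}, {k.shape}, {v.shape}"
        )
    # Assumed order: q first, then k, then v, along the output axis.
    return np.concatenate([q, k, v], axis=0)
```

The k and v projections may have fewer output rows than q (for example with grouped-query attention), which is why only the trailing dimensions are compared.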

maziyarpanahi · 2026-05-25T21:11:05Z

Thanks again for chasing this down. I pushed one follow-up commit onto this PR so it now carries the PR #57 fix plus the OPF/Nemotron remapping in one place.

What changed in the follow-up:

BF16 tensors are now explicitly cast to float32 before .numpy(). The cast is still needed because NumPy has no bfloat16 dtype, so PyTorch cannot expose torch.bfloat16 tensors through .numpy(). I used t.float() rather than t.to(float) because PyTorch treats the Python float type as torch.float64, which would double the memory and artifact size relative to float32 for no benefit.
The OPF/privacy-filter converter now validates the converted MLX key set and tensor shapes before returning weights. That means a partial mapping, for example missing one Q/K/V tensor, fails early instead of producing a broken artifact.
Added focused regression tests for BF16 conversion, OPF HF→MLX remapping, QKV fusion order, and partial-QKV rejection.
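
The early-failure check on key sets and shapes could be sketched roughly as follows. This is a simplified stand-in that compares dictionaries of shapes; validate_converted and the example key names are hypothetical, and the real validation in the follow-up commit may be structured differently.

```python
def validate_converted(converted_shapes: dict, expected_shapes: dict) -> None:
    """Hypothetical check: raise if converted keys or shapes differ from expected.

    Both arguments map parameter names to shape tuples.
    """
    converted_keys = set(converted_shapes)
    expected_keys = set(expected_shapes)

    missing = sorted(expected_keys - converted_keys)
    unexpected = sorted(converted_keys - expected_keys)
    mismatched = sorted(
        key
        for key in converted_keys & expected_keys
        if tuple(converted_shapes[key]) != tuple(expected_shapes[key])
    )

    if missing or unexpected or mismatched:
        # Fail before anything is written, so no broken artifact is produced.
        raise ValueError(
            f"conversion incomplete: missing={missing}, "
            f"unexpected={unexpected}, shape_mismatch={mismatched}"
        )
```

With a check like this, a mapping that drops one of the Q/K/V tensors shows up as a missing or mis-shaped fused key at conversion time rather than at load time.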

The reason this failed for local conversion but not for the existing OpenMed demo is that the demo uses the already-exported OpenMed/privacy-filter-nemotron-mlx artifact. That artifact is already in the OpenMed MLX runtime layout. The broken path was only raw HF → MLX conversion from OpenMed/privacy-filter-nemotron, where the source checkpoint uses model.layers.*, score.*, separate q/k/v_proj, RMSNorm weight, and MoE expert tensors.

I also ran the full suite locally on the updated PR branch: 1153 passed, 1 skipped.

fix: add MLX weight remapping for openai_privacy_filter / nemotron architecture

maziyarpanahi self-requested a review May 25, 2026 21:10

fix: harden OPF MLX conversion

f8467d9

maziyarpanahi mentioned this pull request May 25, 2026

fix: cast bfloat16 tensors to float32 before numpy conversion in MLX pipeline #57

Closed

maziyarpanahi approved these changes May 25, 2026

maziyarpanahi self-assigned this May 25, 2026

maziyarpanahi merged commit ace2598 into maziyarpanahi:master May 27, 2026
13 checks passed

maziyarpanahi mentioned this pull request May 27, 2026

Harden privacy-filter remote-code allowlist for 1.5.2 #59

Merged

maziyarpanahi added a commit that referenced this pull request May 27, 2026

Merge pull request #58 from kswanjitsu/fix/mlx-opf-weight-remapping

2cf980d

fix: add MLX weight remapping for openai_privacy_filter / nemotron architecture

maziyarpanahi merged 2 commits into maziyarpanahi:master from kswanjitsu:fix/mlx-opf-weight-remapping
