Commit 8093376
feat: Add GLM-4.7-Flash GGUF tensor mapping, MLA attention, and model validation
- TensorNameMapper resolves both llama.cpp (blk.*) and HuggingFace (model.layers.*) naming
- MLA (Multi-Head Latent Attention) with low-rank Q/KV compression (DeepSeek-V2 style)
- Stacked 3D expert tensor support (ffn_gate_exps → per-expert slicing)
- Shared expert + dense layer-0 support (MoeWithShared/Dense/Moe layer types)
- Updated BitNetModelConfig defaults to match GLM-4.7-Flash architecture
- Tensor discovery and model validation harness for GGUF files
- 188 passing tests (14 new)
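
As a rough illustration of the first bullet, the sketch below resolves both naming schemes to one canonical (layer, role) key. It is not the commit's TensorNameMapper: the `resolve` function, the role labels, and the suffix tables are hypothetical, and the tables list only a few MLA projection names as commonly seen in llama.cpp and HuggingFace DeepSeek-V2-style checkpoints.

```python
import re
from typing import Optional, Tuple

# Illustrative subset of tensor suffixes (assumption: the commit's real
# mapping table is not shown and is likely larger).
_GGUF_TO_ROLE = {
    "attn_q_a": "q_a",
    "attn_q_b": "q_b",
    "attn_kv_a_mqa": "kv_a",
    "attn_kv_b": "kv_b",
    "attn_output": "o",
}
_HF_TO_ROLE = {
    "self_attn.q_a_proj": "q_a",
    "self_attn.q_b_proj": "q_b",
    "self_attn.kv_a_proj_with_mqa": "kv_a",
    "self_attn.kv_b_proj": "kv_b",
    "self_attn.o_proj": "o",
}

# blk.<layer>.<suffix>.weight and model.layers.<layer>.<suffix>.weight
_GGUF_RE = re.compile(r"^blk\.(\d+)\.(.+)\.weight$")
_HF_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)\.weight$")


def resolve(name: str) -> Optional[Tuple[int, str]]:
    """Return (layer index, canonical role), or None if the name is unknown."""
    for pattern, table in ((_GGUF_RE, _GGUF_TO_ROLE), (_HF_RE, _HF_TO_ROLE)):
        match = pattern.match(name)
        if match and match.group(2) in table:
            return int(match.group(1)), table[match.group(2)]
    return None
```

With a canonical key, downstream loading code can look tensors up once without caring which exporter produced the file.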
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK1
parent 4370ddb
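
The per-expert slicing named in the commit message can be pictured with the minimal sketch below. It assumes the stacked tensor (for example ffn_gate_exps) has already been dequantized into a flat row-major buffer in which the expert index is the slowest-varying dimension, so each expert's matrix is one contiguous run. The `slice_expert` helper and its parameters are hypothetical and do not come from the commit.

```python
from typing import List, Sequence


def slice_expert(
    flat: Sequence[float],
    n_experts: int,
    rows: int,
    cols: int,
    expert: int,
) -> List[float]:
    """Copy one expert's rows*cols weights out of a stacked 3D tensor.

    Assumes logical shape (n_experts, rows, cols) stored row-major, so
    expert e occupies elements [e*rows*cols, (e+1)*rows*cols).
    """
    per_expert = rows * cols
    if len(flat) != n_experts * per_expert:
        raise ValueError("buffer length does not match the stated shape")
    if not 0 <= expert < n_experts:
        raise IndexError("expert index out of range")
    start = expert * per_expert
    return list(flat[start:start + per_expert])
```

For a toy buffer of 2 experts with 2x3 matrices holding the values 0 through 11, `slice_expert(buf, 2, 2, 3, 1)` returns the last six values. A real loader working on quantized blocks would need to slice on block boundaries instead of element boundaries.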
2 files changed
Lines changed: 1582 additions & 227 deletions