Description
see also: #3444
- In [CausalLM] support gpt-oss-20b #3444, we implemented CausalLM for gpt-oss-20b. It currently runs with `Q4_0-FP32`.
- We aim to optimize its speed and reduce its memory usage on Android.
- last update: 2025-09-15 11:00 KST by ejyang
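For context, `Q4_0` packs 32 weights per block into one shared FP16 scale plus 4-bit codes, and the `-FP32` / `-FP16` suffix refers to the precision of the compute path. Below is a minimal sketch of the block round trip, assuming the widely used GGML-style `Q4_0` layout; the `float` scale stands in for the on-disk FP16 scale, and nntrainer's internal layout may differ:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// GGML-style Q4_0: 32 weights share one scale; each weight is a 4-bit
// code in [0, 15] with an implicit offset of 8 (illustrative sketch).
constexpr int kBlock = 32;

struct BlockQ4_0 {
  float d;         // per-block scale (FP16 in the real on-disk format)
  uint8_t qs[16];  // two 4-bit codes per byte
};

BlockQ4_0 quantize_block(const float *x) {
  // Find the value with the largest magnitude so codes fit in [-8, 7].
  float amax = 0.f, maxv = 0.f;
  for (int i = 0; i < kBlock; ++i)
    if (std::fabs(x[i]) > amax) { amax = std::fabs(x[i]); maxv = x[i]; }
  BlockQ4_0 b{};
  b.d = maxv / -8.f;  // scale chosen so maxv maps to code -8
  const float id = (b.d != 0.f) ? 1.f / b.d : 0.f;
  for (int i = 0; i < kBlock / 2; ++i) {
    auto code = [&](float v) {
      int q = (int)(v * id + 8.5f);  // round and shift into [0, 16)
      return (uint8_t)std::min(q, 15);
    };
    // low nibble = first half of the block, high nibble = second half
    b.qs[i] = code(x[i]) | (code(x[i + kBlock / 2]) << 4);
  }
  return b;
}

void dequantize_block(const BlockQ4_0 &b, float *out) {
  for (int i = 0; i < kBlock / 2; ++i) {
    out[i] = ((b.qs[i] & 0x0F) - 8) * b.d;
    out[i + kBlock / 2] = ((b.qs[i] >> 4) - 8) * b.d;
  }
}
```

The per-element reconstruction error is bounded by the block scale `|d|`, which is why a single outlier in a block degrades every other weight sharing that scale.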
WTD
- [Feature] Activate `Q4_0-FP16` support in nntrainer. (08cc7ba)
- [Feature] Support `Q4_0` type in the embedding layer. (b2faa4d, [CausalLM] enable Q4_0 in Embedding Layer #3476)
- [Op] FP16 compute functions
- [Feature] Create a `Q4_0-FP16` weight bin file for gpt-oss-20b
- [code_clean] Update `moe-cached.cpp` to inherit from `moe.cpp`
- [Test] `Q4_0-FP16` for qwen3-30b / gpt-oss-20b
- [Improvement] `moe-cached` layer profiling
- [Test] Apply KV-cache loading ([CausalLM] Sync gpt-oss with kv-cache-update #3490)
- [Improvement] Update gpt-moe to process experts with MM for prefill ([CausalLM] updeate gpt-moe-cached to support MM #3478)
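The last item above, processing experts with MM during prefill, amounts to grouping prefill tokens by their routed expert so each expert handles all of its tokens in one batched multiply instead of one GEMV per token. A sketch of that grouping, with naive loops standing in for the real GEMM kernel; the function and parameter names are illustrative, not nntrainer's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out[t] = W[e] * x[t] for every token t routed to expert e.
// x:  tokens, each of size in_dim
// W:  one row-major [out_dim x in_dim] weight matrix per expert
void moe_prefill_mm(const std::vector<std::vector<float>> &x,
                    const std::vector<int> &route,  // token -> expert id
                    const std::vector<std::vector<float>> &W,
                    size_t in_dim, size_t out_dim,
                    std::vector<std::vector<float>> &out) {
  // 1) Group token indices by expert.
  std::vector<std::vector<int>> groups(W.size());
  for (int t = 0; t < (int)route.size(); ++t)
    groups[route[t]].push_back(t);
  // 2) Process each expert's group in one pass. In practice the group's
  //    rows would be packed into a matrix and handed to a single GEMM
  //    call, rather than looped over as done here for clarity.
  for (size_t e = 0; e < W.size(); ++e) {
    for (int t : groups[e]) {
      for (size_t o = 0; o < out_dim; ++o) {
        float acc = 0.f;
        for (size_t i = 0; i < in_dim; ++i)
          acc += W[e][o * in_dim + i] * x[t][i];
        out[t][o] = acc;
      }
    }
  }
}
```

Grouping matters because a GEMM over many tokens amortizes the cost of loading (and, for `Q4_0`, dequantizing) each expert's weights once per prefill instead of once per token.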
Log
- (2025-09-10) `Q4_0` embedding layer is not valid (accuracy drop is significant)