[CausalLM] optimize gpt-oss-20b performance #3466

@EunjuYang

Description

See also: #3444

  • In "[CausalLM] support gpt-oss-20b" (#3444), we implemented CausalLM for gpt-oss-20b.
  • It currently runs with Q4_0-FP32.
  • We aim to improve its inference speed and reduce its memory usage on Android.
  • Last update: 2025-09-15 11:00 KST by ejyang

WTD

Log

  • (2025-09-10) Quantizing the embedding layer to Q4_0 is not viable: the accuracy drop is significant.
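
The log entry above does not say how the accuracy drop was measured. As a rough way to reason about it, the sketch below round-trips a list of floats through a Q4_0-style quantizer and reports the reconstruction error. It assumes the ggml-style Q4_0 layout (blocks of 32 values, one scale per block, 4-bit codes offset by 8), which may differ from the kernel actually used here. The names `q4_0_roundtrip` and `rms_error` are hypothetical helpers, not part of the project.

```python
BLOCK = 32  # assumed block size (ggml-style Q4_0), not confirmed by the issue


def q4_0_roundtrip(values):
    """Quantize to 4-bit codes with one scale per block, then dequantize."""
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        # Element with the largest magnitude, keeping its sign.
        extreme = max(block, key=abs)
        # The extreme element maps to code 0, i.e. (0 - 8) * scale.
        scale = extreme / -8.0
        if scale == 0.0:
            # All-zero block: nothing to quantize.
            out.extend(0.0 for _ in block)
            continue
        for x in block:
            # Round to the nearest code and clamp to the 4-bit range [0, 15].
            code = min(15, max(0, int(x / scale + 8.5)))
            out.append((code - 8) * scale)
    return out


def rms_error(original, restored):
    """Root-mean-square difference between two equally long sequences."""
    total = sum((a - b) ** 2 for a, b in zip(original, restored))
    return (total / len(original)) ** 0.5
```

Running `rms_error(row, q4_0_roundtrip(row))` over a few embedding rows gives a quick per-row error estimate. Because each block shares one scale, a single large value in a block coarsens the step size for the other 31 values. That is one plausible reason an embedding table could be sensitive to Q4_0, though the issue does not confirm it.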
