Optimization in Q8_0 loading #74

Merged
mikepapadim merged 5 commits into beehive-lab:main from orionpapadakis:opt/q8_0-loading on Nov 28, 2025

@orionpapadakis (Collaborator) commented Nov 27, 2025

This PR speeds up loading of Q8_0 models by parallelizing the tensor copy from MemorySegment to TornadoNativeArray types. It also avoids one round of conversion.

Average loading times for Llama3.2-1B-Q8_0 on a ROG laptop:

serial: ~4600 ms
parallel: ~1900 ms (~60% improvement)
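
The description above does not include the code, so what follows is only a minimal sketch of the general idea: split the source MemorySegment into chunks and copy them concurrently. It copies into a second heap-backed MemorySegment as a stand-in for a TornadoNativeArray. The class name, method names, and chunk size are hypothetical and not taken from this PR, and the sketch assumes Java 22 or later, where `java.lang.foreign` is a final API.

```java
import java.lang.foreign.MemorySegment;
import java.util.Arrays;
import java.util.stream.LongStream;

// Hypothetical illustration; not the code merged in this PR.
public class ParallelSegmentCopy {

    /** Baseline: one bulk copy on the calling thread. */
    public static void serialCopy(MemorySegment src, MemorySegment dst) {
        MemorySegment.copy(src, 0, dst, 0, src.byteSize());
    }

    /**
     * Splits the source byte range into fixed-size chunks and copies each
     * chunk on the common fork-join pool. Chunks never overlap, so no
     * synchronization between workers is needed.
     */
    public static void parallelCopy(MemorySegment src, MemorySegment dst, long chunkBytes) {
        long total = src.byteSize();
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        if (dst.byteSize() < total) {
            throw new IllegalArgumentException("destination is smaller than source");
        }
        long chunks = (total + chunkBytes - 1) / chunkBytes; // ceiling division
        LongStream.range(0, chunks).parallel().forEach(i -> {
            long offset = i * chunkBytes;
            long length = Math.min(chunkBytes, total - offset); // last chunk may be short
            MemorySegment.copy(src, offset, dst, offset, length);
        });
    }

    public static void main(String[] args) {
        // Fake "tensor" bytes; heap segments can be accessed from any thread.
        byte[] weights = new byte[1 << 20];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = (byte) (i * 31);
        }
        byte[] out = new byte[weights.length];

        parallelCopy(MemorySegment.ofArray(weights), MemorySegment.ofArray(out), 64 * 1024);
        System.out.println("copies match: " + Arrays.equals(weights, out));
    }
}
```

Whether chunked copying helps in practice depends on tensor sizes, chunk size, and memory bandwidth, so the timings reported above should be taken from the PR itself rather than inferred from this sketch.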

@mikepapadim (Member)

/rerun all

@github-actions (Contributor)

🚀 Workflow rerun started

Mode: all
Triggered by: @mikepapadim

View Actions

@github-actions (Contributor)

Workflow rerun success

View Actions

@mikepapadim (Member)

/rerun help

@github-actions (Contributor)

🔄 Rerun Workflow Commands

| Command | Description |
| --- | --- |
| `/rerun` | Rerun only failed/cancelled/timed-out workflows |
| `/rerun all` | Rerun all workflows for this PR |
| `/rerun failed` | Same as `/rerun` |
| `/rerun <name>` | Rerun workflows matching `<name>` (e.g. `/rerun ci`, `/rerun build`) |
| `/rerun help` | Show this help message |

Note: Only completed workflows can be rerun. In-progress workflows are skipped.

@mikepapadim merged commit 6fb2e83 into beehive-lab:main on Nov 28, 2025
4 checks passed

2 participants