Optimization in Q8_0 loading by orionpapadakis · Pull Request #74 · beehive-lab/GPULlama3.java

orionpapadakis · 2025-11-27T17:56:56Z

This PR speeds up loading of Q8_0 models by parallelizing the tensor copy from MemorySegment to TornadoNativeArray types. It also avoids one round of conversion.

Average loading times for Llama3.2-1B-Q8_0 on a ROG laptop:

serial: ~4600 ms
parallel: ~1900 ms (~60% less load time, roughly 2.4x faster)
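
The parallel copy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: a plain `byte[]` stands in for the `MemorySegment`, plain `short[]` and `byte[]` arrays stand in for the TornadoNativeArray targets, and the class and method names (`ParallelQ8Copy`, `copyBlock`, `copySerial`, `copyParallel`) are hypothetical. It relies only on the GGML Q8_0 layout, in which each block is a 2-byte fp16 scale followed by 32 int8 quants.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

public class ParallelQ8Copy {
    // GGML Q8_0 layout: 2-byte fp16 scale, then 32 int8 quantized values per block.
    static final int BLOCK_SIZE = 32;
    static final int BYTES_PER_BLOCK = 2 + BLOCK_SIZE;

    // Copies one block: reads the little-endian scale bits and the 32 quants.
    static void copyBlock(byte[] src, int b, short[] scales, byte[] quants) {
        int base = b * BYTES_PER_BLOCK;
        scales[b] = (short) ((src[base] & 0xFF) | (src[base + 1] << 8));
        System.arraycopy(src, base + 2, quants, b * BLOCK_SIZE, BLOCK_SIZE);
    }

    // Baseline: one thread walks all blocks in order.
    static void copySerial(byte[] src, short[] scales, byte[] quants) {
        for (int b = 0; b < scales.length; b++) {
            copyBlock(src, b, scales, quants);
        }
    }

    // Parallel variant: each block writes a disjoint slice of the outputs,
    // so the tasks need no synchronization.
    static void copyParallel(byte[] src, short[] scales, byte[] quants) {
        IntStream.range(0, scales.length)
                 .parallel()
                 .forEach(b -> copyBlock(src, b, scales, quants));
    }

    public static void main(String[] args) {
        int numBlocks = 1024;
        byte[] raw = new byte[numBlocks * BYTES_PER_BLOCK];
        new Random(42).nextBytes(raw); // synthetic data, not a real model file

        short[] s1 = new short[numBlocks];
        byte[] q1 = new byte[numBlocks * BLOCK_SIZE];
        short[] s2 = new short[numBlocks];
        byte[] q2 = new byte[numBlocks * BLOCK_SIZE];

        copySerial(raw, s1, q1);
        copyParallel(raw, s2, q2);

        boolean same = Arrays.equals(s1, s2) && Arrays.equals(q1, q2);
        System.out.println("parallel copy matches serial copy: " + same);
    }
}
```

Splitting the work per block keeps every write in its own output range, which is what makes a parallel stream safe here. The speedup obtained this way depends on core count and memory bandwidth, so the figures reported above should not be expected from this sketch.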

mikepapadim · 2025-11-27T20:41:22Z

/rerun all

github-actions · 2025-11-27T20:41:31Z

🚀 Workflow rerun started

Mode: all
Triggered by: @mikepapadim

github-actions · 2025-11-27T20:41:33Z

✅ Workflow rerun success

mikepapadim · 2025-11-27T20:47:53Z

/rerun help

github-actions · 2025-11-27T20:48:00Z

🔄 Rerun Workflow Commands

| Command | Description |
| --- | --- |
| `/rerun` | Rerun only failed/cancelled/timed-out workflows |
| `/rerun all` | Rerun all workflows for this PR |
| `/rerun failed` | Same as `/rerun` |
| `/rerun <name>` | Rerun workflows matching `<name>` (e.g. `/rerun ci`, `/rerun build`) |
| `/rerun help` | Show this help message |

Note: Only completed workflows can be rerun. In-progress workflows are skipped.

mikepapadim and others added 4 commits November 27, 2025 11:44

[CI] Add complete CI testing for all supported models & quant types

cfe367e

Introduce createAsFP32 method in Q8_0TornadoTensor to encapsulate FP32 conversion logic.

11ea161

Rename Q8_0TornadoTensor.create to Q8_0TornadoTensor.createAsQ8_0 for consistency

6702382

Optimize Q8_0 tensor loading with parallel streams and loop unrolling.

d74991f
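
The commits above mention an FP32 conversion path and loop unrolling. As a rough illustration of what dequantizing Q8_0 data to FP32 with a manually unrolled inner loop can look like, here is a minimal sketch. It does not reproduce `createAsFP32` or the PR's actual loop: the class `Q8Dequant` and its methods are hypothetical, plain arrays replace the Tornado types, and the fp16 decoding is written by hand so the example has no dependencies. The only fact it relies on is that a Q8_0 value is its int8 quant multiplied by the block's fp16 scale.

```java
public class Q8Dequant {
    static final int BLOCK_SIZE = 32; // values per Q8_0 block

    // Decodes IEEE 754 half-precision bits into a float.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        if (exp == 0) {
            // Zero or subnormal: value is mant * 2^-24.
            float v = mant * 0x1p-24f;
            return sign != 0 ? -v : v;
        }
        if (exp == 31) {
            // Infinity or NaN.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        // Normal number: rebias the exponent from 15 to 127.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    // Dequantizes every block; the inner loop is unrolled by 4,
    // which divides BLOCK_SIZE evenly so no remainder loop is needed.
    static float[] dequantize(short[] scales, byte[] quants) {
        float[] out = new float[quants.length];
        for (int b = 0; b < scales.length; b++) {
            float d = halfToFloat(scales[b]);
            int base = b * BLOCK_SIZE;
            for (int i = 0; i < BLOCK_SIZE; i += 4) {
                out[base + i]     = quants[base + i]     * d;
                out[base + i + 1] = quants[base + i + 1] * d;
                out[base + i + 2] = quants[base + i + 2] * d;
                out[base + i + 3] = quants[base + i + 3] * d;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        short[] scales = {0x3800};          // fp16 bits for 0.5
        byte[] quants = new byte[BLOCK_SIZE];
        quants[0] = -3;
        quants[31] = 4;
        float[] values = dequantize(scales, quants);
        System.out.println(values[0] + " " + values[31]);
    }
}
```

Whether manual unrolling helps in practice depends on the JIT compiler, which may already unroll a simple loop like this one, so any gain would need to be measured.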

orionpapadakis requested review from mairooni and mikepapadim November 27, 2025 17:56

mikepapadim force-pushed the main branch from b2c1c1c to 51c52bc November 27, 2025 20:52

Rebase on latest ci changes

c73a2e8

mikepapadim merged commit 6fb2e83 into beehive-lab:main Nov 28, 2025
4 checks passed

mikepapadim merged 5 commits into beehive-lab:main from orionpapadakis:opt/q8_0-loading

2 participants
