gpu #47

behinger · 2024-02-22T21:29:15Z

Update typings to be more type-stable
Reduce memory allocation and implement other performance improvements
add array dimensions
re-enable multi threading
make Calculate_Q type stable
use AppleAccelerate in certain places
add intelvectormath
minor
more optimized methods
minor
log
fixes, make fp local
update project.toml
precalculate y^rho
things are faster now
optimize more sums
fix sphering LL
speed improvement from 272µs to 640ns
insane additional 100x speed improvement in calculate_y
this should work, but haven't tested - oops
added missing broadcast to first minus, 340 to 230 µs
actually 10% slower, but fewer allocs. try using it again later?
15% or so improvement, much better on allocations
tried to pull out generation of Q
improved y_rho, removed experimental exp/pow
1.46s/it vs. 0.6s/it
finally works again....
rearrange indices from [n, N, m] to [m, n, N] and implement intel powx
powx works
handling for PCs without Intel vector math (Apple silicon)
unsure
Move some things to the Amica object, format all files
0.81 s/iter
minor
0.57 s/iter
minor
ivm.abs
replace pinv(x)*x with ldiv!()
sum -> loop
one more sum to loop
fix float32
fix float32
supposedly now whitening does something?
better testcases maybe
added GPU support

v-morlock and others added 30 commits November 12, 2023 21:25

Update typings to be more type-stable

d9baba2

Reduce memory allocation and implement other performance improvements

63af6ed

add array dimensions

38372ae

re-enable multi threading

fcfc558

make Calculate_Q type stable

0278c77

use AppleAccelerate in certain places

ac01fe8

add intelvectormath

de16e11

minor

ecbda77

more optimized methods

6f5827a

minor

4cb0ae9

log

8958e6b

add benchmark results

4455096

fixes, make fp local

f41b969

update project.toml

51dee61

precalculate y^rho

3a34caa

things are faster now

9efe43a

optimize more sums

c84e8b2

fix sphering LL

9e607e6

speed improvement from 272µs to 640ns

9df37ae

insane additional 100x speed improvement in calculate_y

3629500

this should work, but haven't tested - oops

16a39f6

added missing broadcast to first minus, 340 to 230 µs

df72b27

actually 10% slower, but fewer allocs. try using it again later?

fe60308

15% or so improvement, much better on allocations

3bb8a13

tried to pull out generation of Q

fc2c85d

merged the u changes, maybe not optimal, but ok

6e41c09

improved y_rho, removed experimental exp/pow

c8ba5fb

1.46s/it vs. 0.6s/it

abab784

finally works again....

0747103

rearrange indices from [n, N, m] to [m, n, N] and implement intel powx

07eb2a4

v-morlock and others added 18 commits December 30, 2023 23:23

powx works

7ce5a07

handling for PCs without Intel vector math (Apple silicon)

c86b416

unsure

884b96a

Merge branch 'loopiloop_outsourceQ' into perf3

2353207

Move some things to the Amica object, format all files

63a6b5c

0.81 s/iter

968dd9e

minor

7711464

0.57 s/iter

5a4494d

minor

9f49c41

ivm.abs

6b8676c

replace pinv(x)*x with ldiv!()

02c1c23

sum -> loop

77a99f2

one more sum to loop

c99d809

fix float32

cb52bac

fix float32

9bab7a3

supposedly now whitening does something?

480b5ba

better testcases maybe

9ab9b5f

added GPU support

d2e837b


3 participants
