Conversation

@behinger
Member

@behinger behinger commented Dec 19, 2023

I added all my improvements here.

On one test, the time per iteration dropped from 0.17s to 0.12s (Fortran: 0.3/0.4).

I removed all the Intel optimizations. I don't know whether they would help even further, but I wanted to focus on other things first.

The code is not fully type-stable yet, but it has improved quite a bit. For example, none of the learning-rate code is type-stable, which afaik makes "B" type-unstable as well.
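To illustrate how an unstable learning rate can leak into something like "B", here is a minimal sketch. It is not code from this PR: `LooseParams`, `TightParams`, and `update_B` are hypothetical names, and the update rule is made up for the example. The only point is that a field with an abstract type makes everything computed from it hard to infer.

```julia
# Hypothetical sketch, not the package's actual types or update rule.

# Abstract field type: the compiler only knows `lrate` is some Real.
struct LooseParams
    lrate::Real
end

# Parametric field type: `lrate` has a concrete type per instance.
struct TightParams{T<:Real}
    lrate::T
end

# Made-up update step; the result type depends on the type of `p.lrate`.
update_B(p, A::Matrix{Float64}) = A .- p.lrate .* A

# Ask the compiler which return type it infers for each parameter struct.
loose_type = first(Base.return_types(update_B, (LooseParams, Matrix{Float64})))
tight_type = first(Base.return_types(update_B, (TightParams{Float64}, Matrix{Float64})))

println("LooseParams infers a concrete type: ", isconcretetype(loose_type))
println("TightParams infers a concrete type: ", isconcretetype(tight_type))
```

Interactively, `@code_warntype update_B(LooseParams(0.5), ones(2, 2))` should highlight the same problem.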

I mainly looked at the first loop, not the update-parameter loop.

  • Update typings to be more type-stable
  • Reduce memory allocation and implement other performance improvements
  • add array dimensions
  • re-enable multi threading
  • make Calculate_Q type stable
  • use AppleAccelerate in certain places
  • add intelvectormath
  • minor
  • more optimized methods
  • minor
  • log
  • overflow fixed, debug statements added, comparison to fortran/matlab semi-automated
  • fixes, make fp local
  • update project.toml
  • precalculate y^rho
  • things are faster now
  • optimize more sums
  • fix sphering LL
  • speed improvement from 272µs to 640ns
  • insane additional 100x speed improvement in calculate_y
  • this should work, but haven't tested - oops
  • added missing broadcast to first minus, 340 to 230 µs
  • actually 10% slower, but less allocs. try later again to use?
  • 15% or so improvement, much better on allocations
  • tried to pull out generation of Q
  • improved y_rho, removed experimental exp/pow
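Several of the commits above are about allocations and broadcasting (e.g. precalculating y^rho and the missing broadcast on a minus). As a rough sketch of the general pattern, and not the actual code in this PR, the example below writes into a preallocated buffer with fully dotted expressions. The names `pow_abs!`, `residual!`, `out`, `y`, and `rho` are assumptions made up for illustration.

```julia
# Hypothetical sketch of the allocation pattern, not the package's code.

# Allocating version: every call creates a new array for the result.
pow_abs_alloc(y, rho) = abs.(y) .^ rho

# In-place version: all dotted calls fuse into one loop that writes into `out`.
function pow_abs!(out, y, rho)
    out .= abs.(y) .^ rho
    return out
end

# A missing dot breaks the fusion: `a - b .* c` first materializes `b .* c`
# as a temporary array, while `a .- b .* c` assigned with `.=` does not.
function residual!(out, a, b, c)
    out .= a .- b .* c
    return out
end

y = [-2.0, 0.5, 3.0]
out = similar(y)          # allocate the buffer once, reuse it every iteration
pow_abs!(out, y, 2.0)
```

Whether the in-place form is actually faster depends on the call site, so it is worth measuring each change, for example with `@allocated` or with `@btime` from BenchmarkTools.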

@behinger behinger changed the base branch from main to performance-improvements December 19, 2023 22:13
@behinger behinger changed the base branch from performance-improvements to performance-improvements-fixLL December 19, 2023 22:14
@behinger
Member Author

For a large problem I get ~12s/iteration in Julia vs. 3s in Fortran, so Fortran is still faster by a factor of 4 or so.
