@muhammadalhroob
Contributor

@muhammadalhroob muhammadalhroob commented Dec 3, 2025

benchVector.C

This pull request:

  • Reorder hot members (fElements, fNrows, fRowLwb) to the front for better spatial locality
  • Add alignas(16) on fDataStack
  • Replace enum {} constants with static constexpr for modern C++

Performance: element access is 30-35% faster on heap-allocated vectors (50 to 10k elements).
No regressions on the small-vector (stack) path or in correctness.
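
For readers unfamiliar with the constructs named in the list above, here is a minimal sketch of what `alignas(16)` on a stack buffer and an `enum {}` to `static constexpr` conversion look like. It is not the TVectorT code: `SketchVector`, the buffer size of 5, and the member order are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector with a small inline buffer.
// This is NOT ROOT's TVectorT; names and sizes are illustrative assumptions.
template <typename Element>
struct SketchVector {
   // Old style would be: enum { kSizeMax = 5 };
   // A typed compile-time constant expresses the same thing.
   static constexpr int kSizeMax = 5;

   int      fNrows    = 0;       // number of elements
   int      fRowLwb   = 0;       // lower bound of the index range
   Element *fElements = nullptr; // would point at fDataStack or at heap storage

   // Request 16-byte alignment for the inline buffer.
   alignas(16) Element fDataStack[kSizeMax] = {};
};

// The alignment request propagates to the enclosing type.
static_assert(alignof(SketchVector<double>) >= 16, "type must be at least 16-byte aligned");
static_assert(offsetof(SketchVector<double>, fDataStack) % 16 == 0,
              "inline buffer must start on a 16-byte boundary");
```

The `static_assert` lines check the layout at compile time, so a change in member order that breaks the alignment assumption fails the build rather than showing up later in a benchmark.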

Changes or fixes:

Checklist:

  • [*] tested changes locally
  • updated the docs (if necessary)

@guitargeek
Contributor

Hi @muhammadalhroob, could you please rebase your new commit on the current master, where your previous PR was merged?

Also, just out of curiosity: what made you start working on TMatrix/TVector performance optimization? Do you have a concrete use case that you need to speed up? We're always interested in hearing how people are using ROOT!

@ferdymercury
Collaborator

Not sure whether reordering class members requires bumping the class version?

ClassDefOverride(TVectorT, 4) // Template of Vector class

@pcanal does reordering or "alignas" modifiers require it?

@pcanal
Member

pcanal commented Dec 3, 2025

Reorder hot members (fElements, fNrows, fRowLwb) to the front for better spatial locality

Unfortunately, fNrows needs to be listed before fElements, as it is used as the source of size information when reading back fElements.

Not sure whether reordering class members requires bumping the class version?

It does, as it changes the order in which the data members are stored on file.

does "alignas" modifiers require it?

alignas does not change the format on file, so it does not require a version bump.
Also, changes to transient members do not require a version bump.
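
To illustrate why the size has to come before the elements, here is a minimal sketch of a flat-buffer writer and reader. It is not ROOT's I/O code: `SketchVec`, `WriteSketch` and `ReadSketch` are hypothetical names, and the byte layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical record with a count and a variable-length payload.
// This is NOT ROOT's streamer; it only mirrors the ordering constraint.
struct SketchVec {
   int                 fNrows = 0; // written first: tells the reader how much follows
   std::vector<double> fElements;  // written second: fNrows values
};

// Serialize as [fNrows][fElements...], in member order.
inline std::vector<unsigned char> WriteSketch(const SketchVec &v)
{
   const std::size_t payload = static_cast<std::size_t>(v.fNrows) * sizeof(double);
   std::vector<unsigned char> buf(sizeof(int) + payload);
   std::memcpy(buf.data(), &v.fNrows, sizeof(int));
   if (payload > 0)
      std::memcpy(buf.data() + sizeof(int), v.fElements.data(), payload);
   return buf;
}

// The reader must see fNrows before it can size and fill fElements.
inline SketchVec ReadSketch(const std::vector<unsigned char> &buf)
{
   SketchVec v;
   std::memcpy(&v.fNrows, buf.data(), sizeof(int));
   v.fElements.resize(static_cast<std::size_t>(v.fNrows));
   const std::size_t payload = static_cast<std::size_t>(v.fNrows) * sizeof(double);
   if (payload > 0)
      std::memcpy(v.fElements.data(), buf.data() + sizeof(int), payload);
   return v;
}
```

If the elements were written before the count, the reader would reach the payload without knowing its length. The sketch also shows why reordering persistent members changes the bytes on file, while an alignment attribute on an in-memory member does not.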

Performance: +30-35% faster element access on heap vectors (50-10k elements)

Please include the test you made, at the very least in the description and even better as a test.
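
As a sketch of the kind of timing test that could accompany such a claim, the helper below measures the average time of one pass over a buffer. It is not the benchmark used in this PR: `SecondsPerPass` is a hypothetical name, it times a plain `std::vector<double>` rather than a TVectorT, and no timings are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: average wall-clock seconds for one summing pass over `data`.
// The accumulated sum is returned through `sink` so the result of the loop is used.
inline double SecondsPerPass(const std::vector<double> &data, int repeats, double &sink)
{
   using Clock = std::chrono::steady_clock;
   double sum = 0.0;
   const auto start = Clock::now();
   for (int r = 0; r < repeats; ++r)
      for (std::size_t i = 0; i < data.size(); ++i)
         sum += data[i];
   const auto stop = Clock::now();
   sink = sum;
   return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

A meaningful comparison would run the same helper against both layouts, over the stated size range, with several repetitions and the compiler flags recorded next to the numbers.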

... white space changes ...

Either limit white space changes to the lines changed in this PR or segregate them into their own commit (or even better, their own PR).

Member

@pcanal pcanal left a comment

See comments.

@muhammadalhroob muhammadalhroob force-pushed the optimize-tvectort-cache branch from fd14020 to 476426c Compare December 3, 2025 16:43
@muhammadalhroob
Contributor Author

Hi @guitargeek,
I just rebased and pushed again.

Regarding the interest in optimising:
I did not have a specific physics use case in mind; it was mostly pure curiosity. I am always looking for ways to optimise code, and recently I have experimented a lot with AI tools.
I used ROOT as a target for exploration, especially for performance-sensitive components such as TMatrix and TVector. AI made it easy to iterate quickly and helped generate test macros (I admit I was too lazy to write meaningful ones by hand).

@guitargeek
Copy link
Contributor

Hi @muhammadalhroob, ROOT only accepts PR branches that differ from the current master by extra commits on top of it. PR branches into which master has been merged are not valid.

Can you please check out the current master branch, cherry-pick your new commit "Optimize TVectorT data layout for cache locality", and then force-push to this PR branch? Then we can see where we stand.

@ferdymercury
Collaborator

Either limit white space changes to the lines changed in this PR or segregate them into their own commit (or even better, their own PR).

Also, header files (especially old ones) are usually aligned on purpose "against" clang-format style, for example so that declarations can easily be sorted alphabetically by padding spaces after the return type.

@guitargeek
Contributor

Hi @guitargeek, I just rebased and pushed again.

Regarding the interest in optimising: I did not have a specific physics use case in mind; it was mostly pure curiosity. I am always looking for ways to optimise code, and recently I have experimented a lot with AI tools. I used ROOT as a target for exploration, especially for performance-sensitive components such as TMatrix and TVector. AI made it easy to iterate quickly and helped generate test macros (I admit I was too lazy to write meaningful ones by hand).

Thanks for the clarification! ROOT has a dedicated package for optimized matrix and vector multiplications called SMatrix, which is supposed to be faster than TMatrix. No users rely on TMatrix/TVector for performance-critical code, as there are faster alternatives for linear algebra, also outside ROOT. So I don't think optimizing these classes further is time well invested. Just because these optimizations are easy to do with AI doesn't mean that we should do them. They don't come for free: there is review overhead, and every change risks breaking things, even if it's just a refactor or an optimization.

If you want to contribute to ROOT, I would encourage you to focus either on improvements that improve your own life or those of your ATLAS colleagues, or take a look at our GitHub issue tracker and see if there is something you can pick up and help with! Just my 2 cents.

Indeed, AI is great for writing test macros 🙂

@ferdymercury
Collaborator

Also: AI might get one into trouble in some (very) rare cases if it copied verbatim from copyrighted code doing that particular operation.

@muhammadalhroob muhammadalhroob force-pushed the optimize-tvectort-cache branch from ee672d7 to f13784f Compare December 3, 2025 18:30
@silverweed
Contributor

silverweed commented Dec 4, 2025

Also: AI might get one into trouble in some (very) rare cases if it copied verbatim from copyrighted code doing that particular operation.

Those cases are not that rare, and in general the practice of regurgitating LLM output straight into PRs is dangerous and sloppy.
I can't speak on behalf of the ROOT project, but in my opinion "contributions" like this should be strongly discouraged in favor of ones that imply actual human thinking (and maybe that actually make sense or even compile...)

Considering also what @guitargeek pointed out, I propose closing this PR.

@muhammadalhroob
Contributor Author

Dear all,
If this is problematic and not needed in ROOT, I am happy to close this PR.

Cheers,
Muhammad Alhroob

@guitargeek
Contributor

Yes, thank you very much! Given that there is no clear motivation, I'd appreciate it if this PR were closed. Thank you for your understanding!

@muhammadalhroob muhammadalhroob deleted the optimize-tvectort-cache branch December 4, 2025 09:11
5 participants