llm software analytics tools? #1105
magikRUKKOLA started this conversation in Ideas
I am trying to estimate how much faster inference could run on 16 x RTX GPUs with 24 GB of VRAM each versus ... any comparable combination of CPUs from AMD or Intel. Instead of building and testing such a rig myself, I would rather start from the technical data (performance, capabilities, the possibility of tighter optimisations at some point (?)). Then I would have to factor in PCIe P2P GPU-to-GPU traffic (or, alternatively, could some additional NVLink be used?), scheduler overhead, etc. ... ? In the end it should be possible to give a fairly precise estimate of real-world performance.
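For the back-of-the-envelope part, here is a rough sketch of the kind of estimate I mean, assuming single-token decode is memory-bandwidth bound (typical at batch size 1). Every constant in it (model size, 4090-class bandwidth, PCIe throughput, layer count, activation size) is a placeholder assumption to be replaced with real measurements:

```python
# Rough decode-speed estimate, assuming single-token generation is
# memory-bandwidth bound. Every constant here is an assumption.

model_bytes = 140e9        # hypothetical ~140 GB quantized model
gpu_bw      = 1.0e12       # ~1 TB/s per 4090-class card (spec sheet)
n_gpus      = 16

# Ideal tensor parallelism: each GPU streams 1/16 of the weights per token.
t_read = (model_bytes / n_gpus) / gpu_bw

# Tensor parallelism also exchanges activations every layer over PCIe.
# P2P bandwidth and latency below are placeholders for measured values.
n_layers  = 90             # assumed layer count
act_bytes = 2 * 8192       # hidden_size * 2 bytes (fp16), assumed
pcie_bw   = 25e9           # ~PCIe 4.0 x16 effective
pcie_lat  = 5e-6           # per-transfer latency, assumed
t_comm = n_layers * (act_bytes / pcie_bw + pcie_lat)

print(f"weights-only ceiling: {1 / t_read:7.1f} tok/s")
print(f"with comm overhead:   {1 / (t_read + t_comm):7.1f} tok/s")
```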
But at the same time, does such a tensor-parallel GPU setup even make sense? Maybe it is better to upgrade to the latest 12-channel DDR5 platform and overclock the memory? Hm ...
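As a first-order comparison, decode throughput scales with aggregate memory bandwidth, so the two candidate setups can at least be ranked by that alone (DDR5-6400 and 4090-class cards assumed):

```python
# First-order comparison: decode throughput scales with aggregate
# memory bandwidth, so rank the two candidate setups by that alone.
gpu_agg = 16 * 1.0e12        # 16 x 4090-class cards, ~1 TB/s each
cpu_agg = 12 * 8 * 6.4e9     # 12-channel DDR5-6400: channels * 8 B * MT/s
print(f"GPU aggregate: {gpu_agg / 1e12:5.1f} TB/s")
print(f"CPU aggregate: {cpu_agg / 1e9:5.0f} GB/s (~{gpu_agg / cpu_agg:.0f}x less)")
```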
But my actual question is about a parser for LLM analytics on the nature of the quants inside a model: their types, their sizes, the distribution of those sizes, and so on. Some GPUs support FP8, for example -- whether that FP8 will be (or even can be) applied to LLM inference is unclear, but my point is that it is better to plan ahead.
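A minimal sketch of such a parser, assuming GGUF model files and the gguf-py package that ships with llama.cpp (pip install gguf):

```python
# Minimal quant-analytics parser for GGUF files. It tallies tensor
# count, total bytes, byte share, and average bits-per-weight for
# every quantization type found in the file.
import sys
from collections import Counter, defaultdict

from gguf import GGUFReader  # gguf-py reader for .gguf model files

def quant_report(path: str) -> None:
    reader = GGUFReader(path)
    counts = Counter()
    nbytes = defaultdict(int)
    nelems = defaultdict(int)
    for t in reader.tensors:
        qtype = t.tensor_type.name          # e.g. Q4_K, Q6_K, F16
        counts[qtype] += 1
        nbytes[qtype] += int(t.n_bytes)
        nelems[qtype] += int(t.n_elements)
    total = sum(nbytes.values())
    print(f"{'type':8s} {'tensors':>7s} {'GB':>8s} {'share':>6s} {'bpw':>5s}")
    for qtype, n in counts.most_common():
        share = 100.0 * nbytes[qtype] / total
        bpw = 8.0 * nbytes[qtype] / nelems[qtype]
        print(f"{qtype:8s} {n:7d} {nbytes[qtype] / 1e9:8.2f} "
              f"{share:5.1f}% {bpw:5.2f}")

if __name__ == "__main__":
    quant_report(sys.argv[1])
```

The same per-tensor walk could be extended to flag which quant types a given GPU can handle natively (FP8 tensor cores on Ada/Hopper, for instance) before committing to hardware.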
So, any ideas?