Replies: 2 comments · 1 reply
-
Hi @slewie, one guess here is that your transformer models are running on CPU instead of GPU when loaded with the default settings. For reference, you can pass a device map explicitly:

```python
from guidance import models

gpt = models.Transformers('gpt2', device_map='auto')  # `auto` or any other device map
```
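A quick way to sanity-check device placement (a sketch assuming a CUDA setup; `hf_device_map` is the attribute `accelerate` sets on dispatched Hugging Face models, not a guidance-specific API):

```python
import torch
from transformers import AutoModelForCausalLM

print(torch.cuda.is_available())  # False here would explain CPU-only speeds

# Loading the same checkpoint directly shows where accelerate placed the weights
hf_model = AutoModelForCausalLM.from_pretrained('gpt2', device_map='auto')
print(hf_model.hf_device_map)     # e.g. {'': 0} when everything fits on GPU 0
```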
-
I think there is a more interesting problem: model size has no impact on computation time. I loaded two models:

```python
model_llama34 = models.Transformers("Phind/Phind-CodeLlama-34B-v2", echo=False, device_map='balanced')
model_llama7 = models.Transformers("codellama/CodeLlama-7b-Instruct-hf", echo=False, device_map='balanced')
```

and tried prompts of different lengths.
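A minimal way to time the two models side by side (a sketch assuming guidance's `+ gen()` composition API, not the original benchmark; the prompt is hypothetical):

```python
import time
from guidance import gen

prompt = "Write a function that reverses a string.\n"  # hypothetical prompt

for name, model in [("34B", model_llama34), ("7B", model_llama7)]:
    start = time.perf_counter()
    lm = model + prompt + gen(max_tokens=64)  # generate up to 64 new tokens
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```

If the 7B and 34B timings really do come out nearly identical, that would point at a bottleneck outside the forward pass (e.g. tokenization or the token-by-token control loop) rather than at the models themselves.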
-
Hello! I tried to classify some texts using guidance, but it was slow, so I tried plain transformers, and it turned out that guidance was about 5 times slower.
Code: *(screenshot in the original post)*
Results for that model:
- Guidance: *(timing screenshot)*
- transformers: *(timing screenshot)*
Also, with guidance there is practically no difference between model sizes; they run in about the same time. I tried the big CodeLlama and got the same results.
Why does this problem occur?
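For context, a classification loop of the kind described might look like the sketch below; it is hypothetical (the model, inputs, and label set are assumptions, not the original code), using guidance's `select` to constrain the output to a fixed label set:

```python
import time
from guidance import models, select

lm_base = models.Transformers('gpt2', echo=False, device_map='auto')

texts = ["I loved this movie!", "Terrible service, never again."]  # hypothetical inputs
labels = ["positive", "negative"]                                  # hypothetical label set

start = time.perf_counter()
for text in texts:
    # select() restricts generation to one of the given options
    lm = lm_base + f"Text: {text}\nSentiment: " + select(labels, name="label")
    print(text, "->", lm["label"])
print(f"guidance: {time.perf_counter() - start:.2f}s")
```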