This repository was archived by the owner on Dec 1, 2024. It is now read-only.

Update documentation to explicitly describe compatibility/performance with early Pascal cards #55

Open
@tensiondriven

Description


This was originally a question I wanted to ask, but in the interest of not abusing GitHub Issues, I'm disguising it as a feature request for documentation :)

There are a couple of very inexpensive cards with large VRAM: the Tesla M40 24GB (Maxwell) and the Tesla P40 24GB (Pascal). Neither of these seems to have Tensor cores, which makes them pretty useless for FP16 math, and maybe equally useless for int8/int4; I'm not sure.
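
One quick way to check which cards have Tensor cores is to compare CUDA compute capabilities: Tensor cores first appeared with compute capability 7.0 (Volta). The sketch below hard-codes the compute capabilities of the cards discussed here (M40: 5.2, P40: 6.1, P100: 6.0); the table and the `has_tensor_cores` helper are hypothetical illustrations, not part of FlexGen. On a live system, PyTorch's `torch.cuda.get_device_capability()` returns the same `(major, minor)` tuple for the installed GPU.

```python
from __future__ import annotations

# Hypothetical lookup table of (major, minor) CUDA compute capabilities.
COMPUTE_CAPABILITY = {
    "Tesla M40": (5, 2),   # Maxwell (GM200)
    "Tesla P40": (6, 1),   # Pascal (GP102)
    "Tesla P100": (6, 0),  # Pascal (GP100)
    "Tesla V100": (7, 0),  # Volta: first generation with Tensor cores
}

def has_tensor_cores(capability: tuple[int, int]) -> bool:
    """Tensor cores are present from compute capability 7.0 onward."""
    return capability >= (7, 0)

for name, cc in COMPUTE_CAPABILITY.items():
    print(f"{name}: sm_{cc[0]}{cc[1]}, Tensor cores: {has_tensor_cores(cc)}")
```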

What confuses a lot of people interested in running LLMs on commodity hardware is that the Tesla P40 is part of the "Pascal" family, and Pascal is known for adding fast FP16 processing. However, the Tesla P40 specifically lacks fast FP16 support: it runs FP16 at 1/64th of its FP32 rate, far below other Tesla Pascal cards such as the P100.
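
The 1/64 ratio is easier to appreciate as arithmetic. The sketch below scales each card's FP32 peak by its FP16:FP32 ratio. The FP32 figures (about 11.8 TFLOPS for the P40 and 9.3 TFLOPS for the PCIe P100) are approximate spec-sheet numbers used here as assumptions, so treat the output as an order-of-magnitude comparison only.

```python
# Approximate peak FP32 throughput in TFLOPS (assumed spec-sheet values).
FP32_TFLOPS = {"Tesla P40": 11.8, "Tesla P100 (PCIe)": 9.3}

# FP16 throughput expressed as a fraction of each chip's FP32 throughput.
FP16_TO_FP32_RATIO = {"Tesla P40": 1 / 64, "Tesla P100 (PCIe)": 2.0}

def fp16_tflops(card: str) -> float:
    """Estimated peak FP16 TFLOPS = FP32 peak x FP16:FP32 ratio."""
    return FP32_TFLOPS[card] * FP16_TO_FP32_RATIO[card]

for card in FP32_TFLOPS:
    print(f"{card}: ~{fp16_tflops(card):.2f} TFLOPS FP16")
```

Even with generous error bars on the assumed figures, the gap between the two Pascal cards is roughly two orders of magnitude.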

Question 1: Do you know if FlexGen will run on a P40 24GB with reasonable performance, given that it uses 8-bit or 4-bit math? Is it comparable to other Pascal cards in terms of performance?
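
For readers unsure what "4-bit" means in this context: weight compression of this kind is commonly done with group-wise min-max quantization, where each small group of values is stored as 4-bit integer codes plus a per-group scale and minimum, then dequantized before the matrix multiply. Assuming FlexGen's compression works along these lines (this sketch is not its actual code), the storage is 4-bit but the arithmetic precision depends on what the values are dequantized to, which is exactly what matters on a P40. The names `quantize_group`, `dequantize_group`, and `quantize`, and the tiny group size of 4, are hypothetical choices for illustration.

```python
from __future__ import annotations

GROUP_SIZE = 4  # hypothetical; real implementations typically use larger groups
BITS = 4

def quantize_group(values: list[float], bits: int = BITS) -> tuple[list[int], float, float]:
    """Asymmetric min-max quantization of one group to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1  # codes range over 0..15 for 4 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_group(codes: list[int], scale: float, lo: float) -> list[float]:
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]

def quantize(values: list[float], group_size: int = GROUP_SIZE) -> list[tuple[list[int], float, float]]:
    """Split a flat weight list into groups and quantize each group independently."""
    return [quantize_group(values[i:i + group_size]) for i in range(0, len(values), group_size)]

weights = [0.12, -0.50, 0.33, 0.07, 1.20, 0.95, -0.10, 0.40]
packed = quantize(weights)
restored = [x for codes, scale, lo in packed for x in dequantize_group(codes, scale, lo)]

# Rounding error per value is at most half of its group's scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_error:.4f}")
```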

Question 2: Do you know if FlexGen can split a model across multiple Tesla P40 cards? Something I read suggested that splitting a model is not possible with bitsandbytes on older cards, but I'm not clear on the reason.
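
Independently of bitsandbytes, one common way to split a model across cards is pipeline-style partitioning: contiguous blocks of layers are placed on each GPU, and activations are passed from one card to the next. The back-of-envelope sketch below checks whether a model fits across N cards. The 30B parameter count, 48 layers, 24 GiB per card, and 2 GiB per-card reserve for activations and KV cache are all hypothetical values, and embeddings and other non-layer weights are ignored; `layer_bytes` and `split_layers` are illustrative helpers, not a FlexGen API.

```python
GIB = 1024 ** 3

def layer_bytes(total_params: float, num_layers: int, bits_per_weight: int) -> float:
    """Bytes per layer, assuming parameters are spread evenly across layers."""
    return total_params / num_layers * bits_per_weight / 8

def split_layers(num_layers, bytes_per_layer, num_gpus, vram_bytes, reserve_bytes):
    """Assign contiguous layer ranges to GPUs as evenly as possible.

    Returns one range of layer indices per GPU, or None if the layers do not fit.
    `reserve_bytes` is headroom kept free on each card for activations and KV cache.
    """
    layers_per_gpu = int((vram_bytes - reserve_bytes) // bytes_per_layer)
    if layers_per_gpu * num_gpus < num_layers:
        return None
    base, extra = divmod(num_layers, num_gpus)
    ranges, start = [], 0
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges

# Hypothetical 30B-parameter, 48-layer model on 24 GiB cards with 2 GiB reserved each.
fp16_layer = layer_bytes(30e9, 48, 16)
int4_layer = layer_bytes(30e9, 48, 4)
print(split_layers(48, fp16_layer, 2, 24 * GIB, 2 * GIB))  # None: does not fit on two cards
print(split_layers(48, fp16_layer, 3, 24 * GIB, 2 * GIB))
print(split_layers(48, int4_layer, 1, 24 * GIB, 2 * GIB))
```

Because each card holds only its own slice, a single sequence still passes through every card in turn, so this kind of split adds capacity rather than speed for one request.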

For context: if it turns out that one Tesla P40, or two or three of them, can deliver reasonable performance in the < 1 second/token range for inference on large models, it would open up a new world of possibility for individuals looking to run LLMs at home.
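
A rough way to judge the < 1 second/token target: for single-sequence decoding, every weight must be read from VRAM once per token, and a forward pass costs about 2 FLOPs per parameter, so the time per token is at least the larger of those two costs. The sketch below applies that bound to a hypothetical 30B-parameter model with 4-bit weights, using assumed P40 figures of about 346 GB/s memory bandwidth and 11.8 TFLOPS FP32. With a pipeline split across several cards, each card reads only its slice but the slices are visited in sequence, so the bound is unchanged. It ignores dequantization, inter-card transfers, and CPU/disk offloading, so actual per-token times will be higher.

```python
def seconds_per_token_lower_bound(params, bits_per_weight, bandwidth_gb_s, peak_tflops):
    """Rough lower bound on decode time per token at batch size 1.

    Memory bound: all weights are read from VRAM once per token.
    Compute bound: a forward pass costs about 2 FLOPs per parameter.
    """
    memory_s = params * bits_per_weight / 8 / (bandwidth_gb_s * 1e9)
    compute_s = 2 * params / (peak_tflops * 1e12)
    return max(memory_s, compute_s)

P40_BANDWIDTH_GB_S = 346          # assumed spec-sheet value
P40_FP32_TFLOPS = 11.8            # assumed spec-sheet value
P40_FP16_TFLOPS = P40_FP32_TFLOPS / 64

# Hypothetical 30B-parameter model stored with 4-bit weights.
fp32_math = seconds_per_token_lower_bound(30e9, 4, P40_BANDWIDTH_GB_S, P40_FP32_TFLOPS)
fp16_math = seconds_per_token_lower_bound(30e9, 4, P40_BANDWIDTH_GB_S, P40_FP16_TFLOPS)
print(f"computing in FP32: >= {fp32_math:.3f} s/token")
print(f"computing in FP16: >= {fp16_math:.3f} s/token")
```

Under these assumptions the FP32 case is limited by memory bandwidth, while the FP16 case is limited by the P40's slow FP16 arithmetic.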

Metadata

Labels: help wanted (Extra attention is needed)