README.md: 43 additions & 3 deletions
@@ -28,7 +28,9 @@ limitations under the License.
 - [LLama 7b FP8 on 1 Gaudi2 card](#llama-7b-fp8-on-1-gaudi2-card)
 - [LLama 70b BF16 on 8 Gaudi2 card](#llama-70b-bf16-on-8-gaudi2-card)
 - [LLama 70b FP8 on 8 Gaudi2 card](#llama-70b-fp8-on-8-gaudi2-card)
-- [LLava-next 7B BF16 on 1 Gaudi2 card](#llava-next-7b-bf16-on-1-gaudi2-card)
+- [Llava-next](#llava-next)
+- [llava-v1.6-mistral-7b-hf BF16 on 1 Gaudi2 card](#llava-v16-mistral-7b-hf-bf16-on-1-gaudi2-card)
+- [llava-v1.6-mistral-7b-hf FP8 on 1 Gaudi2 card](#llava-v16-mistral-7b-hf-fp8-on-1-gaudi2-card)
 - [Environment variables](#environment-variables)
 - [Profiler](#profiler)
 
@@ -264,8 +266,9 @@ docker run -p 8080:80 \
 --sharded true \
 --num-shard 8
 ```
+### Llava-next
 
-###LLava-next 7B BF16 on 1 Gaudi2 card
+#### llava-v1.6-mistral-7b-hf BF16 on 1 Gaudi2 card
 
An image usually accounts for roughly 2000 input tokens; for example, an image of size 512x512 is represented by 2800 tokens. Thus, `max-input-tokens` must be larger than the number of tokens associated with the image; otherwise the image may be truncated. The default image token count is set by `BASE_IMAGE_TOKENS=2048`, which is also the minimum value of `max-input-tokens`. You can override the environment variable `BASE_IMAGE_TOKENS` to change this value. The warmup generates graphs with input lengths from `BASE_IMAGE_TOKENS` to `max-input-tokens`. For LLava-next 7B, the value of `max-batch-prefill-tokens` is 16384, which follows from the relation `prefill_batch_size` = `max-batch-prefill-tokens` / `max-input-tokens`.
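
The sizing relation above can be sanity-checked with a little shell arithmetic. This is only a sketch and not part of the README or of TGI: `prefill_batch_size` is a hypothetical helper, and the `max-input-tokens` value of 4096 is assumed purely for illustration. The figures 16384 and 2048 are the ones quoted above.

```bash
# Hypothetical helper (not part of TGI): derive the prefill batch size
# from the two launcher settings using integer division.
prefill_batch_size() {
  local max_batch_prefill_tokens=$1
  local max_input_tokens=$2
  # Default quoted above; can be overridden through the environment.
  local base_image_tokens=${BASE_IMAGE_TOKENS:-2048}

  # BASE_IMAGE_TOKENS is the minimum value of max-input-tokens.
  if [ "$max_input_tokens" -lt "$base_image_tokens" ]; then
    echo "max-input-tokens must be at least $base_image_tokens" >&2
    return 1
  fi

  # prefill_batch_size = max-batch-prefill-tokens / max-input-tokens
  echo $(( max_batch_prefill_tokens / max_input_tokens ))
}

# 16384 is the value quoted above; 4096 is an ASSUMED max-input-tokens.
prefill_batch_size 16384 4096
```

Under that assumed `max-input-tokens`, the helper reports a prefill batch size of 4; a value below `BASE_IMAGE_TOKENS` is rejected instead of silently producing a setting that could truncate the image.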
+-d '{"inputs":"What is this a picture of?\n\n","parameters":{"max_new_tokens":16, "seed": 42}}' \
+-H 'Content-Type: application/json'
+```
+
+Multi-card Llava-next inference is currently not supported.
+
+#### llava-v1.6-mistral-7b-hf FP8 on 1 Gaudi2 card
+
+```bash
+model=llava-hf/llava-v1.6-mistral-7b-hf
+hf_token=YOUR_ACCESS_TOKEN # HF access token
+volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run