-The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/superb#ks) of the SUPERB dataset on a single HPU.
+The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the 🗣️ [Keyword Spotting subset](https://huggingface.co/datasets/regisss/superb_ks) of the SUPERB dataset on a single HPU.
@@ -69,13 +68,13 @@ On a single HPU, this script should run in ~13 minutes and yield an accuracy of
## Multi-HPU
-The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) for 🌎 **Language Identification** on the [CommonLanguage dataset](https://huggingface.co/datasets/anton-l/common_language) on 8 HPUs.
+The following command shows how to fine-tune [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) for 🌎 **Language Identification** on the [CommonLanguage dataset](https://huggingface.co/datasets/regisss/common_language) on 8 HPUs.
examples/contrastive-image-text/README.md (4 additions, 10 deletions)
@@ -35,7 +35,7 @@ pip install -r requirements.txt
**Recommended (datasets>=4.0.0):** use the COCO captions dataset hosted on the Hub. It provides image–caption pairs and does **not** require `trust_remote_code`:
This dataset exposes at least the columns `image` (PIL image) and `caption` (string).
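Since the script assumes these columns, a quick sanity check before training can save a failed run. A minimal sketch; the `features` mapping here is a hand-written stand-in (with a real dataset you would inspect `dataset.column_names` from 🤗 Datasets):

```python
# Sanity-check that a loaded dataset exposes the columns this example needs.
# `features` is a hypothetical stand-in for a real feature mapping.
REQUIRED_COLUMNS = {"image", "caption"}

features = {"image": "Image", "caption": "string", "image_id": "int64"}

missing = REQUIRED_COLUMNS - features.keys()
if missing:
    raise ValueError(f"Dataset is missing required columns: {sorted(missing)}")
print("All required columns present:", sorted(REQUIRED_COLUMNS))
```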
If you prefer local files, you can also use the built-in 🤗 Datasets `imagefolder` builder to load images and captions from a directory; it expects a `metadata.csv` (or `metadata.jsonl`) alongside the images, with a `file_name` column linking each row to an image and extra columns such as `caption` becoming dataset features.
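The expected on-disk layout can be sketched as follows. File names and captions are made up; the commented `load_dataset` call is the standard 🤗 Datasets entry point that would pick this metadata up once real images sit next to it:

```python
# Sketch of the layout the `imagefolder` builder understands: a directory of
# images plus a metadata.csv whose `file_name` column links each row to an
# image, and whose extra columns (here `caption`) become dataset features.
import csv
import os
import tempfile

root = tempfile.mkdtemp()
rows = [
    {"file_name": "cat.jpg", "caption": "A cat sitting on a mat."},
    {"file_name": "dog.jpg", "caption": "A dog running on grass."},
]
with open(os.path.join(root, "metadata.csv"), "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file_name", "caption"])
    writer.writeheader()
    writer.writerows(rows)

# With real images present, loading would then be:
#   from datasets import load_dataset
#   ds = load_dataset("imagefolder", data_dir=root)
print(open(os.path.join(root, "metadata.csv")).read())
```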
@@ -84,7 +84,7 @@ Run the following command for single-device training:
examples/speech-recognition/README.md (6 additions, 12 deletions)
@@ -231,13 +231,11 @@ recognition on one of the well known speech recognition datasets similar to show
We can load all components of the Whisper model directly from the pretrained checkpoint, including the pretrained model weights, feature extractor and tokenizer. We simply have to specify our fine-tuning dataset and training hyperparameters.
### Single HPU Whisper Fine-tuning with Seq2Seq
-The following example shows how to fine-tune the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) using a single HPU device in bf16 precision:
+The following example shows how to fine-tune the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using a single HPU device in bf16 precision:
@@ -277,14 +275,12 @@ If training on a different language, you should be sure to change the `language`
### Multi-HPU Whisper Training with Seq2Seq
-The following example shows how to fine-tune the [Whisper large](https://huggingface.co/openai/whisper-large) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) using 8 HPU devices in half-precision:
+The following example shows how to fine-tune the [Whisper large](https://huggingface.co/openai/whisper-large) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using 8 HPU devices in half-precision:
-The following example shows how to do inference with the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) using a single HPU device in half-precision:
+The following example shows how to do inference with the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using a single HPU device in half-precision:
examples/text-generation/README.md (2 additions, 3 deletions)
@@ -215,19 +215,18 @@ You can also provide the name of a dataset from the Hugging Face Hub to perform
By default, the first column of type `string` in the dataset will be used as prompts. You can also select the column you want with the argument `--column_name`.
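The default selection rule can be illustrated with a short sketch. This is an illustration of the rule, not the script's actual code, and the feature mapping below is hypothetical:

```python
# Illustration of the default prompt-column rule: pick the first column whose
# declared type is `string`; fall back to asking for --column_name otherwise.
def first_string_column(features):
    for name, dtype in features.items():
        if dtype == "string":
            return name
    raise ValueError("No string column found; pass --column_name explicitly.")

# Hypothetical feature mapping for a news-style dataset.
features = {"headline": "string", "content": "string", "category": "int64"}
print(first_string_column(features))  # first string column wins: "headline"
```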
-Here is an example with [JulesBelveze/tldr_news](https://huggingface.co/datasets/JulesBelveze/tldr_news):
+Here is an example with [dim/tldr_news](https://huggingface.co/datasets/dim/tldr_news):
```bash
PT_HPU_LAZY_MODE=1 python run_generation.py \
--model_name_or_path gpt2 \
--batch_size 2 \
--max_new_tokens 100 \
--use_hpu_graphs \
--use_kv_cache \
-    --dataset_name JulesBelveze/tldr_news \
+    --dataset_name dim/tldr_news \
--column_name content \
--bf16 \
--sdp_on_bf16 \
-    --trust_remote_code
```
> The prompt length is limited to 16 tokens. Prompts longer than this will be truncated.
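The truncation behavior described above can be sketched in a few lines; the token IDs are made up and stand in for whatever the tokenizer produces:

```python
# Sketch of the 16-token prompt limit: tokens beyond the limit are dropped.
MAX_PROMPT_TOKENS = 16

def truncate_prompt(token_ids, limit=MAX_PROMPT_TOKENS):
    # Keep only the first `limit` tokens.
    return token_ids[:limit]

tokens = list(range(25))             # a 25-token prompt
print(len(truncate_prompt(tokens)))  # -> 16
```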
examples/translation/README.md (2 additions, 2 deletions)
@@ -103,7 +103,7 @@ The task of translation supports only custom JSONLINES files, with each line bei
```
Here the languages are Romanian (`ro`) and English (`en`).
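A JSONLINES file in this shape can be produced with the standard library; each line is one JSON object whose `translation` field maps language codes to sentences (the example sentences below are made up):

```python
# Build JSON Lines in the translation format: one object per line, with a
# "translation" key mapping language codes ("en", "ro") to parallel sentences.
import json

pairs = [
    {"en": "The weather is nice today.", "ro": "Vremea este frumoasă astăzi."},
    {"en": "I would like a coffee.", "ro": "Aș dori o cafea."},
]
lines = [json.dumps({"translation": p}, ensure_ascii=False) for p in pairs]
jsonl = "\n".join(lines)
print(jsonl)
```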
-If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:
+If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name regisss/wmt14-en-de-pre-processed`, as follows: