
Commit 6366e8f

AleksMat authored and The gemma Authors committed
Replace unicode escaped characters in ipynb files
PiperOrigin-RevId: 854213846
1 parent 6664b4e commit 6366e8f

9 files changed: +147 −147 lines changed

colabs/finetuning.ipynb

Lines changed: 7 additions & 7 deletions
@@ -106,7 +106,7 @@
 {
 "data": {
 "text/plain": [
-"[\u003c_Gemma2SpecialTokens.BOS: 2\u003e, 1596, 603, 671, 3287, 13060]"
+"[<_Gemma2SpecialTokens.BOS: 2>, 1596, 603, 671, 3287, 13060]"
 ]
 },
 "execution_count": 22,
@@ -213,7 +213,7 @@
 "id": "3ny2J07G2X7i"
 },
 "source": [
-"We can decode an example from the batch to inspect the model input. We see that the `\u003cstart_of_turn\u003e` / `\u003cend_of_turn\u003e` where correctly added to follow Gemma dialog format."
+"We can decode an example from the batch to inspect the model input. We see that the `<start_of_turn>` / `<end_of_turn>` where correctly added to follow Gemma dialog format."
 ]
 },
 {
@@ -238,9 +238,9 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\u003cstart_of_turn\u003euser\n",
-"Would love any other tips from anyone, but specially from someone who’s been where I’m at.\u003cend_of_turn\u003e\n",
-"\u003cstart_of_turn\u003emodel\n",
+"<start_of_turn>user\n",
+"Would love any other tips from anyone, but specially from someone who’s been where I’m at.<end_of_turn>\n",
+"<start_of_turn>model\n",
 "J'apprécierais vraiment d'autres astuces, mais particulièrement par quelqu'un qui était était déjà là où je me trouve.\n"
 ]
 }
@@ -398,7 +398,7 @@
 "version_minor": 0
 },
 "text/plain": [
-"train: 0%| | 0/301 [00:00\u003c?, ?it/s]"
+"train: 0%| | 0/301 [00:00<?, ?it/s]"
 ]
 },
 "metadata": {},
@@ -641,7 +641,7 @@
 "layout": "IPY_MODEL_e0685ee78f8c499390ac0e1024ab26a9",
 "placeholder": "",
 "style": "IPY_MODEL_a022879a38e849cfac7178cea469f36f",
-"value": " 301/301 [04:24\u0026lt;00:00, 6.86s/it]"
+"value": " 301/301 [04:24&lt;00:00, 6.86s/it]"
 }
 },
 "9cc2c80d152248c4bade0955c25554ad": {
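Editor's note: for context on the `<start_of_turn>` / `<end_of_turn>` markers this diff now renders literally, here is a minimal round-trip sketch. It assumes the `gm.text.Gemma3Tokenizer` API that appears in the colabs/tokenizer.ipynb diff below; the `decode` call mirroring `encode` is an assumption, not something this commit shows.

    # Hedged sketch: the Gemma dialog format inspected in this notebook.
    from gemma import gm

    tokenizer = gm.text.Gemma3Tokenizer()

    prompt = (
        '<start_of_turn>user\n'
        'Would love any other tips from anyone.<end_of_turn>\n'
        '<start_of_turn>model\n'
    )
    token_ids = tokenizer.encode(prompt, add_bos=True)  # <bos> added once
    print(tokenizer.decode(token_ids))  # the turn markers should round-trip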

colabs/lora_finetuning.ipynb

Lines changed: 9 additions & 9 deletions
Large diffs are not rendered by default.

colabs/multimodal.ipynb

Lines changed: 24 additions & 24 deletions
Large diffs are not rendered by default.

colabs/quantization_aware_training.ipynb

Lines changed: 14 additions & 14 deletions
Large diffs are not rendered by default.

colabs/quantization_sampling.ipynb

Lines changed: 8 additions & 8 deletions
Large diffs are not rendered by default.

colabs/sampling.ipynb

Lines changed: 19 additions & 19 deletions
Large diffs are not rendered by default.

colabs/sharding.ipynb

Lines changed: 10 additions & 10 deletions
Large diffs are not rendered by default.

colabs/tokenizer.ipynb

Lines changed: 28 additions & 28 deletions
@@ -402,9 +402,9 @@
 },
 "cell_type": "markdown",
 "source": [
-"### `\u003cbos\u003e` / `\u003ceos\u003e`\n",
+"### `<bos>` / `<eos>`\n",
 "\n",
-"In Gemma models, the begin of sentence token (`\u003cbos\u003e`) should appear only once at the begining of the input. You can add it either explicitly or with `add_bos=True`:"
+"In Gemma models, the begin of sentence token (`<bos>`) should appear only once at the begining of the input. You can add it either explicitly or with `add_bos=True`:"
 ]
 },
 {
@@ -432,7 +432,7 @@
 {
 "data": {
 "text/plain": [
-"[\u003c_Gemma2SpecialTokens.BOS: 2\u003e, 4521, 2134, 235341]"
+"[<_Gemma2SpecialTokens.BOS: 2>, 4521, 2134, 235341]"
 ]
 },
 "execution_count": 55,
@@ -465,7 +465,7 @@
 {
 "data": {
 "text/plain": [
-"[\u003c_Gemma2SpecialTokens.BOS: 2\u003e, 4521, 2134, 235341]"
+"[<_Gemma2SpecialTokens.BOS: 2>, 4521, 2134, 235341]"
 ]
 },
 "execution_count": 59,
@@ -481,9 +481,9 @@
 },
 "cell_type": "markdown",
 "source": [
-"Similarly, the model can output a `\u003ceos\u003e` token to indicate the prediction is complete.\n",
+"Similarly, the model can output a `<eos>` token to indicate the prediction is complete.\n",
 "\n",
-"When fine-tuning Gemma, you can train the model to predict `\u003ceos\u003e` tokens."
+"When fine-tuning Gemma, you can train the model to predict `<eos>` tokens."
 ]
 },
 {
@@ -509,7 +509,7 @@
 {
 "data": {
 "text/plain": [
-"[4521, 2134, 235341, \u003c_Gemma2SpecialTokens.EOS: 1\u003e]"
+"[4521, 2134, 235341, <_Gemma2SpecialTokens.EOS: 1>]"
 ]
 },
 "execution_count": 60,
@@ -525,11 +525,11 @@
 },
 "cell_type": "markdown",
 "source": [
-"### `\u003cstart_of_turn\u003e` / `\u003cend_of_turn\u003e`\n",
+"### `<start_of_turn>` / `<end_of_turn>`\n",
 "\n",
-"When using the instruction-tuned version of Gemma, the `\u003cstart_of_turn\u003e` / `\u003cend_of_turn\u003e` tokens allow to specify who from the user or the model is talking.\n",
+"When using the instruction-tuned version of Gemma, the `<start_of_turn>` / `<end_of_turn>` tokens allow to specify who from the user or the model is talking.\n",
 "\n",
-"The `\u003cstart_of_turn\u003e` should be followed by either:\n",
+"The `<start_of_turn>` should be followed by either:\n",
 "\n",
 "* `user`\n",
 "* `model`\n",
@@ -543,14 +543,14 @@
 },
 "cell_type": "code",
 "source": [
-"token_ids = tokenizer.encode(\"\"\"\u003cstart_of_turn\u003euser\n",
-"Knock knock.\u003cend_of_turn\u003e\n",
-"\u003cstart_of_turn\u003emodel\n",
-"Who's there ?\u003cend_of_turn\u003e\n",
-"\u003cstart_of_turn\u003euser\n",
-"Gemma.\u003cend_of_turn\u003e\n",
-"\u003cstart_of_turn\u003emodel\n",
-"Gemma who?\u003cend_of_turn\u003e\"\"\")"
+"token_ids = tokenizer.encode(\"\"\"<start_of_turn>user\n",
+"Knock knock.<end_of_turn>\n",
+"<start_of_turn>model\n",
+"Who's there ?<end_of_turn>\n",
+"<start_of_turn>user\n",
+"Gemma.<end_of_turn>\n",
+"<start_of_turn>model\n",
+"Gemma who?<end_of_turn>\"\"\")"
 ],
 "outputs": [],
 "execution_count": null
@@ -584,7 +584,7 @@
 "type": "string"
 },
 "text/plain": [
-"'\u003cstart_of_turn\u003e'"
+"'<start_of_turn>'"
 ]
 },
 "execution_count": 77,
@@ -600,11 +600,11 @@
 },
 "cell_type": "markdown",
 "source": [
-"### `\u003cstart_of_image\u003e`\n",
+"### `<start_of_image>`\n",
 "\n",
-"In Gemma 3, to indicate the position of an image in the text, the prompt should contain the special `\u003cstart_of_image\u003e` token. Internally, Gemma model will automatically expand the token to insert the soft images tokens.\n",
+"In Gemma 3, to indicate the position of an image in the text, the prompt should contain the special `<start_of_image>` token. Internally, Gemma model will automatically expand the token to insert the soft images tokens.\n",
 "\n",
-"(Note: There's also a `\u003cend_of_image\u003e` token, but is handled internally by the model)"
+"(Note: There's also a `<end_of_image>` token, but is handled internally by the model)"
 ]
 },
 {
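Editor's note: as the markdown cell above says, the image marker is plain text in the prompt. A one-line hedged sketch, reusing the `tokenizer` from earlier; the surrounding caption text is hypothetical.

    # <start_of_image> is expanded by the model into soft image tokens.
    token_ids = tokenizer.encode('Describe this image: <start_of_image>')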
@@ -624,7 +624,7 @@
 "source": [
 "In all Gemma versions, a few tokens (`99`) are unused. This allow custom applications to define and fine-tune their own custom tokens for their application. Those tokens are available through `tokenizer.special_tokens.CUSTOM + xx`, with `xx` being a number between `0` and `98`\n",
 "\n",
-"\u003c!-- TODO(epot): Add option to customize the special tokens --\u003e"
+"<!-- TODO(epot): Add option to customize the special tokens -->"
 ]
 },
 {
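Editor's note: a hedged sketch of the custom-slot lookup this cell describes. Treating `tokenizer.special_tokens.CUSTOM` as the base id of slot 0 follows the prose, but the decode call is an assumption; the `'<unused17>'` output in the next hunk suggests slot 17 was used.

    # 99 free slots; slot ids are CUSTOM + 0 .. CUSTOM + 98.
    custom_id = tokenizer.special_tokens.CUSTOM + 17
    print(tokenizer.decode([custom_id]))  # expected: '<unused17>'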
@@ -656,7 +656,7 @@
 "type": "string"
 },
 "text/plain": [
-"'\u003cunused17\u003e'"
+"'<unused17>'"
 ]
 },
 "execution_count": 28,
@@ -694,12 +694,12 @@
 "source": [
 "tokenizer = gm.text.Gemma3Tokenizer(\n",
 " custom_tokens={\n",
-" 0: '\u003cmy_custom_tag\u003e',\n",
-" 17: '\u003cmy_other_tag\u003e',\n",
+" 0: '<my_custom_tag>',\n",
+" 17: '<my_other_tag>',\n",
 " },\n",
 ")\n",
 "\n",
-"tokenizer.encode('\u003cmy_other_tag\u003e')"
+"tokenizer.encode('<my_other_tag>')"
 ],
 "outputs": [
 {
@@ -786,7 +786,7 @@
 "type": "string"
 },
 "text/plain": [
-"'\u003cmy_other_tag\u003e'"
+"'<my_other_tag>'"
 ]
 },
 "execution_count": 36,

colabs/tool_use.ipynb

Lines changed: 28 additions & 28 deletions
@@ -118,7 +118,7 @@
 "output_type": "stream",
 "text": [
 "INFO:2025-06-06 02:43:16,896:jax._src.xla_bridge:749: Unable to initialize backend 'pathways': Could not initialize backend 'pathways'\n",
-"INFO:2025-06-06 02:43:16,897:jax._src.xla_bridge:749: Unable to initialize backend 'proxy': INVALID_ARGUMENT: IFRT proxy server address must be '\u003ctransport-type\u003e://\u003cbackend-address\u003e' (e.g., 'grpc://localhost'), but got \n",
+"INFO:2025-06-06 02:43:16,897:jax._src.xla_bridge:749: Unable to initialize backend 'proxy': INVALID_ARGUMENT: IFRT proxy server address must be '<transport-type>://<backend-address>' (e.g., 'grpc://localhost'), but got \n",
 "INFO:2025-06-06 02:43:16,900:jax._src.xla_bridge:749: Unable to initialize backend 'mlcr': Could not initialize backend 'mlcr'\n",
 "INFO:2025-06-06 02:43:16,901:jax._src.xla_bridge:749: Unable to initialize backend 'sliceme': Could not initialize backend 'sliceme'\n"
 ]
@@ -194,10 +194,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -213,10 +213,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -233,10 +233,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -252,10 +252,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -272,10 +272,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -291,10 +291,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -311,10 +311,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -330,10 +330,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -350,10 +350,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -369,10 +369,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -407,7 +407,7 @@
 "\n",
 "To create your own tool, you can inherit from the `gm.tools.Tool` class. You should provide:\n",
 "\n",
-"* A description \u0026 example, so the model knows how to use your tool\n",
+"* A description & example, so the model knows how to use your tool\n",
 "* Implement the `call` method. The `call` function can take arbitrary `**kwargs`, but the name of the args should match the ones defined in `tool_kwargs` and `tool_kwargs_doc`"
 ]
 },
@@ -435,12 +435,12 @@
 " query='Which day of the week are we today ?',\n",
 " thought='The `datetime.strptime` uses %a for day of the week',\n",
 " tool_kwargs={'format': '%a'},\n",
-" tool_kwargs_doc={'format': '\u003cANY datetime.strptime expression\u003e'},\n",
+" tool_kwargs_doc={'format': '<ANY datetime.strptime expression>'},\n",
 " result='Sat',\n",
 " answer='Today is Saturday.',\n",
 " )\n",
 "\n",
-" def call(self, format: str) -\u003e str:\n",
+" def call(self, format: str) -> str:\n",
 " dt = datetime.datetime.now()\n",
 " return dt.strftime(format)\n"
 ],
@@ -499,10 +499,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},
@@ -518,10 +518,10 @@
 {
 "data": {
 "text/html": [
-"\u003chr\u003e"
+"<hr>"
 ],
 "text/plain": [
-"\u003cIPython.core.display.HTML object\u003e"
+"<IPython.core.display.HTML object>"
 ]
 },
 "metadata": {},