Commit fb595c4

updated lecture 6, pulled in ELIZA assignment updates
1 parent ebfeced commit fb595c4

4 files changed: +8 -5 lines changed

assignments/eliza-llm-course

slides/week2/lecture6.html

Lines changed: 3 additions & 2 deletions
@@ -1041,7 +1041,7 @@ <h1 id="bpe-in-practice-with-huggingface">BPE in practice with HuggingFace</h1>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="14" data-theme="cdl-theme" lang="en-US" style="--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="wordpiece-merges-surprisingly-common-pairs">WordPiece merges &quot;surprisingly common&quot; pairs</h1>
 <div class="note-box" data-title="Further reading">
-<p><a href="https://arxiv.org/abs/1609.08144"><strong>Wu et al. (2016, <em>arXiv</em>):</strong></a> Google's Neural Machine Translation System</p>
+<p><a href="https://arxiv.org/abs/1609.08144"><strong>Wu et al. (2016, <em>arXiv</em>):</strong></a> Google's neural machine translation system</p>
 </div>
 <div class="definition-box" data-title="Key difference from BPE">
 <p>Instead of merging most <em>frequent</em> pair, merge pair that maximizes <em>likelihood</em>:</p>
@@ -1080,6 +1080,7 @@ <h1 id="wordpiece-example-with-bert">WordPiece example with BERT</h1>
 <h1 id="sentencepiece-works-for-languages-without-spaces">SentencePiece works for languages without spaces</h1>
 <div class="note-box" data-title="Further reading">
 <p><a href="https://aclanthology.org/D18-2012/"><strong>Kudo &amp; Richardson (2018, <em>EMNLP</em>):</strong></a> SentencePiece: a simple and language independent subword tokenizer</p>
+<p><a href="https://arxiv.org/abs/1910.13267"><strong>Provilkov, Emelianeko, &amp; Voita (2019, <em>arXiv</em>):</strong></a> BPE-Dropout: simple and effective subword regularization</p>
 </div>
 <div class="warning-box" data-title="Problems with BPE/WordPiece">
 <ul>
@@ -1339,7 +1340,7 @@ <h1 id="looking-ahead">Looking ahead</h1>
 <div class="tip-box" data-title="Prepare by...">
 <ul>
 <li>Playing more with the <a href="https://contextlab.github.io/llm-course/demos/tokenization/">Tokenization Explorer Demo</a></li>
-<li>Experimenting with HuggingFace tokenizers</li>
+<li>Experimenting with <a href="https://github.com/huggingface/tokenizers">HuggingFace tokenizers</a></li>
 </ul>
 </div>
 </section>

slides/week2/lecture6.md

Lines changed: 4 additions & 2 deletions
@@ -331,7 +331,7 @@ Use the [Tokenization Explorer Demo](https://contextlab.github.io/llm-course/dem
 
 <div class="note-box" data-title="Further reading">
 
-[**Wu et al. (2016, *arXiv*):**](https://arxiv.org/abs/1609.08144) Google's Neural Machine Translation System
+[**Wu et al. (2016, *arXiv*):**](https://arxiv.org/abs/1609.08144) Google's neural machine translation system
 
 </div>
 
@@ -380,6 +380,8 @@ print("Tokens:", tokens2)
 
 [**Kudo & Richardson (2018, *EMNLP*):**](https://aclanthology.org/D18-2012/) SentencePiece: a simple and language independent subword tokenizer
 
+[**Provilkov, Emelianeko, & Voita (2019, *arXiv*):**](https://arxiv.org/abs/1910.13267) BPE-Dropout: simple and effective subword regularization
+
 </div>
 
 <div class="warning-box" data-title="Problems with BPE/WordPiece">
@@ -612,7 +614,7 @@ print("Without special:", clean_decode) # "hello world"
 <div class="tip-box" data-title="Prepare by...">
 
 - Playing more with the [Tokenization Explorer Demo](https://contextlab.github.io/llm-course/demos/tokenization/)
-- Experimenting with HuggingFace tokenizers
+- Experimenting with [HuggingFace tokenizers](https://github.com/huggingface/tokenizers)
 
 </div>
 

slides/week2/lecture6.pdf

2.38 KB
Binary file not shown.
