
Commit 0a7a1bb

Built site for gh-pages
1 parent 81dd791 commit 0a7a1bb

5 files changed: 28 additions & 28 deletions


.nojekyll

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-d64996de
+f4650bf8

index.html

Lines changed: 1 addition & 1 deletion
@@ -192,7 +192,7 @@ <h5 class="quarto-listing-category-title">Categories</h5><div class="quarto-list
 
 <div class="quarto-listing quarto-listing-container-default" id="listing-listing">
 <div class="list quarto-listing-default">
-<div class="quarto-post image-right" data-index="0" data-categories="transformer,llm,machine learning,HuggingFace,Vast.ai" data-listing-date-sort="1739055600000" data-listing-file-modified-sort="1739206763663" data-listing-date-modified-sort="1739055600000" data-listing-reading-time-sort="9">
+<div class="quarto-post image-right" data-index="0" data-categories="transformer,llm,machine learning,HuggingFace,Vast.ai" data-listing-date-sort="1739055600000" data-listing-file-modified-sort="1739224833436" data-listing-date-modified-sort="1739142000000" data-listing-reading-time-sort="9">
 <div class="thumbnail">
 <p><a href="./posts/llm-fine-tuning/index.html"> <img src="./posts/llm-fine-tuning/images/fine-tuning.jpg" class="thumbnail-image"> </a></p>
 </div>
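
A side note on the attribute values changed in the hunk above: the data-listing-*-sort attributes hold Unix timestamps in milliseconds. A quick sketch in Python (values copied from the hunk; the time-zone interpretation is an assumption) confirms that the new data-listing-date-modified-sort value matches the February 10, 2025 modification date set in the next file:

from datetime import datetime, timezone

# The listing sort keys are epoch milliseconds; convert to seconds first.
print(datetime.fromtimestamp(1739142000000 / 1000, tz=timezone.utc))
# -> 2025-02-09 23:00:00+00:00, i.e. midnight on February 10, 2025 in UTC+1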

posts/llm-fine-tuning/index.html

Lines changed: 5 additions & 5 deletions
@@ -207,7 +207,7 @@ <h1 class="title">Fine-Tuning a Pre-Trained LLM</h1>
 <div>
 <div class="quarto-title-meta-heading">Modified</div>
 <div class="quarto-title-meta-contents">
-<p class="date-modified">February 9, 2025</p>
+<p class="date-modified">February 10, 2025</p>
 </div>
 </div>

@@ -638,15 +638,15 @@ <h2 class="anchored" data-anchor-id="training-and-monitoring">Training and Monitoring</h2>
 </section>
 <section id="full-training-run-on-vast.ai" class="level2">
 <h2 class="anchored" data-anchor-id="full-training-run-on-vast.ai">Full Training Run on <a href="https://cloud.vast.ai/?ref_id=202191">Vast.ai*</a></h2>
-<p>In the example above, I limited the size of the dataset to a very small fraction. For proper fine-tuning, the full dataset should be used and training should be repeated for multiple epochs. Similarly, logging, eval, and save frequency should be adjusted and the trained model should be pushed to the HuggingFace hub.</p>
+<p>In the example above, I limited the size of the dataset to a very small fraction. For proper fine-tuning, the full dataset should be used by setting <code>dataset_limit=None</code> when loading the training data. Possibly, the training should be repeated for multiple epochs; <code>num_train_epochs</code> defaults to 3 epochs if it is not set otherwise in the training arguments. Similarly, logging, eval, and save frequency should be adjusted and the trained model should be pushed to the HuggingFace hub.</p>
 <p>As I do not have a GPU locally, I rented a cheap GPU from <a href="https://cloud.vast.ai/?ref_id=202191">Vast.ai*</a>.</p>
 <div class="quarto-figure quarto-figure-center">
 <figure class="figure">
 <p><img src="images/vastai.png" class="img-fluid figure-img"></p>
 <figcaption class="figure-caption">Cheap RTX 4070s Ti rented from <a href="https://cloud.vast.ai/?ref_id=202191">Vast.ai*</a></figcaption>
 </figure>
 </div>
-<p>Training on 50% of the full dataset (a bit over 100k samples) for one epoch took roughly 24 hours on the rented RTX 4070s Ti.</p>
+<p>Training on 50% of the full dataset (a bit over 100k samples) for one epoch took roughly 24 hours on the rented RTX 4070s Ti. Use <code>train_data = load_and_process_dataset("train[:50%]", dataset_limit=None)</code> to load the first half of the training set.</p>
 <p>Monitoring the training and evaluation metrics on Weights &amp; Biases, shows the loss slowly going down, both in training and validation, as well as the ROUGE score gradually increasing. If the loss would only decrease on the training set but not the validation set, this would indicate overfitting to the training set.</p>
 <div class="quarto-figure quarto-figure-center">
 <figure class="figure">
@@ -693,8 +693,8 @@ <h1>Testing the Fine-Tuned Model</h1>
 <pre><code>'A transformer is a type of neural network architecture. It\'s basically a network architecture that uses a set of neural networks to communicate with each other. \n\nFor example, if you have a neural network that is connected to a computer, it will send a signal to the computer. If you connect a computer to a network, the computer will send the signal to your computer, which will then send it to the network. This is called a "transformer".\n\nThe Transformer is basically a computer that connects a network to another network, and sends the signal back and forth.\n\n_URL_0_\n\n \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n \n\n &gt; \n\n \n\n \n\n\n \n,\n\n\n &gt; \n \n\n ,\n\nI\'m not sure what you mean by " transformer"., I\'m sure you\'re talking about " transformer" or " transformer."\n\nIf you\'re referring to " transformer", then you\'re probably referring to the "Transformer" and " transformer":\n\n* Transformer" is the same thing as transformer.\n\n**Transformer* is a transformer.**.. .\n,.\n\n:\n:: _:\n\n, = ;\n\n\n"Transformer":\n\n\n" transformer" is a term used to describe the transformer. transformer is the transformer that is used to refer to the transformer.: " transformer.\n\n\n*Transformer:\n\n\n\\* Transmitter:\n\n\n\n*Transmitter:\n\n- Transmitter\n\n---\n\n--\n\n\\- transformer\n transformer is a transmitter.\n\n\n\n -?. I\'m not an expert in this field, but I\'m an expert.\n\n &amp; :! Transmitter. is an expert,s (_URL_\n\n\n"Transmitter""transformer""transmitter"\n" Transmitter"\n\n(_URL_1_)\rut(_URL)\n\n\r\n(_)'</code></pre>
 </div>
 </div>
-<p>Remember, the base model above simply repeated the input (its answer was ““question: Whats a transformer? In this context: The dominant sequence transduction models are based on complex recurrent…”). In comparison, the answer of the fine-tuned model is a lot better. The first sentence is a sensible answer to the question: A transformer is a type of neural network architecture. However, after that, the model starts rambling and the answer quality gets worse.</p>
-<p>I am not sure why a) the answer is so long and b) why it is not better than that. Since the training and validation loss were still going down, more training should further increase the answer quality.</p>
+<p>Remember, the base model above simply repeated the input (its answer was <code>"question: What's a transformer? In this context: The dominant sequence transduction models are based on complex recurrent...</code>). In comparison, the answer of the fine-tuned model is a lot better. The first sentence is a sensible answer to the question: <code>A transformer is a type of neural network architecture.</code> However, after that, the model starts rambling and the answer quality gets worse.</p>
+<p>Since the training and validation loss were still going down, more training should further increase the answer quality. I am not sure if there are any other ways to further improve the quality and keep the model from generating overly long answers (other than reducing the generation maximum length).</p>
 <p>If you have other suggestions for improving the answer quality, I am happy to hear your suggestions (contact info is on <a href="https://stefanbschneider.github.io/">my website</a>)!</p>
 </section>
 <section id="whats-next" class="level1">
search.json

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@
 "href": "posts/llm-fine-tuning/index.html#full-training-run-on-vast.ai",
 "title": "Fine-Tuning a Pre-Trained LLM",
 "section": "Full Training Run on Vast.ai*",
-"text": "Full Training Run on Vast.ai*\nIn the example above, I limited the size of the dataset to a very small fraction. For proper fine-tuning, the full dataset should be used and training should be repeated for multiple epochs. Similarly, logging, eval, and save frequency should be adjusted and the trained model should be pushed to the HuggingFace hub.\nAs I do not have a GPU locally, I rented a cheap GPU from Vast.ai*.\n\n\n\nCheap RTX 4070s Ti rented from Vast.ai*\n\n\nTraining on 50% of the full dataset (a bit over 100k samples) for one epoch took roughly 24 hours on the rented RTX 4070s Ti.\nMonitoring the training and evaluation metrics on Weights & Biases, shows the loss slowly going down, both in training and validation, as well as the ROUGE score gradually increasing. If the loss would only decrease on the training set but not the validation set, this would indicate overfitting to the training set.\n\n\n\nMonitoring training and validation loss and ROUGE score on Weights & Biases\n\n\nI pushed the final fine-tuned model to HuggingFace: stefanbschneider/led-base-16384-lfqa-ans-len-512"
+"text": "Full Training Run on Vast.ai*\nIn the example above, I limited the size of the dataset to a very small fraction. For proper fine-tuning, the full dataset should be used by setting dataset_limit=None when loading the training data. Possibly, the training should be repeated for multiple epochs; num_train_epochs defaults to 3 epochs if it is not set otherwise in the training arguments. Similarly, logging, eval, and save frequency should be adjusted and the trained model should be pushed to the HuggingFace hub.\nAs I do not have a GPU locally, I rented a cheap GPU from Vast.ai*.\n\n\n\nCheap RTX 4070s Ti rented from Vast.ai*\n\n\nTraining on 50% of the full dataset (a bit over 100k samples) for one epoch took roughly 24 hours on the rented RTX 4070s Ti. Use train_data = load_and_process_dataset(\"train[:50%]\", dataset_limit=None) to load the first half of the training set.\nMonitoring the training and evaluation metrics on Weights & Biases, shows the loss slowly going down, both in training and validation, as well as the ROUGE score gradually increasing. If the loss would only decrease on the training set but not the validation set, this would indicate overfitting to the training set.\n\n\n\nMonitoring training and validation loss and ROUGE score on Weights & Biases\n\n\nI pushed the final fine-tuned model to HuggingFace: stefanbschneider/led-base-16384-lfqa-ans-len-512"
 },
 {
 "objectID": "posts/generative-qa/index.html",

sitemap.xml

Lines changed: 20 additions & 20 deletions
@@ -2,82 +2,82 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
 <loc>https://stefanbschneider.github.io/blog/index.html</loc>
-<lastmod>2025-02-10T17:00:29.268Z</lastmod>
+<lastmod>2025-02-10T22:01:25.586Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/django-db/index.html</loc>
-<lastmod>2025-02-10T17:00:26.629Z</lastmod>
+<lastmod>2025-02-10T22:01:23.002Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/understanding-transformers-attention/index.html</loc>
-<lastmod>2025-02-10T17:00:24.946Z</lastmod>
+<lastmod>2025-02-10T22:01:21.705Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/llm-fine-tuning/index.html</loc>
-<lastmod>2025-02-10T17:00:23.210Z</lastmod>
+<lastmod>2025-02-10T22:01:19.976Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/generative-qa/index.html</loc>
-<lastmod>2025-02-10T17:00:21.429Z</lastmod>
+<lastmod>2025-02-10T22:01:18.085Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/rllib-private-cluster/index.html</loc>
-<lastmod>2025-02-10T17:00:20.014Z</lastmod>
+<lastmod>2025-02-10T22:01:16.399Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/my-first-project/index.html</loc>
-<lastmod>2025-02-10T17:00:18.209Z</lastmod>
+<lastmod>2025-02-10T22:01:14.948Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/pytorch-getting-started/index.html</loc>
-<lastmod>2025-02-10T17:00:16.608Z</lastmod>
+<lastmod>2025-02-10T22:01:13.588Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/partial-observability/index.html</loc>
-<lastmod>2025-02-10T17:00:14.450Z</lastmod>
+<lastmod>2025-02-10T22:01:11.563Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/deepcomp/index.html</loc>
-<lastmod>2025-02-10T17:00:01.557Z</lastmod>
+<lastmod>2025-02-10T22:00:58.771Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/test-markdown-post/index.html</loc>
-<lastmod>2025-02-10T17:00:02.520Z</lastmod>
+<lastmod>2025-02-10T22:00:59.601Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/test-notebook-post/index.html</loc>
-<lastmod>2025-02-10T17:00:15.747Z</lastmod>
+<lastmod>2025-02-10T22:01:12.845Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/question-answering-huggingface/index.html</loc>
-<lastmod>2025-02-10T17:00:17.227Z</lastmod>
+<lastmod>2025-02-10T22:01:14.200Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/django-bootstrap/index.html</loc>
-<lastmod>2025-02-10T17:00:19.095Z</lastmod>
+<lastmod>2025-02-10T22:01:15.579Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/django-heroku/index.html</loc>
-<lastmod>2025-02-10T17:00:20.741Z</lastmod>
+<lastmod>2025-02-10T22:01:17.016Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/pytorch-django/index.html</loc>
-<lastmod>2025-02-10T17:00:22.267Z</lastmod>
+<lastmod>2025-02-10T22:01:19.044Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/descript/index.html</loc>
-<lastmod>2025-02-10T17:00:23.992Z</lastmod>
+<lastmod>2025-02-10T22:01:20.882Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/google-analytics-track-links/index.html</loc>
-<lastmod>2025-02-10T17:00:25.732Z</lastmod>
+<lastmod>2025-02-10T22:01:22.365Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/posts/django-google-analytics/index.html</loc>
-<lastmod>2025-02-10T17:00:27.516Z</lastmod>
+<lastmod>2025-02-10T22:01:23.877Z</lastmod>
 </url>
 <url>
 <loc>https://stefanbschneider.github.io/blog/about.html</loc>
-<lastmod>2025-02-10T17:00:30.015Z</lastmod>
+<lastmod>2025-02-10T22:01:26.325Z</lastmod>
 </url>
 </urlset>
