This repository documents my step-by-step learning of Hugging Face tools for applied machine learning, with a focus on text data and real-world workflows.
The goal is to build a clear understanding of the full pipeline:
data → preprocessing → training → evaluation → sharing → application
What this covers:
- loading datasets using load_dataset
- accessing train/test splits
- initial dataset exploration
What this covers:
- inspecting samples and features
- understanding dataset structure
- checking dataset size and label distribution
This notebook covers:
- general preprocessing such as filtering and text transformation
- text preprocessing for transformers
- tokenization with a matching pretrained tokenizer
- preparing the dataset for PyTorch
This notebook continues after preprocessing and shows how to:
- load a pretrained text classification model
- define training settings with TrainingArguments
- use Trainer to fine-tune the model
- evaluate performance with accuracy
This notebook explains how text classification training works without the Trainer API:
- prepare tokenized data for PyTorch
- create DataLoaders
- define optimizer and scheduler
- run a manual training loop
- evaluate the model
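The steps above can be sketched end-to-end in plain PyTorch. A tiny linear model and random features stand in for the transformer and tokenized text, so the structure of the loop is the focus, not the task.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for tokenized features: 2-d inputs with binary labels.
X = torch.randn(64, 2)
y = (X.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=8, shuffle=True)

# Stand-in model; the notebook uses a pretrained transformer here.
model = torch.nn.Linear(2, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.1)
num_epochs = 3
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1,
    total_iters=num_epochs * len(loader),
)
loss_fn = torch.nn.CrossEntropyLoss()

# Manual training loop: forward, backward, step, schedule, zero grads.
for epoch in range(num_epochs):
    model.train()
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

# Evaluation: no gradients, argmax over logits.
model.eval()
with torch.no_grad():
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"accuracy: {acc:.2f}")
```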
This notebook explains how to interpret:
- training and validation loss
- validation accuracy
- healthy learning curves
- overfitting
- underfitting
- unstable training
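The interpretation rules above can be turned into a rough heuristic over loss histories. The thresholds below are illustrative, not standard values; real diagnosis also looks at accuracy and run-to-run variance.

```python
def diagnose(train_loss, val_loss):
    """Classify a learning curve; thresholds are illustrative only."""
    # Overfitting: validation loss climbs well above its best value
    # while training loss keeps falling.
    if val_loss[-1] > min(val_loss) * 1.2 and train_loss[-1] < train_loss[0]:
        return "overfitting"
    # Underfitting: training loss barely moved from its starting point.
    if train_loss[-1] > 0.9 * train_loss[0]:
        return "underfitting"
    return "healthy"

a = diagnose([1.0, 0.6, 0.3, 0.1], [0.9, 0.7, 0.8, 1.1])
b = diagnose([1.0, 0.98, 0.97, 0.96], [1.0, 0.99, 0.99, 0.98])
c = diagnose([1.0, 0.6, 0.4, 0.3], [1.0, 0.7, 0.5, 0.45])
print(a, b, c)  # overfitting underfitting healthy
```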
This notebook covers using and sharing models via the Hugging Face Hub:
- loading pretrained models
- saving models locally
- pushing models to the Hub
- model cards
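A sketch of the save/load round trip. To avoid downloading weights, the model is built from a small config instead of a pretrained checkpoint; the config sizes and the Hub repo name are assumptions.

```python
import tempfile

from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Build a deliberately tiny model from a config (no download needed).
config = DistilBertConfig(n_layers=1, n_heads=2, dim=64,
                          hidden_dim=128, num_labels=2)
model = DistilBertForSequenceClassification(config)

# Save locally: writes config.json plus the weights.
save_dir = tempfile.mkdtemp()
model.save_pretrained(save_dir)

# Load back from the local directory with the same API as Hub checkpoints.
reloaded = DistilBertForSequenceClassification.from_pretrained(save_dir)
print(type(reloaded).__name__)

# Pushing to the Hub is one extra call (after `huggingface-cli login`);
# the repo name here is a placeholder:
# model.push_to_hub("your-username/your-model-name")
```

A model card is the README.md of the Hub repo; push_to_hub creates the repo, and the card is edited alongside it.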
This notebook covers the most practical classical NLP tasks that form the foundation of modern language models:
- token classification
- question answering
- summarization
- translation
For each task, I included:
- the main idea
- a small practical example
- the key preprocessing concept
- the common evaluation metric
- a short summary of what I learned
Key takeaways:
- token classification requires label alignment with subword tokens
- extractive QA requires mapping answer spans from characters to tokens
- summarization typically uses encoder-decoder models and ROUGE
- translation uses sequence-to-sequence models and SacreBLEU
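The label-alignment point for token classification can be sketched without a model. The word_ids list below imitates what a fast tokenizer's word_ids() returns; the tokens, words, and label ids are made up.

```python
# One entry per subword token, giving the index of the source word
# (None for special tokens like [CLS]/[SEP]).
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [3, 7, 0]  # one NER label per original word

aligned = []
prev = None
for wid in word_ids:
    if wid is None:
        aligned.append(-100)              # -100 is ignored by the loss
    elif wid != prev:
        aligned.append(word_labels[wid])  # label the first subword
    else:
        aligned.append(-100)              # mask continuation subwords
    prev = wid

print(aligned)  # [-100, 3, 7, -100, 0, -100]
```

Extractive QA needs the analogous mapping in the other direction: character-level answer spans mapped onto token positions via offset mappings.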
This notebook covers the practical workflow for adapting a pretrained language model into an assistant-style model.
It covers:
- chat templates
- supervised fine-tuning (SFT)
- LoRA
- evaluation after fine-tuning
Key takeaways:
- instruct models require the correct chat template
- SFT is useful only when prompting is not enough
- LoRA makes fine-tuning much more memory-efficient
- evaluation should include both metrics and qualitative checks
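The chat-template point can be sketched with the role/content message format. The render function below is a toy illustration only: each model family defines its own template, so in practice the prompt comes from tokenizer.apply_chat_template, and the <|role|> markers here are invented.

```python
# Messages in the role/content format that chat templates consume.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LoRA?"},
]

# Toy stand-in for tokenizer.apply_chat_template(messages,
# add_generation_prompt=True); the markers are illustrative.
def render(msgs):
    body = "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in msgs)
    return body + "<|assistant|>\n"  # cue the model to answer

prompt = render(messages)
print(prompt)
```

Using the wrong template (or none) at inference is a common silent failure with instruct models: the text is still valid input, but it no longer matches what the model saw during fine-tuning.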