docs: reorder week2 notebooks to focus on open-source fine-tuning
With Cohere fine-tuning deprecated, restructure Week 2 to emphasize
the open-source sentence-transformers approach:
- Move Open Source Models notebook to position 2 (recommended)
- Move deprecated Cohere notebook to position 3 (reference only)
- Update overview to highlight open-source focus
- Revise learning objectives to remove managed service references
- Update prerequisites to remove Cohere API requirement
- Adjust expected outcomes and next steps accordingly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
latest/week2/README.md (32 additions, 32 deletions)
@@ -4,19 +4,19 @@
 
 This week focuses on one of the most impactful ways to improve RAG system performance: fine-tuning embedding models for your specific domain. Generic embedding models are trained on broad internet data and may not capture the nuances of your particular use case. By fine-tuning these models with domain-specific data, you can achieve significant improvements in retrieval quality.
 
-You'll explore two complementary approaches to fine-tuning: using managed services like Cohere that handle the infrastructure complexity for you, and open-source solutions using sentence-transformers that give you complete control over the process. Both approaches demonstrate 15-30% improvements in key retrieval metrics, making fine-tuning one of the highest-ROI optimizations for RAG systems.
+**Note**: As of September 2025, Cohere no longer supports fine-tuning. This week now focuses primarily on open-source fine-tuning using sentence-transformers, which gives you complete control over the training process and demonstrates 15-30% improvements in key retrieval metrics, making fine-tuning one of the highest-ROI optimizations for RAG systems.
 
 ## Learning Objectives
 
 By the end of this week, you'll be able to:
 
 - Generate high-quality synthetic training data for embedding fine-tuning
 - Implement manual review processes to ensure training data quality
-- Fine-tune embedding models using both managed services and open-source tools
+- Fine-tune embedding models using open-source tools like sentence-transformers
 - Create effective training datasets with hard and semi-hard negatives
 - Evaluate fine-tuned models against baselines using established metrics
-- Deploy fine-tuned models to production environments
-- Make informed decisions between managed and self-hosted approaches
+- Deploy fine-tuned models to Hugging Face Hub for production use
+- Understand triplet loss and semi-hard negative mining techniques
 
 ## Notebooks
 
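The reordered objectives above center on triplet-loss fine-tuning with sentence-transformers and deployment to the Hugging Face Hub. As a rough, illustrative sketch of what that workflow can look like (placeholder data, paths, and repo id; the base model name `BAAI/bge-base-en` is taken from later in this diff, and the notebook's actual trainer, loss, and negative-mining setup may differ):

```python
# Minimal sketch: fine-tune an open-source embedding model on (anchor, positive, negative)
# triplets using the sentence-transformers v3+ Trainer API. All names and data are placeholders.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletLoss

model = SentenceTransformer("BAAI/bge-base-en")  # base model referenced in this week's material

# Tiny illustrative triplet dataset; in practice these triplets come from the
# synthetic-data generation and manual review pipeline built in notebook 1.
train_dataset = Dataset.from_dict({
    "anchor":   ["How do I rotate an API key?"],
    "positive": ["API keys can be rotated from the dashboard under Settings > Keys."],
    "negative": ["Invoices are generated on the first day of each billing cycle."],
})

# Explicit triplets use TripletLoss; semi-hard negatives can instead be mined
# in-batch from labelled pairs, e.g. with losses.BatchSemiHardTripletLoss.
loss = TripletLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/bge-base-en-ft",  # placeholder output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

# Deploy to the Hugging Face Hub (requires a prior `huggingface-cli login`); repo id is a placeholder.
model.push_to_hub("your-username/bge-base-en-ft")
```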
@@ -37,41 +37,41 @@ By the end of this week, you'll be able to:
 - Manual review interface using Streamlit
 - Evaluation pipeline with LanceDB for measuring improvements
 
-### 2. Finetune Cohere.ipynb (Deprecated)
+### 2. Open Source Models.ipynb (Recommended)
 
-> **Note**: As of September 2025, Cohere no longer supports fine-tuning. This notebook is kept for reference purposes.
-
-**Purpose**: Fine-tune a Cohere re-ranker model using managed services for simplified deployment
+**Purpose**: Fine-tune open-source embedding models with complete control over the training process
 
 **What You'll Learn**:
 
-- Hard negative mining techniques for effective training
-- Working with Cohere's fine-tuning API
-- Comparative evaluation of base vs. fine-tuned models
-- Performance analysis and visualization techniques
+- Triplet loss training with semi-hard negative mining
+- SentenceTransformerTrainer configuration and usage
+- Model deployment to Hugging Face Hub
+- Hyperparameter tuning and evaluation techniques
 
 **What You'll Build**:
 
-- Fine-tuned Cohere re-ranker model
-- Training dataset with carefully selected hard negatives
-**Purpose**: Fine-tune open-source embedding models with complete control over the training process
+> **Note**: As of September 2025, Cohere no longer supports fine-tuning. Please focus on the Open Source Models notebook instead. This notebook is kept for reference purposes only.
+
+**Purpose**: Fine-tune a Cohere re-ranker model using managed services for simplified deployment
 
 **What You'll Learn**:
 
-- Triplet loss training with semi-hard negative mining
-- SentenceTransformerTrainer configuration and usage
-- Model deployment to Hugging Face Hub
-- Trade-offs between managed services and self-hosted solutions
+- Hard negative mining techniques for effective training
+- Working with Cohere's fine-tuning API (deprecated)
+- Comparative evaluation of base vs. fine-tuned models
+- Performance analysis and visualization techniques
 
 **What You'll Build**:
 
-- Fine-tuned BAAI/bge-base-en embedding model
-- Training pipeline using sentence-transformers
-- Deployable model on Hugging Face Hub
+- Fine-tuned Cohere re-ranker model (no longer supported)
+- Training dataset with carefully selected hard negatives
+- Performance comparison visualizations
 
 ## Key Concepts
 
@@ -91,9 +91,9 @@ By the end of this week, you'll be able to:
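The notebook descriptions above call out comparative evaluation of base vs. fine-tuned models. The week's material uses a LanceDB-backed evaluation pipeline; the sketch below substitutes sentence-transformers' built-in InformationRetrievalEvaluator as a stand-in, with placeholder queries, corpus, and checkpoint ids, so treat it as one possible shape of that comparison rather than the notebooks' actual code.

```python
# Sketch: compare a base checkpoint against a fine-tuned one on a tiny labelled
# retrieval set. Placeholder stand-in for the notebooks' LanceDB evaluation pipeline.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

queries = {"q1": "How do I rotate an API key?"}  # query_id -> query text
corpus = {
    "d1": "API keys can be rotated from the dashboard under Settings > Keys.",
    "d2": "Invoices are generated on the first day of each billing cycle.",
    "d3": "Webhooks retry failed deliveries up to five times with backoff.",
}  # doc_id -> document text
relevant_docs = {"q1": {"d1"}}  # query_id -> ids of relevant documents

# Keep k small because this toy corpus only has three documents.
evaluator = InformationRetrievalEvaluator(
    queries,
    corpus,
    relevant_docs,
    accuracy_at_k=[1, 3],
    precision_recall_at_k=[1, 3],
    mrr_at_k=[3],
    ndcg_at_k=[3],
    map_at_k=[3],
    name="domain-eval",
)

for checkpoint in ("BAAI/bge-base-en", "models/bge-base-en-ft"):  # base vs. fine-tuned (placeholder path)
    model = SentenceTransformer(checkpoint)
    metrics = evaluator(model)  # recent sentence-transformers versions return a dict of metrics
    print(checkpoint, metrics)
```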