Welcome to my Machine Learning & Deep Learning Projects repository!
This repository showcases my journey from classical machine learning models to advanced deep learning architectures, including Transformers and Language Models (LM/LLM). Each project is documented with theory, implementation, results, and reflections.
- About
- Projects
- Installation
- Usage
- Technologies & Libraries
- Repository Structure
- Contributing
- License
This repository demonstrates practical experience with:
- Classical ML: Linear Regression, Logistic Regression, Decision Tree Regression, K-Means, TF-IDF
- Neural Networks: Softmax Layer, Feedforward Neural Network, CNN, RNN
- Advanced Architectures: Transformers, Language Models, LLMs (in progress)
The goal is to build a strong foundation, document each step, and provide reproducible code.
| Model / Project | Type | Description |
|---|---|---|
| Linear Regression | ML | Predict continuous targets using a linear approach (Code) |
| Logistic Regression | ML | Binary/multi-class classification (Code) |
| TF-IDF (Term Frequency – Inverse Document Frequency) | NLP | Text vectorization for ML models (Code) |
| Decision Tree Models | ML/DL | Tree-based models for regression and classification (Code) |
| K-Means Clustering | ML | Unsupervised clustering algorithm (Code) |
| Softmax Classifier | DL | Multi-class classification (Code) |
| Feedforward Neural Network | DL | Fully connected neural network (Code) |
This section documents the Linear Regression model implemented from scratch in Python.

Predict continuous target values using a linear relationship between features and the output.

The model assumes a linear relationship:

$$ \hat{y} = A x + B $$

Where:

- $A$ = slope of the line
- $B$ = intercept

The least squares method is used to find the optimal parameters:

$$ A = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad B = \frac{\sum y_i - A \sum x_i}{n} $$

Evaluation Metrics:

- Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$
- Mean Absolute Error (MAE): $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
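As a quick sanity check (not part of the repository's code), the closed-form least-squares formulas above can be verified on a toy dataset with a known linear trend:

```python
import numpy as np

# Toy data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
n = len(x)

# Closed-form least-squares solution for slope A and intercept B
A = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
B = (np.sum(y) - A * np.sum(x)) / n

print(A, B)  # A = 2.0, B = 1.0 (the true slope and intercept)
```

Working through the sums by hand: $\sum x_i y_i = 125$, $\sum x_i = 15$, $\sum y_i = 35$, $\sum x_i^2 = 55$, giving $A = (625 - 525)/(275 - 225) = 2$ and $B = (35 - 30)/5 = 1$.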
```python
import math
import numpy as np

class linear(model):  # `model` is the shared base class defined in my_models.py
    def __init__(self, A=0, B=0):
        self.A = A
        self.B = B

    def fit(self, x, y):
        n = len(x)
        s, s1, s2, s3 = 0, 0, 0, 0
        for i in range(n):
            s += x[i] * y[i]   # sum of x*y
            s1 += x[i]         # sum of x
            s2 += y[i]         # sum of y
            s3 += x[i] ** 2    # sum of x^2
        self.A = np.round((n * s - s1 * s2) / (n * s3 - s1 ** 2), 4)
        self.B = (s2 - self.A * s1) / n
        print("Slope:", self.A, "Intercept:", self.B)

    def predict(self, x):
        return [self.A * xi + self.B for xi in x]

    def MSE(self, y_true, y_pred):
        return sum((y_true[i] - y_pred[i]) ** 2 for i in range(len(y_true))) / len(y_true)

    def RMSE(self, y_true, y_pred):
        return math.sqrt(self.MSE(y_true, y_pred))

    def MAE(self, y_true, y_pred):
        return sum(abs(y_true[i] - y_pred[i]) for i in range(len(y_true))) / len(y_true)
```

This section documents the Logistic Regression model implemented from scratch in Python.
Perform binary classification (0/1) or multi-class classification using sigmoid or softmax activation.

Sigmoid function (for binary classification):

$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$

Prediction probability:

$$ \hat{y} = \sigma(X \cdot W + B) $$

Where:

- $X$ = input features
- $W$ = weights
- $B$ = bias

Decision rule (for binary classification):

$$ \hat{y} \geq 0.5 \Rightarrow \text{class } 1, \qquad \hat{y} < 0.5 \Rightarrow \text{class } 0 $$

Gradient Descent Update:

$$ W := W - \alpha \frac{1}{n} X^T (\hat{y} - y) $$

Where $\alpha$ is the learning rate and $n$ is the number of samples.

Evaluation Metrics:

- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\frac{TP}{TP + FP}$
- Recall: $\frac{TP}{TP + FN}$
- F1 Score: $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
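The four metrics above can be computed directly from the confusion-matrix counts. A small illustration on made-up predictions (not from the repository's code):

```python
import numpy as np

# Hypothetical predictions vs. ground truth for a binary task
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives: 3
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives: 1
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives: 1
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives: 3

accuracy = (tp + tn) / len(y_true)                       # 6/8 = 0.75
precision = tp / (tp + fp)                               # 3/4 = 0.75
recall = tp / (tp + fn)                                  # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)       # 0.75
```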
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=-1, keepdims=True))  # stability trick
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

class logistic(model):  # `model` is the shared base class defined in my_models.py
    def __init__(self, n_iter=1000, A=0, B=0, use=None):
        self.A = A
        self.B = B
        self.n_iter = n_iter
        self.use = use

    def fit(self, x, y, use="sigmoid"):
        x = np.array(x, dtype=float)
        y = np.array(y, dtype=float)
        x_bias = np.c_[np.ones(x.shape[0]), x]  # add bias term as an extra column
        w = np.zeros((x_bias.shape[1], 1))
        for i in range(self.n_iter):
            z = x_bias @ w
            h = sigmoid(z) if use == "sigmoid" else softmax(z)
            gradient = x_bias.T @ (h - y.reshape(-1, 1)) / y.size
            w = w - 0.01 * gradient  # fixed learning rate of 0.01
        self.A = w
        self.B = w[0]
        self.use = use
        return self.A, self.B

    def predict_proba(self, x):
        x = np.array(x, dtype=float)
        x_bias = np.c_[np.ones(x.shape[0]), x]
        z = x_bias @ self.A
        return sigmoid(z) if self.use == "sigmoid" else softmax(z)

    def predict(self, x, threshold=0.5):
        probs = self.predict_proba(x)
        if self.use == "sigmoid":
            return (probs >= threshold).astype(int).flatten()
        else:
            return np.argmax(probs, axis=1)
```

This section documents the TF-IDF model implemented from scratch in Python for converting text into numerical features suitable for machine learning.
Transform textual data into a numeric representation that captures the importance of each word in a document relative to the entire corpus.

- Term Frequency (TF): the frequency of term $t$ in document $d$, normalized by the document length:

  $$ \text{TF}(t, d) = \frac{\text{count}(t, d)}{|d|} $$

- Inverse Document Frequency (IDF): measures how rare a term is across all documents (smoothed, as in the implementation below):

  $$ \text{IDF}(t) = \log\left(\frac{N + 1}{1 + \text{df}(t)}\right) + 1 $$

  Where $N$ is the number of documents and $\text{df}(t)$ is the number of documents containing $t$.

- TF-IDF Score:

  $$ \text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) $$
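A worked example of the three formulas above on a tiny two-document corpus (a hand computation, not the class itself):

```python
import numpy as np

# Two-document toy corpus, already tokenized
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
N = len(docs)

# Document frequency of "sat": it appears in both documents
df_sat = sum(1 for d in docs if "sat" in d)       # 2
# Smoothed IDF: log((N + 1) / (1 + df)) + 1 = log(3/3) + 1 = 1.0
idf_sat = np.log((N + 1) / (1 + df_sat)) + 1

# TF of "sat" in the second document: 2 occurrences out of 4 tokens
tf_sat = docs[1].count("sat") / len(docs[1])      # 0.5

tfidf_sat = tf_sat * idf_sat                      # 0.5 * 1.0 = 0.5
```

Note that with this smoothing a term appearing in every document still gets a non-zero IDF of 1, so it is down-weighted but not erased.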
```python
import re
import numpy as np

def decompose(text):
    # Lowercase, strip punctuation, collapse whitespace, and split into tokens
    text = text.lower()
    text = re.sub(r'[^a-z0-9\s]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip().split()

class TFIDF:
    def __init__(self, vocab=None, idf=None):
        self.vocab = vocab
        self.idf = idf

    def compute_tf(self, data):
        # Fits the vocabulary and IDF, then returns the TF-IDF matrix
        tf = []
        doc_words = []
        for document in data:
            words = decompose(document)
            n = len(words)
            freq = {}
            for word in words:
                freq[word] = freq.get(word, 0) + 1
                if word not in doc_words:
                    doc_words.append(word)
            for word in freq:
                freq[word] /= n
            tf.append(freq)
        # Compute IDF (smoothed)
        idf = {}
        N = len(data)
        for word in doc_words:
            s = sum(1 for doc_freq in tf if word in doc_freq)
            idf[word] = np.log((N + 1) / (1 + s)) + 1
        # Compute TF-IDF
        tfidf = []
        for freq in tf:
            row = [freq.get(word, 0.0) * idf[word] for word in doc_words]
            tfidf.append(row)
        self.vocab = doc_words
        self.idf = idf
        return np.array(tfidf, dtype=float)

    def transform(self, data):
        if self.vocab is None or self.idf is None:
            print("You need to fit the model first")
            return
        tfidf = []
        for doc in data:
            words = decompose(doc)
            n = len(words)
            freq = {}
            for word in words:
                if word in self.vocab:
                    freq[word] = freq.get(word, 0) + 1
            for word in freq:
                freq[word] /= n
            row = [freq.get(word, 0.0) * self.idf[word] for word in self.vocab]
            tfidf.append(row)
        return np.array(tfidf, dtype=float)
```

This section documents Decision Trees implemented from scratch in Python for both classification and regression tasks.
Classify data into categories by recursively splitting the dataset based on information gain (IG).

- Entropy (Impurity Measure): for a set of labels $S$ with classes $c = 1, \dots, C$:

  $$ H(S) = -\sum_{c=1}^{C} p_c \log_2 p_c $$

  Where $p_c$ is the proportion of samples in $S$ belonging to class $c$.

- Information Gain (IG): when splitting dataset $S$ into subsets $S_{\text{left}}$ and $S_{\text{right}}$:

  $$ IG = H(S) - \frac{|S_{\text{left}}|}{|S|} H(S_{\text{left}}) - \frac{|S_{\text{right}}|}{|S|} H(S_{\text{right}}) $$

The feature and threshold with the highest IG are chosen for splitting.
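To make the entropy and IG definitions concrete, here is a small worked example (independent of the repository's helpers): a parent node with four samples of each class has entropy 1 bit, and a split that separates the classes perfectly recovers the full bit of information:

```python
import numpy as np

def entropy(labels):
    # H(S) = -sum_c p_c * log2(p_c)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Parent node: 4 positives, 4 negatives -> H = 1 bit
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# A perfect split separates the two classes completely
left = np.array([1, 1, 1, 1])
right = np.array([0, 0, 0, 0])

ig = entropy(parent) \
    - (len(left) / len(parent)) * entropy(left) \
    - (len(right) / len(parent)) * entropy(right)
print(ig)  # 1.0: the split removes all impurity
```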
```python
import numpy as np
# Helpers `node`, `entropy`, `information_gain`, and `MSE` are defined
# elsewhere in the module (see decision_tree_alogrithm.py).

def fit_tree(x, y):
    if len(x) == 0 or len(y) == 0:
        raise ValueError("empty array")
    if len(x) != len(y):
        raise ValueError("different lengths")
    root = node()
    root.feature = x
    root.lable = y

    def split(root):
        if len(root.feature) == 0:
            return None
        if len(np.unique(root.lable)) == 1:
            # Pure node: all samples share one label, so make a leaf
            leaf = node()
            leaf.lable = [root.lable[0]]
            leaf.left = None
            leaf.right = None
            return leaf
        ig, t0, y0, t1, y1 = information_gain(root.feature, root.lable)
        if ig == 0 or len(y0) == 0 or len(y1) == 0:
            # No useful split: make a leaf with the majority label
            leaf = node()
            leaf.lable = [np.bincount(root.lable).argmax()]
            leaf.left = None
            leaf.right = None
            return leaf
        node_left = node()
        node_right = node()
        node_left.feature = t0
        node_left.lable = y0
        node_right.feature = t1
        node_right.lable = y1
        root.left = split(node_left)
        root.right = split(node_right)
        return root

    return split(root)

def fit_regression(X, y, max_depth=float('inf'), min_samples=2):
    X = np.array(X)
    y = np.array(y)
    n_samples, n_features = X.shape
    best_mse = float('inf')
    best_feature = None
    best_threshold = None
    best_split = None
    for feature in range(n_features):
        x_feature = X[:, feature]
        x_sorted = x_feature[np.argsort(x_feature)]
        for i in range(n_samples - 1):
            # Candidate threshold halfway between consecutive sorted values
            threshold = (x_sorted[i] + x_sorted[i + 1]) / 2
            left_mask = x_feature <= threshold
            right_mask = x_feature > threshold
            if left_mask.sum() == 0 or right_mask.sum() == 0:
                continue
            mse = MSE(y[left_mask], y[right_mask])
            if mse < best_mse:
                best_mse = mse
                best_feature = feature
                best_threshold = threshold
                best_split = (left_mask, right_mask)
    if max_depth == 0 or n_samples <= min_samples or best_mse == 0 or best_feature is None:
        # Stopping condition reached: predict the mean target value
        leaf = node()
        leaf.value = np.mean(y)
        return leaf
    root = node()
    root.feature = best_feature
    root.threshold = best_threshold
    left_mask, right_mask = best_split
    root.left = fit_regression(X[left_mask], y[left_mask], max_depth - 1, min_samples)
    root.right = fit_regression(X[right_mask], y[right_mask], max_depth - 1, min_samples)
    return root
```

This section documents the K-Means clustering algorithm implemented from scratch in Python for unsupervised learning tasks.
Group data points into $k$ clusters based on feature similarity by minimizing within-cluster variance.

- Cluster Assignment: each data point $x_i$ is assigned to the cluster with the nearest centroid:

  $$ c_i = \arg\min_{c} \| x_i - \mu_c \|^2 $$

  Where $\mu_c$ is the centroid of cluster $c$.

- Centroid Update: after assigning points, update each cluster centroid to the mean of its assigned points:

  $$ \mu_c = \frac{1}{|S_c|} \sum_{x_i \in S_c} x_i $$

  Where $S_c$ is the set of points currently assigned to cluster $c$.

- Objective Function (Within-Cluster Sum of Squares):

  $$ J = \sum_{c=1}^{k} \sum_{x_i \in S_c} \| x_i - \mu_c \|^2 $$

K-Means iteratively minimizes $J$ by alternating between assignment and centroid update.
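One assignment-plus-update iteration can be traced by hand on toy data (this vectorized sketch is for illustration; the repository's implementation below uses explicit loops). Even with deliberately bad starting centroids, a single iteration snaps them onto the two obvious clusters:

```python
import numpy as np

# Two well-separated clusters in a 1-D feature space
x = np.array([[1.0], [1.2], [0.8], [8.0], [8.2], [7.8]])
centroids = np.array([[0.0], [10.0]])  # deliberately bad starting centroids

# Assignment step: distance from every point to every centroid, take the nearest
dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)  # shape (6, 2)
labels = np.argmin(dists, axis=1)  # [0, 0, 0, 1, 1, 1]

# Update step: each centroid moves to the mean of its assigned points
centroids = np.array([x[labels == c].mean(axis=0) for c in range(2)])
print(centroids)  # centroids move to ~1.0 and ~8.0
```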
```python
import numpy as np

def k_means(x, k, max_iters=10):
    n_samples, n_features = x.shape
    # Randomly initialize centroids from the data points
    idx = np.random.choice(n_samples, k, replace=False)
    centroids = x[idx].astype(float)
    for _ in range(max_iters):
        labels = np.zeros(n_samples, dtype=int)
        # Assignment step: attach each point to its nearest centroid
        for j in range(n_samples):
            best_dist = np.linalg.norm(x[j] - centroids[0])
            cluster = 0
            for c in range(1, k):
                dist = np.linalg.norm(x[j] - centroids[c])
                if dist < best_dist:
                    best_dist = dist
                    cluster = c
            labels[j] = cluster
        # Update step: move each centroid to the mean of its points
        for c in range(k):
            points = x[labels == c]
            if len(points) > 0:
                centroids[c] = np.mean(points, axis=0)
    return labels, centroids
```

This section documents the Softmax classifier implemented from scratch in Python for multi-class classification tasks.
Classify data into multiple classes by generalizing logistic regression using the softmax function and cross-entropy loss.

- Softmax Function: converts raw scores (logits) into probabilities for each class:

  $$ p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} $$

  Where $z_k$ is the logit for class $k$ and $K$ is the number of classes.

- Cross-Entropy Loss: measures the difference between predicted probabilities and true labels:

  $$ L = -\sum_{k=1}^{K} y_k \log p_k $$

  Where $y_k$ is 1 for the true class and 0 otherwise.

- Gradient Descent Update: for each class $k$:

  $$ W_k := W_k - \alpha \, x \, (p_k - y_k), \qquad b_k := b_k - \alpha \, (p_k - y_k) $$

  Where $\alpha$ is the learning rate.
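The key fact behind the update rule is that the gradient of the cross-entropy loss with respect to the logits is simply $p - y$. A short numeric illustration (separate from the implementation below):

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # subtract max for numerical stability
    return exp_z / np.sum(exp_z)

# Logits for a 3-class example whose true class is 0
z = np.array([2.0, 1.0, 0.1])
p = softmax(z)              # probabilities summing to 1

# Cross-entropy loss for the true class
loss = -np.log(p[0])

# Gradient of the loss w.r.t. the logits: p - y_onehot
y_onehot = np.array([1.0, 0.0, 0.0])
grad = p - y_onehot
# grad is negative for the true class (push its logit up)
# and positive for the others (push their logits down)
```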
```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # stability trick
    return exp_z / np.sum(exp_z)

def fit_softmax(x, y, learning_rate=0.01, epochs=1000):
    x = np.array(x)
    y = np.array(y)
    n_samples, n_features = x.shape
    n_classes = len(np.unique(y))
    b = np.zeros(n_classes)
    w = np.random.randn(n_features, n_classes) * 0.01
    Y = np.eye(n_classes)[y]  # one-hot encoding of the labels
    for _ in range(epochs):
        for j in range(n_samples):
            z = x[j] @ w + b
            p = softmax(z)
            grad = p - Y[j]  # gradient of cross-entropy w.r.t. the logits
            for k in range(n_classes):
                w[:, k] -= learning_rate * x[j] * grad[k]
                b[k] -= learning_rate * grad[k]
    return w, b
```

This section documents a fully-connected feedforward neural network implemented from scratch in Python for supervised learning tasks.
Train a multi-layer perceptron (MLP) for classification tasks, supporting multiple hidden layers, using sigmoid activation and softmax output.

- Forward Pass: for layer $l$ with weights $W^{(l)}$ and biases $b^{(l)}$:

  $$ z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}, \qquad a^{(l)} = \sigma(z^{(l)}) $$

  Where $a^{(0)} = x$ is the input and $\sigma$ is the sigmoid activation. For the output layer, softmax is used to get class probabilities:

  $$ p = \text{softmax}\left(a^{(L)} W^{(\text{out})} + b^{(\text{out})}\right) $$

- Loss Function (Cross-Entropy):

  $$ L = -\sum_{k} y_k \log p_k $$

- Backpropagation:

  - Compute the gradient at the output: $\delta^{(\text{out})} = p - y$
  - For hidden layers $l$:

    $$ \delta^{(l)} = \left(\delta^{(l+1)} W^{(l+1)\top}\right) \odot \sigma'(z^{(l)}) $$

    Where $\odot$ is element-wise multiplication and $\sigma'(z) = \sigma(z)(1 - \sigma(z))$.

  - Weight and bias updates:

    $$ W^{(l)} := W^{(l)} - \alpha \, a^{(l-1)\top} \delta^{(l)}, \qquad b^{(l)} := b^{(l)} - \alpha \, \delta^{(l)} $$
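The chain-rule derivation above can be verified numerically: the analytic gradient of a one-hidden-layer network should agree with a central finite difference of the loss. This is a standalone sanity check with randomly initialized weights, not the repository's training code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # one input sample with 3 features
y = np.array([0.0, 1.0])          # one-hot target, 2 classes
W1 = rng.normal(size=(3, 4)) * 0.1
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 2)) * 0.1
b2 = np.zeros(2)

def loss(W1_):
    a1 = sigmoid(x @ W1_ + b1)
    p = softmax(a1 @ W2 + b2)
    return -np.sum(y * np.log(p))

# Analytic gradient via the backpropagation equations above
z1 = x @ W1 + b1
a1 = sigmoid(z1)
p = softmax(a1 @ W2 + b2)
dZ2 = p - y                          # output delta
dZ1 = (dZ2 @ W2.T) * a1 * (1 - a1)   # hidden delta, sigmoid'(z) = s(1 - s)
dW1 = np.outer(x, dZ1)

# Central finite difference on one weight of W1
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
numeric = (loss(W1p) - loss(W1m)) / (2 * eps)
print(abs(numeric - dW1[0, 0]))  # should be tiny
```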
```python
import numpy as np
from softmax_regressor import softmax

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

def neural_network(x, y, learning_rate=0.01, n_layer=2, n_neurons=[5, 5], epochs=1000):
    x = np.array(x)
    y = np.array(y)
    n_samples, n_features = x.shape
    n_classes = len(np.unique(y))
    # Weights and biases are initialized lazily on the first sample
    weights = []
    biases = []
    y_onehot = np.eye(n_classes)[y]
    for i in range(epochs):
        for t in range(n_samples):
            A = []
            Z = []
            # Forward pass through the hidden layers
            for j in range(n_layer):
                if i == 0 and t == 0:
                    if j != 0:
                        n_features = n_neurons[j - 1]
                    w = np.random.randn(n_features, n_neurons[j]) * 0.01
                    weights.append(w)
                    b = np.zeros(n_neurons[j])
                    biases.append(b)
                else:
                    w = weights[j]
                    b = biases[j]
                a_prev = x[t] if j == 0 else A[-1]
                z = a_prev @ w + b
                a = sigmoid(z)
                Z.append(z)
                A.append(a)
            if i == 0 and t == 0:
                w_out = np.random.randn(n_neurons[-1], n_classes) * 0.01
                b_out = np.zeros(n_classes)
            # Output layer with softmax
            z_out = A[-1] @ w_out + b_out
            p = softmax(z_out)
            # Backpropagation
            dZ_out = p - y_onehot[t]
            dw_out = np.outer(A[-1], dZ_out)
            db_out = dZ_out
            dA = dZ_out @ w_out.T
            dw = [0] * n_layer
            db = [0] * n_layer
            for l in reversed(range(n_layer)):
                dZ = dA * sigmoid_derivative(Z[l])
                dw[l] = np.outer(x[t] if l == 0 else A[l - 1], dZ)
                db[l] = dZ
                dA = dZ @ weights[l].T
            # Update weights and biases
            for k in range(n_layer):
                weights[k] -= learning_rate * dw[k]
                biases[k] -= learning_rate * db[k]
            w_out -= learning_rate * dw_out
            b_out -= learning_rate * db_out
    return weights, biases, w_out, b_out
```

- Clone the repository:
  ```shell
  git clone https://github.com/yacinemebarki/model_from_scratch
  cd model_from_scratch
  ```

- Create a virtual environment:

  ```shell
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  ```

- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```

```python
from rnn.tok import tokenizer, embedding
from rnn.recunn import recurent
from rnn.rnn import layer

text_array = [
    "I love AI",
    "Deep learning is fun",
    "Hello world",
    "Python is great",
    "RNN is powerful",
    "I love deep learning"
]
labels = [1, 1, 0, 1, 0, 1]

# Tokenization
tok = tokenizer()
tok.fit(text_array)
vec = tok.encode(text_array)
vec_padded = tok.padding(vec, 5)
print("tokenization", vec)

# Creating the RNN model
model = layer()
model.addembedding(tok.wordid, 5)
model.addrecun(6)

# Training
model.fit(vec_padded, labels)
print("weight", model.w_out)
print("bias", model.b_out)

text_pre = [
    "i love python",
    "i love machine learning"
]
vec_pre = tok.encode(text_pre)
vec_pre = tok.padding(vec_pre, 5)

# Prediction
result = model.predict(vec_pre)
print(result)
```

```
model_from_scratch/
├── cnn/
│   ├── convnn.py
│   ├── conv.py
│   ├── flatt.py
│   └── maxpool.py
├── rnn/
│   ├── tok.py
│   ├── recunn.py
│   ├── rnn.py
│   └── my_models.py
├── my_models.py
├── decision_tree_alogrithm.py
├── k_means.py
├── TFIDF.py
├── neural_network.py
├── softmax_regressor.py
├── README.md
└── requirements.txt
```
Contributions are welcome! This project aims to be a learning and reference repository for implementing ML and DL models from scratch. You can contribute by:
- Reporting issues or bugs
- Suggesting improvements or optimizations
- Adding new models or architectures
- Improving documentation, explanations, and examples
- Fork the repository.
- Create a new branch for your feature or fix:

  ```shell
  git checkout -b feature/your-feature-name
  ```

- Make your changes and commit them:

  ```shell
  git commit -m "Add feature: description"
  ```

- Push to your fork:

  ```shell
  git push origin feature/your-feature-name
  ```

- Open a Pull Request explaining your changes.
Please ensure your code is well-commented and follows the existing project structure.
This project is licensed under the MIT License; see the LICENSE file for details.
You are free to use, modify, and distribute this code for personal, educational, or commercial purposes, as long as proper attribution is given.