How things work under the hood in Machine Learning

Section 1 - Dataset Creation

Nowadays (and fortunately) TensorFlow is among the most popular libraries for Machine Learning, and it comes with many utilities out of the box. But have you ever wondered how you could implement your own functions in the absence of TensorFlow or any other ML library?

Certainly, I do not want to reinvent the wheel. ML libraries are developed by very talented engineers who collaborate across the world, their source code is available to everyone, and they have become de facto standards. These and many other reasons motivate us to use ML libraries instead of creating our own.

Just to understand what happens under the hood when I call a function in my code, I wondered how I could implement this API call from TensorFlow: tf.keras.utils.image_dataset_from_directory

The TensorFlow documentation simply says: Generates a tf.data.Dataset from image files in a directory.

Based on this, my question was: given a dataset composed of images, how can I convert it into a CSV file where each row (sample) holds the input values (X) followed by the output value (y)?

The output dataset would look as follows (m examples, each with n input values):

x11, x12, ..., x1n, y1 (Example 1)
x21, x22, ..., x2n, y2 (Example 2)
...
xm1, xm2, ..., xmn, ym (Example m)

I came up with a simple solution using Python, NumPy and Pillow. For details, see the notebook /notebooks/Dataset Creation.ipynb
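As a rough illustration of the idea (not the notebook's actual code), the conversion can be sketched as below. The function name `images_to_csv`, the `size` parameter, the grayscale conversion, the scaling to [0, 1] and the one-subdirectory-per-class layout are all assumptions made for this example.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def images_to_csv(root_dir, csv_path, size=(32, 32)):
    """Write one CSV row per image: flattened pixel values followed by the label.

    Assumption: root_dir contains one subdirectory per class
    (for example root_dir/horses and root_dir/humans). Labels are
    assigned in alphabetical order of the subdirectory names.
    """
    root = Path(root_dir)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    rows = []
    for label, name in enumerate(class_names):
        for image_path in sorted((root / name).iterdir()):
            if image_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
                continue  # skip anything that is not an image file
            with Image.open(image_path) as img:
                # Grayscale plus a fixed size gives every row the same length.
                pixels = np.asarray(img.convert("L").resize(size), dtype=np.float64)
            x = pixels.flatten() / 255.0  # scale inputs to [0, 1]
            rows.append(np.append(x, label))  # x1, ..., xn, y
    np.savetxt(csv_path, np.array(rows), delimiter=",")
    return class_names
```

The file can be read back with `data = np.loadtxt(csv_path, delimiter=",")`, after which `X = data[:, :-1]` and `y = data[:, -1:]` separate the inputs from the labels.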

Section 2 - Model Creation, Model Training and Prediction

Given the dataset created in the previous section, let's build a machine learning model that can predict the class of new samples (Horses or Humans).

The algorithm for building a machine learning model includes several steps, as follows:

- Initialize parameters

- Repeat

    - Forward propagation
    
    - Compute cost
    
    - Backward propagation
    
    - Update parameters (weights and biases)

  Until the total number of epochs is reached

Notice that implementing the above steps for complex models can be a nightmare using only Python and NumPy. That's why we use a friendly ML library.
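For a small network, the steps above can be sketched in plain NumPy. This is only an illustrative sketch, not the notebook's code: it assumes one tanh hidden layer and one sigmoid output unit, binary cross-entropy as the cost, and full-batch gradient descent (Keras' SGD works on mini-batches by default). The names `train`, `hidden_units` and `learning_rate` are made up for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, hidden_units=10, epochs=2000, learning_rate=0.01, seed=0):
    """Train a one-hidden-layer binary classifier with batch gradient descent.

    X has shape (m, n); y has shape (m, 1) with values 0 or 1.
    Returns the learned parameters and the cost recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape

    # Initialize parameters: small random weights, zero biases
    W1 = rng.normal(0.0, 0.1, size=(n, hidden_units))
    b1 = np.zeros((1, hidden_units))
    W2 = rng.normal(0.0, 0.1, size=(hidden_units, 1))
    b2 = np.zeros((1, 1))

    costs = []
    for _ in range(epochs):
        # Forward propagation
        A1 = np.tanh(X @ W1 + b1)
        A2 = sigmoid(A1 @ W2 + b2)

        # Compute cost (binary cross-entropy); clip to avoid log(0)
        P = np.clip(A2, 1e-12, 1.0 - 1e-12)
        costs.append(float(-np.mean(y * np.log(P) + (1 - y) * np.log(1 - P))))

        # Backward propagation
        dZ2 = (A2 - y) / m
        dW2 = A1.T @ dZ2
        db2 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = (dZ2 @ W2.T) * (1.0 - A1 ** 2)  # derivative of tanh is 1 - tanh^2
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0, keepdims=True)

        # Update parameters (weights and biases)
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2

    return (W1, b1, W2, b2), costs
```

Plotting or printing `costs` is a quick way to check that the loop is learning: the values should trend downward over the epochs.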

What we usually do in TensorFlow:

We create a function that returns the model architecture.

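from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def create_model(input_features):
    
    # Create an instance of the Sequential class
    model = Sequential()
    
    # Add layers: a hidden layer with 10 tanh units and a sigmoid output unit
    model.add(Dense(10, input_dim=input_features, activation='tanh'))
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile the model for binary classification
    model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=['accuracy'])
    
    return model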

We train the model using the training dataset. Training is the process in which the model learns its parameters (weights and biases).

model.fit(X_training, y_training, epochs = 2000, verbose = 0)

Given a validation dataset, we evaluate the loss and accuracy of the model.

model.evaluate(X_validation, y_validation)
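With the model compiled as above, this call returns the loss followed by the accuracy. To make those two numbers less of a black box, here is a minimal NumPy sketch that computes them from predicted probabilities. The helper name `evaluate` and the 0.5 decision threshold are assumptions for this example.

```python
import numpy as np


def evaluate(y_true, y_prob, threshold=0.5):
    """Return (binary cross-entropy loss, accuracy) for predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_prob = np.asarray(y_prob, dtype=float).reshape(-1)

    # Clip so that log(0) never occurs
    p = np.clip(y_prob, 1e-7, 1.0 - 1e-7)
    loss = -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

    # A sample counts as correct when the thresholded probability matches its label
    accuracy = np.mean((y_prob > threshold) == (y_true == 1.0))
    return float(loss), float(accuracy)
```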

We use the model to predict on new samples.

model.predict(some_new_sample)
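Two details are easy to miss here: predict expects a batch, so a single flattened sample has to be reshaped to (1, n), and the sigmoid output is a probability rather than a class. A small sketch of both steps follows. The helper names and the mapping 0 -> horse, 1 -> human are hypothetical and depend on how the labels were assigned when the dataset was created.

```python
import numpy as np

CLASS_NAMES = ["horse", "human"]  # hypothetical mapping: 0 -> horse, 1 -> human


def as_batch(sample):
    """Reshape a single flattened sample of shape (n,) into a batch of shape (1, n)."""
    return np.asarray(sample, dtype=float).reshape(1, -1)


def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs of shape (m, 1) to class names."""
    classes = (np.asarray(probabilities).reshape(-1) > threshold).astype(int)
    return [CLASS_NAMES[c] for c in classes]
```

With a trained model this would be used as `to_labels(model.predict(as_batch(some_new_sample)))`.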

For details, see the notebook /notebooks/Model Creation, Model Training and Prediction.ipynb

fabiogaiera/how-things-work-in-ml
