This repository contains the code and materials for the Enrichment Assignments of the course SOW-MKI85 Machine Hearing (2024-2025).
These assignments aim to give you hands-on experience with deep learning approaches for machine hearing. You will learn to pre-process audio, extract relevant audio features, train a ResNet-18 model on an environmental sound classification task, and evaluate the performance of the trained model.
The assignments consist of six sessions. Session 1 and Session 2 introduce the assignment, the dataset, relevant concepts, frameworks, libraries, and audio feature extraction. Session 3 and Session 4 cover data preprocessing and training the ResNet-18 model on different sets of audio features, while Session 5 focuses on analyzing and visualizing model performance. To conclude, Session 6 is dedicated to placing your findings within a wider theoretical framework based on your newly acquired knowledge of AI for Audio.
After successful completion of the enrichment assignments, you can...
• Describe, extract and analyze relevant audio features for sound classification.
• Implement and train a ResNet-18 model using various audio features.
• Evaluate and compare model performance for a sound classification task using relevant performance metrics.
• Visualize data and results in a meaningful, informative way.
• Interpret findings within the wider theoretical framework of AI for Audio.
The dataset that we are using for these assignments is the Environmental Sound Classification 50 (ESC-50) dataset [1]. This dataset consists of 2,000 labeled sound clips of 5 seconds each, covering 50 classes grouped into five major categories: “Animals”, “Natural soundscapes & water sounds”, “Human, non-speech sounds”, “Interior/domestic sounds”, and “Exterior/urban noises”. More information about the dataset can be found [here](https://github.com/karolpiczak/ESC-50).
The link to the ESC-50 database and metafile is on Brightspace in 'Content' --> 'Practical' --> 'Dataset'.
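To get a first overview of the data, you can inspect the metadata file with pandas. The sketch below is a minimal example; the path assumes you unpacked the dataset to a folder named `ESC-50` (for example on your Google Drive when working in Colab), so adjust it to your own setup.

```python
import pandas as pd

# Path to the ESC-50 metadata file; adjust to where you stored the dataset
# (for example on Google Drive when working in Colab).
META_PATH = "ESC-50/meta/esc50.csv"

meta = pd.read_csv(META_PATH)

# Each row describes one 5-second clip: filename, cross-validation fold,
# numeric target (0-49) and the human-readable category label.
print(meta[["filename", "fold", "target", "category"]].head())

# 50 classes with 40 clips each.
print(meta["category"].value_counts())
```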
For these assignments, we make use of the ResNet-18 model [2]. ResNet models use skip connections to learn residual functions with respect to the input, rather than learning unreferenced functions as is the case in most neural networks. These skip connections mitigate the vanishing/exploding gradient problem that deep neural networks encounter, resulting in faster convergence and better performance.
Here, we make use of the 18-layer variant of the ResNet architecture. Despite its relatively low complexity, this small ResNet performs on par with other state-of-the-art architectures and converges faster [2].
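To make the skip connection concrete, the sketch below shows a minimal residual block in PyTorch, where the learned residual F(x) is added back to the input before the final activation. It is an illustration only, not the exact implementation used in the assignments; one possible starting point is `torchvision.models.resnet18`, with the first convolution and the final fully connected layer adapted to single-channel spectrogram input and the 50 ESC-50 classes, as shown at the end of the snippet.

```python
import torch
import torch.nn as nn
import torchvision

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        # F(x): two convolution + batch-norm stages
        residual = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        # Skip connection: add the input back before the final non-linearity
        return torch.relu(residual + x)

# One possible starting point: torchvision's ResNet-18, adapted to
# single-channel spectrogram input and the 50 ESC-50 classes.
model = torchvision.models.resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 50)
```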
- PyTorch: The assignments use the open-source deep learning library PyTorch to implement the ResNet-18 model.
- Torchaudio: The assignments use Torchaudio to compute and extract relevant audio features. Torchaudio is a library for audio and signal processing with PyTorch; a good alternative is Librosa. A minimal feature-extraction example is sketched below this list.
- WandB (Weights & Biases): The assignments use the Weights & Biases MLOps platform to visualize and track training progress.
- Colab: You can work on the assignments using Google Colab, a hosted Jupyter Notebook service. Use Google Drive for data storage. If you do not yet have an account, please sign up for one.
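As referenced in the Torchaudio item above, here is a minimal feature-extraction sketch. The file path and transform parameters (FFT size, hop length, number of mel bands/MFCCs) are example assumptions, not the settings prescribed in the assignments.

```python
import torchaudio

# Load one ESC-50 clip; replace the path with a file from your own copy of the dataset.
waveform, sample_rate = torchaudio.load("ESC-50/audio/1-100032-A-0.wav")

# Log-mel spectrogram: a common input representation for ResNet-style models.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate, n_fft=1024, hop_length=512, n_mels=64
)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel_transform(waveform))

# MFCCs as an alternative feature set.
mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)(waveform)

print(log_mel.shape, mfcc.shape)  # (channels, n_mels, frames) and (channels, n_mfcc, frames)
```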
You can download the assignments, clone the repository, or open an assignment in Google Colab using the link at the top of the script. Assignments will be added no later than the evening before each practical session. In total, there will be six assignments.
The Practical Assignments are completed with a Practical Report consisting of two parts:
- Part 1: Audio feature extraction; template practical report part 1.
- Part 2: Sound classification using the ResNet-18 model; template practical report part 2.
[1] Piczak, K. J. (2015, October). ESC: Dataset for environmental sound classification. In Proceedings of the 23rd ACM International Conference on Multimedia (pp. 1015-1018).
[2] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).