Amsterdam University College -- Text Mining -- Winter/Spring 2022.
You can use the Hello World notebooks to check that everything is working.
| Week | Topic | Materials |
|---|---|---|
| 1 | Introduction and Python refresher | slides + notebooks 1, 2, 3, 4, 5 |
| 2 | Introduction to NLP and NLP pipelines | slides + notebook |
| 3 | Language modelling | slides + notebooks 1, 2 |
| 4 | Vector space semantics | slides + notebook |
| 5 | Word embeddings | slides + notebook |
| 6 | Machine learning fundamentals | slides + notebook |
| 7 | Text classification | slides + notebook (Scikit-learn), notebook (PyTorch) |
| 8 | RNNs and NER | slides + notebook |
| 9 | Web scraping and APIs | notebook |
| 10 | Recommender systems | slides + notebook |
| 11 | Creating annotated corpora | slides |
| 12 | Sentiment analysis | slides + notebook |
| 13 | Clustering and topic modelling | slides + notebook |
| 14 | XAI and Ethics | Selected contents from this course |
- Introduction (Stanford's CS231N).
- Optimization 1 (Stanford's CS231N).
- Yes you should understand backprop by Andrej Karpathy.
- Optimization 2 (Stanford's CS231N).
See the projects folder for info.
- Fork the repository to your GitHub account: go to https://github.com/bloemj/AUC_TMCI_2022 and click "Fork".
- Get updates (from time to time): in your fork on the GitHub website, click "Fetch upstream".
- Launch notebooks by going to Google Colab (https://colab.research.google.com/) and loading them through the "Open notebook" window. Enter the GitHub URL of your own fork of the course materials so that you can save your changes. Click "Open notebook in new tab" to run the notebook.
- To save your changes, choose "Save a copy in GitHub" and accept the suggested location. Note that plain "Save" will not work, and changes are not saved automatically. Saving to GitHub will also fail if you skipped the fork step and are loading my version of the repository directly.
- Clone the repository locally: `git clone https://github.com/bloemj/AUC_TMCI_2022.git`
- Get updates (from time to time): `git pull`
- Create a conda environment: `conda create -n myenv python=3.7 anaconda` (where `myenv` is the environment name)
- Activate it: `conda activate myenv`
- Install packages (see the `requirements.txt` file), e.g. `conda install pandas`
- Launch a Jupyter notebook: `jupyter notebook`
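
Once the environment is activated, a quick check that the expected packages are importable can save some debugging later. The following is only a minimal sketch: the `PACKAGES` list and the `missing_packages` helper are hypothetical names chosen for illustration, and the actual list of dependencies is the one in `requirements.txt`.

```python
import importlib.util
import sys

# Hypothetical list for illustration only; the authoritative list of
# dependencies is the requirements.txt file in the repository.
PACKAGES = ["pandas", "notebook"]


def missing_packages(names):
    """Return the top-level package names that cannot be located
    in the currently active Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report which interpreter is running (it should be the conda environment's).
    print("Python", sys.version.split()[0], "at", sys.executable)
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

Run it with the environment active, for example from a notebook cell or as a script. Any package reported as missing can then be installed with `conda install`, as in the steps above.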
- More on conda environments
- Conda cheatsheet
- Getting started with Jupyter notebooks
- On using git and GitHub for version control
Alternatively, use Binder (link above).
A more detailed guide to setting up your environment, with multiple options.
- Giovanni Colavizza, who ran the previous-year edition of this course.
- Michael Repplinger, who ran the 2018/19 edition, and Gianluca Lebani, who ran the 2017/18 edition.
- Giovanni Colavizza and Matteo Romanello, Applied Data Analysis course for the Oxford Digital Humanities Summer School.
- James Hetherington and Giovanni Colavizza, Research Software Engineering with Python.
Everything in this repository which is not already attributed to someone else is released under CC BY 4.0.