This model predicts the native language (L1) of English learners from patterns identified in their English writing.
The project is built from scratch: the text is first obtained from the HTML webpages where users post their writing in English, and the sample texts are then divided into training, development, and test data.
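The two steps above (pulling text out of HTML and splitting the samples) could look roughly like the following. This is a minimal sketch and not the notebook's actual code: the names `TextExtractor`, `html_to_text`, and `split_dataset`, the 80/10/10 proportions, and the fixed seed are all assumptions made for illustration.

```python
import random
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def split_dataset(samples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle with a fixed seed, then cut into train, dev, and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

Fixing the seed keeps the split reproducible, so the test data stays unseen across repeated runs. In practice the samples would be (text, native language) pairs, and a split stratified by language may be preferable when some languages have few samples.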
The most important part of the project is creating features that can predict the native language of English learners. For each extracted feature, an explanation is provided of the linguistic reasons why that feature is appropriate for predicting the native language. The extracted features are applied to the text in an optimized way and finally converted to a numpy array to be tested by different models.
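As a rough illustration of the feature step, the sketch below turns each text into a small feature vector and stacks the vectors into a numpy array. The three features are hypothetical examples chosen for this sketch, not necessarily the ones used in the notebook; the article rate is included because speakers of languages without articles often omit them in English. The function names `tokenize`, `extract_features`, and `build_feature_matrix` are likewise assumptions.

```python
import re

import numpy as np

# Hypothetical word lists used only for this illustration.
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Return [article rate, pronoun rate, mean word length] for one text."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty texts
    article_rate = sum(t in ARTICLES for t in tokens) / n
    pronoun_rate = sum(t in PRONOUNS for t in tokens) / n
    mean_word_len = sum(len(t) for t in tokens) / n
    return [article_rate, pronoun_rate, mean_word_len]


def build_feature_matrix(texts):
    """Stack per-text feature vectors into an (n_texts, n_features) array."""
    return np.array([extract_features(t) for t in texts], dtype=float)
```

Calling `build_feature_matrix(["The cat sat on a mat.", "I go school"])` gives a 2 x 3 array whose rows can be passed directly to any model that accepts a numeric feature matrix.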
To set up the necessary packages for running the labs and lecture material, download the environment file to your computer (hit "Raw" and then Ctrl + S to save it, or copy and paste its contents). Then create a virtual environment with conda, using the environment file you just downloaded:
conda env create --file environment.yml
This will set up Python with the correct versions of all required packages.
Now you can download [Final.ipynb] and run it using your preferred method with the new environment, native.
The report is at the end of the notebook. If you don't want to run the code, you can also read it in the PDF.