StatCan Text Classification using fastText (Tutorial)

This repository contains code submitted for the United Nations Economic Commission for Europe (UNECE) HLG-MOS, WP1: Pilot Study, Classification and Coding.

Data

The data used in this repository is an open-source set of ECOICOP-classified products provided by Statistics Poland.

Source: https://github.com/UNECE/ML_dataset.

Pipeline

The repository is a series of Jupyter Notebook files that break a machine-learning pipeline into four steps: preprocessing (Step1), hyperparameter tuning (Step2), model training (Step3), and model evaluation (Step4).
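
As a rough illustration of how the four steps fit together, the sketch below formats one labelled product into fastText's `__label__` input format (Step1), then tunes, trains, and evaluates a supervised model (Step2 to Step4). It is not taken from the notebooks: the function names, the cleaning rules, the file paths, the 60-second tuning budget, and the use of fastText's built-in autotune are all assumptions made for the example.

```python
import re


def to_fasttext_line(label: str, text: str) -> str:
    """Step1 (hypothetical helper): format one labelled product for fastText.

    fastText's supervised mode expects one example per line, with the
    label prefixed by "__label__".
    """
    # Lowercase, then replace every character that is not a letter, digit,
    # underscore, or whitespace with a space (assumed cleaning rule).
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Collapse runs of whitespace into single spaces.
    cleaned = " ".join(cleaned.split())
    return f"__label__{label} {cleaned}"


def train_and_evaluate(train_path: str, valid_path: str, test_path: str):
    """Step2 to Step4 (hypothetical helper): tune, train, and evaluate.

    Requires the `fasttext` package; it is imported here so that the
    preprocessing helper above can be used without it.
    """
    import fasttext

    # Passing a validation file turns on fastText's automatic
    # hyperparameter search; the 60-second budget is an arbitrary choice.
    model = fasttext.train_supervised(
        input=train_path,
        autotuneValidationFile=valid_path,
        autotuneDuration=60,
    )
    # model.test returns (number of examples, precision@1, recall@1).
    n_examples, precision, recall = model.test(test_path)
    return model, precision, recall


if __name__ == "__main__":
    # Made-up product name and ECOICOP-style code, for illustration only.
    print(to_fasttext_line("01.1.1.1", "Chleb pszenny, 500g!"))
```

To run the later steps, write the formatted lines to separate training, validation, and test files and pass their paths to `train_and_evaluate`. Keeping the test file apart from the validation file means the reported precision and recall are not influenced by the hyperparameter search.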