Skip to content

Latest commit

 

History

History
22 lines (13 loc) · 1.23 KB

File metadata and controls

22 lines (13 loc) · 1.23 KB

ComCorTxt

This project comprises two main components:

  1. Training a Classification Model: This component involves training a classification model aiming to predict one categorical feature based on free text input.
  2. Unsupervised Clustering and Visualization: This component runs an unsupervised model to identify and visualize clusters within the dataset, providing insights into the inherent structure and patterns of the data.

Setup Instructions

Download the required NLP models from Hugging Face:

Follow the instructions on the respective pages to download and place the models in the appropriate directory.

This project is developed using Python version 3.8.16

⚠️ Warning: Dataset Not Provided

Please note that the dataset for which this code was written cannot be made publicly available without authorization by the French data protection authority Commission Nationale de l'Informatique et des Libertés (CNIL). You will need to replace the placeholder with your own dataset before running the code.