Automatically pseudo-anonymise name of people in Cour des Comptes's jurisprudence
- We explore 138 documents.
 - We have more than 12 k different words.
 - We have more 420 k words (with 3147 positive / others are negative)
 
Donwload data from this link then dezip it. You should see a directory dataon root.
- python reading_doc_files.py --> Create data.csv file with all features and structure
 - python trainning.py --> Train the model and give some metrics
 - get_prediction.py --> Read & processs a .docx (line 220) to anonymise it in ouput directory.
 
Create ouput files :
[name_of_file]_log.csv: Log of this file (warning is a bool)[name_of_file].txt: Return the text with anonymise result.[name_of_file].html: Return the text in html balise with color (green seems OK, Red mean warning this could be a error).
result of html file :
