88- can be deployed as an API server using a compose stack
99
1010## API usage
11+
11122-steps:
13+
1214- analyze: NER from raw text using models
1315- anonymize: config (rule) based processing of pre-detected PII
1416
1517### analyze
18+
1619- Minimal requirements: text + language. By default, all recognizers for that language are enabled.
1720 ``` sh
18- $ curl http://localhost:5002/analyze -s --header "Content-Type: application/json" --request POST --data '{"text": "John Smith drivers license is AC432223","language": "en"}' | jq
21+ $ curl http://localhost:5002/analyze -s --header "Content-Type: application/json" --request POST --data '{"text": "John Smith drivers license is AC432223","language": "en"}' | jq
1922 [
2023 {
2124 "analysis_explanation": null,
3336 }
3437 ]
3538 ```
36- - analysis can be controlled by setting detection score, selecting entities, adding context words and adding a correlation id(?)
39+ - analysis can be controlled by setting detection score, selecting entities, adding context words and adding a correlation id(?)
3740- ad-hoc pattern (regex) recognizers can be provided as json objects
3841- a correlation-id (hash) can be given to append to logs for easier grouping of analyses in logs / traces.
3942
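The optional controls listed above can be sketched as a small request builder. This is a minimal sketch, not the service's confirmed schema: the field names `score_threshold`, `entities`, `context`, `correlation_id` and `ad_hoc_recognizers` are assumptions modeled on the Presidio analyzer REST API and should be checked against the deployed version. The helper `build_analyze_payload` and the ad-hoc recognizer content are hypothetical.

```python
import json


def build_analyze_payload(text, language, score_threshold=None, entities=None,
                          context=None, correlation_id=None,
                          ad_hoc_recognizers=None):
    """Assemble the JSON body for POST /analyze, omitting unset options."""
    # Only text and language are required.
    payload = {"text": text, "language": language}
    # Assumed optional field names (see lead-in); None means "not set".
    optional = {
        "score_threshold": score_threshold,
        "entities": entities,
        "context": context,
        "correlation_id": correlation_id,
        "ad_hoc_recognizers": ad_hoc_recognizers,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Hypothetical ad-hoc pattern recognizer, passed as a plain JSON object.
zip_recognizer = {
    "name": "zip code (ad hoc)",
    "supported_language": "en",
    "supported_entity": "ZIP",
    "patterns": [{"name": "zip code (weak)", "regex": r"\b\d{5}\b", "score": 0.01}],
    "context": ["zip", "code"],
}

payload = build_analyze_payload(
    "John Smith drivers license is AC432223",
    "en",
    score_threshold=0.4,
    entities=["PERSON", "US_DRIVER_LICENSE", "ZIP"],
    correlation_id="run-001",  # hypothetical id, used to group log entries
    ad_hoc_recognizers=[zip_recognizer],
)
print(json.dumps(payload, indent=2))

# To send it (requires the running compose stack and the requests package):
# requests.post("http://localhost:5002/analyze", json=payload).json()
```

Leaving unset options out of the body keeps the minimal request identical to the curl example.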
4043### anonymize
44+
4145- By default, anonymization replaces every detected identifier in the input text with its entity type (e.g. `<PERSON>`).
4246- An anonymizer dictionary can be provided to map specific entity types to specific anonymization procedures.
4347- Two inputs must be given to the endpoint:
4448 - the raw text
4549 - the response from the analyze step (detected entities and their positions)
4650
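The default behaviour and the two required inputs can be illustrated locally. This is a sketch under stated assumptions: `replace_by_type` is a hypothetical helper that mimics the default replacement without calling the service, and the request keys `analyzer_results` and `anonymizers` are assumptions modeled on the Presidio anonymizer REST API. Overlapping spans are not handled.

```python
def replace_by_type(text, analyzer_results):
    """Replace each detected span with its entity type, e.g. <PERSON>."""
    out = text
    # Work from the end of the text so earlier offsets stay valid.
    for r in sorted(analyzer_results, key=lambda r: r["start"], reverse=True):
        out = out[:r["start"]] + "<" + r["entity_type"] + ">" + out[r["end"]:]
    return out


def build_anonymize_payload(text, analyzer_results, anonymizers=None):
    """Assemble the two required inputs, plus an optional anonymizer dictionary."""
    payload = {"text": text, "analyzer_results": analyzer_results}
    if anonymizers is not None:
        payload["anonymizers"] = anonymizers  # assumed key name
    return payload


text = "John Smith drivers license is AC432223"
# Shape of the analyze response, reduced to the fields used here.
results = [
    {"entity_type": "PERSON", "start": 0, "end": 10, "score": 0.85},
    {"entity_type": "US_DRIVER_LICENSE", "start": 30, "end": 38, "score": 0.65},
]

print(replace_by_type(text, results))
# <PERSON> drivers license is <US_DRIVER_LICENSE>

body = build_anonymize_payload(text, results)
```

Substituting from the last span backwards avoids recomputing offsets after each replacement.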
4751### artificial sample
52+
4853Input:
54+
4955```
5056Prof. Gérard Waeber, Chef de service
5157Tél: +41 21 314 68 85 / Fax: +41 21 314 08 95
@@ -77,8 +83,10 @@ jfldéijf
7783Dr Médecin 00 Formateur
7884Chef de clinique
7985```
86+
8087- ## initial tests
81- Works with the example artificial discharge letter (lettre de sortie).
88+ Works with the example artificial discharge letter (lettre de sortie).
89+
8290``` python
8391import json
8492import requests
@@ -129,7 +137,9 @@ print(
129137## limitations
130138
131139### potential improvements
140+
132141Model configuration
142+
133143``` yaml
134144# config.yaml
135145nlp_engine_name: spacy
@@ -157,30 +167,28 @@ ner_model_configuration:
157167```
158168
159169Recognizer configuration
170+
160171``` yaml
161172# recognizers.yaml
162173recognizers:
163- -
164- name: "Swiss Zip code Recognizer"
174+ - name: "Swiss Zip code Recognizer"
165175 supported_languages:
166176 - language: fr
167177 context: [adresse, postal]
168178 - language: de
169- context: [ort, ]
179+ context: [ort]
170180 - language: it
171181 context: [...]
172182
173183 patterns:
174- -
175- name: "zip code (weak)"
176- regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
177- score: 0.01
184+ - name: "zip code (weak)"
185+ regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
186+ score: 0.01
178187 context:
179- - zip
180- - code
188+ - zip
189+ - code
181190 supported_entity: "ZIP"
182- -
183- name: "Titles recognizer"
191+ - name: "Titles recognizer"
184192 supported_language: "en"
185193 supported_entity: "TITLE"
186194 deny_list:
@@ -190,5 +198,4 @@ recognizers:
190198 - Miss
191199 - Dr.
192200 - Prof.
193-
194201```