Commit 0a58771

[TA] Adding support for PII endpoint (#13687)
* added pii endpoint support
1 parent ab87efe commit 0a58771

38 files changed, +1957 -48 lines changed

sdk/textanalytics/azure-ai-textanalytics/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@
 - `length` is the number of characters in the text of these models
 - `offset` is the offset of the text from the start of the document

+**New features**
+- Added support for Personally Identifiable Information (PII) entity recognition feature.
+To use this feature, you need to make sure you are using the service's v3.1-preview.1 API.
+
 ## 5.0.0 (2020-07-27)
 - Re-release of version `1.0.1` with updated version `5.0.0`.

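The `offset` and `length` fields described in this changelog can be used to slice or mask an entity inside the original document. The sketch below is an illustration only: `OffsetLengthSketch` and its methods are hypothetical helpers, not part of the SDK, and it assumes offsets count UTF-16 code units (which holds for ASCII input such as this sample; the unit the service actually reports may differ).

```java
import java.util.Arrays;

final class OffsetLengthSketch {
    /** Returns the span of the document described by offset and length. */
    static String slice(String document, int offset, int length) {
        return document.substring(offset, offset + length);
    }

    /** Replaces the described span with '*' characters, keeping the document length unchanged. */
    static String redact(String document, int offset, int length) {
        char[] mask = new char[length];
        Arrays.fill(mask, '*');
        return document.substring(0, offset) + new String(mask) + document.substring(offset + length);
    }

    public static void main(String[] args) {
        // Assumption: the entity starts at index 10 and spans 11 characters in this sample sentence.
        String document = "My SSN is 859-98-0987";
        System.out.println(slice(document, 10, 11));
        System.out.println(redact(document, 10, 11));
    }
}
```

Masking by position rather than by searching for the entity text avoids touching an identical string that appears elsewhere in the document.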

sdk/textanalytics/azure-ai-textanalytics/README.md

Lines changed: 23 additions & 4 deletions
@@ -6,6 +6,7 @@ and includes six main functions:
 - Language Detection
 - Key Phrase Extraction
 - Named Entity Recognition
+- Personally Identifiable Information Entity Recognition
 - Linked Entity Recognition

 [Source code][source_code] | [Package (Maven)][package] | [API reference documentation][api_reference_doc] | [Product Documentation][product_documentation] | [Samples][samples_readme]
@@ -186,6 +187,7 @@ The following sections provide several code snippets covering some of the most c
 * [Detect Language](#detect-language "Detect language")
 * [Extract Key Phrases](#extract-key-phrases "Extract key phrases")
 * [Recognize Entities](#recognize-entities "Recognize entities")
+* [Recognize Personally Identifiable Information Entities](#recognize-personally-identifiable-information-entities "Recognize Personally Identifiable Information entities")
 * [Recognize Linked Entities](#recognize-linked-entities "Recognize linked entities")

 ### Text Analytics Client
@@ -209,7 +211,7 @@ TextAnalyticsAsyncClient textAnalyticsClient = new TextAnalyticsClientBuilder()

 ### Analyze sentiment
 Run a Text Analytics predictive model to identify the positive, negative, neutral or mixed sentiment contained in the
-passed-in document or batch of documents.
+provided document or batch of documents.

 <!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L104-L108 -->
 ```java
@@ -236,7 +238,7 @@ For samples on using the production recommended option `DetectLanguageBatch` see
 Please refer to the service documentation for a conceptual discussion of [language detection][language_detection].

 ### Extract key phrases
-Run a model to identify a collection of significant phrases found in the passed-in document or batch of documents.
+Run a model to identify a collection of significant phrases found in the provided document or batch of documents.

 <!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L149-L151 -->
 ```java
@@ -248,7 +250,7 @@ For samples on using the production recommended option `ExtractKeyPhrasesBatch`
 Please refer to the service documentation for a conceptual discussion of [key phrase extraction][key_phrase_extraction].

 ### Recognize entities
-Run a predictive model to identify a collection of named entities in the passed-in document or batch of documents and
+Run a predictive model to identify a collection of named entities in the provided document or batch of documents and
 categorize those entities into categories such as person, location, or organization. For more information on available
 categories, see [Text Analytics Named Entity Categories][named_entities_categories].

@@ -262,8 +264,24 @@ textAnalyticsClient.recognizeEntities(document).forEach(entity ->
 For samples on using the production recommended option `RecognizeEntitiesBatch` see [here][recognize_entities_sample].
 Please refer to the service documentation for a conceptual discussion of [named entity recognition][named_entity_recognition].

+### Recognize Personally Identifiable Information entities
+Run a predictive model to identify a collection of Personally Identifiable Information (PII) entities in the provided
+document. It recognizes and categorizes PII entities in its input text, such as
+Social Security Numbers, bank account information, credit card numbers, and more. This endpoint is only supported for
+API versions v3.1-preview.1 and above.
+
+<!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L158-L161 -->
+```java
+String document = "My SSN is 859-98-0987";
+textAnalyticsClient.recognizePiiEntities(document).forEach(entity -> System.out.printf(
+    "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
+        + " confidence score: %f.%n",
+```
+
+Please refer to the service documentation for [supported PII entity types][pii_entity_recognition].
+
 ### Recognize linked entities
-Run a predictive model to identify a collection of entities found in the passed-in document or batch of documents,
+Run a predictive model to identify a collection of entities found in the provided document or batch of documents,
 and include information linking the entities to their corresponding entries in a well-known knowledge base.

 <!-- embedme ./src/samples/java/com/azure/ai/textanalytics/ReadmeSamples.java#L135-L142 -->
@@ -357,6 +375,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For m
 [named_entity_recognition]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking
 [named_entity_recognition_types]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
 [named_entities_categories]: https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/named-entity-types
+[pii_entity_recognition]: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal
 [package]: https://mvnrepository.com/artifact/com.azure/azure-ai-textanalytics
 [performance_tuning]: https://github.com/Azure/azure-sdk-for-java/wiki/Performance-Tuning
 [product_documentation]: https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview
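The README snippet added in this diff stops after the format string, so the `printf` arguments are not visible here. The sketch below shows what a complete call could look like, without depending on the SDK. It is an assumption-laden illustration: `PiiPrintSketch.Entity` is a hypothetical stand-in for the SDK's `PiiEntity`, its four fields mirror the first four constructor arguments used in `RecognizePiiEntityAsyncClient`, and the sample values are made up rather than real service output. `%n` is left out of the format string so the result is the same on every platform.

```java
import java.util.Locale;

final class PiiPrintSketch {
    /** Hypothetical stand-in for the SDK's PiiEntity, limited to the four printed fields. */
    static final class Entity {
        final String text;
        final String category;
        final String subcategory; // may be null when the entity has no subcategory
        final double confidenceScore;

        Entity(String text, String category, String subcategory, double confidenceScore) {
            this.text = text;
            this.category = category;
            this.subcategory = subcategory;
            this.confidenceScore = confidenceScore;
        }
    }

    /** Formats one entity with the same message layout as the README snippet. */
    static String describe(Entity entity) {
        return String.format(Locale.ROOT,
            "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
                + " confidence score: %f.",
            entity.text, entity.category, entity.subcategory, entity.confidenceScore);
    }

    public static void main(String[] args) {
        // Illustrative values only; a real category and score would come from the service.
        System.out.println(describe(new Entity("859-98-0987", "SampleCategory", null, 0.65)));
    }
}
```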

sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizeEntityAsyncClient.java

Lines changed: 1 addition & 1 deletion
@@ -11,11 +11,11 @@
 import com.azure.ai.textanalytics.models.CategorizedEntityCollection;
 import com.azure.ai.textanalytics.models.EntityCategory;
 import com.azure.ai.textanalytics.models.RecognizeEntitiesResult;
-import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
 import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
 import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
 import com.azure.ai.textanalytics.models.TextDocumentInput;
 import com.azure.ai.textanalytics.models.WarningCode;
+import com.azure.ai.textanalytics.util.RecognizeEntitiesResultCollection;
 import com.azure.core.exception.HttpResponseException;
 import com.azure.core.http.rest.Response;
 import com.azure.core.http.rest.SimpleResponse;

sdk/textanalytics/azure-ai-textanalytics/src/main/java/com/azure/ai/textanalytics/RecognizePiiEntityAsyncClient.java

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
+// Copyright (c) Microsoft Corporation. All rights reserved.
+// Licensed under the MIT License.
+
+package com.azure.ai.textanalytics;
+
+import com.azure.ai.textanalytics.implementation.TextAnalyticsClientImpl;
+import com.azure.ai.textanalytics.implementation.models.EntitiesResult;
+import com.azure.ai.textanalytics.implementation.models.MultiLanguageBatchInput;
+import com.azure.ai.textanalytics.implementation.models.WarningCodeValue;
+import com.azure.ai.textanalytics.models.EntityCategory;
+import com.azure.ai.textanalytics.models.PiiEntity;
+import com.azure.ai.textanalytics.models.PiiEntityCollection;
+import com.azure.ai.textanalytics.models.RecognizePiiEntitiesResult;
+import com.azure.ai.textanalytics.models.TextAnalyticsRequestOptions;
+import com.azure.ai.textanalytics.models.TextAnalyticsWarning;
+import com.azure.ai.textanalytics.models.TextDocumentInput;
+import com.azure.ai.textanalytics.models.WarningCode;
+import com.azure.ai.textanalytics.util.RecognizePiiEntitiesResultCollection;
+import com.azure.core.http.rest.Response;
+import com.azure.core.http.rest.SimpleResponse;
+import com.azure.core.util.Context;
+import com.azure.core.util.IterableStream;
+import com.azure.core.util.logging.ClientLogger;
+import reactor.core.publisher.Mono;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.stream.Collectors;
+
+import static com.azure.ai.textanalytics.TextAnalyticsAsyncClient.COGNITIVE_TRACING_NAMESPACE_VALUE;
+import static com.azure.ai.textanalytics.implementation.Utility.inputDocumentsValidation;
+import static com.azure.ai.textanalytics.implementation.Utility.mapToHttpResponseExceptionIfExist;
+import static com.azure.ai.textanalytics.implementation.Utility.toBatchStatistics;
+import static com.azure.ai.textanalytics.implementation.Utility.toMultiLanguageInput;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsError;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextAnalyticsException;
+import static com.azure.ai.textanalytics.implementation.Utility.toTextDocumentStatistics;
+import static com.azure.core.util.FluxUtil.monoError;
+import static com.azure.core.util.FluxUtil.withContext;
+import static com.azure.core.util.tracing.Tracer.AZ_TRACING_NAMESPACE_KEY;
+
+/**
+ * Helper class for managing recognize Personally Identifiable Information entity endpoint.
+ */
+class RecognizePiiEntityAsyncClient {
+    private final ClientLogger logger = new ClientLogger(RecognizePiiEntityAsyncClient.class);
+    private final TextAnalyticsClientImpl service;
+
+    /**
+     * Create a {@link RecognizePiiEntityAsyncClient} that sends requests to the Text Analytics service's
+     * recognize Personally Identifiable Information entity endpoint.
+     *
+     * @param service The proxy service used to perform REST calls.
+     */
+    RecognizePiiEntityAsyncClient(TextAnalyticsClientImpl service) {
+        this.service = service;
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters that returns a {@link Mono}
+     * which contains {@link PiiEntityCollection}.
+     *
+     * @param document A single document.
+     * @param language The language code.
+     *
+     * @return The {@link Mono} of {@link PiiEntityCollection}.
+     */
+    Mono<PiiEntityCollection> recognizePiiEntities(String document, String language) {
+        try {
+            Objects.requireNonNull(document, "'document' cannot be null.");
+            return recognizePiiEntitiesBatch(
+                Collections.singletonList(new TextDocumentInput("0", document).setLanguage(language)), null)
+                .map(resultCollectionResponse -> {
+                    PiiEntityCollection entityCollection = null;
+                    // for each loop will have only one entry inside
+                    for (RecognizePiiEntitiesResult entitiesResult : resultCollectionResponse.getValue()) {
+                        if (entitiesResult.isError()) {
+                            throw logger.logExceptionAsError(toTextAnalyticsException(entitiesResult.getError()));
+                        }
+                        entityCollection = new PiiEntityCollection(entitiesResult.getEntities(),
+                            entitiesResult.getEntities().getWarnings());
+                    }
+                    return entityCollection;
+                });
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatch(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options) {
+        try {
+            inputDocumentsValidation(documents);
+            return withContext(context -> getRecognizePiiEntitiesResponse(documents, options, context));
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper function for calling service with max overloaded parameters when a {@link Context} is given.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     * @param context Additional context that is passed through the Http pipeline during the service call.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    Mono<Response<RecognizePiiEntitiesResultCollection>> recognizePiiEntitiesBatchWithContext(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
+        try {
+            inputDocumentsValidation(documents);
+            return getRecognizePiiEntitiesResponse(documents, options, context);
+        } catch (RuntimeException ex) {
+            return monoError(logger, ex);
+        }
+    }
+
+    /**
+     * Helper method to convert the service response of {@link EntitiesResult} to {@link Response} which contains
+     * {@link RecognizePiiEntitiesResultCollection}.
+     *
+     * @param response the {@link Response} of {@link EntitiesResult} returned by the service.
+     *
+     * @return A {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    private Response<RecognizePiiEntitiesResultCollection> toRecognizePiiEntitiesResultCollectionResponse(
+        final Response<EntitiesResult> response) {
+        final EntitiesResult entitiesResult = response.getValue();
+        // List of documents results
+        final List<RecognizePiiEntitiesResult> recognizeEntitiesResults = new ArrayList<>();
+        entitiesResult.getDocuments().forEach(documentEntities -> {
+            // Pii entities list
+            final List<PiiEntity> piiEntities = documentEntities.getEntities().stream().map(entity ->
+                new PiiEntity(entity.getText(), EntityCategory.fromString(entity.getCategory()),
+                    entity.getSubcategory(), entity.getConfidenceScore(), entity.getOffset(), entity.getLength()))
+                .collect(Collectors.toList());
+            // Warnings
+            final List<TextAnalyticsWarning> warnings = documentEntities.getWarnings().stream()
+                .map(warning -> {
+                    final WarningCodeValue warningCodeValue = warning.getCode();
+                    return new TextAnalyticsWarning(
+                        WarningCode.fromString(warningCodeValue == null ? null : warningCodeValue.toString()),
+                        warning.getMessage());
+                }).collect(Collectors.toList());
+
+            recognizeEntitiesResults.add(new RecognizePiiEntitiesResult(
+                documentEntities.getId(),
+                documentEntities.getStatistics() == null ? null
+                    : toTextDocumentStatistics(documentEntities.getStatistics()),
+                null,
+                new PiiEntityCollection(new IterableStream<>(piiEntities), new IterableStream<>(warnings))
+            ));
+        });
+        // Document errors
+        entitiesResult.getErrors().forEach(documentError -> {
+            recognizeEntitiesResults.add(
+                new RecognizePiiEntitiesResult(documentError.getId(), null,
+                    toTextAnalyticsError(documentError.getError()), null));
+        });
+
+        return new SimpleResponse<>(response,
+            new RecognizePiiEntitiesResultCollection(recognizeEntitiesResults, entitiesResult.getModelVersion(),
+                entitiesResult.getStatistics() == null ? null : toBatchStatistics(entitiesResult.getStatistics())));
+    }
+
+    /**
+     * Call the service with REST response, convert to a {@link Mono} of {@link Response} that contains
+     * {@link RecognizePiiEntitiesResultCollection} from a {@link SimpleResponse} of {@link EntitiesResult}.
+     *
+     * @param documents The list of documents to recognize Personally Identifiable Information entities for.
+     * @param options The {@link TextAnalyticsRequestOptions} request options.
+     * @param context Additional context that is passed through the Http pipeline during the service call.
+     *
+     * @return A mono {@link Response} that contains {@link RecognizePiiEntitiesResultCollection}.
+     */
+    private Mono<Response<RecognizePiiEntitiesResultCollection>> getRecognizePiiEntitiesResponse(
+        Iterable<TextDocumentInput> documents, TextAnalyticsRequestOptions options, Context context) {
+        return service.entitiesRecognitionPiiWithResponseAsync(
+            new MultiLanguageBatchInput().setDocuments(toMultiLanguageInput(documents)),
+            options == null ? null : options.getModelVersion(),
+            options == null ? null : options.isIncludeStatistics(),
+            null,
+            context.addData(AZ_TRACING_NAMESPACE_KEY, COGNITIVE_TRACING_NAMESPACE_VALUE))
+            .doOnSubscribe(ignoredValue -> logger.info(
+                "Start recognizing Personally Identifiable Information entities for a batch of documents."))
+            .doOnSuccess(response -> logger.info(
+                "Successfully recognized Personally Identifiable Information entities for a batch of documents."))
+            .doOnError(error ->
+                logger.warning("Failed to recognize Personally Identifiable Information entities - {}", error))
+            .map(this::toRecognizePiiEntitiesResultCollectionResponse)
+            .onErrorMap(throwable -> mapToHttpResponseExceptionIfExist(throwable));
+    }
+}
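
The `recognizePiiEntities` method above is a thin wrapper over the batch call: it sends the single document as a one-element batch with id "0", then either throws on a per-document error or returns that document's entities. A minimal, SDK-free sketch of the same control flow follows. Everything in it is a stand-in: `DocResult`, `recognizeBatch`, and the digit-based "recognizer" are hypothetical and only mimic the shape of the logic, without Reactor or any service call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class SingleOverBatchSketch {
    /** Hypothetical per-document result: holds either entities or an error message. */
    static final class DocResult {
        final String id;
        final List<String> entities; // null when the document failed
        final String error;          // null when the document succeeded

        DocResult(String id, List<String> entities, String error) {
            this.id = id;
            this.entities = entities;
            this.error = error;
        }

        boolean isError() {
            return error != null;
        }
    }

    /** Stand-in batch call: blank documents become errors, tokens containing a digit become "entities". */
    static List<DocResult> recognizeBatch(List<String> documents) {
        List<DocResult> results = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            String id = String.valueOf(i);
            String document = documents.get(i);
            if (document.trim().isEmpty()) {
                results.add(new DocResult(id, null, "Document text is empty."));
                continue;
            }
            List<String> entities = new ArrayList<>();
            for (String token : document.trim().split("\\s+")) {
                if (token.chars().anyMatch(Character::isDigit)) {
                    entities.add(token);
                }
            }
            results.add(new DocResult(id, entities, null));
        }
        return results;
    }

    /** Single-document convenience method built on the batch call. */
    static List<String> recognizeSingle(String document) {
        Objects.requireNonNull(document, "'document' cannot be null.");
        List<String> entities = null;
        // The batch holds exactly one document, so this loop body runs once.
        for (DocResult result : recognizeBatch(Collections.singletonList(document))) {
            if (result.isError()) {
                throw new IllegalStateException(result.error);
            }
            entities = result.entities;
        }
        return entities;
    }

    public static void main(String[] args) {
        System.out.println(recognizeSingle("My SSN is 859-98-0987"));
    }
}
```

Routing the single-document path through the batch path keeps request building and response mapping in one place. The trade-off is that a per-document error has to be turned into an exception for the single-document caller, as the class in this diff does with `toTextAnalyticsException`.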
