
Commit 2fa8173

leo-gan and efriis authored
docs[patch]: microsoft platform page update (#14476)
Added `presidio` and `OneNote` references to `microsoft.mdx`; added link and description to the `presidio` notebook

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
1 parent 84a57f5 commit 2fa8173

2 files changed

Lines changed: 38 additions & 1 deletion


docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,8 @@
 "\n",
 "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb)\n",
 "\n",
+">[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’) helps to ensure sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.\n",
+"\n",
 "## Use case\n",
 "\n",
 "Data anonymization is crucial before passing information to a language model like GPT-4 because it helps protect privacy and maintain confidentiality. If data is not anonymized, sensitive information such as names, addresses, contact numbers, or other identifiers linked to specific individuals could potentially be learned and misused. Hence, by obscuring or removing this personally identifiable information (PII), data can be used freely without compromising individuals' privacy rights or breaching data protection laws and regulations.\n",
@@ -530,7 +532,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.4"
+"version": "3.10.12"
 }
 },
 "nbformat": 4,

docs/docs/integrations/platforms/microsoft.mdx

Lines changed: 35 additions & 0 deletions
@@ -151,6 +151,20 @@ See a [usage example](/docs/integrations/document_loaders/microsoft_powerpoint).
 from langchain.document_loaders import UnstructuredPowerPointLoader
 ```
 
+### Microsoft OneNote
+
+First, let's install dependencies:
+
+```bash
+pip install bs4 msal
+```
+
+See a [usage example](/docs/integrations/document_loaders/onenote).
+
+```python
+from langchain.document_loaders.onenote import OneNoteLoader
+```
+
 
 ## Vector stores
 
@@ -259,4 +273,25 @@ from langchain.agents.agent_toolkits import PowerBIToolkit
 from langchain.utilities.powerbi import PowerBIDataset
 ```
 
+## More
+
+### Microsoft Presidio
+
+>[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’)
+> helps to ensure sensitive data is properly managed and governed. It provides fast identification and
+> anonymization modules for private entities in text and images such as credit card numbers, names,
+> locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.
+
+First, you need to install several python packages and download a `SpaCy` model.
+
+```bash
+pip install langchain-experimental openai presidio-analyzer presidio-anonymizer spacy Faker
+python -m spacy download en_core_web_lg
+```
+
+See [usage examples](/docs/guides/privacy/presidio_data_anonymization/).
+
+```python
+from langchain_experimental.data_anonymizer import PresidioAnonymizer, PresidioReversibleAnonymizer
+```
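
Reviewer note: the diff adds only the import line for the Presidio anonymizers. For readers unfamiliar with the idea, the sketch below illustrates reversible anonymization in general terms. It does not use Presidio or `langchain_experimental`. The class `ToyReversibleAnonymizer`, its method names, and the two regex patterns are hypothetical stand-ins chosen for illustration, and real PII detection needs far more than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; they are NOT Presidio's recognizers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


class ToyReversibleAnonymizer:
    """Stand-in that swaps matched values for placeholders and can undo the swap."""

    def __init__(self):
        self.mapping = {}  # placeholder -> original value

    def anonymize(self, text):
        for label, pattern in PATTERNS.items():
            def substitute(match, label=label):
                # Number placeholders in the order values are found.
                placeholder = f"<{label}_{len(self.mapping) + 1}>"
                self.mapping[placeholder] = match.group(0)
                return placeholder
            text = pattern.sub(substitute, text)
        return text

    def deanonymize(self, text):
        # Restore every placeholder that was recorded during anonymize().
        for placeholder, original in self.mapping.items():
            text = text.replace(placeholder, original)
        return text


original = "Reach Jane at jane.doe@example.com or 555-123-4567."
anonymizer = ToyReversibleAnonymizer()
masked = anonymizer.anonymize(original)       # safe to send to a model
restored = anonymizer.deanonymize(masked)     # map a response back afterwards
```

In this toy version the name "Jane" passes through untouched, which is one reason a dedicated library with named-entity recognition is used in practice rather than hand-written patterns.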
