README.md
+57-13
@@ -1,4 +1,6 @@
1
-
**We recommend to download the latest tested version form the [release section](https://github.com/Georgetown-IR-Lab/QuickUMLS/releases)**.
1
+
**We recommend downloading the latest tested version from the [releases section](https://github.com/Georgetown-IR-Lab/QuickUMLS/releases)**.
2
+
3
+
**NEW: v.1.2 now includes client/server support!** Start a QuickUMLS server once and avoid reloading QuickUMLS each time your experiments run! See <a href="#client_server">below</a> for more info.
2
4
3
5
# QuickUMLS
4
6
@@ -13,43 +15,85 @@ This project should be compatible with both Python 2 and 3 and run on any UNIX s
13
15
#### Before Starting
14
16
15
17
1. Make sure that your Python installation includes C headers (e.g., on Ubuntu, make sure `python3-dev` or `python-dev` is installed).
16
-
2. This software requires all packages listed in the requirements.txt file. You can install all of them by running `pip install -r requirements.txt`.
18
+
2. This software requires all packages listed in the `requirements.txt` file. You can install all of them by running `pip install -r requirements.txt`.
17
19
3. Note that, in order to use `spacy`, you are required to download its corpus. You can do that by running `python -m spacy.en.download`.
18
20
4. This system requires you to have a valid UMLS installation on disk. The installation can be removed once the system has been initialized.
19
21
20
-
#### To get the System Running
22
+
#### How to Get the System Initialized
21
23
22
24
1. Download and compile Simstring by running `bash setup_simstring.sh <python_version>`, where `<python_version>` is either "`2`" or "`3`".
23
-
2. Initialize the system by running `python install.py <umls_installation_path> <destination_path>`, where `<umls_installation_path>` is where the installation files are (in particular, we need `MRCONSO.RRF` and `MRSTY.RRF`) and `<destination_path>` is the directory where the QuickUmls data files should be installed. This process will take between 5 and 30 minutes depending how fast is the drive where UMLS and QuickUMLS files are stored.
25
+
2. Initialize the system by running `python install.py <umls_installation_path> <destination_path>`, where `<umls_installation_path>` is where the installation files are (in particular, we need `MRCONSO.RRF` and `MRSTY.RRF`) and `<destination_path>` is the directory where the QuickUMLS data files should be installed. This process takes between 5 and 30 minutes, depending on the speed of the CPU and of the drive where the UMLS and QuickUMLS files are stored (on a system with an Intel i7 6700K CPU and a 7200 RPM hard drive, initialization takes 8.5 minutes).
26
+
27
+
`install.py` supports the following optional arguments:
28
+
- `-L` / `--lowercase`: if used, all concept terms are folded to lowercase before being processed. This option typically increases recall, but it might reduce precision;
29
+
- `-U` / `--normalize-unicode`: if used, expressions with non-ASCII characters are converted to the closest combination of ASCII characters.
30
+
- `-E` / `--language`: specifies the language to consider for UMLS concepts; by default, English is used. For a complete list of languages, please see [this table provided by NLM](https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/abbreviations.html#LAT).
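
The effect of `-L` and `-U` on a single concept term can be sketched with the standard library alone. This is only an illustration: `normalize_term` is a hypothetical helper, and the NFKD-decomposition approach is an assumption standing in for whatever transliteration `install.py` actually performs, so results may differ for characters that have no ASCII decomposition.

```python
import unicodedata


def normalize_term(term, lowercase=False, normalize_unicode=False):
    """Hypothetical helper in the spirit of -L and -U (not QuickUMLS code)."""
    if normalize_unicode:
        # Assumption: decompose accented characters, then drop whatever
        # is still outside the ASCII range.
        decomposed = unicodedata.normalize("NFKD", term)
        term = decomposed.encode("ascii", "ignore").decode("ascii")
    if lowercase:
        # Fold to lowercase, as -L is described to do for concept terms.
        term = term.lower()
    return term


print(normalize_term("Sjögren Syndrome", lowercase=True, normalize_unicode=True))
```

Folding case and accents makes more surface forms collapse onto the same term, which is consistent with the recall/precision trade-off noted for `-L`.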
24
31
25
32
## APIs
26
33
27
34
A QuickUMLS object can be instantiated as follows:
- `quickumls_fp` is the directory where the QuickUMLS data files are installed.
37
-
- `overlapping_criteria` (default: "score") is the criteria used to deal with overlapping concepts; choose "score" if the matching score of the concepts should be consider first, "length" if the longest should be considered first instead.
38
-
- `threshold` (default: 0.7) is the minimum similarity value between strings.
39
-
- `similarity_name` (default: "jaccard") is the name of similarity to use. Choose between "dice", "jaccard", "cosine", or "overlap".
40
-
- `window` (default: 5) is the maximum number of tokens to consider for matching.s
41
-
- `accepted_semtypes` (default: see `constants.py`) is the set of UMLS semantic types concepts should belong to. Semantic types are identified by the letter "T" followed by three numbers (e.g., "T131", which identifies the type *"Hazardous or Poisonous Substance"*). See [here](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt) for the full list.
44
+
- `overlapping_criteria` (optional, default: "score") is the criterion used to resolve overlapping concepts; choose "score" if the matching score of the concepts should be considered first, or "length" if the longest match should be considered first instead.
45
+
- `threshold` (optional, default: 0.7) is the minimum similarity value between strings.
46
+
- `similarity_name` (optional, default: "jaccard") is the name of the similarity measure to use. Choose between "dice", "jaccard", "cosine", or "overlap".
47
+
- `window` (optional, default: 5) is the maximum number of tokens to consider for matching.
48
+
- `accepted_semtypes` (optional, default: see `constants.py`) is the set of UMLS semantic types concepts should belong to. Semantic types are identified by the letter "T" followed by three numbers (e.g., "T131", which identifies the type *"Hazardous or Poisonous Substance"*). See [here](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt) for the full list.
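
The four names accepted by `similarity_name` refer to standard set-similarity measures, and `threshold` is the cutoff a candidate must reach. As a rough sketch of how they relate, the snippet below computes each measure over character trigrams. The trigram features and the helper names `char_ngrams` and `similarity` are assumptions made for illustration; the feature extraction that QuickUMLS and Simstring actually use may differ.

```python
import math


def char_ngrams(text, n=3):
    """Set of character n-grams of `text` (assumed feature set, Python 3)."""
    # Strings shorter than n yield a single (shorter) feature, so the
    # returned set is never empty.
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def similarity(a, b, name="jaccard", n=3):
    """Hypothetical helper: set similarity between the n-grams of a and b."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    shared = len(x & y)
    if name == "jaccard":
        return shared / len(x | y)
    if name == "dice":
        return 2 * shared / (len(x) + len(y))
    if name == "cosine":
        return shared / math.sqrt(len(x) * len(y))
    if name == "overlap":
        return shared / min(len(x), len(y))
    raise ValueError("unknown similarity: {}".format(name))


threshold = 0.7
for name in ("jaccard", "dice", "cosine", "overlap"):
    score = similarity("humerus", "humerous", name)
    print(name, round(score, 3), "match" if score >= threshold else "no match")
```

For the same pair of strings the measures produce different scores, so a `threshold` tuned for one `similarity_name` does not necessarily carry over to another.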
42
49
43
50
To use the matcher, simply call
44
51
45
52
```python
46
-
>>> text = "The ulna has dislocated posteriorly from the trochlea of the humerus."
Set `best_match` to `False` if you want to return overlapping candidates, and `ignore_syntax` to `True` to disable all heuristics introduced in (Soldaini and Goharian, 2016).
51
58
52
59
60
+
<h2 id="client_server">[NEW] Server / Client Support</h2>
61
+
62
+
Starting with v.1.2, QuickUMLS can be used in a client-server configuration. That is, you can start one QuickUMLS server and query it from multiple scripts using a client.
Host and port are optional; by default, QuickUMLS runs on `localhost:4645`. You can also pass any QuickUMLS option mentioned above to the server. To obtain a list of options for the server, run `python server.py -h`.
71
+
72
+
To load the client, import `get_quickumls_client` from `client.py`:
73
+
74
+
```python
75
+
from client import get_quickumls_client
76
+
matcher = get_quickumls_client()
77
+
text = "The ulna has dislocated posteriorly from the trochlea of the humerus."