Commit 6131236

Documentation syntax clean up
1 parent 451f220 commit 6131236

2 files changed

Lines changed: 197 additions & 5 deletions

docs/docs/api/lexicon_collection.md

Lines changed: 194 additions & 2 deletions
@@ -312,9 +312,12 @@ combination of the lemma and pos and the value are the semantic tags.
 The lemma and pos are combined as follows: `{lemma}|{pos}`, e.g.
 `Car|Noun`
 
-If the pos value is None then then only the lemma is used: `{lemma}`,
+If the pos value is None then only the lemma is used: `{lemma}`,
 e.g. `Car`
 
+**Note** If the key already exists then the most recent entry will
+overwrite the existing entry.
+
 <h4 id="add_lexicon_entry.parameters">Parameters<a className="headerlink" href="#add_lexicon_entry.parameters" title="Permanent link">&para;</a></h4>
 
 
@@ -466,6 +469,132 @@ welsh_lexicon_collection = LexiconCollection(welsh_lexicon_dict)
 assert welsh_lexicon_dict['ceir'][0] == 'M3fn'
 ```
 
+<a id="pymusas.lexicon_collection.LexiconCollection.merge"></a>
+
+### merge
+
+```python
+class LexiconCollection(MutableMapping):
+| ...
+| @staticmethod
+| def merge(
+| *lexicon_collections: "LexiconCollection"
+| ) -> "LexiconCollection"
+```
+
+Given more than one lexicon collection it will create a single lexicon
+collection whereby the lexicon data from each will be combined.
+
+**Note** the data is loaded in list order therefore the last lexicon
+collection will take precedence, i.e. if the last contains `London`: [`Z3`]
+and the first contains `London`: [`Z2`] then the returned
+LexiconCollection will only contain the one entry; `London`: [`Z3`].
+
+**Note** if the lexicon collections contain POS information we assume
+that all of the lexicon collections use the same POS tagset,
+if they do not this could cause issues during tag time.
+
+<h4 id="merge.parameters">Parameters<a className="headerlink" href="#merge.parameters" title="Permanent link">&para;</a></h4>
+
+
+- __*lexicon\_collections__ : `LexiconCollection` <br/>
+More than one lexicon collections that are to be merged.
+
+<h4 id="merge.returns">Returns<a className="headerlink" href="#merge.returns" title="Permanent link">&para;</a></h4>
+
+
+- [`LexiconCollection`](#lexiconcollection) <br/>
+
+<h4 id="merge.examples">Examples<a className="headerlink" href="#merge.examples" title="Permanent link">&para;</a></h4>
+
+
+``` python
+from pymusas.lexicon_collection import LexiconCollection
+welsh_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/Welsh/semantic_lexicon_cy.tsv"
+english_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/English/semantic_lexicon_en.tsv"
+welsh_lexicon_data = LexiconCollection.from_tsv(welsh_lexicon_url, include_pos=True)
+welsh_lexicon = LexiconCollection(welsh_lexicon_data)
+english_lexicon_data = LexiconCollection.from_tsv(english_lexicon_url, include_pos=True)
+english_lexicon = LexiconCollection(english_lexicon_data)
+combined_lexicon_collection = LexiconCollection.merge(welsh_lexicon, english_lexicon)
+assert isinstance(combined_lexicon_collection, LexiconCollection)
+assert combined_lexicon_collection["Aber-lash|pnoun"] == ["Z2"]
+assert combined_lexicon_collection["Aqua|PROPN"] == ["Z3c"]
+```
+
+<a id="pymusas.lexicon_collection.LexiconCollection.tsv_merge"></a>
+
+### tsv\_merge
+
+```python
+class LexiconCollection(MutableMapping):
+| ...
+| @staticmethod
+| def tsv_merge(
+| *tsv_file_paths: PathLike,
+| *,
+| include_pos: bool = True
+| ) -> dict[str, list[str]]
+```
+
+Given one or more TSV files it will create a single dictionary object
+with the combination of all the lexicon data in each TSV, this dictionary
+object can then be used to create a [`LexiconCollection`](#lexiconcollection).
+
+For more information on how the TSV data is loaded see [`from_tsv`](#from_tsv).
+
+**Note** the data is loaded in list order therefore the last TSV file
+will take precedence, i.e. if the last TSV file contains `London`: [`Z3`]
+and the first TSV file contains `London`: [`Z2`] then the returned
+dictionary will only contain the one entry; `London`: [`Z3`].
+
+**Note** if the TSV files contain POS information we assume that all
+of the TSV files use the same POS tagset, if they do not this could
+cause issues during tag time.
+
+<h4 id="tsv_merge.parameters">Parameters<a className="headerlink" href="#tsv_merge.parameters" title="Permanent link">&para;</a></h4>
+
+
+- __*tsv\_file\_paths__ : `PathLike` <br/>
+File paths and/or URLs to a TSV file that contains at least two
+fields, with an optional third, with the following headings:
+
+1. `lemma`,
+2. `semantic_tags`
+3. `pos` (Optional)
+
+All other fields will be ignored.
+- __include\_pos__ : `bool`, optional (default = `True`) <br/>
+Whether to include the POS information, if the information is available,
+or not. See [`add_lexicon_entry`](#add_lexicon_entry) for more information on this
+parameter.
+
+<h4 id="tsv_merge.returns">Returns<a className="headerlink" href="#tsv_merge.returns" title="Permanent link">&para;</a></h4>
+
+
+- `dict[str, list[str]]` <br/>
+
+<h4 id="tsv_merge.raises">Raises<a className="headerlink" href="#tsv_merge.raises" title="Permanent link">&para;</a></h4>
+
+
+- `ValueError` <br/>
+If the minimum field headings, `lemma` and `semantic_tags`, do not
+exist in the given TSV files.
+
+<h4 id="tsv_merge.examples">Examples<a className="headerlink" href="#tsv_merge.examples" title="Permanent link">&para;</a></h4>
+
+
+``` python
+from pymusas.lexicon_collection import LexiconCollection
+welsh_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/Welsh/semantic_lexicon_cy.tsv"
+english_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/English/semantic_lexicon_en.tsv"
+tsv_urls = [welsh_lexicon_url, english_lexicon_url]
+combined_lexicon_collection = LexiconCollection.tsv_merge(*tsv_urls, include_pos=True)
+assert isinstance(combined_lexicon_collection, dict)
+assert combined_lexicon_collection["Aber-lash|pnoun"] == ["Z2"]
+assert combined_lexicon_collection["Aqua|PROPN"] == ["Z3c"]
+```
+
 <a id="pymusas.lexicon_collection.LexiconCollection.__str__"></a>
 
 ### \_\_str\_\_
@@ -565,7 +694,7 @@ this.
 If not `None`, maps from the lexicon's POS tagset to the desired
 POS tagset, whereby the mapping is a `List` of tags, at the moment there
 is no preference order in this list of POS tags. The POS mapping is
-useful in situtation whereby the leixcon's POS tagset is different to
+useful in situations whereby the lexicon's POS tagset is different to
 the token's. **Note** that the longer the `List[str]` for each POS
 mapping the longer it will take to match MWE templates. A one to one
 mapping will have no speed impact on the tagger. A selection of POS
@@ -825,6 +954,69 @@ assert mwe_lexicon_dict['abaixo_adv de_prep'][0] == 'M6'
 assert mwe_lexicon_dict['arco_noun e_conj flecha_noun'][0] == 'K5.1'
 ```
 
+<a id="pymusas.lexicon_collection.MWELexiconCollection.tsv_merge"></a>
+
+### tsv\_merge
+
+```python
+class MWELexiconCollection(MutableMapping):
+| ...
+| @staticmethod
+| def tsv_merge(*tsv_file_paths: PathLike) -> dict[str, list[str]]
+```
+
+Given one or more TSV files it will create a dictionary
+object that can be used to create a [`MWELexiconCollection`](#mwelexiconcollection) whereby
+this dictionary is the combination of all of the lexicon information
+in the TSV files.
+
+**Note** the data is loaded in list order therefore the last TSV file
+will take precedence, i.e. if the last TSV file contains
+`London_* city_*`: [`Z3`] and the first TSV file contains
+`London_* city_*`: [`Z2`] then the returned dictionary will only
+contain the one entry; `London_* city_*`: [`Z3`].
+
+**Note** if the POS tagset used in the TSV files are different this
+could cause issues during tag time.
+
+<h4 id="tsv_merge.parameters">Parameters<a className="headerlink" href="#tsv_merge.parameters" title="Permanent link">&para;</a></h4>
+
+
+- __*tsv\_file\_paths__ : `Union[PathLike, str]` <br/>
+File paths or URLs to a TSV file that contains at least these two
+fields:
+
+1. `mwe_template`,
+2. `semantic_tags`
+
+All other fields will be ignored.
+
+<h4 id="tsv_merge.returns">Returns<a className="headerlink" href="#tsv_merge.returns" title="Permanent link">&para;</a></h4>
+
+
+- `dict[str, list[str]]` <br/>
+
+<h4 id="tsv_merge.raises">Raises<a className="headerlink" href="#tsv_merge.raises" title="Permanent link">&para;</a></h4>
+
+
+- `ValueError` <br/>
+If the minimum field headings, `mwe_template` and `semantic_tags`,
+do not exist in the given TSV file.
+
+<h4 id="tsv_merge.examples">Examples<a className="headerlink" href="#tsv_merge.examples" title="Permanent link">&para;</a></h4>
+
+
+``` python
+from pymusas.lexicon_collection import LexiconCollection
+welsh_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/Welsh/mwe-welsh.tsv"
+english_lexicon_url = "https://raw.githubusercontent.com/UCREL/Multilingual-USAS/refs/heads/master/English/mwe-en.tsv"
+tsv_urls = [welsh_lexicon_url, english_lexicon_url]
+combined_lexicon_data = MWELexiconCollection.tsv_merge(*tsv_urls)
+assert isinstance(combined_lexicon_data, dict)
+assert combined_lexicon_data["Academy_NOUN Award_NOUN"] == ["A5.1+/K1"]
+assert combined_lexicon_data["Ffwrnais_* Dyfi_*"] == ["Z2"]
+```
+
 <a id="pymusas.lexicon_collection.MWELexiconCollection.escape_mwe"></a>
 
 ### escape\_mwe

pymusas/lexicon_collection.py

Lines changed: 3 additions & 3 deletions
@@ -380,12 +380,12 @@ def merge(*lexicon_collections: "LexiconCollection") -> "LexiconCollection":
 
 # Parameters
 
-*lexicon_collections: class:`LexiconCollection`
+*lexicon_collections: `LexiconCollection`
 More than one lexicon collections that are to be merged.
 
 # Returns
 
-class:`LexiconCollection`
+:class:`LexiconCollection`
 
 # Examples
 
@@ -954,7 +954,7 @@ def tsv_merge(*tsv_file_paths: PathLike) -> dict[str, list[str]]:
 
 # Returns
 
-dict[str, list[str]]
+`dict[str, list[str]]`
 
 # Raises
 

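The documentation touched by this commit describes two rules: keys are built as `{lemma}|{pos}` (or just `{lemma}` when the POS is `None`), and when several sources are merged, later entries overwrite earlier ones for the same key. The sketch below illustrates those two rules with plain dictionaries only. It is not the PyMUSAS implementation; `make_key` and `merge_last_wins` are hypothetical helper names, and the `Car|Noun` tag value is made up for illustration.

```python
from typing import Optional


def make_key(lemma: str, pos: Optional[str] = None) -> str:
    """Hypothetical helper: `{lemma}|{pos}`, or `{lemma}` when pos is None."""
    return lemma if pos is None else f"{lemma}|{pos}"


def merge_last_wins(*lexicons: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: combine lexicons in argument order.

    A key that appears in more than one lexicon keeps the value from
    the lexicon that comes last in the argument list.
    """
    merged: dict[str, list[str]] = {}
    for lexicon in lexicons:
        merged.update(lexicon)
    return merged


# Illustrative data only; the tag for `Car|Noun` is an assumption.
first = {make_key("London"): ["Z2"], make_key("Car", "Noun"): ["M3"]}
last = {make_key("London"): ["Z3"]}
combined = merge_last_wins(first, last)
```

Here `combined` holds a single `London` entry taken from `last`, while `Car|Noun` survives from `first` because no later lexicon redefines it.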