docs/api.rst
Lines changed: 17 additions & 17 deletions
@@ -83,7 +83,7 @@ minLength ``1`` ``0`` When all pre-processing steps are done, tokens w
83
83
Fetching
84
84
--------
85
85
86
-
The fetching parameters are implemented in `PubFetcher <https://github.com/edamontology/pubfetcher>`_ and thus are described in its documentation: `Fetching parameters <https://pubfetcher.readthedocs.io/en/latest/cli.html#fetching>`_.
86
+
The fetching parameters are implemented in `PubFetcher <https://github.com/edamontology/pubfetcher>`_ and thus are described in its documentation: `Fetching parameters <https://pubfetcher.readthedocs.io/en/stable/cli.html#fetching>`_.
111
-
compoundWords ``1`` ``0`` Try to match words that have accidentally been made compound (given number is maximum number of words in an accidental compound minus one). Not done for tokens from `fulltext <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#fulltext>`_, `doc <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-docs>`_ and `webpage <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-webpages>`_. Set to 0 to disable (for a slight speed increase with only slight changes to the results).
111
+
compoundWords ``1`` ``0`` Try to match words that have accidentally been made compound (given number is maximum number of words in an accidental compound minus one). Not done for tokens from `fulltext <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#fulltext>`_, `doc <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-docs>`_ and `webpage <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-webpages>`_. Set to 0 to disable (for a slight speed increase with only slight changes to the results).
112
112
mismatchMultiplier ``2.0`` ``0.0`` Multiplier for score decrease caused by mismatch
113
113
matchMinimum ``1.0`` ``0.0`` ``1.0`` Minimum score allowed for approximate match. Not done for tokens from fulltext_, doc_ and webpage_. Set to ``1`` to disable approximate matching.
114
114
positionOffBy1 ``0.35`` ``0.0`` ``1.0`` Multiplier of a position score component for the case when a word is inserted between matched words or matched words are switched
@@ -164,14 +164,14 @@ Parameter Default Min Max Description
164
164
nameNormaliser ``0.81`` ``0.0`` ``1.0`` Score normaliser for matching a query name. Set to ``0`` to disable matching of names.
165
165
keywordNormaliser ``0.77`` ``0.0`` ``1.0`` Score normaliser for matching a query keyword. Set to ``0`` to disable matching of keywords.
166
166
descriptionNormaliser ``0.92`` ``0.0`` ``1.0`` Score normaliser for matching a query description. Set to ``0`` to disable matching of descriptions.
167
-
publicationTitleNormaliser ``0.91`` ``0.0`` ``1.0`` Score normaliser for matching a publication `title <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#title>`_. Set to ``0`` to disable matching of titles.
168
-
publicationKeywordNormaliser ``0.77`` ``0.0`` ``1.0`` Score normaliser for matching a publication `keyword <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#keywords>`_. Set to ``0`` to disable matching of keywords.
169
-
publicationMeshNormaliser ``0.75`` ``0.0`` ``1.0`` Score normaliser for matching a publication `MeSH term <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#mesh>`_. Set to ``0`` to disable matching of MeSH terms.
170
-
publicationMinedTermNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a publication mined term (`EFO <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#efo>`_, `GO <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#go>`_). Set to ``0`` to disable matching of mined terms.
171
-
publicationAbstractNormaliser ``0.985`` ``0.0`` ``1.0`` Score normaliser for matching a publication `abstract <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#theabstract>`_. Set to ``0`` to disable matching of abstracts.
172
-
publicationFulltextNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a publication `fulltext <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#fulltext>`_. Set to ``0`` to disable matching of fulltexts.
173
-
docNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a query `doc <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-docs>`_. Set to ``0`` to disable matching of docs.
174
-
webpageNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a query `webpage <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-webpages>`_. Set to ``0`` to disable matching of webpages.
167
+
publicationTitleNormaliser ``0.91`` ``0.0`` ``1.0`` Score normaliser for matching a publication `title <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#title>`_. Set to ``0`` to disable matching of titles.
168
+
publicationKeywordNormaliser ``0.77`` ``0.0`` ``1.0`` Score normaliser for matching a publication `keyword <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#keywords>`_. Set to ``0`` to disable matching of keywords.
169
+
publicationMeshNormaliser ``0.75`` ``0.0`` ``1.0`` Score normaliser for matching a publication `MeSH term <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#mesh>`_. Set to ``0`` to disable matching of MeSH terms.
170
+
publicationMinedTermNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a publication mined term (`EFO <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#efo>`_, `GO <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#go>`_). Set to ``0`` to disable matching of mined terms.
171
+
publicationAbstractNormaliser ``0.985`` ``0.0`` ``1.0`` Score normaliser for matching a publication `abstract <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#theabstract>`_. Set to ``0`` to disable matching of abstracts.
172
+
publicationFulltextNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a publication `fulltext <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#fulltext>`_. Set to ``0`` to disable matching of fulltexts.
173
+
docNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a query `doc <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-docs>`_. Set to ``0`` to disable matching of docs.
174
+
webpageNormaliser ``1.0`` ``0.0`` ``1.0`` Score normaliser for matching a query `webpage <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-webpages>`_. Set to ``0`` to disable matching of webpages.
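The normaliser rows above share one convention: each value lies between ``0.0`` and ``1.0``, and ``0`` disables matching of that query part. A minimal sketch of checking a set of overrides against that convention could look as follows. The defaults are copied from the rows shown in this diff; the helper name ``check_normalisers`` and the dict layout are hypothetical and not part of EDAMmap.

```python
# Defaults copied from the normaliser rows shown in this diff.
# Every row lists Min 0.0 and Max 1.0.
DEFAULTS = {
    "nameNormaliser": 0.81,
    "keywordNormaliser": 0.77,
    "descriptionNormaliser": 0.92,
    "publicationTitleNormaliser": 0.91,
    "publicationKeywordNormaliser": 0.77,
    "publicationMeshNormaliser": 0.75,
    "publicationMinedTermNormaliser": 1.0,
    "publicationAbstractNormaliser": 0.985,
    "publicationFulltextNormaliser": 1.0,
    "docNormaliser": 1.0,
    "webpageNormaliser": 1.0,
}


def check_normalisers(overrides):
    """Hypothetical helper (not an EDAMmap API).

    Merges overrides into the defaults, rejects unknown names and values
    outside [0.0, 1.0], and returns (merged values, names set to 0).
    """
    merged = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown normaliser: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0.0, 1.0]")
        merged[name] = value
    # A value of 0 means matching of that query part is disabled.
    disabled = sorted(n for n, v in merged.items() if v == 0)
    return merged, disabled
```

For example, ``check_normalisers({"docNormaliser": 0})`` would report ``docNormaliser`` as disabled while leaving the other defaults untouched.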
374
-
Name of the used `database <https://pubfetcher.readthedocs.io/en/latest/output.html#database>`_ file
374
+
Name of the used `database <https://pubfetcher.readthedocs.io/en/stable/output.html#database>`_ file
375
375
idf
376
376
Name of the used :ref:`IDF <idf>` file
377
377
idfStemmed
@@ -406,11 +406,11 @@ The type_ ``"full"`` includes everything from core_, plus the following:
406
406
mapping
407
407
queryFetched
408
408
_`webpages`
409
-
Array of metadata objects corresponding to webpageUrls_ in query_. Webpages are implemented in PubFetcher_ and thus are described in its documentation: `Content of webpages <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-webpages>`_. The structure of webpages here will be the same as described in PubFetcher, except for `content <https://pubfetcher.readthedocs.io/en/latest/output.html#webpage-content>`_ which will be missing. The values of `startUrl <https://pubfetcher.readthedocs.io/en/latest/output.html#starturl>`_ of webpages will be the URLs given in webpageUrls_ in query_.
409
+
Array of metadata objects corresponding to webpageUrls_ in query_. Webpages are implemented in PubFetcher_ and thus are described in its documentation: `Content of webpages <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-webpages>`_. The structure of webpages here will be the same as described in PubFetcher, except for `content <https://pubfetcher.readthedocs.io/en/stable/output.html#webpage-content>`_ which will be missing. The values of `startUrl <https://pubfetcher.readthedocs.io/en/stable/output.html#starturl>`_ of webpages will be the URLs given in webpageUrls_ in query_.
410
410
_`docs`
411
411
Array of metadata objects corresponding to docUrls_ in query_. Structure of objects same as in webpages_.
412
412
_`publications`
413
-
Array of metadata objects corresponding to publicationIds_ in query_. Publications are implemented in PubFetcher_ and thus are described in its documentation: `Content of publications <https://pubfetcher.readthedocs.io/en/latest/output.html#content-of-publications>`_. The structure of publications here will be the same as described in PubFetcher, except for fulltext_ which will be missing.
413
+
Array of metadata objects corresponding to publicationIds_ in query_. Publications are implemented in PubFetcher_ and thus are described in its documentation: `Content of publications <https://pubfetcher.readthedocs.io/en/stable/output.html#content-of-publications>`_. The structure of publications here will be the same as described in PubFetcher, except for fulltext_ which will be missing.
414
414
results
415
415
topic/operation/data/format
416
416
Array of objects defined in topic_, i.e. the same content as in core_, plus the field parts_ defined below.
@@ -625,7 +625,7 @@ To supply the same data (except the "keywords") as `bio.tools input`_, the follo
625
625
Prefetching
626
626
***********
627
627
628
-
Once a query has been received by the API, content corresponding to webpageUrls_, docUrls_ and publicationIds_ has to be `fetched <https://pubfetcher.readthedocs.io/en/latest/fetcher.html>`_ (unless it has been fetched and stored in some previous occurrence), before mapping can take place.
628
+
Once a query has been received by the API, content corresponding to webpageUrls_, docUrls_ and publicationIds_ has to be `fetched <https://pubfetcher.readthedocs.io/en/stable/fetcher.html>`_ (unless it has been fetched and stored in some previous occurrence), before mapping can take place.
629
629
630
630
This content could be prefetched and prestored in the database_ as a separate step, before the mapping query is sent. This is useful in the web application, where content can be fetched as soon as the user has entered the corresponding query details, and thus mapping time could be less when the entire query form is finally submitted. It might be of less use in the API, but has been included nevertheless.
631
631
@@ -652,7 +652,7 @@ webpageUrls
652
652
id
653
653
A webpage URL specified in the request
654
654
status
655
-
The status of that webpage. One of "`broken <https://pubfetcher.readthedocs.io/en/latest/output.html#broken>`_", "`empty <https://pubfetcher.readthedocs.io/en/latest/output.html#webpage-empty>`_", "non-`usable <https://pubfetcher.readthedocs.io/en/latest/output.html#webpage-usable>`_", "non-`final <https://pubfetcher.readthedocs.io/en/latest/output.html#webpage-final>`_", "`final <https://pubfetcher.readthedocs.io/en/latest/output.html#webpage-final>`_".
655
+
The status of that webpage. One of "`broken <https://pubfetcher.readthedocs.io/en/stable/output.html#broken>`_", "`empty <https://pubfetcher.readthedocs.io/en/stable/output.html#webpage-empty>`_", "non-`usable <https://pubfetcher.readthedocs.io/en/stable/output.html#webpage-usable>`_", "non-`final <https://pubfetcher.readthedocs.io/en/stable/output.html#webpage-final>`_", "`final <https://pubfetcher.readthedocs.io/en/stable/output.html#webpage-final>`_".
656
656
657
657
/api/doc
658
658
========
@@ -689,12 +689,12 @@ publicationIds
689
689
doi
690
690
The DOI of the publication
691
691
status
692
-
The status of that publication. One of `"empty" <https://pubfetcher.readthedocs.io/en/latest/output.html#publication-empty>`_, "non-`usable" <https://pubfetcher.readthedocs.io/en/latest/output.html#publication-usable>`_, "non-`final" <https://pubfetcher.readthedocs.io/en/latest/output.html#publication-final>`_, `"final" <https://pubfetcher.readthedocs.io/en/latest/output.html#publication-final>`_, `"totally final" <https://pubfetcher.readthedocs.io/en/latest/output.html#totallyfinal>`_.
692
+
The status of that publication. One of `"empty" <https://pubfetcher.readthedocs.io/en/stable/output.html#publication-empty>`_, "non-`usable" <https://pubfetcher.readthedocs.io/en/stable/output.html#publication-usable>`_, "non-`final" <https://pubfetcher.readthedocs.io/en/stable/output.html#publication-final>`_, `"final" <https://pubfetcher.readthedocs.io/en/stable/output.html#publication-final>`_, `"totally final" <https://pubfetcher.readthedocs.io/en/stable/output.html#totallyfinal>`_.
693
693
694
694
Example
695
695
=======
696
696
697
-
Try to prefetch the publication with PMID "23479348" and PMCID "PMC3654706", increasing connect and read `timeout <https://pubfetcher.readthedocs.io/en/latest/cli.html#timeout>`_ to give the server more time to fetch the whole publication:
697
+
Try to prefetch the publication with PMID "23479348" and PMCID "PMC3654706", increasing connect and read `timeout <https://pubfetcher.readthedocs.io/en/stable/cli.html#timeout>`_ to give the server more time to fetch the whole publication:
docs/future.rst
Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@ Algorithm
23
23
*********
24
24
25
25
* Currently, scores are not totally comparable across queries. Try to make a score in one query mean the same thing in another query as exactly as possible.
26
-
* An extra query part could be tags present in some web pages, like software registries or code repositories. This would require `changes in PubFetcher <https://pubfetcher.readthedocs.io/en/latest/future.html#structure-changes>`_.
26
+
* An extra query part could be tags present in some web pages, like software registries or code repositories. This would require `changes in PubFetcher <https://pubfetcher.readthedocs.io/en/stable/future.html#structure-changes>`_.
27
27
* Maybe `WordNet <https://wordnet.princeton.edu/>`_ could be used as part of the mapping algorithm. For example use lemmatisation instead of stemming.
28
28
* In results got from running EDAMmap against existing entries of bio.tools, look at FNs and see if anything can be done to increase their score.
29
29
@@ -76,7 +76,8 @@ Server
76
76
Maintenance
77
77
***********
78
78
79
-
* Update PubFetcher's `scraping rules <https://pubfetcher.readthedocs.io/en/latest/scraping.html#scraping-rules>`_, by `testing the rules <https://pubfetcher.readthedocs.io/en/latest/scraping.html#testing-of-rules>`_ and modifying outdated rules in `journals.yaml <https://github.com/edamontology/pubfetcher/blob/master/core/src/main/resources/scrape/journals.yaml>`_, `webpages.yaml <https://github.com/edamontology/pubfetcher/blob/master/core/src/main/resources/scrape/webpages.yaml>`_ and most importantly the hardcoded rules for `Europe PMC <https://europepmc.org/>`_ and other built-in `resources <https://pubfetcher.readthedocs.io/en/latest/fetcher.html#resources>`_.
79
+
* Update PubFetcher's `scraping rules <https://pubfetcher.readthedocs.io/en/stable/scraping.html#scraping-rules>`_, by `testing the rules <https://pubfetcher.readthedocs.io/en/stable/scraping.html#testing-of-rules>`_ and modifying outdated rules in `journals.yaml <https://github.com/edamontology/pubfetcher/blob/master/core/src/main/resources/scrape/journals.yaml>`_, `webpages.yaml <https://github.com/edamontology/pubfetcher/blob/master/core/src/main/resources/scrape/webpages.yaml>`_ and most importantly the hardcoded rules for `Europe PMC <https://europepmc.org/>`_ and other built-in `resources <https://pubfetcher.readthedocs.io/en/stable/fetcher.html#resources>`_.
80
80
* Update dependencies in `pom.xml <https://github.com/edamontology/edammap/blob/master/pom.xml>`_ (but care should be taken to not cause regressions).
81
+
* Check for broken links in the documentation using ``make linkcheck``.
81
82
* When a new `biotoolsSchema <https://github.com/bio-tools/biotoolsSchema>`_ is released, some code modifications might be necessary to adhere to it.
82
83
* Also, when a new `EDAM ontology <https://github.com/edamontology/edamontology>`_ is released, some modifications might be necessary (for example in `blacklist.txt <https://github.com/edamontology/edammap/blob/master/core/src/main/resources/edam/blacklist.txt>`_ and `blacklist_synonyms.txt <https://github.com/edamontology/edammap/blob/master/core/src/main/resources/edam/blacklist_synonyms.txt>`_; also, any running :ref:`server` instances could be restarted to use the new ontology version).
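Every link change in this diff follows the same pattern: PubFetcher documentation URLs under ``/en/latest/`` are repointed to ``/en/stable/``. The sketch below shows one way such a substitution could be scripted. It is an illustration only, not necessarily how the change was made, and the function name ``pin_to_stable`` is hypothetical.

```python
import re

# Matches PubFetcher Read the Docs links that point at the "latest" build.
_LATEST = re.compile(r"(https://pubfetcher\.readthedocs\.io/en/)latest/")


def pin_to_stable(text):
    """Return (rewritten text, number of links changed).

    Only PubFetcher Read the Docs URLs are touched; other links are left as-is.
    """
    return _LATEST.subn(r"\1stable/", text)
```

Running the function over the text of ``docs/api.rst`` and ``docs/future.rst`` and writing the result back would reproduce the link changes shown above. The added ``make linkcheck`` maintenance item is a separate, manual edit.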