Add converter with configurable pre-processing #171
Merged
Commits
26 commits; the diff below shows the changes from 17 of them.
13ce8b8 Create wrapped.py (cthoyt)
7ad4258 Update wrapped.py (cthoyt)
9a914e2 Adding tests (cthoyt)
c7e3667 Update test_wrapped.py (cthoyt)
8e90eb6 Cleanup (cthoyt)
e5fd9b3 Add more tests (cthoyt)
30eb979 Cleamup (cthoyt)
3a28d9e Merge branch 'main' into add-wrapped-converter (cthoyt)
e4cc483 Update wrapped.py (cthoyt)
e12f191 Update wrapped.py (cthoyt)
e58cc63 Update wrapped.py (cthoyt)
957ca5f Rename and start docs (cthoyt)
be74a4e Update (cthoyt)
378c858 Cleanup (cthoyt)
0c3f198 Rename to context (cthoyt)
1754e65 Update preprocessing.py (cthoyt)
b1903d4 Update preprocessing.py (cthoyt)
d770635 Rename components (cthoyt)
6e8c9bc Update tutorial (cthoyt)
75ad66d Add configurable blocklist action (cthoyt)
50066c5 Update docs (cthoyt)
ef1aa49 Implement URI parsing (cthoyt)
b53bda9 Add more tests and typing (cthoyt)
1d41e6b Update preprocessing.py (cthoyt)
431fc08 Add URI example (cthoyt)
0cea138 Update preprocessing.py (cthoyt)
@@ -2,5 +2,4 @@ API Reference
=============

.. automodapi:: curies
    :no-inheritance-diagram:
    :no-heading:
@@ -0,0 +1,29 @@
Converter with Preprocessing
============================

When simple expansion and contraction aren't enough and you want to inject global or
context-specific rewrite rules, you can combine a :class:`curies.Converter` with
preprocessing rules, encoded as an instance of :class:`curies.PreprocessingRules`, in
a :class:`curies.PreprocessingConverter`.

For example, you might always want to rewrite legacy references to the ``OBO_REL`` namespace:

.. code-block:: python

    import curies
    from curies import PreprocessingConverter, PreprocessingRules
    from curies.preprocessing import PreprocessingBlacklist, PreprocessingRewrites

    rules = PreprocessingRules(
        blacklists=PreprocessingBlacklist(),
        rewrites=PreprocessingRewrites(
            full={"OBO_REL:is_a": "rdfs:subClassOf"},
        ),
    )

    converter = PreprocessingConverter.from_converter(
        curies.get_obo_converter(), rules=rules
    )

    >>> converter.parse_curie("OBO_REL:is_a")
    ReferenceTuple(prefix='rdfs', identifier='subClassOf')
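The tutorial mentions context-specific rules, but its example only exercises a global one. As a rough, stdlib-only sketch of the lookup order that the rewrite model in this PR appears to follow (context-keyed mappings are consulted before global ones, and a full-string rewrite is tried before a prefix rewrite), the helper below mimics `remap_full` and `remap_prefix` with plain dictionaries. The function name `rewrite`, the `chebi` context, and the `chebi:is_a` target are hypothetical and only serve the illustration; this is not the library's API, and unlike the real `remap_full` it returns a string rather than a reference object.

```python
from __future__ import annotations


def rewrite(
    text: str,
    *,
    full: dict[str, str],
    resource_full: dict[str, dict[str, str]],
    prefix: dict[str, str],
    resource_prefix: dict[str, dict[str, str]],
    context: str | None = None,
) -> str:
    """Apply the first matching rewrite, mirroring the lookup order in the diff."""
    # 1. Full-string rewrites: context-specific first, then global.
    if context and text in resource_full.get(context, {}):
        return resource_full[context][text]
    if text in full:
        return full[text]
    # 2. Prefix rewrites: context-specific first, then global.
    if context is not None:
        for old, new in resource_prefix.get(context, {}).items():
            if text.startswith(old):
                return new + text[len(old):]
    for old, new in prefix.items():
        if text.startswith(old):
            return new + text[len(old):]
    # 3. Nothing matched: return the input unchanged.
    return text


# Hypothetical rule set: one global full rewrite, one context-specific
# override for the same string, and one global prefix rewrite.
rules = dict(
    full={"OBO_REL:is_a": "rdfs:subClassOf"},
    resource_full={"chebi": {"OBO_REL:is_a": "chebi:is_a"}},
    prefix={"OBO_REL:": "RO:"},
    resource_prefix={},
)

print(rewrite("OBO_REL:is_a", **rules))                   # global full rewrite
print(rewrite("OBO_REL:is_a", context="chebi", **rules))  # context override
print(rewrite("OBO_REL:part_of", **rules))                # prefix rewrite
```

Passing a `context` only adds rules on top of the global ones; a string with no context-specific match still falls through to the global mappings.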
@@ -0,0 +1,276 @@
"""Reusable configuration."""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Literal, TypeVar, overload

from pydantic import BaseModel, Field
from typing_extensions import Self

from .api import Converter, Reference, ReferenceTuple

__all__ = [
    "BlacklistError",
    "PreprocessingBlacklist",
    "PreprocessingConverter",
    "PreprocessingRewrites",
    "PreprocessingRules",
]

X = TypeVar("X", bound=Reference)


class PreprocessingBlacklist(BaseModel):
    """A model for full-string, prefix, and suffix blacklists."""

    full: list[str] = Field(default_factory=list)
    resource_full: dict[str, list[str]] = Field(default_factory=dict)
    prefix: list[str] = Field(default_factory=list)
    resource_prefix: dict[str, list[str]] = Field(default_factory=dict)
    suffix: list[str] = Field(default_factory=list)

    def _sort(self) -> None:
        self.full.sort()
        self.prefix.sort()
        self.suffix.sort()
        for v in self.resource_full.values():
            v.sort()
        for v in self.resource_prefix.values():
            v.sort()

    def str_has_blacklisted_prefix(
        self, str_or_curie_or_uri: str, *, context: str | None = None
    ) -> bool:
        """Check if the string starts with a blacklisted prefix."""
        if context:
            prefixes: list[str] = self.resource_prefix.get(context, [])
            if prefixes and any(str_or_curie_or_uri.startswith(prefix) for prefix in prefixes):
                return True
        return any(str_or_curie_or_uri.startswith(prefix) for prefix in self.prefix)

    def str_has_blacklisted_suffix(self, str_or_curie_or_uri: str) -> bool:
        """Check if the string ends with a blacklisted suffix."""
        return any(str_or_curie_or_uri.endswith(suffix) for suffix in self.suffix)

    def str_is_blacklisted_full(
        self, str_or_curie_or_uri: str, *, context: str | None = None
    ) -> bool:
        """Check if the full string is blacklisted."""
        if context and str_or_curie_or_uri in self.resource_full.get(context, []):
            return True
        return str_or_curie_or_uri in self.full

    def str_is_blacklisted(self, str_or_curie_or_uri: str, *, context: str | None = None) -> bool:
        """Check if the string matches any prefix, suffix, or full-string blacklist."""
        return (
            self.str_has_blacklisted_prefix(str_or_curie_or_uri, context=context)
            or self.str_has_blacklisted_suffix(str_or_curie_or_uri)
            or self.str_is_blacklisted_full(str_or_curie_or_uri, context=context)
        )


class PreprocessingRewrites(BaseModel):
    """A model for prefix and full rewrites."""

    full: dict[str, str] = Field(
        default_factory=dict, description="Global remappings for an entire string"
    )
    resource_full: dict[str, dict[str, str]] = Field(
        default_factory=dict, description="Resource-keyed remappings for an entire string"
    )
    prefix: dict[str, str] = Field(
        default_factory=dict, description="Global remappings of just the prefix"
    )
    resource_prefix: dict[str, dict[str, str]] = Field(
        default_factory=dict, description="Resource-keyed remappings for just a prefix"
    )

    def remap_full(
        self,
        str_or_curie_or_uri: str,
        reference_cls: type[X],
        *,
        context: str | None = None,
    ) -> X | None:
        """Remap the full string to a reference, or return None if no rewrite applies."""
        if context:
            resource_rewrites: dict[str, str] = self.resource_full.get(context, {})
            if resource_rewrites and str_or_curie_or_uri in resource_rewrites:
                return reference_cls.from_curie(resource_rewrites[str_or_curie_or_uri])

        if str_or_curie_or_uri in self.full:
            return reference_cls.from_curie(self.full[str_or_curie_or_uri])

        return None

    def remap_prefix(self, str_or_curie_or_uri: str, *, context: str | None = None) -> str:
        """Remap the prefix if a rule matches, otherwise return the string unchanged."""
        if context is not None:
            for old_prefix, new_prefix in self.resource_prefix.get(context, {}).items():
                if str_or_curie_or_uri.startswith(old_prefix):
                    return new_prefix + str_or_curie_or_uri[len(old_prefix) :]
        for old_prefix, new_prefix in self.prefix.items():
            if str_or_curie_or_uri.startswith(old_prefix):
                return new_prefix + str_or_curie_or_uri[len(old_prefix) :]
        return str_or_curie_or_uri


class PreprocessingRules(BaseModel):
    """A model for blacklists and rewrites."""

    blacklists: PreprocessingBlacklist
    rewrites: PreprocessingRewrites

    @classmethod
    def lint_file(cls, path: str | Path) -> None:
        """Lint a file, in place, given a file path."""
        path = Path(path).expanduser().resolve()
        rules = cls.model_validate_json(path.read_text())
        rules.blacklists._sort()
        path.write_text(
            json.dumps(
                rules.model_dump(exclude_unset=True, exclude_defaults=True),
                sort_keys=True,
                indent=2,
            )
        )

    def str_is_blacklisted(self, str_or_curie_or_uri: str, *, context: str | None = None) -> bool:
        """Check if the string is blacklisted."""
        return self.blacklists.str_is_blacklisted(str_or_curie_or_uri, context=context)

    def remap_full(
        self,
        str_or_curie_or_uri: str,
        reference_cls: type[X],
        *,
        context: str | None = None,
    ) -> X | None:
        """Remap the full string to a reference, or return None if no rewrite applies."""
        return self.rewrites.remap_full(
            str_or_curie_or_uri, reference_cls=reference_cls, context=context
        )

    def remap_prefix(self, str_or_curie_or_uri: str, *, context: str | None = None) -> str:
        """Remap the prefix if a rule matches, otherwise return the string unchanged."""
        return self.rewrites.remap_prefix(str_or_curie_or_uri, context=context)


def _load_rules(rules: str | Path | PreprocessingRules) -> PreprocessingRules:
    if isinstance(rules, (str, Path)):
        rules = Path(rules).expanduser().resolve()
        rules = PreprocessingRules.model_validate_json(rules.read_text())
    return rules


class BlacklistError(ValueError):
    """An error raised when a string is blacklisted."""


class PreprocessingConverter(Converter):
    """A converter with pre-processing rules."""

    def __init__(
        self,
        *args: Any,
        rules: PreprocessingRules | str | Path,
        reference_cls: type[X] | None = None,
        **kwargs: Any,
    ) -> None:
        """Instantiate a converter with a ruleset for pre-processing.

        :param args: Positional arguments passed to :func:`Converter.__init__`
        :param rules: A set of rules, or a path to a JSON file containing them
        :param reference_cls: The reference class to use. Defaults to
            :class:`curies.Reference`.
        :param kwargs: Keyword arguments passed to :func:`Converter.__init__`
        """
        super().__init__(*args, **kwargs)
        self.rules = _load_rules(rules)
        self._reference_cls = Reference if reference_cls is None else reference_cls

    @classmethod
    def from_converter(cls, converter: Converter, rules: PreprocessingRules | str | Path) -> Self:
        """Wrap a converter with a ruleset."""
        return cls(records=converter.records, rules=rules)

    # docstr-coverage:excused `overload`
    @overload
    def parse(
        self,
        str_or_uri_or_curie: str,
        *,
        strict: Literal[True],
        context: str | None = ...,
    ) -> ReferenceTuple: ...

    # docstr-coverage:excused `overload`
    @overload
    def parse(
        self,
        str_or_uri_or_curie: str,
        *,
        strict: Literal[False] = False,
        context: str | None = ...,
    ) -> ReferenceTuple | None: ...

    def parse(
        self, str_or_uri_or_curie: str, *, strict: bool = False, context: str | None = None
    ) -> ReferenceTuple | None:
        """Parse a string, CURIE, or URI."""
        if r1 := self.rules.remap_full(
            str_or_uri_or_curie, reference_cls=self._reference_cls, context=context
        ):
            return r1.pair

        # Remap node's prefix (if necessary)
        str_or_uri_or_curie = self.rules.remap_prefix(str_or_uri_or_curie, context=context)

        if self.rules.str_is_blacklisted(str_or_uri_or_curie, context=context):
            raise BlacklistError

        if strict:
            return super().parse(str_or_uri_or_curie, strict=True)
        return super().parse(str_or_uri_or_curie, strict=False)

    # docstr-coverage:excused `overload`
    @overload
    def parse_curie(
        self, curie: str, *, strict: Literal[False] = False, context: str | None = ...
    ) -> ReferenceTuple | None: ...

    # docstr-coverage:excused `overload`
    @overload
    def parse_curie(
        self, curie: str, *, strict: Literal[True], context: str | None = ...
    ) -> ReferenceTuple: ...

    def parse_curie(
        self, curie: str, *, strict: bool = False, context: str | None = None
    ) -> ReferenceTuple | None:
        """Parse and standardize a CURIE.

        :param curie: The CURIE to parse and standardize
        :param strict: If the CURIE can't be parsed, should an error be thrown? Defaults
            to false.
        :param context: Is there a context, e.g., an ontology prefix that should be
            applied to the remapping and blacklist rules?

        :returns: A tuple representing a parsed and standardized CURIE

        :raises BlacklistError: If the CURIE is blacklisted
        """
        if r1 := self.rules.remap_full(curie, reference_cls=self._reference_cls, context=context):
            return r1.pair

        # Remap node's prefix (if necessary)
        curie = self.rules.remap_prefix(curie, context=context)

        if self.rules.str_is_blacklisted(curie, context=context):
            raise BlacklistError

        if strict:
            return super().parse_curie(curie, strict=True)
        return super().parse_curie(curie, strict=False)
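For reviewers, the blacklist semantics of `PreprocessingBlacklist` can be summarised in a small, stdlib-only sketch: a string is rejected if it starts with a blacklisted prefix (context-specific or global), ends with a blacklisted suffix, or exactly matches a full entry (context-specific or global). The `Blacklist` dataclass, the `check` helper, and the sample entries below are hypothetical stand-ins written for illustration, not the classes from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BlacklistError(ValueError):
    """Raised when a string matches a blacklist entry."""


@dataclass
class Blacklist:
    """Hypothetical stand-in for the blacklist model in the diff."""

    full: list[str] = field(default_factory=list)
    resource_full: dict[str, list[str]] = field(default_factory=dict)
    prefix: list[str] = field(default_factory=list)
    resource_prefix: dict[str, list[str]] = field(default_factory=dict)
    suffix: list[str] = field(default_factory=list)

    def is_blacklisted(self, text: str, *, context: str | None = None) -> bool:
        # Context-specific entries extend the global ones; they never replace them.
        prefixes = self.prefix + (self.resource_prefix.get(context, []) if context else [])
        fulls = self.full + (self.resource_full.get(context, []) if context else [])
        return (
            any(text.startswith(p) for p in prefixes)
            or any(text.endswith(s) for s in self.suffix)
            or text in fulls
        )


def check(blacklist: Blacklist, text: str, *, context: str | None = None) -> str:
    """Return the text unchanged, or raise if it is blacklisted."""
    if blacklist.is_blacklisted(text, context=context):
        raise BlacklistError(text)
    return text


# Hypothetical entries: a global prefix, a global suffix, and one
# full-string entry that only applies in the "chebi" context.
blacklist = Blacklist(
    prefix=["tmp:"],
    suffix=[".owl"],
    resource_full={"chebi": ["skip:me"]},
)

print(blacklist.is_blacklisted("tmp:123"))                   # global prefix
print(blacklist.is_blacklisted("skip:me"))                   # no context given
print(blacklist.is_blacklisted("skip:me", context="chebi"))  # context entry
```

In the diff, `parse` and `parse_curie` run the blacklist check after the prefix rewrite, so blacklist entries are matched against the rewritten string. A string that hits a full rewrite returns before the blacklist is consulted at all.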