refactor: implement rapidfuzz #3

Merged
rmfranken merged 6 commits into develop from refactor_rapidfuzz on Sep 23, 2025

@rmfranken
Member

@rmfranken rmfranken commented Sep 15, 2025

Proposed Changes

Implement a rapidfuzz matcher (returns only one IRI per string, and only if rapidfuzz finds a match with a score above 90).
Add tests.
Add logging so the terminal shows which labels were IRIfied, which can help in picking a better match threshold.
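
For readers skimming the thread, here is a minimal sketch of the matching behaviour described above: try an exact lookup first, otherwise return at most one IRI when the best fuzzy score reaches the threshold, and log what was matched. It is not the code from this PR. The function name `match_iri`, the label-to-IRI dictionary, the lowercasing step, and the choice of `fuzz.ratio` as scorer are all assumptions made for illustration, and the sketch treats a score equal to the threshold as a match. If rapidfuzz is not installed it falls back to the standard library's `difflib`, whose scores can differ slightly from rapidfuzz's.

```python
import logging
from difflib import SequenceMatcher

try:
    # rapidfuzz is the dependency this PR adds.
    from rapidfuzz import fuzz, process
except ImportError:
    # Stand-in path: use difflib from the standard library instead.
    process = None

logger = logging.getLogger("iri_matcher_sketch")  # hypothetical logger name


def _difflib_best(query, labels):
    """Return the best (label, score on a 0-100 scale) using difflib, or None."""
    scored = [
        (label, 100.0 * SequenceMatcher(None, query, label).ratio())
        for label in labels
    ]
    return max(scored, key=lambda pair: pair[1], default=None)


def match_iri(text, label_to_iri, threshold=90.0):
    """Return a single IRI for `text`, or None if nothing scores high enough.

    `label_to_iri` is a hypothetical mapping of known labels to IRIs.
    """
    query = text.strip().lower()
    # Assumption: labels are compared case-insensitively.
    normalised = {label.strip().lower(): iri for label, iri in label_to_iri.items()}

    # Exact match wins without any fuzzy scoring.
    if query in normalised:
        logger.info("exact match: %r -> %s", text, normalised[query])
        return normalised[query]

    labels = list(normalised)
    if process is not None:
        # extractOne returns (choice, score, index) or None below score_cutoff.
        hit = process.extractOne(
            query, labels, scorer=fuzz.ratio, score_cutoff=threshold
        )
        best = (hit[0], hit[1]) if hit else None
    else:
        best = _difflib_best(query, labels)
        if best is not None and best[1] < threshold:
            best = None

    if best is None:
        logger.info("no match for %r at threshold %s", text, threshold)
        return None

    label, score = best
    logger.info("fuzzy match: %r -> %r (score %.1f)", text, label, score)
    return normalised[label]
```

Called with `logging.basicConfig(level=logging.INFO)`, the log lines show each matched label and its score, which is the kind of output that helps when tuning the threshold.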

Types of Changes

What types of changes does your contribution introduce? Put an x in the boxes
that apply

  • [ ] A bug fix (non-breaking change which fixes an issue). Use MR tag
    bugfix.
  • [x] A new feature (non-breaking change which adds functionality). Use MR
    tag feature.
  • [ ] A breaking change (fix or feature that would cause existing
    functionality to not work as expected). Use MR tag feature.
  • [ ] A non-productive update (documentation, tooling, etc. if none of the
    other choices apply). Use MR tag chore.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

Further Comments

@caviri caviri requested review from caviri and Copilot September 15, 2025 13:10

Copilot AI left a comment

Pull Request Overview

This PR adds rapidfuzz-based fuzzy string matching to the RDF transformer, enabling approximate matching when exact matches are not found. The changes add configurable fuzzy matching with a threshold-based scoring system to improve string-to-IRI mapping.

Key changes:

  • Added rapidfuzz dependency and fuzzy matching logic to RDFTransformer
  • Enhanced API endpoint to accept fuzzy matching parameters (fuzzy flag and threshold)
  • Updated tests to validate both exact and fuzzy matching scenarios

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Adds rapidfuzz>=3.0.0 dependency |
| src/strings2things/app/core/rdf_transformer.py | Implements fuzzy matching with configurable threshold and refactors matching logic |
| src/strings2things/app/api/endpoints.py | Updates API to accept fuzzy matching parameters and creates transformer instances per request |
| tests/test_rdf_transformer.py | Adds comprehensive test coverage for exact, fuzzy, and no-match scenarios |

Comment thread src/strings2things/app/core/rdf_transformer.py Outdated
Comment thread src/strings2things/app/core/rdf_transformer.py Outdated
rmfranken and others added 3 commits September 16, 2025 09:51
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@rmfranken rmfranken requested a review from vancauwe September 16, 2025 11:09
@rmfranken rmfranken merged commit e8e0adf into develop Sep 23, 2025
1 of 5 checks passed
@caviri caviri deleted the refactor_rapidfuzz branch September 23, 2025 14:47
2 participants