Add relationship_type #123

Merged
straeter merged 4 commits into main from add_one_on_one
Feb 11, 2026

@straeter (Contributor)

  • add one_on_one parameter for merge
  • add / uncomment some system files to gitignore

@jackwildman (Contributor) left a comment

I think this needs to be better explained to the user, and the bug Sentry spotted also looks genuine. Other than that, good to go.

@CallumMcMahon (Member) left a comment

does this need to be a bool? what about "1:1"/"m:1"/"1:m" or enums, to handle the other options?

@straeter (Contributor, Author) commented Feb 11, 2026

> does this need to be a bool? what about "1:1"/"m:1"/"1:m" or enums, to handle the other options?

OK, good point. At the moment we can only have m:1 and 1:1, but in the future we might have 1:m and m:m. I will instead introduce a parameter "relationship_type", which is now an enum and defaults to m:1.
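
For illustration, a minimal sketch of what such an enum-typed parameter could look like. The class name `RelationshipType`, its member names and string values, and the `merge` stub are all assumptions made for this example; none of them are taken from the PR diff.

```python
from enum import Enum


class RelationshipType(str, Enum):
    """Hypothetical enum; member names and values are assumed, not from the PR."""

    MANY_TO_ONE = "m:1"  # several left rows may match the same right row
    ONE_TO_ONE = "1:1"   # each row matches at most one row on the other side


def merge(left, right, relationship_type=RelationshipType.MANY_TO_ONE):
    """Stub that only shows how the parameter could be validated."""
    # Coerce plain strings such as "1:1" into the enum.
    # Unsupported values (for example "1:m") raise ValueError.
    relationship_type = RelationshipType(relationship_type)
    return relationship_type
```

Because the enum subclasses `str`, callers could pass either `RelationshipType.ONE_TO_ONE` or the plain string `"1:1"`, which would cover both options raised in the review. Adding 1:m or m:m later would only need new members.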

@straeter changed the title from "Add one on one" to "Add relationship_type" on Feb 11, 2026
@straeter (Contributor, Author)

@jackwildman is it now clearer with the relationship_type enum?

@jackwildman (Contributor) left a comment

Looks good, and definitely more obvious now. I'm not the keenest on relationship_type, but I'm happy to go with whatever you ultimately go with

right_key=right_key,
use_web_search=use_web_search,
one_on_one=one_on_one,
relationship_type=relationship_type,
Contributor

Nittiest of nits: relationship_type is kind of vague, and arguably a bit inaccurate. cardinality is maybe more accurate (or at least where terms like "many-to-many" often come in), but at the expense of not being immediately obvious to anyone who doesn't live in a database. We already have a strategy field on dedupe, so strategy here might be a good choice. If nothing else, it would add some harmony between the operations.

@straeter (Contributor, Author) Feb 11, 2026

I was thinking about cardinality, but for me this is a very mathematical term that most people have never heard of, whereas relationship should be familiar to anyone who has worked with SQL. I think strategy would not be a good choice here: it would more likely be understood as the strategy for how to perform the merge, like "first try fuzzy, then web agents". What we want to describe is really an existing relationship between the data / rows, and our algorithm then figures out the best way + strategy to cope with that relationship.

update: I just realized that cardinality has two very different meanings in mathematics and computer science

Contributor, Author

What I am a bit confused about is that you call relationship_type "inaccurate": is it not just a synonym of cardinality? https://www.geeksforgeeks.org/dbms/types-of-relationship-in-database/

Contributor

Yeah, I think cardinality also suffers the same inaccuracy now that I think about it more. I think, fundamentally, it's about whether we call this a "relationship", as we're not really establishing a relationship but more operating in a manner where x on one side can match and merge with y on the other side, so it is more like a mode or principle of operation than a relationship.

Saying this, I think basically any term can be nitpicked for this, so probably best just to pick relationship_type and move on with it. The key thing is that even if the term isn't immediately obvious, it doesn't take long to figure out from reading the doc string or the enumerated values, and from there it's easy enough to understand

Contributor, Author

... and the blog articles we will soon write about it :)

Contributor

> update: I just realized that cardinality has two very different meanings in mathematics and computer science

Yet more confusingly, computer science often uses both definitions. I mean, I do see why one might arrive at "cardinality" for n:m relationships in a database if we consider that element x has a set of connections of cardinality m, but it's definitely a bit of an overloaded term.
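
To make the "set of connections of cardinality m" reading concrete, here is a rough sketch that counts each element's connections in a list of matched pairs and derives the n:m label from those counts. The function name `classify_matches` and the `(left_id, right_id)` pair format are hypothetical and not part of this PR; the convention that "m:1" means many left rows to one right row is also an assumption.

```python
from collections import Counter


def classify_matches(pairs):
    """Classify (left_id, right_id) match pairs as "1:1", "m:1", "1:m" or "m:m"."""
    pairs = set(pairs)  # ignore duplicate pairs
    # Cardinality of each element's set of connections.
    left_degree = Counter(left for left, _ in pairs)
    right_degree = Counter(right for _, right in pairs)
    # "Many" on the left side means some right row has several left partners.
    many_left = any(n > 1 for n in right_degree.values())
    # "Many" on the right side means some left row has several right partners.
    many_right = any(n > 1 for n in left_degree.values())
    return ("m" if many_left else "1") + ":" + ("m" if many_right else "1")
```

Under these assumptions, two left rows matched to the same right row classify as "m:1", while pairs in which no id repeats on either side classify as "1:1".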

@straeter merged commit f599f82 into main on Feb 11, 2026
3 checks passed
@straeter deleted the add_one_on_one branch on February 11, 2026 12:45

3 participants