[BUG] Inconsistent left join results #123
Description
Describe the bug
I've been porting some logic from pandas to dataframe-js and came across an inconsistency in left joins (I suspect right joins may be affected as well). dataframe-js does not always produce correct left joins: it sometimes introduces duplicate rows that should not appear in the result. In my real data, the join inserts approximately 50 duplicates, which makes the result unusable for my purposes unless I drop duplicates afterwards.
To Reproduce
Steps to reproduce the behavior:
Run the sample code below. The join result should have 7 rows, but it has 8: the row with A = 1 appears twice.
Note that the join is on column "A". dfA has 7 records and dfB has 4, and no A value is repeated in dfB.
// CommonJS import; with ES modules use: import DataFrame from "dataframe-js";
const { DataFrame } = require("dataframe-js");

// dfA: 7 rows, join keys A = 1..7
const jsonDataA = [
  { A: 1, B: 4.28283, C: -1.509, D: -1.1352 },
  { A: 2, B: -0.22863, C: -3.39059, D: 1.1632 },
  { A: 3, B: -0.82863, C: -1.5059, D: 2.1352 },
  { A: 4, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 5, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 6, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 7, B: -1.28863, C: 4.5059, D: 4.1632 },
];

// dfB: 4 rows, join keys A = 1..4, no duplicate keys
const jsonDataB = [
  { A: 1, xb: 4.28283, B: null, C: -1.509, D: -1.1352 },
  { A: 2, xb: null, B: -0.22863, C: -3.39059, D: 1.1632 },
  { A: 3, xb: null, B: -0.82863, C: -1.5059, D: 2.1352 },
  { A: 4, xb: null, B: -1.28863, C: 4.5059, D: 4.1632 },
];

const dfA = new DataFrame(jsonDataA);
const dfB = new DataFrame(jsonDataB);

// Left join dfA with dfB on column "A"
const dfC = dfA.join(dfB, "A", "left");

console.log("rows:", dfC.count()); // expected 7, but 8 are returned
dfC.show();
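For comparison, here is a minimal hash-based left join over plain arrays of objects, written without dataframe-js. It is a sketch under stated assumptions: the function name `leftJoin` is hypothetical, and renaming clashing right-side columns with a `_right` suffix is a convention chosen for this example, not dataframe-js behavior. Each left row yields one output row per matching right row, or exactly one null-padded row when nothing matches, so with unique keys in `jsonDataB` the output has exactly as many rows as `jsonDataA`.

```javascript
// Minimal reference left join (hash join) over arrays of plain objects.
// Assumption: right-side non-key columns whose names clash with left-side
// columns get a hypothetical "_right" suffix; this is not dataframe-js behavior.
function leftJoin(leftRows, rightRows, key) {
  // Index right rows by key value; one key may map to several right rows.
  const index = new Map();
  for (const row of rightRows) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }

  // Right-side column names, excluding the join key.
  const rightCols = new Set(
    rightRows.flatMap((row) => Object.keys(row)).filter((col) => col !== key)
  );
  const leftCols = new Set(leftRows.flatMap((row) => Object.keys(row)));
  const outName = (col) => (leftCols.has(col) ? `${col}_right` : col);

  const result = [];
  for (const left of leftRows) {
    const matches = index.get(left[key]) ?? [];
    // An unmatched left row is emitted exactly once, padded with nulls.
    const partners = matches.length > 0 ? matches : [null];
    for (const right of partners) {
      const out = { ...left };
      for (const col of rightCols) {
        out[outName(col)] = right?.[col] ?? null;
      }
      result.push(out);
    }
  }
  return result;
}

const jsonDataA = [
  { A: 1, B: 4.28283, C: -1.509, D: -1.1352 },
  { A: 2, B: -0.22863, C: -3.39059, D: 1.1632 },
  { A: 3, B: -0.82863, C: -1.5059, D: 2.1352 },
  { A: 4, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 5, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 6, B: -1.28863, C: 4.5059, D: 4.1632 },
  { A: 7, B: -1.28863, C: 4.5059, D: 4.1632 },
];
const jsonDataB = [
  { A: 1, xb: 4.28283, B: null, C: -1.509, D: -1.1352 },
  { A: 2, xb: null, B: -0.22863, C: -3.39059, D: 1.1632 },
  { A: 3, xb: null, B: -0.82863, C: -1.5059, D: 2.1352 },
  { A: 4, xb: null, B: -1.28863, C: 4.5059, D: 4.1632 },
];

const joined = leftJoin(jsonDataA, jsonDataB, "A");
console.log(joined.length); // 7
```

Keys 1 to 4 each match one row of `jsonDataB` and keys 5 to 7 match none, giving 4 + 3 = 7 rows with no repeated A value.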
Expected behavior
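Because dfB has no duplicate values in the join column "A", a left join of dfA (7 rows) with dfB should return exactly 7 rows, one per row of dfA. Instead, the result has 8 rows, with the row for A = 1 duplicated.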
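To locate the extra rows in a larger result, a small check can compare per-key row counts between the left input and the join output. The helper `findDuplicatedKeys` below is hypothetical, not part of dataframe-js, and the check is only meaningful when the right-side keys are unique, as they are here; otherwise a left join legitimately repeats left rows. Applying it to the repro would require converting the DataFrames to arrays of row objects first (for example with dataframe-js's `toCollection()`, assumed here). The sample rows are illustrative only, shaped like the reported symptom.

```javascript
// Report keys that occur more often in a join result than in the left input.
// For a left join against a right side with unique keys, this should be empty.
function findDuplicatedKeys(leftRows, joinedRows, key) {
  // Count how many rows carry each key value.
  const countBy = (rows) => {
    const counts = new Map();
    for (const row of rows) {
      counts.set(row[key], (counts.get(row[key]) || 0) + 1);
    }
    return counts;
  };
  const leftCounts = countBy(leftRows);
  const joinedCounts = countBy(joinedRows);

  const extra = [];
  for (const [k, actual] of joinedCounts) {
    const expected = leftCounts.get(k) || 0;
    if (actual > expected) extra.push({ key: k, expected, actual });
  }
  return extra;
}

// Hypothetical rows shaped like the reported result: key 1 appears twice.
const left = [{ A: 1 }, { A: 2 }, { A: 3 }];
const joinedRows = [{ A: 1 }, { A: 1 }, { A: 2 }, { A: 3 }];
console.log(findDuplicatedKeys(left, joinedRows, "A"));
// [ { key: 1, expected: 1, actual: 2 } ]
```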
Desktop:
- OS: Manjaro 20.2.1