This repository was archived by the owner on Aug 17, 2024. It is now read-only.

[BUG] Inconsistent left join results #123

Open
@earlmedina

Describe the bug
While porting some logic from pandas to dataframe-js, I came across an inconsistency in left joins (I suspect right joins are affected as well). dataframe-js sometimes introduces duplicate rows that should not be in the join result. In my case the join adds approximately 50 duplicates, which makes the result unusable for my purposes unless I drop duplicates afterwards.

To Reproduce
Steps to reproduce the behavior:
Run the sample code below. The join result should have 7 rows, but I get 8: in this example, the row with A: 1 is duplicated.

Note that column "A" is used as the join key. dfA has 7 records and dfB has 4 records. There are no duplicate A values in dfB.

      import { DataFrame } from "dataframe-js";

      // Left-hand table: 7 rows with unique A values 1..7.
      const jsonDataA = [
        { A: 1, B: 4.28283, C: -1.509, D: -1.1352 },
        { A: 2, B: -0.22863, C: -3.39059, D: 1.1632 },
        { A: 3, B: -0.82863, C: -1.5059, D: 2.1352 },
        { A: 4, B: -1.28863, C: 4.5059, D: 4.1632 },
        { A: 5, B: -1.28863, C: 4.5059, D: 4.1632 },
        { A: 6, B: -1.28863, C: 4.5059, D: 4.1632 },
        { A: 7, B: -1.28863, C: 4.5059, D: 4.1632 },
      ];

      // Right-hand table: 4 rows with unique A values 1..4.
      const jsonDataB = [
        { A: 1, xb: 4.28283, B: null, C: -1.509, D: -1.1352 },
        { A: 2, xb: null, B: -0.22863, C: -3.39059, D: 1.1632 },
        { A: 3, xb: null, B: -0.82863, C: -1.5059, D: 2.1352 },
        { A: 4, xb: null, B: -1.28863, C: 4.5059, D: 4.1632 },
      ];

      const dfA = new DataFrame(jsonDataA);
      const dfB = new DataFrame(jsonDataB);

      // Left join on column "A": expected 7 rows, but the result has 8.
      const dfC = dfA.join(dfB, "A", "left");
      console.log('TEST', dfC);


Expected behavior
A left join should produce a dataframe with 7 rows, one per row of dfA, because dfB has no duplicate A values. Instead, the result has 8 rows, with the row for A: 1 duplicated.
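For reference, here is a minimal plain-JavaScript sketch of the left-join semantics I'm expecting. It does not use dataframe-js at all: `leftJoin` is a hypothetical helper written only for this illustration, the rows are trimmed to the key column plus one value column, and letting left-hand values win on overlapping column names is a simplifying assumption rather than what pandas or dataframe-js do.

```javascript
// Hypothetical reference implementation of a left join on a single key.
// Not part of dataframe-js; it only illustrates the expected row count.
function leftJoin(leftRows, rightRows, key) {
  // Index the right-hand rows by join key; a key may map to several rows.
  const rightByKey = new Map();
  for (const row of rightRows) {
    if (!rightByKey.has(row[key])) rightByKey.set(row[key], []);
    rightByKey.get(row[key]).push(row);
  }

  const joined = [];
  for (const left of leftRows) {
    const matches = rightByKey.get(left[key]);
    if (matches === undefined) {
      // No match on the right: keep the left row exactly once.
      joined.push({ ...left });
    } else {
      // One output row per matching right row (assumption: left values
      // win when both sides share a column name).
      for (const right of matches) joined.push({ ...right, ...left });
    }
  }
  return joined;
}

// Trimmed-down versions of jsonDataA / jsonDataB from the repro:
// keys 1..7 on the left, keys 1..4 on the right, no duplicate keys.
const rowsA = [1, 2, 3, 4, 5, 6, 7].map((a) => ({ A: a, B: a * 10 }));
const rowsB = [1, 2, 3, 4].map((a) => ({ A: a, xb: a === 1 ? 4.28283 : null }));

const result = leftJoin(rowsA, rowsB, "A");
console.log(result.length); // 7
console.log(result.filter((row) => row.A === 1).length); // 1
```

Because every key on the right is unique, each left row matches at most one right row, so the output can never have more rows than dfA.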

Desktop (please complete the following information):

  • OS: Manjaro 20.2.1
