[BUG] Error with outerJoin in Getting Cohort Dispatch #77

Open
@TheCedarPrince

I had seen this bug a few times and thought maybe I was just "using it wrong", but it just dawned on me that there is an actual error here: the `outerjoin` should also join on the `:subject_id` column. Otherwise both tables carry a `subject_id` column that is not a join key, and the join fails with a duplicate column name error.

"""
function GetCohortSubjectStartDate(df::DataFrame, conn; tab = cohort)

Given a `DataFrame` with a `:cohort_definition_id` column and `:subject_id` column, return the `DataFrame` with an associated `:cohort_start_date` corresponding to a cohort's subject ID in the `DataFrame`

Multiple dispatch that accepts all other arguments like in `GetCohortSubjectStartDate(ids, conn; tab = cohort)`
"""
function GetCohortSubjectStartDate(
    df::DataFrame, 
    conn; 
    tab = cohort
)

    return outerjoin(GetCohortSubjectStartDate(df[:,"cohort_definition_id"], df[:,"subject_id"], conn; tab=tab), df, on = :cohort_definition_id)

end

@Jay-sanjay, I am not sure how the tests missed this... Did we not have a test covering a `DataFrame` with both `cohort_definition_id` and `subject_id` columns? I guess I am just surprised we missed this; ah well!
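
For reference, here is a minimal sketch of the failure and of the fix I have in mind, assuming DataFrames.jl. The tables `lookup` and `df` are made-up stand-ins (`lookup` plays the role of whatever the inner `GetCohortSubjectStartDate` call returns from the database), so treat this as an illustration rather than the final patch:

```julia
using DataFrames

# Hypothetical stand-in for the inner lookup result: one start date per
# (cohort_definition_id, subject_id) pair. Dates are kept as strings for brevity.
lookup = DataFrame(
    cohort_definition_id = [1, 1, 2],
    subject_id = [10, 11, 10],
    cohort_start_date = ["2020-01-01", "2020-02-01", "2020-03-01"],
)

# Hypothetical input DataFrame passed to the dispatch.
df = DataFrame(
    cohort_definition_id = [1, 1, 2],
    subject_id = [10, 11, 10],
)

# Joining on :cohort_definition_id alone leaves :subject_id as a non-key
# column in both tables, so the join errors unless makeunique = true is passed.
failed = try
    outerjoin(lookup, df, on = :cohort_definition_id)
    false
catch
    true
end

# Joining on both keys keeps a single :subject_id column.
joined = outerjoin(lookup, df, on = [:cohort_definition_id, :subject_id])
```

A regression test could build a `DataFrame` like `df` above, with both ID columns present, and check that the result has exactly one `subject_id` column.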
