Remove highly collinear (or equivalent) confounds #312

@jdkent

Is your feature request related to a problem? Please describe.
It can look concerning when the design matrix is reported as "singular" and needs to be "regularized", even if the only cause is duplicated confound columns.
Equivalent confounds can be removed automatically, so that a "singular" matrix would only indicate (near) equivalence between task regressors, or between a task regressor and a confound.
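
To illustrate the problem, here is a minimal sketch (not taken from the project's code) showing that a design matrix containing the same confound twice is rank deficient, and that dropping the copy restores full column rank. The column contents and the `n_scans` value are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 20  # hypothetical number of volumes

task = rng.standard_normal(n_scans)      # stand-in for a task regressor
confound = rng.standard_normal(n_scans)  # stand-in for a confound
intercept = np.ones(n_scans)

# Design matrix with the same confound included twice.
X = np.column_stack([task, confound, confound, intercept])
rank = np.linalg.matrix_rank(X)
print(f"{X.shape[1]} columns, rank {rank}")  # rank is lower than the column count

# Dropping the duplicated confound gives a full-column-rank matrix.
X_dedup = np.column_stack([task, confound, intercept])
rank_dedup = np.linalg.matrix_rank(X_dedup)
print(f"{X_dedup.shape[1]} columns, rank {rank_dedup}")
```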

Describe the solution you'd like
Automatic detection and removal of duplicate columns, with a helpful warning raised whenever columns are dropped.

Describe alternatives you've considered
Keep all columns, or let the user decide which ones to remove.

Additional context
The following code could be a useful starting point for implementing this feature:

import numpy as np


def get_duplicate_columns(df):
    '''
    Get a list of duplicate columns.
    Iterates over all pairs of columns in the dataframe and finds the
    columns whose contents are (numerically) equal to an earlier column.
    :param df: DataFrame object with numeric columns
    :return: List of names of columns that duplicate an earlier column.
    '''
    duplicate_column_names = set()
    # Iterate over all the columns in the dataframe
    for x in range(df.shape[1]):
        # Select the column at the xth index.
        col = df.iloc[:, x]
        # Iterate over all the columns from the (x+1)th index to the end
        for y in range(x + 1, df.shape[1]):
            # Select the column at the yth index.
            other_col = df.iloc[:, y]
            # Check whether the columns at the x and y indices are equal
            # within floating-point tolerance
            if np.all(np.isclose(col, other_col)):
                duplicate_column_names.add(df.columns.values[y])
    return list(duplicate_column_names)

source
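
Building on the snippet above, the requested "detect, remove, and warn" behaviour could look roughly like the sketch below. This is only an illustration under assumptions: the function name `drop_duplicate_columns` and the `protected` parameter (names of columns, such as task regressors, that should never be dropped) are hypothetical and not part of any existing API. It also assumes unique, numeric column names and contents.

```python
import warnings

import numpy as np
import pandas as pd


def drop_duplicate_columns(df, protected=()):
    """Drop columns that numerically duplicate an earlier column.

    `protected` is a hypothetical parameter: columns named there
    (e.g. task regressors) are never dropped.
    Returns the cleaned dataframe and the list of dropped column names.
    """
    values = df.to_numpy(dtype=float)
    names = list(df.columns)
    to_drop = []
    for x in range(len(names)):
        if names[x] in to_drop:
            continue  # already identified as a copy of an earlier column
        for y in range(x + 1, len(names)):
            if names[y] in to_drop or names[y] in protected:
                continue
            # equal_nan=True so that columns sharing NaN entries still match
            if np.allclose(values[:, x], values[:, y], equal_nan=True):
                to_drop.append(names[y])
    if to_drop:
        warnings.warn(
            f"Removing duplicate design matrix columns: {to_drop}",
            UserWarning,
        )
    return df.drop(columns=to_drop), to_drop


# Usage with a made-up design matrix containing one duplicated confound.
design = pd.DataFrame(
    {
        "task": [1.0, 0.0, 1.0, 0.0],
        "csf": [0.1, 0.2, 0.3, 0.4],
        "csf_copy": [0.1, 0.2, 0.3, 0.4],
    }
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned, dropped = drop_duplicate_columns(design, protected=("task",))
print(dropped)
print(list(cleaned.columns))
```

One design choice worth noting: the sketch passes `equal_nan=True`, whereas `np.isclose` with its default settings treats NaN entries as unequal, so two otherwise identical columns that contain NaN would not be flagged by the original snippet.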
