Open
Description
Is your feature request related to a problem? Please describe.
It can look concerning to users when the design matrix is reported as "singular" and in need of being "regularized".
Equivalent (duplicate) confounds can be removed safely, so that a "singular" matrix would only refer to (near) equivalence between task regressors, or between a task regressor and a confound.
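As a small illustration of the problem, the sketch below builds a toy design matrix in which one confound is duplicated and checks its rank with `np.linalg.matrix_rank`. The matrix, its size, and the variable names are made up for the example and do not come from the package.

```python
import numpy as np

# Hypothetical design matrix: one task regressor and two confounds,
# where the second confound is an exact copy of the first.
rng = np.random.default_rng(0)
task = rng.normal(size=20)
confound = rng.normal(size=20)
design = np.column_stack([task, confound, confound])

# Three columns but only two independent ones, so the matrix is rank deficient.
rank_with_duplicate = np.linalg.matrix_rank(design)

# Dropping the duplicated confound restores full column rank.
deduplicated = design[:, :2]
rank_without_duplicate = np.linalg.matrix_rank(deduplicated)

print(rank_with_duplicate, "of", design.shape[1], "columns")
print(rank_without_duplicate, "of", deduplicated.shape[1], "columns")
```

With the duplicate removed, any remaining rank deficiency would point to a relationship involving the task regressors, which is the case worth warning about.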
Describe the solution you'd like
Automatic detection and deletion of duplicate columns (with a helpful warning raised when columns are dropped)
Describe alternatives you've considered
Keep all columns, or allow the user to decide
Additional context
The following code could be useful for implementing this feature:
import numpy as np


def get_duplicate_columns(df):
    '''
    Get a list of duplicate columns.
    Iterates over all the columns in the dataframe and finds the columns
    whose contents duplicate an earlier column (numeric columns only).
    :param df: Dataframe object
    :return: List of columns whose contents are duplicates.
    '''
    duplicateColumnNames = set()
    # Iterate over all the columns in dataframe
    for x in range(df.shape[1]):
        # Select column at xth index.
        col = df.iloc[:, x]
        # Iterate over all the columns in DataFrame from (x+1)th index till end
        for y in range(x + 1, df.shape[1]):
            # Select column at yth index.
            otherCol = df.iloc[:, y]
            # Check if the two columns at x & y index are (numerically) equal
            if np.all(np.isclose(col, otherCol)):
                # Record the later column so the first occurrence is kept
                duplicateColumnNames.add(df.columns.values[y])
    return list(duplicateColumnNames)
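A minimal sketch of how the detection could be combined with the requested deletion and warning is shown below. The function name `drop_duplicate_columns`, the warning text, and the toy column names are hypothetical, and the sketch assumes all columns are numeric and free of NaNs (`np.allclose` treats NaN as unequal by default).

```python
import warnings

import numpy as np
import pandas as pd


def drop_duplicate_columns(df):
    """Drop numerically duplicated columns, keeping the first of each group.

    Hypothetical helper: assumes every column can be cast to float.
    """
    values = df.to_numpy(dtype=float)
    kept = []        # positional indices of columns to keep
    duplicates = []  # labels of columns that repeat an earlier kept column
    for idx in range(values.shape[1]):
        # Compare against the columns already kept.
        if any(np.allclose(values[:, idx], values[:, k]) for k in kept):
            duplicates.append(df.columns[idx])
        else:
            kept.append(idx)
    if duplicates:
        warnings.warn(
            f"Dropping duplicate design matrix columns: {duplicates}",
            UserWarning,
            stacklevel=2,
        )
    # Select by position so repeated column labels are handled as well.
    return df.iloc[:, kept]


# Toy design matrix with one duplicated confound (made-up values).
design = pd.DataFrame(
    {
        "task": [1.0, 0.0, 1.0, 0.0],
        "csf": [0.1, 0.2, 0.3, 0.4],
        "csf_copy": [0.1, 0.2, 0.3, 0.4],
    }
)
cleaned = drop_duplicate_columns(design)
print(list(cleaned.columns))
```

Keeping the first occurrence and warning with the dropped names leaves the user able to see what changed. A flag to disable the behaviour would cover the "allow the user to decide" alternative.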