feat: add dataframe duplicated issue - #667 #669

Open
wants to merge 1 commit into
base: dev
from

RahulDas-dev

This merge request adds a new duplicated() method to the DataFrame class that identifies duplicate rows. This functionality is essential for data cleaning and exploration workflows.

Resolves issue #667.

Features

  • Identifies duplicate rows in a DataFrame based on specified columns
  • Returns a Series of boolean values marking duplicate entries
  • Supports flexible options for handling duplicates:
    • keep: 'first' - Mark duplicates except for the first occurrence (default)
    • keep: 'last' - Mark duplicates except for the last occurrence
    • keep: false - Mark all duplicates
  • Allows focusing on specific columns with the subset option

Implementation Details

  • Optimized to handle large datasets efficiently with a hash-based approach
  • Comprehensive input validation for better error handling
  • Well-documented with JSDoc comments and examples

// Create a DataFrame with duplicate rows
const df = new DataFrame({
  'A': [1, 2, 2, 3, 3],
  'B': ['a', 'b', 'b', 'c', 'c']
});

// Find duplicates keeping first occurrence (default)
const dups = df.duplicated();
// Returns: [false, false, true, false, true]

// Find duplicates keeping last occurrence
const dupsLast = df.duplicated({ keep: 'last' });
// Returns: [false, true, false, true, false]

// Find duplicates based on specific columns
const dupsSubset = df.duplicated({ subset: ['B'] });
// Returns: [false, false, true, false, true]
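The hash-based approach mentioned under Implementation Details can be sketched roughly as follows. This is a minimal standalone illustration and not the code in this merge request: the function name duplicatedRows and the choice to represent rows as plain objects are assumptions made for the example, and only the keep and subset option names come from the description above. Each row is serialized to a string key so that a Set or Map lookup replaces pairwise row comparison.

```javascript
// Hypothetical helper for illustration only; not the PR's implementation.
// rows: array of plain objects, one per row.
function duplicatedRows(rows, { subset = null, keep = 'first' } = {}) {
  if (keep !== 'first' && keep !== 'last' && keep !== false) {
    throw new Error("keep must be 'first', 'last', or false");
  }
  // Compare all columns of the first row unless a subset is given.
  const columns = subset ?? (rows.length > 0 ? Object.keys(rows[0]) : []);
  // Serialize the selected values so equal rows produce the same key.
  const keyOf = (row) => JSON.stringify(columns.map((c) => row[c]));
  const result = new Array(rows.length).fill(false);

  if (keep === false) {
    // Count every key, then flag each row whose key occurs more than once.
    const counts = new Map();
    for (const row of rows) {
      const k = keyOf(row);
      counts.set(k, (counts.get(k) ?? 0) + 1);
    }
    rows.forEach((row, i) => { result[i] = counts.get(keyOf(row)) > 1; });
    return result;
  }

  // Walk forward for 'first' and backward for 'last'; any key that has
  // already been seen in that direction is a duplicate.
  const order = rows.map((_, i) => i);
  if (keep === 'last') order.reverse();
  const seen = new Set();
  for (const i of order) {
    const k = keyOf(rows[i]);
    if (seen.has(k)) result[i] = true;
    else seen.add(k);
  }
  return result;
}

// Same data as the example above, written as row objects.
const rows = [
  { A: 1, B: 'a' }, { A: 2, B: 'b' }, { A: 2, B: 'b' },
  { A: 3, B: 'c' }, { A: 3, B: 'c' },
];
console.log(duplicatedRows(rows));                  // [false, false, true, false, true]
console.log(duplicatedRows(rows, { keep: false })); // [false, true, true, true, true]
```

Each row is hashed a constant number of times, so the sketch runs in time linear in the number of rows. One caveat of keying on JSON.stringify is that NaN and undefined both serialize to null inside an array, so a real implementation would likely need a more careful key function.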
