Integrate Polars Plugin for High-Volume Email Validation  #5

Open
@bnkc

Description:

I propose adding a Polars plugin to the emval library to enable high-performance email validation directly within Polars DataFrames. This integration would allow users to efficiently validate large datasets of email addresses, leveraging emval's speed and Polars' data manipulation strengths.

Benefits:

  • Performance: Validate entire columns of email addresses in compiled Rust code instead of calling the validator once per row from Python.
  • Integration: Use email validation as an ordinary expression inside existing Polars workflows.
  • Scalability: Handle large datasets with minimal overhead.

Proposed Usage:

The plugin would enable email validation with the following syntax:

import polars as pl
from emval.polars import validate_email

df = pl.DataFrame({
    'email': [
        '[email protected]',
        'invalid-email',
        '[email protected]',
        'user@[192.168.1.1]',
        ''
    ]
})

# Apply the email validation plugin
df = df.with_columns(
    validated=validate_email(
        pl.col('email'),
        allow_smtputf8=True,
        allow_empty_local=False,
        allow_quoted_local=False,
        allow_domain_literal=False,
        deliverable_address=True,
    )
)

# Access the fields from the Struct column
df = df.with_columns(
    original=pl.col('validated').struct.field('original'),
    normalized=pl.col('validated').struct.field('normalized'),
    local_part=pl.col('validated').struct.field('local_part'),
    domain_name=pl.col('validated').struct.field('domain_name'),
    domain_address=pl.col('validated').struct.field('domain_address'),
    is_deliverable=pl.col('validated').struct.field('is_deliverable'),
).drop('validated')

print(df)
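
To make the shape of the `validated` Struct concrete, here is a minimal pure-Python sketch of a single row's record, using the six field names from the example above. `ValidatedRow` and `toy_validate` are hypothetical stand-ins: the splitting logic is a deliberately naive placeholder and not emval's validation rules, and returning `None` for an invalid row is only an assumption about how the plugin might surface failures.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ValidatedRow:
    """Hypothetical per-row record mirroring the proposed Struct fields."""
    original: str
    normalized: str
    local_part: str
    domain_name: str
    domain_address: Optional[str]   # assumed: only set for domain literals
    is_deliverable: Optional[bool]  # assumed: None when no check was run


def toy_validate(email: str) -> Optional[ValidatedRow]:
    """Naive placeholder, NOT emval's rules; it only illustrates the shape."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        # Assumption: an invalid address becomes a null Struct value.
        return None
    return ValidatedRow(
        original=email,
        normalized=f"{local}@{domain.lower()}",
        local_part=local,
        domain_name=domain.lower(),
        domain_address=None,
        is_deliverable=None,
    )


rows = [toy_validate(e) for e in ["User@Example.COM", "invalid-email", ""]]
# Convert to plain dicts, the row-wise equivalent of a Struct column.
records = [asdict(r) if r is not None else None for r in rows]
```

Whether invalid rows should yield a null Struct, a Struct with null fields, or an error is a design decision this proposal leaves open.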

Proposed Project Structure:

emval/
├── __init__.py
├── validator.py
├── model.py
└── polars/
    ├── __init__.py
    └── plugin.py
src/
├── lib.rs             # Main module for emval
├── validators/        # Additional validation logic
└── polars_plugin.rs   # Polars plugin module

Optional Installation:

The Polars plugin should be an optional dependency, installable via:

pip install "emval[polars]"

This ensures the base emval library remains lightweight for users who don’t require the plugin.
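
One way the optional dependency could be enforced at import time is a small guard that `emval/polars/__init__.py` calls before touching Polars. The sketch below uses only the standard library; `require_extra` is a hypothetical helper name, and the error message wording is an assumption rather than anything emval currently provides.

```python
import importlib.util


def require_extra(module: str, extra: str) -> None:
    """Raise a helpful ImportError if an optional top-level module is missing.

    `module` is the importable name (e.g. "polars"); `extra` is the name of
    the pip extra that would install it (e.g. "polars").
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module) is None:
        raise ImportError(
            f"This feature requires the optional '{module}' package. "
            f'Install it with: pip install "emval[{extra}]"'
        )


# Hypothetical usage at the top of emval/polars/__init__.py:
#     require_extra("polars", "polars")
```

With a guard like this, users of the base package never import Polars, and users who reach for the plugin without the extra get an actionable message instead of a bare `ModuleNotFoundError`.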

Reference Documentation:
