Suggestion: add a selectExpr method to DataFrame #298

@lyhue1991

In PySpark, the selectExpr method of DataFrame is very convenient and readable, but the ODPS DataFrame has no such method. It only has the select method.

I wrote a selectExpr workaround that first persists the input DataFrame to a temporary table and then uses o.execute_sql to run the select.

But this implementation is too slow. Could you provide an official, fast implementation that supports lazy evaluation?

from uuid import uuid1

from odps import ODPS

o = ODPS(xxx)

def selectExpr(df, *exprs):
    # Materialize the input DataFrame into a short-lived temporary table.
    src_table = 'temp_data_' + str(uuid1()).replace('-', '_')
    df.persist(src_table, lifecycle=1)
    sql = 'select ' + ','.join(exprs) + ' from ' + src_table

    # Run the SQL eagerly and store the result in a second temporary table.
    dst_table = 'temp_data_' + str(uuid1()).replace('-', '_')
    o.execute_sql(f'CREATE TABLE {dst_table} lifecycle 1 AS {sql}')
    return o.get_table(dst_table).to_df()
