In PySpark, DataFrame.selectExpr is very convenient and readable, but the ODPS DataFrame has no such method; it only has select.
I wrote a selectExpr that first persists the input DataFrame to a temporary table and then runs the select through o.execute_sql.
This implementation is too slow. Could you provide an official, fast implementation that supports lazy evaluation?
from uuid import uuid1

from odps import ODPS

o = ODPS(xxx)

def selectExpr(df, *exprs):
    # Persist the input DataFrame to a temporary table so SQL can read it.
    suffix = str(uuid1()).replace('-', '_')
    src_table = 'temp_data_' + suffix
    df.persist(src_table, lifecycle=1)
    sql = 'select ' + ','.join(exprs) + ' from ' + src_table
    # Materialize the query result into a second temporary table.
    suffix = str(uuid1()).replace('-', '_')
    out_table = 'temp_data_' + suffix
    o.execute_sql(f'CREATE TABLE {out_table} lifecycle 1 AS {sql}')
    dfout = o.get_table(out_table).to_df()
    return dfout
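The string-building part of this workaround can be separated from the ODPS calls and checked without a connection. Below is a minimal sketch of that idea; the helper names `make_temp_table_name`, `build_select_sql`, and `build_ctas_sql` are hypothetical and not part of PyODPS, and `uuid4().hex` is swapped in for the `uuid1()` plus `replace` used above. It does not address the speed or lazy-evaluation request.

```python
from uuid import uuid4


def make_temp_table_name(prefix="temp_data_"):
    """Return a unique temporary table name (hypothetical helper)."""
    # uuid4().hex is 32 hex characters with no hyphens, so no replace() is needed.
    return prefix + uuid4().hex


def build_select_sql(table, *exprs):
    """Build 'SELECT <exprs> FROM <table>' from SQL expression strings."""
    if not exprs:
        raise ValueError("at least one expression is required")
    return "SELECT " + ", ".join(exprs) + " FROM " + table


def build_ctas_sql(target, source, *exprs, lifecycle=1):
    """Build the CREATE TABLE ... AS SELECT statement used by the workaround."""
    select_sql = build_select_sql(source, *exprs)
    return f"CREATE TABLE {target} LIFECYCLE {lifecycle} AS {select_sql}"
```

With these helpers, the body of selectExpr would pass the result of `build_ctas_sql(...)` to `o.execute_sql`, and calling it with no expressions fails early instead of submitting invalid SQL.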