Description
Is there an existing issue for this?
- I have searched the existing issues
Problem statement
Currently, existing code has to be changed to accommodate the DQ rules and to add additional validation.
Proposed Solution
Use a context manager and redirect reading and writing through it: the business logic runs inside a with block, and the DQ rules run when the block exits without an error.
class DQS_ContextManager():
    def __init__(self, rule_tbl, name=None):
        self.name = name
        self.rule_tbl = rule_tbl

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_value is not None:
            # business logic failed: skip the DQ rules; returning None lets the exception propagate
            print("there is some error in business logic and we can't proceed further", exc_traceback)
        else:
            print(f"running the quarantine for {self.name} !! hold tight...")
            rule_obj = rule_generator(self.rule_tbl, self.name)
            df_write_list = list(rule_obj.rule_df_tuple)
            #print(df_write_list)
            rule_executor.rule_runner(df_write_list)
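To make the exit-time behaviour concrete without Spark, here is a minimal sketch of the same pattern on plain lists of dicts. Everything in it is hypothetical and only for illustration: `QuarantineContext`, its `register` method, and the `fare_non_negative` rule are not part of this project, and the real `rule_generator` / `rule_executor` may work quite differently. It only shows that the rules run when the block exits cleanly and are skipped when the business logic raises.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Rule = Callable[[Row], bool]


class QuarantineContext:
    """Hypothetical stand-in for DQS_ContextManager, working on lists of dicts."""

    def __init__(self, rules: Dict[str, Rule], name: Optional[str] = None):
        self.rules = rules
        self.name = name
        self.rows: List[Row] = []
        self.valid: List[Row] = []
        self.quarantined: List[Row] = []
        self.ran = False

    def __enter__(self):
        # Return self so the block can hand its result back via `as ctx`.
        return self

    def register(self, rows: List[Row]) -> None:
        # Assumed hand-off point: the block registers the rows to validate.
        self.rows = list(rows)

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_value is not None:
            # Business logic failed: skip the rules, let the exception propagate.
            return False
        for row in self.rows:
            passed = all(rule(row) for rule in self.rules.values())
            (self.valid if passed else self.quarantined).append(row)
        self.ran = True
        return False


# Hypothetical rule: a fare must not be negative.
rules = {"fare_non_negative": lambda r: r["fare"] >= 0}

with QuarantineContext(rules, name="df") as ctx:
    raw = [
        {"trip": 1, "fare": 12.5},
        {"trip": 2, "fare": -3.0},
        {"trip": 1, "fare": 12.5},
    ]
    # Business logic: drop exact duplicates while preserving order.
    deduped = [dict(t) for t in dict.fromkeys(tuple(r.items()) for r in raw)]
    ctx.register(deduped)
```

After the block, `ctx.valid` holds the rows that passed every rule and `ctx.quarantined` holds the rest. Returning `False` from `__exit__` keeps the original exception visible to the caller instead of swallowing it.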
Output :
Existing code
#without dqs
df_raw=spark.table("raw_tbl_taxi")
#business logic df as of now
df=df_raw.drop_duplicates()
df.write.mode("overwrite").saveAsTable("curated_without_dqs")
With Context Manager
with DQS_ContextManager("rule_tbl","df"):  # arguments match __init__(rule_tbl, name)
    df_raw=spark.table("raw_tbl_taxi")
    #business logic df as of now
    df=df_raw.drop_duplicates()
    #df.write.saveAsTable("curated_without_dqs")
    #read and write handled inside context Manager
Cleaner read and write
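One way the read and write could move out of the business logic is sketched below with `contextlib.contextmanager`. This is an assumption about the intended design, not the proposed implementation: `TABLES` is a hypothetical in-memory dict standing in for `spark.table` / `saveAsTable`, and `managed_io`, `Frame`, and the target name `curated_with_dqs` are made up for the example.

```python
from contextlib import contextmanager

# Hypothetical in-memory catalog standing in for the Spark metastore.
TABLES = {"raw_tbl_taxi": [("a", 1), ("a", 1), ("b", 2)]}


class Frame:
    """Mutable holder so the block can hand its result back to the manager."""

    def __init__(self, rows):
        self.rows = rows


@contextmanager
def managed_io(source, target):
    # The read happens on entry, so the block never reads explicitly.
    frame = Frame(list(TABLES[source]))
    yield frame
    # Reached only if the block raised nothing: the write happens on exit.
    TABLES[target] = list(frame.rows)


with managed_io("raw_tbl_taxi", "curated_with_dqs") as frame:
    # Business logic only: no explicit read or write here.
    frame.rows = list(dict.fromkeys(frame.rows))
```

Because there is no `try`/`finally` around the `yield`, an exception inside the block skips the write entirely, which mirrors the "cannot proceed further" branch of `__exit__` in the proposal.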
Additional Context
No response