-
Notifications
You must be signed in to change notification settings - Fork 78
Open
Labels
bb converterIssues related to BB converterIssues related to BB converter
Description
Is there an existing issue for this?
- I have searched the existing issues
Category of Bug / Issue
Converter bug
Current Behavior
Description
Lookup stages (DataStage joins with reference data) generate .select() statements with missing source column references, causing immediate AnalysisException on execution.
Example: Multi-Table Lookup
DataStage: Lookup joining main stream with 2 reference tables
Current Transpilation (INCORRECT):
result_df = main_df \
.join(ref_table1, [...], 'LEFT_OUTER') \
.join(ref_table2, [...], 'LEFT_OUTER') \
.select(
main_df.COL1.alias('COL1'),
main_df.COL2.alias('COL2'),
.alias('REF_COL1'), # ❌ Missing source DataFrame
.alias('REF_COL2') # ❌ Missing source DataFrame
)Expected Behavior
Expected Transpilation :
TGT_TABLE = dbutils.widgets.get("TGT_TABLE")
spark.sql(f"""TRUNCATE TABLE {TGT_TABLE}""") 'left') \
.join(ref_table2, [...], 'left') \
.select(
main_df.COL1.alias('COL1'),
main_df.COL2.alias('COL2'),
main_df.JOIN_KEY.alias('JOIN_KEY'),
ref_table1.REF_COL1.alias('REF_COL1'),
ref_table2.REF_COL2.alias('REF_COL2')
)Steps To Reproduce
1. Create a job with lookup stage:
Save as test_job_lookup.xml:
<?xml version="1.0" encoding="UTF-8"?>
<DSExport>
<Job Identifier="TestLookup">
<Record Identifier="V1" Type="PxLookup">
<Property Name="Name">LKP_1</Property>
<Collection Name="InputPins">
<SubRecord>
<Property Name="Name">MainInput</Property>
<Collection Name="Columns">
<SubRecord>
<Property Name="Name">ID</Property>
</SubRecord>
<SubRecord>
<Property Name="Name">JOIN_KEY</Property>
</SubRecord>
</Collection>
</SubRecord>
<SubRecord>
<Property Name="Name">RefInput</Property>
<Property Name="LookupKey">REF_KEY</Property>
<Collection Name="Columns">
<SubRecord>
<Property Name="Name">REF_KEY</Property>
</SubRecord>
<SubRecord>
<Property Name="Name">REF_VALUE</Property>
</SubRecord>
</Collection>
</SubRecord>
</Collection>
<Collection Name="OutputPins">
<SubRecord>
<Property Name="Name">Output</Property>
<Collection Name="Columns">
<SubRecord>
<Property Name="Name">ID</Property>
<Property Name="Source">MainInput.ID</Property>
</SubRecord>
<SubRecord>
<Property Name="Name">JOIN_KEY</Property>
<Property Name="Source">MainInput.JOIN_KEY</Property>
</SubRecord>
<SubRecord>
<Property Name="Name">REF_VALUE</Property>
<Property Name="Source">RefInput.REF_VALUE</Property>
</SubRecord>
</Collection>
</SubRecord>
</Collection>
</Record>
</Job>
</DSExport>2. Run transpilation:
databricks labs lakebridge transpile --input-source test_job_lookup.xml --output-folder transpiled --debug 3. Observe output
Relevant log output or Exception details
Logs Confirmation
- I ran the command line with
--debug - I have attached the
lsp-server.logunder USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
Operating System
macOS
Version
latest via Databricks CLI
Metadata
Metadata
Assignees
Labels
bb converterIssues related to BB converterIssues related to BB converter