[BUG]: DataStage Functions (IF, TRANSLATE, Char, Len, etc.) Not Converted - Generate Invalid PySpark Code #2119

@Fatiine

Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Converter bug

Current Behavior

Description

When transpiling DataStage jobs to Databricks/PySpark, DataStage-specific functions are left unconverted in the generated Python code. The functions are embedded in expr(f"""...""") blocks and cause runtime errors because they are not valid Spark SQL or PySpark syntax.
Affected Functions: IF(), TRANSLATE(), Char(), Len(), SUBSTRING(), UStringToString(), Convert(), IIF(), TimestampToString()
Impact: Generated notebooks fail as soon as they are executed, with syntax errors raised for these expressions.
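
To triage a transpiled file quickly, a small scanner along the following lines could flag the leftover calls. This is only a sketch: the function list is copied from the "Affected Functions" line above, the helper name find_unconverted_calls is hypothetical, and it assumes the expressions sit inside triple-quoted expr(...) blocks as in this report. A match is a hint that the expression needs review, not proof that Spark rejects it.

```python
import re

# Function names copied from the "Affected Functions" list in this report.
DATASTAGE_FUNCTIONS = [
    "IF", "TRANSLATE", "Char", "Len", "SUBSTRING",
    "UStringToString", "Convert", "IIF", "TimestampToString",
]

# A listed name followed by optional whitespace and an opening parenthesis.
# Matching is case-sensitive, so an already converted lowercase trim( is ignored.
_CALL = re.compile(r"\b(" + "|".join(DATASTAGE_FUNCTIONS) + r")\s*\(")

# Body of expr(f"""...""") or expr("""...""") blocks (assumed layout).
_EXPR_BLOCK = re.compile(r'expr\(\s*f?"""(.*?)"""\s*\)', re.DOTALL)


def find_unconverted_calls(generated_source: str) -> list[str]:
    """Return the sorted, de-duplicated function names found inside expr blocks."""
    found = set()
    for block in _EXPR_BLOCK.finditer(generated_source):
        found.update(match.group(1) for match in _CALL.finditer(block.group(1)))
    return sorted(found)
```

Running it over each generated notebook before execution gives a list of expressions to review by hand.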

Environment

Lakebridge Version: v0.10.12
Source: IBM InfoSphere DataStage 11.7
Target: Databricks/PySpark

Actual Behavior

Generated PySpark code contains DataStage-specific functions that are not valid in Spark SQL:

Generated code (line 50 in transpiled file)

DataTransformJob = SourceDataFrame.select(
    expr(f"""IF ( trim(TRANSLATE(OFFICE_CODE,Char ( 00 ),Char ( 32 ))) = '' , 
                 'DEFAULT_KEY' , 
                 IF ( trim(TRANSLATE(SUBSTRING ( OFFICE_CODE , 6 , 4 ),Char ( 00 ),Char ( 32 ))) = '' , 
                      'UNKNOWN' , 
                      trim(TRANSLATE(OFFICE_CODE,Char ( 00 ),Char ( 32 ))) 
                 ) 
            ) as OUTPUT_CODE""")
)

Expected Behavior

from pyspark.sql.functions import col, when, trim, regexp_replace, substring, length, lit

DataTransformJob = SourceDataFrame.select(
    when(
        trim(regexp_replace(col('OFFICE_CODE'), '\x00', ' ')) == '',
        lit('DEFAULT_KEY')
    ).when(
        trim(regexp_replace(substring(col('OFFICE_CODE'), 6, 4), '\x00', ' ')) == '',
        lit('UNKNOWN')
    ).when(
        length(trim(regexp_replace(col('OFFICE_CODE'), '\x00', ' '))) < 9,
        lit('UNKNOWN')
    ).otherwise(
        trim(regexp_replace(col('OFFICE_CODE'), '\x00', ' '))
    ).alias('OUTPUT_CODE')
)
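
To check a converted expression against the original derivation, a plain-Python model of the intended result can serve as a reference for expected values. The sketch below follows the two-level IF in the generated expression shown under "Actual Behavior"; the additional length(...) < 9 branch in the expected snippet has no counterpart in that expression and is not modelled. It assumes non-null input, that TRANSLATE with Char(0) and Char(32) replaces NUL characters with spaces, that SUBSTRING uses a 1-based start position, and that trim strips only leading and trailing spaces. The function name expected_output_code is hypothetical.

```python
def expected_output_code(office_code: str) -> str:
    """Plain-Python model of the nested IF derivation for OUTPUT_CODE."""
    # TRANSLATE(OFFICE_CODE, Char(0), Char(32)): NUL -> space (assumed semantics).
    whole = office_code.replace("\x00", " ").strip(" ")
    if whole == "":
        return "DEFAULT_KEY"

    # SUBSTRING(OFFICE_CODE, 6, 4): assumed 1-based, so characters 6 through 9.
    part = office_code[5:9].replace("\x00", " ").strip(" ")
    if part == "":
        return "UNKNOWN"

    return whole


if __name__ == "__main__":
    # Illustrative inputs only; these are not taken from a real data set.
    for value in ["\x00\x00  ", "ABCDE", "ABCDEFGHI"]:
        print(repr(value), "->", expected_output_code(value))
```

Comparing these values with the OUTPUT_CODE column produced by the transpiled job on the same inputs would show whether a fix preserves the original logic.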

Steps To Reproduce

1: Create Test DataStage Job with Transformation
Create a DataStage job (TestTransformJob.dsx) with a Transformer stage containing these expressions:

<Job Identifier="TestTransformJob">
  <Stage Type="CTransformerStage" Name="Transformer_1">
    <!-- Test Case 1: Simple IF function -->
    <Column Name="STATUS_CODE">
      <Derivation>IF(TRIM(STATUS_INPUT) = '', 'UNKNOWN', STATUS_INPUT)</Derivation>
    </Column>
    
    <!-- Test Case 2: TRANSLATE with Char function -->
    <Column Name="CLEAN_TEXT">
      <Derivation>TRANSLATE(TEXT_FIELD, Char(0), Char(32))</Derivation>
    </Column>
    
    <!-- Test Case 3: Nested functions -->
    <Column Name="PROCESSED_VALUE">
      <Derivation>
        IF(Len(TRIM(TRANSLATE(RAW_VALUE, Char(0), Char(32)))) = 0, 
           'EMPTY', 
           SUBSTRING(RAW_VALUE, 1, 10))
      </Derivation>
    </Column>
    
    <!-- Test Case 4: Complex nested IF -->
    <Column Name="CATEGORY">
      <Derivation>
        IF(TYPE_A = 'Y', 'CATEGORY_A', 
           IF(TYPE_B = 'Y', 'CATEGORY_B', 
              IF(TYPE_C = 'Y', 'CATEGORY_C', 'OTHER')))
      </Derivation>
    </Column>
    
    <!-- Test Case 5: UStringToString -->
    <Column Name="STRING_VALUE">
      <Derivation>UStringToString(UNICODE_FIELD)</Derivation>
    </Column>
  </Stage>
</Job>
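
To turn the test job into a per-column checklist, the derivations can be pulled out of the XML and the function names in each one listed. This is a rough sketch that assumes the simplified Job/Stage/Column/Derivation layout shown above (a real .dsx export may be structured differently); the helper name functions_by_column is hypothetical, and any identifier followed by "(" is treated as a function call.

```python
import re
import xml.etree.ElementTree as ET

# Any identifier immediately followed by "(" (optionally with whitespace between).
_FUNC_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def functions_by_column(job_xml: str) -> dict[str, list[str]]:
    """Map each Column name to the sorted function names used in its Derivation."""
    root = ET.fromstring(job_xml)
    result = {}
    for column in root.iter("Column"):
        derivation = column.findtext("Derivation", default="")
        result[column.get("Name")] = sorted(set(_FUNC_CALL.findall(derivation)))
    return result
```

Each function that appears in this mapping and is still present in the transpiled output for the same column points to a missing conversion rule.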

2: Transpile the file

Relevant log output or Exception details

Logs Confirmation

  • I ran the command line with --debug
  • I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log

Sample Query

Operating System

macOS

Version

latest via Databricks CLI

Metadata

Labels

bb converter: Issues related to BB converter