
[BUG]: DataStage: Transpiled Databricks Workflow JSON Uses Parent Notebook Path Instead of Actual Job Names #2118

@Fatiine


Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Converter bug

Current Behavior

Description

When transpiling DataStage sequence jobs to Databricks workflows, the generated JSON files incorrectly use the parent sequencer notebook path for every child job task instead of mapping each task to the actual job notebook specified by its LJobName parameter.

Environment

Lakebridge Version: v0.10.12
Source: IBM InfoSphere DataStage 11.7+
Target: Databricks

Actual Behavior:

In transpiled JSON workflow files, all tasks use the parent sequencer notebook path:

{
  "task_key": "ChildJobA",
  "notebook_task": {
    "notebook_path": "/Workspace/Users/ParentSequencerJob"
  }
}

Impact: All tasks would execute the same notebook code instead of their specific job logic.

Expected Behavior

Each task should reference its own notebook based on the LJobName parameter from the XML:

{
  "task_key": "ChildJobA",
  "notebook_task": {
    "notebook_path": "/Workspace/Users/ChildJobA"
  }
}
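
For illustration, here is a minimal sketch of the task-to-notebook mapping the transpiler should emit. This is a hypothetical helper, not Lakebridge code; the workspace folder and the build_task name are assumptions made for the example, and the child job names stand in for the values of LJobName:

import json

# Hypothetical helper: emit one workflow task per child job, using the
# LJobName value as both the task key and the notebook name, instead of
# reusing the parent sequencer notebook for every task.
def build_task(child_job_name, workspace_folder="/Workspace/Users"):
    return {
        "task_key": child_job_name,
        "notebook_task": {
            "notebook_path": f"{workspace_folder}/{child_job_name}"
        },
    }

child_jobs = ["ChildJobA", "ChildJobB"]  # LJobName values collected from the sequencer
print(json.dumps([build_task(name) for name in child_jobs], indent=2))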

Steps To Reproduce

1. Prepare a test DataStage XML export
Create or use a DataStage sequence job with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<DSExport>
  <Job Identifier="SequenceJobMain" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">SequenceJobMain</Property>
      <Property Name="Description">Sequencer with multiple child jobs</Property>
      
      <!-- JobControlCode contains the execution logic -->
      <Property Name="JobControlCode" PreFormatted="1">
        *** Activity "ChildJob1": Initialize job
        h$V1 = DSAttachJob("JobRunner", DSJ.ERRNONE)
        jb$V1 = "JobRunner":'.':"ChildJob1"
        p$V1$1 = "ChildJob1"
        err$code = DSSetParam(h$V1, "LJobName", p$V1$1)
        err$code = DSRunJob(h$V1, DSJ.RUNNORMAL)
        
        *** Activity "ChildJob2": Initialize job
        h$V2 = DSAttachJob("JobRunner", DSJ.ERRNONE)
        jb$V2 = "JobRunner":'.':"ChildJob2"
        p$V2$1 = "ChildJob2"
        err$code = DSSetParam(h$V2, "LJobName", p$V2$1)
        err$code = DSRunJob(h$V2, DSJ.RUNNORMAL)
      </Property>
    </Record>
  </Job>
  
  <!-- Include the child job definitions -->
  <Job Identifier="ChildJob1" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">ChildJob1</Property>
      <!-- ... job definition ... -->
    </Record>
  </Job>
  
  <Job Identifier="ChildJob2" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">ChildJob2</Property>
      <!-- ... job definition ... -->
    </Record>
  </Job>
</DSExport>

2. Run the Lakebridge transpilation on the exported XML
3. Examine the generated workflow JSON and note that every task's notebook_path points to the parent sequencer notebook (a verification sketch follows below)
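
To check step 3 against the source export, the following sketch pulls the expected child job names out of JobControlCode by resolving the variable passed to each DSSetParam(..., "LJobName", ...) call. The file name export.xml and the regular expressions are assumptions for this example; this is not how Lakebridge itself parses the export:

import re
import xml.etree.ElementTree as ET

# Locate the sequencer's JobControlCode property in the DSExport XML.
tree = ET.parse("export.xml")  # assumed file name for the export shown in step 1
control_code = ""
for prop in tree.iter("Property"):
    if prop.get("Name") == "JobControlCode":
        control_code = prop.text or ""

# Each child job assigns its name to a variable and passes it to LJobName:
#   p$V1$1 = "ChildJob1"
#   err$code = DSSetParam(h$V1, "LJobName", p$V1$1)
assignments = dict(re.findall(r'(p\$\w+\$\d+)\s*=\s*"([^"]+)"', control_code))
params = re.findall(r'DSSetParam\([^,]+,\s*"LJobName",\s*(\S+)\)', control_code)
expected_jobs = [assignments.get(p, p) for p in params]

# These names should appear as task_key and notebook_path values in the
# generated workflow JSON; with the current bug every task points at the
# parent sequencer notebook instead.
print(expected_jobs)  # expected: ['ChildJob1', 'ChildJob2']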

Relevant log output or Exception details

Logs Confirmation

  • I ran the command line with --debug
  • I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log

Operating System

macOS

Version

latest via Databricks CLI

Metadata


Labels

bb converter: Issues related to BB converter
