Description
Is there an existing issue for this?
- I have searched the existing issues
Category of Bug / Issue
Converter bug
Current Behavior
When transpiling DataStage sequence jobs to Databricks workflows, the generated workflow JSON uses the parent sequencer's notebook path for every child job task, instead of mapping each task to the notebook named by its LJobName parameter.
Environment
Lakebridge Version: v0.10.12
Source: IBM InfoSphere DataStage 11.7+
Target: Databricks
Actual Behavior:
In the transpiled workflow JSON, every task uses the parent sequencer's notebook path:
{
  "task_key": "ChildJobA",
  "notebook_task": {
    "notebook_path": "/Workspace/Users/ParentSequencerJob"
  }
}
Impact: All tasks would execute the same notebook code instead of their specific job logic.
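For illustration, a quick way to surface the problem in a generated workflow file is to flag every task whose notebook_path does not end with its own task_key. This is a hypothetical check, not part of Lakebridge; the task names and the embedded sample JSON below just mirror the buggy output described above:

```python
import json

# Hypothetical sanity check (not part of Lakebridge): the sample below
# reproduces the reported output, where every child task points at the
# parent sequencer notebook instead of its own.
workflow = json.loads("""
{
  "tasks": [
    {"task_key": "ChildJobA",
     "notebook_task": {"notebook_path": "/Workspace/Users/ParentSequencerJob"}},
    {"task_key": "ChildJobB",
     "notebook_task": {"notebook_path": "/Workspace/Users/ParentSequencerJob"}}
  ]
}
""")

def mismatched_tasks(wf: dict) -> list:
    """Return the task_keys whose notebook_path does not end with the task_key."""
    return [
        task["task_key"]
        for task in wf.get("tasks", [])
        if not task["notebook_task"]["notebook_path"].endswith("/" + task["task_key"])
    ]

print(mismatched_tasks(workflow))
# ['ChildJobA', 'ChildJobB'] -> both tasks point at the parent sequencer
```

A correctly transpiled workflow would make this list empty.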
Expected Behavior
Each task should reference its own notebook based on the LJobName parameter from the XML:
{
  "task_key": "ChildJobA",
  "notebook_task": {
    "notebook_path": "/Workspace/Users/ChildJobA"
  }
}
Steps To Reproduce
1- Prepare Test DataStage XML
Create or use a DataStage sequence job with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<DSExport>
  <Job Identifier="SequenceJobMain" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">SequenceJobMain</Property>
      <Property Name="Description">Sequencer with multiple child jobs</Property>
      <!-- JobControlCode contains the execution logic -->
      <Property Name="JobControlCode" PreFormatted="1">
*** Activity "ChildJob1": Initialize job
h$V1 = DSAttachJob("JobRunner", DSJ.ERRNONE)
jb$V1 = "JobRunner":'.':"ChildJob1"
p$V1$1 = "ChildJob1"
err$code = DSSetParam(h$V1, "LJobName", p$V1$1)
err$code = DSRunJob(h$V1, DSJ.RUNNORMAL)
*** Activity "ChildJob2": Initialize job
h$V2 = DSAttachJob("JobRunner", DSJ.ERRNONE)
jb$V2 = "JobRunner":'.':"ChildJob2"
p$V2$1 = "ChildJob2"
err$code = DSSetParam(h$V2, "LJobName", p$V2$1)
err$code = DSRunJob(h$V2, DSJ.RUNNORMAL)
      </Property>
    </Record>
  </Job>
  <!-- Include the child job definitions -->
  <Job Identifier="ChildJob1" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">ChildJob1</Property>
      <!-- ... job definition ... -->
    </Record>
  </Job>
  <Job Identifier="ChildJob2" DateModified="2024-01-01">
    <Record Identifier="ROOT" Type="JobDefn">
      <Property Name="Name">ChildJob2</Property>
      <!-- ... job definition ... -->
    </Record>
  </Job>
</DSExport>
2- Run Lakebridge Transpilation on the exported XML
3- Examine Generated Workflow JSON: every task's notebook_path points at the parent sequencer notebook
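The JobControlCode in step 1 already carries everything needed for the mapping: each DSSetParam(handle, "LJobName", var) call binds a variable whose literal value is the child job name. A minimal extraction sketch follows; this is illustrative Python, not the converter's actual implementation, and the regexes assume the h$Vn / p$Vn$m naming seen in this particular export:

```python
import re

# Trimmed copy of the JobControlCode from the sample export above.
JOB_CONTROL_CODE = """
*** Activity "ChildJob1": Initialize job
h$V1 = DSAttachJob("JobRunner", DSJ.ERRNONE)
p$V1$1 = "ChildJob1"
err$code = DSSetParam(h$V1, "LJobName", p$V1$1)
*** Activity "ChildJob2": Initialize job
h$V2 = DSAttachJob("JobRunner", DSJ.ERRNONE)
p$V2$1 = "ChildJob2"
err$code = DSSetParam(h$V2, "LJobName", p$V2$1)
"""

def extract_ljobnames(code: str) -> dict:
    """Map each activity handle (h$V1, h$V2, ...) to its LJobName value."""
    # Literal parameter assignments, e.g.  p$V1$1 = "ChildJob1"
    params = dict(re.findall(r'(p\$V\d+\$\d+)\s*=\s*"([^"]+)"', code))
    # DSSetParam calls that bind one of those variables to "LJobName"
    calls = re.findall(r'DSSetParam\((h\$V\d+),\s*"LJobName",\s*(p\$V\d+\$\d+)\)', code)
    return {handle: params.get(var, var) for handle, var in calls}

print(extract_ljobnames(JOB_CONTROL_CODE))
# {'h$V1': 'ChildJob1', 'h$V2': 'ChildJob2'}
```

Each handle's extracted job name, rather than the parent sequencer's name, is what the corresponding generated task's notebook_path should resolve to.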
Relevant log output or Exception details
Logs Confirmation
- I ran the command line with --debug
- I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
Operating System
macOS
Version
latest via Databricks CLI