
[BUG]: Running a job with ForeachBatch on Databricks causes the job to never stop #1114

Open
@muhammad-s-zainal

Description


Describe the bug
Using ForeachBatch in a Spark job deployed to Databricks causes the job to never terminate.

To Reproduce

Steps to reproduce the behavior:

  1. Follow this tutorial to set up .NET for Apache Spark (dotnet spark) on Databricks:
    https://learn.microsoft.com/en-us/dotnet/spark/tutorials/databricks-deployment
  2. Include ForeachBatch in your structured streaming query, for example:
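```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;

// df is a streaming DataFrame, destPath is the target Delta table path, and
// UpsertToDelta is an Action<DataFrame, long> that merges each micro-batch
// into the Delta table (these are defined elsewhere and not shown here).
var query = df
    .WriteStream()
    .Format("delta")
    .OutputMode("append")
    .ForeachBatch(UpsertToDelta)
    .Option("checkpointLocation", $"{destPath}/checkpoint")
    .Option("path", destPath)
    // Trigger.Once() processes the available data in a single micro-batch, then stops the query.
    .Trigger(Trigger.Once())
    .Start();
```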

For more information on implementing ForeachBatch and upserting into a Delta table, see:
https://docs.databricks.com/_static/notebooks/merge-in-streaming.html

  3. Run the job. The streaming query does run to completion; however, the Databricks UI continues to show the job as running until it is manually terminated.

Expected behavior
The Databricks job terminates after the streaming query completes.

Additional context

  • I have raised support tickets with Databricks and Azure; both indicated that the issue lies in the dotnet spark APIs.
  • I am using Microsoft.Spark.Extensions.Delta 2.0.0 for Delta table support.

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests

    Issue actions