Describe the bug
Using ForeachBatch in a Spark job deployed to Databricks causes the job to never terminate.
To Reproduce
Steps to reproduce the behavior:
- Follow this tutorial to set up dotnet spark on Databricks:
https://learn.microsoft.com/en-us/dotnet/spark/tutorials/databricks-deployment
- Include ForeachBatch in your structured streaming query, for example:
    var query = df
        .WriteStream()
        .Format("delta")
        .OutputMode("append")
        .ForeachBatch(UpsertToDelta)
        .Option("checkpointLocation", $"{destPath}/checkpoint")
        .Option("path", destPath)
        .Trigger(Trigger.Once())
        .Start();
For more info on implementing ForeachBatch and upserting into a Delta table, refer to this page (a sketch of such a handler appears after the steps below):
https://docs.databricks.com/_static/notebooks/merge-in-streaming.html
- Run the job. I can confirm that the streaming query itself proceeds and finishes; however, the Databricks UI continues to show the job as running until it is manually terminated.
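For reference, UpsertToDelta in the snippet above is a user-defined handler that is not shown; a minimal sketch modeled on the Databricks merge-in-streaming notebook linked above could look like the following (the target path and key column are placeholders, not taken from the actual job):

    using Microsoft.Spark.Sql;
    using Microsoft.Spark.Extensions.Delta.Tables;

    // Sketch of a ForeachBatch handler that merges each micro-batch into a
    // Delta table. "/mnt/delta/target" and the "id" key are hypothetical.
    private static void UpsertToDelta(DataFrame microBatchDf, long batchId)
    {
        DeltaTable.ForPath("/mnt/delta/target")
            .As("t")
            .Merge(microBatchDf.Alias("s"), "s.id = t.id")
            .WhenMatched()
            .UpdateAll()
            .WhenNotMatched()
            .InsertAll()
            .Execute();
    }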
Expected behavior
The Databricks job terminates after the streaming query is complete.
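With Trigger.Once, the driver should be able to block until the one-off batch completes and then shut down cleanly, e.g. (a sketch assuming the query variable from the repro above and the job's SparkSession in spark):

    // Expected driver-side completion flow (sketch).
    query.AwaitTermination(); // returns once the Trigger.Once batch finishes
    spark.Stop();             // stop the session so the Databricks job can exit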
Additional context
- FYI, I have raised support tickets with both Databricks and Azure, and both have indicated that the issue lies with the dotnet spark APIs.
- Using Microsoft.Spark.Extensions.Delta 2.0.0 for Delta table support