
DELTA_STATE_RECOVER_ERROR in Databricks reading delta tables written by python deltalake #3743

@mudravrik

Description

Hey folks!

First of all, this issue may have nothing to do with deltalake, but we have struggled to get any additional information about it, so we thought it worth asking in the deltalake GitHub as well. Any thoughts are appreciated!

Environment

  • dlt==1.11.0 with deltalake==0.25.1 OR
  • dlt==1.14.1 with deltalake==1.1.0
  • Cloud Storage: AWS S3
  • Databricks Runtime: 16.4 (Spark 3.5.2, Scala 2.13)
  • Write Pattern: All commits are overwrite mode. Occasional schema changes occur as part of the overwrite.
  • Delta protocol written by dlt: {"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
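For reference, the protocol line above is what the deltalake Python API reports for these tables. A minimal inspection sketch, assuming a placeholder S3 URI and region (not our actual bucket):

```python
from deltalake import DeltaTable

# Placeholder URI/credentials; any of the affected tables written by dlt/deltalake.
dt = DeltaTable(
    "s3://example-bucket/example_table",
    storage_options={"AWS_REGION": "us-east-1"},
)

print(dt.protocol())  # expected: min_reader_version=1, min_writer_version=2
print(dt.version())   # latest committed table version
```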

Bug

Summary

We are experiencing intermittent read failures on Delta tables located in S3. These tables are written exclusively by the dlt library with deltalake (delta-rs). Databricks occasionally fails to read a table with a DELTA_STATE_RECOVER_ERROR, even though the table's transaction log (_delta_log) appears valid and is readable by other query engines. The issue is transient and seems to resolve itself over time without any specific user intervention.

Behavior Observed

When querying an affected Delta table, Databricks intermittently fails with a DELTA_STATE_RECOVER_ERROR; the table version reported in the error message varies depending on the actual table version.

The error occurs on various read operations, including:

  • SELECT * FROM table_name
  • DESCRIBE DETAIL table_name

The failures have been observed both in interactive notebooks and in API-driven jobs, e.g. from dbt-databricks.

Steps to Reproduce

The issue is spontaneous and we have not found a deterministic way to reproduce it. However, we have managed to trigger it intentionally several times under the following conditions (a rough sketch of the write loop follows the list):

  • A process uses the Python dlt library with the deltalake destination to write to a Delta table in S3 in overwrite mode.
  • This process runs multiple times, creating new versions of the table, with read queries run from Databricks after every write.
  • After a number of commits (empirically observed to be between 10 and 12), Databricks may begin failing to read the table.
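A minimal sketch approximating this write loop outside of dlt, using deltalake directly; the bucket, schema, iteration count, and region are illustrative placeholders, not our exact pipeline:

```python
import pyarrow as pa
from deltalake import write_deltalake

table_uri = "s3://example-bucket/example_table"   # placeholder location
storage_options = {"AWS_REGION": "us-east-1"}     # placeholder credentials/region

for i in range(12):  # reads began failing after roughly 10-12 commits
    batch = pa.table({"id": list(range(100)), "run": [i] * 100})

    # Occasionally change the schema as part of the overwrite, as the dlt pipeline does.
    if i % 5 == 0:
        batch = batch.append_column("extra", pa.array([None] * 100, pa.string()))

    write_deltalake(
        table_uri,
        batch,
        mode="overwrite",
        schema_mode="overwrite",   # let the overwrite replace the table schema
        storage_options=storage_options,
    )
    # A read query (e.g. SELECT * FROM table_name) is issued from Databricks after each write.
```

In the real setup the writes are produced by dlt's deltalake destination; the sketch only mirrors the overwrite-with-occasional-schema-change pattern described above.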

Additional details

  • _delta_log appears valid: Despite the error message suggesting deleted files, the _delta_log directory appears complete. We can successfully read the same "corrupted" table and its history using other tools such as DuckDB (see the cross-check sketch after this list), which parse the transaction log without issue.
  • Table copying "fixes" the issue: If we perform an S3-to-S3 copy of the entire table directory (data and _delta_log) to a new location, the new copy of the table is immediately readable in Databricks without any errors. This strongly suggests the underlying files are not corrupt.
  • Cluster restart "fixes" the issue: If we restart the cluster attached to the notebook producing the error, the table becomes readable again from that cluster and notebook.
  • Spontaneous resolution: The error is transient. If we take no action, the table becomes readable again on the same cluster after a variable period, ranging from 5 minutes to a few hours.
  • Minor protocol deviations: We noticed that some transaction log files contain multiple metaData actions within a single commit JSON. While this is atypical, it is valid according to the Delta protocol specification and does not cause issues for other readers.
  • Previous versions of the table are still readable: While the error occurs, the same cluster and notebook can usually still read an earlier historical version of the same table.
  • Not all tables are affected: We have multiple (~20) tables written from the same setup, but only 2 of them have been affected so far.
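For completeness, this is roughly how we cross-check an affected table with non-Spark readers while Databricks is failing. A sketch assuming a placeholder URI and pre-configured S3 credentials (e.g. a DuckDB secret), which are omitted here:

```python
import duckdb
from deltalake import DeltaTable

table_uri = "s3://example-bucket/example_table"  # placeholder

# deltalake (delta-rs) reads the transaction log and history of the "corrupted" table fine.
dt = DeltaTable(table_uri, storage_options={"AWS_REGION": "us-east-1"})
print(dt.version())
for commit in dt.history():
    print(commit)

# DuckDB's delta extension also scans the same table without issue.
con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")
# S3 credentials for DuckDB (e.g. CREATE SECRET ...) are assumed to be configured already.
print(con.sql(f"SELECT count(*) FROM delta_scan('{table_uri}')").fetchall())
```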
