[Bug]: Panic: Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type #3884

@auyer

Description

What happened?

An incremental (upsert) ETL job failed with this panic.

This ETL reads data from one tenant at a time into a DeltaTable.
The failure was persistent for this one tenant. It kept failing after successful writes to other tenants and after calls to dt.repair(), dt.cleanup_metadata(), dt.optimize.compact(), dt.vacuum(), ...

The table looks like this:

table/
   tennant=x/
     ...
   tennant=y/
     ...

The upsert operation is a call to merge:

    (
        dt.merge(
            source=df.to_arrow(),  # a polars DataFrame converted to Arrow
            predicate=predicate,  # predicate containing the partitions and keys
            writer_properties=deltalake.WriterProperties(compression=compression.upper()),
            source_alias="source",
            target_alias="target",
        )
        .when_matched_update_all()
        .when_not_matched_insert_all()
        .execute()
    )
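
The code that builds the predicate is not shown above. Judging from the predicate printed in the relevant logs, it combines key equality between the two aliases with literal partition filters. A minimal sketch of that shape, assuming a hypothetical helper named build_merge_predicate (it is not part of deltalake and not the code used in this ETL):

```python
def build_merge_predicate(key_columns, partition_filters,
                          source_alias="source", target_alias="target"):
    """Join key equalities and literal partition filters with AND."""
    # One equality clause per key column, e.g. "source.id = target.id".
    clauses = [
        f"{source_alias}.{col} = {target_alias}.{col}" for col in key_columns
    ]
    # Pin each partition column on the target side to a quoted literal.
    for col, value in partition_filters.items():
        escaped = str(value).replace("'", "''")  # double any single quotes
        clauses.append(f"{target_alias}.{col} = '{escaped}'")
    return " AND ".join(clauses)


# Illustrative values only; "x" is the placeholder partition from the layout above.
predicate = build_merge_predicate(
    key_columns=["id", "tennant"],
    partition_filters={"tennant": "x"},
)
print(predicate)
# source.id = target.id AND source.tennant = target.tennant AND target.tennant = 'x'
```

The aliases default to the same "source" and "target" names passed to dt.merge.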

I tried to create a reproduction environment that would not affect the business, because I wanted to dig into the reason for this panic.
I created a copy of the entire DeltaTable (aws s3 sync origin dest). However, the failure did not occur in the copy. I tried twice (a couple of hours of file copying), and there were no failures in the new copy, while the original kept failing.

This leads me to suspect the issue might be related to the file storage on S3 (like the timestamp for the file, or incomplete uploads...).
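
One way to test this suspicion would be to compare the object listings of the original table and the copy. A minimal sketch, assuming the listings have already been collected into plain {key: size in bytes} mappings (for example from the output of aws s3 ls --recursive); diff_listings is a hypothetical helper and the keys and sizes below are made up for illustration:

```python
def diff_listings(original, copy):
    """Compare two {relative_key: size_in_bytes} mappings."""
    shared = set(original) & set(copy)
    return {
        # Objects present in the original but absent from the copy.
        "missing_in_copy": sorted(set(original) - set(copy)),
        # Objects present only in the copy.
        "extra_in_copy": sorted(set(copy) - set(original)),
        # Objects present in both whose sizes differ.
        "size_mismatch": sorted(k for k in shared if original[k] != copy[k]),
    }


# Made-up listings, not taken from the real table.
original = {
    "tennant=x/part-0.parquet": 1024,
    "tennant=x/part-1.parquet": 2048,
    "tennant=y/part-0.parquet": 512,
}
copy = {
    "tennant=x/part-0.parquet": 1024,
    "tennant=x/part-1.parquet": 2000,
}

report = diff_listings(original, copy)
print(report)
```

An empty report would not rule out a storage-related cause, since it only compares names and sizes, not timestamps or contents.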

What I had to do was delete the partition, run dt.repair(), and restart the partition extraction from scratch.
I should also mention that this issue first happened a few weeks ago, but I did not have time to investigate it back then.

I hope the logs are helpful.

Expected behavior

Writes without panics ;P

Operating System

Linux

Binding

Python

Bindings Version

1.1.4

Steps to reproduce

Sadly, unknown.

Relevant logs

[09:53:54] {.../delta_writer.py:90} INFO - attempting write with predicate: source.id = target.id AND source.tennant = target.tennant AND source.__part_created_at_m = target.__part_created_at_m AND target.tennant = 'REDACTED' AND source.vertical = 'REDACTED' AND target.__part_created_at_m = '1713312000000' AND source.__part_created_at_m = '1713312000000'

[2025-10-17T12:53:54Z WARN  datafusion_datasource_parquet::source] The SchemaAdapter API will be removed from ParquetSource in a future release. Use PhysicalExprAdapterFactory API instead. See https://github.com/apache/datafusion/issues/16800 for discussion and https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-49-0-0 for upgrade instructions.

thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-55.1.0/src/buffer/scalar.rs:143:17:
Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type. Before importing buffer through FFI, please make sure the allocation is aligned.
stack backtrace:
   0:     0x7f714fa2b6b6 - <unknown>
   1:     0x7f714f95cde3 - <unknown>
   2:     0x7f714fa2b11f - <unknown>
   3:     0x7f714fa2b3f3 - <unknown>
   4:     0x7f714fa2ade3 - <unknown>
   5:     0x7f714fa55fa5 - <unknown>
   6:     0x7f714fa55f39 - <unknown>
   7:     0x7f714fa5658c - <unknown>
   8:     0x7f714f4e9c7f - <unknown>
   9:     0x7f714f5452e7 - <unknown>
  10:     0x7f714f50fcc5 - <unknown>
  11:     0x7f714f50f551 - <unknown>
  12:     0x7f714f515cae - <unknown>
  13:     0x7f714f55967e - <unknown>
  14:     0x7f714fa0543b - <unknown>
  15:     0x7f714f9cdceb - <unknown>
  16:     0x7f7155cfb02d - <arrow_array::ffi_stream::ArrowArrayStreamReader as core::iter::traits::iterator::Iterator>::next::hfb8851a54ca91d46
  17:     0x7f7150803811 - <deltalake::writer::LazyCastReader as core::iter::traits::iterator::Iterator>::next::hf3635c418dbe99d8
  18:     0x7f7150803627 - <deltalake::writer::ArrowStreamBatchGenerator as datafusion_physical_plan::memory::LazyBatchGenerator>::generate_next_batch::h3d42bfc1338166ff
  19:     0x7f71535f9335 - <datafusion_physical_plan::memory::LazyMemoryStream as futures_core::stream::Stream>::poll_next::h254cb6d35ec9e8dd
  20:     0x7f715378dffe - <datafusion_physical_plan::coop::CooperativeStream<T> as futures_core::stream::Stream>::poll_next::h45cd8d2dde85703b
  21:     0x7f71537dc40d - datafusion_common_runtime::trace_utils::trace_future::{{closure}}::haced11efc2de567a
  22:     0x7f715377e886 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h8d20307e68888013
  23:     0x7f71536e7819 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::h34833455077eb757
  24:     0x7f71537e772d - tokio::runtime::task::harness::Harness<T,S>::poll::hf1c5b4869b7d1f93
  25:     0x7f715523ae97 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h3c2636e7b6a2688a
  26:     0x7f715523a350 - tokio::runtime::scheduler::multi_thread::worker::Context::run::hb5227abe216ecc07
  27:     0x7f715524b2e8 - tokio::runtime::context::scoped::Scoped<T>::set::hd3596ec0c1293944
  28:     0x7f7155253688 - tokio::runtime::context::runtime::enter_runtime::hcdc0213b78e7b45e
  29:     0x7f7155239bcd - tokio::runtime::scheduler::multi_thread::worker::run::hc0d943a43db679c6
  30:     0x7f715524eed6 - tokio::runtime::task::core::Core<T,S>::poll::h9028d52430295479
  31:     0x7f7155232401 - tokio::runtime::task::harness::Harness<T,S>::poll::h15dabfae04b9ae12
  32:     0x7f71552435b7 - tokio::runtime::blocking::pool::Inner::run::hf067ba3ff6b2cc24
  33:     0x7f7155251c47 - std::sys::backtrace::__rust_begin_short_backtrace::h09e5fc373045dfd9
  34:     0x7f715524fc22 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h1b8ac65b86469291
  35:     0x7f7156084b6f - std::sys::pal::unix::thread::Thread::new::thread_start::h1ff51d6e85162efd
  36:     0x7f7167c969cb - <unknown>
  37:     0x7f7167d1aa0c - <unknown>
  38:                0x0 - <unknown>

thread '<unnamed>' panicked at library/core/src/panicking.rs:218:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x7f714fa2b6b6 - <unknown>
   1:     0x7f714f95cde3 - <unknown>
   2:     0x7f714fa2b11f - <unknown>
   3:     0x7f714fa2b3f3 - <unknown>
   4:     0x7f714fa2ade3 - <unknown>
   5:     0x7f714fa55fa5 - <unknown>
   6:     0x7f714fa55f39 - <unknown>
   7:     0x7f714fa5658c - <unknown>
   8:     0x7f714f4ea18c - <unknown>
   9:     0x7f714f4ea1ec - <unknown>
  10:     0x7f714f4ea19c - <unknown>
  11:     0x7f714f9ce0e8 - <unknown>
  12:     0x7f7155cfb02d - <arrow_array::ffi_stream::ArrowArrayStreamReader as core::iter::traits::iterator::Iterator>::next::hfb8851a54ca91d46
  13:     0x7f7150803811 - <deltalake::writer::LazyCastReader as core::iter::traits::iterator::Iterator>::next::hf3635c418dbe99d8
  14:     0x7f7150803627 - <deltalake::writer::ArrowStreamBatchGenerator as datafusion_physical_plan::memory::LazyBatchGenerator>::generate_next_batch::h3d42bfc1338166ff
  15:     0x7f71535f9335 - <datafusion_physical_plan::memory::LazyMemoryStream as futures_core::stream::Stream>::poll_next::h254cb6d35ec9e8dd
  16:     0x7f715378dffe - <datafusion_physical_plan::coop::CooperativeStream<T> as futures_core::stream::Stream>::poll_next::h45cd8d2dde85703b
  17:     0x7f71537dc40d - datafusion_common_runtime::trace_utils::trace_future::{{closure}}::haced11efc2de567a
  18:     0x7f715377e886 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h8d20307e68888013
  19:     0x7f71536e7819 - <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll::h34833455077eb757
  20:     0x7f71537e772d - tokio::runtime::task::harness::Harness<T,S>::poll::hf1c5b4869b7d1f93
  21:     0x7f715523ae97 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h3c2636e7b6a2688a
  22:     0x7f715523a350 - tokio::runtime::scheduler::multi_thread::worker::Context::run::hb5227abe216ecc07
  23:     0x7f715524b2e8 - tokio::runtime::context::scoped::Scoped<T>::set::hd3596ec0c1293944
  24:     0x7f7155253688 - tokio::runtime::context::runtime::enter_runtime::hcdc0213b78e7b45e
  25:     0x7f7155239bcd - tokio::runtime::scheduler::multi_thread::worker::run::hc0d943a43db679c6
  26:     0x7f715524eed6 - tokio::runtime::task::core::Core<T,S>::poll::h9028d52430295479
  27:     0x7f7155232401 - tokio::runtime::task::harness::Harness<T,S>::poll::h15dabfae04b9ae12
  28:     0x7f71552435b7 - tokio::runtime::blocking::pool::Inner::run::hf067ba3ff6b2cc24
  29:     0x7f7155251c47 - std::sys::backtrace::__rust_begin_short_backtrace::h09e5fc373045dfd9
  30:     0x7f715524fc22 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h1b8ac65b86469291
  31:     0x7f7156084b6f - std::sys::pal::unix::thread::Thread::new::thread_start::h1ff51d6e85162efd
  32:     0x7f7167c969cb - <unknown>
  33:     0x7f7167d1aa0c - <unknown>
  34:                0x0 - <unknown>
thread caused non-unwinding panic. aborting.
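
For context on the panic message: it describes an alignment condition on a buffer imported through FFI, and the first backtrace passes through ArrowArrayStreamReader::next, where the source batches cross that boundary. The sketch below only illustrates what "aligned with the scalar type" means, using the Python standard library. is_aligned is a hypothetical helper, not part of arrow-rs, pyarrow, or deltalake; the real check is in arrow-buffer (src/buffer/scalar.rs, per the panic location).

```python
import ctypes


def is_aligned(address: int, scalar_size: int) -> bool:
    """Return True when `address` is a multiple of `scalar_size` bytes."""
    return address % scalar_size == 0


# Allocate some raw bytes and find an 8-byte-aligned address inside them.
raw = ctypes.create_string_buffer(64)
base = ctypes.addressof(raw)
aligned = base + (-base) % 8  # round up to the next multiple of 8
shifted = aligned + 1         # same memory, viewed one byte later

print(is_aligned(aligned, 8))  # True: fine for an 8-byte scalar such as int64
print(is_aligned(shifted, 8))  # False: an 8-byte view starting here is misaligned
print(is_aligned(shifted, 1))  # True: single-byte types are always aligned
```

A buffer that starts at an address like shifted is still valid memory, but it cannot be reinterpreted as 8-byte values without a copy into an aligned allocation.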
