FlatBuffers causing Feather failure for empty data frame and segment faults when read back by R #38

Open
@zhouyan

I am not sure whether this is caused by FlatBuffers.jl or Feather.jl, but given the error messages and the evidence below, it is more likely a FlatBuffers issue.

Below is a simple example:

```julia
using Feather
using DataFrames

Feather.write("test.feather", DataFrame())
println(Feather.read("test.feather"))
```

Running it gives the following error:

```
ERROR: LoadError: EOFError: read end of file
Stacktrace:
 [1] read at ./iobuffer.jl:175 [inlined]
 [2] get(::FlatBuffers.Table{Feather.Metadata.CTable}, ::Int64, ::Type{Int32}) at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/internals.jl:8
 [3] offset at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/internals.jl:18 [inlined]
 [4] read(::FlatBuffers.Table{Feather.Metadata.CTable}, ::Type{Feather.Metadata.CTable}) at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:207
 [5] read at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:199 [inlined]
 [6] read at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:217 [inlined]
 [7] getctable(::Array{UInt8,1}) at /Users/zhou/.julia/packages/Feather/tppUH/src/loadfile.jl:38
 [8] #Source#4(::Bool, ::Type, ::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:18
 [9] Type at ./none:0 [inlined]
 [10] #read#7(::Bool, ::Function, ::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:68
 [11] read(::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:68
 [12] top-level scope at none:0
 [13] include at ./boot.jl:317 [inlined]
 [14] include_relative(::Module, ::String) at ./loading.jl:1044
 [15] include(::Module, ::String) at ./sysimg.jl:29
 [16] exec_options(::Base.JLOptions) at ./client.jl:231
 [17] _start() at ./client.jl:425
in expression starting at /Users/zhou/Desktop/test.jl:5
```

Below are the equivalents in Python and R:

```python
import feather
import pandas

feather.write_dataframe(pandas.DataFrame(), "test.feather")
print(feather.read_dataframe("test.feather"))
```

```r
library(feather)

write_feather(data.frame(), "test.feather")
print(read_feather("test.feather"))
```

Both give the expected result.

Further, if I write the empty data frame using R or Python and read it back with Julia, the expected result (an empty data frame) is returned.

So it looks like the metadata is not written correctly by Julia in the case of an empty data frame. I have tried both FlatBuffers.jl v0.4.0 and v0.5.2, and both give the same error.
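One way to check this would be to pull the raw metadata block out of the files written by each language and compare the bytes. The sketch below is only a diagnostic aid, not part of Feather or FlatBuffers. It assumes the Feather v1 layout: the magic bytes `FEA1` at both ends of the file, with a little-endian 32-bit metadata length stored just before the trailing magic. The function name `extract_feather_v1_metadata` is hypothetical.

```python
import struct

MAGIC = b"FEA1"  # assumed Feather v1 magic, expected at both ends of the file


def extract_feather_v1_metadata(raw: bytes) -> bytes:
    """Return the raw FlatBuffers metadata bytes from a Feather v1 file image.

    Assumed layout: MAGIC, column data, metadata, uint32 metadata length, MAGIC.
    """
    if len(raw) < 2 * len(MAGIC) + 4:
        raise ValueError("file too small to be a Feather v1 file")
    if raw[:4] != MAGIC or raw[-4:] != MAGIC:
        raise ValueError("missing FEA1 magic bytes")
    # The metadata length sits in the 4 bytes just before the trailing magic.
    (meta_len,) = struct.unpack("<I", raw[-8:-4])
    meta_end = len(raw) - 8
    meta_start = meta_end - meta_len
    if meta_start < len(MAGIC):
        raise ValueError("metadata length exceeds file size")
    return raw[meta_start:meta_end]
```

Reading a file with `open(path, "rb").read()` and printing `extract_feather_v1_metadata(...).hex()` for the Julia-written and Python-written copies of the empty data frame should show where the two metadata blocks diverge, for example in a missing or zero-length columns vector.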

On the other hand, reading the empty data frame written by Julia causes a segmentation fault in R, but not in Python or C++.

In addition, the following simple data frame also causes a segmentation fault when read back by R if FlatBuffers.jl is upgraded to v0.5.2, but not with v0.4.0, and again not in Python or C++:

```julia
using Feather
using DataFrames

Feather.write("test.feather", DataFrame(x = Float64.(1:500)))
```

If I run R through a debugger, I can trace the segmentation faults to calls such as feather::metadata::Column::Init(void const *).

However, the R-specific issues here might have more to do with the fact that R's feather package uses an old version of the FlatBuffers C++ library, 1.3.0 (https://github.com/wesm/feather/blob/master/cpp/thirdparty/versions.sh), which may have bugs such as not handling 0 offsets correctly, whereas the pyarrow and C++ Arrow builds that I tested both use a FlatBuffers version somewhere between 1.9.0 and 1.10.0.

In general, I think cross-language support and correctness are of some importance. I am inclined to work something out in this regard and send a PR when I can.
