Description
I am not sure whether this is caused by FlatBuffers.jl or Feather.jl, but given the error message and the evidence below, it seems more likely to be a FlatBuffers issue.
Below is a simple example:
using Feather
using DataFrames
Feather.write("test.feather", DataFrame())
println(Feather.read("test.feather"))
Running it gives the following error:
ERROR: LoadError: EOFError: read end of file
Stacktrace:
[1] read at ./iobuffer.jl:175 [inlined]
[2] get(::FlatBuffers.Table{Feather.Metadata.CTable}, ::Int64, ::Type{Int32}) at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/internals.jl:8
[3] offset at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/internals.jl:18 [inlined]
[4] read(::FlatBuffers.Table{Feather.Metadata.CTable}, ::Type{Feather.Metadata.CTable}) at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:207
[5] read at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:199 [inlined]
[6] read at /Users/zhou/.julia/packages/FlatBuffers/jRuEN/src/FlatBuffers.jl:217 [inlined]
[7] getctable(::Array{UInt8,1}) at /Users/zhou/.julia/packages/Feather/tppUH/src/loadfile.jl:38
[8] #Source#4(::Bool, ::Type, ::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:18
[9] Type at ./none:0 [inlined]
[10] #read#7(::Bool, ::Function, ::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:68
[11] read(::String) at /Users/zhou/.julia/packages/Feather/tppUH/src/source.jl:68
[12] top-level scope at none:0
[13] include at ./boot.jl:317 [inlined]
[14] include_relative(::Module, ::String) at ./loading.jl:1044
[15] include(::Module, ::String) at ./sysimg.jl:29
[16] exec_options(::Base.JLOptions) at ./client.jl:231
[17] _start() at ./client.jl:425
in expression starting at /Users/zhou/Desktop/test.jl:5
Below are the equivalent scripts in Python and R (Python first, then R):
import feather
import pandas
feather.write_dataframe(pandas.DataFrame(), "test.feather")
print(feather.read_dataframe("test.feather"))
library(feather)
write_feather(data.frame(), "test.feather")
print(read_feather("test.feather"))
Both give the expected result.
Furthermore, if I write the empty data frame using R or Python and read it back with Julia, the expected result (an empty data frame) is returned.
So it looks like the metadata is not written correctly by Julia in the case of an empty data frame. I have tried both FlatBuffers.jl v0.4.0 and v0.5.2, and both give the same error.
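One way to check this would be to pull the raw metadata block out of the files written by Julia, R, and Python and compare them byte by byte. Below is a minimal sketch of that. It assumes the Feather V1 layout (the `FEA1` magic bytes at both ends of the file, with a little-endian uint32 metadata size just before the trailing magic). The function name `extract_feather_v1_metadata` is my own and not part of any Feather package, and the sketch does not decode the FlatBuffers content itself.

```python
import struct

MAGIC = b"FEA1"  # assumed Feather V1 magic bytes at the start and end of the file


def extract_feather_v1_metadata(data: bytes) -> bytes:
    """Return the raw FlatBuffers metadata block from the bytes of a Feather V1 file.

    Assumed layout: MAGIC, column data, metadata, uint32 metadata size, MAGIC.
    """
    # Smallest possible file: leading magic + size field + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Feather V1 file (missing FEA1 magic)")
    # The metadata size sits in the 4 bytes just before the trailing magic.
    (size,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - size
    if start < 4:
        raise ValueError("metadata size exceeds file size")
    return data[start:end]


# Hypothetical usage on the file from the example above:
# with open("test.feather", "rb") as f:
#     meta = extract_feather_v1_metadata(f.read())
# print(len(meta), meta.hex())
```

Dumping the metadata of the same empty data frame written by each language should show whether the Julia output differs, for example in a zero offset or a missing field.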
On the other hand, reading the empty data frame written by Julia causes a segmentation fault in R, but not in Python or C++.
In addition, the following simple data frame also causes a segmentation fault when read back by R if FlatBuffers.jl is upgraded to v0.5.2, but not with v0.4.0, and again not in Python or C++:
using Feather
using DataFrames
Feather.write("test.feather", DataFrame(x = Float64.(1:500)))
If I run R under a debugger, I can trace the segmentation faults to calls such as feather::metadata::Column::Init(void const *).
However, the R-specific issues here might have more to do with the fact that R's feather package uses an old version of the flatbuffers C++ library, 1.3.0 (https://github.com/wesm/feather/blob/master/cpp/thirdparty/versions.sh). That version may have bugs, for example not handling 0 offsets correctly. Both pyarrow and the C++ arrow build that I tested use a flatbuffers version somewhere between 1.9.0 and 1.10.0.
In general, I think cross-language support and correctness are important. I am inclined to work something out here and send a PR when I can.