Environment
- Codon version: v0.19.3
- OS: Debian Bullseye
Description
Codon's pickle.dump() implementation has a critical type mismatch bug. The File class stores a FILE* pointer returned by fopen(), but the pickle module treats that pointer as a gzFile (zlib-ng's gz_state*). As a result, gzwrite() reads an invalid state and fails.
Reproduction Code
import pickle
with open('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)
Expected Behavior
- The string "bar" should be successfully pickled and written to foo.pkl
- This code works correctly in standard Python 3.x
Actual Behavior
Program crashes with:
IOError: pickle error: gzwrite returned 0
Raised from: std.pickle._write_raw.0:0
lib/codon/stdlib/pickle.codon:25:13
Backtrace:
[0x...] std.pickle._write_raw.0:0[Ptr[byte],Ptr[byte],int].1047 at lib/codon/stdlib/pickle.codon:25:13
[0x...] std.pickle._write.0:0[Ptr[byte],int,int].1054 at lib/codon/stdlib/pickle.codon:42:34
[0x...] str:str.__pickle__:0[str,Ptr[byte]].1057 at lib/codon/stdlib/pickle.codon:111:21
[0x...] std.pickle.dump.0:0[str,std.internal.file.File.0,str].1060 at lib/codon/stdlib/pickle.codon:13:18
Root Cause Analysis
Type Mismatch: FILE* vs gzFile*
- In file.codon, the File class stores a standard C FILE*:
class File:
    sz: int
    buf: Ptr[byte]
    fp: cobj  # This is FILE* from fopen()

    def __init__(self, path: str, mode: str):
        self.fp = _C.fopen(path.c_str(), mode.c_str())  # Returns FILE*
        if not self.fp:
            raise IOError(f"file {path} could not be opened")
        self._reset()
- In pickle.codon, dump() passes f.fp to pickle methods:
def dump(x: T, f, T: type):
    x.__pickle__(f.fp)  # Passes FILE* as if it were gzFile
- The pickle implementation treats it as Jar (which is Ptr[byte]):
@extend
class str:
    def __pickle__(self, jar: Jar):  # jar is actually FILE*, not gzFile
        _write(jar, self.len)
        _write_raw(jar, self.ptr, self.len)
- _write_raw() calls gzwrite() with the wrong pointer type:
# In pickle.codon, _write_raw eventually calls:
# gzwrite(jar, data, size)
# But jar is FILE*, not gz_state* (gzFile)
- In zlib-ng's gzwrite.c:233, the function expects a gzFile:
int Z_EXPORT PREFIX(gzwrite)(gzFile file, void const *buf, unsigned len) {
    gz_state *state;
    if (file == NULL)
        return 0;
    state = (gz_state *)file; // Here file is really a FILE*, so this reinterprets it as gz_state*
    // check that we're writing and that there's no error
    if (state->mode != GZ_WRITE || state->err != Z_OK)
        return 0; // Returns 0 because state is invalid
    // ...
}
- GDB confirms the invalid state:
(gdb) p state
$1 = (gz_state *) 0xc304bd0
(gdb) p *state
$2 = {
  x = {
    have = 4222428292, // Garbage value
    next = 0x0,
    pos = 0
  },
  mode = 0, // Should be GZ_WRITE, but is 0
  fd = 0,
  path = 0x0
The state->mode is 0 instead of GZ_WRITE, causing gzwrite() to return 0 immediately.
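To make the failure mode concrete, here is a toy Python model of the mismatch using ctypes. ToyFILE, ToyGzState and toy_gzwrite are hypothetical stand-ins, not the real glibc FILE or zlib-ng gz_state layouts; only the GZ_WRITE value is taken from zlib's gzguts.h. The point it illustrates is that a pointer cast changes how the bytes are read, not what they contain, so the mode check sees whatever the FILE happened to store at that offset (0 in the GDB session above, 3 in this toy).

```python
import ctypes

# Toy layouts only: NOT the real glibc FILE or zlib-ng gz_state definitions,
# just two unrelated 8-byte structs whose fields overlap in memory.
class ToyFILE(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_int), ("fd", ctypes.c_int)]

class ToyGzState(ctypes.Structure):
    _fields_ = [("have", ctypes.c_uint), ("mode", ctypes.c_int)]

# Mirrors the GZ_WRITE constant defined in zlib's gzguts.h.
GZ_WRITE = 31153

def toy_gzwrite(handle: ctypes.c_void_p) -> int:
    """Mimics the guard at the top of gzwrite(): reinterpret, then check mode."""
    if not handle.value:  # NULL handle
        return 0
    # Like `state = (gz_state *)file;` in C: no conversion, just a new view of the same bytes.
    state = ctypes.cast(handle, ctypes.POINTER(ToyGzState)).contents
    if state.mode != GZ_WRITE:
        return 0  # the same early exit Codon hits
    return 1

fp = ToyFILE(flags=0, fd=3)              # stands in for the FILE* from fopen()
gz = ToyGzState(have=0, mode=GZ_WRITE)   # a correctly initialised write state

print(toy_gzwrite(ctypes.cast(ctypes.pointer(fp), ctypes.c_void_p)))  # 0: fd is read as mode
print(toy_gzwrite(ctypes.cast(ctypes.pointer(gz), ctypes.c_void_p)))  # 1
```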
The Design Flaw
Codon's pickle module assumes every file handle it receives is a gzFile, but the File class uses a standard FILE* from fopen(). There are two possible design intentions:
Option A: Pickle should use standard FILE*
- Change pickle to use fwrite() instead of gzwrite()
- Remove gzip compression from pickle
Option B: File should use gzFile
- Change File.__init__() to use gzopen() instead of fopen()
- Keep gzip compression in pickle
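A minimal Python sketch of why Option A and Option B need not be exclusive, assuming the serializer only needs write()/read() on a byte stream. The length-prefixed layout loosely follows Codon's str.__pickle__ (a length via _write, then the raw bytes via _write_raw), but the helper names write_str/read_str and the 8-byte little-endian length prefix are assumptions for illustration, not Codon's actual on-disk format. The same helpers work over an uncompressed stream and over a gzip stream chosen by the caller, which is how standard Python keeps compression optional.

```python
import gzip
import io
import struct
from typing import BinaryIO

# Hypothetical helpers modelled on Codon's str.__pickle__: length, then raw bytes.
# Assumption: the length is a 64-bit little-endian integer ("<q").
def write_str(out: BinaryIO, s: str) -> None:
    data = s.encode("utf-8")
    out.write(struct.pack("<q", len(data)))  # length prefix
    out.write(data)                          # raw payload

def read_str(inp: BinaryIO) -> str:
    (n,) = struct.unpack("<q", inp.read(8))
    return inp.read(n).decode("utf-8")

# Option A style: a plain, uncompressed stream (what open(..., 'wb') provides).
plain = io.BytesIO()
write_str(plain, "bar")
plain.seek(0)
print(read_str(plain))  # bar

# Same helpers over a gzip stream: compression is decided by whoever opens the stream.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    write_str(gz, "bar")
buf.seek(0)  # GzipFile does not close a fileobj it was given
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    print(read_str(gz))  # bar
```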
Comparison with Standard Python
Standard Python's pickle module:
- Works with any file-like object that has a write() method
- Does NOT automatically compress with gzip
- Users can explicitly use gzip.open() if compression is needed:
import pickle
import gzip
# Standard pickle (no compression)
with open('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)
# With compression (explicit)
with gzip.open('foo.pkl.gz', 'wb') as f:
    pickle.dump("bar", f)
Workaround
As a workaround, open the file with gzopen() instead of open():
import pickle
from gzip import gzopen
# Workaround: Use gzopen instead of open
with gzopen('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)
# Reading back
with gzopen('foo.pkl', 'rb') as f:
    data = pickle.load(f)
    print(data)  # Output: bar
Some observations about this approach:
This workaround solves the immediate problem, but it is not what Python developers would expect. When porting code from standard Python, you have to change every open() call used with pickle to gzopen(). This isn't obvious; I only figured it out after debugging the type mismatch.
The current implementation tightly couples pickle to gzip compression. In standard Python, compressing a pickle with gzip.open() is an optional choice, not a requirement for pickle to work at all. This is confusing when you just want to serialize some data and the standard open() approach fails.
It would help if this requirement were documented, or if open() worked directly with pickle for better Python compatibility. Alternatively, if gzip compression isn't essential to the pickle implementation, using standard file operations would make the API more intuitive for users coming from Python.
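For comparison, CPython's pickle.dump() only requires that the target object have a write() method accepting bytes, which is why plain open() works there. A quick check with a hypothetical ChunkCollector class that is not a file at all:

```python
import pickle

class ChunkCollector:
    """Hypothetical non-file sink: CPython's pickle only needs .write(bytes)."""

    def __init__(self) -> None:
        self.chunks = []

    def write(self, data) -> int:
        self.chunks.append(bytes(data))  # data may be bytes or a memoryview
        return len(data)

sink = ChunkCollector()
pickle.dump("bar", sink)
print(pickle.loads(b"".join(sink.chunks)))  # bar
```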