
[Bug Report/Feature Request] pickle.dump() Type Mismatch - FILE* vs gzFile #713

@MetalOxideSemi


Environment

  • Codon version: v0.19.3
  • OS: Debian Bullseye

Description

Codon's pickle.dump() implementation has a critical type mismatch bug. The File class stores a FILE* pointer from fopen(), but the pickle module treats it as a gzFile (zlib-ng's gz_state*), so gzwrite() reads an invalid state and fails.

Reproduction Code

import pickle

with open('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)

Expected Behavior

  • The string "bar" should be successfully pickled and written to foo.pkl
  • This code works correctly in standard Python 3.x

Actual Behavior

Program crashes with:

IOError: pickle error: gzwrite returned 0

Raised from: std.pickle._write_raw.0:0
lib/codon/stdlib/pickle.codon:25:13

Backtrace:
  [0x...] std.pickle._write_raw.0:0[Ptr[byte],Ptr[byte],int].1047 at lib/codon/stdlib/pickle.codon:25:13
  [0x...] std.pickle._write.0:0[Ptr[byte],int,int].1054 at lib/codon/stdlib/pickle.codon:42:34
  [0x...] str:str.__pickle__:0[str,Ptr[byte]].1057 at lib/codon/stdlib/pickle.codon:111:21
  [0x...] std.pickle.dump.0:0[str,std.internal.file.File.0,str].1060 at lib/codon/stdlib/pickle.codon:13:18

Root Cause Analysis

Type Mismatch: FILE* vs gzFile

  1. In file.codon, the File class stores a standard C FILE*:
class File:
    sz: int
    buf: Ptr[byte]
    fp: cobj  # This is a FILE* returned by fopen()

    def __init__(self, path: str, mode: str):
        self.fp = _C.fopen(path.c_str(), mode.c_str())  # Returns FILE*
        if not self.fp:
            raise IOError(f"file {path} could not be opened")
        self._reset()
  2. In pickle.codon, dump() passes f.fp to the object's __pickle__ method. Because f is untyped and both File and gzFile have an fp field, this compiles for either kind of handle:
def dump(x: T, f, T: type):
    x.__pickle__(f.fp)  # Passes a FILE* where a gzFile is expected
  3. The pickle implementation receives it as a Jar (an alias for Ptr[byte]), so nothing at the type level distinguishes a FILE* from a gzFile:
@extend
class str:
    def __pickle__(self, jar: Jar):  # jar is actually a FILE*, not a gzFile
        _write(jar, self.len)
        _write_raw(jar, self.ptr, self.len)
  4. _write_raw() then calls gzwrite() with the wrong kind of handle:
# In pickle.codon, _write_raw eventually calls:
# gzwrite(jar, data, size)
# but jar is a FILE*, not a gz_state* (gzFile)
  5. In zlib-ng's gzwrite.c:233, the function expects a gzFile:
int Z_EXPORT PREFIX(gzwrite)(gzFile file, void const *buf, unsigned len) {
    gz_state *state;

    if (file == NULL)
        return 0;
    state = (gz_state *)file;  // Valid only for handles from gzopen()/gzdopen(); here it reinterprets a FILE*

    // check that we're writing and that there's no error
    if (state->mode != GZ_WRITE || state->err != Z_OK)
        return 0;  // Returns 0 because state is invalid
    // ...
}
  6. GDB confirms the invalid state:
(gdb) p state
$1 = (gz_state *) 0xc304bd0
(gdb) p *state
$2 = {
  x = {
    have = 4222428292,  // 0xFBAD2484: not a plausible gz_state byte count
    next = 0x0,
    pos = 0
  },
  mode = 0,              // Should be GZ_WRITE, but is 0
  fd = 0,
  path = 0x0

state->mode is 0 instead of GZ_WRITE, so gzwrite() returns 0 immediately, and _write_raw() turns that return value into the IOError shown above.
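The `have` value in the GDB dump is more telling than plain garbage. In hex it is 0xFBAD2484, and glibc keeps the constant _IO_MAGIC (0xFBAD0000) in the high half of `_flags`, the first field of every FILE. Since `x.have` is the first field of gz_state, it overlaps exactly that field. The sketch below decodes the value, assuming a glibc-based system (Debian Bullseye uses glibc) and using flag constants recalled from glibc's libio headers; `decode_glibc_flags` is a hypothetical helper, and this is corroborating evidence rather than proof.

```python
# Hedged sketch: decode the value GDB printed for state->x.have, assuming the
# pointer really refers to a glibc FILE (whose first field is the int _flags).
# Constants are recalled from glibc's libio headers.
IO_MAGIC = 0xFBAD0000
IO_MAGIC_MASK = 0xFFFF0000
FLAG_NAMES = {
    0x0004: "_IO_NO_READS",      # stream was opened without read access
    0x0080: "_IO_LINKED",        # stream is on glibc's list of open FILEs
    0x0400: "_IO_TIED_PUT_GET",  # put and get pointers move together
    0x2000: "_IO_IS_FILEBUF",    # regular file-backed stream
}


def decode_glibc_flags(value: int) -> tuple[bool, list[str]]:
    """Return (has_glibc_magic, names of the known flag bits that are set)."""
    has_magic = (value & IO_MAGIC_MASK) == IO_MAGIC
    names = [name for bit, name in sorted(FLAG_NAMES.items()) if value & bit]
    return has_magic, names


have = 4222428292  # state->x.have from the GDB session
print(hex(have))                 # 0xfbad2484
print(decode_glibc_flags(have))  # (True, ['_IO_NO_READS', '_IO_LINKED', '_IO_TIED_PUT_GET', '_IO_IS_FILEBUF'])
```

The `_IO_NO_READS` bit fits a file opened with 'wb', which matches the reproduction code.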

The Design Flaw

Codon's pickle module assumes every handle it receives is a gzFile, but the File object returned by open() wraps a standard FILE* from fopen(). There are two possible design intentions:

Option A: Pickle should use standard FILE*

  • Change pickle to use fwrite() instead of gzwrite()
  • Remove gzip compression from pickle

Option B: File should use gzFile

  • Change File.__init__() to use gzopen() instead of fopen()
  • Keep gzip compression in pickle
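Option A essentially means making pickle depend only on a byte-writing interface rather than on one particular C handle type. A minimal CPython sketch of that decoupling, using hypothetical `dump_str`/`load_str` helpers that mimic the length-then-bytes shape of `str.__pickle__` (the 8-byte little-endian length prefix is an assumption, not Codon's actual on-disk format):

```python
import gzip
import io
import struct


# Hypothetical helpers: write the length, then the raw bytes, to any object
# with write(); read them back from any object with read().
def dump_str(s: str, f) -> None:
    data = s.encode("utf-8")
    f.write(struct.pack("<q", len(data)))  # assumed 8-byte length prefix
    f.write(data)                          # raw payload


def load_str(f) -> str:
    (n,) = struct.unpack("<q", f.read(8))
    return f.read(n).decode("utf-8")


# Plain, uncompressed stream.
plain = io.BytesIO()
dump_str("bar", plain)
plain.seek(0)
print(load_str(plain))  # bar

# Same helpers, with compression chosen by the caller via a gzip wrapper.
compressed = io.BytesIO()
with gzip.GzipFile(fileobj=compressed, mode="wb") as gz:
    dump_str("bar", gz)
compressed.seek(0)
with gzip.GzipFile(fileobj=compressed, mode="rb") as gz:
    print(load_str(gz))  # bar
```

Under this design, gzip becomes a choice of file object rather than a requirement of the serializer, which is how standard Python behaves.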

Comparison with Standard Python

Standard Python's pickle module:

  • Works with any file-like object that has a write() method
  • Does NOT automatically compress with gzip
  • Users can explicitly use gzip.open() if compression is needed:
import pickle
import gzip

# Standard pickle (no compression)
with open('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)

# With compression (explicit)
with gzip.open('foo.pkl.gz', 'wb') as f:
    pickle.dump("bar", f)
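To make the "no automatic compression" point concrete, this CPython snippet writes the same object with open() and with gzip.open() into a temporary directory (the file names are only for illustration) and compares the first bytes of each file: a plain pickle starts with the 0x80 PROTO opcode, while the gzip file starts with the gzip magic number 1f 8b. Each file then loads back with its matching opener.

```python
import gzip
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "foo.pkl")
    gz_path = os.path.join(tmp, "foo.pkl.gz")

    # Same object, two openers.
    with open(plain_path, "wb") as f:
        pickle.dump("bar", f)
    with gzip.open(gz_path, "wb") as f:
        pickle.dump("bar", f)

    # Inspect the raw leading bytes of each file.
    with open(plain_path, "rb") as f:
        plain_head = f.read(2)
    with open(gz_path, "rb") as f:
        gz_head = f.read(2)

    # Round-trip each file through the matching opener.
    with open(plain_path, "rb") as f:
        plain_value = pickle.load(f)
    with gzip.open(gz_path, "rb") as f:
        gz_value = pickle.load(f)

print(plain_head[:1])         # b'\x80'  (PROTO opcode, protocol 2 and later)
print(gz_head)                # b'\x1f\x8b'  (gzip magic number)
print(plain_value, gz_value)  # bar bar
```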

Workaround

There is a workaround: use gzopen() instead of open():

import pickle
from gzip import gzopen

# Workaround: Use gzopen instead of open
with gzopen('foo.pkl', 'wb') as f:
    pickle.dump("bar", f)

# Reading back
with gzopen('foo.pkl', 'rb') as f:
    data = pickle.load(f)
    print(data)  # Output: bar

Some observations about this approach:

This workaround does solve the immediate problem, but it differs from what Python developers would expect. When porting code from standard Python, you'd need to change every open() call used with pickle to gzopen(). This isn't obvious; I only figured it out after debugging the type mismatch.

The current implementation tightly couples pickle to gzip compression. In standard Python, gzip.open() is an optional choice when you want compression, not a requirement for pickle to work at all. This is confusing when you're just trying to serialize some data and the standard open() approach fails.

It would help if this requirement were documented, or if open() worked directly with pickle for better Python compatibility. Alternatively, if gzip compression isn't essential to the pickle implementation, using standard file operations would make the API more intuitive for users coming from Python.
