Description
Roadmap
Ruby types to transform: (Items are sorted from the easiest to the hardest.)
- T_STRING: off-heap array of non-reference byte data
- T_BIGNUM: off-heap array of non-reference integer data (BDIGIT, defined as unsigned int)
- T_ARRAY: off-heap array of references
- T_HASH: off-heap hash table
- T_OBJECT: off-heap array plus hash table
- T_MATCH: off-heap arrays holding begin/end of match groups
Rationale
In Ruby, almost all object types are subject to finalization (i.e. obj_free is called on the object when the object dies). This is unusual for a garbage-collected runtime.
Most of the time, Ruby calls obj_free to free underlying off-heap buffers allocated using xmalloc. Vanilla Ruby does this because its GC cannot allocate objects larger than 40 bytes (extended to 320 bytes with RVARGC). MMTk has no such limitation.
We should fix this problem by transforming those off-heap buffers into on-heap objects.
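To make the difference concrete, here is a toy sketch of the two layouts. Everything in it is hypothetical (the ToyString struct, the bump-allocated toy_heap, and all toy_* functions are illustration only, not Ruby's or MMTk's API). It only shows why an off-heap payload forces a per-object free call while an on-heap payload does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not Ruby's or MMTk's API. */

typedef struct {
    size_t len;
    char *ptr;   /* payload: off-heap (malloc) or on-heap (toy_heap) */
} ToyString;

/* A tiny bump-allocated "GC heap" that is reclaimed wholesale. */
static union { void *align; unsigned char bytes[4096]; } toy_heap;
static size_t toy_cursor = 0;

static void *toy_gc_alloc(size_t size) {
    size_t word = sizeof(void *);
    if (size > sizeof toy_heap.bytes) return NULL;
    size = (size + word - 1) / word * word;   /* keep allocations word-aligned */
    if (size > sizeof toy_heap.bytes - toy_cursor) return NULL;
    void *p = toy_heap.bytes + toy_cursor;
    toy_cursor += size;
    return p;
}

/* Reclaims every on-heap object at once, with no per-object callback. */
static void toy_gc_release_all(void) { toy_cursor = 0; }

/* Counts calls to the obj_free stand-in so the cost is observable. */
static int toy_finalizer_calls = 0;

/* Layout 1: header on the toy heap, payload from malloc (stand-in for xmalloc). */
static ToyString *toy_off_heap_new(const char *text) {
    size_t len = strlen(text);
    char *buf = malloc(len + 1);
    if (buf == NULL) return NULL;
    ToyString *s = toy_gc_alloc(sizeof *s);
    if (s == NULL) { free(buf); return NULL; }
    memcpy(buf, text, len + 1);
    s->len = len;
    s->ptr = buf;
    return s;
}

/* Stand-in for obj_free: required only because the payload is off-heap. */
static void toy_obj_free(ToyString *s) {
    assert(s != NULL);
    free(s->ptr);
    s->ptr = NULL;
    toy_finalizer_calls++;
}

/* Layout 2: header and payload both on the toy heap; nothing to finalize. */
static ToyString *toy_on_heap_new(const char *text) {
    size_t len = strlen(text);
    ToyString *s = toy_gc_alloc(sizeof *s);
    char *buf = toy_gc_alloc(len + 1);
    if (s == NULL || buf == NULL) return NULL;
    memcpy(buf, text, len + 1);
    s->len = len;
    s->ptr = buf;
    return s;
}
```

With the first layout, forgetting to call toy_obj_free before the header is reclaimed leaks the malloc'd payload. With the second, toy_gc_release_all reclaims header and payload together.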
Attached type information of buffers
To scan those buffers, the buffers must have attached type information, in the form of struct RBasic. We may add a special ruby_value_type, such as RUBY_T_MMTK = 0x17, to indicate that these are special MMTk-specific objects that should be scanned specially. We discussed recently that this approach is wasteful w.r.t. memory space.
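A rough sketch of what such a tagged buffer could look like, to make the space cost visible. ToyBasic is a simplified two-word stand-in for struct RBasic, and ToyTaggedBuffer, ToyBufferKind, and the choice to store the buffer kind in flag bit 5 are assumptions made up for this illustration. Only the tag value 0x17 comes from the text above, and the 5-bit mask mirrors RUBY_T_MASK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct RBasic: two machine words. */
typedef struct {
    uintptr_t flags;  /* low 5 bits hold the type tag */
    uintptr_t klass;  /* meaningless for a raw buffer, but still occupies a word */
} ToyBasic;

enum {
    TOY_T_MASK = 0x1f,  /* mirrors RUBY_T_MASK */
    TOY_T_MMTK = 0x17   /* tag value proposed in the text */
};

/* Hypothetical: tells the scanner whether the payload holds references. */
enum ToyBufferKind { TOY_BUF_BYTES = 0, TOY_BUF_REFS = 1 };

/* A buffer that carries its own header so it can be scanned on its own. */
typedef struct {
    ToyBasic basic;
    size_t length;        /* number of payload words in use */
    uintptr_t payload[];  /* flexible array member */
} ToyTaggedBuffer;

/* Hypothetical encoding: type tag in bits 0..4, buffer kind in bit 5. */
static uintptr_t toy_make_flags(enum ToyBufferKind kind) {
    return (uintptr_t)TOY_T_MMTK | ((uintptr_t)kind << 5);
}

static int toy_is_mmtk_buffer(const ToyBasic *b) {
    assert(b != NULL);
    return (b->flags & TOY_T_MASK) == TOY_T_MMTK;
}

static enum ToyBufferKind toy_buffer_kind(const ToyBasic *b) {
    assert(b != NULL);
    return (enum ToyBufferKind)((b->flags >> 5) & 1);
}

/* Bytes spent per buffer before the first payload word. */
static size_t toy_header_overhead(void) {
    return offsetof(ToyTaggedBuffer, payload);
}
```

In this sketch every buffer pays toy_header_overhead() bytes, including an unused klass word, on top of what its owner already stores about it. That is the memory cost the paragraph above refers to.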
Use disjoint objects
The preferred approach is to keep those buffers "naked" (without a header) and extend mmtk-core to support such objects. Julia also has such "naked" buffers.
We have had some discussions about "disjoint objects" here: mmtk/mmtk-core#656. The key is that the header (in Ruby, the RObject, RString, RArray structs, etc.) contains all the type, length, and capacity information necessary for scanning both the header itself and the buffer(s) it owns.
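As a minimal sketch of that idea: the header alone drives the scan of a headerless buffer. ToyArrayHeader, ToyVisitor, and toy_scan_array are hypothetical names for this illustration. They are neither Ruby's RArray layout nor an mmtk-core interface, and a real collector would also have to handle moving the buffer and updating the header's pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ToyValue;  /* stand-in for a Ruby VALUE */

/* Header object: the only part that carries length and capacity. */
typedef struct {
    size_t len;        /* slots in use */
    size_t capa;       /* slots allocated in the buffer */
    ToyValue *buffer;  /* "naked" buffer: no header of its own */
} ToyArrayHeader;

/* Callbacks a collector would supply (hypothetical interface). */
typedef struct {
    void (*visit_buffer)(void *start, size_t bytes, void *ctx);
    void (*visit_slot)(ToyValue *slot, void *ctx);
    void *ctx;
} ToyVisitor;

/* Scans the buffer using only information stored in the header. */
static void toy_scan_array(ToyArrayHeader *hdr, const ToyVisitor *v) {
    assert(hdr->len <= hdr->capa);
    if (hdr->buffer == NULL) return;
    /* The capacity gives the buffer's extent, so it can be kept alive as a unit. */
    v->visit_buffer(hdr->buffer, hdr->capa * sizeof(ToyValue), v->ctx);
    /* The length says which slots hold live references. */
    for (size_t i = 0; i < hdr->len; i++) {
        v->visit_slot(&hdr->buffer[i], v->ctx);
    }
}

/* Example visitor that just records what the scan reported. */
typedef struct {
    size_t buffer_bytes;
    size_t slots;
} ToyScanStats;

static void toy_record_buffer(void *start, size_t bytes, void *ctx) {
    (void)start;
    ((ToyScanStats *)ctx)->buffer_bytes += bytes;
}

static void toy_record_slot(ToyValue *slot, void *ctx) {
    (void)slot;
    ((ToyScanStats *)ctx)->slots++;
}
```

The buffer never needs a tag of its own here, because nothing scans it without going through its header first.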
This also gives the current Ruby implementation an opportunity to support object resizing better. Currently, when an object transitions between the "embedded" and the "heap" layout, the size of the header object cannot change. That wastes memory, because a 320-byte header can never shrink even if the array/string contains only a few elements. With disjoint objects, the VM can decide at GC time to split the object into a header and a buffer, or merge them back, and resize them if needed.
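One way the GC-time decision could look, as a sketch under assumed names. ToyPolicy, ToyLayout, and toy_choose_layout are hypothetical, and the sizes a caller passes in are its own choice, not values taken from Ruby. The function only picks between the merged (embedded) form and the split (header plus buffer) form from the current payload size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizing policy; callers choose the numbers. */
typedef struct {
    size_t embedded_fixed_bytes;  /* header bytes before the inline payload */
    size_t split_header_bytes;    /* header size when the payload is in a buffer */
    size_t max_embedded_bytes;    /* largest header the policy will embed into */
} ToyPolicy;

/* Result of the decision: sizes to allocate when copying the object. */
typedef struct {
    int embedded;         /* 1 = merged into the header, 0 = split */
    size_t header_bytes;
    size_t buffer_bytes;  /* 0 when embedded */
} ToyLayout;

/* Chooses a layout for an object whose live payload is payload_bytes long. */
static ToyLayout toy_choose_layout(size_t payload_bytes, const ToyPolicy *p) {
    ToyLayout out;
    assert(p != NULL);
    /* Written as a subtraction so the comparison cannot overflow. */
    if (p->max_embedded_bytes >= p->embedded_fixed_bytes &&
        payload_bytes <= p->max_embedded_bytes - p->embedded_fixed_bytes) {
        out.embedded = 1;
        out.header_bytes = p->embedded_fixed_bytes + payload_bytes;
        out.buffer_bytes = 0;
    } else {
        out.embedded = 0;
        out.header_bytes = p->split_header_bytes;
        out.buffer_bytes = payload_bytes;
    }
    return out;
}
```

For example, with a policy of {16, 40, 320}, an array that has shrunk to 24 bytes of payload is merged into a 40-byte header. With 1000 bytes of payload, it is split into a 40-byte header and a 1000-byte buffer.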