
Off-heap buffers #10

Open
@wks

Roadmap

Ruby types to transform: (Items are sorted from the easiest to the hardest.)

  • T_STRING: off-heap array of non-reference byte data
  • T_BIGNUM: off-heap array of non-reference integer data (BDIGIT, defined as unsigned int)
  • T_ARRAY: off-heap array of references
  • T_HASH: off-heap hash table
  • T_OBJECT: off-heap array plus hash table
  • T_MATCH: off-heap arrays holding begin/end of match groups.

Rationale

In Ruby, almost all object types are subject to finalization (i.e. obj_free is called on an object when it dies). This is not normal for a garbage-collected language, where finalization is usually reserved for a few special objects.

Most of the time, Ruby calls obj_free to release the underlying off-heap buffers allocated with xmalloc. Vanilla Ruby needs these buffers because its GC cannot allocate objects larger than 40 bytes (320 bytes with RVARGC), so anything bigger has to live off-heap. MMTk has no such limitation.

We should fix this problem by transforming those off-heap buffers into on-heap objects.

Attached type information of buffers

To scan those buffers on their own, the GC needs type information attached to them, in the form of a struct RBasic header. We could add a special ruby_value_type, such as RUBY_T_MMTK = 0x17, to mark them as MMTk-specific objects that must be scanned specially. As discussed recently, however, this approach wastes memory: every buffer would carry an RBasic header (flags and klass, two words) in addition to its payload.
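A minimal C sketch of the headed-buffer idea, to make the space cost concrete. Everything here is hypothetical: `VALUE` and `struct RBasicSketch` are simplified stand-ins for CRuby's types, `RUBY_T_MMTK` is the proposed tag (not an existing constant), and the `len` field is an assumed addition so that the buffer can be scanned without consulting its owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for CRuby's VALUE (a pointer-sized word). */
typedef uintptr_t VALUE;

/* Simplified copy of CRuby's struct RBasic: a flags word and a class pointer. */
struct RBasicSketch {
    VALUE flags;
    VALUE klass;
};

/* HYPOTHETICAL tag for MMTk-specific buffers; 0x17 is unused in ruby_value_type. */
#define RUBY_T_MMTK 0x17
#define RUBY_T_MASK 0x1f

/* HYPOTHETICAL headed buffer holding the elements of a T_ARRAY. */
struct MMTkRefBuffer {
    struct RBasicSketch basic; /* type tag, so the GC can scan the buffer alone */
    size_t len;                /* number of VALUEs that follow (assumed field) */
    VALUE elems[];             /* payload */
};

/* Extract the type tag the GC would dispatch on when scanning. */
static unsigned buffer_type(const struct RBasicSketch *basic) {
    return (unsigned)(basic->flags & RUBY_T_MASK);
}

/* Bytes of every buffer that are header rather than payload. */
static size_t buffer_header_bytes(void) {
    return offsetof(struct MMTkRefBuffer, elems);
}

/* Percentage of a buffer with n elements that is spent on the header. */
static unsigned header_overhead_percent(size_t n) {
    size_t header = buffer_header_bytes();
    size_t total = header + n * sizeof(VALUE);
    assert(total > 0);
    return (unsigned)(header * 100 / total);
}
```

On a typical 64-bit build the header is 24 bytes, so a 3-element buffer spends half of its 48 bytes on bookkeeping; small buffers, which are common, pay the highest relative cost.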

Use disjoint objects

The preferred approach is to keep those buffers "naked" (without a header) and extend mmtk-core to support such objects. Julia also has such "naked" buffers.

We have discussed "disjoint objects" in mmtk/mmtk-core#656. The key idea is that the header (in Ruby, the RObject, RString, RArray, ... structs) contains all the type, length and capacity information needed to scan both the header itself and the buffer(s) it owns.
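The following C sketch shows what "the header describes the buffer" means for scanning. It is a simplified model, not CRuby's real `struct RArray` or mmtk-core's API: `struct ArrayHeaderSketch`, `slot_visitor`, `scan_array` and `count_slot` are hypothetical names, and the header layout is only loosely modeled on the heap layout of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;

/* HYPOTHETICAL array header, loosely modeled on CRuby's RArray "heap" layout:
 * the header records the length, capacity and address of the buffer. */
struct ArrayHeaderSketch {
    VALUE flags;
    VALUE klass;
    size_t len;  /* live elements in the buffer */
    size_t capa; /* allocated slots in the buffer */
    VALUE *ptr;  /* naked buffer: no header of its own */
};

/* HYPOTHETICAL callback the GC would use to process one reference slot. */
typedef void (*slot_visitor)(VALUE *slot, void *ctx);

/* Scan the header and the naked buffer it owns. The buffer carries no type
 * information, so everything needed to scan it comes from the header.
 * (Keeping the buffer itself alive, and updating ptr if it moves, is omitted.) */
static void scan_array(struct ArrayHeaderSketch *ary, slot_visitor visit, void *ctx) {
    assert(ary->len <= ary->capa);
    visit(&ary->klass, ctx);          /* reference held by the header itself */
    for (size_t i = 0; i < ary->len; i++) {
        visit(&ary->ptr[i], ctx);     /* only the live prefix is scanned */
    }
}

/* Example visitor: count the slots visited. */
static void count_slot(VALUE *slot, void *ctx) {
    (void)slot;
    ++*(size_t *)ctx;
}
```

Because the buffer is reached only through its owner, it needs no RBasic of its own, and only the `len` live elements are visited even when `capa` is larger.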

This also gives the current Ruby implementation an opportunity to support object resizing better. Currently, when an object transitions between the "embedded" and "heap" layouts, the size of the header object cannot change. That wastes memory: a 320-byte header can never shrink, even if the array or string now contains only a few elements. With disjoint objects, the VM can decide at GC time to split the object into a header and a buffer, or merge the buffer back into the header, and resize either of them as needed.
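To illustrate the split/merge decision, here is a C sketch of a GC-time layout choice for an array. The sizes are assumptions for a 64-bit build: a 16-byte RBasic, 8-byte elements, a 40-byte split header holding len/capa/ptr, and size classes of 40, 80, 160 and 320 bytes. `choose_layout` and `struct LayoutChoice` are hypothetical names, and capacity slack is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTIONS (64-bit, simplified): RBasic is 16 bytes, an element is 8 bytes,
 * a split header additionally stores len/capa/ptr (24 bytes). */
#define BASIC_BYTES 16u
#define ELEM_BYTES 8u
#define SPLIT_HEADER_BYTES (BASIC_BYTES + 24u)

/* ASSUMED size classes available for header objects. */
static const size_t size_classes[] = {40, 80, 160, 320};
#define NUM_SIZE_CLASSES (sizeof size_classes / sizeof size_classes[0])

struct LayoutChoice {
    int embedded;        /* 1: elements live inside the header object */
    size_t header_bytes; /* size class chosen for the header object */
    size_t buffer_bytes; /* size of the separate buffer (0 when embedded) */
};

/* Smallest size class that fits `bytes`, or 0 if none does. */
static size_t fit_size_class(size_t bytes) {
    for (size_t i = 0; i < NUM_SIZE_CLASSES; i++) {
        if (bytes <= size_classes[i]) return size_classes[i];
    }
    return 0;
}

/* HYPOTHETICAL GC-time decision for an array of `len` elements: merge into
 * the smallest header that can embed it, otherwise split into a small
 * header plus an exactly-sized naked buffer. */
static struct LayoutChoice choose_layout(size_t len) {
    struct LayoutChoice c = {0, 0, 0};
    size_t embedded_bytes = BASIC_BYTES + len * ELEM_BYTES;
    size_t cls = fit_size_class(embedded_bytes);
    if (cls != 0) {
        c.embedded = 1;
        c.header_bytes = cls;
    } else {
        c.header_bytes = fit_size_class(SPLIT_HEADER_BYTES);
        c.buffer_bytes = len * ELEM_BYTES;
    }
    assert(c.header_bytes != 0);
    return c;
}
```

Under these assumptions, `choose_layout(3)` embeds into a 40-byte header, while `choose_layout(39)` no longer fits in 320 bytes and is split into a 40-byte header plus a 312-byte buffer. An array that shrinks from 39 elements to 3 could thus move back into a 40-byte header at the next GC instead of keeping a 320-byte one.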
