[Draft-1] Serialization
In Razix, "asset is everything". For the first draft we only support a binary file format; later we can optionally support JSON/S-expressions for asset debuggability, or a binary editor tool in PyQt.
Now how will this work?
RZAssetDB will take care of this; RZAssetPool is only for runtime management. Once we call serialize/deserialize on RZAssetDB, it picks up each pool and then uses the RZReflectionRegistry to write them into the *.rzasset files. Roughly:
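A minimal sketch of that flow; the pool list, accessors, and `writeAssetFile` helper are hypothetical, only RZAssetDB, RZAssetPool, and RZReflectionRegistry are named in this draft:

```cpp
void RZAssetDB::serialize()
{
    for (RZAssetPool* pool : m_Pools) {            // hypothetical pool list
        for (RZAsset* asset : pool->getAssets()) { // hypothetical accessor
            // look up reflection info and emit the *.rzasset ([Header][Payload])
            const TypeMetaData& meta = RZReflectionRegistry::Get(asset->getTypeHash());
            writeAssetFile(asset, meta);           // hypothetical helper
        }
    }
}
```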
*.rzasset file format --> defined in RZAssetSpec.h
- [Header] Starts with RZAsset and its RZAssetColdData metadata ==> header/metadata serialization. These can be hand-written functions without reflection, or they can use it, doesn't matter much; but basically that.
- [Payload] Next we have the asset payload, the actual important thing. This can be a compressed blob; the header will tell us, and RZAssetDB can schedule the relevant compression/decompression job on it.
Use the RZReflectionRegistry to walk the members and write them as the structs need them, per the ABI format: each data type is stored at the platform's size for that type. Hopefully all x64 targets have the same ABI requirements.
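Since the format leans on a stable x64 ABI, it may be worth pinning those size assumptions at compile time; a minimal sketch:

```cpp
#include <cstdint>

// Fail the build instead of producing corrupt *.rzasset files if a platform
// ever violates the size assumptions the serializer bakes into the format.
static_assert(sizeof(float)  == 4, "float must be 4 bytes");
static_assert(sizeof(double) == 8, "double must be 8 bytes");
static_assert(sizeof(bool)   == 1, "bool must be 1 byte");
static_assert(sizeof(void*)  == 8, "serializer assumes an x64 target");
```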
Data Types handling:
- Blobs: for pointer/blob data we store a blob_size and load a blob of pointer data that can be type-cast as needed
- Strings: again just like a pointer; we store the str_size and the blob of data, basically behaves like a pointer blob
- Floats/Doubles: copy 4/8 bytes
- Primitives: written as-is for i32/u32/char/unsigned char/bool/bitset/enum flags
- Arrays: use an RZSerializedArrayHeader that has type info, size, and element count; again similar to Blobs, everything is a blob if you look closely
- HashMaps: serialize the keys/values/occupied arrays as Arrays
- Clean POD structs: need to track alignment, then we can directly write the serialized Blobs
- Compression is handled per SerializedBlob instead of at the asset level
- For trivially copyable types we just memcpy
- For more complex types we define macros that walk through the TypeMetaData members and serialize them one by one
- We can use heuristics to choose whether to compress, based on the Blob payloads (sketched below)
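That heuristic could be as simple as a size threshold; a sketch where the threshold and the LZ4 enum value are made up:

```cpp
static rz_compression_type pickCompression(u32 payloadSize)
{
    if (payloadSize < 256)         // too small to amortize codec + header overhead
        return RZ_COMPRESSION_NONE;
    return RZ_COMPRESSION_LZ4;     // hypothetical enum value, see the LZ4 task below
}
```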
- Use std::visit with the overloaded-Ts idiom to customize serialization logic (or use a simple switch case?); a sketch of the idiom is below
- If a type has only primitive data members it is trivially copyable and, however big, never compressed: just memcpy on file read
- Use custom macros for each reflection type, e.g. REFLECT_PRIMITIVE, REFLECT_STRUCT, REFLECT_STRING, etc., to handle metadata generation and make intent more explicit; this also helps with easier extension in the future
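The overloaded-Ts idiom is small enough to sketch here; the variant alternatives reference the Serialized structs defined below, and the lambda bodies are placeholders:

```cpp
#include <variant>

template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>; // deduction guide, unneeded in C++20

using SerializedAny = std::variant<SerializedBlob, SerializedString, SerializedArray>;

void writeToDisk(const SerializedAny& v)
{
    std::visit(overloaded{
        [](const SerializedBlob&   b) { /* write blob header, register payload  */ },
        [](const SerializedString& s) { /* write length + encoding + char blob  */ },
        [](const SerializedArray&  a) { /* write element info + element blob    */ },
    }, v);
}
```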
As for iterating through members:
```cpp
u8* base       = reinterpret_cast<u8*>(obj);
void* fieldPtr = base + m.offset;
```
I can then check the TypeMetaData serialization type, pass it to a switch, fill these SerializedXXX structs, and write them to disk (a fuller sketch follows the struct definitions below):
```cpp
struct SerializedBlob {
    uint32_t offset;
    uint32_t size;
    uint32_t type_hash;
    uint8_t  compression;
    uint8_t  reserved[3];
    uint32_t decompressed_size;
};

// Array of anything
struct SerializedArray {
    SerializedBlob data; // blob of all elements
    uint32_t element_count;
    uint32_t element_type_hash;
    uint8_t  element_size;
    uint8_t  reserved[3];
};

// HashMap
struct SerializedHashMap {
    SerializedBlob keys;     // blob of all keys
    SerializedBlob values;   // blob of all values
    SerializedBlob occupied; // blob of all occupancy flags
    uint32_t capacity;
    uint32_t count;
    uint32_t index;
};

// String
struct SerializedString {
    SerializedBlob data; // blob of characters
    uint32_t length;
    uint8_t  encoding; // UTF-8, UTF-16, ASCII
    uint8_t  reserved[3];
};

// Struct instance - only works for POD/simple structs without pointers. Marked as clean POD in the reflection registry.
struct SerializedObject {
    SerializedBlob data; // blob of struct bytes
    uint32_t type_hash;
    uint32_t size;
};

// Array of objects - only works for POD/simple structs without pointers. Marked as clean POD in the reflection registry.
struct SerializedObjectArray {
    SerializedBlob data; // blob containing array of blobs
    uint32_t element_count;
    uint32_t element_type_hash;
};

// Vector<T>
template<typename T>
struct SerializedVector {
    SerializedBlob data; // blob of elements
    uint32_t count;
    uint32_t capacity;
};

// Map/HashMap<K, V>
template<typename K, typename V>
struct SerializedMap {
    SerializedBlob keys;   // blob of keys
    SerializedBlob values; // blob of values
    uint32_t count;
};
```
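Putting the member walk and those structs together, a minimal sketch of the switch; the MemberMetaData fields, SerializedType values, RZString, makeSerializedString, and the RZSerializer raw write(const void*, size_t) are all illustrative, only TypeMetaData and the offset-based field access come from the notes above:

```cpp
void serializeObject(void* obj, const TypeMetaData& meta, RZSerializer& out)
{
    u8* base = reinterpret_cast<u8*>(obj);
    for (const MemberMetaData& m : meta.members) {
        void* fieldPtr = base + m.offset;
        switch (m.serializedType) {
            case SerializedType::Primitive:
                out.write(fieldPtr, m.size); // i32/u32/char/bool/enum: copy as-is
                break;
            case SerializedType::String: {
                auto* str = reinterpret_cast<RZString*>(fieldPtr);
                SerializedString s = makeSerializedString(*str); // fills length/encoding
                out.write(&s, sizeof(s)); // character blob registered separately
                break;
            }
            case SerializedType::Object: // clean POD: recurse through its members
                serializeObject(fieldPtr, RZReflectionRegistry::Get(m.typeHash), out);
                break;
            // ... Blob, Array, HashMap, ObjectArray follow the same pattern
        }
    }
}
```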
Compression
- Done inside RZSerializer using another subclass, RZCompressedArchive.
- Deferred payloads with header offset patching
- [RZFileHeader][HeaderSection][PayloadSection]
- [FileHeader] - magic, headerSize, payloadSize
- Writing to archive:
```cpp
struct RZPendingBlob
{
    size_t              headerOffset; // offset inside headerBuffer
    const void*         payload;      // original uncompressed payload pointer
    u32                 payloadSize;  // uncompressed size
    rz_compression_type compression;  // compression type
};
```
- Push payloads into this; don't write the data yet, until we finalize.
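finalize() below fills an RZFileHeader; a sketch matching exactly the fields it writes (field order/padding is illustrative):

```cpp
struct RZFileHeader
{
    u32 magic;       // 'RZIX'
    u32 version;
    u32 flags;
    u32 headerSize;  // size of the header section that follows
    u32 payloadSize; // size of the payload section after that
};
```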
```cpp
struct RZCompressedArchive
{
    RZDynamicArray<u8>*           finalBuffer;   // output
    RZDynamicArray<u8>            headerBuffer;  // all headers
    RZDynamicArray<u8>            payloadBuffer; // all payloads (compressed or raw)
    RZDynamicArray<RZPendingBlob> pendingBlobs;
    size_t                        headerCursor = 0;

    enum class Mode { Write, Read } mode;

    RZCompressedArchive(RZDynamicArray<u8>* out, Mode m)
        : finalBuffer(out), mode(m)
    {}

    // ----------------------------------
    // Header writing (same role as old write())
    // ----------------------------------
    void write(const void* src, size_t size)
    {
        size_t oldSize = headerBuffer.size();
        headerBuffer.resize(oldSize + size);
        memcpy(headerBuffer.data() + oldSize, src, size);
        headerCursor += size;
    }

    // ----------------------------------
    // Blob registration (NEW)
    // ----------------------------------
    void registerBlob(size_t headerOffset,
                      const void* payload,
                      u32 payloadSize,
                      rz_compression_type compression)
    {
        pendingBlobs.push_back({headerOffset, payload, payloadSize, compression});
    }

    // ----------------------------------
    // Finalization (patch offsets, emit payloads)
    // ----------------------------------
    void finalize()
    {
        payloadBuffer.clear();
        for (auto& pb : pendingBlobs)
        {
            size_t payloadStart = payloadBuffer.size();
            if (pb.compression != RZ_COMPRESSION_NONE)
            {
                // compress into payloadBuffer
                compress_append(payloadBuffer, pb.payload, pb.payloadSize, pb.compression);
            }
            else
            {
                payloadBuffer.resize(payloadBuffer.size() + pb.payloadSize);
                memcpy(payloadBuffer.data() + payloadStart, pb.payload, pb.payloadSize);
            }

            u32 writtenSize = static_cast<u32>(payloadBuffer.size() - payloadStart);

            // patch the blob descriptor that was written into the header section
            RZSerializedBlob* hdr =
                reinterpret_cast<RZSerializedBlob*>(headerBuffer.data() + pb.headerOffset);
            hdr->offset = sizeof(RZFileHeader) +
                          static_cast<u32>(headerBuffer.size()) +
                          static_cast<u32>(payloadStart);
            hdr->size             = writtenSize;
            hdr->decompressedSize = pb.payloadSize;
            hdr->compression      = pb.compression;
        }

        // build the final buffer: [RZFileHeader][headers][payloads]
        finalBuffer->clear();

        RZFileHeader fileHdr = {};
        fileHdr.magic       = 0x525A4958; // 'RZIX'
        fileHdr.version     = 1;
        fileHdr.flags       = 0;
        fileHdr.headerSize  = static_cast<u32>(headerBuffer.size());
        fileHdr.payloadSize = static_cast<u32>(payloadBuffer.size());

        finalBuffer->resize(sizeof(RZFileHeader) + headerBuffer.size() + payloadBuffer.size());

        u8* dst = finalBuffer->data();
        memcpy(dst, &fileHdr, sizeof(fileHdr));
        memcpy(dst + sizeof(fileHdr), headerBuffer.data(), headerBuffer.size());
        memcpy(dst + sizeof(fileHdr) + headerBuffer.size(),
               payloadBuffer.data(), payloadBuffer.size());
    }
};
```
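Roughly how a caller would drive the archive; the payload data and the LZ4 enum value are illustrative, the archive API is the one sketched above:

```cpp
RZDynamicArray<u8> fileBytes;
RZCompressedArchive ar(&fileBytes, RZCompressedArchive::Mode::Write);

// 1. Write the blob descriptor into the header section; remember where it
//    lives so finalize() can patch its offset/size later.
RZSerializedBlob blobHdr = {};
size_t hdrOffset = ar.headerBuffer.size();
ar.write(&blobHdr, sizeof(blobHdr));

// 2. Register the payload; nothing is written until finalize().
//    (pixels is hypothetical texture data, RZ_COMPRESSION_LZ4 a hypothetical enum value)
ar.registerBlob(hdrOffset, pixels.data(), (u32)pixels.size(), RZ_COMPRESSION_LZ4);

// 3. Emit [RZFileHeader][headers][payloads] into fileBytes.
ar.finalize();
```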
Tasks
- Define reflection metadata for each of the primitive/engine types as mentioned in the design above; create a Serialized version for each primitive type that a complex type will resolve into via type-registry parsing
- Use std::variant and std::visit for this, or a simple switch for each SerializableDataType
- Write a small gtest --> define a small struct Player, try to reflect and serialize it; that should give a nice starting point. Print it and load back the binary blob (a starter sketch is at the end of this issue)
- Write a simple test that uses the reflection registry to serialize primitive types into binary data and load them back (use std::filesystem for temp tests)
- Define and clean up the Serialized structs in RZSerializable.h
- Add a SerializedType enum to TypeMetaData.h and create more utility REFLECT_XXX macros that pass in the enum value
- Do a simple example of primitive-type serialization and deserialization
- Write a basic interface wrapper for LZ4 compression, or just tests are enough: integration tests to see how we will use LZ4 for some data, maybe some strings and binary data (see the sketch at the end of this issue)
- Extend this to complex types and write more tests. (use std::filesystem for temp tests) (extend switch to fill more complex types)
- Blobs
- Arrays
- String
- HashMap
- Object (basically nesting)
- ObjectArray
- Enum (same as primitive?)
- Bitfield (same as primitive?)
- Add new reflection macros to all asset types and implement any missing complex types
- UUID: just read/write the 16 bytes of memory
- Serialization Tests for binary loads
- test on arbitrary types
- test on asset payload types
- RZAsset and its subtype tests
- Add the RAZIX_ASSET macro to all asset types and update the serialization tests for final usage, fixing anything it lacks (Note: a full *.rzasset needs to be written in 2 parts by the AssetDB: serialize the RZAsset first, then the payload separately)
- Serialization tests with fake compression enabled on SerializedBlobs: trigger them manually for custom asset types and also via direct write/read ==> print the numbers in tests
- Add the RZCompressedArchive subclass for writing RZSerializedBlobs with compression and writing payloads at the end; change from inline to clumped payload structure, with minimal refactoring
- test on arbitrary types
- test on final *.rzasset types
- Remove cereal library from vendor
- Add a custom parser for *.razixproject files using nlohmann::json --> can we pass this to RZSerializer with JSON support? Whatever is fastest. Not really necessary; we can be manual with a JSON parser, or build one if things get too out of hand. The *.ini parser made sense because we have many files in a custom ini format, but only this uses JSON; we could also replace it with the s-expression parser we build for the scene graph. I mean, the data-driven FG uses it, but we could remove JSON completely.
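For the gtest and LZ4 tasks above, a starting point might look like this; Player, the commented-out REFLECT_* registration, and the round-trip shape are illustrative, while the LZ4 calls are the real C API (LZ4_compressBound, LZ4_compress_default, LZ4_decompress_safe):

```cpp
#include <gtest/gtest.h>
#include <lz4.h>
#include <cstring>
#include <string>
#include <vector>

struct Player {
    int32_t health;
    float   speed;
};
// REFLECT_STRUCT(Player, REFLECT_PRIMITIVE(health), REFLECT_PRIMITIVE(speed)); // hypothetical macros

TEST(RZSerializer, PlayerRoundTrip)
{
    Player in{100, 4.5f};
    std::vector<uint8_t> bytes(sizeof(Player));
    std::memcpy(bytes.data(), &in, sizeof(Player)); // trivially copyable: plain memcpy

    Player out{};
    std::memcpy(&out, bytes.data(), sizeof(Player));
    EXPECT_EQ(out.health, 100);
    EXPECT_FLOAT_EQ(out.speed, 4.5f);
}

TEST(RZCompression, LZ4StringRoundTrip)
{
    std::string src(4096, 'x');
    std::vector<char> compressed(LZ4_compressBound((int)src.size()));
    int csize = LZ4_compress_default(src.data(), compressed.data(),
                                     (int)src.size(), (int)compressed.size());
    ASSERT_GT(csize, 0);

    std::string decompressed(src.size(), '\0');
    int dsize = LZ4_decompress_safe(compressed.data(), decompressed.data(),
                                    csize, (int)decompressed.size());
    ASSERT_EQ(dsize, (int)src.size());
    EXPECT_EQ(decompressed, src);
}
```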