Skip to content

The Dilemma of Using InsertBinaryAsync #328

@SoftStoneDevelop

Description

@SoftStoneDevelop

For example, we have a table:

CREATE TABLE IF NOT EXISTS gedaqtests.byteuint8e0m
(
    m_id Int32,
    m_value UInt8,
    m_value_second UInt8,
    m_value_another UInt8,
)
ENGINE = MergeTree
PARTITION BY (intHash32(m_id) % 5)
ORDER BY m_id
SETTINGS index_granularity = 8192;

Let's say we need to insert data into Id + Value.
To insert data into it, we currently have two options:

Option 1:

var rows = new List<object[]>
{
    new object[] { (int)12, (byte)5 }, // row1
    new object[] { (int)12, (byte)5 }, // row2
    //...
};

long rowsInserted = await client.InsertBinaryAsync("byteuint8e0m", ["m_id", "m_value"], rows);

object[] is a problem, we will have to pack int and byte into object[] and this is get us boxing.
Since CLickhouse typically inserts very large batches of data, this can cause performance issues.

Option 2:

public class Row1
{
    [ClickHouseColumn(Type = "Int32", Name = "m_id")]
    public Int32 Id { get; set; }

    [ClickHouseColumn(Type = "UInt8", Name = "m_value")]
    public System.Byte Value { get; set; }

    [ClickHouseNotMapped()]
    public System.Byte Second { get; set; }

    [ClickHouseNotMapped()]
    public System.Byte Another { get; set; }
}

client.RegisterBinaryInsertType<Row>();

var list = new List<Row1>()
{
    new Row1
    {
        Id = 12,
        Value = 5
    },

    new Row1
    {
        Id = 15,
        Value = 6
    }
};

long rowsInserted = await client.InsertBinaryAsync("byteuint8e0m", list);

There are even more problems here:

  • In each case, we need to write Row1(m_id + m_value), Row2(m_id + m_value_second), Row3... . In applications, the table has many columns, and it's not always clear in advance which data will be in a row: "m_id + m_value" or "m_id + m_value_second" or another combination of columns. Writing POCO classes for all combinations is impossible.

  • Attributes don't work when building AOT - the application simply crashes, apparently because reflection occurs. I haven't tested it in detail, but it looks like AOT isn't supported because of the BuildMapping method and how it's implemented, but that might not be the case.

A third method is needed that will avoid these limitations. Perhaps it will be more manual. Something like this:


var import = await client.StartInsertAsync("byteuint8e0m", ["m_id", "m_value"]);

using var batch1 = import.StartNewBatch();
// fill batch 1
batch1.WriteData(0, 12);
batch1.WriteData(1, 5);

batch2.CompleteWrite();

var sendTask = import.SendBatchAsync(batch1);

using var batch2 = import.StartNewBatch();
// fill batch 2
batch2.WriteData(0, 15);
batch2.WriteData(1, 6);

batch2.CompleteWrite();

var sendTask2 = import.SendBatchAsync(batch2);

await Task.WhenAll(sendTask, sendTask2);

In this case, the ORM, not the driver developer, will handle the automation. Am I missing something? Are there existing ways to avoid the problems I encountered above?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions