Skip to content

CP-307933: avoid serializing/deserializing database fields all the time #6462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 15, 2025

Conversation

edwintorok
Copy link
Contributor

@edwintorok edwintorok commented May 9, 2025

Builds on top of #6461

Benchmark results are on individual commits.

The most significant one:

ministat confirms a speedup:

       Db.Pool.get_all_records  :
           N           Min           Max        Median           Avg        Stddev
       x 432     54115.256     493532.78      56048.42     57937.384      24117.68
       + 524     22642.778     333257.36     23595.495     24679.258     15206.708
       Difference at 95.0% confidence
               -33258.1 +/- 2513.99
               -57.4036% +/- 2.90374%
               (Student's t, pooled s = 19737.2)

       >>>>  Db.VM.set_NVRAM  :
           N           Min           Max        Median           Avg        Stddev
       x 132         36794     2355369.4     1095736.6     1107222.1      298247.5
       + 168     49167.417     1485278.1     678231.31     685636.89     161480.92
       Difference at 95.0% confidence
               -421585 +/- 52835.6
               -38.0759% +/- 3.54275%
               (Student's t, pooled s = 231767)

@edwintorok edwintorok marked this pull request as ready for review May 13, 2025 08:16
@edwintorok
Copy link
Contributor Author

Testing looks good, there was a failure regarding XML backward compat in xhad.conf, but that is not part of this PR.

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to remove the commits from the already-merged PR

So we can call this from more generic code, without worrying whether locks are held.
Skip adding to the stringpool when locks aren't held and we can't obtain an atomic lock either.

No functional change

Signed-off-by: Edwin Török <[email protected]>
This needs to be called around the same places that `ensure_utf8_xml` and `UTF8_XML.is_valid` gets called:
those are the places where new strings can enter the system.

Now that calling Share.merge is thread-safe also call it when writing new fields,
not just when creating the row.
For now the DB lock is still held when this function is called, but future refactoring might move code around.

For maps and sets we only call this when new keys get added (except in create_row, where we need to process all elements)

Signed-off-by: Edwin Török <[email protected]>
* schema access, similar to write_field: we can access it outside of create_row_locked, reducing the time we hold the DB lock.
* debug writes
* UTF8 validation and StringPool operations

We cannot move field reads in read/write pairs because we need to ensure noone else has changed the DB after we've read it.
(There are other alternatives if we used lock-free data structures, we could optimistically perform the operation on a snapshot,
 and check whether the snapshot is still the same when committing, and if not repeat the operation.
 But for now make just simple changes).

Signed-off-by: Edwin Török <[email protected]>
Stores a Value.t and its marshaled form.
A phantom type distinguishes between situations where Value.t is always present,
from situations where it may be absent.
For now it is always present, but may become absent when we start using this for `Db_interface.field`.

Document complexity of functions (most are `O(1)`, except set/map at construction time,
and unmarshal if the backwards compatible construction from string is used).

In particular operations on strings are `O(1)`, because they don't look at the actual string value,
just wrap or unwrap it from a `Value.String` variant.
set/map requires serialization in S-expression form, but this is only done once at construction time,
and we can retrieve it multiple times in `O(1)`.

No functional or performance change

Signed-off-by: Edwin Török <[email protected]>
Will make it easier to introduce UTF8 validation here.

No functional change

Signed-off-by: Edwin Török <[email protected]>
field=string for now, but will be changed to Schema.CachedValue.t later.

There are separate types for input and output fields, e.g. we may want to perform UTF8 validation on input,
which would modify both Value.t and its serialized form, so we can avoid serializing twice.

No functional change

Signed-off-by: Edwin Török <[email protected]>
This provides direct access to a Value.t when known, which avoids having to serialize and deserialize a Value.t just to read it.

Not yet used, no functional change.

Signed-off-by: Edwin Török <[email protected]>
No functional change, we still use the Compat module

Signed-off-by: Edwin Török <[email protected]>
`ministat` confirms a speedup:
```
Db.Pool.get_all_records  :
    N           Min           Max        Median           Avg        Stddev
x 432     54115.256     493532.78      56048.42     57937.384      24117.68
+ 524     22642.778     333257.36     23595.495     24679.258     15206.708
Difference at 95.0% confidence
        -33258.1 +/- 2513.99
        -57.4036% +/- 2.90374%
        (Student's t, pooled s = 19737.2)

>>>>  Db.VM.set_NVRAM  :
    N           Min           Max        Median           Avg        Stddev
x 132         36794     2355369.4     1095736.6     1107222.1      298247.5
+ 168     49167.417     1485278.1     678231.31     685636.89     161480.92
Difference at 95.0% confidence
        -421585 +/- 52835.6
        -38.0759% +/- 3.54275%
        (Student's t, pooled s = 231767)
```

Signed-off-by: Edwin Török <[email protected]>
@edwintorok
Copy link
Contributor Author

It would be good to remove the commits from the already-merged PR

Rebased, there are now only 10 commits.

github-advanced-security[bot]

This comment was marked as resolved.

@lindig
Copy link
Contributor

lindig commented May 14, 2025

@contificate could take a look

@edwintorok edwintorok added this pull request to the merge queue May 15, 2025
Merged via the queue into xapi-project:master with commit bac8dcf May 15, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants