Hi, I've been playing around with Typebox and I'm a bit concerned about the performance of Value.Default, Value.Clean, etc.
I've created a simple playground repo to do some benchmarking, and here's a preview:
> node --expose-gc index.js -i 10000
Running 10,000 iterations with GC exposed: true
┌─────────┬─────────────────────────┬────────────┬──────────────────┬─────────────┐
│ (index) │ name │ iterations │ time │ performance │
├─────────┼─────────────────────────┼────────────┼──────────────────┼─────────────┤
│ 0 │ 'Typebox without Parse' │ 10000 │ ' 18.114 ms' │ '1.00x' │
│ 1 │ 'Typebox with Parse' │ 10000 │ ' 3417.595 ms' │ '0.01x' │
│ 2 │ 'Zod' │ 10000 │ ' 404.747 ms' │ '0.04x' │
└─────────┴─────────────────────────┴────────────┴──────────────────┴─────────────┘
I'm totally aware that micro-benchmarking is not very representative, and there are a ton of different things that can impact the performance, as well as fundamental differences between how Zod and Typebox operate. Therefore, these numbers are just a reference and something to base the conversation on.
From my investigation, unions, nested objects, and arrays have the biggest impact on performance, which sounds reasonable. The reason is also fairly clear: by architectural design, all the actions (applying default values, cleaning extra properties, etc.) in Typebox are built to be independently consumable. But this leads to unsatisfactory performance when a use case requires passing a value through the Typebox parse pipeline with multiple actions. Most of those actions deeply traverse the value, so the expensive recursive traversal is repeated for EACH step. This is one of the biggest differences from how Zod operates: it does all the validation, default value resolution, extra property cleanup, etc. in one go.
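To make the traversal cost concrete, here is a minimal sketch that counts how many value nodes get visited when two actions run as separate passes versus one fused pass. It uses a toy schema model, not Typebox's real types or internals; `Schema`, `applyDefaults`, `clean`, and `defaultAndClean` are hypothetical names invented for illustration.

```typescript
// Toy schema model for illustration only (NOT Typebox's real types).
type Schema =
  | { kind: 'number'; default?: number }
  | { kind: 'array'; items: Schema }
  | { kind: 'object'; properties: Record<string, Schema> }

type Obj = Record<string, unknown>
const isObj = (v: unknown): v is Obj => typeof v === 'object' && v !== null && !Array.isArray(v)

let visits = 0 // number of value nodes touched so far

// Action 1: fill in missing values from `default`.
function applyDefaults(schema: Schema, value: unknown): unknown {
  visits++
  switch (schema.kind) {
    case 'number':
      return value === undefined ? schema.default : value
    case 'array': {
      if (!Array.isArray(value)) return value
      const out: unknown[] = []
      for (const item of value) out.push(applyDefaults(schema.items, item))
      return out
    }
    case 'object': {
      if (!isObj(value)) return value
      const out: Obj = { ...value }
      for (const key of Object.keys(schema.properties)) {
        const next = applyDefaults(schema.properties[key], out[key])
        if (next !== undefined) out[key] = next
      }
      return out
    }
  }
}

// Action 2: drop properties the schema does not know about.
function clean(schema: Schema, value: unknown): unknown {
  visits++
  switch (schema.kind) {
    case 'number':
      return value
    case 'array': {
      if (!Array.isArray(value)) return value
      const out: unknown[] = []
      for (const item of value) out.push(clean(schema.items, item))
      return out
    }
    case 'object': {
      if (!isObj(value)) return value
      const out: Obj = {}
      for (const key of Object.keys(schema.properties)) {
        if (key in value) out[key] = clean(schema.properties[key], value[key])
      }
      return out
    }
  }
}

// Fused variant: both actions applied during a single traversal.
function defaultAndClean(schema: Schema, value: unknown): unknown {
  visits++
  switch (schema.kind) {
    case 'number':
      return value === undefined ? schema.default : value
    case 'array': {
      if (!Array.isArray(value)) return value
      const out: unknown[] = []
      for (const item of value) out.push(defaultAndClean(schema.items, item))
      return out
    }
    case 'object': {
      if (!isObj(value)) return value
      const out: Obj = {}
      for (const key of Object.keys(schema.properties)) {
        const next = defaultAndClean(schema.properties[key], value[key])
        if (next !== undefined) out[key] = next
      }
      return out
    }
  }
}

const item: Schema = { kind: 'object', properties: { id: { kind: 'number' }, qty: { kind: 'number', default: 1 } } }
const schema: Schema = { kind: 'object', properties: { items: { kind: 'array', items: item } } }
const input = { items: [{ id: 1, extra: true }, { id: 2, qty: 5 }], junk: 'x' }

visits = 0
const twoPass = clean(schema, applyDefaults(schema, input))
const twoPassVisits = visits // 16: every node is visited once per action

visits = 0
const onePass = defaultAndClean(schema, input)
const onePassVisits = visits // 8: every node is visited once in total
```

Both variants produce the same result here, but the visit count of the separate-pass version grows with the number of actions in the pipeline, which is the effect described above.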
Another thing I noticed is that, for example, for Value.Default to correctly resolve and apply default values for union types, it's necessary to run Check on the value for each subschema/subtype to figure out which schema to take the default value from. But in this case it always uses runtime validation, which can be significantly slower than compiled validation (example from Typebox's readme):
┌────────────────────────────┬────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ (index) │ Iterations │ ValueCheck │ Ajv │ TypeCompiler │ Performance │
├────────────────────────────┼────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ Array_Composite_Union │ 1000000 │ ' 1331 ms' │ ' 76 ms' │ ' 40 ms' │ ' 1.90 x' │
└────────────────────────────┴────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
So I wonder if there's room for improvement here. Do you see any way to optimize it? Here are a couple of things I've been thinking about:
- Is it possible to squash all the actions of Value.Parse into a single deep traversal of the value? So, instead of having a separate
function Visit(schema: TSchema, references: TSchema[], value: unknown): any {
  const references_ = Pushref(schema, references)
  const schema_ = schema as any
  switch (schema_[Kind]) {
    case 'Array':
      return FromArray(schema_, references_, value)
    case 'Date':
      return FromDate(schema_, references_, value)
    // ...
  }
}
for every action, make it so that the list of actions to perform is passed to each type-specific function:
function FromDate(actions: ParseAction[], schema: TSchema, references: TSchema[], value: unknown) {
  return Value.Parse(actions, schema, references, value)
}
function FromArray(actions: ParseAction[], schema: TSchema, references: TSchema[], value: unknown) {
  if (IsArray(value)) {
    for (let i = 0; i < value.length; i++) {
      value[i] = Value.Parse(actions, schema.items, references, value[i])
    }
  }
  return value
}
function Visit(actions: ParseAction[], schema: TSchema, references: TSchema[], value: unknown): any {
  const references_ = Pushref(schema, references)
  const schema_ = schema as any
  switch (schema_[Kind]) {
    case 'Array':
      return FromArray(actions, schema_, references_, value)
    case 'Date':
      return FromDate(actions, schema_, references_, value)
    // ...
  }
}
- Is it possible to extend the compiler so that it also compiles and saves references to the optimized check functions of all nested types/schemas, so they can be passed to Value.Default (and others) and accessed there, just like you do with references? Something like this:
function FromUnion(schema: TUnion, references: TSchema[], value: any, checks: Map<TSchema, CheckFn>): unknown {
  for (const subschema of schema.anyOf) {
    const converted = Visit(subschema, references, Clone(value))
    const checkFn = checks.get(subschema) ?? Check.bind(null, subschema)
    if (!checkFn(references, converted)) continue
    return converted
  }
  return value
}
This should probably improve performance significantly for union and intersection types.
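The wiring of that second idea can be sketched outside Typebox. This is only a toy under stated assumptions: `ObjectSchema`, `UnionSchema`, `check`, `compile`, `withDefaults`, and `defaultUnion` are hypothetical names, and the toy `compile` merely stands in for a real compiler (it precomputes a closure and is not meant to demonstrate an actual speedup). What it shows is the lookup-with-fallback: use a precompiled check when one is registered for the subschema, otherwise fall back to the interpreted check.

```typescript
// Toy model for illustration only (NOT Typebox's real types or API).
type Prim = 'string' | 'number'
type ObjectSchema = { properties: Record<string, { type: Prim; default?: unknown }> }
type UnionSchema = { anyOf: ObjectSchema[] }
type CheckFn = (value: unknown) => boolean

let interpretedCalls = 0
let compiledCalls = 0

// Interpreted check: walks the schema on every call.
function check(schema: ObjectSchema, value: unknown): boolean {
  interpretedCalls++
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  for (const key of Object.keys(schema.properties)) {
    if (typeof record[key] !== schema.properties[key].type) return false
  }
  return true
}

// Stand-in for a compiler: inspects the schema once and returns a closure.
function compile(schema: ObjectSchema): CheckFn {
  const keys = Object.keys(schema.properties)
  const types = keys.map((key) => schema.properties[key].type)
  return (value) => {
    compiledCalls++
    if (typeof value !== 'object' || value === null) return false
    const record = value as Record<string, unknown>
    for (let i = 0; i < keys.length; i++) {
      if (typeof record[keys[i]] !== types[i]) return false
    }
    return true
  }
}

// Copy the value and fill in missing properties from `default`.
function withDefaults(schema: ObjectSchema, value: unknown): unknown {
  if (typeof value !== 'object' || value === null) return value
  const out: Record<string, unknown> = { ...(value as Record<string, unknown>) }
  for (const key of Object.keys(schema.properties)) {
    const fallback = schema.properties[key].default
    if (out[key] === undefined && fallback !== undefined) out[key] = fallback
  }
  return out
}

// Union defaulting: try each member, keep the first candidate that checks.
function defaultUnion(schema: UnionSchema, value: unknown, checks: Map<ObjectSchema, CheckFn>): unknown {
  for (const member of schema.anyOf) {
    const candidate = withDefaults(member, value)
    // Prefer the precompiled check; fall back to the interpreted one.
    const checkFn = checks.get(member) ?? ((v: unknown) => check(member, v))
    if (checkFn(candidate)) return candidate
  }
  return value
}

const shape: ObjectSchema = { properties: { radius: { type: 'number' }, unit: { type: 'string', default: 'px' } } }
const label: ObjectSchema = { properties: { text: { type: 'string' }, lang: { type: 'string', default: 'en' } } }
const union: UnionSchema = { anyOf: [shape, label] }

// Built once, e.g. at schema compile time, then reused for every value.
const checks = new Map<ObjectSchema, CheckFn>()
for (const member of union.anyOf) checks.set(member, compile(member))

const withCompiled = defaultUnion(union, { text: 'hi' }, checks)
const withFallback = defaultUnion(union, { text: 'hi' }, new Map<ObjectSchema, CheckFn>())
```

With the populated map, both member checks go through the precompiled closures; with an empty map, the same call falls back to the interpreted `check`, so existing callers would keep working.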