`String` types are internally stored as `Utf8String` objects that are lazily parsed and can be set with a `CharSequence`. Since Java `String` objects are immutable, there are additional accessors that decode the characters into a reusable `StringBuilder` instance, as well as accessors that accept a custom `Utf8Decoder`, which can implement interning.
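The interning idea can be illustrated without the library. The following is a minimal standalone sketch. `InterningDecoder` and its `decode(byte[], int, int)` method are hypothetical names; they do not reproduce the actual `Utf8Decoder` interface, whose signature is not shown here. The sketch caches decoded strings keyed by their UTF-8 bytes, so identical payloads return the same `String` instance.

```Java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interning decoder; NOT the QuickBuffers Utf8Decoder interface.
final class InterningDecoder {

    // Keys wrap private copies of the UTF-8 bytes, so later changes to the
    // caller's buffer cannot corrupt them. ByteBuffer equals/hashCode compare
    // only the remaining content, independent of the starting position.
    private final Map<ByteBuffer, String> cache = new HashMap<>();

    // Decodes buffer[offset, offset + length) and returns a shared String
    // instance for identical byte content. Note: the cache is unbounded.
    public String decode(byte[] buffer, int offset, int length) {
        // Probe without copying, so only a cache miss allocates.
        String cached = cache.get(ByteBuffer.wrap(buffer, offset, length));
        if (cached != null) {
            return cached;
        }
        byte[] copy = Arrays.copyOfRange(buffer, offset, offset + length);
        String value = new String(copy, StandardCharsets.UTF_8);
        cache.put(ByteBuffer.wrap(copy), value);
        return value;
    }

    public static void main(String[] args) {
        InterningDecoder decoder = new InterningDecoder();
        byte[] first = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] second = "[hello]".getBytes(StandardCharsets.UTF_8);
        String a = decoder.decode(first, 0, first.length);
        String b = decoder.decode(second, 1, 5);
        System.out.println(a == b); // prints true: same cached instance
    }
}
```

Probing with a non-copying `ByteBuffer` view keeps cache hits allocation-free. A production version would likely also bound the cache size.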
```proto
// .proto
message SimpleMessage {
    // ...
}
```

```Java
// simplified generated code
public final class SimpleMessage {
    public SimpleMessage setOptionalString(CharSequence value); // copies data
    // ...
}
```
Note that this test was done using the original SBE .proto definitions. If the varint types are replaced with a less expensive encoding, e.g., `fixed64/32` instead of `int64/32`, the results improve by 30-50%. Additionally inlining the small nested fields results in more than 5x the original message throughput. Overall, be aware that there is a significant trade-off between wire size and encoding speed.
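To make the trade-off concrete, here is a standalone sketch of the two wire encodings for a 64-bit value. It is not the QuickBuffers implementation, and `WireSketch` and its methods are hypothetical names. The `int64` varint loop emits 7 bits per byte, and its iteration count depends on the value. `fixed64` always writes exactly 8 little-endian bytes.

```Java
// Standalone sketch of protobuf's varint (int64) and fixed64 encodings.
final class WireSketch {

    // Writes value as a base-128 varint and returns the new offset.
    // Negative values always take 10 bytes because the top bit is set.
    static int writeVarint64(byte[] buf, int offset, long value) {
        while ((value & ~0x7FL) != 0) {
            buf[offset++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits + continuation bit
            value >>>= 7;
        }
        buf[offset++] = (byte) value; // last byte has no continuation bit
        return offset;
    }

    // Writes value as 8 little-endian bytes and returns the new offset.
    static int writeFixed64(byte[] buf, int offset, long value) {
        for (int i = 0; i < 8; i++) {
            buf[offset++] = (byte) (value >>> (8 * i));
        }
        return offset;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[10]; // a varint64 needs at most 10 bytes
        long[] samples = {1L, 300L, 1L << 40, -1L};
        for (long v : samples) {
            int varintSize = writeVarint64(buf, 0, v);
            int fixedSize = writeFixed64(buf, 0, v);
            System.out.println(v + ": varint=" + varintSize + " bytes, fixed64=" + fixedSize + " bytes");
        }
    }
}
```

For the sample values, the varint sizes are 1, 2, 6 and 10 bytes, while `fixed64` is always 8. Varints win on size for small numbers. The fixed encoding avoids the value-dependent loop.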
We also compared the built-in JSON encoding and found that, for this particular benchmark, the message throughput is on par with Protobuf-Java. However, at 559 bytes (car) and 435 bytes (market), the uncompressed JSON output is significantly larger than the binary encoding.
We also ran benchmarks for reading and writing streams of delimited protobuf messages with varying contents, which is similar to reading sequentially from a log file. All datasets were loaded into memory and decoded from a byte array. This benchmark does not trigger lazy parsing of strings, so the results mainly reflect forwarding use cases in which string contents are never accessed. The benchmark code can be found in the `benchmarks` directory.
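For reference, a stream of delimited messages is a sequence of records. Each record is prefixed with its length encoded as a varint, which is the framing used by Protobuf-Java's `writeDelimitedTo`/`parseDelimitedFrom`. Below is a minimal sketch of iterating such a stream held in a byte array. It assumes well-formed input. `DelimitedReader` is a hypothetical helper, and it returns raw payload bytes instead of parsing them into messages.

```Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits a byte array of varint-length-delimited records.
final class DelimitedReader {

    private final byte[] data;
    private int position;

    DelimitedReader(byte[] data) {
        this.data = data;
    }

    boolean hasNext() {
        return position < data.length;
    }

    // Reads a varint32 length prefix (at most 5 bytes); assumes non-truncated input.
    private int readRawVarint32() {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            byte b = data[position++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed varint");
    }

    // Returns the next record's payload; a real reader would parse it into a message.
    byte[] next() {
        int length = readRawVarint32();
        byte[] payload = Arrays.copyOfRange(data, position, position + length);
        position += length;
        return payload;
    }

    public static void main(String[] args) {
        // Two records, {0x08, 0x01} and {0x08, 0x96, 0x01}, each with a 1-byte length prefix.
        byte[] stream = {2, 0x08, 0x01, 3, 0x08, (byte) 0x96, 0x01};
        DelimitedReader reader = new DelimitedReader(stream);
        List<byte[]> records = new ArrayList<>();
        while (reader.hasNext()) {
            records.add(reader.next());
        }
        System.out.println(records.size() + " records"); // prints "2 records"
    }
}
```

Copying each payload keeps the sketch simple. A reader aimed at forwarding workloads would more likely hand out offsets into the shared array instead.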
While the official C++ benchmark shows tremendous performance benefits of FlatBuffers over Protobuf, the Java implementation has unfortunately been lagging behind a bit. Recent versions have brought some significant performance improvements, but encoding and traversing a `ByteBuffer` still results in more overhead than one might expect.
It is also worth noting that the benchmark was created with a bias for FlatBuffers. The original data consists mostly of large varint numbers (e.g., a 10-byte int64) and repeated messages with multiple levels of nesting, which is a particularly bad case for Protobuf. Messages with a flatter hierarchy and more fixed-size scalar types should fare much better.
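Deep nesting is costly because every nested message is written with a varint length prefix. An encoder that writes front to back must therefore know each sub-message's size before writing it, so sizes get computed bottom-up. The sketch below uses a hypothetical `Node` type with an `int64` field 1 and a repeated nested field 2, and assumes proto3-style skipping of zero values. It computes the serialized size recursively and shows that a negative `int64` costs 10 bytes. It is illustrative only and does not reflect how Protobuf-Java or QuickBuffers compute or cache sizes.

```Java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message: field 1 = int64 value, field 2 = repeated Node children.
final class Node {
    final long value;
    final List<Node> children = new ArrayList<>();

    Node(long value) {
        this.value = value;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Number of bytes needed to encode v as an unsigned base-128 varint.
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    // Each field costs a 1-byte tag (field numbers 1 and 2) plus its payload.
    // Nested messages also need a varint length prefix, so a child's size
    // must be known before the parent can write it.
    int serializedSize() {
        int size = 0;
        if (value != 0) {
            size += 1 + varintSize(value); // tag + int64 varint (zero is skipped)
        }
        for (Node child : children) {
            int childSize = child.serializedSize(); // recursive, bottom-up
            size += 1 + varintSize(childSize) + childSize; // tag + length + payload
        }
        return size;
    }

    public static void main(String[] args) {
        Node leaf = new Node(-1L); // int64 -1 is a 10-byte varint
        Node mid = new Node(5L).add(leaf);
        Node root = new Node(1L).add(mid).add(new Node(300L));
        System.out.println("leaf=" + leaf.serializedSize() + " root=" + root.serializedSize());
    }
}
```

Here the leaf alone takes 11 bytes (1-byte tag plus 10-byte varint), and the whole tree takes 24. An encoder that asks for each nested message's size at every level repeats this recursion, which is why implementations typically cache computed sizes.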