@afiorillo afiorillo commented Nov 7, 2025

Hello!

I'm a fan of orjson and use it extensively. Thanks for the great work!

I recently found myself trying to serialize a dict with numpy floats as keys. orjson supports serializing numpy, and it supports serializing non-string keys, but it does not support serializing non-string numpy keys. This PR hopefully introduces support for this in a nice way. I'm not the first person to have encountered this issue; see for example #604.
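For readers unfamiliar with the workaround referred to in this PR (stringifying keys in Python before serializing), here is a minimal sketch. It is not code from the PR: `stringify_keys` is a hypothetical helper name, and the standard-library `json` module with `datetime.date` keys stands in for orjson with NumPy keys so that the snippet runs without third-party packages.

```python
import json
from datetime import date


def stringify_keys(mapping):
    """Return a copy of `mapping` with every key converted via str()."""
    return {str(key): value for key, value in mapping.items()}


# Stand-in data: date keys are not accepted as dict keys by json.dumps,
# much like the NumPy scalar keys described above (an analogy, not orjson itself).
readings = {date(1930, 1, 1): 3.14, date(1930, 1, 2): 2.72}

try:
    json.dumps(readings)
    rejected = False
except TypeError:
    # json.dumps only accepts str, int, float, bool or None as keys.
    rejected = True

# The workaround: convert keys to strings first, then serialize.
encoded = json.dumps(stringify_keys(readings))
print(rejected, encoded)
```

The cost of this approach is an extra pass over the dict in Python, which is what the benchmarks in this PR compare against.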

From the contribution guidelines, I think my checklist to completion looks something like:

  • Tests. I wrote serialization tests within test_non_str_keys.py to cover all the same Numpy scalar types covered in test_numpy.py.
  • A working feature. Originally I used serde_json::to_string(...) to reuse the existing serializer implementation for NumpyScalar but this (A) doesn't feel very efficient, and (B) ruined datetimes. Instead I made a to_string() implementation which is very similar.
  • Error handling. The only serialization path that can fail is when we're given a numpy.datetime64 and this returns a serialization error if it's invalid.
  • Benchmarks. I performed some micro-benchmarking and found that this approach is often slower than the workaround of stringifying the keys in Python first. I share some examples below.
  • Documentation. I think this warrants an extra sentence or two to clarify it's possible.
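The "ruined datetimes" problem mentioned in the checklist is described in a later commit message as double escaping: a full JSON serializer turns the time string into a quoted JSON string, and that quoted string is then escaped again when used as a key. The sketch below reproduces the effect with Python's standard `json` module as an analogy. It is an illustration only, not the Rust code from this PR, and the date value is made up.

```python
import json

timestamp = "1970-01-01"

# Step 1: a full JSON serializer wraps the value in quotes.
as_json_value = json.dumps(timestamp)  # the 12 characters "1970-01-01", quotes included

# Step 2: using that already-quoted text as a key escapes the quotes again.
double_encoded = json.dumps({as_json_value: 3.14})

# What we actually want: the raw text used directly as the key.
single_encoded = json.dumps({timestamp: 3.14})

print(double_encoded)  # {"\"1970-01-01\"": 3.14}
print(single_encoded)  # {"1970-01-01": 3.14}
```

This is why a key needs the plain text form of the value rather than its JSON-encoded form.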

Benchmarks

Here are some micro-benchmarks:

In [2]: dates = {dt: 3.14 for dt in np.arange('1930', '2030', dtype='datetime64[D]')}

In [3]: epochs = {dt.astype(np.int64): v for dt, v in dates.items()}

In [4]: %timeit json.dumps({str(date): value for date, value in dates.items()})
13.5 ms ± 3.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit orjson.dumps({str(date): value for date, value in dates.items()})
6.76 ms ± 36.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit orjson.dumps(dates, option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_NON_STR_KEYS)
17 ms ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit json.dumps({str(timestamp): value for timestamp, value in epochs.items()})
8.93 ms ± 95.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit orjson.dumps({str(timestamp): value for timestamp, value in epochs.items()})
5.3 ms ± 19.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit orjson.dumps(epochs, option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_NON_STR_KEYS)
3.02 ms ± 17.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: strings_with_datetime_values = {str(dt): dt for dt in dates}

In [11]: datetimes_with_string_values = {dt: str(dt) for dt in dates}

In [12]: %timeit orjson.dumps(strings_with_datetime_values, option=orjson.OPT_SERIALIZE_NUMPY)
20.7 ms ± 4.83 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [13]: %timeit orjson.dumps(strings_with_datetime_values, option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_NON_STR_KEYS)
21.1 ms ± 5.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

From this we can draw some conclusions:

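  1. All of the orjson.dumps serializations are faster than json.dumps, except for np.datetime64 keys with the new option (17 ms vs. 13.5 ms on lines 4 and 6, a gap within the measurement error).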
  2. When serializing 36.5k numpy.int64 keys, it is ~40% faster to use the extra option than to stringify the keys in Python (3.02 ms vs. 5.3 ms).
  3. For np.datetime64 keys, it is surprisingly faster (by a wide margin) to convert each date into a string in Python and then call orjson.dumps(...) without options. On lines 4-6 we see that for 36.5k dates, stringifying outside of orjson takes about 40% of the time (6.76 ms vs. 17 ms). I was a bit surprised by this and found that for much larger datasets the relative improvement is smaller, but it's still there.
  4. The previous point can perhaps be explained by a general extra cost of serializing NumPy. On lines 12 and 13 we serialize 36.5k np.datetime64 and 36.5k strs: once with numpy as the values (using the existing option) and once with numpy as the keys (using the new option). Here the serialization time with numpy keys is still slightly slower, but well within the error of the two samples.
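The relative figures in these conclusions can be checked directly from the mean timings reported above. The snippet below only redoes that arithmetic; the dictionary labels and the `time_saved` helper are made up for illustration, and the error bars are ignored.

```python
# Mean timings in milliseconds, copied from the %timeit output above.
timings_ms = {
    "json, str(date) keys": 13.5,           # In [4]
    "orjson, str(date) keys": 6.76,         # In [5]
    "orjson, datetime64 keys": 17.0,        # In [6]
    "json, str(epoch) keys": 8.93,          # In [7]
    "orjson, str(epoch) keys": 5.3,         # In [8]
    "orjson, int64 keys": 3.02,             # In [9]
}


def time_saved(baseline_ms, candidate_ms):
    """Fraction of the baseline runtime saved by the candidate."""
    return 1 - candidate_ms / baseline_ms


# int64 keys: new option vs. stringifying in Python.
epoch_saving = time_saved(
    timings_ms["orjson, str(epoch) keys"], timings_ms["orjson, int64 keys"]
)

# datetime64 keys: stringifying in Python as a fraction of the new option's runtime.
date_ratio = timings_ms["orjson, str(date) keys"] / timings_ms["orjson, datetime64 keys"]

print(f"int64 keys: {epoch_saving:.0%} less time with the new option")
print(f"datetime64 keys: workaround takes {date_ratio:.0%} of the new option's time")
```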

…light the "solution". Since really serde_json::to_str is producing a JSON string of the given value, it escapes the time string and that gets escaped again as the key string. So that needs fixing.
…t creating a whole serializer. It still needs error handling for the malformed datetimes.
…ts add ~400ms to the runtime of `test/test_non_str_keys.py` compared to the upstream, of which these contribute almost all of it
@afiorillo afiorillo marked this pull request as ready for review November 8, 2025 21:34