We have found that json.NewDecoder(in).Decode(&out), i.e. the streaming decoder, performs very poorly.
Compared to json.Unmarshal, it is about 80-90x slower.
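For anyone who wants to compare the two paths on their own machine, here is a minimal timing sketch. The `payload` type, the `makeInput` helper, the input size, and the round count are all made up for illustration and are not the input from our measurements. Whether it shows the same gap will depend on the Go version and the shape of the JSON.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// payload is a hypothetical document shape used only for this comparison.
type payload struct {
	Items []string `json:"items"`
}

// makeInput builds a JSON document containing n 64-byte strings.
func makeInput(n int) []byte {
	p := payload{Items: make([]string, n)}
	for i := range p.Items {
		p.Items[i] = strings.Repeat("x", 64)
	}
	data, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return data
}

// decodeUnmarshal decodes the whole buffer in one call.
func decodeUnmarshal(data []byte) (payload, error) {
	var out payload
	err := json.Unmarshal(data, &out)
	return out, err
}

// decodeStream decodes the same bytes through the streaming decoder.
func decodeStream(data []byte) (payload, error) {
	var out payload
	err := json.NewDecoder(bytes.NewReader(data)).Decode(&out)
	return out, err
}

// timeIt returns the average wall-clock duration of f over the given rounds.
func timeIt(rounds int, f func()) time.Duration {
	start := time.Now()
	for i := 0; i < rounds; i++ {
		f()
	}
	return time.Since(start) / time.Duration(rounds)
}

func main() {
	data := makeInput(10000) // assumed size, adjust to match your input
	unmarshal := timeIt(20, func() {
		if _, err := decodeUnmarshal(data); err != nil {
			panic(err)
		}
	})
	stream := timeIt(20, func() {
		if _, err := decodeStream(data); err != nil {
			panic(err)
		}
	})
	fmt.Printf("json.Unmarshal: %v per decode\n", unmarshal)
	fmt.Printf("Decoder.Decode: %v per decode\n", stream)
}
```

For more reliable numbers, the same two functions can be wrapped in `testing.B` benchmarks and run with `go test -bench`.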
Profiling shows that memmove is called far too often and accounts for 96% of the execution time.
These two places are the culprits:
Looking at the code, the problem is that those append calls (and hence memmove) are made for essentially every character in the input stream.
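To make the pattern concrete, here is a generic illustration of the difference between appending one byte at a time and appending a whole chunk at once. This is not the encoding/json source. The function names `appendPerByte` and `appendBulk` are hypothetical, and the sketch only counts append calls rather than measuring the copy cost.

```go
package main

import "fmt"

// appendPerByte copies src into a new slice one byte at a time,
// so append runs once for every input byte.
func appendPerByte(src []byte) ([]byte, int) {
	var dst []byte
	calls := 0
	for _, c := range src {
		dst = append(dst, c)
		calls++
	}
	return dst, calls
}

// appendBulk copies src into a new slice with a single append call.
func appendBulk(src []byte) ([]byte, int) {
	var dst []byte
	dst = append(dst, src...)
	return dst, 1
}

func main() {
	src := []byte(`{"key":"value"}`)
	_, perByte := appendPerByte(src)
	_, bulk := appendBulk(src)
	fmt.Printf("per-byte append calls: %d, bulk append calls: %d\n", perByte, bulk)
}
```

Both functions produce the same bytes. The per-byte version simply pays the append overhead once per input character, which is the kind of cost described above.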