I was reading The Book of JOSH and saw the following statement:
“Json delivers on what XML promised. Simple to understand, effective data markup accessible and usable by human and computer alike. Serialization/Deserialization is on par with or faster then XML, Thrift and Protocol Buffers.”
That seemed a bit too definite for my taste. There are so many variables that can affect the results that I was interested in more information, so I asked for it and eventually got an answer.
I had a brief look at the benchmark referenced and that was enough to come up with some talking points. To make it easier to follow, I will just compare protocol buffers and json (jackson). I started by running the benchmark in my machine (java 1.6.0_14-ea-b03):
| | Object create | Serialization | Deserialization | Serialized size (bytes) |
|---|---|---|---|---|
| protobuf | 312.95730 | 3052.26500 | 2340.84600 | 217 |
| json | 182.64535 | 2284.88300 | 3362.31850 | 310 |
Ok, so json doesn’t seem to be faster on deserialization and the size is almost 50% bigger (a big deal if the network is the bottleneck as is often the case). Why is serialization of protobuf so slow though? Let’s see the code:
```java
public byte[] serialize(MediaContent content, ByteArrayOutputStream baos) throws IOException {
    content.writeTo(baos);
    return baos.toByteArray();
}
```
How about we replace that with content.toByteArray()?
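To make the difference concrete, here is a sketch of the two serialization paths side by side. `FakeMessage` below is a hypothetical stand-in for a protobuf-generated class, not the benchmark's actual `MediaContent`; the real generated classes expose the same `writeTo(OutputStream)` and `toByteArray()` methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for a protobuf-generated message class.
class FakeMessage {
    private final byte[] payload = "example".getBytes();

    // Streams the encoded bytes into the supplied OutputStream.
    void writeTo(OutputStream out) throws IOException {
        out.write(payload);
    }

    // Allocates and fills a byte[] directly, with no intermediate stream.
    byte[] toByteArray() {
        return payload.clone();
    }
}

public class SerializeSketch {
    // The benchmark's original shape: stream into a ByteArrayOutputStream,
    // then copy its internal buffer out again -- two copies of the data.
    static byte[] serializeViaStream(FakeMessage m) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        m.writeTo(baos);
        return baos.toByteArray();
    }

    // The proposed replacement: one call, one copy.
    static byte[] serializeDirect(FakeMessage m) {
        return m.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        FakeMessage m = new FakeMessage();
        // Both paths produce identical bytes; the direct path just
        // skips the intermediate buffer and its extra copy.
        System.out.println(java.util.Arrays.equals(
                serializeViaStream(m), serializeDirect(m)));
    }
}
```

The bytes are the same either way; only the amount of copying changes, which is what shows up in the serialization column below.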
| | Object create | Serialization | Deserialization | Serialized size (bytes) |
|---|---|---|---|---|
| protobuf | 298.89330 | 2087.79800 | 2339.44450 | 217 |
| json (jackson) | 174.49190 | 2482.53350 | 3599.90800 | 310 |
That’s more like it. Let’s try something a bit more exotic just for fun and add `-XX:+DoEscapeAnalysis`:
| | Object create | Serialization | Deserialization | Serialized size (bytes) |
|---|---|---|---|---|
| protobuf | 260.51330 | 1925.32300 | 2302.74250 | 217 |
| json (jackson) | 176.20370 | 2385.99750 | 3647.01700 | 310 |
That reduces some of the cost of object creation for protobuf, but it’s still substantially slower than json. This is not hard to believe given the builder pattern employed by the Java classes that protocol buffers generates, but I haven’t investigated it in more detail. In any case, protocol buffers is better in 3 of the 4 measures for this particular benchmark.
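For readers unfamiliar with the pattern, here is a minimal sketch of the builder style that protobuf-generated Java classes use. `Media` and its nested `Builder` are simplified stand-ins of my own, not the generated benchmark classes, but they show why construction costs more than filling a plain mutable POJO:

```java
// Minimal sketch of the builder pattern used by protobuf-generated
// Java classes; Media and Builder are simplified stand-ins.
public class BuilderSketch {
    static final class Media {
        final String uri;
        final int duration;

        private Media(Builder b) {
            this.uri = b.uri;
            this.duration = b.duration;
        }

        static final class Builder {
            private String uri;
            private int duration;

            // Each setter returns the builder so calls can be chained.
            Builder setUri(String uri) { this.uri = uri; return this; }
            Builder setDuration(int d) { this.duration = d; return this; }

            // build() allocates the immutable message on top of the
            // builder itself -- two objects where a plain POJO needs one.
            Media build() { return new Media(this); }
        }
    }

    public static void main(String[] args) {
        Media m = new Media.Builder()
                .setUri("http://example.com/video")
                .setDuration(18)
                .build();
        System.out.println(m.uri + " " + m.duration);
    }
}
```

The extra allocation per message is a plausible contributor to the object-creation gap in the tables above, though, as I said, I haven’t profiled it.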
What does this mean? Not a lot. As usual, where performance is important, you should create benchmarks that mirror your application and environment. I just couldn’t let the blanket “json is on par with or faster than…” statement pass without a bit of scrutiny. ;)