I was reading The Book of JOSH and saw the following statement:

“Json delivers on what XML promised. Simple to understand, effective data markup accessible and usable by human and computer alike. Serialization/Deserialization is on par with or faster then [sic] XML, Thrift and Protocol Buffers.”

That seemed a bit too definite for my taste. So many variables can affect results like these that I wanted more detail, so I asked for it and eventually got an answer.

I had a brief look at the referenced benchmark, and that was enough to come up with some talking points. To keep things easy to follow, I will just compare protocol buffers and json (jackson). I started by running the benchmark on my machine (java 1.6.0_14-ea-b03):

                Object create  Serialization  Deserialization  Serialized size
protobuf            312.95730     3052.26500       2340.84600              217
json (jackson)      182.64535     2284.88300       3362.31850              310

Ok, so json doesn’t seem to be faster on deserialization, and the serialized size is almost 50% bigger (a big deal if the network is the bottleneck, as it often is). Why is protobuf serialization so slow, though? Let’s see the code:

    public byte[] serialize(MediaContent content, ByteArrayOutputStream baos) throws IOException {
        content.writeTo(baos);
        return baos.toByteArray();
    }

How about we replace that with content.toByteArray()?
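The point is that the stream-based version copies the bytes twice: once into the stream's internal buffer, and once more when `toByteArray()` is called on the stream, whereas the generated message's own `toByteArray()` writes straight into a correctly sized array. Here is a minimal, self-contained sketch of the difference; `PAYLOAD` merely stands in for the serialized message, since the real `MediaContent` class is generated by protoc:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SerializeDemo {
    // Stand-in for the message's wire bytes; not real protobuf output.
    static final byte[] PAYLOAD = "example payload".getBytes();

    // Shape of the original benchmark: write into a stream, then copy out again.
    static byte[] serializeViaStream() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(PAYLOAD);       // copy 1: into the stream's internal buffer
        return baos.toByteArray(); // copy 2: internal buffer -> fresh array
    }

    // Shape of the suggested replacement: one copy into a sized array,
    // which is what the generated toByteArray() does.
    static byte[] serializeDirect() {
        return PAYLOAD.clone();    // single copy
    }

    public static void main(String[] args) throws IOException {
        boolean same = java.util.Arrays.equals(serializeViaStream(), serializeDirect());
        System.out.println("identical bytes: " + same);
    }
}
```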

                Object create  Serialization  Deserialization  Serialized size
protobuf            298.89330     2087.79800       2339.44450              217
json (jackson)      174.49190     2482.53350       3599.90800              310

That’s more like it. Let’s try something a bit more exotic just for fun and add -XX:+DoEscapeAnalysis:

                Object create  Serialization  Deserialization  Serialized size
protobuf            260.51330     1925.32300       2302.74250              217
json (jackson)      176.20370     2385.99750       3647.01700              310

That reduces some of the object-creation cost for protobuf, but it’s still substantially slower than json. This is plausible given the builder pattern used by the Java classes that protocol buffers generates, though I haven’t investigated it in more detail. In any case, protocol buffers is better in 3 of the 4 measures in this particular benchmark.
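For readers unfamiliar with the generated code, here is a hand-rolled sketch of the builder pattern that protoc emits; this `MediaContent` is purely illustrative, not the real generated class, but it shows why each message costs at least two allocations (one builder, one message):

```java
public class BuilderDemo {
    // Minimal analogue of a protoc-generated immutable message.
    static final class MediaContent {
        final String uri;
        final String title;

        private MediaContent(Builder b) {
            this.uri = b.uri;
            this.title = b.title;
        }

        static Builder newBuilder() {
            return new Builder(); // allocation 1: the builder
        }

        static final class Builder {
            private String uri;
            private String title;

            Builder setUri(String uri) { this.uri = uri; return this; }
            Builder setTitle(String title) { this.title = title; return this; }

            MediaContent build() {
                return new MediaContent(this); // allocation 2: the message
            }
        }
    }

    public static void main(String[] args) {
        MediaContent content = MediaContent.newBuilder()
            .setUri("http://example.com/video")
            .setTitle("Example")
            .build();
        System.out.println(content.uri + " / " + content.title);
    }
}
```

Escape analysis can sometimes eliminate the short-lived builder, which is presumably why -XX:+DoEscapeAnalysis helps protobuf's object-creation number more than json's.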

What does this mean? Not a lot. As usual, where performance is important, you should create benchmarks that mirror your application and environment. I just couldn’t let the blanket “json is on par with or faster than…” statement pass without a bit of scrutiny. ;)