Protobuf In C# .NET – Part 4 – Performance Comparisons

This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!

Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons


We’ve mentioned in previous posts that Protobuf (supposedly) will outperform many other data formats, namely JSON. And while we’ve kind of alluded to the fact it’s “fast” and it’s “small”, we haven’t really jumped into the actual numbers.

This post will take a look across three different metrics :

  • File Size – So just how lightweight is Protobuf?
  • Serialization – How fast can we take a C# object and serialize it into Protobuf or JSON?
  • Deserialization – Given a Protobuf/JSON data format, how fast can we turn it into a C# object?

Let’s jump right in!

File Size Comparisons

Before looking at read/write performance, I actually wanted to compare how large the actual output is between Protobuf and JSON. I set up a really simple test that used the following model :

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }

    [ProtoMember(2)]
    public string LastName { get; set; }

    [ProtoMember(3)]
    public List<string> Emails { get; set; }
}

And I used the following code to create an object, and write it twice. Once with Protobuf and once with JSON :

var person = new Person
{
    FirstName = "Wade",
    LastName = "G", 
    Emails = new List<string>
    {
        "wade.g@gmail.com", 
        "wade@business.com"
    }
};

using (var fileStream = File.Create("person.buf"))
{
    // No length prefix here, so the file contains only the message itself
    Serializer.Serialize(fileStream, person);
}

var personString = JsonConvert.SerializeObject(person);
File.WriteAllText("person.json", personString);

The results were :

Format   | File Size
Protobuf | 46 bytes
JSON     | 85 bytes

So just by default, Protobuf is a little over half the size (about 46% smaller). Obviously your mileage may vary depending on your data types and even your property names.
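If you’re wondering where the 46 bytes come from, here’s a rough sketch that adds up the Protobuf wire format by hand, without using the protobuf-net library at all. It assumes each string is written as a tag, a length and then its UTF-8 bytes, and that each email in the list is written as its own field 3 entry. The class and method names (WireSizeSketch, StringFieldSize, PersonSize) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class WireSizeSketch
{
    // Size of a base-128 varint: 7 payload bits per byte.
    static int VarintSize(uint value)
    {
        var size = 1;
        while (value >= 0x80) { value >>= 7; size++; }
        return size;
    }

    // A string field is: tag varint (field number << 3 | wire type 2),
    // then a length varint, then the UTF-8 bytes of the value.
    public static int StringFieldSize(int fieldNumber, string value)
    {
        var payload = Encoding.UTF8.GetByteCount(value);
        var tag = (uint)(fieldNumber << 3) | 2u;
        return VarintSize(tag) + VarintSize((uint)payload) + payload;
    }

    public static int PersonSize(string firstName, string lastName, IEnumerable<string> emails)
    {
        var total = StringFieldSize(1, firstName) + StringFieldSize(2, lastName);
        foreach (var email in emails)
        {
            // Assumption: a repeated string is one field 3 entry per element.
            total += StringFieldSize(3, email);
        }
        return total;
    }

    static void Main()
    {
        var size = PersonSize("Wade", "G", new[] { "wade.g@gmail.com", "wade@business.com" });
        Console.WriteLine(size); // 6 + 3 + 18 + 19 = 46
    }
}
```

Notice that the names FirstName, LastName and Emails never show up in the sum. Each field costs one tag byte and one length byte on top of the actual text.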

That last point is important because while Protobuf has other mechanisms keeping the size down, a big part of it is that property names are never written out at all. Each field is identified by its integer field number (the number you pass to ProtoMember) rather than its string name. To illustrate this, I modified the model to look like so :

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    [JsonProperty("1")]
    public string FirstName { get; set; }

    [ProtoMember(2)]
    [JsonProperty("2")]
    public string LastName { get; set; }

    [ProtoMember(3)]
    [JsonProperty("3")]
    public List<string> Emails { get; set; }
}

So now our JSON will be serialized with single digit names as well. When running this, our actual comparison table looks like so :

Format                     | File Size
Protobuf                   | 46 bytes
JSON                       | 85 bytes
JSON With Digit Properties | 65 bytes

So roughly half of Protobuf’s size advantage instantly disappears (the gap drops from 39 bytes to 19)! For now, I’m not going to use the single digit properties going forward because it’s not illustrative of what happens in the real world with JSON, but it’s an interesting little footnote that you can shrink your disk footprint with just this one simple hack that storage providers hate.
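As a quick sanity check on those JSON numbers, here’s a small sketch that measures the two payloads directly. The JSON strings are written out by hand on the assumption that the serializer emits compact output with no whitespace, and the class name JsonNameOverheadSketch is just for illustration.

```csharp
using System;
using System.Text;

static class JsonNameOverheadSketch
{
    // The two JSON payloads, written out by hand (assumed compact, no whitespace).
    public const string FullNames =
        "{\"FirstName\":\"Wade\",\"LastName\":\"G\",\"Emails\":[\"wade.g@gmail.com\",\"wade@business.com\"]}";
    public const string DigitNames =
        "{\"1\":\"Wade\",\"2\":\"G\",\"3\":[\"wade.g@gmail.com\",\"wade@business.com\"]}";

    // Protobuf file size reported in the article.
    public const int ProtobufBytes = 46;

    public static int Utf8Size(string json) => Encoding.UTF8.GetByteCount(json);

    static void Main()
    {
        var full = Utf8Size(FullNames);    // 85
        var digits = Utf8Size(DigitNames); // 65
        Console.WriteLine($"Full names: {full} bytes, digit names: {digits} bytes");
        Console.WriteLine($"Renaming saves {full - digits} of the {full - ProtobufBytes} byte gap to Protobuf");
    }
}
```

So the property names alone account for 20 of the 39 bytes that separate JSON from Protobuf in this test.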

So overall, Protobuf has JSON beat when it comes to file size. That’s no surprise, but what about actual performance when working with objects?

Serialization Performance

Next, let’s take a look at serializing performance. There are a couple of notes on the methodology behind this :

  • Because Protobuf serializes to bytes and JSON to strings, I wanted to leave them like that. e.g. I did not take the JSON string, and convert it into bytes as this would artificially create an overhead when there is no need.
  • I kept everything in memory (I did not write to a file etc)
  • I wanted to try and use *both* JSON.NET and Microsoft’s JSON Serializer. The latter is almost certainly going to be faster, but the former probably has more use cases out there in the wild.
  • For now, I’m just using the protobuf-net library for everything related to Protobuf
  • I’m using Protobuf as the “baseline”, so everything will be compared on how much slower (Or faster, you never know!) it is than Protobuf

With that in mind, here’s the benchmark using BenchmarkDotNet (Quick guide if you haven’t seen it before here : https://dotnetcoretutorials.com/2017/12/04/benchmarking-net-core-code-benchmarkdotnet/)

public class ProtobufVsJSONSerializeBenchmark
{
    static Person person = new Person
    {
        FirstName = "Wade",
        LastName = "G",
        Emails = new List<string>
        {
            "wade.g@gmail.com",
            "wade@business.com"
        }
    };

    [Benchmark(Baseline = true)]
    public byte[] SerializeProtobuf()
    {
        using(var memoryStream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(memoryStream, person);
            return memoryStream.ToArray();
        }
    }

    [Benchmark]
    public string SerializeJsonMicrosoft()
    {
        return System.Text.Json.JsonSerializer.Serialize(person);
    }

    [Benchmark]
    public string SerializeJsonDotNet()
    {
        return Newtonsoft.Json.JsonConvert.SerializeObject(person);
    }
}

And the results?

Format         | Average Time | Baseline Comparison
Protobuf       | 680ns        | Baseline
Microsoft JSON | 743ns        | 9% Slower
JSON.NET       | 1599ns       | 135% Slower

So we can see that Protobuf is indeed faster, but not by a heck of a lot. And of course, I’m willing to bet a keen-eyed reader will drop a comment below and tell me how the benchmark could be improved to make Microsoft’s JSON serializer even faster.
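For reference, the “Baseline Comparison” figures can be reproduced from the average times with a one line formula. This sketch assumes the column is simply the extra time taken relative to the Protobuf baseline, and the helper name PercentSlower is my own.

```csharp
using System;

static class BaselineComparisonSketch
{
    // How much slower a candidate is than the baseline, as a percentage.
    public static double PercentSlower(double baselineNs, double candidateNs)
        => (candidateNs - baselineNs) / baselineNs * 100.0;

    static void Main()
    {
        const double protobufNs = 680; // baseline average from the results table

        Console.WriteLine($"Microsoft JSON: {PercentSlower(protobufNs, 743):F0}% slower");
        Console.WriteLine($"JSON.NET: {PercentSlower(protobufNs, 1599):F0}% slower");
    }
}
```

Put another way, JSON.NET took roughly 2.35 times as long as Protobuf per call, while Microsoft’s serializer took about 1.09 times as long.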

Of course JSON.NET is slower, and that is to be expected, but again I’m surprised that Protobuf, while fast, isn’t *that* much faster. How about deserialization?

Deserialization Performance

We’ve done serialization, so let’s take a look at the reverse – deserialization.

I do want to point out one thing before we even start, and that is that JSON.NET and Microsoft’s JSON library handle case sensitivity with JSON *very* differently. In fact, JSON.NET is case insensitive by default, and that is the *only* way it can run. Microsoft’s JSON library is case sensitive by default and must be switched to handle case insensitivity at a huge cost. I have an entire article dedicated to the subject here : https://dotnetcoretutorials.com/2020/01/25/what-those-benchmarks-of-system-text-json-dont-mention/

In some ways, that somewhat invalidates our entire test (At least when comparing JSON.NET to Microsoft’s JSON), because it actually entirely depends on whether your JSON is in the exact casing you require (In most cases that’s going to be PascalCase), or if it’s in camelCase (In which case you take a performance hit). But for now, let’s push that aside and try our best to create a simple benchmark.
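To make the casing issue concrete, here’s a minimal sketch using only System.Text.Json. The PersonName class and the Parse helper are made up for the example. The only real setting involved is JsonSerializerOptions.PropertyNameCaseInsensitive.

```csharp
using System;
using System.Text.Json;

public class PersonName
{
    public string FirstName { get; set; }
}

static class CaseSensitivitySketch
{
    public static PersonName Parse(string json, bool caseInsensitive)
    {
        // In real code you would create the options once and reuse them.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = caseInsensitive };
        return JsonSerializer.Deserialize<PersonName>(json, options);
    }

    static void Main()
    {
        const string camelCaseJson = "{\"firstName\":\"Wade\"}";

        // Case sensitive (the default): "firstName" does not match FirstName,
        // so the property is left unset.
        Console.WriteLine(Parse(camelCaseJson, false).FirstName ?? "(null)");

        // Case insensitive: the property is populated.
        Console.WriteLine(Parse(camelCaseJson, true).FirstName);
    }
}
```

The benchmark below sidesteps this entirely because the JSON it deserializes was generated with PascalCase names that match the model exactly.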

Other things to note :

  • Again, I want each library to work with its native format. So Protobuf will be deserializing from a byte array, and JSON will be deserializing from a string
  • I *had* to create a memory stream for Protobuf, at least without making the test more complicated than it needed to be.

public class ProtobufVsJSONDeserializeBenchmark
{
    public static Person person = new Person
    {
        FirstName = "Wade",
        LastName = "G",
        Emails = new List<string>
        {
            "wade.g@gmail.com",
            "wade@business.com"
        }
    };

    static byte[] PersonBytes;
    static string PersonString;

    [GlobalSetup]
    public void Setup()
    {
        using (var memoryStream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(memoryStream, person);
            PersonBytes = memoryStream.ToArray();
        }

        PersonString = JsonConvert.SerializeObject(person);
    }

    [Benchmark(Baseline = true)]
    public Person DeserializeProtobuf()
    {
        using (var memoryStream = new MemoryStream(PersonBytes))
        {
            return ProtoBuf.Serializer.Deserialize<Person>(memoryStream);
        }
    }

    [Benchmark]
    public Person DeserializeJsonMicrosoft()
    {
        return System.Text.Json.JsonSerializer.Deserialize<Person>(PersonString);
    }

    [Benchmark]
    public Person DeserializeJsonDotNet()
    {
        return Newtonsoft.Json.JsonConvert.DeserializeObject<Person>(PersonString);
    }
}

I know it’s a big bit of code to sift through but it’s all relatively simple. We are just deserializing back into a Person object. And the results?

Format         | Average Time | Baseline Comparison
Protobuf       | 1.019us      | Baseline
Microsoft JSON | 1.238us      | 21% Slower
JSON.NET       | 2.598us      | 155% Slower

So overall, Protobuf wins again and by a bigger margin this time than our Serialization effort (When it comes to percentage). But again, your mileage will vary heavily depending on what format your JSON is in.

Conclusion

The overall conclusion is that indeed, Protobuf is faster than JSON by a reasonable margin, or a huge margin if comparing it to JSON.NET. However, in some respects a big part of the difference is likely to lie in how JSON is always serialized as strings versus the direct byte serialization of Protobuf. But that’s just a hunch of mine.

When it comes to file size, Protobuf wins out again, *especially* when serializing full JSON property names. Obviously here we are talking about the difference between a few bytes, but if the ratio above held (46 bytes vs 85 bytes), 500GB of data in Protobuf would be roughly 920GB in JSON, so it definitely adds up.

That’s all I’m doing on Protobuf for a bit and I hope you’ve learnt something a bit new. Overall, just in my personal view, don’t get too dragged into the hype. Protobuf is great and it does what it says on the tin. But it’s just another data format, nothing to be afraid of!

3 comments

  1. To be really sure if you measure things right you should check under a profiler. See https://github.com/Alois-xx/SerializerTests/ for a full test suite with many different serializers where you can also compare .NET 4.8 to .NET 6.0 and see how much faster it really is. Below are some numbers to deserialize 1m Book objects:
    Protobuf_Net on .NET 6.0 needs 0,30s while the .NET JSON serializer needs 0,75s. This is over two times faster which is not bad. Your test might be problematic due to the newly created MemoryStream in DeserializeProtobuf which could measure the allocation costs of the MemoryStream and not the actual serializer performance.
    Using Benchmark.NET does not make systematic measurement errors go away. The main benefit it has is that it aids repeatability by getting rid of other noise artefacts and warming things up. Personally I do not micro benchmarks so much. I find it easier to repeat the test simply e.g. 1 million times to get something faster than microseconds which is not as easily distorted by a myriad of influencing factors.

    Format             | .NET Core 3.1.22 | .NET Core 5.0.13 | .NET Core 6.0.1 | .NET Framework 4.8.4420.0
    FlatBuffer         | 0,015            | 0,013            | 0,0115          | 0,0235
    GoogleProtobuf     | 0,335            | 0,304            | 0,3035          | 0,5095
    Protobuf_net       | 0,3445           | 0,312            | 0,3065          | 0,364
    Utf8JsonSerializer | 0,557            | 0,6215           | 0,4905          | 0,592
    SystemTextJson     | 1,1595           | 0,7485           | 0,753           |
    JsonNet            | 1,583            | 1,521            | 1,2735          | 1,452
    XmlSerializer      | 1,985            | 2,0425           | 1,6055          | 2,1035
    BinaryFormatter    | 5,536            | 5,2275           | 4,462           | 5,1085

    1. Interesting stuff!

      With the MemoryStream, the issue for me is that obviously Protobuf is from a Stream, but JSON.NET you would say is normally from just plain text (In fact, it doesn’t have a method to take a stream at all). I noticed in your tests, you have to wrap JSON.NET in a StreamWriter wrapper :

      var text = new StreamWriter(stream);
      Formatter.Serialize(text, obj);
      text.Flush();
      

      So it’s almost like someone has to take the hit of taking a format that they aren’t used to unfortunately if you want to keep the tests uniform.

      1. Yes that is a problem, but not so much. TextReader does just the conversion from a byte array to a sequence of strings which originate from a binary buffer (think of a network stream which just knows about bytes). If you have a text only serializer you always need to convert the binary data to a text representation that also costs time. I think it is therefore justified to compare it this way.
