This is a four-part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!
Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons
I had just started programming in C# when JSON started gaining steam as the “XML Killer”. Being new to software development, I didn’t really have a horse in the race, but I was fascinated by the almost tribal level of care people put into such a simple thing as competing data formats.
Surprisingly, Google actually released Protobuf (or Protocol Buffers) in 2008, but I think it’s only just started to pick up steam (or maybe that’s just in the .NET world). I recently worked on a project that used it, and while it wasn’t at the level of JSON vs XML, I still saw some similarities in how Protobuf was talked about. Mainly, it was almost made out to be some voodoo, world-changing piece of technology. All I could think was “But... it’s just a data serialization format, right?”.
The Protobuf docs (just in my view) are not exactly clear in spelling out just what Protobuf is and how it works. Mainly I think that’s because much of the documentation out there takes a language neutral approach to describing how it works. But imagine if you were just learning XML, and you learnt all of the intricacies of XML namespaces, declarations, or entities before actually doing the obvious and serializing a simple piece of data down, looking at it, then deserializing it back up.
That’s what I want to do with this article series. Take Protobuf and give you a dead simple overview with C# in mind and show you just how obvious and easy it really is.
Defining Proto Message Contracts
The first thing we need to understand is the Proto Message Contract. These look scary and maybe even a bit confusing as to what they are used for, but they are actually dead simple. A proto message definition would look like this (In proto format) :
syntax = "proto3";

message Person {
    string firstName = 1;
    string lastName = 2;
    repeated string emails = 3;
}
Just look at this like any other class definition in any language :
- We have our message name (Person)
- We have our fields and their types (For example firstName is a string)
- We can have “repeated” fields (Basically arrays/lists in C#)
- We have an integer identifier for each field. This integer is used *instead* of the field name when we serialize. For example, if we serialized someone with the first name Bob, the serialized content would not have “firstName=‘Bob’”, it would have “1=‘Bob’”.
The last point there may be tricky at first but just think about it like this. Using numerical identifiers for each field means you can save a lot of space when dealing with big data because you aren’t subject to storing the entire field name when you serialize.
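To make that concrete, here is a rough sketch of what a single string field looks like on the wire. This is not how you would do it in real code (a library handles this for you), and EncodeStringField is a hypothetical helper I've made up purely for illustration. It assumes C# 9 or later for top-level statements, a field number below 16 and a value shorter than 128 bytes, so that the key and the length each fit in one byte.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Simplified, hand-rolled encoder for illustration only.
// Assumes fieldNumber < 16 and a UTF-8 value under 128 bytes,
// so the key and the length each fit in a single byte.
static byte[] EncodeStringField(int fieldNumber, string value)
{
    const int LengthDelimited = 2; // wire type Protobuf uses for strings
    byte[] utf8 = Encoding.UTF8.GetBytes(value);

    var bytes = new List<byte>();
    bytes.Add((byte)((fieldNumber << 3) | LengthDelimited)); // key: field number + wire type
    bytes.Add((byte)utf8.Length);                            // length of the value in bytes
    bytes.AddRange(utf8);                                    // the value itself
    return bytes.ToArray();
}

byte[] encoded = EncodeStringField(1, "Bob");
string json = "{\"firstName\":\"Bob\"}";

Console.WriteLine(BitConverter.ToString(encoded));   // 0A-03-42-6F-62
Console.WriteLine(encoded.Length);                   // 5
Console.WriteLine(Encoding.UTF8.GetByteCount(json)); // 19
```

The field name never appears in the output. The first byte carries the field number (plus a marker saying "a string follows"), the second byte is the length, and the rest is just “Bob”. The equivalent JSON spends most of its bytes on the name of the field.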
These contracts are nothing more than a universal way to describe what a message looks like when we serialize it. In my view, it’s no different than an XML or JSON schema. Put simply, we can take this contract and give it to anyone and they will know what the data will look like when we send it to them.
If we take this proto message and paste it into a converter like the one by Marc Gravell here : https://protogen.marcgravell.com/ we can see what a generated C# representation of this data model will look like (And a bit more on this later!).
The fact is, if you are talking between two systems with Protobuf, you may not even need to worry about ever writing or seeing contracts in this format. It’s really no different than someone flicking you an email with something like :
Hey about that protobuf message, it’s going to be in this format :
FirstName will be 1. It’s a string.
LastName will be 2. It’s also a string.
Emails will be 3, and it’s going to be an array of strings
It’s that simple.
Proto Message Contracts In C#
When it comes to working with JSON in C# .NET, you have JSON.NET, so it only makes sense when you are working with Protobuf in C# .NET you have… Protobuf.NET (Again by the amazing Marc Gravell)! Let’s spin up a dead simple console application and add the following package via the package manager console :
Install-Package protobuf-net
Now I will say there are actually a few Protobuf C# libraries floating around, including one by Google. But what I typically find is that these are converted Java libraries, and as such they don’t really conform to how C# is typically written. Protobuf.NET on the other hand is very much a C# library from the bottom up, which makes it super easy and intuitive to work with.
Let’s then take our person class, and use a couple of special attributes given to us by the Protobuf.NET library :
using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
class Person
{
    // The number in each attribute matches the field number in the proto contract.
    [ProtoMember(1)]
    public string FirstName { get; set; }

    [ProtoMember(2)]
    public string LastName { get; set; }

    [ProtoMember(3)]
    public List<string> Emails { get; set; }
}
If we compare this to our Proto contract from earlier, it’s a little less scary right? It’s just a plain old C# class, but with a couple of attributes to ensure that we are serializing to the correct identifiers.
I’ll also point something else out here: because we are using integer identifiers, the casing of our properties no longer matters at all. Coming from the C# world where we love PascalCase, this is enormously easy on the eyes. But even more so, when we take a look at performance a bit later on in this series, it will become even clearer what a good decision this is, because we no longer have to fiddle around parsing strings, including whether the casing is right or not.
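As a quick peek ahead (serializing is covered properly in Part 2), here is a small sketch of why the names stop mattering. It assumes the protobuf-net package is installed and C# 9 or later for top-level statements. The Contact class and its deliberately ugly property names are made up for this example; the only thing it shares with Person is the field numbers.

```csharp
using System;
using System.IO;
using ProtoBuf;

// Write the data out using one class...
var person = new Person { FirstName = "Bob", LastName = "Smith" };
using var stream = new MemoryStream();
Serializer.Serialize(stream, person);

// ...then read the same bytes back using a differently named class.
stream.Position = 0;
var contact = Serializer.Deserialize<Contact>(stream);
Console.WriteLine($"{contact.given_name} {contact.SURNAME}");

[ProtoContract]
public class Person
{
    [ProtoMember(1)] public string FirstName { get; set; }
    [ProtoMember(2)] public string LastName { get; set; }
}

// Hypothetical second class: different names and casing, same field numbers.
[ProtoContract]
public class Contact
{
    [ProtoMember(1)] public string given_name { get; set; }
    [ProtoMember(2)] public string SURNAME { get; set; }
}
```

Because only the numbers 1 and 2 end up in the serialized data, the values written from FirstName and LastName should land in given_name and SURNAME without any mapping code.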
I’ll say it again that if you have an existing Proto message contract given to you (For example, someone else is building an application in Java and they have given you the contract only), you can simply run it through Marc Gravell’s Protogen tool here : https://protogen.marcgravell.com/
It does generate a bit of a verbose output :
[global::ProtoBuf.ProtoContract()]
public partial class Person : global::ProtoBuf.IExtensible
{
    private global::ProtoBuf.IExtension __pbn__extensionData;
    global::ProtoBuf.IExtension global::ProtoBuf.IExtensible.GetExtensionObject(bool createIfMissing)
        => global::ProtoBuf.Extensible.GetExtensionObject(ref __pbn__extensionData, createIfMissing);

    [global::ProtoBuf.ProtoMember(1)]
    [global::System.ComponentModel.DefaultValue("")]
    public string firstName { get; set; } = "";

    [global::ProtoBuf.ProtoMember(2)]
    [global::System.ComponentModel.DefaultValue("")]
    public string lastName { get; set; } = "";

    [global::ProtoBuf.ProtoMember(3, Name = @"emails")]
    public global::System.Collections.Generic.List<string> Emails { get; } = new global::System.Collections.Generic.List<string>();
}
But for larger contracts it may just work well as a scaffolding tool for you!
So defining contracts is all well and good, but how do we go about serializing the data? Let’s check that out in Part 2! https://dotnetcoretutorials.com/2022/01/13/protobuf-in-c-net-part-2-serializing-deserializing/