This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!
Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons
In the last post in this series, we looked at how we can serialize and deserialize a single piece of data to and from Protobuf. For the most part, this is going to be your bread and butter way of working with Protocol Buffers. But there’s actually a slightly “improved” way of serializing data that might come in handy in certain situations, and that’s using “Length Prefixes”.
What Are Length Prefixes In Protobuf?
Length Prefixes sound a bit scary but really it’s super simple. Let’s first start with a scenario of “why” we would want to use length prefixes in the first place.
Imagine that I have multiple objects that I want to push into a single Protobuf stream. Let’s say using our example from previous posts, I have multiple “Person” objects that I want to push across the wire to another application.
Because we are sending multiple objects at once, and they are all encoded as bytes, we need to know when one person ends, and another begins. There are really two ways to solve this :
- Have a unique byte code that won’t appear in your data, but can be used as a “delimiter” between items
- Use a “Length Prefix”, whereby the first byte (or bytes) in a stream say how long the first object is. Once you have read that many bytes, the next bytes you read are another prefix that tells you how long the next item is.
I’ve actually seen *both* options used with Protobuf, but the more common one these days is the latter. Mostly because it’s pretty fail-safe (you don’t have to pick some special delimiter byte that can never appear in your data), but also because you know ahead of time how large the upcoming object is (you don’t have to just keep reading blindly until you reach a special byte).
I’m not much of a photoshop guy, so here’s how the stream of data might look in MS Paint :
When reading this data, it might work like so :
- Read the first 4 bytes to understand how long Message 1 will be
- Read exactly that many bytes and store as Message 1
- We can now read the next 4 bytes to understand exactly how long Message 2 will be
- Read exactly that many bytes and store as Message 2
And so on. We could actually do this forever if the stream was a constant pump of data. As long as we read the length prefix first to know how long the next message is, we don’t need any other way of breaking up the messages. And again, it’s a boon to us to use this method as we never have to read ahead, scanning for a delimiter, to know where the current message ends.
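To make those steps concrete, here is a minimal sketch of length prefixing done by hand, with no Protobuf involved at all. The class and method names (LengthPrefixFraming, WriteFrame, TryReadFrame) are made up for this illustration, and the 4 byte little-endian prefix is simply an assumption chosen to mirror the steps above. It is not a copy of what any library does internally.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class LengthPrefixFraming
{
    // Writes a 4 byte little-endian length, followed by the payload itself.
    public static void WriteFrame(Stream stream, byte[] payload)
    {
        var prefix = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(prefix, payload.Length);
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads one message. Returns false when the stream ends cleanly
    // before a new prefix starts.
    public static bool TryReadFrame(Stream stream, out byte[] payload)
    {
        payload = Array.Empty<byte>();

        var prefix = new byte[4];
        int read = ReadFully(stream, prefix);
        if (read == 0) return false; // nothing left to read
        if (read < prefix.Length) throw new EndOfStreamException("Truncated length prefix.");

        int length = BinaryPrimitives.ReadInt32LittleEndian(prefix);
        if (length < 0) throw new InvalidDataException("Negative length prefix.");

        payload = new byte[length];
        if (ReadFully(stream, payload) < length) throw new EndOfStreamException("Truncated message.");
        return true;
    }

    // Stream.Read can return fewer bytes than asked for,
    // so keep reading until the buffer is full or the stream ends.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Calling WriteFrame a few times and then calling TryReadFrame in a loop until it returns false gives you back the same messages in the same order, without the reader ever needing to inspect the message bytes themselves.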
In all honesty, Length Prefixing is not Protobuf specific. After all, the data following the prefix could be in *any* format, but Protobuf is probably one of the few data formats that seems to have it really baked in. So much so that of course our Protobuf.NET library from earlier posts has out of the box functionality to handle it! So let’s jump into that now.
Using Protobuf Length Prefixes In C# .NET
As always, if you’re only just jumping into this post without reading the previous ones in the series, you’ll need to install the Protobuf.NET library by running the following command in your Package Manager Console.
Install-Package protobuf-net
Then the code to serialize multiple items to the same data stream might look like so :
var person1 = new Person { FirstName = "Wade", LastName = "G" };
var person2 = new Person { FirstName = "John", LastName = "Smith" };

using (var fileStream = File.Create("persons.buf"))
{
    Serializer.SerializeWithLengthPrefix(fileStream, person1, PrefixStyle.Fixed32);
    Serializer.SerializeWithLengthPrefix(fileStream, person2, PrefixStyle.Fixed32);
}
This is a fairly verbose example to write to a file, but obviously you could be writing to any data stream, looping through a list of people etc. The important thing is that our Serialize call changes to “SerializeWithLengthPrefix”.
Nice and easy!
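If you have a whole list of people rather than two variables, the same call simply goes inside a loop. The following is a rough sketch only. The Person class is my assumption of how the contract from the earlier posts looks, and PersonWriter.WriteAll is a hypothetical helper name, not something the library provides.

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Assumed shape of the Person contract from the earlier posts in this series.
[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }

    [ProtoMember(2)]
    public string LastName { get; set; }
}

public static class PersonWriter
{
    // Hypothetical helper: writes each person to the stream with its own Fixed32 length prefix.
    public static void WriteAll(Stream destination, IEnumerable<Person> people)
    {
        foreach (var person in people)
        {
            Serializer.SerializeWithLengthPrefix(destination, person, PrefixStyle.Fixed32);
        }
    }
}
```

Because it only depends on Stream, the same helper should work whether the destination is a file, a MemoryStream or a network stream.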
And then to deserialize, there are some tricky things to look out for. But our basic code might look like so :
using (var fileStream = File.OpenRead("persons.buf"))
{
    Person person = null;
    do
    {
        // Reads exactly one item and leaves the stream at the start of the next one.
        // Returns null once the end of the stream has been reached.
        person = Serializer.DeserializeWithLengthPrefix<Person>(fileStream, PrefixStyle.Fixed32);
    } while (person != null);
}
Notice how we actually *loop* the DeserializeWithLengthPrefix call. This is because if there are multiple items within the stream, calling this method will return *one* item each time it’s called (and also move the stream to the start of the next item). If we reach the end of the stream and call it again, the method returns null instead. Note that the PrefixStyle used to read must match the one used to write.
Alternatively, you can call DeserializeItems to instead return an IEnumerable of items. This is actually very similar to deserializing one at a time because the IEnumerable is lazily evaluated, so each item is only read from the stream as you enumerate it.
using (var fileStream = File.OpenRead("persons.buf"))
{
    // Lazy: nothing is read until persons is enumerated,
    // so enumerate it inside this using block while the stream is still open.
    var persons = Serializer.DeserializeItems<Person>(fileStream, PrefixStyle.Fixed32, -1);
}
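One of the tricky things with the lazy IEnumerable is that the stream has to still be open when you actually loop over the items. A simple way around that, sketched below, is to force the items into a List before leaving the using block. Again, the Person class is my assumption of the contract from the earlier posts, and PersonReader.ReadAll is a hypothetical helper name rather than anything built in.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

// Assumed shape of the Person contract from the earlier posts in this series.
[ProtoContract]
public class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }

    [ProtoMember(2)]
    public string LastName { get; set; }
}

public static class PersonReader
{
    // Hypothetical helper: reads every length prefixed Person from a file into memory.
    public static List<Person> ReadAll(string path)
    {
        using (var fileStream = File.OpenRead(path))
        {
            // ToList() enumerates the lazy sequence right now,
            // before the using block disposes the stream.
            return Serializer
                .DeserializeItems<Person>(fileStream, PrefixStyle.Fixed32, -1)
                .ToList();
        }
    }
}
```

The trade off is that everything ends up in memory at once. For a very large file or a never ending stream, you would want to stay with the lazy version and do your processing inside the using block instead.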
Because the Protobuf.NET library is so easy to use, I don’t really want to dive into every little overloaded method. But the important thing to understand is that when using Length Prefixes, we can push multiple pieces of data to the same stream without any additional legwork required. It’s really great!
Of course, all of this isn’t really worth it unless there are some performance gains, right? And that’s what we’ll be looking at in the next part of this series. Just how does Protobuf compare to something like JSON? Take a look at all of that and more here : https://dotnetcoretutorials.com/2022/01/18/protobuf-in-c-net-part-4-performance-comparisons/