A number of times in recent years, I've had the chance to work in companies that completely design out entire APIs using OpenAPI before writing a single line of code. Essentially, that means writing YAML to say which endpoints will be available, and what each API should accept and return.

There are pros and cons to doing this of course. A big pro is that by putting in upfront time to really think about API structure, we can often uncover issues well before we get halfway through a build. But a con is that after spending a bunch of time defining things like models and endpoints in YAML, we then need to spend days doing nothing but creating C# classes as clones of their YAML counterparts, which can be tiresome and, frankly, demoralizing at times.

That’s when I came across Open API Generator : https://openapi-generator.tech/

It’s a tool to take your API definitions, and scaffold out APIs and Clients without you having to lift a finger. It’s surprisingly configurable, but at the same time it isn’t too opinionated and allows you to do just the basics of turning your definition into controllers and models, and nothing more.

Let’s take a look at a few examples!

Installing Open API Generator

If you read the documentation here https://github.com/OpenAPITools/openapi-generator, it can look like installing is a huge ordeal of XML files, Maven and JAR files. But for me, using NPM seemed simple enough. Assuming you have NPM installed already (which you should!), you can simply run :

npm install @openapitools/openapi-generator-cli -g

And that’s it! Now from a command line you can run things like :

openapi-generator-cli version

Scaffolding An API

For this example, I actually took the PetStore API available here : https://editor.swagger.io/

It's just a simple YAML definition that has CRUD operations on an example API for a pet store. I took this YAML and stored it as "petstore.yaml" locally. Then I ran the following command in the same folder :

openapi-generator-cli generate -i petstore.yaml -g aspnetcore -o PetStore.Web --package-name PetStore.Web

Pretty self-explanatory, but one thing I do want to point out is the -g flag. I'm passing in aspnetcore here, but in reality, Open API Generator can generate APIs in languages like PHP, Ruby, Python and so on. It's not C# specific at all!

Our project is generated and overall, it looks just like any other API you would build in .NET.

Notice that for each group of APIs in our definition, it's generated a controller, and the models as well.

The controllers themselves are well decorated, but are otherwise empty. For example here is the AddPet method :

/// <summary>
/// Add a new pet to the store
/// </summary>
/// <param name="body">Pet object that needs to be added to the store</param>
/// <response code="405">Invalid input</response>
[HttpPost]
[Route("/v2/pet")]
[Consumes("application/json", "application/xml")]
[ValidateModelState]
[SwaggerOperation("AddPet")]
public virtual IActionResult AddPet([FromBody]Pet body)
{
    //TODO: Uncomment the next line to return response 405 or use other options such as return this.NotFound(), return this.BadRequest(..), ...
    // return StatusCode(405);
    throw new NotImplementedException();
}

I would note that this is obviously rather verbose (with the comments, Consumes attribute etc), but a lot of that is because that's what we decorated our OpenAPI definition with, so the generator tries to produce a controller that acts and functions identically.

But also notice that it hasn’t generated a service or data layer. It’s just the controller and the very basics of how data gets in and out of the API. It means you can basically scaffold things and away you go.
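
To give a rough idea of where you might take it from there, here's a sketch of the generated AddPet action filled in with a service layer of your own. The IPetService/_petService below are hypothetical - the generator only scaffolds the controller, so everything behind it is yours to write :

public virtual IActionResult AddPet([FromBody]Pet body)
{
    // _petService is an instance of our own hypothetical IPetService, injected via the controller's constructor.
    var createdPet = _petService.Add(body);

    return Ok(createdPet);
}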

The models themselves also get generated, but they can be rather verbose. For example, each model gets an override of the ToString method that looks a bit like so :

/// <summary>
/// Returns the string presentation of the object
/// </summary>
/// <returns>String presentation of the object</returns>
public override string ToString()
{
    var sb = new StringBuilder();
    sb.Append("class Pet {\n");
    sb.Append("  Id: ").Append(Id).Append("\n");
    sb.Append("  Category: ").Append(Category).Append("\n");
    sb.Append("  Name: ").Append(Name).Append("\n");
    sb.Append("  PhotoUrls: ").Append(PhotoUrls).Append("\n");
    sb.Append("  Tags: ").Append(Tags).Append("\n");
    sb.Append("  Status: ").Append(Status).Append("\n");
    sb.Append("}\n");
    return sb.ToString();
}

It’s probably overkill, but you can always delete it if you don’t like it.

There honestly isn't much more to say about the process. One command and you've got yourself a great starting point for an API. I would say that you should definitely dig into the docs for the generator, as there is actually a tonne of flags that likely solve a lot of hangups you might have about what it generates for you. For example, there is a flag to use Newtonsoft.Json instead of System.Text.Json if that is your preference!

I do want to touch on a few pros and cons on using a generator like this though…

The first con is that updates to the original OpenAPI definition really can't be "re-generated" into the API. There are ways to do it using the tool, but in reality, I find it unlikely that you would do it like this. So for the most part, the generation of the API is going to be a one-time thing.

Another con, as I've already pointed out, is that the generator has its own style, which may or may not suit the way you like to develop software. On larger APIs, fixing some of these quirks of the generator can be annoying. But I would say that for the most part, fixing any small style issues is still likely to take less time than writing the entire API from scratch by hand.

Overall however, the pro of this is that you have a very consistent style. For example, I was helping out a professional services company with some of their code practices recently. What I noticed is that they spun up new APIs every month for different customers, and each API was somewhat beholden to the tech lead's style and preferences. By using an API generator as a starting point, everyone starts from the same baseline, with a consistent style to carry forward.

Generating API Clients

I want to quickly touch on another piece of functionality of the Open API Generator, and that is generating clients for an API. For example, if you have a C# service that needs to call out to a web service, how can you quickly whip up a client to interact with that API?

We can use the following command to generate a Client library project :

openapi-generator-cli generate -i petstore.yaml -g csharp -o PetStore.Client --package-name PetStore.Client

This generates a very simple PetApi interface/class that has all of our methods to call the API.

For example, take a look at this simple code :

var petApi  = new PetApi("https://myapi.com");
var myPet = petApi.GetPetById(123);
myPet.Name = "John Smith";
petApi.UpdatePet(myPet);

Unlike the server code we generated, I find that the client can be regenerated as many times as needed, even over long periods of time.

As I mentioned, the client code is very handy when two services need to talk to each other, but I've also found it useful for writing large scale integration tests without having to copy and paste large models between projects, or keep track of what has changed in an API and manually copy those changes over to my test project.
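
As a rough sketch of what that might look like in a test project (assuming xUnit, and a locally running instance of the API - the URL below is made up) :

using Xunit;

public class PetApiIntegrationTests
{
    // Hypothetical base URL of a locally running instance of the API under test.
    private readonly PetApi _petApi = new PetApi("https://localhost:5001");

    [Fact]
    public void GetPetById_ReturnsTheExpectedPet()
    {
        // The generated client gives us strongly typed models, so there are no copy and pasted DTOs in the test project.
        var pet = _petApi.GetPetById(1);

        Assert.NotNull(pet);
        Assert.Equal(1, pet.Id);
    }
}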


I cannot tell you how many times I've had the following conversation :

“Hey I’m getting an error”

“What’s the error?”

“DBUpdateException”

“OK, what’s the message though, that could be anything”

“ahhh.. I didn’t see…..”

Frustratingly, when doing almost anything with Entity Framework, including updates, deletes and inserts, if something goes wrong you'll be left with the generic exception of :

Microsoft.EntityFrameworkCore.DbUpdateException: ‘An error occurred while saving the entity changes. See the inner exception for details.’

It can be extremely annoying if you want to catch a particular database exception (e.g. it's to be expected that duplicates might be inserted) and handle it differently from something like being unable to connect to the database at all. Let's work up a quick example to illustrate what I mean.

Let’s assume I have a simple database model like so :

class BlogPost
{
    public int Id { get; set; }
    public string PostName { get; set; }
}

And I have configured my entity to have a unique constraint, meaning that every BlogPost must have a unique name :

modelBuilder.Entity<BlogPost>()
    .HasIndex(x => x.PostName)
    .IsUnique();

If I do something as simple as :

context.Add(new BlogPost
{
    PostName = "Post 1"
});
context.Add(new BlogPost
{
    PostName = "Post 1"
});
context.SaveChanges();

The *full* exception would be along the lines of :

Microsoft.EntityFrameworkCore.DbUpdateException: ‘An error occurred while saving the entity changes. See the inner exception for details.’
Inner Exception
SqlException: Cannot insert duplicate key row in object ‘dbo.BlogPosts’ with unique index ‘IX_BlogPosts_PostName’. The duplicate key value is (Post 1).

Let's say that we want to handle this exception in a very specific way. To do this, we would have to write a bit of a messy try/catch statement :

try
{
    context.SaveChanges();
}
catch (DbUpdateException exception) when (exception?.InnerException?.Message.Contains("Cannot insert duplicate key row in object") ?? false)
{
    //We know that the actual exception was a duplicate key row
}

Very ugly, and there isn't much reusability here. If we want to catch a similar exception elsewhere in our code, we're going to be copying and pasting this long catch statement everywhere.
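
If you do want to stay with vanilla EF Core, one way to at least avoid the copy and paste is to pull the check into an extension method. A rough sketch, reusing the same (admittedly fragile) message match from above :

public static class DbUpdateExceptionExtensions
{
    // Same heuristic as the catch filter above, just centralized - matching on message text is still brittle.
    public static bool IsUniqueConstraintViolation(this DbUpdateException exception)
        => exception.InnerException?.Message.Contains("Cannot insert duplicate key row in object") ?? false;
}

try
{
    context.SaveChanges();
}
catch (DbUpdateException exception) when (exception.IsUniqueConstraintViolation())
{
    //We know that the actual exception was a duplicate key row
}

That tidies up the call sites, but under the hood it's still string matching.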

And that’s where I came across the EntityFrameworkCore.Exceptions library!

Using EntityFrameworkCore.Exceptions

The EntityFrameworkCore.Exceptions library is extremely easy to use, and I'm actually somewhat surprised that it hasn't made its way into the core Entity Framework libraries already.

To use it, all we have to do is run the following on our Package Manager Console :

Install-Package EntityFrameworkCore.Exceptions.SqlServer

And note that there are packages for things like Postgres and MySQL if that’s your thing!

Then with a single line for our DB Context we can set up better error handling :

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseExceptionProcessor();
}

If we run our example code from above, instead of our generic DbUpdateException we get :

EntityFramework.Exceptions.Common.UniqueConstraintException: ‘Unique constraint violation’

Meaning we can change our Try/Catch to be :

try
{
    context.SaveChanges();
}
catch (UniqueConstraintException ex)
{
    //We know that the actual exception was a duplicate key row
}

Much cleaner, much tidier, and far more reusable!


This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!

Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons


We've mentioned in previous posts that Protobuf (supposedly) will outperform many other data formats, namely JSON. And while we've kind of alluded to the fact it's "fast" and it's "small", we haven't really jumped into the actual numbers.

This post will take a look across three different metrics :

  • File Size – So just how lightweight is Protobuf?
  • Serialization – How fast can we take a C# object and serialize it into Protobuf or JSON?
  • Deserialization – Given a Protobuf/JSON data format, how fast can we turn it into a C# object?

Let’s jump right in!

File Size Comparisons

Before looking at read/write performance, I actually wanted to compare how large the actual output is between Protobuf and JSON. I set up a really simple test that used the following model :

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }
    [ProtoMember(2)]
    public string LastName { get; set; }
    [ProtoMember(3)]
    public List<string> Emails { get; set; }
}

And I used the following code to create an object, and write it twice. Once with Protobuf and once with JSON :

var person = new Person
{
    FirstName = "Wade",
    LastName = "G", 
    Emails = new List<string>
    {
        "[email protected]", 
        "[email protected]"
    }
};
using (var fileStream = File.Create("person.buf"))
{
    Serializer.Serialize(fileStream, person);
}
var personString = JsonConvert.SerializeObject(person);
File.WriteAllText("person.json", personString);

The results were :

Format        File Size
Protobuf      46 bytes
JSON          85 bytes

So just by default, Protobuf is almost half the size. Obviously your mileage may vary depending on your data types and even your property names.

That last point is important because while Protobuf has other mechanisms keeping the size down, a big part of it is that all property names are serialized as integers rather than their string form. To illustrate this, I modified the model to look like so :

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    [JsonProperty("1")]
    public string FirstName { get; set; }
    [ProtoMember(2)]
    [JsonProperty("2")]
    public string LastName { get; set; }
    [ProtoMember(3)]
    [JsonProperty("3")]
    public List<string> Emails { get; set; }
}

So now our JSON will be serialized with single digit names as well. When running this, our actual comparison table looks like so :

Format                        File Size
Protobuf                      46 bytes
JSON                          85 bytes
JSON With Digit Properties    65 bytes

So half of Protobuf's size advantage instantly disappears! For now, I'm not going to use the single digit properties going forward, because it's not illustrative of what happens in the real world with JSON, but it's an interesting little footnote that you can shrink your disk footprint with this one simple hack that storage providers hate.

So overall, Protobuf has JSON beat when it comes to file size. That’s no surprise, but what about actual performance when working with objects?

Serialization Performance

Next, let's take a look at serialization performance. There are a couple of notes on the methodology behind this :

  • Because Protobuf serializes to bytes and JSON to strings, I wanted to leave them like that. e.g. I did not take the JSON string, and convert it into bytes as this would artificially create an overhead when there is no need.
  • I kept everything in memory (I did not write to a file etc)
  • I wanted to try and use *both* JSON.NET and Microsoft’s JSON Serializer. The latter is almost certainly going to be faster, but the former probably has more use cases out there in the wild.
  • For now, I’m just using the Protobuf.NET library for everything related to Protobuf
  • Use Protobuf as the "baseline", so everything will be compared to how much slower (or faster, you never know!) it is than Protobuf

With that in mind, here’s the benchmark using BenchmarkDotNet (Quick guide if you haven’t seen it before here : https://dotnetcoretutorials.com/2017/12/04/benchmarking-net-core-code-benchmarkdotnet/)

public class ProtobufVsJSONSerializeBenchmark
{
    static Person person = new Person
    {
        FirstName = "Wade",
        LastName = "G",
        Emails = new List<string>
        {
            "[email protected]",
            "[email protected]"
        }
    };
    [Benchmark(Baseline = true)]
    public byte[] SerializeProtobuf()
    {
        using(var memoryStream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(memoryStream, person);
            return memoryStream.ToArray();
        }
    }
    [Benchmark]
    public string SerializeJsonMicrosoft()
    {
        return System.Text.Json.JsonSerializer.Serialize(person);
    }
    [Benchmark]
    public string SerializeJsonDotNet()
    {
        return Newtonsoft.Json.JsonConvert.SerializeObject(person);
    }
}

And the results?

Format            Average Time    Baseline Comparison
Protobuf          680ns           (baseline)
Microsoft JSON    743ns           9% Slower
JSON.NET          1599ns          135% Slower

So we can see that Protobuf is indeed faster, but not by a heck of a lot. And of course, I'm willing to bet a keen-eyed reader will drop a comment below and tell me how the benchmark could be improved to make Microsoft's JSON serializer even faster.
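
For what it's worth, one obvious tweak would be to have System.Text.Json serialize straight to UTF-8 bytes rather than a string, which skips the intermediate string allocation and is arguably a fairer apples-to-apples comparison with Protobuf's byte output. Something like this could be added to the benchmark (I haven't included numbers for it here) :

[Benchmark]
public byte[] SerializeJsonMicrosoftUtf8Bytes()
{
    // Serializes directly to UTF-8 encoded bytes, avoiding the UTF-16 string entirely.
    return System.Text.Json.JsonSerializer.SerializeToUtf8Bytes(person);
}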

Of course JSON.NET is slower, and that is to be expected, but again I’m surprised that Protobuf, while fast, isn’t *that* much faster. How about deserialization?

Deserialization Performance

We’ve done serialization, so let’s take a look at the reverse – deserialization.

I do want to point out one thing before we even start, and that is that JSON.NET and Microsoft's JSON library handle case sensitivity with JSON *very* differently. In fact, JSON.NET is case insensitive by default, and that is the *only* way it can run. Microsoft's JSON library is case sensitive by default and must be switched to handle case insensitivity at a significant performance cost. I have an entire article dedicated to the subject here : https://dotnetcoretutorials.com/2020/01/25/what-those-benchmarks-of-system-text-json-dont-mention/

In some ways, that somewhat invalidates our entire test (at least when comparing JSON.NET to Microsoft's JSON), because the result entirely depends on whether your JSON is in the exact casing you require (in most cases that's going to be PascalCase), or whether it's in camelCase (in which case you take a performance hit). But for now, let's push that aside and try our best to create a simple benchmark.

Other things to note :

  • Again, I want to work with the formats that work with each data format. So Protobuf will be deserializing from a byte array, and JSON will be deserializing from a string
  • I *had* to create a memory stream for Protobuf, at least without making the test more complicated than it needed to be.

public class ProtobufVsJSONDeserializeBenchmark
{
    public static Person person = new Person
    {
        FirstName = "Wade",
        LastName = "G",
        Emails = new List<string>
        {
            "[email protected]",
            "[email protected]"
        }
    };
    static byte[] PersonBytes;
    static string PersonString;
    [GlobalSetup]
    public void Setup()
    {
        using (var memoryStream = new MemoryStream())
        {
            ProtoBuf.Serializer.Serialize(memoryStream, person);
            PersonBytes = memoryStream.ToArray();
        }
        PersonString = JsonConvert.SerializeObject(person);
    }
    [Benchmark(Baseline = true)]
    public Person DeserializeProtobuf()
    {
        using (var memoryStream = new MemoryStream(PersonBytes))
        {
            return ProtoBuf.Serializer.Deserialize<Person>(memoryStream);
        }
    }
    [Benchmark]
    public Person DeserializeJsonMicrosoft()
    {
        return System.Text.Json.JsonSerializer.Deserialize<Person>(PersonString);
    }
    [Benchmark]
    public Person DeserializeJsonDotNet()
    {
        return Newtonsoft.Json.JsonConvert.DeserializeObject<Person>(PersonString);
    }
}

I know it’s a big bit of code to sift through but it’s all relatively simple. We are just deserializing back into a Person object. And the results?

Format            Average Time    Baseline Comparison
Protobuf          1.019us         (baseline)
Microsoft JSON    1.238us         21% Slower
JSON.NET          2.598us         155% Slower

So overall, Protobuf wins again, and by a bigger margin this time (percentage-wise) than in our serialization test. But again, your mileage will vary heavily depending on what format your JSON is in.

Conclusion

The overall conclusion is that indeed, Protobuf is faster than JSON by a reasonable margin, or a huge margin if comparing it to JSON.NET. However, in some respects a big part of the difference is likely to lie in how JSON is always serialized as strings versus the direct byte serialization of Protobuf. But that’s just a hunch of mine.

When it comes to file size, Protobuf wins out again, *especially* when the JSON is serialized with full property names. Obviously here we are talking about a difference of a few bytes, but when you are storing say 500GB of data in Protobuf, that same data would be roughly 1000GB in JSON, so it definitely adds up.

That’s all I’m doing on Protobuf for a bit and I hope you’ve learnt something a bit new. Overall, just in my personal view, don’t get too dragged into the hype. Protobuf is great and it does what it says on the tin. But it’s just another data format, nothing to be afraid of!


This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!

Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons


In the last post in this series, we looked at how we can serialize and deserialize a single piece of data to and from Protobuf. For the most part, this is going to be your bread and butter way of working with Protocol Buffers. But there’s actually a slightly “improved” way of serializing data that might come in handy in certain situations, and that’s using “Length Prefixes”.

What Are Length Prefixes In Protobuf?

Length Prefixes sound a bit scary, but really they're super simple. Let's first start with a scenario of "why" we would want to use length prefixes in the first place.

Imagine that I have multiple objects that I want to push into a single Protobuf stream. Let’s say using our example from previous posts, I have multiple “Person” objects that I want to push across the wire to another application.

Because we are sending multiple objects at once, and they are all encoded as bytes, we need to know when one person ends, and another begins. There are really two ways to solve this :

  • Have a unique byte code that won’t appear in your data, but can be used as a “delimiter” between items
  • Use a "Length Prefix", whereby the first byte (or bytes) in a stream says how long the first object is; once you've read that many bytes, the next prefix tells you how long the next item is.

I've actually seen *both* options used with Protobuf, but the more common one these days is the latter. Mostly because it's pretty fail-safe (you don't have to pick some special delimiter character), but also because you know ahead of time how large the upcoming object is (you don't have to just keep reading blindly until you reach a special byte).

I’m not much of a photoshop guy, so here’s how the stream of data might look in MS Paint :

When reading this data, it might work like so :

  • Read the first 4 bytes to understand how long Message 1 will be
  • Read exactly that many bytes and store as Message 1
  • We can now read the next 4 bytes to understand exactly how long Message 2 will be
  • Read exactly that many bytes and store as Message 2

And so on, and we could actually do this forever if the stream was a constant pump of data. As long as we read the first set of bytes to know how long the next message is, we don’t need any other breaking up of the messages. And again, it’s a boon to us to use this method as we never have to pre-fetch data to know what we are getting.
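
Just to make that concrete, here's a rough sketch of what reading a Fixed32 style prefix by hand might look like. This is purely illustrative - as we'll see shortly, Protobuf.NET handles all of this for you :

using (var stream = File.OpenRead("persons.buf"))
{
    var lengthBytes = new byte[4];
    while (stream.Read(lengthBytes, 0, 4) == 4)
    {
        // The Fixed32 prefix is a 4 byte little-endian integer telling us how long the next message is
        // (BitConverter assumes a little-endian machine here, which covers the common platforms).
        var messageLength = BitConverter.ToInt32(lengthBytes, 0);

        // A production version would loop until all bytes are read rather than assuming one Read call is enough.
        var messageBytes = new byte[messageLength];
        stream.Read(messageBytes, 0, messageLength);

        // messageBytes now holds exactly one serialized message, ready to hand to a deserializer.
    }
}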

In all honesty, Length Prefixing is not Protobuf specific. After all, the data following the prefix could be in *any* format, but Protobuf is probably one of the few data formats that has it really baked in. So much so that of course our Protobuf.NET library from earlier posts has out of the box functionality to handle it! So let's jump into that now.

Using Protobuf Length Prefixes In C# .NET

As always, if you’re only just jumping into this post without reading the previous ones in the series, you’ll need to install the Protobuf.NET library by using the following command on your package manager console.

Install-Package protobuf-net

Then the code to serialize multiple items to the same data stream might look like so :

var person1 = new Person
{
    FirstName = "Wade",
    LastName = "G"
};
var person2 = new Person
{
    FirstName = "John",
    LastName = "Smith"
};
using (var fileStream = File.Create("persons.buf"))
{
    Serializer.SerializeWithLengthPrefix(fileStream, person1, PrefixStyle.Fixed32);
    Serializer.SerializeWithLengthPrefix(fileStream, person2, PrefixStyle.Fixed32);
}

This is a fairly verbose example to write to a file, but obviously you could be writing to any data stream, looping through a list of people etc. The important thing is that our Serialize call changes to “SerializeWithLengthPrefix”.

Nice and easy!

And then to deserialize, there are some tricky things to look out for. But our basic code might look like so :

using (var fileStream = File.OpenRead("persons.buf"))
{
    Person person = null;
    do
    {
        person = Serializer.DeserializeWithLengthPrefix<Person>(fileStream, PrefixStyle.Fixed32);
    } while (person != null);
}

Notice how we actually *loop* the DeserializeWithLengthPrefix call. This is because if there are multiple items within the stream, calling this method will return *one* item each time it's called (and also move the stream to the start of the next item). If we reach the end of the stream and call it again, it will instead return null.

Alternatively, you can call DeserializeItems to instead return an IEnumerable of items. This is actually very similar to deserializing one at a time, because the IEnumerable is lazily loaded.

using (var fileStream = File.OpenRead("persons.buf"))
{
    var persons = Serializer.DeserializeItems<Person>(fileStream, PrefixStyle.Fixed32, -1);
}
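
One small thing to watch with this approach: because the IEnumerable is lazy, nothing is actually read from the stream until you enumerate it, and that enumeration needs to happen while the stream is still open. For example :

using (var fileStream = File.OpenRead("persons.buf"))
{
    var persons = Serializer.DeserializeItems<Person>(fileStream, PrefixStyle.Fixed32, -1);

    // Enumerate inside the using block - each iteration reads the next length prefixed message off the stream.
    foreach (var person in persons)
    {
        Console.WriteLine(person.FirstName);
    }
}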

Because the Protobuf.NET library is so easy to use, I don’t want to really dive into every little overloaded method. But the important thing to understand is that when using Length Prefixes, we can push multiple pieces of data to the same stream without any additional legwork required. It’s really great!

Of course, all of this isn’t really worth it unless there is some sort of performance gains right? And that’s what we’ll be looking at in the next part of this series. Just how does ProtoBuf compare to something like JSON? Take a look at all of that and more here : https://dotnetcoretutorials.com/2022/01/18/protobuf-in-c-net-part-4-performance-comparisons/


This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!

Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons


In our last post, we spent much of the time talking about how proto contracts work. But obviously that’s all for nothing if we don’t start serializing some data. Thankfully for us, the Protobuf.NET library takes almost all of the leg work out of it, and we more or less follow the same paradigms that we did when working with XML or JSON in C#.

Of course, if you haven’t already, install Protobuf.NET into your application using the following package manager console command :

Install-Package protobuf-net

I’m going to be using the same C# contract we used in the last post. But for reference, here it is again.

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }
    [ProtoMember(2)]
    public string LastName { get; set; }
    [ProtoMember(3)]
    public List<string> Emails { get; set; }
}

And away we go!

Serializing Data

To serialize or write our data in protobuf format, we simply need to take our object and push it into a stream. An in-memory example (for example, if you needed a byte array to send somewhere else) would look like this :

var person = new Person
{
    FirstName = "Wade",
    LastName = "Smith",
    Emails = new List<string>
    {
        "[email protected]", 
        "[email protected]"
    }
};
using(var memoryStream = new MemoryStream())
{
    Serializer.Serialize(memoryStream, person);
    var byteArray = memoryStream.ToArray();
}

So ignoring the setup code for the Person object, we've basically serialized in one line of code, or five if you want to count the setup of the memory stream. Pretty trivial, and it makes all that talk about Protobuf being some sort of voodoo really just melt away.

If we wanted to, we could instead serialize directly to a file like so :

using (var fileStream = File.Create("person.buf"))
{
    Serializer.Serialize(fileStream, person);
}

This leaves us with a person.buf file locally. Of course, if we open this file in a text editor it’s unreadable (Protobuf is not human readable when serialized), but we can use a tool such as https://protogen.marcgravell.com/decode to open the file and tell us what’s inside of it.

Doing that, we get :

Field #1: 0A String Length = 4, Hex = 04, UTF8 = “Wade”
Field #2: 12 String Length = 5, Hex = 05, UTF8 = “Smith”
Field #3: 1A String Length = 20, Hex = 14, UTF8 = “[email protected] …” (total 20 chars)
Field #3: 1A String Length = 18, Hex = 12, UTF8 = “[email protected] …” (total 18 chars)

Notice that the fields within our protobuf file are identified by their integer identifier, *not* by their string property name. Again, this is important to understand because we need the same proto contract identifiers on both ends to know that Field 1 is actually a person's first name.

Well that’s serialization done, how about deserializing?

Deserializing Data

Of course if serializing data can be done in 1 line of code, deserializing or reading back the data is going to be just as easy.

using (var fileStream = File.OpenRead("person.buf"))
{
    var myPerson = Serializer.Deserialize<Person>(fileStream);
    Console.WriteLine(myPerson.FirstName);
}

This is us simply reading a file and deserializing it into our myPerson object. It's somewhat trivial and really straightforward if I'm being honest, and there actually isn't too much to deep dive into.

That is.. until we start talking about length prefixes. Length prefixes are Protobuf's way of serializing several pieces of data into the same data stream. So imagine we have 5 people: how can we store all 5 in the same file or data stream and know when one person's data ends, and another begins? In the next part of this series we'll be taking a look at just how that works with Protobuf.NET! Check it out : https://dotnetcoretutorials.com/2022/01/14/protobuf-in-c-net-part-3-using-length-prefixes/


This is a 4 part series on working with Protobuf in C# .NET. While you can start anywhere in the series, it’s always best to start at the beginning!

Part 1 – Getting Started
Part 2 – Serializing/Deserializing
Part 3 – Using Length Prefixes
Part 4 – Performance Comparisons


I had just started programming in C# when JSON started gaining steam as the “XML Killer”. Being new to software development, I didn’t really have a horse in the race, but I was fascinated by the almost tribal level of care people put into such a simple thing as competing data formats.

Surprisingly, Google actually released Protobuf (Or Protocol Buffers) in 2008, but I think it’s only just started to pick up steam (Or maybe that’s just in the .NET world). I recently worked on a project that used it, and while not to the level of JSON vs XML, I still saw some similarities in how Protobuf was talked about. Mainly that it was almost made out to be some voodoo world changing piece of technology. All I could think was “But.. It’s just a data serialization format right?”.

The Protobuf docs (just in my view) are not exactly clear in spelling out just what Protobuf is and how it works. Mainly I think that’s because much of the documentation out there takes a language neutral approach to describing how it works. But imagine if you were just learning XML, and you learnt all of the intricacies of XML namespaces, declarations, or entities before actually doing the obvious and serializing a simple piece of data down, looking at it, then deserializing it back up.

That’s what I want to do with this article series. Take Protobuf and give you a dead simple overview with C# in mind and show you just how obvious and easy it really is.

Defining Proto Message Contracts

The first thing we need to understand is the Proto Message Contract. These look scary and maybe even a bit confusing as to what they are used for, but they are actually dead simple. A proto message definition would look like this (In proto format) :

syntax="proto3";
message Person {
  string firstName = 1;
  string lastName = 2;
  repeated string emails = 3;
}

Just look at this like any other class definition in any language :

  • We have our message name (Person)
  • We have our fields and their types (For example firstName is a string)
  • We can have “repeated” fields (Basically arrays/lists in C#)
  • We have an integer identifier for each field. This integer is used *instead* of the field name when we serialize. For example, if we serialized someone with the first name Bob, the serialized content would not have “firstName=’bob'”, it would have “1=’bob'”.

The last point there may be tricky at first, but just think about it like this: using numerical identifiers for each field means you can save a lot of space when dealing with big data, because you aren't storing the entire field name every time you serialize.

These contracts are nothing more than a universal way to describe what a message looks like when we serialize it. In my view, it’s no different than an XML or JSON schema. Put simply, we can take this contract and give it to anyone and they will know what the data will look like when we send it to them.

If we take this proto message and paste it into a converter like the one by Marc Gravell here : https://protogen.marcgravell.com/, we can see what a generated C# representation of this data model looks like (and a bit more on this later!).

The fact is, if you are talking between two systems with Protobuf, you may not even need to worry about ever writing or seeing contracts in this format. It's really no different than someone flicking you an email with something like :

Hey about that protobuf message, it’s going to be in this format :

Firstname will be 1. It’s a string.
LastName will be 2. It’s also a string.
Emails will be 3, and it’s going to be an array of strings

It’s that simple.

Proto Message Contracts In C#

When it comes to working with JSON in C# .NET, you have JSON.NET, so it only makes sense when you are working with Protobuf in C# .NET you have… Protobuf.NET (Again by the amazing Marc Gravell)! Let’s spin up a dead simple console application and add the following package via the package manager console :

Install-Package protobuf-net

Now I will say there are actually a few Protobuf C# libraries floating around, including one by Google. But what I typically find is that these are converted Java libraries, and as such they don’t really conform to how C# is typically written. Protobuf.NET on the other hand is very much a C# library from the bottom up, which makes it super easy and intuitive to work with.

Let’s then take our person class, and use a couple of special attributes given to us by the Protobuf.NET library :

[ProtoContract]
class Person
{
    [ProtoMember(1)]
    public string FirstName { get; set; }
    [ProtoMember(2)]
    public string LastName { get; set; }
    [ProtoMember(3)]
    public List<string> Emails { get; set; }
}

If we compare this to our proto contract from earlier, it's a little less scary right? It's just a plain old C# class, but with a couple of attributes to ensure that we are serializing to the correct identifiers.

I'll also point something else out here: because we are using integer identifiers, the casing of our properties no longer matters at all. Coming from the C# world where we love PascalCase, this is enormously easy on the eyes. But even more so, when we take a look at performance a bit later on in this series, it will become even clearer what a good decision this is, because we no longer have to fiddle around parsing strings, including whether the casing is right or not.

I'll say it again: if you have an existing proto message contract given to you (for example, someone else is building an application in Java and they have given you the contract only), you can simply run it through Marc Gravell's Protogen tool here : https://protogen.marcgravell.com/

It does generate a bit of a verbose output :

[global::ProtoBuf.ProtoContract()]
public partial class Person : global::ProtoBuf.IExtensible
{
    private global::ProtoBuf.IExtension __pbn__extensionData;
    global::ProtoBuf.IExtension global::ProtoBuf.IExtensible.GetExtensionObject(bool createIfMissing)
        => global::ProtoBuf.Extensible.GetExtensionObject(ref __pbn__extensionData, createIfMissing);
    [global::ProtoBuf.ProtoMember(1)]
    [global::System.ComponentModel.DefaultValue("")]
    public string firstName { get; set; } = "";
    [global::ProtoBuf.ProtoMember(2)]
    [global::System.ComponentModel.DefaultValue("")]
    public string lastName { get; set; } = "";
    [global::ProtoBuf.ProtoMember(3, Name = @"emails")]
    public global::System.Collections.Generic.List<string> Emails { get; } = new global::System.Collections.Generic.List<string>();
}

But for larger contracts it may just work well as a scaffolding tool for you!

So defining contracts is all well and good, but how do we go about serializing the data? Let's check that out in Part 2! https://dotnetcoretutorials.com/2022/01/13/protobuf-in-c-net-part-2-serializing-deserializing/


Now that the flames have simmered down on the Hot Reload Debacle, maybe it’s time again to revisit this feature!

I legitimately feel this is actually one of the best things to be released with .NET in a while. The number of frustrating times I've had to restart my entire application because of one small typo… whereas now, it's Hot Reload to the rescue!

It's actually a really simple feature so this isn't going to be too long. You'll just have to give it a crack and try it out yourself.

In short, I have a console application that is inside a never-ending loop. I can change the Console.WriteLine text, and immediately see the results of my change *without* restarting my application. That's the power of Hot Reload!

And it isn't just limited to Console Applications. It (should) work with Web Apps, Blazor, WPF applications, really anything you can think of. Obviously there are some limitations. Notably, if you edit your application startup (or other run-once type code), your application will hot reload, but it doesn't re-run those code blocks, meaning you'll need to restart your application to get that startup code to run again. I've also at times had Hot Reload fail with various errors, usually meaning I just restart and we are away again.

Honestly, one of the biggest things to get used to is the mentality of Hot Reload actually doing something. It’s very hard to “feel” like your changes have been applied. If I’m fixing a bug, and I do a reload and the bug still exists…. It’s hard for me to not stop the application completely and restart just to be sure!

Hot Reload In Visual Studio 2022

Visual Studio 2019 *does* have hot reload functionality, but it's less featured (at least for me). Thus I'm going to show off Visual Studio 2022 instead!

All we need to do is edit our application while it's running, then look to the toolbar in Visual Studio for the Hot Reload icon.

That little icon with two fishes swimming after each other (or.. at least that's what it looks like to me) is Hot Reload. Press it, and you are done!

If that’s a little too labour intensive for you, there is even an option to Hot Reload on file save.

If you're coming from a front end development background, you'll be used to file watchers recompiling your applications on save. On larger projects I've found this to be a little bit more pesky (if Hot Reload is having issues, having popups firing off on every save is a bit annoying), but on smaller projects I've basically run this without a hitch every time.

Hot Reload From Terminal

Hot Reload from a terminal or command line is just as easy. Simply run the following from your project directory :

dotnet watch

Note that this is *without* typing run after it (just in case you used to use "dotnet watch run"). And that's it!

Your application will now run with Hot Reload on file save switched on! Doing this, you'll see output looking something like :

watch : Files changed: F:\Projects\Core Examples\HotReload\Program.cs~RF1f7ccc54.TMP, F:\Projects\Core Examples\HotReload\Program.cs, F:\Projects\Core Examples\HotReload\qiprgg31.zfd~
watch : Hot reload of changes succeeded.

And then you’re away laughing again!

 


TL;DR; Check out the new Q&A Section here : https://qna.dotnetcoretutorials.com/

It’s almost 5 years to the day that I started .NET Core Tutorials. I actually went back and checked and the first ever post was on the 26th of December. Maybe that gives away just how much I do xmas!

One of the first things I did all those years ago was set up an email ([email protected]), and start fielding questions from people. After all, my writing was literally just me figuring things out as I tried to get to grips on the differences between .NET Core and .NET Framework. Over the years, I’ve probably had an ungodly amount of emails from students to 30 year veterans, just asking questions about .NET and C# in general. Some were clearly homework questions, and others were about bugs in .NET Core that I had a hell of a time debugging, but I treated them all the same and gave replies as best I could.

Some of those questions got turned into blog posts of their own where I more or less shared my reply. But other times, the answer was simple enough that dedicating an entire post to what was almost a one word answer or a 5 line code snippet seemed somewhat dumb. That being said, it always annoyed me that while I was willing to help anyone and everyone who emailed me, me replying to that person one on one and not sharing it wasn’t helping anyone else at the same time.

So, with a bit of time on my hands I’ve spun up a Q&A section right here : https://qna.dotnetcoretutorials.com/

What are the rules? Well.. I'm not really sure. For now, I'm just slowly going back and posting questions that I have been emailed over the years, and pasting in my answer to each one. But *anyone* is able to post a question (you don't even have to register/login), and *anyone* can post an answer, even on questions that already have answers. I know it's a bit redundant when things like Stack Overflow exist, but again, I'm just trying to share what can't be turned into its own fully fledged post.

Will it be overrun with spam in 3 months time? Who knows. But for now, feel free to jump in, post a question, lend a helping hand with an answer, and let’s see how we go.


An under the radar feature introduced in SQL Server 2016 (And available in Azure SQL also) is Temporal Tables. Temporal tables allow you to keep a “history” of all data within a SQL table, in a separate (but connected) history table. In very basic terms, every time data in a SQL table is updated, a copy of the original state of the data is cloned into the history table.

The use cases of this are pretty obvious, but include :

  • Ability to query what the state of data was at a specific time
  • Ability to select multiple sets of data between two time periods
  • Viewing how data changes over time (For example, feeding into an analytics or machine learning model)
  • An off the shelf, easy to use, auditing solution for tracking what changed when
  • And finally, a somewhat basic, but still practical disaster recovery scenario for applications going haywire

A big reason for me doing this post is that EF Core 6 has just been released, and includes built-in support for temporal tables. While this post will just be a quick intro to how temporal tables work, in the future I'll be giving a brief intro on getting set up with Entity Framework too!

Getting Started

When creating a new table, it’s almost trivial to add in temporal tables. If I was to create a Person table with two columns, a first and last name, it would look something like so :

CREATE TABLE Person
(  
    [Id] int NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, 
    FirstName NVARCHAR(250) NOT NULL,
    LastName NVARCHAR(250) NOT NULL , 
    -- The below is how we turn on Temporal. 
    [ValidFrom] datetime2 (0) GENERATED ALWAYS AS ROW START,
    [ValidTo] datetime2 (0) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
 )  
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PersonHistory));

There are a couple of things to note here. The first is the last three lines of the CREATE TABLE statement: we need to add the ValidFrom and ValidTo columns and the PERIOD line for everything to work nicely.

Second, it's very important to note the HISTORY_TABLE statement. When I first started with temporal tables, I assumed that there would be a naming convention along the lines of {{TableName}}History. But in fact, if you don't specify what the history table should be called, you just end up with a semi-randomly generated name that doesn't look great.

With this statement run, we end up with a table within a table when looking via SQL Management Studio. Something like so :

I will note that you can turn on temporal tables on an existing table too with an ALTER TABLE statement which is great for projects already on the go.

But here’s the most amazing part about all of this. Nothing about how you use a SQL table changes. For example, inserting a Person record is the same old insert statement as always :

INSERT INTO Person (FirstName, LastName)
VALUES ('Wade', 'Smith')

Our SQL statements for the most part don’t even have to know this is a temporal table at all. And that’s important because if we have an existing project, we aren’t going to run into consistency issues when trying to turn temporal tables on.

With the above insert statement, we end up with a record that looks like so :

The ValidFrom is the datetime we inserted, and obviously the ValidTo is set to maximum because for this particular record, it is valid for all of time (That will become important shortly).

Our PersonHistory table at this point is still empty. But let’s change that! Let’s do an Update statement like so :

UPDATE Person
SET LastName = 'G'
WHERE FirstName = 'Wade'

If we check our Person table, it looks largely the same as before: our ValidFrom date has shifted forward and Wade's last name is now G. But if we check our PersonHistory table :

We now have a record in here that tells us that between our two datetimes, the record with ID 1 had a last name of Smith.

Again, our calling code that updates our Person record doesn't even have to know that temporal tables are turned on, and everything just works like clockwork, encapsulated within SQL Server itself.

Modifying Table Structure

I wanted to point out another real convenience with temporal tables that you might not get if you decided to roll your own history table. After the table is created, what happens if you want to add a column to it?

For example, let’s take our Person table and add a DateOfBirth column.

ALTER TABLE Person
ADD DateOfBirth DATE NULL

You’ll notice that I am only altering the Person table, and not touching the PersonHistory table. That’s because temporal tables automatically handle the alter table statements to also modify the underlying history table. So if I run the above, my history table also receives the update :

This is a huge feature because it means your two tables never get out of sync, and yet, it’s all abstracted away for you and you’ll never have to think about it!

Querying Temporal Tables

Of course, what happens if we actually want to query the history of our Person record? If we were rolling our own, we might have to do a union of our current Person table, and our PersonHistory table. But with temporal tables, it’s a single select statement and SQL Server will work out under the hood which table the data should come from.

Confused? Maybe these examples will help :

SELECT *
FROM Person 
FOR SYSTEM_TIME AS OF '2021-12-10 23:19:25'
WHERE Id = 1

I run the above statement to ask for the state of the Person record, with Id 1, at exactly a particular time. The code executes, and in my case, it pulls the record from the History table.

But let’s say I run it again with a different time :

SELECT *
FROM Person 
FOR SYSTEM_TIME AS OF '2022-01-01'
WHERE Id = 1

Here I've used a date in the future, just to illustrate a point, but in this case I know it will pull the record from the Person table because it will be the very latest.

What I’m trying to demonstrate is that there is no switching between tables to try and work out which version was correct at the right time. SQL Server does it all for you!

Better yet, you’ll probably end up showing an audit history page somewhere on your web app if using temporal tables. For that we can use the BETWEEN statement like so :

SELECT *
FROM Person 
FOR SYSTEM_TIME BETWEEN '2021-01-01' AND '2022-01-01'
WHERE Id = 1

This then fetches all audit history *and* the current record if applicable between those time periods. Again, all hidden away under the hood for you and exposed as a very simple SYSTEM_TIME query statement.

Size Considerations

While all of this sounds amazing, there is one little caveat to a lot of this. And that’s data size footprint.

In general, you'll have to think about how much data you are storing if your system generates many updates across a table. Because temporal tables store a full copy of the data, many small updates could explode the size of your database. However, in a somewhat ironic twist, tables that receive many updates may be good candidates for temporal tables anyway, purely for the auditing history.

Another thing to think about is use of blob data types (text, nvarchar(max)), and even things such as nvarchar vs varchar. Considerations around these data types upfront could save a lot of data space in the long run when it’s duplicated across many historic rows.

There is no one-size-fits-all approach, but it is something to keep in mind!

Temporal Tables vs Event Sourcing

Let's just get this out of the way: temporal tables and event sourcing are not drop-in replacements for each other, nor are they really competing technologies.

A temporal table is somewhat rudimentary. It takes a copy of your data and stores it elsewhere on every update/delete operation. If we ask for a row at a specific point in time, we will receive what the data looked like at that point. And if we give a timeframe, we will be returned several copies of that data.

Event sourcing is more of a series of deltas that describe how the data was changed. The hint is in the name (event), and it functions much the same as receiving events on a queue. Given a point in time, event sourcing can recreate the data by applying deltas up to that point, and given a timeframe, instead of receiving several copies of the data, we instead receive the deltas that were applied.

I think temporal tables work best when a simple copy of the data will do. For pure viewing purposes, maybe as a data administrator looking at how data looked at a certain point of time for application debugging and the like. Whereas event sourcing really is about structuring your application in an event driven way. It’s not a simple “switch” that you flick on to suddenly make your application work via event sourcing.

Temporal Tables vs Roll Your Own

Of course, history tables are a pretty common practice already. So why use Temporal Tables if you’ve already got your own framework set up?

I think it really comes down to ease of use and a real "switch and forget" mentality with temporal tables. Your application logic does not have to change at all, nor do you have to deal with messy triggers. It's almost an audit-in-a-box solution, with very little overhead to set up and maintain. If you are thinking of adding an audit trail/historic log to an application, temporal tables will likely be the solution 99% of the time.

Entity Framework Support

As mentioned earlier, EF Core 6.0 shipped with temporal table support. That includes code first migrations for turning on temporal tables, and LINQ query extensions to make querying temporal tables a breeze. In the next post, we'll dive head first into how that works!
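
Just as a quick taste before then, the setup ends up being roughly along these lines. This is only a sketch based on the EF Core 6 temporal table API (and the People DbSet is a made up name), so check the official docs for the full detail :

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Marks the entity as a temporal table - the code first migration then creates the history table and period columns.
    modelBuilder.Entity<Person>()
        .ToTable("Person", tableBuilder => tableBuilder.IsTemporal());
}

// And querying the state of a row at a point in time via the LINQ extensions :
var personAsAt = context.People
    .TemporalAsOf(new DateTime(2021, 12, 10, 23, 19, 25))
    .SingleOrDefault(x => x.Id == 1);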


I've run into this issue not only when migrating legacy projects to use async/await in C# .NET, but even just day to day on greenfield projects. The issue I'm talking about involves code that looks like so :

static async Task Main(string[] args)
{
    MyAsyncMethod(); // Oops I forgot to await this!
}
static async Task MyAsyncMethod()
{
    await Task.Yield();
}

It can actually be much harder to diagnose than you may think. Due to the way async/await works in C#, a missing await won't *always* cause a visible problem. If the un-awaited method happens to complete quickly enough, your code will actually work much the same as you expect. I have had this happen often in development scenarios, only for things to break in test. And the excuse of "but it worked on my machine" just doesn't cut it anymore!

In recent versions of .NET and Visual Studio, there is now a warning that will show to tell you your async method is not awaited. It gives off the trademark green squiggle :

And you’ll receive a build warning with the text :

CS4014 Because this call is not awaited, execution of the current method continues before the call is completed. Consider applying the ‘await’ operator to the result of the call.

The problem with this is that the warning isn’t always immediately noticeable. On top of this, a junior developer may not take heed of the warning anyway.

What I prefer to do is add a line to my csproj that looks like so :

<PropertyGroup>
    <WarningsAsErrors>CS4014;</WarningsAsErrors>
</PropertyGroup>

This means that every call to an async method that is not awaited will now fail the build entirely.

Disabling Errors By Line

But what if it's one of those rare times you actually do want to fire and forget (typically for desktop or console applications), and now you've just set everything up to blow up? Worse still, the error will show if you are inside an async method calling any method that returns a Task without awaiting it, even if the called method is not itself async.

But we can disable this on a line by line basis like so :

static async Task Main(string[] args)
{
    #pragma warning disable CS4014 
    MyAsyncMethod(); // I don't want to wait this for whatever reason, it's not even async!
    #pragma warning restore CS4014
}
static Task MyAsyncMethod()
{
    return Task.CompletedTask;
}
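
Another option, when fire and forget genuinely is the intent, is to make that intent explicit with a small extension method instead of sprinkling pragmas around. A rough sketch (FireAndForget is just our own naming, not anything built in) :

public static class TaskExtensions
{
    // The caller deliberately doesn't await, but any exception from the task is still observed and handed to a callback.
    public static async void FireAndForget(this Task task, Action<Exception> onException)
    {
        try
        {
            await task;
        }
        catch (Exception ex)
        {
            onException(ex);
        }
    }
}

The call site then reads as deliberately not awaited, and no CS4014 warning is raised because FireAndForget returns void :

MyAsyncMethod().FireAndForget(ex => Console.WriteLine(ex));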

Non-Awaited Tasks With Results

Finally, the one thing I have not found a way around is like so :

static async Task Main(string[] args)
{
    var result = MyAsyncMethodWithResult();
    var newResult = result + 10; //Compiler error - result is actually a Task<int>, not an int.
}
static async Task<int> MyAsyncMethodWithResult()
{
    await Task.Yield();
    return 0;
}

This code will actually blow up at compile time. The reason being that we expect the value of result to be an integer, but because we did not await the method, it's actually a Task<int>. But what if we pass the result to a method that doesn't care about the type, like so :

static async Task Main(string[] args)
{
    var result = MyAsyncMethodWithResult();
    DoSomethingWithAnObject(result);
}
static async Task<int> MyAsyncMethodWithResult()
{
    await Task.Yield();
    return 0;
}
static void DoSomethingWithAnObject(object myObj)
{
}

This will not cause any compiler warnings or errors (but it may well cause runtime errors, depending on what DoSomethingWithAnObject does with the value).

Essentially, I found that the warning/error for non-awaited tasks is not shown if you assign the result to a variable. This is even the case with Tasks that don't return a result, like so :

static async Task Main(string[] args)
{
    var result = MyAsyncMethod(); // No error
}
static async Task MyAsyncMethod()
{
    await Task.Yield();
}

I have searched high and low for a solution to this, but most of the time it leads me to Stack Overflow answers that go along the lines of "Well, if you assigned the value you MIGHT actually want the Task as a fire and forget". Which I agree with, but 9 times out of 10 that's not going to be the case.

That being said, turning the compiler warnings to errors will catch most of the errors in your code, and the type check system should catch 99% of the rest. For everything else… “Well it worked on my machine”.
