I recently got asked a pretty good question about EFCore (Although it does also apply in general to database concepts), and that was :

Should I use RowVersion or ConcurrencyToken for optimistic concurrency?

And the answer is “It depends”, or even more specifically, “Do you know what the difference, or lack thereof, is between the two?”

Let’s rewind a little bit and start with what exactly are Concurrency Tokens, then what is a RowVersion, then finally, how do they compare.

What Is A Concurrency Token?

A concurrency token is a value that will be “checked” every time an update occurs on a database record. By checked what I specifically mean is that the existing value will be used as part of a SQL WHERE statement, so that should it have changed from underneath you, the update will fail. This might occur if two users are trying to edit the same record at the same time.

Or in very crude lucidchart form :

When UserB is updating the same record as UserA, at worst he is overwriting details from UserA unwittingly, but even at best he is writing details to a record using knowledge from his read that may be outdated by the update from UserA.

A concurrency token fights this by simply checking that the information contained in the original read is still there on the write. Let’s imagine that we have a database table called “User” that looks like so :

Id		int
FirstName	nvarchar(max)
LastName	nvarchar(max)
Version		int

Normally a SQL update statement without a concurrency token might look like so :

UPDATE User
SET FirstName = 'Wade'
WHERE Id = 1

But if we use the Version column as a concurrency token, it might instead look like :

UPDATE User
SET FirstName = 'Wade', Version = Version + 1
WHERE Id = 1 AND Version = 1

The Version value in our WHERE statement is the value we fetched when we read the data originally. This way, if someone has updated a record in the time it took us to read the data and then update it, the Version is not going to match and our Update statement will fail.

In Entity Framework/EF Core, we have two ways to say that a property is a ConcurrencyToken. If you prefer using DataAnnotations you can simply apply an attribute to your models.

[ConcurrencyCheck]
public int Version { get; set; }

Or if you prefer Fluent Configurations (Which you should!), then it’s just as easy

modelBuilder.Entity<People>()
	.Property(p => p.Version)
	.IsConcurrencyToken();

But There’s A Catch!

So that all sounds great! But there’s a catch, a small one, but one that can be quite annoying.

The problem is that short of some sort of database trigger (ugh!) or database auto increment field, it’s up to you, the developer, to ensure that you increment the version every time you do an update. Now you can obviously write some EntityFramework extensions to get around this and auto increment things in C#, but it can get complicated really fast.
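If you do go down the manual route, one common approach is to bump the token inside SaveChanges. Below is a minimal sketch only, assuming a User entity class (mapped to the table above) with the int Version property :

public class MyContext : DbContext
{
    public DbSet<User> Users { get; set; }

    public override int SaveChanges()
    {
        // Bump the Version on every entity we are about to update. EF Core still uses
        // the originally read value in the WHERE clause, so the concurrency check holds.
        foreach (var entry in ChangeTracker.Entries<User>()
                                           .Where(e => e.State == EntityState.Modified))
        {
            entry.Property(u => u.Version).CurrentValue++;
        }

        return base.SaveChanges();
    }
}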

And that’s where a RowVersion comes in.

What Is A RowVersion?

Let’s start with what a RowVersion is in pure SQL Server terms. RowVersion (also known as Timestamp, they are the same thing) is a SQL column type of auto generated binary numbers that are unique across that database and stamped on records. Any time a record is inserted or updated on a table with a row version, a new unique number is generated (in binary format) and given to that record. Again, the RowVersions are unique across that entire database, not just the table.

Now in EntityFramework/EFCore it actually takes a somewhat different meaning because of what the SQL RowVersion is actually used to *achieve*.

Typically inside EF, when someone describes using a RowVersion, they are describing using a RowVersion/Timestamp column as a *ConcurrencyToken*. Now if you remember earlier the issue with just using a pure ConcurrencyToken was that we had to update/increment the value ourselves, but obviously if SQL Server is auto updating using RowVersion, then problem solved!

It actually gets more interesting if we take a look at how EFCore actually works out whether to use a RowVersion or not. The actual code is here : https://github.com/dotnet/efcore/blob/master/src/EFCore/Metadata/Builders/PropertyBuilder.cs#L152

public virtual PropertyBuilder IsRowVersion()
{
	Builder.ValueGenerated(ValueGenerated.OnAddOrUpdate, ConfigurationSource.Explicit);
	Builder.IsConcurrencyToken(true, ConfigurationSource.Explicit);

	return this;
}

Calling IsRowVersion() is shorthand for simply telling EFCore that the property is a ConcurrencyToken and that it’s auto generated. So in actual fact, if you added both of these configurations to a property manually, EF Core would treat it like a RowVersion even though you haven’t explicitly said it is.
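In other words, in Fluent terms these two configurations end up meaning roughly the same thing (assuming a byte[] RowVersion property like the one used further below) :

// Explicit RowVersion shorthand
modelBuilder.Entity<People>()
	.Property(p => p.RowVersion)
	.IsRowVersion();

// The "manual" equivalent
modelBuilder.Entity<People>()
	.Property(p => p.RowVersion)
	.IsConcurrencyToken()
	.ValueGeneratedOnAddOrUpdate();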

We can see this by checking the code that asks if a column is a RowVersion here : https://github.com/dotnet/efcore/blob/master/src/EFCore.Relational/Metadata/IColumn.cs#L56

bool IsRowVersion => PropertyMappings.First().Property.IsConcurrencyToken
					&& PropertyMappings.First().Property.ValueGenerated == ValueGenerated.OnAddOrUpdate;

So all it actually does is interrogate whether the column is a concurrency token and auto generated. Easy!

I would note that if you had a column that was auto incremented some other way (a DB trigger for example), and was also a concurrency token… I’m pretty sure EFCore would have issues handling this, but that’s for another day.

In EntityFramework you can setup a RowVersion on a property like so for DataAnnotations :

[Timestamp]
public byte[] RowVersion { get; set; }

And for Fluent Configurations:

modelBuilder.Entity<People>()
	.Property(p => p.RowVersion)
	.IsRowVersion();

Even though you specify that a column should be a RowVersion, the actual implementation of how that works (e.g. the data type, specific settings on how it gets updated) is very dependent on the database provider (and its C# adapter). Different databases can implement RowVersion how they like, but in SQL Server at least, it surfaces in C# as a byte[].

Note that when using RowVersion with EntityFramework, there is nothing more you really need to do to get up and running. Anytime you update a record with a RowVersion property, it will automatically add that column to the WHERE statement giving you optimistic concurrency right out of the box.
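The one thing you still need to handle is the conflict itself. When the WHERE clause matches nothing, EF Core throws a DbUpdateConcurrencyException, and it’s up to you what to do next. A minimal sketch (context here being your DbContext instance) :

try
{
    context.SaveChanges();
}
catch (DbUpdateConcurrencyException ex)
{
    // Someone else updated (or deleted) the record between our read and our write.
    var entry = ex.Entries.Single();
    var databaseValues = entry.GetDatabaseValues(); // Null if the record was deleted

    // From here you might reload and retry, merge the values, or surface the
    // conflict to the user - whatever makes sense for your application.
}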

So ConcurrencyToken vs RowVersion?

So if we go back to the original question of when you should use a Concurrency Token vs when you should use a RowVersion, the answer is actually very simple. If you want to use a ConcurrencyToken as an auto incremented field, and you don’t actually care how it gets incremented or what the data type is, then use RowVersion. If you care about what the data type of your concurrency token should be, or you specifically want to control how and when it gets updated, then use a Concurrency Token and manage the incrementing yourself.

What I’ve generally found is that when people have suggested to me to use Concurrency Tokens, typically what they actually mean is using RowVersion. In fact it’s probably easier to say that RowVersion (in the Entity Framework sense) is a *type* of Concurrency Token.


I contract/freelance out a lot to companies that are dipping their toes into .NET Core, but don’t want to use Microsoft SQL Server – so they either want to use PostgreSQL or MySQL. The thing that gets me is often these companies are so wary about the ability for .NET Core to talk to anything non-Microsoft. I’ve spent a lot of time on calls trying to explain that, for the most part, it really doesn’t matter which tech choice they go with if all they are expecting from .NET Core’s point of view is to run simple commands.

Maybe if you’re overlaying something like EF Core or a very heavy ORM you might have issues. But in my experience, when using something like Dapper that allows you to really control the queries you are running, it really doesn’t make a heck of a lot of difference which SQL database you’re talking to.

I would also add that for both MySQL and Postgres, I’ve had .NET Core apps running inside Linux (containers and VMs) with absolutely no issue. That also seems to get asked a lot, “OK so this can talk to MySQL but can it talk to MySQL from Linux”… errr… yes, yes it can!

This is going to be a really short and sweet post because there really isn’t a lot to it!

Intro To Dapper

If you’ve never used Dapper before, I highly recommend this previous write up on getting started with Dapper. It covers a lot of the why and where we might use Dapper, including writing your first few queries with it.

If you want to skip over that. Just understand that Dapper is a lightweight ORM that handles querying a database and turning the rows into your plain objects with very minimal fuss and overhead. You have to write the queries yourself, so no Linq2SQL, but with that comes amazing control and flexibility. In our case, that flexibility is handy when having to write slightly different commands across different types of SQL Databases, because Dapper itself doesn’t have to translate your LINQ to actual queries, instead that’s on you!

MySQL With Dapper

When working with MySQL in .NET Core, you have to install the following nuget package :

Install-Package MySql.Data

Normally when creating a SQL Connection you would do something like so :

using (var connection = new SqlConnection("Server=myServerAddress;Database=myDataBase;User Id=myUsername;Password=myPassword;"))
{
	connection.Query<MyTable>("SELECT * FROM MyTable");
}

With MySQL you would do essentially the same thing but instead you use the MySQLConnection class :

using (var connection = new MySqlConnection("Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword;"))
{
	connection.Query<MyTable>("SELECT * FROM MyTable");
}

And that’s pretty much it! Obviously the syntax for various queries may change (e.g. Using LIMIT in MySQL instead of TOP in MSSQL), but the actual act of talking to the database is all taken care for you and you literally don’t have to do anything else.
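For example, grabbing the first 10 rows looks slightly different per database, but the Dapper side of things stays identical (MyTable is just a placeholder here) :

using (var connection = new MySqlConnection("Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword;"))
{
	// MSSQL would be "SELECT TOP 10 * FROM MyTable"
	connection.Query<MyTable>("SELECT * FROM MyTable LIMIT 10");
}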

PostgreSQL With Dapper

If you’ve read the MySQL portion above.. well.. You can probably guess how Postgres is going to go.

First install the following nuget package :

Install-Package Npgsql

Then again, our normal SQL Connection looks like so :

using (var connection = new SqlConnection("Server=myServerAddress;Database=myDataBase;User Id=myUsername;Password=myPassword;"))
{
	connection.Query<MyTable>("SELECT * FROM MyTable");
}

And our Postgres connection instead looks like so using the NpgsqlConnection class :

using (var connection = new NpgsqlConnection("User ID=root;Password=myPassword;Host=localhost;Port=5432;Database=myDataBase;"))
{
	connection.Query<MyTable>("SELECT * FROM MyTable");
}

Too easy!
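And just to round things out, Dapper’s parameter handling also works exactly the same no matter which connection type you use. A quick sketch with Postgres :

using (var connection = new NpgsqlConnection("User ID=root;Password=myPassword;Host=localhost;Port=5432;Database=myDataBase;"))
{
	connection.Query<MyTable>("SELECT * FROM MyTable WHERE Id = @Id", new { Id = 1 });
}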


I was writing some reflection code the other day where I wanted to search through all of my assemblies for a particular interface, and call a method on it at startup. Seemed pretty simple, but in reality there is no clear, easy, one size fits all way to get assemblies. This post may be a bit dry for some, but if I can save just one person from banging their head against the wall with this stuff, then it’s worth it!

I’m not really going to say “use this one”, because out of all the ways to get assemblies, it’s highly likely only one will work for your particular project, so it’s pointless trying to lean towards one or the other. Simply try them all and see which one makes the most sense!

Using AppDomain.GetAssemblies

So the first option you might come across is AppDomain.GetAssemblies. It (seemingly) loads all Assemblies in the AppDomain, which is to say essentially every Assembly that is used in your project. But there is a massive caveat. Assemblies in .NET are lazy loaded into the AppDomain. It doesn’t load every assembly possible all at once; instead it waits for you to make a call to a method/class in that assembly, and then loads it up – e.g. loads it Just In Time. Makes sense because there’s no point loading an assembly up if you never use it.

But the problem is that at the point you call AppDomain.GetAssemblies(), if you have not made a call into a particular assembly, it will not be loaded! Now if you are getting all assemblies for a startup method, it’s highly likely you wouldn’t have called into that assembly yet, meaning it’s not loaded into the AppDomain!

Or in code form :

AppDomain.CurrentDomain.GetAssemblies(); // Does not return SomeAssembly as it hasn't been called yet. 
SomeAssembly.SomeClass.SomeMethod();
AppDomain.CurrentDomain.GetAssemblies(); // Will now return SomeAssembly. 

So while this might look like an attractive option, just know that timing is everything with this method.

Using The AssemblyLoad Event

Because you can’t be sure when you call CurrentDomain.GetAssemblies() that everything is loaded, there is actually an event that will run when the AppDomain loads another Assembly. Basically, when an assembly is lazy loaded, you can be notified. It looks like so :

AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
{
    var assembly = args.LoadedAssembly;
};

This might be a solution if you just want to check something when Assemblies are loaded, but that process doesn’t necessarily have to happen at a certain point in time (e.g. Does not have to happen within the Startup.cs of your .NET Core app).

The other problem with this is that you can’t be sure that by the time you’ve added your event handler, assemblies haven’t already been loaded (In fact they most certainly will have). So what then? You would need to duplicate the effort by first adding your event handler, then immediately after checking AppDomain.CurrentDomain.GetAssemblies for things that have already been loaded.

It’s a niche solution, but it does work if you are fine with doing something with the lazy loaded assemblies.
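If you did want to go down this route, the combined approach described above might look something like this rough sketch (a plain HashSet is used for de-duping, so add your own locking if assemblies can load from multiple threads) :

var seenAssemblies = new HashSet<Assembly>();

AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
{
    seenAssemblies.Add(args.LoadedAssembly);
};

// Catch up on anything that was loaded before we attached the handler.
foreach (var assembly in AppDomain.CurrentDomain.GetAssemblies())
{
    seenAssemblies.Add(assembly);
}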

Using GetReferencedAssemblies()

Next cab off the rank is GetReferencedAssemblies(). Essentially you can take an assembly, such as your entry assembly which is typically your web project, and you find all referenced assemblies. The code itself looks like this :

Assembly.GetEntryAssembly().GetReferencedAssemblies();

Again, looks to do the trick but there is another big problem with this method. In many projects you have a separation of concerns somewhere along the lines of say Web Project => Service Project => Data Project. The Web Project itself doesn’t reference the Data Project directly. Now when you call “GetReferencedAssemblies” it means direct references. Therefore if you’re looking to also get your Data Project in the assembly list, you are out of luck!

So again, may work in some cases, but not a one size fits all solution.

Looping Through GetReferencedAssemblies()

Your other option for using GetReferencedAssemblies() is actually to create a method that will loop through all assemblies. Something like this :

public static List<Assembly> GetAssemblies()
{
    var returnAssemblies = new List<Assembly>();
    var loadedAssemblies = new HashSet<string>();
    var assembliesToCheck = new Queue<Assembly>();

    assembliesToCheck.Enqueue(Assembly.GetEntryAssembly());

    while(assembliesToCheck.Any())
    {
        var assemblyToCheck = assembliesToCheck.Dequeue();

        foreach(var reference in assemblyToCheck.GetReferencedAssemblies())
        {
            if(!loadedAssemblies.Contains(reference.FullName))
            {
                var assembly = Assembly.Load(reference);
                assembliesToCheck.Enqueue(assembly);
                loadedAssemblies.Add(reference.FullName);
                returnAssemblies.Add(assembly);
            }
        }
    }

    return returnAssemblies;
}

Rough around the edges but it does work and means that on startup, you can instantly view all assemblies.

The one time you might get stuck with this is if you are loading assemblies dynamically and so they aren’t actually referenced by any project. For that, you’ll need the next method.

Directory DLL Load

A really rough way to get all solution DLLs is actually to load them out of your bin folder. Something like :

public static Assembly[] GetSolutionAssemblies()
{
    var assemblies = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.dll")
                        .Select(x => Assembly.Load(AssemblyName.GetAssemblyName(x)));
    return assemblies.ToArray();
}

It works but hoooo boy it’s a rough one. But the one big boon to possibly using this method is that a dll simply has to be in the directory to be loaded. So if you are dynamically loading DLLs for any reason, this is probably the only method that will work for you (Except maybe listening on the AppDomain for AssemblyLoad).

This is one of those things that looks like a hacktastic way of doing things, but you actually might be backed into the corner and this is the only way to solve it.

Getting Only “My” Assemblies

Using any of these methods, you’ll quickly find you are loading every Assembly under the sun into your project, including Nuget packages, .NET Core libraries and even runtime specific DLLs. In the .NET world, an Assembly is an Assembly. There is no concept of “Yeah but this one is my Assembly” and should be special.

The only way to filter things out is to check the name. You can either do it as a whitelist, so if all of your projects in your solution start with the word “MySolution.”, then you can do a filter like so :

Assembly.GetEntryAssembly().GetReferencedAssemblies().Where(x => x.Name.StartsWith("MySolution."))

Or instead you can go for a blacklist option which doesn’t really limit things to just your Assemblies, but at the very least cuts down on the number of Assemblies you are loading/checking/processing etc. Something like :

Assembly.GetEntryAssembly().GetReferencedAssemblies()
.Where(x => !x.Name.StartsWith("Microsoft.") && !x.Name.StartsWith("System."))

Blacklisting may look stupid, but in some cases, such as if you are building a library where you don’t actually know the end solution’s name, it’s the only way you can cut down on what you are attempting to load.


Even with my love for Dapper these days, I often have to break out EF Core every now and again. And one of the things that catches me out is just how happy some developers are to make an absolute meal out of Entity Configurations with EF Core. By Entity Configurations, I mean doing a code first design and being able to mark fields as “Required”, or limit their length, or even create indexes on tables. Let’s do a quick dive and see what our options are and what gives us the cleanest result.

Attribute (Data Annotations) vs Fluent Configuration

So the first thing you notice when you pick up EF Core is that half the documentation tells you that you can simply add an attribute to a property to mark it as required :

[Required]
public string MyField { get; set; }

And then the other half of the documentation tells you that you should override OnModelCreating inside your context and use “Fluent” configuration like so :

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
	base.OnModelCreating(modelBuilder);
	modelBuilder.Entity<MyEntity>()
		.Property(x => x.MyField).IsRequired();
}

Which one is correct? Well actually both! But I always push for Fluent Configurations and here’s why.

There is an argument that when you use attributes like [Required] on a property, it also works if your model is being returned/created via an API. e.g. It also provides validation. This is really a moot point. You should always aim to have specific ViewModels returned from your API and not return out your entity models. Ever. I get that sometimes a quick and dirty internal API might just pass models back and forth between the database and the API, but the fact that attributes work for both is simply a coincidence, not an actual intended feature.

There’s also an argument that the attributes you use come from the DataAnnotations library from Microsoft. Many ORMs use this library to configure their data models. So for example if you took an EF Core entity and switched to using another ORM, it may be able to just work out of the box with the same configurations. I mean, this one is true and I do see the point, but as we are about to find out, complex configuration simply cannot be done with data annotations alone and therefore you’re still going to have to do rework anyway.

The thing is, Fluent Configurations are *much* more powerful than Data Annotations. A complex index that spans two fields and adds another three as include columns? Not a problem in Fluent, but no way to do it in DataAnnotations (in fact Indexes in general got ripped out of attributes and are only just now making their way back in, with a much weaker configuration than just using Fluent https://github.com/dotnet/efcore/issues/4050). Want to configure a complex HasMany relationship? DataAnnotations relationships are all about conventions, so breaking those is extremely hard, whereas in Fluent it’s a breeze. Microsoft themselves have come out and said that Fluent Configuration for EF Core is an “Advanced” feature, but I feel like with anything more than just dipping your toe into EF Core, you’re gonna run into a dead end with Data Annotations and have to mix in Fluent Configuration anyway. When it gets to that point, it makes even less sense to have your configuration split across Attributes and Fluent.
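To make that concrete, here’s the sort of index I’m talking about. The Order entity and its fields are made up, and IncludeProperties comes from the SQL Server provider, but the point is there’s simply no data annotation equivalent :

modelBuilder.Entity<Order>()
	.HasIndex(o => new { o.CustomerId, o.CreatedDate })
	.IncludeProperties(o => new { o.Status, o.Total, o.Reference })
	.IsUnique();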

Finally, from a purely aesthetic standpoint, I personally prefer my POCOs (Plain Old C# Objects) to be free of implementation details. While it’s true that in this case, I’m building an entity to store in a SQL Database, that may not always be the case. Maybe in the future I store this entity in a flat XML file. I think adding attributes to any POCO changes it from describing a data structure, to describing how that data structure should be saved. Then again, things like Active Record exist so it’s not a hard and fast rule. Just a personal preference.

All rather weak arguments I know but honestly, before long, you will have to use Fluent Configuration for something. It’s just a given. So it’s much better to just start there in the first place.

Using IEntityTypeConfiguration

So you’ve made it past the argument of Attributes vs Fluent and decided on Fluent. That’s great! But you’ll quickly find that all the tutorials tell you to just keep jamming everything into the “OnModelCreating” method of your Context. Kinda like this :

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
	base.OnModelCreating(modelBuilder);
	modelBuilder.Entity<MyEntity>()
		.Property(x => x.MyField).IsRequired();
}

What if you have 10 tables? 20 tables? 50? This class is going to quickly surpass hundreds if not thousands of lines and no amount of comments or #regions is going to make it more readable.

But there’s also a (seemingly) much less talked about feature called IEntityTypeConfiguration. It works like this.

Create a class called {EntityName}Configuration and inherit from IEntityTypeConfiguration<Entity>.

public class MyEntityConfiguration : IEntityTypeConfiguration<MyEntity>
{
	public void Configure(EntityTypeBuilder<MyEntity> builder)
	{
	}
}

You can then put any configuration for this particular model you would have put inside the context, inside the Configure method. The builder input parameter is scoped specifically to only this entity so it keeps things clean and tidy. For example :

public class MyEntityConfiguration : IEntityTypeConfiguration<MyEntity>
{
	public void Configure(EntityTypeBuilder<MyEntity> builder)
	{
		builder.Property(x => x.MyField).IsRequired();
	}
}

Now, for each entity that you want to configure, keep creating more configuration files, one for each type. Almost a 1 to 1 mapping if you will. I like to put them inside an EntityConfiguration folder to keep things nice and tidy.

Finally, head back to your Context and delete all the configuration work that you’ve now moved into IEntityTypeConfigurations, and instead replace it with a call to “ApplyConfigurationsFromAssembly” like so :

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
	modelBuilder.ApplyConfigurationsFromAssembly(typeof(MyContext).Assembly);
}

Now you have to pass in the assembly where EF Core can find the configurations. In my case I want to say that they are in the same assembly as MyContext (Note this is *not* DbContext, it should be the actual name of your real context). EF Core will then go and find all implementations of IEntityTypeConfiguration and use that as config for your data model. Perfect!

I personally think this is the cleanest possible way to configure Entities. If you need to edit the configuration for the Entity “Address”, then you know you just have to go to the “AddressConfiguration”. Delete an entity from the data model? Well just delete the entire configuration file. Done! It’s really intuitive and easy to use.


I recently came across an interesting params gotcha (or more like a trap) while working with a method in C# that allowed both a params and an IEnumerable parameter. If that sounds confusing, let me give a quick refresher.

Let’s say I have a method that looks like so :

static void Call(IEnumerable<object> input)
{
    Console.WriteLine("List Object");
}

Pretty normal. But sometimes when calling this method, I have, say, two items, and I don’t want to “new up” a list. So using this current method, I would have to do something like so :

var item1 = new object();
var item2 = new object();

Call(new List<object> { item1, item2 });

Kind of ugly. But there is also the params keyword that allows us to pass in items separated by commas, and by magic, it turns into an array inside the method. For example :

static void Call(params object[] input)
{
    Console.WriteLine("Object Params");
}

Now we can just do :

var item1 = new object();
var item2 = new object();

Call(item1, item2);

Everything is perfect! But then I ran into an interesting conundrum that I had never seen before. Firstly, let’s suppose I pass in a list of strings to my overloaded call. The code might look like so :

static void Main(string[] args)
{
    Call(new List<string>());
}

static void Call(params object[] input)
{
    Console.WriteLine("Object Params");
}

static void Call(IEnumerable<object> input)
{
    Console.WriteLine("List Object");
}

If I ran this code, what would be output? Because I’ve passed it a List<string>, which is a type of IEnumerable<object>, you might think it would output “List Object”. And… You would be right! It does indeed use the IEnumerable method which makes total sense because List<string> is a type of IEnumerable<object>. But interestingly enough… List<string> is also an object… So theoretically, it could indeed actually be passed to the params call also. But, all is well for now and we are working correctly.

Later on however, I decide that I want a generic method that does some extra work, before calling the Call method. The code looks like so :

static void Main(string[] args)
{
    GenericalCall(new List<string>());
}

static void GenericalCall<T>(List<T> input)
{
    Call(input);
}

static void Call(params object[] input)
{
    Console.WriteLine("Object Params");
}

static void Call(IEnumerable<object> input)
{
    Console.WriteLine("List Object");
}

Well theoretically, we are still giving it a List of T. Now T could be anything, but in our case we are passing it a list of strings same as before so you might expect it to output “List Object” again. Wrong! It actually outputs “Object Params”! Why?!

Honestly, I’m just guessing here, but I think I’ve deduced why. Because the unconstrained type T could be anything, including a value type, the compiler can’t use the IEnumerable<object> overload (the covariant conversion from List<T> to IEnumerable<object> only exists when T is known to be a reference type). Because of this, it treats our List<T> as a single object, and passes that single item as a param into the params call. Crazy! I actually thought maybe at runtime it might try and inspect T and see what type it is to deduce the right call path, but overload resolution happens at compile time.

This is confirmed if we actually add a constraint to our generic method that says the type of T must be a class (Therefore has to be derived from object). For example :

static void Main(string[] args)
{
    GenericalCall(new List<string>());
}

static void GenericalCall<T>(List<T> input) where T : class
{
    Call(input);
}

static void Call(params object[] input)
{
    Console.WriteLine("Object Params");
}

static void Call(IEnumerable<object> input)
{
    Console.WriteLine("List Object");
}

Now we get the “List Object” output because we have told the compiler ahead of time that T will be a class (a reference type), so the conversion to IEnumerable<object> is available. Easy!

Another way to solve this is to force cast the List<T> to IEnumerable<object> like so :

static void GenericalCall<T>(List<T> input)
{
    Call((IEnumerable<object>)input);
}

Anyway, hopefully that wasn’t too much of a ramble. I think this is one of those things that you sort of store away in the back of your head for that one time it actually occurs.

Where Did I Actually See This?

Just as a little footnote to this story. I actually saw this when trying to use EntityFramework’s “HasData” method.

I had this line inside a generic class that helped load CSVs as data into a database.

modelBuilder.Entity(typeof(T)).HasData(seedData);

I kept getting :

The seed entity for entity type ‘XXX’ cannot be added because there was no value provided for the required property ‘YYY’

And it took me a long time to realize HasData has two overloads :

HasData(IEnumerable<object> data);
HasData(params object[] data);

So for me, it was as easy as casting my seedData input to IEnumerable<object>.

modelBuilder.Entity(typeof(T)).HasData((IEnumerable<object>)seedData);

With the announcement of .NET 5 last year, and subsequent announcements leading up to MSBuild 2020, a big question has been what’s going to happen to “.NET Standard”. That sort of framework that’s not an actual framework, just an interface that various platforms/frameworks are required to implement (but not really), and then you have to get David Fowler to do a GitHub Gist that gets shared a million times to actually explain to people what the hell this thing is.

Anyway. .NET Standard is no more (Or will be eventually). As confusing as it may be at first to get rid of something that was only created 3 odd years ago… It does kinda make sense to get rid of it at this juncture.

Rewinding The “Why” We Even Had .NET Standard

Let’s take a step back and look at how and why .NET Standard came to be.

When .NET Core was first released, there was a conundrum. We have all these libraries that are already written for .NET Framework, do we really want to re-write them all for .NET Core? Given that the majority of early .NET Core was actually a port of .NET Framework to work cross platform, many of the classes and method signatures were identical (In fact I would go as far as to say most of them were).

Let’s use an example. Let’s say that I want to open a File inside my library using the standard File.ReadAllLines(string path) call. Now it just so happens if you write this code in .NET Framework, .NET Core or even Mono, it takes the same parameters (a string path variable), and returns the same thing, (a string array). Now *how* these calls read a file is up to the individual platform (For example .NET Core and Mono may have some special code to handle Mac path files), but the result should always be the same, a string array of lines from the file.

So if I had a library that does nothing but open a file to read lines and return it. Should I really need to release that library multiple times for different frameworks? Well, that’s where .NET Standard comes in. The simplest way to think about it is it defines a list of classes and methods that every platform agrees to implement. So if File.ReadAllLines() is part of the standard, then I can be assured that my library can be released once as a .NET Standard library, and it will work on multiple platforms.

If you’re looking for a longer explanation about .NET Standard, then there’s an article I wrote over 3 years ago that is still relevant today : https://dotnetcoretutorials.com/2017/01/13/net-standard-vs-net-core-whats-difference/

TL;DR; .NET Standard provided a way for different .NET Platforms to share a set of common method signatures that afforded library creators to write code once and be able to run on multiple platforms. 

.NET Standard Is No Longer Needed

So we come to the present day where announcements are coming out that .NET Standard is no longer relevant (sort of). And there’s two main reasons for that….

.NET Core Functionality Surpassed .NET Framework – Meaning New .NET Standard Versions Were Hard To Come By

Initially, .NET Core was a subset of .NET Framework functionality. So the .NET Standard was a way almost of saying, if you wrote a library for .NET Framework, here’s how you know it will work out of the box for .NET Core. Yes, .NET Standard was also used as a way to see what functionality was shared across other platforms like Mono, Xamarin, Silverlight, and even Windows Phone. But I feel like the majority of use cases were for .NET Framework => .NET Core comparisons.

As .NET Core built up its functionality, it was still essentially trying to reach feature parity with .NET Framework. So as a new version of .NET Core got released each year, a new version of .NET Standard also got released with it that was, again, almost exclusively there to describe the common method signatures across .NET Framework <=> .NET Core. So eventually .NET Core surpasses .NET Framework, or at the very least says “We aren’t porting anything extra over”. This point is essentially .NET Standard 2.0.

But obviously work on .NET Core doesn’t stop, and new features are added to .NET Core that don’t exist in .NET Framework. .NET Framework updates, however, were few and far between, until it was announced that it’s essentially maintenance mode only (or some variation thereof). So with the new features being added to .NET Core, does it make sense to add them to a new version of the standard given that .NET Framework will never actually implement that standard? Kind of… Or at least they tried. .NET Standard 2.1 was the latest release of the standard and (supposedly, although some would disagree) is implemented by both Mono and Xamarin, but not .NET Framework.

So now we have a standard that was devised to describe the parity between two big platforms, one of which is no longer going to be participating. I mean, I guess we can keep implementing new standards, but if there is only one big player actually adhering to that standard (and in fact, probably defining it), then it’s kinda moot.

The Merger Of .NET Platforms Makes A Standard Double Moot

But then of course, around 6 months after the release of .NET Standard 2.1, we get the news that .NET Framework and .NET Core are being rolled into this single .NET platform called .NET 5. Now we doubly don’t need a standard, because the two platforms we were trying to define the parity between are actually just going to become one and the same.

Now take that, and add in the fact that .NET 6 is going to include the rolling in of the Xamarin platform. Now all those charts you saw of .NET Standard where you tried to trace your finger along the columns to check which version you should support are moot because there’s only one row now, that of .NET 6.

In the future there is only one .NET platform. There is no Xamarin, no .NET Core, no Mono, no .NET Framework. Just .NET.

So I Should Stop Using .NET Standard?

This was something that got asked of me recently. If it’s all becoming one platform, do we just start writing libraries for .NET 5 going forward then? The answer is no. .NET Standard will still exist as a way to write libraries that run in .NET Framework or older versions of .NET Core. Even today, when picking a .NET Standard version for a library, you try and pick the lowest number you can feasibly go to ensure you support as many platforms as you can. That won’t change going forward – .NET 5 still implements .NET Standard 1.0 for example, so any library that is targeting an older standard still runs on the latest version of the .NET platform.
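In practical terms, that just means your library’s csproj keeps targeting the lowest standard you can get away with, for example :

<PropertyGroup>
  <TargetFramework>netstandard2.0</TargetFramework>
</PropertyGroup>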

What will change for the better are those hideously complex charts and nuget dependency descriptions on what platforms can run a particular library/package. In a few years from now it won’t be “Oh this library is for .NET Standard 2.1, is that for .NET Core 2.1? No, it’s for .NET Core 3+… who could have known”. Instead it will be “oh this library is for .NET 5, so it will work in .NET 7 no problems”.


So by now you’ve probably heard a little about various .NET 5 announcements and you’re ready to give it a try! I thought first I would give some of the cliffnotes from the .NET 5 announcement, and then jump into how to actually have a play with the preview version of .NET 5 (Correct as of writing this post on 2020-05-22).

Cliff Notes

  • .NET Standard is no more! Because .NET Framework and .NET Core are being merged, there is less of a need for .NET Standard. .NET Standard also covers things like Xamarin but that’s being rolled into .NET 6 (More on that a little later), so again, no need for it.
  • .NET 5 coincides with the C# 9 and F# 5 releases (as it typically does), but PowerShell will now also be released on the same cadence.
  • They have added a visual designer for building WinForm applications since you could theoretically build WinForm applications in .NET Core 3.X but not design them with quite as much functionality as you typically would.
  • .NET 5 now runs on Windows ARM64.
  • While the concept of Single File Publish already exists in .NET Core 3.X, it looks like there have been improvements so that it’s actually a true exe instead of a self extracting ZIP. Mostly for reasons around being on read-only media (e.g. A locked down user may not be able to extract that single exe to their temp folder etc).
  • More features have been added to System.Text.Json for feature parity with Newtonsoft.Json.
  • As mentioned earlier, Xamarin will be integrated with .NET 6 so that there would be a single unifying framework. Also looks like Microsoft will be doing a big push around the .NET ecosystem as a way to build apps once (in Xamarin), and deploy to Windows, Mac, IOS, Android etc. Not sure how likely this actually is but it looks like it’s the end goal.

Setting Up The .NET 5 SDK

So as always, the first thing you need to do is head to the .NET SDK Download page here : https://dotnet.microsoft.com/download/dotnet/5.0. Note that if you go to the actual regular download page of https://dotnet.microsoft.com/download you are only given the option to download .NET Core 3.1 or .NET Framework 4.8 (But there is a tiny little banner above saying where you can download the preview).

Anyway, download the .NET 5 SDK installer for your OS.

After installing, you can run the dotnet info command from a command prompt :

dotnet --info

Make sure that you do have the SDK installed correctly. If you don’t see .NET 5 in the list, the most common reason I’ve found is people installing the X86 version on their X64 PC. So make sure you get the correct installer!

Now if you use VS Code, you are all set. For any existing project you have that you want to test out running in .NET 5 (For example a small console app), then all you need to do is open the .csproj file and change :

<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>netcoreapp3.1</TargetFramework>
</PropertyGroup>

To :

<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>net5.0</TargetFramework>
</PropertyGroup>

As noted in the cliffnotes above, because there is really only one Framework with no standard going forward, they ditched the whole “netcoreapp” thing and just went with “net”. That means if you want to update any of your .NET Standard libraries, you actually need them to target “net5.0” as well. But hold fire because there is actually no reason to bump the version of a library unless you really need something in .NET 5 (Pretty unlikely!).

.NET 5 In Visual Studio

Now if you’ve updated your .NET Core 3.1 app to .NET 5 and try and build in Visual Studio, you may just get :

The reference assemblies for .NETFramework,Version=v5.0 were not found. 

Not great! But all we need to do is update to the latest version and away we go. It’s somewhat lucky that there isn’t a Visual Studio release this year (e.g. There is no Visual Studio 2020), otherwise we would have to download yet another version of VS. So to update, inside Visual Studio, simply go Help -> Check For Updates. The version you want to at least be on is 16.6 which as of right now, is the latest non-preview version.

Now after installing this update for the first time, for the life of me I couldn’t work out why I could build an existing .NET 5 project, but when I went to create a new project, I didn’t have the option of creating it as .NET 5.

As it turns out, by default the non-preview version of Visual Studio can only see non-preview versions of the SDK. I guess so that you can keep the preview stuff all together. If you are like me and just want to start playing without having to install the Preview version of VS, then you need to go Tools -> Options inside Visual Studio. Then inside the options window under Environment there is an option for “Preview Features”.

Tick this. Restart Visual Studio. And you are away laughing!

Do note that some templates such as Console Applications don’t actually prompt you for the SDK version when creating a new project, they just use the latest SDK available. In this case, your “default” for Visual Studio suddenly becomes a preview .NET Core SDK. Perfectly fine if you’re ready to sit on the edge, but just something to note in case this is a work machine or similar.

 


Learning basic sorting algorithms is a bit of a Computer Science 101 class. But many examples out there are either in pseudocode, or languages more suited to large computation (e.g. Python). So I thought I would quickly go over the three basic sorting algorithms, and demonstrate them in C#.

Default Sorting In C#/.NET

So going over these, you’re probably going to be thinking “So which one does C# use?”. By that I mean, if you call Array.Sort(), does it use any of these examples? Well.. the answer is “it depends”. In general when using “Sort()” on a List, Array or Collection it will use :

  • If the collection has less than 16 elements, the algorithm “Insertion Sort” will be used (We will talk about this below).
  • If the number of partitions exceeds 2 * log(array size), then Heapsort is used.
  • Otherwise Quicksort is used.

However this is not always the case. For example, when using LINQ and calling OrderBy, Quicksort is always used as the underlying implementation.
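Both of these are just a method call away. For example :

var numbers = new[] { 5, 3, 8, 1 };

Array.Sort(numbers);                            // In-place, introspective sort under the hood
var ordered = numbers.OrderBy(x => x).ToList(); // LINQ OrderBy, quicksort based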

However, what I’m trying to point out here is that the sorting algorithms outlined below are rarely, if ever used in the real world and are more likely to be used as an interview question. But they are important to understand because often other sorting algorithms are built on top of these more “archaic” algorithms (It also helps to understand jokes in Silicon Valley too!).

Array vs List

I just want to point out something very important in talking about sorting algorithms when it comes to C#. When I first started programming, I couldn’t understand why examples always used arrays in their sample code. Surely since we are in C#, Lists are way cooler! And even though a plain fixed-size array is obviously leaner than a List, do we really need to use arrays even in examples?

Well the thing is, these are all “in place” sorts. That is, we do not create a second object to return the result, store state, or hold “partial” results. When we do things like Insertion Sort below, I’ll give an example of how this might be done more easily with a List, but it requires an additional list to be created in memory. In almost all sorting algorithms you’ll find that they work within the data structure given and don’t “clone” or select out items into a new List to return an entirely different object. Once I realized that “sorting” was not simply about “give me the lowest item and I’ll just put it in this new list over here and keep going until I select out all items”, but instead almost about the most efficient way to “juggle” items inside an array, those pseudocode sort algorithms suddenly made sense.

Bubble Sort

So first up we are going to look at Bubblesort. This is essentially the worst case scenario for sorting data, as it takes many “passes” of single swaps for things to actually sort.

Let’s look at the code :

public static void BubbleSort(int[] input)
{
    var itemMoved = false;
    do
    {
        itemMoved = false;
        for (int i = 0; i < input.Length - 1; i++)
        {
            if (input[i] > input[i + 1])
            {
                var lowerValue = input[i + 1];
                input[i + 1] = input[i];
                input[i] = lowerValue;
                itemMoved = true;
            }
        }
    } while (itemMoved);
}

So how does BubbleSort work? Starting at index zero, we take an item and the next item in the array and compare them. If they are in the right order, then we do nothing; if they are in the wrong order (e.g. the item lower in the array actually has a higher value than the next element), then we swap them. Then we continue through each item in the array doing the same thing (swapping with the next element if it’s higher).

Now since we are only comparing each item with its neighbour, each item may only move a single place when it actually needs to move several places. So how does Bubblesort solve this? Well it just runs the entire process all over again. Notice how we have the variable called “itemMoved”. We simply set this to true if we did swap an item and start the scan all over again.

Because we are moving things one at a time, not directly to the right position, and having to make multiple passes to get things right, BubbleSort is seen as extremely inefficient.

Insertion Sort

Next up is Insertion Sort. Now, while we still check items one by one, what we instead do is “insert” the item at the correct index right from the get go. Unlike BubbleSort, where we are swapping the item with its neighbour, we are instead inserting the item into the correct position given what we have already checked.

I’m actually going to show the code twice. First is what I think is your typical insertion sort :

public static void InsertionSort(int[] input)
{

    for (int i = 0; i < input.Length; i++)
    {
        var item = input[i];
        var currentIndex = i;

        while (currentIndex > 0 && input[currentIndex - 1] > item)
        {
            input[currentIndex] = input[currentIndex - 1];
            currentIndex--;
        }

        input[currentIndex] = item;
    }
}

So a quick explanation of this code.

We loop through each item in the array and grab its value. Then we work back through the items at the indexes *below* the one we started at. If the item below has a higher value than our current item, we “shift” it up by 1 and check the next one down. In a way it’s like a bubble sort, because we are comparing with the neighbour below, but instead of stopping at a single swap we keep shifting until we find where our item belongs.

If we get down to index 0, or we hit an item that has a lower value than our current item, then we “break” and simply insert our current item at that index.

But a simpler way to view the “Insertion” sort algorithm is actually by building a new list to return. For example :

public static List<int> InsertionSortNew(this List<int> input)
{
    var clonedList = new List<int>(input.Count);

    for (int i = 0; i < input.Count; i++)
    {
        var item = input[i];
        var currentIndex = i;

        while (currentIndex > 0 && clonedList[currentIndex - 1] > item)
        {
            currentIndex--;
        }

        clonedList.Insert(currentIndex, item);
    }

    return clonedList;
}

So in this example, instead we create a brand new list and slowly add items to it by inserting them at the correct location. Again, not quite a true in-place sort, as we are doing things like inserting items at certain indexes without having to shift the items above them ourselves. But really, being able to insert an item at a certain index is just sugar that C# takes care of for us.

Again however, generally when we talk about list sorting, we are talking about “inplace” sorting and not trying to cherry pick items out into a new object.

Selection Sort

Selection Sort is actually very very similar to Insertion Sort. The code looks like so :

public static void SelectionSort(int[] input)
{
    for (var i = 0; i < input.Length; i++)
    {
        var min = i;
        for(var j = i + 1; j < input.Length; j++) { 
            if(input[min] > input[j])
            {
                min = j;
            }
        }

        if(min != i)
        {
            var lowerValue = input[min];
            input[min] = input[i];
            input[i] = lowerValue;
        }
    }
}

What we are essentially doing is scanning the array from start to finish. For each index, we scan the rest of the array for an item that is lower (in fact, the lowest) compared to the current item. If we find one, we swap it with the current item. The fact that the current item goes into a position later in the array isn’t too important, as eventually all elements will be checked.

Now, again, this looks more complicated than it should be because of our in-place array criteria. But really what we are doing is scanning one by one, finding the lowest remaining item in the list, putting it into place, and then continuing on with the next lowest, etc.
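If you want to sanity check any of these implementations, a quick harness looks like so :

var numbers = new[] { 5, 3, 8, 1, 9, 2 };

SelectionSort(numbers); // Or BubbleSort / InsertionSort - they all sort the array in place
Console.WriteLine(string.Join(", ", numbers)); // 1, 2, 3, 5, 8, 9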

Divide and Conquer Sorting

Not featured here are “Divide and Conquer” sorting algorithms; these are things like MergeSort and QuickSort that divide the work up into many smaller sorting operations and then combine the results at the end. These are generally the sorting algorithms you will find out in the wild, but they’re maybe a little bit past the “basics” of sorting.


This post is part of a series on using Azure CosmosDB with .NET Core

Part 1 – Introduction to CosmosDB with .NET Core
Part 2 – Azure CosmosDB with .NET Core EF Core


When I first found out EntityFramework supported Azure CosmosDB, I was honestly pretty excited. Not because I thought it would be revolutionary, but because if there was a way to get new developers using Cosmos by leveraging what they already know (Entity Framework), then that would actually be a pretty cool pathway.

But honestly, after hitting many many bumps along the road, I don’t think it’s quite there yet. I’ll first talk about setting up your own small test, and then at the end of this post I’ll riff a little on some challenges I ran into.

Setting Up EFCore For Cosmos

I’m going to focus on Cosmos only information here, and not get too bogged down in details around EF Core. If you already know EF Core, this should be pretty easy to follow!

The first thing you need to do is install the nuget package for EF Core with Cosmos. So from your Package Manager Console :

Install-Package Microsoft.EntityFrameworkCore.Cosmos

In your startup.cs, you will need a line such as this :

services.AddDbContext<CosmosDbContext>(options =>
    options.UseCosmos("CosmosEndPoint",
    "CosmosKey",
    "CosmosDatabase")
);

Now.. This is the first frustration of many. There is no overload to pass in a connection string here (Yah know, the thing that literally every other database context allows). So when you put this into config, you have to have them separated out instead of just being part of your usual “ConnectionStrings” configuration.

Let’s say I am trying to store the following model :

public class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

Then I would make my context resemble something pretty close to :

public class CosmosDbContext : DbContext
{
    public DbSet<People> People { get; set; }

    public CosmosDbContext(DbContextOptions options)
        : base(options)
    {
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<People>()
            .ToContainer("People")
            .OwnsOne(x => x.Address);
    }
}

Now a couple of notes here.

For reasons known only to Microsoft, the default name of the collection it tries to pull is the name of the context. e.g. When it goes to Cosmos, it looks for a collection called “CosmosDbContext” even if my DbSet itself is called People. I have no idea why it’s built like this because, again, in every other use for EntityFramework, the table/container name takes after the DbSet not the entire Context. So we have to add in an explicit call to map the container.

Secondly, Cosmos in EFCore seems unable to work out sub documents. I kind of understand this one because in my model, is Address its own collection, or is it a subdocument of People? But the default should be subdocument, as it’s unlikely people are doing “joins” across CosmosDB collections, and if they are, they aren’t expecting EF Core to handle that for them through navigation properties. So if you don’t have that “OwnsOne” config, it thinks that Address is its own collection and throws a wobbly with :

'The entity type 'Address' requires a primary key to be defined. If you intended to use a keyless entity type call 'HasNoKey()'.'

And honestly, that’s all you need to get set up with EF Core and Cosmos. That’s your basic configuration!
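From there it’s just regular EF Core usage. A quick sketch (the values are made up, and this assumes the context has been registered as above) :

var person = new People
{
    Id = Guid.NewGuid(),
    Name = "Wade",
    Address = new Address { City = "Auckland", ZipCode = "1010" }
};

context.Add(person);
await context.SaveChangesAsync();

var people = await context.People
    .Where(p => p.Name == "Wade")
    .ToListAsync();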

Now Here’s The Bad

Here’s what I found while trying to set up even a simple configuration in Cosmos with EF Core.

  • As mentioned, the default naming of the Collections in Cosmos using the Context name is illogical. Given that in most cases you will have only a single DbContext within your application, but you may have multiple collections you need to access, 9 times out of 10 you are going to need to re-define the container names for each DBSet.
  • The default mappings aren’t what you would expect from Cosmos. As pointed out, the fact that it can’t handle subdocuments out of the box seems strange to me given that if I used the raw .NET Core library it works straight away.
  • You have no control (or less control anyway) over naming conventions. I couldn’t find a way to use camelCase naming conventions at all, and it had to use PascalCase. I personally prefer NoSQL stores to always be in camelCase, but you don’t get the option here.
  • Before I knew about it trying to connect to a collection with the same name as the context, I wasn’t getting any results back from my queries (Since I was requesting data from a non-existent collection), but my code wasn’t throwing any exceptions, it just returned nothing. Maybe this is by design but it’s incredibly frustrating that I can call a non existent resource and not have any error messages show.
  • Because you might already have a DBContext for SQL Server in your project, things can become hectic when you introduce a second one for Cosmos (Since you can’t use the same Context). Things like migration CLI commands now need an additional flag to say which context it should run on (Even though Cosmos doesn’t use Migrations).

Should You Use It?

Honestly, your mileage may vary. I have a feeling that the abstraction of using EF Core may be a little too much for some (e.g. The naming conventions), and that many would prefer to have a bit more control over what’s going on behind the scenes. I feel like EntityFramework really shines when working with a large number of tables with foreign keys between them using navigation properties, something that CosmosDB won’t really have. And so I don’t see a great value proposition in wrangling EF Core for a single Cosmos table. But have a try and let me know what you think!


This post is part of a series on using Azure CosmosDB with .NET Core

Part 1 – Introduction to CosmosDB with .NET Core
Part 2 – Azure CosmosDB with .NET Core EF Core


I haven’t used CosmosDB an awful lot over the years, but when I have, it’s been a breeze to use. It’s not a one-size-fits-all option, so forget about it being a one-for-one replacement for something like SQL Server, but I’ve used it many a time to store large amounts of data that we “rarely” need access to. A good example is a chatbot project I worked on, where we needed to store all the conversations in case there was ever a dispute over what was said within the chatbot. Those disputes are few and far between and it’s a manual lookup anyway, so we don’t need crazy read rates or wild search requirements. But storing every conversation adds up over time when it comes to storage costs. Those sorts of loads are perfect for CosmosDB (Which was formerly DocumentDB by the way, just in case you wondered where that went!).

I was having a look today at all the ways you can actually talk to CosmosDB from code, and it’s pretty astounding how many options there are. Microsoft have done a really good job porting existing “APIs” from storage solutions like MongoDB, Cassandra and even their own Table Storage, which means you can basically swap one out for the other (Although there are obviously big caveats thrown in there). So I thought I would do a quick series rattling off a few different ways of talking to CosmosDB with .NET Core.

Setting Up CosmosDB

Setting up CosmosDB via the Azure portal is pretty straightforward, but I do want to point out one thing. That is, when creating the resource, you need to select the “API” that you want to use.

This *cannot* be changed at a later date. If you select the wrong API (For example you select MongoDB because that sounds interesting, and then you want to connect via SQL), then you need to create a new resource with the correct API and migrate all the data (An absolute pain). So be careful! For this example, we are going to use the Core API (SQL).

Once created, we then need to create our “container”.

What’s A Container?

CosmosDB has the concept of a “container”, which you can kind of think of as a table. A container belongs to a database, and a database can have multiple containers. So why not just call it a table? Well, because the container may be a table, or it may be a “collection” as in MongoDB, or it could be a graph etc. So we call it a container as an overarching term for a collection of rows/items/documents etc, because CosmosDB can do them all.

Partition Keys

When creating your container, you will be asked for a Partition Key. If you’ve never used CosmosDB, or really any large data store, this may be new to you. So what makes a good Partition Key? You want to pick a top level property of your item that has a distinct set of values which can be “bucketed”. CosmosDB uses these buckets to distribute your data across multiple servers for scalability.

So two bad examples for you :

  • A GUID ID. This is bad because it can never be “bucketed”. It’s essentially always going to be unique.
  • A User “Role” where the only options are “Member” and “Administrator”. Now we have gone the opposite way: we only have 2 distinct values to partition on, and it’s going to be very lopsided, with only a handful of users fitting into the Administrator bucket and the rest going into the Member bucket.

I just want to add that I *have* used both of the above as Partition Keys before. They do work, and IMO, even though they run against the recommendations from Microsoft, it’s pretty hard to shoot yourself in the foot with them when it comes to querying.

And a couple of good examples :

  • ZipCode (This is actually used as an example from Microsoft). There is a finite amount of zipcodes and people are spread out across them. There will be a decent amount of users in each zipcode assuming your application is widely used across the country.
  • You could use something like DepartmentId if you were creating a database of employees as another example.

Honestly, there is a lot more that goes into deciding on Partition Keys. The types of filters you will be running, and even consistency models, factor into the choice. While you are learning you should stick to the basics above, but there are entire video series dedicated to the subject, so if your datastore is going to be hitting 10+GB in size any time soon, it would be best to do further reading.

Our Example Container

For the purposes of this guide, I’m going to be using the following example. I want my data, in JSON format, to look like :

{
	"id" : "{guid}",
	"name" : "Joe Blogs",
	"address" :
	{
		"city" : "New York",
		"zipCode" : "90210"
	}
}

Pretty simple (and keeping with the easy Zipcode example). That means that my setup for my CosmosDB will look like so :

Nothing too crazy here!

Creating Items in C#

For the purpose of this demo, we are going to use the basic C# API to create/read items. The first thing we need to do is install the CosmosDB nuget package. So run the following from your Package Manager console :

Install-Package Microsoft.Azure.Cosmos

Next we need to model our data as a C# class. In the past we had to decorate the models with all sorts of attributes, but now they can be just plain POCOs.

class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

Nothing too spectacular, now onto the code to create items. Just winging it inside a console application, it looks like so :

using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Fluent; // Needed for CosmosClientBuilder

var connectionString = "";
var client = new CosmosClientBuilder(connectionString)
                    .WithSerializerOptions(new CosmosSerializationOptions
                    {
                        PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
                    })
                    .Build();

var peopleContainer = client.GetContainer("TestDatabase", "People");

var person = new People
{
    Id = Guid.NewGuid(),
    Name = "Joe Blogs",
    Address = new Address
    {
        City = "New York",
        ZipCode = "90210"
    }
};

await peopleContainer.CreateItemAsync(person);

Now I just want to point out a couple of things. Firstly, I use the CosmosClientBuilder. I found that when I created the client directly and tried to change settings (Like serializer options in this case), they didn’t work, but when I used the builder, magically everything started working.

Secondly I want to point out that I’m using a specific naming policy of CamelCase. If you’ve used CosmosDB before you’ve probably seen things like :

[JsonProperty("id")]
public Guid Id { get; set; }

Littered everywhere, because in C# we use PascalCase, but in CosmosDB the pre-defined properties are all camelCase and there was no way to override everything at once. Personally, I prefer that JSON always be camelCase, and the above serialization setting does just that.

The rest of the code should be straightforward. We get our “Container” (or table), and we call CreateItemAsync on it. And voilà :
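As an aside, you don’t strictly have to create the database and container through the portal; the same client can create them on startup if they don’t already exist. A minimal sketch, using the example names from above (the 400 RU/s throughput is just a placeholder):

// Create the database and container if they don't already exist.
var database = (await client.CreateDatabaseIfNotExistsAsync("TestDatabase")).Database;
await database.CreateContainerIfNotExistsAsync(
    new ContainerProperties("People", partitionKeyPath: "/address/zipCode"),
    throughput: 400);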

But We Didn’t Define Schema?!

So the first thing people notice when jumping into CosmosDB (Or probably most NoSQL data stores) is that we didn’t pre-define the schema we wanted to store. Nowhere in this process did we go and create the “table” that we could insert data into. Instead I just told CosmosDB to store what I send it, and it does. Other than the id, everything else is optional and it really doesn’t care what it’s storing.

This is obviously great when you are working on a brand new greenfield project, because you can basically riff and change things on the fly. As projects get bigger though, it can become frustrating when a developer might “try” something out and add a new column, but now half your data doesn’t have that column! You’ll find that as time goes on, your models become a hodgepodge of nullable data types to handle migration scenarios or columns being added/removed.
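As a purely hypothetical example of what I mean, imagine a later version of the app starts recording a date of birth. Older documents won’t have it, so the model ends up looking something like this:

class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }

    // Added months later - older documents don't have this property,
    // so it needs to be nullable to deserialize cleanly.
    public DateTime? DateOfBirth { get; set; }
}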

Reading Data

There are two main ways to read data from Cosmos.

People person = null;

// Can write raw SQL, but the iteration is a little annoying. 
var iterator = peopleContainer.GetItemQueryIterator<People>("SELECT * FROM c WHERE c.id = '852ad197-a5f1-4709-b16d-5e9019d290af' " +
                                                                "AND c.address.zipCode = '90210'");
while (iterator.HasMoreResults)
{
    foreach (var item in (await iterator.ReadNextAsync()).Resource)
    {
        person = item;
    }
}

// If you prefer Linq
person = peopleContainer.GetItemLinqQueryable<People>(allowSynchronousQueryExecution: true)
                            .Where(p => p.Id == Guid.Parse("852ad197-a5f1-4709-b16d-5e9019d290af"))
                            .ToList().First();

So the first is for fans of Dapper and the like. Personally, I find it kind of unwieldy at times to get the results I want, but it does allow for more complete control. The second is obviously using LINQ.

Now I want to point something out in the LINQ example. Notice that I’m calling ToList()? That’s because the Cosmos LINQ provider does not support First/FirstOrDefault. In our case it’s an easy fix because we can just execute the query, get our list back, and then take the first item anyway. But it’s a reminder that just because something supports LINQ, doesn’t mean that it supports *all* of LINQ.
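As a side note, if you’d rather avoid allowSynchronousQueryExecution, my understanding is that a LINQ query can be turned back into an async iterator via ToFeedIterator (from the Microsoft.Azure.Cosmos.Linq namespace). A rough sketch, continuing from the snippet above:

using Microsoft.Azure.Cosmos.Linq; // For the ToFeedIterator extension method

var feedIterator = peopleContainer.GetItemLinqQueryable<People>()
                        .Where(p => p.Id == Guid.Parse("852ad197-a5f1-4709-b16d-5e9019d290af"))
                        .ToFeedIterator();

while (feedIterator.HasMoreResults)
{
    foreach (var item in await feedIterator.ReadNextAsync())
    {
        person = item;
    }
}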

Finally, I also want to say that, generally speaking, every query you write against CosmosDB should try and include the Partition Key. Because we’ve used the ZipCode, is that really feasible in our example? Probably not. It would mean that we would have to already have the ZipCode before querying for the user, which is rather unlikely. This is one of the tradeoffs you have to think about when picking a Partition Key, and really even when thinking about using CosmosDB (or another large datastore) in general.
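For completeness, here’s roughly what scoping a query to a single partition looks like, along with the point read you can do when you know both the id and the partition key value. The id and zip code are just the example values from earlier:

// Restrict the query to a single partition.
var queryOptions = new QueryRequestOptions { PartitionKey = new PartitionKey("90210") };
var partitionedIterator = peopleContainer.GetItemQueryIterator<People>(
    "SELECT * FROM c WHERE c.name = 'Joe Blogs'",
    requestOptions: queryOptions);

// Or, when you know both the id and the partition key, do a point read -
// the cheapest way to fetch a single item.
var response = await peopleContainer.ReadItemAsync<People>(
    "852ad197-a5f1-4709-b16d-5e9019d290af",
    new PartitionKey("90210"));
var fetchedPerson = response.Resource;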

Up Next

In the next part of this series, I want to talk about something really cool with CosmosDB. Using it with EntityFramework!
