This post is part of a series on using Azure CosmosDB with .NET Core

Part 1 – Introduction to CosmosDB with .NET Core
Part 2 – Azure CosmosDB with .NET Core EF Core


When I first found out EntityFramework supported Azure CosmosDB, I was honestly pretty excited. Not because I thought it would be revolutionary, but because if there was a way to get new developers using Cosmos by leveraging what they already know (Entity Framework), then that would actually be a pretty cool pathway.

But honestly, after hitting many, many bumps along the road, I don’t think it’s quite there yet. I’ll first talk about setting up your own small test, and then at the end of this post I’ll riff a little on some of the challenges I ran into.

Setting Up EFCore For Cosmos

I’m going to focus on Cosmos-specific information here, and not get too bogged down in details around EF Core. If you already know EF Core, this should be pretty easy to follow!

The first thing you need to do is install the NuGet package for EF Core with Cosmos. So from your Package Manager Console :

Install-Package Microsoft.EntityFrameworkCore.Cosmos

In your startup.cs, you will need a line such as this :

services.AddDbContext<CosmosDbContext>(options =>
    options.UseCosmos("CosmosEndPoint",
    "CosmosKey",
    "CosmosDatabase")
);

Now… this is the first frustration of many. There is no overload to pass in a connection string here (you know, the thing that literally every other database provider accepts). So when you put this into config, you have to keep the endpoint, key and database name separated out instead of having them live in your usual “ConnectionStrings” section.
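
To make that concrete, here’s a minimal sketch of what that ends up looking like when the values come from configuration. The “Cosmos” section and its key names are just placeholders I’ve made up for illustration, not anything the provider mandates :

// appsettings.json (hypothetical shape) :
// "Cosmos": { "EndPoint": "https://...", "Key": "...", "Database": "TestDatabase" }

services.AddDbContext<CosmosDbContext>(options =>
    options.UseCosmos(
        Configuration["Cosmos:EndPoint"],
        Configuration["Cosmos:Key"],
        Configuration["Cosmos:Database"])
);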

Let’s say I am trying to store the following model :

public class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

Then I would make my context resemble something pretty close to :

public class CosmosDbContext : DbContext
{
    public DbSet<People> People { get; set; }

    public CosmosDbContext(DbContextOptions<CosmosDbContext> options)
        : base(options)
    {
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<People>()
            .ToContainer("People")
            .OwnsOne(x => x.Address);
    }
}

Now a couple of notes here.

For reasons known only to Microsoft, the default name of the collection it tries to pull is the name of the context. e.g. When it goes to Cosmos, it looks for a collection called “CosmosDbContext” even though my DbSet itself is called People. I have no idea why it’s built like this because, again, in every other use of Entity Framework, the table/container name takes after the DbSet, not the entire context. So we have to add an explicit call to map the container.
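
As a side note, if you’d rather not repeat ToContainer for every entity, the Cosmos provider also lets you change the default container name once for the whole model. A minimal sketch, assuming your provider version exposes HasDefaultContainer :

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Overrides the "name of the context" default for every entity in the model.
    modelBuilder.HasDefaultContainer("People");

    modelBuilder.Entity<People>()
        .OwnsOne(x => x.Address);
}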

Secondly, Cosmos in EF Core seems unable to work out sub-documents. I kind of understand this one because in my model, is Address its own collection, or is it a sub-document of People? But the default should be sub-document, as it’s unlikely people are doing “joins” across CosmosDB collections, and if they are, they aren’t expecting EF Core to handle that for them through navigation properties. So if you don’t have that “OwnsOne” config, it thinks that Address is its own collection and throws a wobbly with :

'The entity type 'Address' requires a primary key to be defined. If you intended to use a keyless entity type call 'HasNoKey()'.'

And honestly, that’s all you need to get set up with EF Core and Cosmos. That’s your basic configuration!
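
From there, using it looks like any other EF Core context. Here’s a rough sketch of writing and reading a person, assuming the context has been injected somewhere as _context (that field name is just my placeholder) :

var person = new People
{
    Id = Guid.NewGuid(),
    Name = "Joe Blogs",
    Address = new Address
    {
        City = "New York",
        ZipCode = "90210"
    }
};

// Writes the document (with Address embedded) into the "People" container.
_context.People.Add(person);
await _context.SaveChangesAsync();

// Reads it back. Needs "using Microsoft.EntityFrameworkCore;" for the async extension method.
var fromCosmos = await _context.People
    .FirstOrDefaultAsync(p => p.Id == person.Id);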

Now Here’s The Bad

Here’s what I found while trying to set up even a simple configuration in Cosmos with EF Core.

  • As mentioned, the default naming of the collections in Cosmos using the context name is illogical. Given that in most cases you will have only a single DbContext within your application, but you may have multiple collections you need to access, 9 times out of 10 you are going to need to re-define the container name for each DbSet.
  • The default mappings aren’t what you would expect from Cosmos. As pointed out, the fact that it can’t handle sub-documents out of the box seems strange to me, given that if I use the raw .NET Core library it works straight away.
  • You have no control (or less control anyway) over naming conventions. I couldn’t find a way to use camelCase naming at all; it had to be PascalCase. I personally prefer NoSQL stores to always be in camelCase, but you don’t get the option here.
  • Before I knew about it trying to connect to a collection with the same name as the context, I wasn’t getting any results back from my queries (since I was requesting data from a non-existent collection), but my code wasn’t throwing any exceptions, it just returned nothing. Maybe this is by design, but it’s incredibly frustrating that I can query a non-existent resource and not get any error message back.
  • Because you might already have a DbContext for SQL Server in your project, things can become hectic when you introduce a second one for Cosmos (since you can’t use the same context). Things like migration CLI commands now need an additional flag to say which context they should run against (even though Cosmos doesn’t use migrations).

Should You Use It?

Honestly, your mileage may vary. I have a feeling that the abstraction of using EF Core may be a little too much for some (e.g. the naming conventions) and that many would prefer a bit more control over what’s going on behind the scenes. I feel like Entity Framework really shines when working with a large number of tables with foreign keys between them via navigation properties, something CosmosDB won’t really have. So I don’t see a great value proposition in wrangling EF Core for a single Cosmos container. But have a try and let me know what you think!


This post is part of a series on using Azure CosmosDB with .NET Core

Part 1 – Introduction to CosmosDB with .NET Core
Part 2 – Azure CosmosDB with .NET Core EF Core


I haven’t used CosmosDB an awful lot over the years, but when I have, it’s been a breeze to use. It’s not a one-size-fits-all option, so forget about it being a one-for-one replacement for something like SQL Server, but I’ve used it many a time to store large amounts of data that we “rarely” need access to. A good example : for a chatbot project I worked on, we needed to store every conversation in case there was ever a dispute over what was said within the chatbot. Those disputes are few and far between, and it’s a manual lookup anyway, so we don’t need crazy read rates or wild search requirements. But storing every conversation adds up over time when it comes to storage costs. Those sorts of loads are perfect for CosmosDB (which was formerly DocumentDB, by the way, just in case you wondered where that went!).

I was having a look today at all the ways you can actually talk to CosmosDB from code, and it’s pretty astounding how many options there are. Microsoft has done a really good job porting existing “APIs” from storage solutions like MongoDB, Cassandra and even their own Table Storage, which means you can basically swap one out for the other (although there are obviously big caveats thrown in there). So I thought I would do a quick series rattling off a few different ways of talking to CosmosDB with .NET Core.

Setting Up CosmosDB

Setting up CosmosDB via the Azure portal is pretty straightforward, but I do want to point out one thing. That is, when creating the resource, you need to select the “API” that you want to use.

This *cannot* be changed at a later date. If you select the wrong API (for example you select MongoDB because that sounds interesting, and then you want to connect via SQL), you need to create a new resource with the correct API and migrate all the data (an absolute pain). So be careful! For this example, we are going to use the Core API (SQL).

Once created, we then need to create our “container”.

What’s A Container?

CosmosDB has the concept of a “container”, which you can kind of think of as a table. A container belongs to a database, and a database can have multiple containers. So why not just call it a table? Well, because the container may be a table, or it may be a “collection” as in MongoDB, or it could be a graph, etc. We call it a container as an overarching term for a collection of rows/items/documents, because CosmosDB can do them all.

Partition Keys

When creating your container, you will be asked for a Partition Key. If you’ve never used CosmosDB, or really any large data store, this may be new to you. So what makes a good Partition Key? You essentially want to pick a top-level property of your item that has a distinct set of values which can be “bucketed”. CosmosDB uses these buckets to distribute your data across multiple servers for scalability.

So two bad examples for you :

  • A GUID ID. This is bad because it can never be “bucketed”. It’s essentially always going to be unique.
  • A User “Role” where the only options are “Member” and “Administrator”. Now we have gone the opposite way: we only have two distinct values to partition on, and it’s going to be very lopsided, with only a handful of users fitting into the Administrator bucket and the rest going into the Member bucket.

I just want to add that I *have* used both of the above as Partition Keys before. They do work, and IMO, even though they run against the recommendations from Microsoft, it’s pretty hard to shoot yourself in the foot with them when it comes to querying.

And a couple of good examples :

  • ZipCode (this is actually used as an example by Microsoft). There is a finite number of zip codes and people are spread out across them. There will be a decent number of users in each zip code, assuming your application is widely used across the country.
  • You could use something like DepartmentId if you were creating a database of employees as another example.

Honestly, there is a lot more that goes into deciding on partition keys. The types of filters you will be running, and even consistency models, factor into the choice. While you are learning, you should stick to the basics above, but there are entire video series dedicated to the subject, so if your data store is going to be hitting 10+GB in size any time soon, it would be best to do further reading.

Our Example Container

For the purposes of this guide, I’m going to be using the following example. The data I want to store, in JSON format, looks like :

{
    "id" : "{guid}",
    "name" : "Joe Blogs",
    "address" :
    {
        "city" : "New York",
        "zipCode" : "90210"
    }
}

Pretty simple (and keeping with the easy Zipcode example). That means that my setup for my CosmosDB will look like so :

Nothing too crazy here!
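
If you’d rather script that setup than click through the portal, here’s a rough sketch of creating the same database and container from the SDK (it needs the Microsoft.Azure.Cosmos package we install just below). The database name, container name and partition key path are just the values from my example, so swap in your own :

// Assumes a valid connection string from the "Keys" blade of your CosmosDB resource.
var client = new CosmosClient(connectionString);

// Creates the database and container only if they don't already exist.
var database = await client.CreateDatabaseIfNotExistsAsync("TestDatabase");
await database.Database.CreateContainerIfNotExistsAsync(
    id: "People",
    partitionKeyPath: "/address/zipCode");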

Creating Items in C#

For the purpose of this demo, we are going to use the basic C# API to create/read items. The first thing we need to do is install the CosmosDB NuGet package. So run the following from your Package Manager Console :

Install-Package Microsoft.Azure.Cosmos

Next we need to model our data as C# classes. In the past we had to decorate the models with all sorts of attributes, but now they can just be plain POCOs.

class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

Nothing too spectacular. Now onto the code to create items. Just winging it inside a console application, it looks like so :

var connectionString = "";
var client = new CosmosClientBuilder(connectionString)
                    .WithSerializerOptions(new CosmosSerializationOptions
                    {
                        PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
                    })
                    .Build();

var peopleContainer = client.GetContainer("TestDatabase", "People");

var person = new People
{
    Id = Guid.NewGuid(),
    Name = "Joe Blogs",
    Address = new Address
    {
        City = "New York",
        ZipCode = "90210"
    }
};

await peopleContainer.CreateItemAsync(person);

Now I just want to point out a couple of things. Firstly, that I use the CosmosClientBuilder. I found that when I created the client directly and tried to change settings (like serializer options in this case), they didn’t work, but when I used the builder, magically everything started working.

Secondly I want to point out that I’m using a specific naming policy of CamelCase. If you’ve used CosmosDB before you’ve probably seen things like :

[JsonProperty("id")]
public Guid Id { get; set; }

Littered everywhere, because in C# we use PascalCase, but in CosmosDB the pre-defined columns are all camelCase and there was no way to override everything at once. Personally, I prefer that JSON always be camelCase, and the above serialization setting does just that.

The rest of the code should be straightforward. We get our “Container” or table, and we call CreateItemAsync on it. And voilà :

But We Didn’t Define Schema?!

So the first thing people notice when jumping into CosmosDB (or probably most NoSQL data stores) is that we didn’t pre-define the schema we wanted to store. Nowhere in this process did we go and create the “table” that we insert data into. Instead I just told CosmosDB to store what I send it, and it does. Other than the id, everything else is optional and it really doesn’t care what it’s storing.

This is obviously great when you are working on a brand new greenfield project, because you can basically riff and change things on the fly. As projects get bigger though, it can become frustrating when a developer might “try” something out and add a new column, but now half your data doesn’t have that column! You’ll find that as time goes on, your models become a hodgepodge of nullable data types to handle migration scenarios or columns being added/removed.
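
As a purely hypothetical example of what that drift tends to look like (the extra properties here are ones I’ve made up, not part of the model above) :

class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }

    // Added months after launch, so documents created before then simply don't have it.
    public string PhoneNumber { get; set; }

    // Nullable because older documents were written before this column existed.
    public int? LoyaltyPoints { get; set; }
}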

Reading Data

There are two main ways to read data from Cosmos.

People person = null;

// Can write raw SQL, but the iteration is a little annoying.
var iterator = peopleContainer.GetItemQueryIterator<People>("SELECT * FROM c WHERE c.id = '852ad197-a5f1-4709-b16d-5e9019d290af' " +
                                                                "AND c.address.zipCode = '90210'");
while (iterator.HasMoreResults)
{
    foreach (var item in (await iterator.ReadNextAsync()).Resource)
    {
        person = item;
    }
}

// If you prefer Linq
person = peopleContainer.GetItemLinqQueryable<People>(allowSynchronousQueryExecution: true)
                            .Where(p => p.Id == Guid.Parse("852ad197-a5f1-4709-b16d-5e9019d290af"))
                            .ToList().First();

So the first is for fans of Dapper and the like. Personally, I find it kind of unwieldy at times to get the results I want, but it does allow for more complete control. The second is obviously using LINQ.

Now I want to point something out in the LINQ example. Notice that I’m calling ToList()? That’s because the Cosmos LINQ provider does not support First/FirstOrDefault. In our case it’s an easy fix, because we can just execute the query, get our list back, and then take the first item anyway. But it’s a reminder that just because something supports LINQ doesn’t mean that it supports *all* of LINQ.
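
As an aside, if allowSynchronousQueryExecution makes you nervous, the LINQ query can also be turned back into an async iterator. A rough sketch (ToFeedIterator lives in the Microsoft.Azure.Cosmos.Linq namespace) :

// using Microsoft.Azure.Cosmos.Linq;
var feedIterator = peopleContainer.GetItemLinqQueryable<People>()
                        .Where(p => p.Id == Guid.Parse("852ad197-a5f1-4709-b16d-5e9019d290af"))
                        .ToFeedIterator();

while (feedIterator.HasMoreResults)
{
    // Each page comes back asynchronously, no synchronous execution required.
    foreach (var item in await feedIterator.ReadNextAsync())
    {
        person = item;
    }
}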

Finally, I also want to say that, generally speaking, every query you write against CosmosDB should try to include the partition key. Because we’ve used the ZipCode, is that really feasible in our example? Probably not. It would mean we would have to already have the ZipCode before querying for the user, which is rather unlikely. This is one of the trade-offs you have to think about when picking a partition key, and really even when deciding whether to use CosmosDB or another large data store in general.
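
For completeness, when you *do* have both the id and the partition key value on hand, you can scope the query to a single partition, or skip the query entirely with a point read. A rough sketch using the example values from above :

// Point read : the cheapest possible lookup, but needs the id AND the partition key value.
var response = await peopleContainer.ReadItemAsync<People>(
    "852ad197-a5f1-4709-b16d-5e9019d290af",
    new PartitionKey("90210"));
person = response.Resource;

// Or keep the SQL query but pin it to a single partition.
var partitionedIterator = peopleContainer.GetItemQueryIterator<People>(
    "SELECT * FROM c WHERE c.id = '852ad197-a5f1-4709-b16d-5e9019d290af'",
    requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey("90210") });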

Up Next

In the next part of this series, I want to talk about something really cool with CosmosDB. Using it with EntityFramework!
