This post is part of a series on using Azure CosmosDB with .NET Core
Part 1 – Introduction to CosmosDB with .NET Core
Part 2 – Azure CosmosDB with .NET Core EF Core
I haven’t used CosmosDB an awful lot over the years, but when I have, it’s been a breeze to use. It’s not a one-size-fits-all option, so forget about it being a one-for-one replacement for something like SQL Server, but I’ve used it many a time to store large amounts of data that we “rarely” need access to. A good example is a chatbot project I worked on, where we needed to store all the conversations in case there was ever a dispute over what was said within the chatbot. Those disputes are few and far between and it’s a manual lookup anyway, so we don’t need crazy read rates or wild search requirements. But storing every conversation adds up over time when it comes to storage costs. Those sorts of loads are perfect for CosmosDB (which was formerly DocumentDB by the way, just in case you wondered where that went!).
I was having a look today at all the ways you can actually talk to CosmosDB from code, and it’s pretty astounding how many options there are. Microsoft have done a really good job porting existing “APIs” from storage solutions like MongoDB, Cassandra and even their own Table Storage, which means you can basically swap one out for the other (although there are obviously big caveats thrown in there). So I thought I would do a quick series rattling off a few different ways of talking to CosmosDB with .NET Core.
Setting Up CosmosDB
Setting up CosmosDB via the Azure portal is pretty straightforward, but I do want to point out one thing: when creating the resource, you need to select the “API” that you want to use.
This *cannot* be changed at a later date. If you select the wrong API (for example, you select MongoDB because that sounds interesting, and then you want to connect via SQL), then you need to create a new resource with the correct API and migrate all the data (an absolute pain). So be careful! For this example, we are going to use the Core API (SQL).
Once created, we then need to create our “container”.
What’s A Container?
CosmosDB has the concept of a “container”, which you can kind of think of as a table. A container belongs to a database, and a database can have multiple containers. So why not just call it a table? Well, because the container may be a table, or it may be a “collection” as thought of in MongoDB, or it could be a graph etc. So we call it a container as an overarching term for a collection of rows/items/documents etc, because CosmosDB can do them all.
Partition Keys
When creating your container, you will be asked for a Partition Key. If you’ve never used CosmosDB, or really any large data store, this may be new to you. So what makes a good Partition Key? You essentially want to pick a top level property of your item that has a distinct set of values, that can be “bucketed”. CosmosDB uses these to essentially distribute your data across multiple servers for scalability.
So two bad examples for you :
- A GUID ID. This is bad because it can never be “bucketed”. It’s essentially always going to be unique.
- A User “Role” where the only options are “Member” and “Administrator”. Now we have gone the opposite way where we only have 2 distinct values that we are partitioning on, but it’s going to be very lopsided with only a handful of users fitting into the Administrator bucket, and the rest going into the Member bucket.
I just want to add that for the above two, I *have* used them as Partition Keys before. They do work, and IMO, even though they run against the recommendations from Microsoft, it’s pretty hard to shoot yourself in the foot with them when it comes to querying.
And a couple of good examples :
- ZipCode (This is actually used as an example by Microsoft). There is a finite number of zip codes and people are spread out across them. There will be a decent number of users in each zip code, assuming your application is widely used across the country.
- You could use something like DepartmentId if you were creating a database of employees as another example.
Honestly, there is a lot more that goes into deciding on Partition keys. The types of filters you will be running and even consistency models go into creating partition keys. While you are learning, you should stick to the basics above, but there are entire video series dedicated to the subject, so if your datastore is going to be hitting 10+GB in size any time soon, it would be best to do further reading.
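To make the “bucketing” idea a bit more concrete, here is a rough sketch in plain C# (no CosmosDB involved). It simply groups some made-up users by a candidate key and counts how many land in each bucket. The sample users and the `BucketSizes` helper are my own illustration, and the real service does more than this (it hashes each key value and spreads the resulting logical partitions across physical ones), so treat it as a mental model only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionKeySketch
{
    // Groups items by a candidate partition key and returns how many
    // items share each distinct key value (i.e. the size of each bucket).
    public static Dictionary<string, int> BucketSizes<T>(
        IEnumerable<T> items, Func<T, string> keySelector)
    {
        return items
            .GroupBy(keySelector)
            .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        // Hypothetical sample data, invented purely for this illustration.
        var users = new[]
        {
            (Id: Guid.NewGuid(), Role: "Administrator", ZipCode: "90210"),
            (Id: Guid.NewGuid(), Role: "Member", ZipCode: "90210"),
            (Id: Guid.NewGuid(), Role: "Member", ZipCode: "10001"),
            (Id: Guid.NewGuid(), Role: "Member", ZipCode: "10001"),
            (Id: Guid.NewGuid(), Role: "Member", ZipCode: "60601"),
            (Id: Guid.NewGuid(), Role: "Member", ZipCode: "60601"),
        };

        var byId = BucketSizes(users, u => u.Id.ToString());
        var byRole = BucketSizes(users, u => u.Role);
        var byZip = BucketSizes(users, u => u.ZipCode);

        // Every item gets its own bucket: 6 buckets, largest holds 1.
        Console.WriteLine($"By id:   {byId.Count} buckets, largest holds {byId.Values.Max()}");
        // Very lopsided: 2 buckets, largest holds 5.
        Console.WriteLine($"By role: {byRole.Count} buckets, largest holds {byRole.Values.Max()}");
        // Evenly spread: 3 buckets, largest holds 2.
        Console.WriteLine($"By zip:  {byZip.Count} buckets, largest holds {byZip.Values.Max()}");
    }
}
```

Running the same kind of count over a sample of your real data is a cheap way to spot a lopsided key before you commit to it.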
Our Example Container
For the purposes of this guide, I’m going to be using the following example. I want my data, in JSON format, to look like :
{
    "id" : "{guid}",
    "name" : "Joe Blogs",
    "address" : {
        "city" : "New York",
        "zipCode" : "90210"
    }
}
Pretty simple (and keeping with the easy Zipcode example). That means that my setup for my CosmosDB will look like so :
Nothing too crazy here!
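If you would rather create the database and container from code than click through the portal, something along these lines should do it. This is a sketch rather than the definitive way: the “TestDatabase” and “People” names are the ones used later in this post, while the `/address/zipCode` partition key path and the `ContainerSetup` helper are my assumptions based on the example document above.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ContainerSetup
{
    // Creates the database and container only if they don't exist yet,
    // then returns a handle to the container.
    public static async Task<Container> EnsurePeopleContainerAsync(CosmosClient client)
    {
        DatabaseResponse database =
            await client.CreateDatabaseIfNotExistsAsync("TestDatabase");

        // Assumption: partitioning on the nested zip code property.
        // The path is case sensitive and must match the stored JSON.
        ContainerResponse container =
            await database.Database.CreateContainerIfNotExistsAsync(
                "People", "/address/zipCode");

        return container.Container;
    }
}
```

Keep in mind the partition key path is fixed once the container exists, so it is worth double checking the casing against what your serializer actually writes.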
Creating Items in C#
For the purpose of this demo, we are going to use the basic C# API to create/read items. The first thing we need to do is install the CosmosDB NuGet package. So run the following from your Package Manager console :
Install-Package Microsoft.Azure.Cosmos
Next we need to model our data as a C# class. In the past we had to decorate the models with all sorts of attributes, but now they can be just plain POCOs.
class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}
Nothing too spectacular, now onto the code to create items. Just winging it inside a console application, it looks like so :
// Requires: using Microsoft.Azure.Cosmos; using Microsoft.Azure.Cosmos.Fluent;
var connectionString = ""; // Your CosmosDB connection string goes here.

var client = new CosmosClientBuilder(connectionString)
    .WithSerializerOptions(new CosmosSerializationOptions
    {
        PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
    })
    .Build();

var peopleContainer = client.GetContainer("TestDatabase", "People");

var person = new People
{
    Id = Guid.NewGuid(),
    Name = "Joe Blogs",
    Address = new Address
    {
        City = "New York",
        ZipCode = "90210"
    }
};

await peopleContainer.CreateItemAsync(person);
Now I just want to point out a couple of things. Firstly, I use the CosmosClientBuilder. I found that when I created the client directly and tried to change settings (like serializer options in this case), they didn’t work, but when I used the builder, magically everything started working.
Secondly I want to point out that I’m using a specific naming policy of CamelCase. If you’ve used CosmosDB before you’ve probably seen things like :
[JsonProperty("id")]
public Guid Id { get; set; }
littered everywhere, because in C# we use PascalCase, but in CosmosDB the pre-defined columns are all camelCase and there was no way to override everything at once. Personally, I prefer that JSON always be camelCase, and the above serialization settings do just that.
The rest of the code should be straightforward. We get our “Container” or table, and we call CreateItemAsync on it. And voilà!
But We Didn’t Define Schema?!
So the first thing people notice when jumping into CosmosDB (or probably most NoSQL data stores) is that we didn’t pre-define the schema we wanted to store. Nowhere in this process did we go and create the “table” that we could insert data into. Instead I just told CosmosDB to store what I send it, and it does. Other than the ID, everything else is optional and it really doesn’t care what it’s storing.
This is obviously great when you are working on a brand new greenfield project because you can basically riff and change things on the fly. As projects get bigger though, it can become frustrating when a developer might “try” something out and add a new column, but now half your data doesn’t have that column! You’ll find that as time goes on, your models become a hodgepodge of nullable data types to handle migration scenarios or columns being added/removed.
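Here is a tiny sketch of what that “half your data doesn’t have that column” situation looks like from the C# side. To keep it runnable without a database, I’m deserializing two hand-written JSON strings with System.Text.Json rather than going through the Cosmos SDK, and the `nickname` property and `PersonDocument` class are made up for the example.

```csharp
using System;
using System.Text.Json;

public class PersonDocument
{
    public string Id { get; set; }
    public string Name { get; set; }

    // Hypothetical property added later in the project's life.
    // Older documents simply won't have it, so it comes back as null.
    public string Nickname { get; set; }
}

public static class SchemaDriftSketch
{
    public static void Main()
    {
        // Match camelCase JSON names against PascalCase C# properties.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

        // An "old" document written before the property existed...
        var older = JsonSerializer.Deserialize<PersonDocument>(
            "{\"id\":\"1\",\"name\":\"Joe Blogs\"}", options);

        // ...and a "new" one written after it was added.
        var newer = JsonSerializer.Deserialize<PersonDocument>(
            "{\"id\":\"2\",\"name\":\"Jane Blogs\",\"nickname\":\"JB\"}", options);

        Console.WriteLine(older.Nickname ?? "(none)"); // prints "(none)"
        Console.WriteLine(newer.Nickname ?? "(none)"); // prints "JB"
    }
}
```

Nothing blows up, which is both the nice part and the dangerous part: every consumer of the model now has to cope with the missing value.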
Reading Data
There are two main ways to read data from Cosmos.
// Requires: using System.Linq; (plus the Microsoft.Azure.Cosmos usings from earlier)
People person = null;

// Can write raw SQL, but the iteration is a little annoying.
var iterator = peopleContainer.GetItemQueryIterator<People>(
    "SELECT * FROM c WHERE c.id = '852ad197-a5f1-4709-b16d-5e9019d290af' " +
    "AND c.address.zipCode = '90210'");
while (iterator.HasMoreResults)
{
    foreach (var item in (await iterator.ReadNextAsync()).Resource)
    {
        person = item;
    }
}

// If you prefer Linq
var id = Guid.Parse("852ad197-a5f1-4709-b16d-5e9019d290af");
person = peopleContainer.GetItemLinqQueryable<People>(allowSynchronousQueryExecution: true)
    .Where(p => p.Id == id)
    .ToList()
    .First();
So the first is for fans of Dapper and the like. Personally, I find it kinda unwieldy at times to get the results I want, but it does allow for more complete control. The second is obviously using LINQ.
Now I want to point something out in the LINQ example. Notice that I’m calling ToList()? That’s because, at the time of writing, the Cosmos LINQ provider does not support First/FirstOrDefault. In our case it’s an easy fix because we can just execute the query, get our list back, and then take the first item anyway. But it’s a reminder that just because something supports LINQ doesn’t mean that it supports *all* of LINQ.
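Two variations on the reads above that may be worth a look. The first passes the id as a query parameter via `QueryDefinition` instead of concatenating it into the SQL string. The second keeps LINQ but runs it asynchronously through `ToFeedIterator()`, so there is no need for `allowSynchronousQueryExecution`. The `PeopleQueries` class and its method names are just mine for the sketch, and the model classes are repeated so the snippet stands on its own.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

public class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public static class PeopleQueries
{
    // Raw SQL, but with the id supplied as a parameter.
    public static async Task<People> FindByIdSqlAsync(Container container, Guid id)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
            .WithParameter("@id", id.ToString());

        var iterator = container.GetItemQueryIterator<People>(query);
        while (iterator.HasMoreResults)
        {
            foreach (var item in await iterator.ReadNextAsync())
            {
                return item; // First match wins.
            }
        }

        return null; // Nothing found.
    }

    // LINQ, executed asynchronously page by page.
    public static async Task<People> FindByIdLinqAsync(Container container, Guid id)
    {
        var iterator = container.GetItemLinqQueryable<People>()
            .Where(p => p.Id == id)
            .ToFeedIterator();

        while (iterator.HasMoreResults)
        {
            foreach (var item in await iterator.ReadNextAsync())
            {
                return item;
            }
        }

        return null;
    }
}
```

Both still fan out across partitions because neither supplies the partition key, so they are a tidier version of the same reads rather than a cheaper one.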
Finally, I also want to say that generally speaking, every query you write against a CosmosDB should try and include the PartitionKey. Because we’ve used the ZipCode, is that really feasible in our example? Probably not. It would mean that we would have to have the ZipCode already before querying the user, rather unlikely. This is one of the tradeoffs you have to think about when picking a PartitionKey, and really even when thinking about using CosmosDB or another large datastore in general.
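For the cases where you *do* have the partition key on hand, here is a sketch of the two usual ways to pass it along: a point read with `ReadItemAsync`, and a query scoped to one partition via `QueryRequestOptions`. I’m assuming the container is partitioned on `/address/zipCode`; the `PeopleLookups` class and its method names are made up for the example, and the model classes are repeated so it compiles by itself.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class Address
{
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public class People
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public static class PeopleLookups
{
    // Point read: id plus partition key identifies exactly one item.
    public static async Task<People> ReadPersonAsync(
        Container container, Guid id, string zipCode)
    {
        try
        {
            ItemResponse<People> response = await container.ReadItemAsync<People>(
                id.ToString(), new PartitionKey(zipCode));
            return response.Resource;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
        {
            return null; // No item with that id in that partition.
        }
    }

    // Query restricted to a single partition instead of fanning out.
    public static async Task<List<People>> FindByNameInZipAsync(
        Container container, string name, string zipCode)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.name = @name")
            .WithParameter("@name", name);
        var options = new QueryRequestOptions { PartitionKey = new PartitionKey(zipCode) };

        var results = new List<People>();
        var iterator = container.GetItemQueryIterator<People>(query, requestOptions: options);
        while (iterator.HasMoreResults)
        {
            results.AddRange(await iterator.ReadNextAsync());
        }

        return results;
    }
}
```

If your callers rarely know the zip code up front, that is a fairly strong hint the partition key deserves a rethink.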
Up Next
In the next part of this series, I want to talk about something really cool with CosmosDB: using it with Entity Framework!
Now my question is, when I retrieve the item, I don’t remember the Guid Id assigned to it. How do I go about it?
Indexes. You can create indexes on other fields to be able to query on them. Although I would say you typically can use a composite key for your partition key https://docs.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys so that while your index is helpful, at the very least it knows what partition to inspect.
You can actually use “CosmosClientBuilder” for configuring camelCase for your JSON.
cosmosClient = new CosmosClientBuilder(EndpointUri, PrimaryKey)
.WithSerializerOptions(new CosmosSerializationOptions { PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase })
.Build();