ELO is a player rating system that was first used to rank chess players, but later found a lot of usage in video games, basketball, baseball and many other sports. It's designed in a way that when underdogs win a game, they are given more "credit" for their win than if a favourite had won. For example, if chess grandmaster Magnus Carlsen were to beat me (an unranked player) in chess, it really shouldn't affect his ranking. And vice versa: should someone with a low rating beat the number 1 player in the world, their ranking should shoot up.

ELO is also designed in a way that means a top player can’t just continually play much lower ranked players and keep moving up the rankings at the same rate. It doesn’t mean that matchups don’t exist where a certain “style” of play (depending on the sport/game) means it’s easier or harder than their ELO lets on, but it’s meant to reward underdogs winning a whole lot more than someone picking favourable matchups.

ELO seems like it should be some massively complicated system, but it can really be boiled down to two “simple” equations.

Expectation To Win Equation

The “Expectation To Win” equation is really part of the ELO equation, but it’s worth splitting out because you often may want to see it as a stand alone number anyway.

After passing in two player ratings, we are returned the likelihood of Player 1 winning the match up. This is represented as a decimal (So a result of 0.5 would mean that Player 1 has a 50% chance to win the match up).
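The original code for this isn't included in this copy, so here is a minimal C# sketch of the standard ELO expected-score formula (method and parameter names are illustrative). Plugging in 1700 vs 1300 gives 1 / (1 + 10^-1) ≈ 0.91, which matches the second example below.

```csharp
// Standard ELO expected score for player one; a 400 point rating gap
// corresponds to roughly a 10:1 expected win ratio.
public static double ExpectationToWin(double playerOneRating, double playerTwoRating)
{
    return 1 / (1 + Math.Pow(10, (playerTwoRating - playerOneRating) / 400.0));
}
```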

Some example input and output would look like :

Player 1 Rating : 1500
Player 2 Rating : 1500
Expectation For Player 1 Win : 0.5 (50% chance to win)

Player 1 Rating : 1700
Player 2 Rating : 1300
Expectation For Player 1 Win : 0.90 (90% chance to win)

ELO Rating Equation

The next step is: given that two players compete against each other with a clear winner and loser, what would their new ELO ratings be? The equation for that looks a little bit like this :
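The equation itself isn't reproduced in this copy, so below is a hedged C# sketch of the update step described next. K defaults to 32 here; the worked examples that follow line up with K = 32 when the delta is truncated to a whole number.

```csharp
// Returns the new ratings for both players.
// "outcome" is 1.0 if player one won, 0.0 if player one lost.
public static (double playerOneNewRating, double playerTwoNewRating) CalculateElo(
    double playerOneRating, double playerTwoRating, double outcome, double k = 32)
{
    var expected = ExpectationToWin(playerOneRating, playerTwoRating);
    var delta = k * (outcome - expected); // positive when player one beats expectation

    return (playerOneRating + delta, playerTwoRating - delta);
}
```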

Let’s break this down a little.

We pass in our two players' ratings, and the outcome as seen by player 1. Once inside, we use a special number called "K". We'll talk a little more about this number later, but for now just think of it as a constant. We then take the outcome (either 1 or 0) and subtract the expected outcome of the game. We then add this delta to player one's rating, and subtract it from player two's. Because we use the expected outcome as part of the equation, we can reward the underdog for winning far more than we reward the expected winner. Let's look at some actual examples :

Player 1 ELO : 1700
Player 2 ELO : 1300
Outcome : Player 1 Wins
Player 1 New ELO : 1702
Player 2 New ELO : 1298
ELO Shift  : 2

Player 1 ELO : 1700
Player 2 ELO : 1300
Outcome : Player 1 Loses
Player 1 New ELO : 1671
Player 2 New ELO : 1329
ELO Shift  : 29

So as we can see, when player 1 wins, they gain only 2 ELO points because they were expected to come out on top (in fact they were expected to win 90% of the time). However when they lose, they hand 29 points over to the winner, which is a huge shift. This represents the underdog winning a game they were not expected to win at all.

Finding The Perfect “K”

So in our calculate method, we use a constant called "K". In simple terms, we can think of this number as how "fast" ELO will change hands. A high number (such as 32) means ratings will react rapidly to results. A low number (such as 10) means rankings will be much slower moving. Typically this means that for sports/games that have a low number of games played per season (or per career), you may want a rather high K so that results move ratings pretty rapidly. In sports where there are possibly even hundreds of games a year, you would want a lower K to reflect that you are expected to lose a few games here or there, and that overall you would have to go on a large losing run to really start slipping.

Other times K can change based on either the ELO rating of the players, or the number of games already played. For example in a new "season", with few games played, a high K would mean that ELO rankings fluctuate wildly, and then as the season goes on you could lower K to stabilize the rankings. I'm not a huge fan of this as it begins to put more importance on winning games early in the season rather than at the end, but it does make sense for new players in a game to be given a higher K so they can find their true ELO faster.

If you vary K based on the ELO rating of the players, you can give a high K to lower/mid range ranked players so that they can dig themselves out of the weeds rather fast, then lower K as you reach higher ELOs to reflect that at the top of the rankings, things should be a bit more stable.

Ultimately, K should sit somewhere between 10 and 32, and exactly where will depend entirely on what you are rating players in.

I recently came across the need to host a .NET Core web app as a Windows Service. In this case, it was because each machine needed to be running an API locally. But it's actually pretty common to have a web interface to manage an application on a PC without needing to set up IIS. For example if you install a build/release management tool such as Jenkins or TeamCity, it has a web interface to manage the builds, and this is able to be done without installing and configuring an additional web server on the machine.

Luckily .NET Core actually has some really good tools for accomplishing all of this (And even some really awesome stuff for being able to run a .NET Core web server by double clicking an EXE if that’s your thing).

A Standalone .NET Core Website/Web Server

The first step actually has nothing to do with Windows Services. If you think about it, all a Windows Service is, is a managed application that’s hidden in the background, will restart on a machine reboot, and if required, will also restart on erroring. That’s it! So realistically what we first want to do is build a .NET Core webserver that can be run like an application, and then later on we can work out the services part.

For the purpose of this tutorial, I’m just going to be using the default template for an ASP.net Core website. The one that looks like this :

We first need to head to the csproj file of our project and add in a specific runtime (Or multiple), and an output type. So overall my csproj file ends up looking like :
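The csproj contents aren't shown in this copy, so this is an approximate reconstruction based on the description below (the target framework and runtime identifier are assumptions taken from the publish path mentioned later):

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>netcoreapp2.1</TargetFramework>
    <RuntimeIdentifiers>win10-x64</RuntimeIdentifiers>
    <OutputType>Exe</OutputType>
  </PropertyGroup>

</Project>
```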

Our RuntimeIdentifiers (and importantly notice the "s" on the end there) specifies the runtimes our application can be built for. In my case I'm building only for Windows 10, but you could specify other runtime monikers if required.

On top of this, we specify that we want an OutputType of Exe, so we get a nice complete exe to run rather than using the "dotnet run" command to start our application. I'm not 100% sure, but I think the exe output that comes out of this is simply a wrapper to boot up the actual application dll. I noticed this because when you change code and recompile, the exe doesn't change at all, but the dll does.

Now we need to be able to publish the app as a standalone application. Why standalone? Because then any target machine doesn't have to have the .NET Core runtime installed to get everything running. On top of that, there is no "what version do you have installed?" type talk. It's just double click and run.

To publish a .NET Core app as standalone, you need to run the following command from the project directory in a command prompt/powershell :
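The command isn't reproduced in this copy; based on the description that follows, it would be along the lines of:

```
dotnet publish -c Release --self-contained -r win10-x64
```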

It should be rather self-explanatory. We are doing a publish using the Release configuration, we pass through the self-contained flag, and we specify that the runtime we are building for is Windows 10, 64-bit.

From your project directory, you can head to :  \bin\Release\netcoreapp2.1\win10-x64\publish

This contains your application exe as well as all framework DLLs needed to run without a runtime being installed on the machine. It's important to note that you should be inside the publish folder. One level up there is also an exe, but this is not standalone and relies on the runtime being installed.

From your publish folder, try double clicking yourapplication.exe.

In your browser head to http://localhost:5000 and voilà, you now have your website running from an executable. You can copy and paste this publish folder onto any Windows 10 machine, even a fresh install, and have it spin up a webserver hosting your website. Pretty impressive!

Installing As A Windows Service

So the next part of this tutorial is actually kinda straightforward. Now that you have an executable that hosts your website, installing it as a service is exactly the same as setting up any regular application as a service. But we will try and have some niceties to go along with it.

First we need to do a couple of code changes for our app to run both as a service, and still be OK running as an executable (Both for debugging purposes, and in case we want to run in a console window and not as a service).

We need to install the following from your package manager console :
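The package name is missing from this copy; the RunAsService() extension used below lives in the Windows Services hosting package, so the install would be:

```
Install-Package Microsoft.AspNetCore.Hosting.WindowsServices
```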

Next we need to go into our Program.cs and make the Main method look like the following :
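The Main method isn't included in this copy, so the following is a sketch matching the bullet points below, written against the ASP.NET Core 2.x hosting APIs (names and structure are illustrative):

```csharp
using System.Diagnostics;
using System.IO;
using System.Linq;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Hosting.WindowsServices;

public class Program
{
    public static void Main(string[] args)
    {
        // Treat it as a service unless a debugger is attached or --console was passed.
        var isService = !(Debugger.IsAttached || args.Contains("--console"));

        var pathToContentRoot = Directory.GetCurrentDirectory();
        if (isService)
        {
            // Services start with System32 as the working directory, so point
            // the content root back at the folder the exe actually lives in.
            var pathToExe = Process.GetCurrentProcess().MainModule.FileName;
            pathToContentRoot = Path.GetDirectoryName(pathToExe);
        }

        var host = WebHost.CreateDefaultBuilder(args.Where(arg => arg != "--console").ToArray())
            .UseContentRoot(pathToContentRoot)
            .UseStartup<Startup>()
            .Build();

        if (isService)
        {
            host.RunAsService();
        }
        else
        {
            host.Run();
        }
    }
}
```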

This does a couple of things :

  • It checks whether we are using the debugger, or if we have a console argument of "--console" passed in.
  • If neither of the above are true, it sets the content root manually back to where the exe is running. This is specifically for the service runtime.
  • Next if we are a service, we use a special “RunAsService()” method that .NET Core gives us
  • Otherwise we just do a “Run()” as normal.

Obviously the main point of this is that if the debugger is attached (e.g. we are running from Visual Studio), or we run from a command prompt with the flag "--console", it's going to run exactly the same as before. Back in the day we used to have to run the service with a 10 second sleep at the start of the app, and quickly try and attach the debugger to the process before it kicked off to be able to set breakpoints etc. Now it's just so much easier.

Now let’s actually get this thing installed!

In your project in Visual Studio (Or your favourite editor) add a file called install.bat to your project. The contents of this file should be :
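The batch file contents aren't included in this copy; a sketch matching the description below would be the following, where MyService and MyApplication.exe are placeholders:

```bat
sc create MyService binPath= "%~dp0MyApplication.exe"
sc failure MyService reset= 30 actions= restart/5000
sc start MyService
sc config MyService start= auto
```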

Obviously replace MyService with the name of your service, and be sure to rename the exe to the actual name of your application's exe. Leave the %~dp0 part as this refers to the current batch path (allowing you to just double click the batch file when you want to install).

The install file creates the service, sets up failure restarts (Although these won’t really be needed), starts the service, and sets the service to auto start in the future if the machine reboots for any reason.

Go ahead and create an uninstall.bat file in your project. This should look like :
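Again the file contents are missing here; a sketch matching the description (stop, pause, then delete) would be:

```bat
sc stop MyService
timeout /t 10 /nobreak > NUL
sc delete MyService
```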

Why the timeout? I sometimes found that it took a while to stop the service, and so giving it a little bit of a break in between stopping and deleting helped it along its way.

Important! For both of these files, be sure to set them up so they copy to the output directory in Visual Studio. Without this, your bat files won’t output to your publish directory.

Go ahead and publish your application again using our command from earlier :
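```
dotnet publish -c Release --self-contained -r win10-x64
```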

Now in your publish directory, you will find your install and uninstall bat files. You will need to run both of these as Administrator for them to work as installing Windows Services requires elevated access. A good idea is that the first time you run these, you run them from a command prompt so you can catch any errors that happen.

Once installed, you should be able to browse to http://localhost:5000 and see your website running silently in the background. And again, the best part is when you restart your machine, it starts automatically. Perfect!

This week there was a great blog post about Bing.com running on .NET Core 2.1, and the performance gains that brought along with it. Most curious to me was that they singled out the performance gains of string.Equals and string.IndexOf in .NET Core 2.1 as having the largest amount of impact to performance.

Whichever way you slice it, HTML rendering and manipulation are string-heavy workloads. String comparisons and indexing operations are major components of that. Vectorization of these operations is the single biggest contributor to the performance improvement we’ve measured.

At first I thought they must have some very special use case that runs millions of string comparisons under the hood, so it's not going to be much use to me. But then I thought about how many string comparisons must happen under the hood when building a web application. There is probably a whole lot more happening than we realize, and the singling out of string manipulation performance improvements may not be as off as I first thought.

So let's just take their word for it and say that doing stuff on the web is a string-heavy workload in .NET. How much of a performance gain can we actually expect to see in .NET Core 2.1 for these methods? We aren't necessarily looking at the time it takes for these functions to complete, but rather the factor of improvement that .NET Core 2.1 has over versions of the .NET Full Framework.

String.Equals Performance Benchmarks

(Before reading too much into these results, see the next section entitled “String.Equals Performance Benchmarks Updated”. Some interesting stuff!)

Now we could write some huge loop and run it on each runtime one by one, or we could write a nice benchmark using BenchmarkDotNet (Guide Here), and get it all in one go.

Our benchmark looks like :
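The benchmark code isn't reproduced in this copy; a minimal sketch of what it would look like is below. It uses BenchmarkDotNet's per-runtime shortcut attributes from that era (newer versions spell these with [SimpleJob] and a RuntimeMoniker instead); all names are illustrative.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[ClrJob, CoreJob] // one job per toolchain: full framework and .NET Core
public class StringEqualsBenchmark
{
    // Compile-time constants; note these end up as the same interned instance
    // (which matters, as the update further down explains).
    private const string StringA = "Hello World!";
    private const string StringB = "Hello World!";

    [Benchmark]
    public bool IsEqual() => string.Equals(StringA, StringB);
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<StringEqualsBenchmark>();
}
```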

So a couple of things to point out. First, we are using two different toolchains: .NET Core 2.1 and .NET Full Framework 4.7.2, both of which are the latest versions of their respective runtimes.

The benchmark itself is simple, we compare the string “Hello World!” to another string that says “Hello World!”. That’s it! Nothing too fancy.

Now typically with benchmarks on large pieces of code, I feel OK running them on my own machine. While this can give you skewed results, especially if you are trying to use your computer at the same time, for big chunks of code I'm usually just looking to see whether there is actually any difference whatsoever, not the actual level of difference. Here, it's a little different. We are going to be talking about differences down to the nanoseconds, so we need to be far more careful.

So instead, I spun up a VM in Azure to run the benchmarks on. It’s a D2s_V3 machine, so 2 CPU cores and 8GB of ram. It’s probably pretty typical of your standard web box that you might scale up to, before starting to scale out horizontally in a web farm.

Enough waffle, what do the results look like?

Method Toolchain Mean Error Scaled
IsEqual .NET Core 2.1 0.9438 ns 0.0686 ns 1.00
IsEqual CsProjnet472 1.9381 ns 0.0844 ns 2.06

I ran this a couple of times to make sure… And yes, doing a string compare on the full framework took twice as long to complete. And trust me, I ran this multiple times just to make sure I wasn't doing something stupid; the results were that astounding.

In case someone doesn't believe me, the exact tooling as per BenchmarkDotNet that was used was :

.NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
.NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0

Again, prove me wrong because I couldn’t believe the results myself. Pretty unbelievable.

String.Equals Performance Benchmarks Updated (2018-08-23)

I'm always nervous when I post benchmarks. There is so much that can go wrong, get optimized out, or have something minor completely skew the results. This time was no different. There were a couple of observations with the benchmark.

Compile time strings are interned so I think the string equal test is testing equality on the same string instance

and

You should test with longer string, that’s where the optimizations will kick in

Both good points (hat tip to Jeff Cyr). First I wanted to test the point that if I am using the same string instance, I shouldn't see any performance difference (or not much anyway), because the objects will actually share the same memory under the hood. So let's modify our benchmark a little to :
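The modified benchmark isn't shown in this copy; the key change is simply to build the strings at runtime so they are distinct instances, for example:

```csharp
public class StringEqualsBenchmark
{
    // Constructed at runtime so the two strings are separate instances
    // rather than one interned literal.
    private static readonly string StringA = new string("Hello World!".ToCharArray());
    private static readonly string StringB = new string("Hello World!".ToCharArray());

    [Benchmark]
    public bool IsEqual() => string.Equals(StringA, StringB);
}
```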

So now it’s definitely a different instance. Running the benchmarks and what do you know :

Method Toolchain Mean Error Scaled
IsEqual .NET Core 2.1 7.370 ns 0.1855 ns 1.00
IsEqual CsProjnet472 7.152 ns 0.1928 ns 0.97

So point proven, when the instance is different and small, there is very little performance difference. In fact .NET Core is actually slower in my benchmark, but within the range of error.

So let’s scale up the test to prove the second point. That for cases where the strings are much longer, we should see the performance benefits kick in. Our benchmark this time will look like :
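Again the code itself is missing here; a sketch using two distinct 12,000 character strings would be:

```csharp
public class StringEqualsBenchmark
{
    // Two distinct 12,000 character strings with identical contents.
    private static readonly string StringA = new string('a', 12000);
    private static readonly string StringB = new string('a', 12000);

    [Benchmark]
    public bool IsEqual() => string.Equals(StringA, StringB);
}
```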

So the strings we will compare are 12,000 characters long, and they are different instances. Running our benchmark we get :

Method Toolchain Mean Error StdDev Scaled ScaledSD
IsEqual .NET Core 2.1 128.7 ns 4.367 ns 12.88 ns 1.00 0.00
IsEqual CsProjnet472 211.7 ns 6.989 ns 20.28 ns 1.66 0.24

This is what we expected to see, so on larger strings, there is a definite performance improvement in .NET Core.

So what are the takeaways here?

  1. .NET Core has, it seems, done some work that optimizes equality tests of strings when they are the same instance
  2. For short strings, there isn’t any great performance benefit.
  3. For long strings, .NET Core has a substantial performance boost.
  4. I’m still nervous about posting benchmarks!

String.IndexOf Performance Benchmarks

Next up let’s take a look at IndexOf performance. This one was interesting because using IndexOf on a string, you can either do IndexOf(string) or IndexOf(char). And from the looks of the change (you can view the original PR into the Core Github repo here), the performance impact should only affect IndexOf(char). But this actually gives us a good opportunity to make sure that we are benchmarking correctly. Let’s include a benchmark that does an IndexOf(string) too! We should expect to see very minimal difference between .NET Core and Full Framework on this, but it would be good to see it in the numbers.

The benchmarking code is :
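The code isn't reproduced in this copy; a sketch matching the description below (two haystack sizes fed in as parameters, with needles that never match) would be:

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[ClrJob, CoreJob]
public class StringIndexOfBenchmark
{
    public static IEnumerable<string> Haystacks()
    {
        yield return "Hello World!";                                          // 12 characters
        yield return string.Concat(Enumerable.Repeat("Hello World!", 1000));  // 12,000 characters
    }

    [ParamsSource(nameof(Haystacks))]
    public string haystack;

    // Neither needle ever matches, so the whole haystack is always scanned.
    [Benchmark]
    public int IndexOfString() => haystack.IndexOf("zz");

    [Benchmark]
    public int IndexOfChar() => haystack.IndexOf('z');
}
```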

You’ll notice that in this test case, we are passing in two different arguments for each benchmark. The first is with a string that is 12 characters long, and the second is with a string that is 12,000 characters long. This was mostly because of the comment on the original PR that stated :

for longer strings, where the match is towards the end or doesn’t match at all, the gains are substantial.

Because of this, I also made sure that the IndexOf didn't actually find a match at all, so we could see the maximum performance gain that this new code in .NET Core 2.1 has.

And the results?

Method Toolchain haystack Mean Error Scaled
IndexOfString .NET Core 2.1 Hello World! 171.212 ns 3.3849 ns 1.00
IndexOfString CsProjnet472 Hello World! 184.194 ns 3.6937 ns 1.08
IndexOfChar .NET Core 2.1 Hello World! 7.962 ns 0.4588 ns 1.00
IndexOfChar CsProjnet472 Hello World! 12.305 ns 0.2841 ns 1.59
IndexOfString .NET Core 2.1 Hello(…)orld! [12000] 39,964.455 ns 781.2495 ns 1.00
IndexOfString CsProjnet472 Hello(…)orld! [12000] 40,476.489 ns 805.1209 ns 1.01
IndexOfChar .NET Core 2.1 Hello(…)orld! [12000] 765.894 ns 15.2256 ns 1.00
IndexOfChar CsProjnet472 Hello(…)orld! [12000] 7,522.823 ns 147.9425 ns 9.83

There is a bit to take in here but here goes.

First, when the method is “IndexOfString”, we see minimal to no difference between the two runtimes. .NET Core is slightly faster, but this could be down to a whole host of factors not related to this specific method.

When we move to the IndexOfChar method, we see that when the string is small, we lop quite a bit off the average time. But if we move down to working on larger strings… wow… We are almost 10x faster in .NET Core than Full Framework. Pretty. Damn. Incredible.

Won’t This Stuff Make It Into .NET Framework?

Because much of this work actually relies on C# 7.2's new Span<T> support, it's likely it will make its way through eventually. But what I typically see now is that the release cycle is that much faster with .NET Core over Framework, so we see these sorts of improvements make their way into the Core runtime at a much more rapid pace, and sort of backfill their way into the framework. I'm sure at some point a reader will come across this post, and in .NET Framework version 4.8.X there is no performance difference, but by that point there will be some other everyday method that is blazingly fast in Core but not Framework.

Getting a mime type based on a file name (Or file extension), is one of those weird things you never really think about until you really, really need it. I recently ran into the issue when trying to return a file from an API (Probably not the best practice, but I had to make it happen), and I wanted to specify the mime type to the caller. I was amazed with how things “used” to be done both in the .NET Framework, and people’s answers on Stack Overflow.

How We Used To Work Out The Mime Type Based On a File Name (aka The Old Way)

If you were using the .NET Framework, you had two ways to get going. Now I know this is a .NET Core blog, but I still found it interesting to see how we got to where we are now.

The first way is that you build up a huge dictionary yourself of mappings between file extensions and mime types. This actually isn't a bad way of doing things if you only expect a few different types of files to need mapping.

The second was that in the System.Web namespace of the .NET Framework there is a static class for mime mappings. We can actually see the source code for this mapping here : https://referencesource.microsoft.com/#system.web/MimeMapping.cs. If you were expecting some sort of mime mapping magic to be happening, well, just check out that source.

400+ lines of manual mappings that were copied and pasted from the default IIS7 list. So, not that great.

But the main issue with all of this is that it’s too hard (close to impossible) to add and remove custom mappings. So if your file extension isn’t in the list, you are out of luck.

The .NET Core Way

.NET Core obviously has its own way of doing things that may seem a bit more complicated but does work well.

First, we need to install the following nuget package :
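The package name is missing from this copy; the FileExtensionContentTypeProvider class used below ships in the static files package:

```
Install-Package Microsoft.AspNetCore.StaticFiles
```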

Annoyingly the class we want to use lives inside this static files nuget package. If that becomes an issue for you, I would say look at the source code and make it work for you in whatever way you need. But for now, let's use the package.

Now we have access to a nifty little class called FileExtensionContentTypeProvider . Here’s some example code using it. I’ve created a simple API action that takes a filename, and returns the mime type :
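The action isn't reproduced in this copy; a minimal sketch (route and names are illustrative) would look like:

```csharp
[HttpGet("mimetype/{fileName}")]
public IActionResult GetMimeType(string fileName)
{
    var provider = new FileExtensionContentTypeProvider();

    if (!provider.TryGetContentType(fileName, out var contentType))
    {
        // Fall back to a sensible default when no mapping exists.
        contentType = "application/octet-stream";
    }

    return Ok(contentType);
}
```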

Nothing too crazy and it works! We also catch the case where it doesn't manage to map the file, and just map it ourselves to a default content type. This is one thing the .NET Framework MimeMapping class did do for you: if it couldn't find the correct mapping, it returned application/octet-stream. But I can see how doing it yourself is far more explicit about what's going on.

But here’s the thing, if we look at the source code of this here, we can see we are no better off in terms of doing things by “magic”, it’s still one big dictionary under the hood. And the really interesting part? We can actually add our own mappings! Let’s modify our code a bit :
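The modified snippet isn't shown here; adding a custom mapping is just a dictionary write. The .dnct extension and its mime type below are made up for the example.

```csharp
var provider = new FileExtensionContentTypeProvider();
provider.Mappings[".dnct"] = "application/dotnetcoretutorials"; // hypothetical mime type for our made-up extension
```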

I've gone mad with power and created a new file extension called .dnct and mapped it to its own mime type. Everything is a cinch!

But our last problem. What if we want to use this in multiple places? What if we need better control for unit testing that "instantiating" every time won't really give us? Let's create a nice mime type mapping service!

We could create this static, but then we lose a little flexibility around unit testing. So I’m going to create an interface too. Our service looks like so :
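The service code is missing from this copy; a sketch matching the description below (interface and class names are assumptions) could be:

```csharp
public interface IMimeMappingService
{
    string Map(string fileName);
}

public class MimeMappingService : IMimeMappingService
{
    private readonly FileExtensionContentTypeProvider _contentTypeProvider;

    public MimeMappingService(FileExtensionContentTypeProvider contentTypeProvider)
    {
        _contentTypeProvider = contentTypeProvider;
    }

    public string Map(string fileName)
    {
        if (!_contentTypeProvider.TryGetContentType(fileName, out var contentType))
        {
            contentType = "application/octet-stream";
        }

        return contentType;
    }
}
```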

So we provide a single method called "Map". And when creating our MimeMappingService, we take in a content type provider.

Now we need to head to our startup.cs and in our ConfigureServices method we need to wire up the service. That looks a bit like this :
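The wiring isn't reproduced here; a sketch of what ConfigureServices could look like is:

```csharp
public void ConfigureServices(IServiceCollection services)
{
    var contentTypeProvider = new FileExtensionContentTypeProvider();
    contentTypeProvider.Mappings[".dnct"] = "application/dotnetcoretutorials"; // our extra mapping

    services.AddSingleton<IMimeMappingService>(new MimeMappingService(contentTypeProvider));

    services.AddMvc();
}
```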

So we instantiate our FileExtensionContentTypeProvider, give it our extra mappings, then bind our MimeMappingService all up so it can be injected.

In our controller we change our code to look a bit like this :
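The controller code is missing here; a sketch with the service injected (names illustrative) would be:

```csharp
public class MimeTypeController : Controller
{
    private readonly IMimeMappingService _mimeMappingService;

    public MimeTypeController(IMimeMappingService mimeMappingService)
    {
        _mimeMappingService = mimeMappingService;
    }

    [HttpGet("mimetype/{fileName}")]
    public IActionResult GetMimeType(string fileName)
    {
        return Ok(_mimeMappingService.Map(fileName));
    }
}
```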

Nice and clean. And it means that any time we inject our MimeMappingService around, it has all our custom mappings contained within it!

.NET Core Static Files

There is one extra little piece of info I should really give out too. And that is if you are using the .NET Core static files middleware to serve raw files, you can also use this provider to return the correct mime type. So for example you can do things like this :
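The snippet isn't included in this copy; in the Configure method of startup.cs it would be along these lines, reusing a provider with our custom mapping added:

```csharp
app.UseStaticFiles(new StaticFileOptions
{
    ContentTypeProvider = contentTypeProvider // the same provider with our .dnct mapping added
});
```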

So now, even when we are outside of C# code and just serving the raw file of type .dnct, we will still return the correct mime type.

It's almost a rite of passage for a junior developer to kludge together their own CSV parser using a simple string.Split(','), and then subsequently work out that there is a bit more to this whole CSV thing than just separating out values by a comma. I recently took a look at my options for parsing CSVs in .NET Core, and here's what I found!

CSV Gotchas

There are a couple of CSV gotchas that have to be brought up before we dive deeper. Hopefully they should go ahead and explain why rolling your own is sometimes more pain than it’s worth.

  • A CSV may or may not have a header row. If there is a header row, then the order of the columns is not important since you can detect what is actually in each column. If there is no header row, then you rely on the order of the columns being the same. Any CSV parser should be able to both read columns based on a “header” value, and by index.
  • Any field may be contained in quotes. However fields that contain a line-break, comma, or quotation marks must be contained in quotes.
  • To re-emphasize the above, line breaks within a field are allowed within a CSV as long as they are wrapped in quotation marks, this is what trips most people up who are simply reading line by line like it’s a regular text file.
  • Quote marks within a field are notated by doing double quote marks (As opposed to say an escape character like a back slash).
  • Each row should have the same amount of columns, however in the RFC this is labelled as a “should” and not a “must”.
  • While yes, the C in CSV stands for comma, ideally a CSV parser can also handle things like TSV (That is using tabs instead of commas).

And obviously this is just for parsing CSV files into primitives, but in something like .NET we will also be needing :

  • Deserializing into a list of objects
  • Handling of Enum values
  • Custom mappings (So the header value may or may not match the name of the class property in C#)
  • Mapping of nested objects

Setup For Testing

For testing out these libraries, I wanted a typical scenario for importing. So this included the use of different primitive types (strings, decimals etc), the usage of a line break within a string (Valid as long as it’s contained within quotes), the use of quotes within a field, and the use of a “nested” object in C#.

So my C# model ended up looking like :
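The model isn't included in this copy, so the following is an illustrative reconstruction based on the points below (all names and the exact properties are assumptions):

```csharp
public class Automobile
{
    public string Make { get; set; }
    public string Model { get; set; }
    public AutomobileType Type { get; set; }          // an enum value
    public decimal Price { get; set; }
    public AutomobileComment Comment { get; set; }    // the nested object
}

public enum AutomobileType
{
    New,
    Used
}

public class AutomobileComment
{
    public string Text { get; set; }
}
```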

And our CSV file ended up looking like :
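The file itself isn't reproduced here, so this is an illustrative sample that exercises the same gotchas (quoted and unquoted fields, an enum, doubled quote marks, and a line break inside a quoted field):

```
Make,Model,Type,Price,Comment
"Toyota",Corolla,New,19995.95,"Has ""mag"" wheels.
Needs a service before sale."
```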

So a couple of things to point out :

  • Our "Make" is enclosed in quotes but our "Model" is not. Both are valid.
  • The “Type” is actually an enum and should be deserialized as such
  • The “Comment” field is a bit of a pain. It contains quotes, it has a line break, and in our C# code it’s actually a nested object.

All of this in my opinion is a pretty common setup for a CSV file import, so let’s see how we go.

CSV Libraries

So we've read all of the above and we've decided that rolling our own library is just too damn hard. Someone's probably already solved these issues for us, right? As it happens, yes they have. Let's take a look at what's out there in terms of libraries. Realistically, I think there are only two that really matter, CSVHelper and TinyCSVParser, so let's narrow our focus down to them.

CSVHelper

Website : https://joshclose.github.io/CsvHelper/
CSVHelper is the granddaddy of working with CSV in C#. I figured I would have to do a little bit of manual mapping for the comment field, but hopefully the rest would all work out of the box. Well… I didn’t even have to do any mapping at all. It managed to work everything out itself with nothing but the following code :
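The code isn't reproduced in this copy; the gist of a no-mapping CsvHelper read is roughly the following (file name and model are the illustrative ones from above, and newer CsvHelper versions also require a CultureInfo argument on the CsvReader constructor):

```csharp
using (var reader = new StreamReader("import.csv"))
using (var csv = new CsvReader(reader))
{
    // CsvHelper works the columns out from the header row on its own.
    var records = csv.GetRecords<Automobile>().ToList();
}
```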

Really really impressive. And it handled double quotes, new lines, and enum parsing all on its own.

The thing that wowed me the most about this library is how extensible it is. It can handle completely custom mappings, custom type conversion (so you could split a field into, say, a dictionary or a child list), and even has really impressive error handling options.

The fact that in 3 lines of code, I'm basically done with the CSV parsing really points out how good this thing is. I could go on and on about its features, but we do have one more parser to test!

Tiny CSV Parser

Website : http://bytefish.github.io/TinyCsvParser/index.html
Next up was Tiny CSV Parser. Someone had pointed me to this one supposedly because its speed was meant to be pretty impressive. We'll take a look at that in a bit, but for now we just wanted to see how it handled our file.

That’s where things started to fall down. It seems that Tiny CSV doesn’t have “auto” mappings, and instead you have to create a class to map them yourself. Mine ended up looking like this :
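The mapping class isn't shown in this copy. A rough sketch of the kind of mapping TinyCsvParser needs is below; the exact converter interface and method signatures are assumptions and may differ between library versions.

```csharp
// Assumed converter shape for turning the raw string into our nested object.
public class CommentConverter : ITypeConverter<AutomobileComment>
{
    public Type TargetType => typeof(AutomobileComment);

    public bool TryConvert(string value, out AutomobileComment result)
    {
        result = new AutomobileComment { Text = value };
        return true;
    }
}

public class AutomobileMapping : CsvMapping<Automobile>
{
    public AutomobileMapping()
    {
        MapProperty(0, x => x.Make);
        MapProperty(1, x => x.Model);
        MapProperty(2, x => x.Type);
        MapProperty(3, x => x.Price);
        MapProperty(4, x => x.Comment, new CommentConverter());
    }
}
```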

So, right from the outset there is a bit of overhead here. I also don't like how I have to map a specific column index instead of a column heading. You will also notice that I had to create my own type converter for the nested comment object. I feel like this was expected, but the documentation doesn't specify how to actually create this and I had to delve into the source code to work out what I needed to do.

And once we run it, oof! While it does handle almost all scenarios, it doesn't handle line breaks in a field. Removing the line break did mean that we could parse everything else, including the enums, double quotes, and (eventually) the nested object.

The code to actually parse the file (Together with the mappings above) was :
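That code isn't reproduced in this copy; a rough sketch (the exact option and result types are assumptions based on TinyCsvParser's documented usage) would be:

```csharp
var csvParserOptions = new CsvParserOptions(true, ',');   // true = the file has a header row to skip
var csvParser = new CsvParser<Automobile>(csvParserOptions, new AutomobileMapping());

var records = csvParser
    .ReadFromFile("import.csv", Encoding.UTF8)
    .Select(x => x.Result)
    .ToList();
```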

So not a hell of a lot to actually parse the CSV, but it does require a bit of setup. Overall, I didn’t think it was even in the same league as CSVHelper.

Benchmarking

The thing is, while I felt CSVHelper was miles more user friendly than Tiny CSV, the entire reason the latter was recommended to me was because it’s supposedly faster. So let’s put that to the test.

I’m using BenchmarkDotNet (Guide here) to do my benchmarking. And my code looks like the following :
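The benchmark code isn't included in this copy; a sketch reusing the illustrative model and mapping from above (with the memory diagnoser turned on, since memory is discussed below) would be:

```csharp
[MemoryDiagnoser]
public class CsvLibraryBenchmark
{
    [Benchmark]
    public List<Automobile> CsvHelper()
    {
        using (var reader = new StreamReader("import.csv"))
        using (var csv = new CsvReader(reader))
        {
            // ToList() forces full enumeration so we measure the real work.
            return csv.GetRecords<Automobile>().ToList();
        }
    }

    [Benchmark]
    public List<Automobile> TinyCsvParser()
    {
        var options = new CsvParserOptions(true, ',');
        var parser = new CsvParser<Automobile>(options, new AutomobileMapping());

        return parser.ReadFromFile("import.csv", Encoding.UTF8)
                     .Select(x => x.Result)
                     .ToList();
    }
}
```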

A quick note on this, is that I tried to keep it fairly simple, but also I had to ensure that I used “ToList()” to make sure that I was actually getting the complete list back, even though this adds quite a bit of overhead. Without it, I get returned an IEnumerable that might not actually be enumerated at all.

First I tried a 100,000 line CSV file that used our “complicated” format above (Without line breaks in fields however as TinyCSVParser does not handle these).

TinyCSVParser was quite a bit faster (about 4x, in fact). And when it came to memory usage, it was way down on CSVHelper.

Let’s see what happens when we up the size to a million rows :

It seems pretty linear here in terms of both memory allocated, and the time it takes to complete. What’s mostly interesting here is that the file itself is only 7.5MB big at a million rows, and yet we are allocating close to 2.5GB to handle it all, pretty crazy!

So, the performance claims around TinyCSV definitely hold up: it's much faster and uses far less memory. But to my mind CSVHelper is still the much more user friendly library if you are processing smaller files.

I was recently looking into the new Channel<T>  API in .NET Core (For an upcoming post), but while writing it up, I wanted to do a quick refresher of all the existing “queues” in .NET Core. These queues are also available in full framework (And possibly other platforms), but all examples are written in .NET Core so your mileage may vary if you are trying to run them on a different platform.

FIFO vs LIFO

Before we jump into the .NET specifics, we should talk about the concept of FIFO or LIFO, that is "First In, First Out" and "Last In, First Out". For the concept of queues, we typically think of FIFO. So the first message put into the queue is the first one that comes out; essentially we process messages in the order they went into the queue. LIFO is typically rare when it comes to queues, but in .NET there is a type called Stack<T>  that works with LIFO. That is, after filling the stack with messages/objects, the last one put in would be the first one out. Essentially the order is reversed.

Queue<T>

Queue<T>  is going to be our barebones simple queue in .NET Core. It takes messages, and then pops them out in order. Here’s a quick code example :
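The example isn't reproduced in this copy; a minimal equivalent matching the description below is:

```csharp
var queue = new Queue<string>();
queue.Enqueue("Hello");
queue.Enqueue("World!");

Console.WriteLine(queue.Dequeue()); // Hello
Console.WriteLine(queue.Dequeue()); // World!
```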

Pretty stock standard and not a lot of hidden meaning here. The Enqueue  method puts a message on our queue, and the Dequeue  method takes one off (In a FIFO manner). Our console app obviously prints out two lines, “Hello” then “World!”.

Barring multi threaded scenarios (which we will talk about shortly), you're not going to find too many reasons to use this barebones queue. In a single threaded app, you might pass around a queue to process a "list" of messages, but you may find that using a List<T>  within a loop is a simpler way of achieving the same result. In fact if you look at the source code of Queue, you will see it's actually just an implementation of IEnumerable anyway!

So how about multi threaded scenarios? It kind of makes sense that you may want to load up a queue with items, and then have multiple threads all trying to process the messages. Well using a queue in this manner is actually not threadsafe, but .NET has a different type to handle multi threading…

ConcurrentQueue<T>

ConcurrentQueue<T>  is pretty similar to Queue<T> , but is made threadsafe by a copious amount of spinlocks. A common misconception is that ConcurrentQueues are just a wrapper around a queue with the use of the lock  keyword. A quick look at the source code here shows that’s definitely not the case. Why do I feel the need to point this out? Because I often see people try and make their use of Queue<T>  threadsafe by using locks, thinking that they are doing what Microsoft does when using ConcurrentQueue, but that’s pretty far from the truth and actually takes a pretty big performance hit when doing so.

Here’s a code sample of a ConcurrentQueue :
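The sample is missing from this copy; a minimal equivalent showing TryDequeue (as described below) is:

```csharp
var concurrentQueue = new ConcurrentQueue<string>();
concurrentQueue.Enqueue("Hello");
concurrentQueue.Enqueue("World!");

// TryDequeue returns false once the queue is empty.
while (concurrentQueue.TryDequeue(out var message))
{
    Console.WriteLine(message);
}
```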

So you’ll notice we can no longer just dequeue a message, we need to TryDequeue. It will return true if we managed to pop a message, and false if there is no message to pop.

Again, the main point of using a ConcurrentQueue over a regular Queue is that it’s threadsafe to have multiple consumers (Or producers/enqueuers) all using it at the same time.

BlockingCollection<T>

A blocking collection is an interesting "wrapper" type that can go over the top of any IProducerConsumerCollection<T>  type (which ConcurrentQueue<T>  and ConcurrentStack<T>  both implement). This can be handy if you have your own implementation of a queue, but for most cases you can roll with the default constructor of BlockingCollection. When doing this, it uses a ConcurrentQueue<T> under the hood making everything threadsafe (see source code here). The main reason to use a BlockingCollection is that it can enforce a limit on how many items can sit in the queue/collection. Obviously this is beneficial if your producer is much faster than your consumers.

Let’s take a quick look :
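The snippet isn't included in this copy; a minimal equivalent that produces the behaviour described next is:

```csharp
var blockingCollection = new BlockingCollection<string>(2); // bounded to 2 items

blockingCollection.Add("Hello");
Console.WriteLine("Adding Hello");
blockingCollection.Add("World!");
Console.WriteLine("Adding World!");
blockingCollection.Add("Again!"); // blocks here forever – the collection is already at its bound of 2
Console.WriteLine("Adding Again!");
```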

What will happen with this code? You will see “Adding Hello”, “Adding World!”, and then nothing… Your application will just hang. The reason is this line :
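```csharp
var blockingCollection = new BlockingCollection<string>(2); // the bounded capacity of 2
```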

We’ve initialized the collection to be a max size of 2. If we try and add an item where the collection is already at this size, we will just wait until a message is dequeued. How long will we wait? Well by default, forever. However we can change our add line to be :
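```csharp
// Waits up to 30 seconds for space to free up, then gives up and returns false.
var added = blockingCollection.TryAdd("Again!", TimeSpan.FromSeconds(30));
```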

So we’ve changed our Add call to TryAdd, and we’ve specified a timespan to wait. If this timespan is hit, then the TryAdd method will return false to let us know we weren’t able to add the item to the collection. This is handy if you need to alert someone that your queue is overloaded (e.g. the consumers are stalled for whatever reason).

Stack<T>

As we talked about earlier, a Stack<T> type allows for a Last In, First Out (LIFO) queuing style. Consider the following code :
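The code isn't reproduced in this copy; a minimal equivalent is:

```csharp
var stack = new Stack<string>();
stack.Push("Hello");
stack.Push("World!");

Console.WriteLine(stack.Pop()); // World!
Console.WriteLine(stack.Pop()); // Hello
```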

The output would be "World!" then "Hello". It's rare that you would need this reversal of messages, but it does happen. Stack<T>  also has its companion in ConcurrentStack<T> , and you can initialize BlockingCollection with a ConcurrentStack within it.

Channel<T>

There is a brand new Channel<T> type released with .NET Core. Because it’s just so different from the existing queue types in .NET, I’ll have an upcoming post discussing a tonne more about how they work, and why you might use them. In the meantime the only documentation I can find is on Github from Stephen Toub here. Have a look, and see if it works for you until the next post!

While working on an API that was built specifically for mobile clients, I ran into an interesting problem that I couldn’t believe I hadn’t found before. When working on a REST API that deals exclusively in JSON payloads, how do you upload images? Which naturally leads onto the next question, should a JSON API then begin accepting multipart form data? Is that not going to look weird that for every endpoint, we accept JSON payloads, but then for this one we accept a multipart form? Because we will want to be uploading metadata with the image, we are going to have to read out formdata values too. That seems so 2000’s! So let’s see what we can do.

While some of the examples within this post are going to be using .NET Core, I think this really applies to any language that is being used to build a RESTful API. So even if you aren’t a C# guru, read on!

Initial Thoughts

My initial thoughts sort of boiled down to a couple of points.

  • The API I was working on had been hardcoded in a couple of areas to really be forcing the whole JSON payload thing. Adding in the ability to accept formdata/multipart forms would be a little bit of work (and regression testing).
  • We had custom JSON serializers for things like decimal rounding that would somehow manually need to be done for form data endpoints if required. We are even using snake_case as property names in the API (dang iOS developers!), which would have to be done differently in the form data post.
  • And finally, is there any way to just serialize what would have been sent under a multi-part form post, and include it in a JSON payload?

What Else Is Out There?

It really became clear that I didn’t know what I was doing. So like any good developer, I looked to copy. So I took a look at the public API’s of the three social media giants.

Twitter

API Doc : https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload

Twitter has two different ways to upload files. The first is sort of a “chunked” way, which I assume is because you can upload some pretty large videos these days. And a more simple way for just uploading general images, let’s focus on the latter.

It’s a multi-part form, but returns JSON. Boo.

The very very interesting part about the API however, is that it allows uploading the actual data in two ways. Either you can upload the raw binary data as you typically would in a multipart form post, or you could actually serialise the file as a Base64 encoded string, and send that as a parameter.

Base64 encoding a file was interesting to me because theoretically (and as we will see later, definitely), we can send this string data any way we like. I would say that of all the C# SDKs I looked at, I couldn't find any actually using this Base64 method, so there weren't any great examples to go off.

Another interesting point about this API is that you are uploading “media”, and then at a later date attaching that to an actual object (For example a tweet). So if you wanted to tweet out an image, it seems like you would (correct me if I’m wrong) upload an image, get the ID returned, and then create a tweet object that references that media ID. For my use case, I certainly didn’t want to do a two step process like this.

LinkedIn

API Doc : https://developer.linkedin.com/docs/guide/v2/shares/rich-media-shares#upload

LinkedIn was interesting because it's a pure JSON API. All data POSTs contain JSON payloads, similar to the API I was creating. Wouldn't you guess it, they use multipart form data too!

Similar to Twitter, they also have this concept of uploading the file first, and attaching it to where you actually want it to end up second. And I totally get that, it’s just not what I want to do.

Facebook

API Doc : https://developers.facebook.com/docs/graph-api/photo-uploads

Facebook uses a Graph API. So while I wanted to take a look at how they did things, so much of their API is not really relevant in a RESTful world. They do use multi-part forms to upload data, but it's kinda hard to say how or why that is the case. Also, at this point I couldn't get my mind off how Twitter did things!

So Where Does That Leave Us?

Well, in a weird way I think I got what I expected: multipart forms were well and truly alive. It didn't seem like there was any great innovation in this area. In some cases, the use of multipart forms didn't look so brutal because they didn't need to upload metadata at the same time. Therefore simply sending a file with no attached data didn't look so out of place in a JSON API. However, I did want to send metadata in the same payload as the image, not have it as a two step process.

Twitter’s use of Base64 encoding intrigued me. It seemed like a pretty good option for sending data across the wire irrespective of how you were formatting the payload. You could send a Base64 string as JSON, XML or Form Data and it would all be handled the same. It’s definitely proof of concept time!

Base64 JSON API POC

What we want to do is just test that we can upload images as a Base64 string, and we don’t have any major issues within a super simple scenario. Note that these examples are in C# .NET Core, but again, if you are using any other language it should be fairly simple to translate these.

First, we need our upload JSON Model. In C# it would be :
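The model isn't reproduced in this copy; matching the description below it would be something like (names are illustrative):

```csharp
public class ImageUploadModel
{
    public string Description { get; set; }
    public string ImageData { get; set; } // the Base64 encoded file
}
```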

Not a whole lot to it. Just a description field that can be freetext for a user to describe the image they are uploading, and an imagedata field that will hold our Base64 string.

For our controller :
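The controller isn't included in this copy; a proof-of-concept sketch that simply decodes the Base64 string and echoes the image back (the route and mime type are assumptions) would be:

```csharp
[HttpPost("upload")]
public IActionResult Upload([FromBody] ImageUploadModel model)
{
    // Convert the Base64 string back into raw bytes.
    var bytes = Convert.FromBase64String(model.ImageData);

    // For the proof of concept, just echo the image straight back to the caller.
    return File(bytes, "image/jpeg");
}
```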

Again, fairly damn simple. We take in the model, then C# has a great way to convert that string into a byte array, or to read it into a memory stream. Also note that as we are just building a proof of concept, I echo out the image data to make sure that it's been received, read, and output like I expect it to be, but not a whole lot else.

Now let’s open up postman, our JSON payload is going to look a bit like :
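The payload isn't shown in this copy; an illustrative example (with the Base64 data heavily truncated) would be:

```json
{
    "description" : "A picture of my cat",
    "imageData" : "/9j/4AAQSkZJRgABAQEAYABg..."
}
```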

I've obviously truncated imagedata down here, but a super simple tool to turn an image into a Base64 string is something like this website here. I would also note that when you send your payload, it should be without the data:image/jpeg;base64, prefix that you sometimes see with online tools that convert images to strings.

Hit send in Postman and :

Great! So my image upload worked and the picture of my cat was echoed back to me! At this point I was actually kinda surprised that it could be that easy.

Something that became very evident while doing this though, was that the payload size was much larger than the original image. In my case, the image itself is 109KB, but the Base64 version was 149KB. So about 136% of the original image. Having a quick search around, it seems expected that a Base64 version of a file will be about 33% bigger than the original. When it comes to larger files, it's less about sending 33% more across the wire, and more about reading the file into memory, converting it into a huge string, and then writing that out… It could cause a few issues. But for a few basic images, I'm comfortable with a 33% increase.

I will also note that there were a few code snippets around for using BSON or Protobuf to do the same thing, which may actually cut down the payload size substantially. The mentality would be the same, a JSON payload with a "stringify'd" file.

Cleaning Up Our Code Using JSON Converters

One thing that I didn’t like in our POC was that we are using a string that almost certainly will be converted to a byte array every single time. The great thing about using a JSON library such as JSON.net in C#, is that how the client sees the model and how our backend code sees the model doesn’t necessarily have to be the exact same. So let’s see if we can turn that string into a byte array on an API POST automagically.

First we need to create a “Custom JSON Converter” class. That code looks like :
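The converter code is missing from this copy; a sketch using JSON.net (class name is an assumption) that does what the next paragraph describes would be:

```csharp
public class Base64FileJsonConverter : JsonConverter
{
    // We only ever need to read Base64 payloads, never write them (yet).
    public override bool CanWrite => false;

    public override bool CanConvert(Type objectType) => objectType == typeof(string);

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        return Convert.FromBase64String((string)reader.Value);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}
```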

Fairly simple, all we are doing is taking a value and converting it from a string into a byte array. Also note that we are only worried about reading JSON payloads here, we don’t care about writing as we never write out our image as Base64 (yet).

Next, we head back to our model and apply the custom JSON converter.
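The updated model isn't shown here; with the converter from above applied it would look something like:

```csharp
public class ImageUploadModel
{
    public string Description { get; set; }

    [JsonConverter(typeof(Base64FileJsonConverter))]
    public byte[] ImageData { get; set; } // now a byte array by the time model binding is done
}
```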

Note we also change the “type” of our ImageData field to a byte array rather than a string. So even though our postman test will still send a string, by the time it actually gets to us, it will be a byte array.

We will also need to modify our Controller code too :
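A sketch of the simplified controller (same assumptions as before):

```csharp
[HttpPost("upload")]
public IActionResult Upload([FromBody] ImageUploadModel model)
{
    // ImageData is already a byte array thanks to the JSON converter.
    return File(model.ImageData, "image/jpeg");
}
```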

So it becomes even simpler. We no longer need to bother handling the Base64 encoded string anymore, the JSON converter will handle it for us.

And that’s it! Sending the exact same payload will still work and we have one less piece of plumbing to do if we decide to add more endpoints to accept file uploads. Now you are probably thinking “Yeah but if I add in a new endpoint with a model, I still need to remember to add the JsonConverter attribute”, which is true. But at the same time, it means if in the future you decide to swap to BSON instead of Base64, you aren’t going to have to go to a tonne of places and work out how you are handling the incoming strings, it’s all in one handy place.

For the past few years, every time I've started a new project there has been one sure fire class that I will copy and paste in on the first day. That has been my "TestingContext". It's sort of this one class unit testing helper that I can't do without. Today, I'm going to go into a bit of detail about what it is, and why I think it's so damn awesome.

First off, let’s start about what I think beginners get wrong about unit testing, and what veterans begin to hate about unit testing.

The Problem

The number one mistake I see junior developers making when writing unit tests is that they go "outside the scope" of the class they are testing. That is, if they are testing one particular class, let's say ServiceA, and that has a method that calls ServiceB, your test actually should never ever enter ServiceB (there are always exceptions, but they're very rare). You are testing the logic for ServiceA, so why should it go and actually run code and logic for ServiceB? Furthermore, if your tests are written for ServiceA, and ServiceB's logic is changed, will that affect any of your tests? The answer should be no, but it often isn't the case. So my first goal was :

Any testing helper should limit the testing scope to a single class only.

A common annoyance with unit tests is that when a constructor changes, all unit tests are probably going to go bust even if the new parameter has nothing to do with that particular unit test. I've seen people argue that if a constructor argument changes, then all unit tests should have to change, otherwise the service itself is obviously doing too many things. I really disagree with this. Unless you are writing pretty close to one method per class, there are always going to be times where a new service or repository is injected into a constructor that doesn't really change anything about the existing code. If anything, sticking to SOLID principles, the class is open for extension but not modification. So the next goal was :

Changes to a constructor of a class should not require editing all tests for that class.

Next, when writing tests, you should try and limit the amount of boilerplate code in the test class itself. It’s pretty common to see a whole heap of Mock instantiations clogging up a test setup class. So much so it becomes hard to see exactly what is boilerplate and going through the motions, and what is important setup code that needs to be there. On top of that, as the class changes, I’ll often find old test classes where half of the private class level variables are assigned, but are actually never used in tests as the class was modified. So finally :

Boilerplate code within the test class should be kept to a minimum.

Building The Solution

I could explain the hell out of why I did what I did and the iterations I went through to get here, but let’s just see the code.

First, you’re gonna need to install the following Nuget package:
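The package name is missing from this copy; given the AutoFixture link and auto-mocking description below, it would be the AutoMoq glue package:

```
Install-Package AutoFixture.AutoMoq
```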

This actually does most of the work for us. It's an auto mocking package that means you don't have to create mocks for every single constructor variable regardless of whether it's used or not. You can read more about the specific package here : https://github.com/AutoFixture/AutoFixture. On its own it gets us pretty close to solving our problem set, but it doesn't get us all the way there. For that we need just a tiny bit of a wrapper.

And that wrapper code :
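The wrapper itself isn't included in this copy; the following is a sketch of what such a wrapper might look like, built on AutoFixture and Moq. The member names (Setup, GetMockFor, InjectClassFor, ClassUnderTest) are taken from how the tests below use it, but the exact implementation is an assumption.

```csharp
using AutoFixture;
using AutoFixture.AutoMoq;
using Moq;

public abstract class TestingContext<TClassUnderTest> where TClassUnderTest : class
{
    private IFixture _fixture;

    // Call from your test framework's setup hook so every test starts with a clean fixture.
    protected void Setup()
    {
        _fixture = new Fixture().Customize(new AutoMoqCustomization());
    }

    // Freezes the mock, so asking for the same type again returns the same instance.
    protected Mock<TDependency> GetMockFor<TDependency>() where TDependency : class
    {
        return _fixture.Freeze<Mock<TDependency>>();
    }

    // For the rare case where a concrete class has to be supplied instead of a mock.
    protected void InjectClassFor<TDependency>(TDependency instance) where TDependency : class
    {
        _fixture.Inject(instance);
    }

    // Builds the class under test, auto-mocking any constructor argument we didn't set up.
    protected TClassUnderTest ClassUnderTest => _fixture.Create<TClassUnderTest>();
}
```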

Essentially it’s an abstract class that your test class can inherit from, that provides a simple way to abstract away everything about mocking. It means our tests require minimal boilerplate code, and rarely has to change based on class extension. But let’s take an actual look how this thing goes.

Testing Context In Action

To show you how the testing context works, we’ll create a quick couple of test classes.

First we just have a repository that returns names, then we have a service that has a couple of methods on it that interact with the repository, or in some cases a utility class.
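Those classes aren't reproduced in this copy; an illustrative reconstruction matching the description (all names are assumptions, apart from GetNamesExceptJohn which is referenced below) could be:

```csharp
using System.Collections.Generic;
using System.Linq;

public interface ITestRepository
{
    List<string> GetNames();
}

public class UtilityService
{
    public virtual string Uppercase(string input) => input?.ToUpper();
}

public class TestService
{
    private readonly ITestRepository _testRepository;
    private readonly UtilityService _utilityService;

    public TestService(ITestRepository testRepository, UtilityService utilityService)
    {
        _testRepository = testRepository;
        _utilityService = utilityService;
    }

    // Only touches the repository; the UtilityService is irrelevant here.
    public List<string> GetNamesExceptJohn()
    {
        return _testRepository.GetNames().Where(x => x != "John").ToList();
    }

    // Uses the concrete UtilityService.
    public List<string> GetUppercaseNames()
    {
        return _testRepository.GetNames().Select(x => _utilityService.Uppercase(x)).ToList();
    }
}
```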

Now two things here, the first being that the TestService takes in an ITestRepository (Interface) and UtilityService (class), so this could get a bit gnarly under normal circumstances because you can’t mock a concrete class. And second, the first method in the service, “GetNamesExceptJohn” doesn’t actually use this UtilityService at all. So I don’t want to have to mess about injecting in the class when it’s not going to be used at all. I would normally say you should always try and inject an interface, but in some cases if you are using a third party library that isn’t possible. So it’s more here as an example of how to get around that problem.

Now onto the tests. Our first test looks like so :
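The test isn't included in this copy; a sketch using NUnit and the context above (test names and asserted values are illustrative) would be:

```csharp
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

[TestFixture]
public class TestServiceTests : TestingContext<TestService>
{
    [SetUp]
    public void TestSetup()
    {
        base.Setup(); // fresh fixture and mocks for every test
    }

    [Test]
    public void GetNamesExceptJohn_FiltersOutJohn()
    {
        GetMockFor<ITestRepository>()
            .Setup(x => x.GetNames())
            .Returns(new List<string> { "John", "Jane" });

        var result = ClassUnderTest.GetNamesExceptJohn();

        Assert.AreEqual("Jane", result.Single());
    }
}
```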

The first thing you’ll notice that we inherit from our TestingContext, and pass in exactly what class we are going to be testing. This means that it feels intuitive that the only thing we are going to be writing tests for is this single class. While it’s not impossible to run methods from other classes in here, it sort of acts as blinders to keep you focused on one single thing.

Our test setup calls the base.Setup() method which just preps up our testing context. More so, it clears out all the data from previous tests so everything is stand alone.

And finally, our actual test. We simply ask the context to get a mock for a particular interface. In the background it’s either going to return one that we created earlier (More on that shortly), or it will return a brand new one for us to setup. Then we run “ClassUnderTest” with the method we are looking to test, and that’s it! In the background it takes any mocks we have requested, and creates an instance of our class for us. We don’t have to run any constructors at all! How easy is that.

Let’s take a look at another test :
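Again the code is missing here; a sketch of the second test (living in the same fixture as above, with illustrative names) could be:

```csharp
[Test]
public void GetUppercaseNames_UppercasesEveryName()
{
    GetMockFor<ITestRepository>()
        .Setup(x => x.GetNames())
        .Returns(new List<string> { "Jane" });

    // Inject a concrete UtilityService because it has no interface to mock.
    InjectClassFor(new UtilityService());

    var result = ClassUnderTest.GetUppercaseNames();

    Assert.AreEqual("JANE", result.Single());
}
```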

In this test, we are doing everything pretty similar, but instead are injecting in an actual class. Again, this is not a good way to do things, you should try and be mocking as much as possible, but there are two reasons that you may have to inject a concrete class.

  1. You want to write a fake for this class instead of mocking it.
  2. You are using a library that doesn’t have an interface and so you are forced to inject the concrete class.

I think for that second one, this is more a stop gap measure. I've dabbled with taking out the ability to inject in classes, but there has always been at least one test per project that just needed that little bit of extensibility, so I left it in.

The Niceties

There is a couple of really nice things about the context that I haven’t really pointed out too much yet, so I’ll just highlight them here.

Re-using Mocks

Under the hood, the context keeps a list of mocks you’ve already generated. This means you can reuse them without having to have private variables fly all over the place. You might have had code that looked a bit like this in the past :
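The "before" snippet isn't included in this copy; the familiar pattern it refers to looks something like this (illustrative):

```csharp
private Mock<ITestRepository> _testRepositoryMock;

[SetUp]
public void Setup()
{
    _testRepositoryMock = new Mock<ITestRepository>();
    _testRepositoryMock.Setup(x => x.GetNames()).Returns(new List<string> { "John", "Jane" });
}

[Test]
public void GetNamesExceptJohn_CallsTheRepository()
{
    var service = new TestService(_testRepositoryMock.Object, new UtilityService());

    service.GetNamesExceptJohn();

    _testRepositoryMock.Verify(x => x.GetNames(), Times.Once);
}
```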

You can now rewrite like :
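A sketch of the same test rewritten against the context, showing the mock being requested again rather than held in a private variable:

```csharp
[SetUp]
public void TestSetup()
{
    base.Setup();
    GetMockFor<ITestRepository>().Setup(x => x.GetNames()).Returns(new List<string> { "John", "Jane" });
}

[Test]
public void GetNamesExceptJohn_CallsTheRepository()
{
    ClassUnderTest.GetNamesExceptJohn();

    // Requesting the same mock again returns the instance created in TestSetup.
    GetMockFor<ITestRepository>().Verify(x => x.GetNames(), Times.Once);
}
```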

This really comes into its own when your setup method typically contained a huge list of mocks that you set up at the start, then set a class level variable to be re-used in a method. Now you don't have to do that at all. If you get a mock in a setup method, you can request that mock again in the actual test method.

Constructor Changes Don’t Affect Tests

You might have seen on the first test we wrote above, even though the constructor required 2 arguments, we only bothered mocking the one thing we cared about for the method under test. Everything else we can let the fixture handle.

How often do you see things like this in your tests?
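The snippet isn't shown in this copy; the pattern being complained about looks roughly like this, using a hypothetical service with a wide constructor:

```csharp
// Only the repository matters to this test, so everything else just gets null'ed out.
var service = new TestService(_testRepositoryMock.Object, null, null, null);
```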

And then someone comes along and adds a new constructor argument, and you just throw a null onto the end again? It’s a huge pain point of mine and makes tests almost unreadable at times.

Test Framework Agnostic

While in the above tests I used NUnit to write my tests, the context itself doesn’t require any particular testing framework. It can work with NUnit, MSTest and XUnit.

Revisiting Our Problems

Let’s go full circle, and revisit the problems that I found with Unit Testing.

Any testing helper should limit the testing scope to a single class only.

I think we covered this pretty well! Because we pass the class we are looking to test into our testing context, it basically abstracts away being able to call other classes.

Changes to a constructor of a class should not require editing all tests for that class.

We definitely have this. Changes to the constructor don’t affect our test, and we don’t have to setup mocks for things we are uninterested in.

Boilerplate code within the test class should be kept to a minimum.

This is definitely here. No more 50 line setup methods just setting up mocks in case we need them later. We only set up what we need, and the re-usability of mocks means that we don't even need a bag full of private variables to hold our mock instances.

What’s Your Thoughts?

Drop a comment below with your thoughts on the testing context!

At Microsoft Build today, it was announced that a Windows Desktop “pack” or “addon” would be released for .NET Core. It’s important to note that this is a sort of bolt on to .NET Core and actually won’t be part of .NET Core itself. It’s also important to note that this will not make Desktop Applications cross platform. It’s intended that the desktop apps built on top of .NET Core are still Windows only as they have always been (This is usually due to the various drawing libraries of the operating systems).

So you may ask yourself what’s the point? Well..

  • .NET Core has made huge performance improvements for everyday structs and classes within the framework. For example Dictionaries, Enums and Boxing operations are all now much faster on .NET Core 2.1
  • .NET Core comes with its own CLI and tooling improvements that you may prefer over the bloated .NET Framework style. For example a much cleaner .csproj experience.
  • Easy to test different .NET Core runtimes on a single machine due to how .NET Core allows multiple runtimes on a single machine.
  • You can bundle .NET Core with your desktop application so the target machine doesn’t require a runtime already. You can bundle .NET Framework with desktop applications, but it basically just does a quick install beforehand.

I think most of all it's going to be the speed of .NET Core releases. At this point .NET Core is creating releases at a breakneck speed, while the next minor release of the .NET Framework (4.7.2 -> 4.8) is expected to ship in about 12 months. That's a very slow release schedule compared to Core. While Core doesn't yet have too many features that the .NET Framework lacks, it will likely start drifting apart in feature parity before too long. That's a slightly taboo subject at times, and it's actually come up before when Microsoft wanted to discontinue support for running ASP.net Core applications on the full framework. Microsoft did cave to pressure that time around, but it's simply undeniable that Core is moving at a faster pace than the full Framework right now.

You can read the official announcement on the MSDN Blog here : https://blogs.msdn.microsoft.com/dotnet/2018/05/07/net-core-3-and-support-for-windows-desktop-applications/

I want to start off this post by saying if you are starting a new .NET Core project and you are looking to use a ServiceLocator. Don’t. There are numerous posts out there discussing how using a ServiceLocator is an “anti-pattern” and what not, and frankly I find anything that uses a ServiceLocator a right pain in the ass to test. Realistically in a brand new .NET Core project, you have dependency injection out of the box without the need to use any sort of static ServiceLocator. It’s simply not needed.

But, if you are trying to port across some existing code that already uses a ServiceLocator, it may not be as easy to wave a magic wand across it all and make everything work within the confines of .NET Core’s dependency injection model. And for that, we will have to work out a new way to “shim” the ServiceLocator in.

An important thing to note is that the "existing" code I refer to in this post is the "ServiceLocator" class inside the "Microsoft.Practices" library, which is itself part of the "Enterprise Library". It's a little confusing because this library was then dragged along with DI frameworks like Unity back in the day, so it's hard to pinpoint exactly which ServiceLocator you are using. The easiest way to tell is whether you are calling something that looks like this :
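The snippet isn't included in this copy; the tell-tale call looks something like this (IMyService is a placeholder):

```csharp
var myService = ServiceLocator.Current.GetInstance<IMyService>();
```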

If the answer is yes, then you are 99% likely using the Microsoft.Practices ServiceLocator. If you are using a different service locator but it’s still a static class, you can probably still follow along but change the method signature to your needs.

Creating Our Service Locator Shim

The first thing we are going to do is create a class that simply matches our existing ServiceLocator structure and method signatures. We want to create it so it’s essentially a drop in for our existing ServiceLocator so all the method names and properties should match up perfectly. The class looks like :
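The class itself isn't reproduced in this copy; a sketch matching the description below (the method set mirrors the old locator, but the exact members are assumptions) would be:

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

public class ServiceLocator
{
    private static IServiceProvider _serviceProvider;

    // Called once at startup with the built .NET Core ServiceProvider.
    public static void SetLocatorProvider(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
    }

    // Mirrors the old static "Current" property that hands back a locator instance.
    public static ServiceLocator Current => new ServiceLocator();

    public object GetInstance(Type serviceType)
    {
        return _serviceProvider.GetService(serviceType);
    }

    public TService GetInstance<TService>()
    {
        return _serviceProvider.GetService<TService>();
    }
}
```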

It can be a bit confusing because we are mixing in static methods with instance ones. But let’s walk through it.

On the static end, we only have one method, SetLocatorProvider , which allows us to pass in a ServiceProvider  instance that will be used for all service location requests. ServiceProvider is the built in DI that comes with .NET Core (we'll take a look at how we hook it up in a second). We also have a static property called Current  that simply creates an actual instance of ServiceLocator, providing us with access to the "instance" methods.

Once we have an instance of the ServiceLocator class, we then gain access to the GetInstance  methods, which perfectly match the existing ones of the old ServiceLocator class. Awesome!

Wiring It Up To .NET Core Service Provider

The next part is easy! In our ConfigureServices method of our startup.cs. We need to set the LocatorProvider. It looks like so :
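The snippet isn't shown in this copy; in ConfigureServices it amounts to building the provider and handing it to the shim:

```csharp
public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();

    // Hand the built provider to our ServiceLocator shim.
    ServiceLocator.SetLocatorProvider(services.BuildServiceProvider());
}
```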

So all we are doing is passing in an instance of our ServiceProvider and this will be used to fetch any instances that are required going forward.

This is actually all we need to do. If you have existing code that utilizes the ServiceLocator, barring a change to any “Using” statements to be swapped over, you should actually be all ready to go!

Testing It Out

Let’s give things a quick test to make sure it’s all working as intended.

I’m going to create a simple test class with a matching interface.
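The class isn't included in this copy; an illustrative pair (names are assumptions) would be:

```csharp
public interface ITestService
{
    string HelloWorld();
}

public class TestService : ITestService
{
    public string HelloWorld() => "Hello World!";
}
```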

I need to wire this up in my ConfigureServices method of startup.cs
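```csharp
services.AddTransient<ITestService, TestService>();
```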

Then I’m just going to test everything on a simple API endpoint in my .NET Core web app.
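The endpoint isn't reproduced here; a sketch resolving the service through the shim (route and names are illustrative) would be:

```csharp
[HttpGet("locatortest")]
public IActionResult LocatorTest()
{
    var testService = ServiceLocator.Current.GetInstance<ITestService>();
    return Ok(testService.HelloWorld());
}
```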

Give it a run and…

All up and running!