This week there was a great blog post about Bing.com running on .NET Core 2.1, and the performance gains that came with it. Most curious to me was that they singled out the improvements to string.Equals and string.IndexOf in .NET Core 2.1 as having the largest impact on performance.
Whichever way you slice it, HTML rendering and manipulation are string-heavy workloads. String comparisons and indexing operations are major components of that. Vectorization of these operations is the single biggest contributor to the performance improvement we’ve measured.
At first I thought they must have some very special use case that runs millions of string comparisons under the hood, so it wasn’t going to be much use to me. But then I started thinking about how many string comparisons must happen under the hood when building a web application. There is probably a whole lot more happening than we realize, and singling out string manipulation performance may not be as off as I first thought.
So let’s just take their word for it and say that doing stuff on the web is a string-heavy workload in .NET. How much of a performance gain can we actually expect to see in .NET Core 2.1 for these methods? We aren’t necessarily looking at the time it takes for these functions to complete, but rather the factor of improvement that .NET Core 2.1 has over .NET Full Framework.
String.Equals Performance Benchmarks
(Before reading too much into these results, see the next section entitled “String.Equals Performance Benchmarks Updated”. Some interesting stuff!)
Now we could write some huge loop and run it on each runtime one by one, or we could write a nice benchmark using BenchmarkDotNet and get it all in one go.
Our benchmark looks like :
    using System;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Configs;
    using BenchmarkDotNet.Jobs;
    using BenchmarkDotNet.Running;
    using BenchmarkDotNet.Toolchains.CsProj;

    // Runs every benchmark on both runtimes, with .NET Core 2.1 as the baseline.
    public class MultipleRuntimeConfig : ManualConfig
    {
        public MultipleRuntimeConfig()
        {
            Add(Job.Default.With(CsProjCoreToolchain.NetCoreApp21).WithBaseline(true));
            Add(Job.Default.With(CsProjClassicNetToolchain.Net472));
        }
    }

    [Config(typeof(MultipleRuntimeConfig))]
    public class StringEquals
    {
        private string String1 = "Hello World!";
        private string String2 = "Hello World!";

        [Benchmark]
        public bool IsEqual() => String1.Equals(String2);
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<StringEquals>();
            Console.ReadLine();
        }
    }

So a couple of things to point out. First, we are using two different toolchains: .NET Core 2.1 and .NET Full Framework 4.7.2. Each is the latest version of its runtime at the time of writing.
The benchmark itself is simple: we compare the string “Hello World!” to another string that says “Hello World!”. That’s it! Nothing too fancy.
Now typically with benchmarks on large pieces of code, I feel OK running them on my own machine. While this can give you skewed results, especially if you are trying to use your computer at the same time, for big chunks of code I’m usually just looking to see if there is any difference whatsoever, not the actual level of difference. Here, it’s a little different. We are going to be talking about differences down to the nanosecond, so we need to be far more careful.
So instead, I spun up a VM in Azure to run the benchmarks on. It’s a D2s_V3 machine, so 2 CPU cores and 8GB of RAM. It’s probably pretty typical of the standard web box that you might scale up to, before starting to scale out horizontally in a web farm.
Enough waffle, what do the results look like?
Method | Toolchain | Mean | Error | Scaled |
---|---|---|---|---|
IsEqual | .NET Core 2.1 | 0.9438 ns | 0.0686 ns | 1.00 |
IsEqual | CsProjnet472 | 1.9381 ns | 0.0844 ns | 2.06 |
And yes, a string compare in Full Framework took roughly twice as long to complete. I ran this multiple times just to make sure I wasn’t doing something stupid, because the results were that astounding.
In case someone doesn’t believe me, the exact tooling reported by BenchmarkDotNet was :
.NET Core 2.1.2 (CoreCLR 4.6.26628.05, CoreFX 4.6.26629.01), 64bit RyuJIT
.NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3062.0
Again, prove me wrong because I couldn’t believe the results myself. Pretty unbelievable.
String.Equals Performance Benchmarks Updated (2018-08-23)
I’m always nervous when I post benchmarks. There is so much that can go wrong, get optimized out, or have something minor completely skew the results. This time was no different. There were a couple of observations on the benchmark.
Compile time strings are interned so I think the string equal test is testing equality on the same string instance
and
You should test with longer string, that’s where the optimizations will kick in
Both good points (hat tip to Jeff Cyr). First I wanted to test the idea that when both variables point at the same string instance, I shouldn’t see much of a performance difference once I switch to separate instances, because in the original benchmark the two strings were actually the same object in memory. So let’s modify our benchmark a little to :

    public class StringEquals
    {
        // Building each string from a char array forces two separate instances.
        private string String1 = new string("Hello World!".ToCharArray());
        private string String2 = new string("Hello World!".ToCharArray());

        [Benchmark]
        public bool IsEqual() => String1.Equals(String2);
    }

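If you want to convince yourself of the interning point before trusting the benchmark, a quick check along these lines should do it. This is only a minimal sketch: the class and method names (`InterningDemo`, `LiteralsShareInstance`, `CopyIsDistinctInstance`) are made up for illustration and are not part of the benchmark project.

```csharp
using System;

public static class InterningDemo
{
    // Two identical compile-time literals in the same assembly refer to one interned instance.
    public static bool LiteralsShareInstance()
    {
        string a = "Hello World!";
        string b = "Hello World!";
        return object.ReferenceEquals(a, b);
    }

    // A string built from a char array is a new object, even though its contents match.
    public static bool CopyIsDistinctInstance()
    {
        string a = "Hello World!";
        string copy = new string(a.ToCharArray());
        return !object.ReferenceEquals(a, copy) && a.Equals(copy);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(CopyIsDistinctInstance());
    }
}
```

Both methods should return true, which is why the first benchmark was really measuring a same-instance comparison rather than a character-by-character one.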
So now it’s definitely a different instance. Running the benchmarks and what do you know :
Method | Toolchain | Mean | Error | Scaled |
---|---|---|---|---|
IsEqual | .NET Core 2.1 | 7.370 ns | 0.1855 ns | 1.00 |
IsEqual | CsProjnet472 | 7.152 ns | 0.1928 ns | 0.97 |
So point proven: when the instances are different and the strings are short, there is very little performance difference. In fact .NET Core is actually slower in my benchmark, but within the range of error.
So let’s scale up the test to prove the second point: for cases where the strings are much longer, we should see the performance benefits kick in. Our benchmark this time will look like :

    [Config(typeof(MultipleRuntimeConfig))]
    public class StringEquals
    {
        private string String1;
        private string String2;

        [GlobalSetup]
        public void StringEqualsSetup()
        {
            // 100 repetitions of the 12-character string.
            for (int i = 0; i < 100; i++)
            {
                String1 += "Hello World!";
            }

            // Copy into a separate instance so the comparison is not by reference.
            String2 = new string(String1.ToCharArray());
        }

        [Benchmark]
        public bool IsEqual() => String1.Equals(String2);
    }

So the strings we will compare are 1,200 characters long (100 repetitions of the 12-character “Hello World!”), and they are different instances. Running our benchmark we get :
Method | Toolchain | Mean | Error | StdDev | Scaled | ScaledSD |
---|---|---|---|---|---|---|
IsEqual | .NET Core 2.1 | 128.7 ns | 4.367 ns | 12.88 ns | 1.00 | 0.00 |
IsEqual | CsProjnet472 | 211.7 ns | 6.989 ns | 20.28 ns | 1.66 | 0.24 |
This is what we expected to see: on larger strings, there is a definite performance improvement in .NET Core.
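If you wanted to see where the gap starts to open up, one option would be to let BenchmarkDotNet run the same benchmark at several lengths using its `[Params]` attribute. This is only a sketch of how that might look: I haven’t run it, the class name `StringEqualsByLength` and the repeat counts are my own picks for illustration, and you would still attach the `MultipleRuntimeConfig` from earlier to compare runtimes.

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;

public class StringEqualsByLength
{
    // Number of times "Hello World!" is repeated; values chosen purely for illustration.
    [Params(1, 100, 1000)]
    public int Repeats;

    private string String1;
    private string String2;

    [GlobalSetup]
    public void Setup()
    {
        var builder = new StringBuilder();
        for (int i = 0; i < Repeats; i++)
        {
            builder.Append("Hello World!");
        }

        String1 = builder.ToString();
        // Separate instance with identical contents, so Equals has to compare characters.
        String2 = new string(String1.ToCharArray());
    }

    [Benchmark]
    public bool IsEqual() => String1.Equals(String2);
}
```

BenchmarkDotNet runs the setup once per parameter value, so each row in the results table would correspond to one string length.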
So what are the takeaways here?
- .NET Core appears to have done some work that optimizes equality tests when both strings are the same instance.
- For short strings, there isn’t any great performance benefit.
- For long strings, .NET Core has a substantial performance boost.
- I’m still nervous about posting benchmarks!
String.IndexOf Performance Benchmarks
Next up let’s take a look at IndexOf performance. This one is interesting because on a string you can call either IndexOf(string) or IndexOf(char). From the looks of the change (the original PR into the Core GitHub repo), the performance improvement should only affect IndexOf(char). But this actually gives us a good opportunity to make sure that we are benchmarking correctly. Let’s include a benchmark that does an IndexOf(string) too! We should expect to see very little difference between .NET Core and Full Framework on this, but it would be good to see it in the numbers.
The benchmarking code is :
    using System;
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Configs;
    using BenchmarkDotNet.Jobs;
    using BenchmarkDotNet.Running;
    using BenchmarkDotNet.Toolchains.CsProj;

    public class MultipleRuntimeConfig : ManualConfig
    {
        public MultipleRuntimeConfig()
        {
            Add(Job.Default.With(CsProjCoreToolchain.NetCoreApp21).WithBaseline(true));
            Add(Job.Default.With(CsProjClassicNetToolchain.Net472));
        }
    }

    [Config(typeof(MultipleRuntimeConfig))]
    public class IndexOf
    {
        // Each benchmark runs once per haystack returned here.
        public IEnumerable<string> hayStacks()
        {
            yield return haystackSmall;
            yield return haystackLarge;
        }

        private string haystackSmall = "Hello World!";
        private string haystackLarge;

        public IndexOf()
        {
            // 1000 repetitions of the 12-character string.
            for (int i = 0; i < 1000; i++)
            {
                haystackLarge += haystackSmall;
            }
        }

        [Benchmark]
        [ArgumentsSource(nameof(hayStacks))]
        public int IndexOfString(string haystack) => haystack.IndexOf("1");

        [Benchmark]
        [ArgumentsSource(nameof(hayStacks))]
        public int IndexOfChar(string haystack) => haystack.IndexOf('1');
    }

    class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<IndexOf>();
            Console.ReadLine();
        }
    }

You’ll notice that in this test case, we are passing in two different arguments for each benchmark. The first is with a string that is 12 characters long, and the second is with a string that is 12,000 characters long. This was mostly because of the comment on the original PR that stated :
for longer strings, where the match is towards the end or doesn’t match at all, the gains are substantial.
Because of this I also made sure that the IndexOf call doesn’t actually find a match at all, so we can see the maximum performance gain that this new code has in .NET Core 2.1.
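As a sanity check on that setup, something like the following confirms the haystack lengths and that neither overload finds the needle. It is a minimal sketch: `HaystackCheck` and `BuildHaystack` are hypothetical helper names, not part of the benchmark code.

```csharp
using System;
using System.Text;

public static class HaystackCheck
{
    // Repeats "Hello World!" the given number of times, mirroring the benchmark constructor.
    public static string BuildHaystack(int repeats)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < repeats; i++)
        {
            builder.Append("Hello World!");
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string small = BuildHaystack(1);
        string large = BuildHaystack(1000);

        Console.WriteLine(small.Length);        // 12
        Console.WriteLine(large.Length);        // 12000
        Console.WriteLine(large.IndexOf('1'));  // -1, the character never appears
        Console.WriteLine(large.IndexOf("1"));  // -1, same for the string overload
    }
}
```

Because the needle is never found, both overloads have to scan the entire haystack, which is the worst case for the search.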
And the results?
Method | Toolchain | haystack | Mean | Error | Scaled |
---|---|---|---|---|---|
IndexOfString | .NET Core 2.1 | Hello World! | 171.212 ns | 3.3849 ns | 1.00 |
IndexOfString | CsProjnet472 | Hello World! | 184.194 ns | 3.6937 ns | 1.08 |
IndexOfChar | .NET Core 2.1 | Hello World! | 7.962 ns | 0.4588 ns | 1.00 |
IndexOfChar | CsProjnet472 | Hello World! | 12.305 ns | 0.2841 ns | 1.59 |
IndexOfString | .NET Core 2.1 | Hello(…)orld! [12000] | 39,964.455 ns | 781.2495 ns | 1.00 |
IndexOfString | CsProjnet472 | Hello(…)orld! [12000] | 40,476.489 ns | 805.1209 ns | 1.01 |
IndexOfChar | .NET Core 2.1 | Hello(…)orld! [12000] | 765.894 ns | 15.2256 ns | 1.00 |
IndexOfChar | CsProjnet472 | Hello(…)orld! [12000] | 7,522.823 ns | 147.9425 ns | 9.83 |
There is a bit to take in here but here goes.
First, when the method is “IndexOfString”, we see minimal to no difference between the two runtimes. .NET Core is slightly faster, but this could be down to a whole host of factors not related to this specific method.
When we move to the IndexOfChar method, we see that when the string is small, we lop quite a bit off the average time. But if we move down to working on larger strings… wow… We are almost 10x faster in .NET Core than Full Framework. Pretty. Damn. Incredible.
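To get a feel for why a vectorized search pays off so much on long strings, here is a rough sketch of the general technique: compare a whole block of characters against the needle in one operation, and only look at individual characters when a block reports a hit. To be clear, this is my own illustration using System.Numerics and is not the code from the runtime or the PR; the name `VectorSearchSketch` is made up, and I am assuming the System.Memory types (`Span`, `MemoryMarshal`) are available, as they are in .NET Core 2.1.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class VectorSearchSketch
{
    public static int IndexOfChar(string haystack, char needle)
    {
        ReadOnlySpan<char> chars = haystack.AsSpan();

        // Reinterpret the characters as fixed-size blocks of 16-bit values.
        // Any characters that do not fill a whole block are handled by the tail loop.
        ReadOnlySpan<Vector<ushort>> blocks = MemoryMarshal.Cast<char, Vector<ushort>>(chars);
        var needles = new Vector<ushort>((ushort)needle);
        int width = Vector<ushort>.Count;

        for (int b = 0; b < blocks.Length; b++)
        {
            // One comparison covers 'width' characters at a time.
            if (Vector.EqualsAny(blocks[b], needles))
            {
                int start = b * width;
                for (int j = start; j < start + width; j++)
                {
                    if (chars[j] == needle) return j;
                }
            }
        }

        // Scalar scan over whatever is left after the last full block.
        for (int i = blocks.Length * width; i < chars.Length; i++)
        {
            if (chars[i] == needle) return i;
        }

        return -1;
    }
}
```

When there is no match, almost all of the work happens in the block comparison, which fits with the gain being largest on the 12,000 character haystack where nothing is found.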
Won’t This Stuff Make It Into .NET Framework?
Because much of this work relies on the new Span type that shipped alongside C# 7.2, it’s likely it will make its way through eventually. But what I typically see now is that the release cycle is so much faster with .NET Core than with Framework that these sorts of improvements land in the Core runtime first, and then sort of backfill their way into the Framework. I’m sure at some point a reader will come across this post, and in .NET Framework version 4.8.X there is no performance difference, but by that point, there will be some other everyday method that is blazingly fast in Core, but not Framework.