In C#, there are a grand total of 6 ways to concatenate strings. Those are:
- Using the + (plus) sign (Including +=)
- String.Concat
- String.Join
- StringBuilder
- String.Format
- Using String Interpolation (e.g. $"My string {variable}").
I recently got asked about performance considerations when joining two strings together. I think everyone knows by now that using the + to join up large strings is (supposedly) a no-no. But it got me thinking: what actually are the performance implications? If you have two strings you want to concatenate, is it actually worth spinning up an instance of StringBuilder?
I wanted to do some quick benchmarking, but by the end of the post I ended up digging into the source code to at least begin answering “why” things perform differently.
“Your Methodology Is Wrong!”
I don’t think I’ve ever written a benchmarking post without someone jumping on Twitter, Reddit, or some social media and pointing out how wrong I am. The thing is with benchmarking, and especially C#, there is so much “compiler magic” that happens. Things get optimized out or the compiler knows you are dumb and tries to help you out in a way you never expect.
If I’ve made a misstep somewhere, please drop a comment (Hell, plug your soundcloud while you’re at it). I always come back and add in comments where people think I’ve gone wrong and redo tests where needed. Sharing is caring after all!
My Setup
So as always, your mileage may vary when running these benchmarks yourself (but please do!). I am using an AMD Ryzen CPU with the .NET Core SDK as my runtime. Full details here :
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen 7 2700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Initial Benchmarking
For my benchmark, I’m going to try and do “single line joins”. What I mean by “single line joins” is that I have say 5 variables that I want to all join up in a long string, with a single space between them. I’m not doing this inside a loop and I have all 5 variables on hand. For this, I’m using BenchmarkDotNet. My benchmark looks like so :
    using System.Text;
    using BenchmarkDotNet.Attributes;

    public class SingleLineJoin
    {
        // Instance fields (not constants), so the compiler cannot fold the joins away.
        public string string1 = "a";
        public string string2 = "b";
        public string string3 = "c";
        public string string4 = "d";
        public string string5 = "e";

        [Benchmark]
        public string Interpolation()
        {
            return $"{string1} {string2} {string3} {string4} {string5}";
        }

        [Benchmark]
        public string PlusOperator()
        {
            return string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
        }

        [Benchmark]
        public string StringConcatenate()
        {
            return string.Concat(string1, " ", string2, " ", string3, " ", string4, " ", string5);
        }

        [Benchmark]
        public string StringJoin()
        {
            return string.Join(" ", string1, string2, string3, string4, string5);
        }

        [Benchmark]
        public string StringFormat()
        {
            return string.Format("{0} {1} {2} {3} {4}", string1, string2, string3, string4, string5);
        }

        [Benchmark]
        public string StringBuilderAppend()
        {
            StringBuilder builder = new StringBuilder();
            builder.Append(string1);
            builder.Append(" ");
            builder.Append(string2);
            builder.Append(" ");
            builder.Append(string3);
            builder.Append(" ");
            builder.Append(string4);
            builder.Append(" ");
            builder.Append(string5);
            return builder.ToString();
        }
    }
I’d also note that StringBuilder has methods like builder.AppendJoin, which is something of a hybrid: it appends to the StringBuilder object, but joins the values with a separator the way string.Join does. I’ve skipped these because if you were simply going to use the AppendJoin method, you would instead just use string.Join anyway.
And the results are here :
Method | Mean | Error | StdDev |
---|---|---|---|
Interpolation | 98.58 ns | 1.310 ns | 1.225 ns |
PlusOperator | 98.35 ns | 0.729 ns | 0.646 ns |
StringConcatenate | 94.65 ns | 0.929 ns | 0.869 ns |
StringJoin | 78.52 ns | 0.846 ns | 0.750 ns |
StringFormat | 233.67 ns | 3.262 ns | 2.892 ns |
StringBuilderAppend | 51.13 ns | 0.237 ns | 0.210 ns |
Here’s the interesting thing for me. From what I can see, Interpolation, PlusOperator and Concat are roughly the same. String.Join is fast(er), with StringBuilder being the clear leader. String.Format is slowest by a mile. What’s going on here? We are going to have to do some digging into what goes on under the hood.
Digging Deeper
String.Format
Why is String.Format so slow? Well as it turns out, String.Format also uses StringBuilder behind the scenes, but it falls down to a method called “AppendFormatHelper” https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs#L1322. Now this somewhat makes sense because you have to remember, string.Format can do things like :
String.Format("Price : {0:C2}", 14.00M); // "Price : $14.00" under an en-US culture (formats as currency).
So it has to do far more work in trying to format the string taking into account things like formatting a currency correctly etc. Even checking for these format types takes that little bit of extra time.
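To make that extra work a little more concrete, here is a minimal sketch of composite formatting with format specifiers. The class and method names are made up for illustration, and I've pinned the culture to CultureInfo.InvariantCulture so the output doesn't depend on the machine's regional settings. Each placeholder can carry its own format string, which is the kind of thing string.Format has to look for on every call, even when none is present.

```csharp
using System.Globalization;

public static class FormatSpecifierSketch
{
    // Hypothetical helper: every {index:format} item is parsed at run time,
    // then each argument is formatted according to its specifier.
    public static string Describe(decimal price, int quantity)
    {
        // F2 = fixed-point with two decimals, D3 = integer padded to three digits.
        return string.Format(CultureInfo.InvariantCulture, "{0:F2} x {1:D3}", price, quantity);
    }
}
```

Calling FormatSpecifierSketch.Describe(14.00M, 7) gives "14.00 x 007".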
String.Join
String.Join is an interesting one because the code behind the scenes in my mind doesn’t make too much sense. If you pass in an IEnumerable or a params list of objects, then it simply uses a StringBuilder and doesn’t do much else : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L161
But if you pass in params of string, it uses a char array and does some pretty low level stuff : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L204
So immediately I think… Is there a difference? Well with this benchmark :
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;

    public class StringJoinComparison
    {
        public string string1 = "a";
        public string string2 = "b";
        public string string3 = "c";
        public string string4 = "d";
        public string string5 = "e";

        public List<string> stringList;

        [GlobalSetup]
        public void Setup()
        {
            // Built once, outside the measured code.
            stringList = new List<string> { string1, string2, string3, string4, string5 };
        }

        [Benchmark]
        public string StringJoin()
        {
            // Binds to the params string[] overload.
            return string.Join(" ", string1, string2, string3, string4, string5);
        }

        [Benchmark]
        public string StringJoinList()
        {
            // Binds to the IEnumerable<string> overload.
            return string.Join(" ", stringList);
        }
    }
And the results :
Method | Mean | Error | StdDev |
---|---|---|---|
StringJoin | 80.32 ns | 0.730 ns | 0.683 ns |
StringJoinList | 141.16 ns | 1.109 ns | 1.038 ns |
Big difference. In fact it’s much, much slower. Every now and again when I write benchmarks here, the original creator shows up and explains either A. Why I’m doing it wrong. Or B. Why it has to be this way, even with a performance hit. I would love to know what’s going on here because this one has almost a 2x difference depending on the input. Obviously there is different code behind the scenes, but it’s like a minefield here. I don’t think anyone would have suspected this.
String.Concat
Concat is very similar to Join. For example if we pass in an IEnumerable, it uses a StringBuilder : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3145
But if we pass in a params list of string, it instead falls down to the method ConcatArray : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3292
You may start noticing that a lot of methods have a call to “FastAllocateString”. Inferring from the usage and not from special knowledge that I have, it would appear that this allocates memory for the full size of the string, which is then “filled” up later on. For example, given a list of strings, you already know ahead of time how large the final string will be, so you can pre-allocate that memory and then simply fill in the characters later.
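FastAllocateString itself is internal, so you can't call it, but the same "work out the length, allocate once, then fill" idea can be sketched with the public string.Create API (available from .NET Core 2.1). This is only an illustration of the pattern, not the framework's actual implementation. The helper name JoinWithSpaces is made up, and it assumes none of the parts are null.

```csharp
using System;

public static class PreallocateSketch
{
    // Hypothetical helper: joins parts with single spaces, allocating the result exactly once.
    // Assumes no element of parts is null.
    public static string JoinWithSpaces(params string[] parts)
    {
        if (parts.Length == 0) return string.Empty;

        // Pass 1: the final length is known up front (all parts plus one space between each pair).
        int length = parts.Length - 1;
        foreach (string part in parts) length += part.Length;

        // Pass 2: allocate a string of that length and fill its characters in place.
        return string.Create(length, parts, (span, state) =>
        {
            int pos = 0;
            for (int i = 0; i < state.Length; i++)
            {
                if (i > 0) span[pos++] = ' ';
                state[i].AsSpan().CopyTo(span.Slice(pos));
                pos += state[i].Length;
            }
        });
    }
}
```

With the five benchmark values, PreallocateSketch.JoinWithSpaces("a", "b", "c", "d", "e") returns "a b c d e" without creating any intermediate strings.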
Plus Operator
This one confused me a bit. I’m pretty sure from the moment I started programming in C#, I got told not to concat strings using the plus operator. But here it wasn’t so bad… Unfortunately I tried to find the source code like I’ve done above, but to no avail. So I had to go on instinct to try and diagnose the issue. Almost immediately, I think I found it.
My hunch was that doing the operator in one big line was optimized out. So I wrote a small benchmark to test this theory :
    using BenchmarkDotNet.Attributes;

    [MemoryDiagnoser]
    public class OperatorTest
    {
        public string string1 = "a";
        public string string2 = "b";
        public string string3 = "c";
        public string string4 = "d";
        public string string5 = "e";

        [Benchmark]
        public string PlusOperatorWithResult()
        {
            // One statement per join, so each += has to produce a new string.
            var result = string1 + " ";
            result += string2 + " ";
            result += string3 + " ";
            result += string4 + " ";
            result += string5 + " ";
            return result;
        }

        [Benchmark]
        public string PlusOperator()
        {
            // The whole join in a single expression.
            var result = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
            return result;
        }
    }
If I’m being honest, I think there could still be some optimizer shenanigans going on here. But the idea is that with each string concat being on its own line, in theory it should have to create a new string each time. And the results :
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
PlusOperatorWithResult | 106.52 ns | 0.560 ns | 0.497 ns | 0.0459 | – | – | 192 B |
PlusOperator | 95.10 ns | 1.818 ns | 1.701 ns | 0.0324 | – | – | 136 B |
So, a little bit of a slow down which is expected, but maybe not as much as I was expecting. Obviously over time, with larger strings and more joins, this could become more problematic which I think is what people try and point out when they scream “use StringBuilder for everything!”.
Also notice that I added the MemoryDiagnoser to this benchmark to show that yes, more memory is allocated when you are mucking around with the += operator, as it has to create a brand new string in memory each time.
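Several readers point out in the comments further down that the + operator is substituted with String.Concat by the compiler. Taking that as given, here is a rough sketch of what the two shapes amount to. The exact calls the compiler emits may differ, and the class and method names are invented for illustration.

```csharp
public static class ConcatLoweringSketch
{
    // Roughly what a single expression such as a + " " + b + " " + c turns into:
    // one Concat call that builds the final string in one go.
    public static string SingleExpression(string a, string b, string c)
    {
        return string.Concat(new[] { a, " ", b, " ", c });
    }

    // Roughly what a chain of += statements amounts to:
    // every step allocates a new, slightly longer string and discards the old one.
    public static string StepByStep(string a, string b, string c)
    {
        string result = a;
        result = string.Concat(result, " ", b); // intermediate string "a b"
        result = string.Concat(result, " ", c); // final string "a b c"
        return result;
    }
}
```

Both methods return the same text. The second one simply creates throwaway strings along the way, which would be consistent with the extra allocation the MemoryDiagnoser reports.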
StringBuilder
StringBuilder’s source code can be found here : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs. It’s relatively simple in that it holds a char array until the final moment and then joins everything up right at the end. The reason it’s so fast is that you are not allocating strings until you really need them.
What surprised me most about the use of StringBuilder is that even at 5 appends (Or I guess more if we count the spaces), it’s much much faster than just using the + operator. I thought there would be some sort of breakpoint in the tens, maybe hundreds of concats that the overhead of a StringBuilder becomes more viable. But it seems “worth it”, even if you are only doing a few concats (but more on that below).
Interpolation
I actually can’t find the source code for what does Interpolation in C#. In fact I wasn’t even sure what to search for. Because it performs similarly to the plus operator, I assume that it’s maybe just sugar around the same piece of code that joins strings deep in the code.
Summary
So where does that leave us? Well, we came in with the knowledge that StringBuilder was best practice for building strings, and we left with that still intact. We also found that even when building smaller strings, StringBuilder outperforms the rest. Does that mean you should immediately rewrite your code to use StringBuilder everywhere? I personally doubt it. Based on readability alone, a few nanoseconds might not be worth it for you.
We also walk away with the knowledge that string.Format performs extremely poorly even when we aren’t doing any special formatting. In fact we could use literally any other method to join strings together and have it be faster.
And finally, we also found that string concatenation is still a strange beast, with things like string.Concat and string.Join doing very different things depending on what you pass in. 9 times out of 10 you probably don’t even think there is a difference between passing in an IEnumerable vs params, but there is.
In the beginning you’ve asked a question about joining 2 strings and it’s unfortunate that in your benchmark you’ve actually joined 9 strings, especially since Concat is optimized for joining less than 5 strings. As for the compiler magic, plus concatenation as well as interpolation are substituted with String.Concat
Hey Artur,
What makes you think it’s optimized specifically for less than 5 strings? Is there any documentation on this? The code is just a simple loop that concats the strings, so there isn’t anything in particular that points to it being faster or slower either side of 5 strings.
As you can see in the docs https://docs.microsoft.com/en-us/dotnet/api/system.string.concat?view=netcore-3.1 there are overloads for Concat with 1, 2, 3, 4 string/object arguments in addition to the params one. When you Concat less than 5 strings those overloads will be used which internally are much simpler, it’s just pure allocate and fill.
Also, I’ve checked it with BenchmarkDotNet and the results with joining 3 strings were in favor of Concat 🙂
Interesting. Here’s the source code : https://github.com/microsoft/referencesource/blob/17b97365645da62cf8a49444d979f94a59bbb155/mscorlib/system/string.cs#L3251
So up to 4 strings it manually just adds them.
Again, there are so many little magical “breakpoints” where passing one more param, or passing a slightly different type gives vastly different results.
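As a small sketch of that breakpoint (the class and method names are invented for illustration, and the overload set is the one in the docs and source linked above), the number of arguments alone decides which Concat overload the compiler binds to:

```csharp
public static class ConcatOverloadSketch
{
    // Three arguments: binds to the fixed overload string.Concat(string, string, string).
    public static string Three(string a, string b, string c)
    {
        return string.Concat(a, b, c);
    }

    // Five arguments: there is no five-string overload, so the arguments are
    // packed into a string[] and string.Concat(params string[]) is called instead.
    public static string Five(string a, string b, string c, string d, string e)
    {
        return string.Concat(a, b, c, d, e);
    }
}
```

The call sites look identical apart from the argument count, which is why it's so easy to cross from one code path to the other without noticing.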
Great article, and very useful. Thanks !!!
what about converting a string into an object or a class object?
how can we do that?
i have a big delimited string and i want to convert into class object, how is this possible?
You are probably talking about CSV parsing : https://dotnetcoretutorials.com/2018/08/04/csv-parsing-in-net-core/
You are absolutely correct. I didn’t know exactly the words to describe it but you helped me a lot with that. Thank you so much!!!!
We had a discussion today with coworkers: Is StringBuilder actually better than concatenating raw strings?
Raw strings won over sb.Append() by ko. My teammates were fooled by that hype around StringBuilder being faster for anything.. except our precise use case.
//Your Benchmark tutorial is awesome, thanks so much for that!
Your results are valid but it’s a very, very specific use case. Because your entire string is made up of string literals (no variables, properties, methods etc), the concatenation will actually be optimized out to simply be :
public string Fixe() => "a b c d e";
But if you changed even one of those to a variable, they would be vastly different results.
In the case of adding together several literal strings, concatenation is done at compile time. Since the compiler already did all the work, Fixe has nothing left to do at runtime. That is why Fixe takes no time to run. It is functionally equivalent to the following.
public string Fixe() => "a b c d e";
In this case the “+” operator is simply fooling the reader into thinking that there is a concatenation operation at runtime, when it is all done at compile time.
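One way to see this for yourself without a benchmark: a const must be a compile-time constant, so a concatenation of literals is accepted there, while a concatenation involving variables is not. A minimal sketch (the names are made up for illustration):

```csharp
public static class ConstantFoldingSketch
{
    // Compiles only because the whole expression is a compile-time constant:
    // the compiler evaluates it, and no concatenation is left for run time.
    public const string Folded = "a" + " " + "b" + " " + "c";

    // The operands are only known at run time, so this concatenation
    // really does execute on every call.
    public static string AtRuntime(string a, string b, string c)
    {
        return a + " " + b + " " + c;
    }
}
```

Both produce "a b c", but only AtRuntime does any work when called, so a benchmark of the literal-only version measures nothing useful.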
very well written. just a reminder: string interpolation also supports formatting and padding:
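$"i have {5,-10:C1} dollars." => "i have $5.0       dollars." (left-aligned in a 10 character field due to the minus sign, so the padding goes on the right; assumes an en-US culture)
$"i have {5,10:C1} dollars." => "i have       $5.0 dollars." (right-aligned, so the padding goes on the left)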
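Since blog comments tend to swallow repeated spaces, here is a small sketch that makes the padding visible by wrapping the value in brackets. I've swapped the currency format for F1 and used FormattableString.Invariant so the result doesn't depend on the machine's culture. The class and method names are made up for illustration.

```csharp
using System;

public static class AlignmentSketch
{
    // Negative alignment: the value is left-aligned, padding is added on the right.
    public static string LeftAligned(int value)
    {
        return FormattableString.Invariant($"[{value,-10:F1}]");
    }

    // Positive alignment: the value is right-aligned, padding is added on the left.
    public static string RightAligned(int value)
    {
        return FormattableString.Invariant($"[{value,10:F1}]");
    }
}
```

AlignmentSketch.LeftAligned(5) gives "[5.0       ]" and AlignmentSketch.RightAligned(5) gives "[       5.0]", each padded out to 10 characters between the brackets.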
So,
I have a bit of code that writes a letter. In doing so, it decides if each word has room on a given line for that word.
This method is called many times.
Is there speed to be gained by writing that method in msil?
From the official docs https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated#compilation-of-interpolated-strings
it appears that interpolation in SingleLineJoin.Interpolation() is transformed to String.Concat() as it doesn’t contain formatting
I think these string concatenations happen at compile time. To make them happen at run time, I think we need to pass the strings as parameters to the method:
public string PlusOperator(string string1, string string2, string string3, string string4, string string5)
{
return string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
}
I love benchmarks! Too much speculation and parroting out there. Thanks for posting! 🙂
In my case I just needed to append one newline byte to an existing UTF8 string and get the result as a byte array (to send over a socket). You inspired me to bench it.
Sorry I have no idea how to format anything in these comments, so I posted a Gist: https://gist.github.com/mpaperno/58ac6c8bc81ddaa62273f80e15cf3b90
TL;DR: The simple `+` wins out even over an array copy (I guess as close to “safe” `memcpy` as we can get with C#?). With `StringBuilder` “way” behind both methods.
And ironically (?) `StringBuilder().AppendLine(string)` is just a hair slower than `StringBuilder(string).Append('\n')`;
I guess I have something about benching string builders… lol. If you check my other Gists you’ll find something similar for the Qt C++ library.
Anyway, thanks again!
-Max
Ha, well I got bit by a similar optimization as Gweltaz because I had marked my input string as a const. So the `+` operator was just optimized out, even though it was happening inside a method (smart little compiler). I updated the Gist.
I redid the tests (even added a `volatile` to the input string for good measure… lol), and now my `BlockCopy()` version is faster and leaner than all the other methods (added memory allocation profile also). Even threw in an “unsafe” call to `memcpy` (in msvcrt.dll) and it actually performed basically the same as BlockCopy() (when error and deviation are taken into account).
Lastly, Concat() and `+` perform identically, as expected. This is what tipped me off to the above issue, because the `+` version was running 2x faster than Concat(). Meanwhile everything I could find pointed to `+` just calling Concat(). StringBuilder() is still a hog here in every respect.
BTW, found this excellent SO answer regarding the `+` operator source code: https://stackoverflow.com/questions/58924625/where-is-the-string-operator-source-code
Cheers,
-Max