Performance Of String Concatenation In C#

In C#, there are six ways to concatenate strings:

  • Using the + (plus) sign (Including +=)
  • String.Concat
  • String.Join
  • StringBuilder
  • String.Format
  • Using String Interpolation (e.g. $"My string {variable}").
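
To make the list concrete, here is a minimal sketch that builds the same two-word string all six ways. The class and method names (SixWays, All) are made up for this example and don't come from the benchmarks further down.

```csharp
using System.Text;

public static class SixWays
{
    // Builds "<first> <second>" six different ways and returns all six results.
    public static string[] All(string first, string second)
    {
        string plus = first + " " + second;
        string concat = string.Concat(first, " ", second);
        string join = string.Join(" ", first, second);
        string builder = new StringBuilder().Append(first).Append(' ').Append(second).ToString();
        string format = string.Format("{0} {1}", first, second);
        string interpolation = $"{first} {second}";

        return new[] { plus, concat, join, builder, format, interpolation };
    }
}
```

Calling SixWays.All("Hello", "World") should hand back six copies of the same text, which is the point: the options differ in how they get there, not in what they produce.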

I was recently asked about performance considerations when joining two strings together. I think everyone knows by now that using + to join up large strings is (supposedly) a no-no. But it got me thinking: what actually are the performance implications? If you have two strings you want to concatenate, is it actually worth spinning up an instance of StringBuilder?

I wanted to do some quick benchmarking, but by the end of the post I ended up digging into the source code to at least begin answering “why” things perform differently.

“Your Methodology Is Wrong!”

I don’t think I’ve ever written a benchmarking post without someone jumping on Twitter, Reddit, or some other social media and pointing out how wrong I am. The thing with benchmarking, and especially with C#, is that there is so much “compiler magic” going on. Things get optimized out, or the compiler knows you are dumb and tries to help you out in a way you never expect.

If I’ve made a misstep somewhere, please drop a comment (Hell, plug your soundcloud while you’re at it). I always come back and add in comments where people think I’ve gone wrong and redo tests where needed. Sharing is caring after all!

My Setup

So as always, your mileage may vary when running these benchmarks yourself (but please do!). I am using an AMD Ryzen CPU with the .NET Core SDK as my runtime. Full details here :

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen 7 2700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Initial Benchmarking

For my benchmark, I’m going to try and do “single line joins”. What I mean by “single line joins” is that I have, say, 5 variables that I want to join into one long string, with a single space between them. I’m not doing this inside a loop and I have all 5 variables on hand. For this, I’m using BenchmarkDotNet. My benchmark looks like this:

using System.Text;
using BenchmarkDotNet.Attributes;

public class SingleLineJoin
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";

    [Benchmark]
    public string Interpolation()
    {
        return $"{string1} {string2} {string3} {string4} {string5}";
    }

    [Benchmark]
    public string PlusOperator()
    {
        return string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
    }

    [Benchmark]
    public string StringConcatenate()
    {
        return string.Concat(string1, " ", string2, " ", string3, " ", string4, " ", string5);
    }

    [Benchmark]
    public string StringJoin()
    {
        return string.Join(" ", string1, string2, string3, string4, string5);
    }

    [Benchmark]
    public string StringFormat()
    {
        return string.Format("{0} {1} {2} {3} {4}", string1, string2, string3, string4, string5);
    }

    [Benchmark]
    public string StringBuilderAppend()
    {
        StringBuilder builder = new StringBuilder();
        builder.Append(string1);
        builder.Append(" ");
        builder.Append(string2);
        builder.Append(" ");
        builder.Append(string3);
        builder.Append(" ");
        builder.Append(string4);
        builder.Append(" ");
        builder.Append(string5);
        return builder.ToString();
    }
}

I’d also note that StringBuilder has methods like builder.AppendJoin, which is a hybrid: it appends to the StringBuilder object but builds what it appends the way string.Join would. I’ve skipped these because if all you were going to do is call AppendJoin, you would just use string.Join anyway.
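
For reference, here is a small sketch of what AppendJoin looks like next to string.Join. The wrapper class and method names are hypothetical; StringBuilder.AppendJoin itself is available from .NET Core 2.0 onwards.

```csharp
using System.Text;

public static class AppendJoinExample
{
    // AppendJoin earns its keep when the joined text is only part of a larger string.
    public static string WithAppendJoin(params string[] parts)
    {
        var builder = new StringBuilder();
        builder.Append("[");
        builder.AppendJoin(" ", parts);
        builder.Append("]");
        return builder.ToString();
    }

    // If the joined text is the whole result, string.Join does the same job directly.
    public static string WithStringJoin(params string[] parts)
    {
        return string.Join(" ", parts);
    }
}
```

So AppendJoin is mostly a convenience for when you already have a builder on the go, which is why it isn't benchmarked separately here.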

And the results are here :

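|              Method |      Mean |    Error |   StdDev |
|-------------------- |----------:|---------:|---------:|
|       Interpolation |  98.58 ns | 1.310 ns | 1.225 ns |
|        PlusOperator |  98.35 ns | 0.729 ns | 0.646 ns |
|   StringConcatenate |  94.65 ns | 0.929 ns | 0.869 ns |
|          StringJoin |  78.52 ns | 0.846 ns | 0.750 ns |
|        StringFormat | 233.67 ns | 3.262 ns | 2.892 ns |
| StringBuilderAppend |  51.13 ns | 0.237 ns | 0.210 ns |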

Here’s the interesting thing for me. From what I can see, Interpolation, PlusOperator and Concat are roughly the same. String.Join is fast(er), with StringBuilder being the clear leader. String.Format is the slowest by a mile. What’s going on here? We are going to have to do some digging into what goes on under the hood.

Digging Deeper

String.Format

Why is String.Format so slow? Well, as it turns out, String.Format also uses StringBuilder behind the scenes, but it falls down to a method called “AppendFormatHelper” : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs#L1322

Now this somewhat makes sense because, you have to remember, string.Format can do things like :

String.Format("Price : {0:C2}", 14.00M); // Returns "Price : $14.00" (formats as currency; the symbol depends on the current culture)

So it has to do far more work in trying to format the string taking into account things like formatting a currency correctly etc. Even checking for these format types takes that little bit of extra time.
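
To give a feel for how much a single format item can carry, here is a hedged sketch with a few variations. The class and method names are invented for illustration, and the cultures are passed explicitly so the output doesn't depend on the machine's regional settings.

```csharp
using System.Globalization;

public static class FormatExamples
{
    // Assumption: en-US culture data is available on the machine running this.
    private static readonly CultureInfo EnUs = CultureInfo.GetCultureInfo("en-US");

    // {0:C2} = currency with two decimal places, using the given culture's symbol.
    public static string Currency(decimal price) =>
        string.Format(EnUs, "Price : {0:C2}", price);

    // {0,8:F2} = fixed-point with two decimals, right-aligned in a field 8 characters wide.
    public static string PaddedNumber(double value) =>
        string.Format(CultureInfo.InvariantCulture, "{0,8:F2}|", value);

    // No format specifiers at all, but the format string still has to be parsed at run time.
    public static string Plain(string a, string b) =>
        string.Format("{0} {1}", a, b);
}
```

Even the Plain case has to scan the format string looking for braces, alignment and format specifiers before it can append anything, which fits with the extra time seen in the benchmark.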

String.Join

String.Join is an interesting one because the code behind the scenes doesn’t make much sense to me. If you pass in an IEnumerable or a params list of objects, then it simply uses a StringBuilder and doesn’t do much else : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L161

But if you pass in params of string, it uses a char array and does some pretty low level stuff : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L204
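
Which code path you land on is decided by overload resolution at compile time, so it is easy to switch paths without noticing. A small sketch (the class and method names are hypothetical; the overloads named in the comments are the real string.Join overloads):

```csharp
using System.Collections.Generic;

public static class JoinOverloads
{
    // Individual strings bind to string.Join(string, params string[]).
    public static string FromParams(string a, string b, string c) =>
        string.Join(" ", a, b, c);

    // A List<string> is not a string[], so this binds to
    // string.Join(string, IEnumerable<string>).
    public static string FromList(List<string> items) =>
        string.Join(" ", items);

    // Mixed argument types bind to string.Join(string, params object[]),
    // which calls ToString() on each element.
    public static string FromObjects(string a, int b) =>
        string.Join(" ", a, b);
}
```

All three give the text you would expect; only the method the compiler picked differs.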

So immediately I think… Is there a difference? Well with this benchmark :

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class StringJoinComparison
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";

    public List<string> stringList;

    [GlobalSetup]
    public void Setup()
    {
        stringList = new List<string> { string1, string2, string3, string4, string5 };
    }


    [Benchmark]
    public string StringJoin()
    {
        return string.Join(" ", string1, string2, string3, string4, string5);
    }


    [Benchmark]
    public string StringJoinList()
    {
        return string.Join(" ", stringList);
    }
}

And the results :

|         Method |      Mean |    Error |   StdDev |
|--------------- |----------:|---------:|---------:|
|     StringJoin |  80.32 ns | 0.730 ns | 0.683 ns |
| StringJoinList | 141.16 ns | 1.109 ns | 1.038 ns |

Big difference. In fact, the list version is much, much slower. Every now and again when I write benchmarks here, the original creator shows up and explains either A. why I’m doing it wrong, or B. why it has to be this way, even with a performance hit. I would love to know what’s going on here because this one has almost a 2x difference depending on the input. Obviously there is different code behind the scenes, but it’s like a minefield here. I don’t think anyone would have suspected this.

String.Concat

Concat is very similar to Join. For example if we pass in an IEnumerable, it uses a StringBuilder : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3145

But if we pass in a params list of string, it instead falls down to the method ConcatArray : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3292

You may start noticing that a lot of methods have a call to “FastAllocateString”. Inferring from the usage, and not from any special knowledge that I have, it would appear that this allocates memory for the full size of the string, which is then “filled” later on. For example, given a list of strings, you already know ahead of time how large the final string will be, so you can pre-allocate that memory and then simply copy the characters in afterwards.
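
FastAllocateString is internal, so you can’t call it yourself. The closest public analogue I’m aware of is string.Create (available since .NET Core 2.1), which also allocates the string at its final size and lets you fill it in. The sketch below uses it to illustrate the “work out the length, allocate once, then fill” idea; it is not the actual code inside Concat or Join, the class and method names are made up, and it assumes none of the parts are null.

```csharp
using System;

public static class PreallocatedJoin
{
    // Joins the parts with single spaces using exactly one string allocation.
    // Assumption for this sketch: no element of parts is null.
    public static string JoinWithSpaces(string[] parts)
    {
        if (parts.Length == 0) return string.Empty;

        // Pass 1: work out the final length (all characters plus one space per gap).
        int total = parts.Length - 1;
        foreach (string part in parts) total += part.Length;

        // Pass 2: allocate once at the final size, then copy each piece into place.
        return string.Create(total, parts, (span, items) =>
        {
            int position = 0;
            for (int i = 0; i < items.Length; i++)
            {
                if (i > 0) span[position++] = ' ';
                items[i].AsSpan().CopyTo(span.Slice(position));
                position += items[i].Length;
            }
        });
    }
}
```

The trade-off is that you have to walk the input twice, once to measure and once to copy, but you never allocate an intermediate string or buffer.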

Plus Operator

This one confused me a bit. I’m pretty sure that from the moment I started programming in C#, I was told not to concat strings using the plus operator. But here it wasn’t so bad… Unfortunately, I tried to find the source code like I’ve done above, but to no avail. So I had to go on instinct to try and diagnose the issue. I think I found it almost immediately.

My hunch was that doing the operator in one big line was optimized out. So I wrote a small benchmark to test this theory :

using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class OperatorTest
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";


    [Benchmark]
    public string PlusOperatorWithResult()
    {
        var result = string1 + " ";
        result += string2 + " ";
        result += string3 + " ";
        result += string4 + " ";
        result += string5 + " ";
        return result;
    }


    [Benchmark]
    public string PlusOperator()
    {
        var result = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
        return result;
    }
}

If I’m being honest, I think there could still be some optimizer shenanigans going on here. But the idea is that with each string concat being on its own line, in theory it should have to create a new string each time. And the results :

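|                 Method |      Mean |    Error |   StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------- |----------:|---------:|---------:|-------:|------:|------:|----------:|
| PlusOperatorWithResult | 106.52 ns | 0.560 ns | 0.497 ns | 0.0459 |     - |     - |     192 B |
|           PlusOperator |  95.10 ns | 1.818 ns | 1.701 ns | 0.0324 |     - |     - |     136 B |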

So, a little bit of a slow down which is expected, but maybe not as much as I was expecting. Obviously over time, with larger strings and more joins, this could become more problematic which I think is what people try and point out when they scream “use StringBuilder for everything!”.

Also notice that I added the MemoryDiagnoser to this benchmark to show that yes, more memory is allocated when you are mucking around with the += operator, as each step has to create a brand new string in memory.
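
As pointed out in the comments on this post, the compiler turns the + operator into calls to String.Concat. Here is a rough sketch of what that lowering looks like when written out by hand. The class and method names are hypothetical, and the exact overload the compiler picks can vary between compiler and runtime versions, so treat this as an approximation rather than a decompilation.

```csharp
public static class PlusLowering
{
    // What you write.
    public static string Written(string a, string b, string c) =>
        a + " " + b + " " + c;

    // Roughly what the compiler emits for the method above: one Concat call
    // with all five pieces (more than four pieces go through an array overload).
    public static string Lowered(string a, string b, string c) =>
        string.Concat(new[] { a, " ", b, " ", c });

    // With += spread over several statements, each statement becomes its own
    // Concat call, and each call allocates a new intermediate string.
    public static string StepByStep(string a, string b, string c)
    {
        string result = string.Concat(a, " ");
        result = string.Concat(result, b, " ");
        result = string.Concat(result, c);
        return result;
    }
}
```

That would explain both results above: the single-line version is one Concat call with one allocation, while the multi-line version allocates a string per statement.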

StringBuilder

StringBuilder’s source code can be found here : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs

It’s relatively simple in that it holds a char array until the final moment and then joins everything up right at the end. The reason it’s so fast is that you are not allocating a string until you really need one.

What surprised me most about StringBuilder is that even at 5 appends (or I guess more if we count the spaces), it’s much, much faster than just using the + operator. I thought there would be some sort of break-even point in the tens, maybe hundreds, of concats before the overhead of a StringBuilder paid off. But it seems “worth it” even if you are only doing a few concats (but more on that below).
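
The case people usually have in mind when they recommend StringBuilder is building a string in a loop. A minimal sketch of the two approaches follows; the names are hypothetical, it assumes no null words, and pre-sizing the builder via the capacity constructor is an optional extra rather than something the benchmarks above did.

```csharp
using System.Text;

public static class LoopBuilding
{
    // Every pass through the loop allocates a new, longer string
    // and copies everything built so far into it.
    public static string WithPlusEquals(string[] words)
    {
        string result = "";
        foreach (string word in words)
        {
            result += word + " ";
        }
        return result;
    }

    // The builder appends into its own buffer; the final string is only
    // allocated when ToString() is called.
    public static string WithStringBuilder(string[] words)
    {
        // Optional: size the buffer up front so it never has to grow.
        int capacity = 0;
        foreach (string word in words) capacity += word.Length + 1;

        var builder = new StringBuilder(capacity);
        foreach (string word in words)
        {
            builder.Append(word).Append(' ');
        }
        return builder.ToString();
    }
}
```

Both methods return the same text; the difference is in how many intermediate strings get created along the way.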

Interpolation

I actually can’t find the source code for what does interpolation in C#. In fact, I wasn’t even sure what to search for. Because it performs similarly to the plus operator, I assume that it’s maybe just sugar around the same piece of code that joins strings deep down.
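
One way to peek at what the compiler does with an interpolated string, without hunting for source code, is to assign it to a FormattableString instead of a string. That type exposes the composite format string and the arguments the compiler extracted. The sketch below uses made-up class and method names. Note that it only shows the FormattableString path; when the target is a plain string, the compiler is free to emit something else (the first comment on this post says it is substituted with String.Concat), and the details depend on the compiler version.

```csharp
using System;
using System.Globalization;

public static class InterpolationPeek
{
    // The composite format string the compiler generated from the interpolation.
    public static string FormatOf(string a, string b)
    {
        FormattableString formattable = $"{a} {b}";
        return formattable.Format;
    }

    // The number of holes the compiler turned into arguments.
    public static int HoleCount(string a, string b)
    {
        FormattableString formattable = $"{a} {b}";
        return formattable.ArgumentCount;
    }

    // Rendering to text, here with an explicit culture.
    public static string Render(string a, string b)
    {
        FormattableString formattable = $"{a} {b}";
        return formattable.ToString(CultureInfo.InvariantCulture);
    }
}
```

For two holes separated by a space, FormatOf returns "{0} {1}", which is the same shape of format string you would hand to string.Format yourself.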

Summary

So where does that leave us? Well, we came in with the knowledge that StringBuilder was best practice for building strings, and we left with that still intact. We also found that even when building smaller strings, StringBuilder outperforms the rest. Does that mean you should immediately rewrite your code to use StringBuilder everywhere? I personally doubt it. Based on readability alone, a few nanoseconds might not be worth it for you.

We also walk away with the knowledge that string.Format performs extremely poorly even when we aren’t doing any special formatting. In fact, we could use literally any other method to join strings together and have it be faster.

And finally, we also found that string concatenation is still a strange beast, with things like string.Concat and string.Join doing very different things depending on what you pass in. 9 times out of 10 you probably don’t even think there is a difference between passing in an IEnumerable vs params, but there is.

17 thoughts on “Performance Of String Concatenation In C#”

  1. In the beginning you’ve asked a question about joining 2 strings and it’s unfortunate that in your benchmark you’ve actually joined 9 strings, especially since Concat is optimized for joining less than 5 strings. As for the compiler magic, plus concatenation as well as interpolation are substituted with String.Concat

  2. what about converting a string into an object or a class object?
    how can we do that?
    i have a big delimited string and i want to convert into class object, how is this possible?

  3. We had a discussion today with coworkers: Is StringBuilder actually better than concatenating raw strings?

    public class SingleLineJoin
    {
    	[Benchmark]
    	public string Fixe() => "a" + " " + "b" + " " + "c" + " " + "d" + " " + "e";
    
    	[Benchmark]
    	public string StringBuilderAppend()
    	{
    		StringBuilder builder = new StringBuilder();
    		builder.Append("a"); builder.Append(" "); builder.Append("b"); builder.Append(" "); builder.Append("c"); builder.Append(" "); builder.Append("d"); builder.Append(" "); builder.Append("e");
    		return builder.ToString();
    	}
    }
    
    
    |              Method |       Mean |     Error |    StdDev |
    |-------------------- |-----------:|----------:|----------:|
    |                Fixe |  0.0000 ns | 0.0000 ns | 0.0000 ns |
    | StringBuilderAppend | 56.0231 ns | 1.1893 ns | 3.3544 ns |
    

    Raw strings won over sb.Append() by ko. My teammates were fooled by that hype around StringBuilder being faster for anything.. except our precise use case.
    //Your Benchmark tutorial is awesome, thanks so much for that!

    • Your results are valid but it’s a very very specific use case. Because your entire string are string literals (No variables, properties, method etc), they will actually be optimized out to simply be :

      public string Fixe() => "a b c d e";

      But if you changed even one of those to a variable, they would be vastly different results.

    • In the case of adding together several literal strings, concatenation is done at compile time. Since all the work is done at compile time, Fixe has nothing left to do at runtime. That is why Fixe takes no time to run. It is functionally equivalent to the following.
      public string Fixe() => "a b c d e";
      In this case the “+” operator is simply fooling the reader into thinking that there is a concatenation operation at runtime, when it is all done at compile time.

  4. very well written. just a reminder: string interpolation also supports formatting and padding:

    $"i have {5,-10:C1} dollars." => "i have $5.0       dollars." (left-aligned, so padded on the right, due to the minus sign)
    $"i have {5,10:C1} dollars." => "i have       $5.0 dollars." (right-aligned, so padded on the left)

  5. So,
    I have a bit of code that writes a letter. In doing so, it decides if each word has room on a given line for that word.
    This method is called many times.
    Is there speed to be gained by writing that method in msil?

  6. I think these string concatenations happen at compile time. For them to happen at run time, I think we need to pass the strings as parameters to the method.
    public string PlusOperator(string string1, string string2, string string3, string string4, string string5)
    {
        return string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
    }

  7. I love benchmarks! Too much speculation and parroting out there. Thanks for posting! 🙂

    In my case I just needed to append one newline byte to an existing UTF8 string and get the result as a byte array (to send over a socket). You inspired me to bench it.

    Sorry I have no idea how to format anything in these comments, so I posted a Gist: https://gist.github.com/mpaperno/58ac6c8bc81ddaa62273f80e15cf3b90

    TL;DR: The simple `+` wins out even over an array copy (I guess as close to “safe” `memcpy` as we can get with C#?). With `StringBuilder` “way” behind both methods.

    And ironically (?) `StringBuilder().AppendLine(string)` is just a hair slower than `StringBuilder(string).Append('\n')`;

    I guess I have something about benching string builders… lol. If you check my other Gists you’ll find something similar for the Qt C++ library.

    Anyway, thanks again!
    -Max

    • Ha, well I got bit by a similar optimization as Gweltaz because I had marked my input string as a const. So the `+` operator was just optimized out, even though it was happening inside a method (smart little compiler). I updated the Gist.

      I redid the tests (even added a `volatile` to the input string for good measure… lol), and now my `BlockCopy()` version is faster and leaner than all the other methods (added memory allocation profile also). Even threw in an “unsafe” call to `memcpy` (in msvcrt.dll) and it actually performed basically the same as BlockCopy() (when error and deviation are taken into account).

      Lastly, Concat() and `+` perform identically, as expected. This is what tipped me off to the above issue, because the `+` version was running 2x faster than Concat(). Meanwhile everything I could find pointed to `+` just calling Concat(). StringBuilder() is still a hog here in every respect.

      BTW, found this excellent SO answer regarding the `+` operator source code: https://stackoverflow.com/questions/58924625/where-is-the-string-operator-source-code

      Cheers,
      -Max
