Performance Of String Concatenation In C#

In C#, there are a grand total of 6 ways to concatenate strings. Those are:

  • Using the + (plus) sign (Including +=)
  • String.Concat
  • String.Join
  • StringBuilder
  • String.Format
  • Using String Interpolation (e.g. $"My string {variable}").

I recently got asked about performance considerations when joining two strings together. I think everyone knows by now that using the + to join up large strings is (supposedly) a no-no. But it got me thinking: what actually are the performance implications? If you have two strings you want to concatenate, is it actually worth spinning up an instance of StringBuilder?

I wanted to do some quick benchmarking, but by the end of the post I ended up digging into the source code to at least begin answering “why” things perform differently.

“Your Methodology Is Wrong!”

I don’t think I’ve ever written a benchmarking post without someone jumping on Twitter, Reddit, or some social media and pointing out how wrong I am. The thing is with benchmarking, and especially C#, there is so much “compiler magic” that happens. Things get optimized out or the compiler knows you are dumb and tries to help you out in a way you never expect.

If I’ve made a misstep somewhere, please drop a comment (Hell, plug your soundcloud while you’re at it). I always come back and add in comments where people think I’ve gone wrong and redo tests where needed. Sharing is caring after all!

My Setup

So as always, your mileage may vary when running these benchmarks yourself (but please do!). I am using an AMD Ryzen CPU with the .NET Core SDK as my runtime. Full details here :

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
AMD Ryzen 7 2700X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.1.100
  [Host]     : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
  DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT

Initial Benchmarking

For my benchmark, I’m going to try and do “single line joins”. What I mean by “single line joins” is that I have say 5 variables that I want to all join up in a long string, with a single space between them. I’m not doing this inside a loop and I have all 5 variables on hand. For this, I’m using BenchmarkDotNet.  My benchmark looks like so :

using System.Text;
using BenchmarkDotNet.Attributes;

public class SingleLineJoin
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";

    [Benchmark]
    public string Interpolation()
    {
        return $"{string1} {string2} {string3} {string4} {string5}";
    }

    [Benchmark]
    public string PlusOperator()
    {
        return string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
    }

    [Benchmark]
    public string StringConcatenate()
    {
        return string.Concat(string1, " ", string2, " ", string3, " ", string4, " ", string5);
    }

    [Benchmark]
    public string StringJoin()
    {
        return string.Join(" ", string1, string2, string3, string4, string5);
    }

    [Benchmark]
    public string StringFormat()
    {
        return string.Format("{0} {1} {2} {3} {4}", string1, string2, string3, string4, string5);
    }

    [Benchmark]
    public string StringBuilderAppend()
    {
        StringBuilder builder = new StringBuilder();
        builder.Append(string1);
        builder.Append(" ");
        builder.Append(string2);
        builder.Append(" ");
        builder.Append(string3);
        builder.Append(" ");
        builder.Append(string4);
        builder.Append(" ");
        builder.Append(string5);
        return builder.ToString();
    }
}

I’d also note that StringBuilder has methods like builder.AppendJoin, which is a hybrid: it appends to the StringBuilder object but works like string.Join to build what gets appended. I’ve skipped these because if you were simply going to use the AppendJoin method, you would instead just use string.Join anyway.

And the results are here :

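|              Method |      Mean |    Error |   StdDev |
|-------------------- |----------:|---------:|---------:|
|       Interpolation |  98.58 ns | 1.310 ns | 1.225 ns |
|        PlusOperator |  98.35 ns | 0.729 ns | 0.646 ns |
|   StringConcatenate |  94.65 ns | 0.929 ns | 0.869 ns |
|          StringJoin |  78.52 ns | 0.846 ns | 0.750 ns |
|        StringFormat | 233.67 ns | 3.262 ns | 2.892 ns |
| StringBuilderAppend |  51.13 ns | 0.237 ns | 0.210 ns |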

Here’s the interesting thing for me. From what I can see, Interpolation, PlusOperator and Concat are roughly the same. String.Join is fast(er), with StringBuilder being the clear leader. String.Format is slowest by a mile. What’s going on here? We are going to have to do some digging into what goes on under the hood.

Digging Deeper

String.Format

Why is String.Format so slow? Well as it turns out, String.Format also uses StringBuilder behind the scenes, but it falls down to a method called “AppendFormatHelper” https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs#L1322. Now this somewhat makes sense because you have to remember, string.Format can do things like :

String.Format("Price : {0:C2}", 14.00M); // Returns "Price : $14.00" under an en-US culture (formats as currency).

So it has to do far more work in trying to format the string taking into account things like formatting a currency correctly etc. Even checking for these format types takes that little bit of extra time.
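To make that extra work a bit more concrete, here is a minimal sketch of the kind of format items string.Format has to be ready to parse, even when (like in my benchmark) none of them are actually used. The FormatDemo class and its method names are made up for illustration, and I’ve pinned everything to CultureInfo.InvariantCulture and used a fixed-point specifier rather than currency so the output doesn’t depend on the machine’s regional settings.

```csharp
using System;
using System.Globalization;

public static class FormatDemo
{
    // Plain placeholders: no alignment, no format specifier.
    // The format string still has to be scanned for them at runtime.
    public static string Plain(string a, string b) =>
        string.Format(CultureInfo.InvariantCulture, "{0} {1}", a, b);

    // A format specifier after the colon (F2 = fixed-point, 2 decimals).
    public static string FixedPoint(decimal value) =>
        string.Format(CultureInfo.InvariantCulture, "{0:F2}", value);

    // An alignment component after the comma (-6 = left-align in 6 chars).
    public static string Padded(string name) =>
        string.Format(CultureInfo.InvariantCulture, "[{0,-6}]", name);

    public static void Main()
    {
        Console.WriteLine(Plain("a", "b"));    // a b
        Console.WriteLine(FixedPoint(14.00M)); // 14.00
        Console.WriteLine(Padded("ab"));       // [ab    ]
    }
}
```

All three calls go through the same parsing path, which is my best guess for why the “plain” case can’t skip the cost.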

String.Join

String.Join is an interesting one because the code behind the scenes in my mind doesn’t make too much sense. If you pass in an IEnumerable or a params list of objects, then it simply uses a StringBuilder and doesn’t do much else : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L161

But if you pass in params of string, it uses a char array and does some pretty low level stuff : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L204

So immediately I think… Is there a difference? Well with this benchmark :

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class StringJoinComparison
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";

    public List<string> stringList;

    [GlobalSetup]
    public void Setup()
    {
        stringList = new List<string> { string1, string2, string3, string4, string5 };
    }


    [Benchmark]
    public string StringJoin()
    {
        return string.Join(" ", string1, string2, string3, string4, string5);
    }


    [Benchmark]
    public string StringJoinList()
    {
        return string.Join(" ", stringList);
    }
}

And the results :

|         Method |      Mean |    Error |   StdDev |
|--------------- |----------:|---------:|---------:|
|     StringJoin |  80.32 ns | 0.730 ns | 0.683 ns |
| StringJoinList | 141.16 ns | 1.109 ns | 1.038 ns |

Big difference. In fact, the list version is much, much slower. Every now and again when I write benchmarks here, the original creator shows up and explains either A. why I’m doing it wrong, or B. why it has to be this way, even with a performance hit. I would love to know what’s going on here, because this one has almost a 2x difference depending on the input. Obviously there is different code behind the scenes, but it’s like a minefield here. I don’t think anyone would have suspected this.
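Part of what makes this a minefield is that the two calls look almost identical at the call site. Here’s a small sketch of how the same-looking string.Join call binds to different overloads depending on the argument type. The JoinOverloads class and method names are mine, purely for illustration, and which internal code path each overload takes is based on my reading of the reference source linked above.

```csharp
using System;
using System.Collections.Generic;

public static class JoinOverloads
{
    // Loose string arguments bind to string.Join(string, params string[]).
    public static string FromParams(string a, string b, string c) =>
        string.Join(" ", a, b, c);

    // A List<string> is not a string[], so this binds to
    // string.Join(string, IEnumerable<string>) instead.
    public static string FromList(List<string> items) =>
        string.Join(" ", items);

    public static void Main()
    {
        var list = new List<string> { "a", "b", "c" };

        // Same output, different overloads behind the scenes.
        Console.WriteLine(FromParams("a", "b", "c")); // a b c
        Console.WriteLine(FromList(list));            // a b c
    }
}
```

The results are identical, so nothing at the call site hints that a different implementation is doing the work.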

String.Concat

Concat is very similar to Join. For example if we pass in an IEnumerable, it uses a StringBuilder : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3145

But if we pass in a params list of string, it instead falls down to the method ConcatArray : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/string.cs#L3292

You may start noticing that a lot of methods have a call to “FastAllocateString”. Inferring from the usage and not from any special knowledge that I have, it would appear that this allocates memory for the full size of the string, which is then “filled” up later on. For example, given a list of strings, you already know ahead of time how large the final string will be, so you can pre-allocate that memory and then simply fill in the characters later.
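FastAllocateString itself is internal, so we can’t call it, but the same “measure first, allocate once, fill afterwards” idea can be sketched with the public string.Create method (available from .NET Core 2.1). To be clear, this is my own illustration of the pattern and not the framework’s actual implementation. The PreallocatedJoin class and JoinWithSpaces method are hypothetical names, and the sketch assumes none of the input strings are null.

```csharp
using System;

public static class PreallocatedJoin
{
    // Assumes parts contains no null entries.
    public static string JoinWithSpaces(string[] parts)
    {
        if (parts.Length == 0) return string.Empty;

        // Pass 1: work out the final length before copying anything.
        int total = parts.Length - 1; // one separator between each pair
        foreach (string part in parts) total += part.Length;

        // Pass 2: allocate the string once, then fill in its characters.
        return string.Create(total, parts, (span, items) =>
        {
            int pos = 0;
            for (int i = 0; i < items.Length; i++)
            {
                if (i > 0) span[pos++] = ' ';
                items[i].AsSpan().CopyTo(span.Slice(pos));
                pos += items[i].Length;
            }
        });
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithSpaces(new[] { "a", "b", "c", "d", "e" })); // a b c d e
    }
}
```

The point is that there is exactly one string allocation no matter how many pieces go in, which only works because every piece is known up front.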

Plus Operator

This one confused me a bit. I’m pretty sure from the moment I started programming in C#, I got told not to concat strings using the plus operator. But here it wasn’t so bad… Unfortunately I tried to find the source code like I’ve done above, but to no avail. So I had to go on instinct to try and diagnose the issue. Immediately I think I found it.

My hunch was that doing the operator in one big line was optimized out. So I wrote a small benchmark to test this theory :

using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class OperatorTest
{
    public string string1 = "a";
    public string string2 = "b";
    public string string3 = "c";
    public string string4 = "d";
    public string string5 = "e";


    [Benchmark]
    public string PlusOperatorWithResult()
    {
        var result = string1 + " ";
        result += string2 + " ";
        result += string3 + " ";
        result += string4 + " ";
        result += string5 + " ";
        return result;
    }


    [Benchmark]
    public string PlusOperator()
    {
        var result = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
        return result;
    }
}

If I’m being honest, I think there could still be some optimizer shenanigans going on here. But the idea is that with each string concat being on its own line, in theory it should have to create a new string each time. And the results :

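|                 Method |      Mean |    Error |   StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------------------- |----------:|---------:|---------:|-------:|------:|------:|----------:|
| PlusOperatorWithResult | 106.52 ns | 0.560 ns | 0.497 ns | 0.0459 |     - |     - |     192 B |
|           PlusOperator |  95.10 ns | 1.818 ns | 1.701 ns | 0.0324 |     - |     - |     136 B |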

So, a little bit of a slow down which is expected, but maybe not as much as I was expecting. Obviously over time, with larger strings and more joins, this could become more problematic which I think is what people try and point out when they scream “use StringBuilder for everything!”.

Also notice that I added the MemoryDiagnoser to this benchmark to show that yes, more memory is allocated when you use the += operator, as it has to create a brand new string in memory each time.
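One of the comments below says the compiler substitutes plus concatenation with String.Concat. I haven’t verified that against the compiler source, but assuming it’s right, the two styles above would end up roughly equivalent to the sketch below. The PlusLowering class and its method names are made up for illustration, and the exact Concat overloads the compiler picks are an assumption on my part.

```csharp
using System;

public static class PlusLowering
{
    // a + " " + b + " " + c in one expression:
    // assumed to become a single Concat call over all the pieces.
    public static string SingleExpression(string a, string b, string c) =>
        string.Concat(a, " ", b, " ", c);

    // The += style: assumed to become one Concat call per statement,
    // each of which has to produce a brand new intermediate string.
    public static string Stepwise(string a, string b, string c)
    {
        string result = string.Concat(a, " ");  // result = a + " "
        result = string.Concat(result, b, " "); // result += b + " "
        result = string.Concat(result, c);      // result += c
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(SingleExpression("a", "b", "c")); // a b c
        Console.WriteLine(Stepwise("a", "b", "c"));         // a b c
    }
}
```

If that’s what is happening, it would explain both results: the one-liner is basically the same as my StringConcatenate benchmark, and the += version pays for the throwaway intermediate strings that show up in the Allocated column.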

StringBuilder

StringBuilder’s source code can be found here : https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/stringbuilder.cs. It’s relatively simple in that it holds a char array until the final moment and then joins everything up right at the end. The reason it’s so fast is that you are not allocating strings until you really need them.

What surprised me most about the use of StringBuilder is that even at 5 appends (Or I guess more if we count the spaces), it’s much much faster than just using the + operator. I thought there would be some sort of breakpoint in the tens, maybe hundreds of concats that the overhead of a StringBuilder becomes more viable. But it seems “worth it”, even if you are only doing a few concats (but more on that below).
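Since the builder is really just a buffer of chars that grows as needed, one thing you can do when you know the sizes up front is give it a starting capacity. Below is a minimal sketch using the StringBuilder(int capacity) constructor. The BuilderDemo class and JoinWithSpaces method are hypothetical names, the sketch assumes no null entries, and I have not benchmarked whether this is any faster than the default constructor for strings this small.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Assumes parts contains no null entries.
    public static string JoinWithSpaces(params string[] parts)
    {
        // Upper bound on the final length: every part plus one separator each.
        int capacity = parts.Length;
        foreach (string part in parts) capacity += part.Length;

        // Size the internal buffer once instead of letting it grow.
        var builder = new StringBuilder(capacity);
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) builder.Append(' ');
            builder.Append(parts[i]);
        }

        // The only string allocation happens here.
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithSpaces("a", "b", "c", "d", "e")); // a b c d e
    }
}
```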

Interpolation

I actually can’t find the source code for what handles Interpolation in C#. In fact, I wasn’t even sure what to search for. Because it performs so similarly to the plus operator, I assume that it’s maybe just sugar around the same piece of code that joins strings deep in the framework.
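Without the compiler source to point at, the best I can offer is a sketch of the observable behaviour. An interpolated string with only plain string holes gives the same result as string.Concat, and one with alignment or format components gives the same result as string.Format. Whether the compiler literally rewrites one into the other is an assumption on my part (though a commenter below says it does), and the InterpolationDemo class and method names are made up for illustration.

```csharp
using System;

public static class InterpolationDemo
{
    // Only plain string holes: behaves like a straight concatenation.
    public static string Interpolated(string a, string b) => $"{a} {b}";
    public static string Concatenated(string a, string b) => string.Concat(a, " ", b);

    // An alignment component (,4): behaves like the equivalent string.Format call.
    public static string InterpolatedPadded(int n) => $"[{n,4}]";
    public static string FormattedPadded(int n) => string.Format("[{0,4}]", n);

    public static void Main()
    {
        Console.WriteLine(Interpolated("a", "b") == Concatenated("a", "b"));  // True
        Console.WriteLine(InterpolatedPadded(7) == FormattedPadded(7));       // True
        Console.WriteLine(InterpolatedPadded(7));                             // [   7]
    }
}
```

That would line up with the benchmark numbers, where Interpolation sits right next to PlusOperator and StringConcatenate rather than down with StringFormat.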

Summary

So where does that leave us? Well, we came in with the knowledge that StringBuilder was best practice for building strings, and we left with that still intact. We also found that even when building smaller strings, StringBuilder outperforms the rest. Does that mean you should immediately rewrite your code to use StringBuilder everywhere? I personally doubt it. Based on readability alone, a few nanoseconds might not be worth it for you.

We also walk away with the knowledge that string.Format performs extremely poorly even when we aren’t doing any special formatting. In fact, we could use literally any other method to join strings together and have it be faster.

And finally, we also found that string concatenation is still a strange beast, with things like string.Concat and string.Join doing very different things depending on what you pass in. 9 times out of 10 you probably don’t even think there is a difference between passing in an IEnumerable vs params, but there is.

11 comments

  1. In the beginning you’ve asked a question about joining 2 strings and it’s unfortunate that in your benchmark you’ve actually joined 9 strings, especially since Concat is optimized for joining less than 5 strings. As for the compiler magic, plus concatenation as well as interpolation are substituted with String.Concat

    1. Hey Artur,

      What makes you think it’s optimized specifically for less than 5 strings? Is there any documentation on this? The code is just a simple loop that concats the strings so there isn’t anything in particular that points to be it being faster or slower either side of 5 strings.

  2. what about converting a string into an object or a class object?
    how can we do that?
    i have a big delimited string and i want to convert into class object, how is this possible?

      1. You are absolutely correct. I didnt knew exactly the words to describe it but you helped me a lot with that. Thank you so much!!!!

  3. We had a discussion today with coworkers: Is StringBuilder actually better than concatenating raw strings?

    public class SingleLineJoin
    {
    	[Benchmark]
    	public string Fixe() => "a" + " " + "b" + " " + "c" + " " + "d" + " " + "e";
    
    	[Benchmark]
    	public string StringBuilderAppend()
    	{
    		StringBuilder builder = new StringBuilder();
    		builder.Append("a"); builder.Append(" "); builder.Append("b"); builder.Append(" "); builder.Append("c"); builder.Append(" "); builder.Append("d"); builder.Append(" "); builder.Append("e");
    		return builder.ToString();
    	}
    }
    
    
    |              Method |       Mean |     Error |    StdDev |
    |-------------------- |-----------:|----------:|----------:|
    |                Fixe |  0.0000 ns | 0.0000 ns | 0.0000 ns |
    | StringBuilderAppend | 56.0231 ns | 1.1893 ns | 3.3544 ns |
    

    Raw strings won over sb.Append() by ko. My teammates were fooled by that hype around StringBuilder being faster for anything.. except our precise use case.
    //Your Benchmark tutorial is awesome, thanks so much for that!

    1. Your results are valid, but it’s a very, very specific use case. Because your entire string is made up of string literals (no variables, properties, methods etc.), they will actually be optimized out to simply be :

      public string Fixe() => "a b c d e";

      But if you changed even one of those to a variable, the results would be vastly different.

  4. very well written. just a reminder: string interpolation also supports formatting and padding:

    $"i have {5,-10:C1} dollars." => "i have $5.0       dollars." (left-aligned due to the minus sign, so the padding goes on the right)
    $"i have {5,10:C1} dollars." => "i have       $5.0 dollars." (right-aligned, so the padding goes on the left)
