I’ve recently been playing around with all of the new features packaged into C# 7.2. One such feature that piqued my interest because of its simplicity was the “in” keyword. It’s one of those things that you can get away with never using in your day-to-day work, but it makes complete sense when looking at language design from a high level.
In simple terms, the in keyword specifies that you are passing a parameter by reference, but also that you will not modify the value inside a method. For example :
```csharp
public static int Add(in int number1, in int number2)
{
    return number1 + number2;
}
```
In this example we pass in two parameters essentially by reference, but we also specify that they will not be modified within the method. If we try and do something like the following :
```csharp
public static int Add(in int number1, in int number2)
{
    number1 = 5; // not allowed: number1 is read-only inside the method
    return number1 + number2;
}
```
We get a compile time error:
Cannot assign to variable 'in int' because it is a readonly variable
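For completeness, here is a minimal sketch of the calling side. The in modifier is optional at the call site, and because the method can’t write through the reference, the compiler also accepts literals (it stores them in a temporary behind the scenes). The class name InCallDemo and the variable names are just for illustration.

```csharp
using System;

public static class InCallDemo
{
    public static int Add(in int number1, in int number2)
    {
        return number1 + number2;
    }

    public static void Main()
    {
        int a = 2;
        int b = 3;

        Console.WriteLine(Add(in a, in b)); // explicit: a and b are passed by read-only reference
        Console.WriteLine(Add(a, b));       // 'in' can be left off at the call site
        Console.WriteLine(Add(2, 3));       // literals work too; the compiler uses a temporary
    }
}
```

All three calls print 5. Writing in at the call site is mostly a way of signalling intent to whoever reads the code.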
Accessing C# 7.2 Features
I think I should just quickly note how to actually access C# 7.2 features, as they may not be immediately available on your machine. The first step is to ensure that Visual Studio is up to date. If you are able to compile C# 7.2 code but IntelliSense is acting up and not recognizing new features, 99% of the time you just need to update Visual Studio.
Once updated, inside the csproj of your project, add in the following :
```xml
<PropertyGroup>
  <LangVersion>7.2</LangVersion>
</PropertyGroup>
```
And you are done!
Performance
When you pass a value type into a method normally, it is copied to a new memory location, so the method works on a clone of the value. With the in keyword, a reference to the original value is passed instead, so no copy has to be made. While the performance benefit may be small in simple business applications, in a tight loop this could easily add up.
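To make the copy versus reference difference visible, here is a small sketch. It isn’t a performance test; it just changes the caller’s variable while the method is still running and checks which value the method sees. AliasDemo, the shared field and the mutate callback are names I’ve made up for the demonstration.

```csharp
using System;

public static class AliasDemo
{
    private static int shared = 1;

    public static int ReadByValue(int value, Action mutate)
    {
        mutate();     // changes the caller's variable...
        return value; // ...but 'value' is an independent copy
    }

    public static int ReadByIn(in int value, Action mutate)
    {
        mutate();     // changes the caller's variable...
        return value; // ...and 'value' refers to that same variable
    }

    public static int RunByValue()
    {
        shared = 1;
        return ReadByValue(shared, () => shared = 2);
    }

    public static int RunByIn()
    {
        shared = 1;
        return ReadByIn(in shared, () => shared = 2);
    }

    public static void Main()
    {
        Console.WriteLine(RunByValue()); // 1
        Console.WriteLine(RunByIn());    // 2
    }
}
```

So in stops the method itself from writing to the value, but it doesn’t freeze the value: if something else changes the original variable, the method sees the change.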
But just how much performance gain are we going to see? I could take some really smart people’s word for it, or I could do a little bit of benchmarking. I’m going to use BenchmarkDotNet (Guide here) to compare performance when passing a value type into a method normally, or as an in parameter.
The benchmarking code is :
```csharp
using BenchmarkDotNet.Attributes;

public struct Input
{
    public decimal Number1;
    public decimal Number2;
}

public class InBenchmarking
{
    const int loops = 50000000;
    Input inputInstance;

    public InBenchmarking()
    {
        inputInstance = new Input { };
    }

    // Passes the struct by read-only reference on every iteration.
    [Benchmark]
    public decimal DoSomethingInLoop()
    {
        decimal result = 0M;
        for (int i = 0; i < loops; i++)
        {
            result = DoSomethingIn(in inputInstance);
        }
        return result;
    }

    // Baseline: passes the struct by value (copied on every call).
    [Benchmark(Baseline = true)]
    public decimal DoSomethingLoop()
    {
        decimal result = 0M;
        for (int i = 0; i < loops; i++)
        {
            result = DoSomething(inputInstance);
        }
        return result;
    }

    public decimal DoSomething(Input input)
    {
        return input.Number1;
    }

    public decimal DoSomethingIn(in Input input)
    {
        return input.Number1;
    }

    public decimal DoSomethingRef(ref Input input)
    {
        return input.Number1;
    }
}
```
And the results :
            Method |     Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------ |---------:|----------:|----------:|-------:|---------:|
 DoSomethingInLoop | 20.89 ms | 0.4177 ms | 0.7845 ms |   0.34 |     0.02 |
   DoSomethingLoop | 62.06 ms | 1.5826 ms | 2.6003 ms |   1.00 |     0.00 |
We can definitely see the speed difference here. This makes sense because really all we are doing is passing in a variable and doing nothing else. It has to be said that I can’t see the in keyword being used to optimize code in everyday business applications, but there is definitely something there for time-critical applications such as large-scale number crunchers.
Explicit In Design
While the performance benefits are OK, something else that comes to mind is that when you use in, you are being explicit in your design. By that I mean that you are laying out exactly how you intend the application to function: “If I pass a variable into this method, I don’t expect the variable to change.” I can see this being a bigger benefit to large business applications than any small performance gain.
A way to look at it is how we use things like private and readonly. Our code will generally work if we just make everything public and move on, but that’s not seen as a “good” programming habit. We use things like readonly to explicitly say how we expect things to run (we don’t expect this variable to be modified outside of the constructor, etc.). And I can definitely see in being used in a similar sort of way.
Comparisons To “ref” (and “out”)
A comparison could be made to the ref keyword in C# (and possibly, to a lesser extent, the out keyword). The main differences are:
in – Passes a variable into a method by reference. Cannot be set inside the method.
ref – Passes a variable into a method by reference. Can be set/changed inside the method.
out – Only used for output from a method. Can (and must) be set inside the method.
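Here is a quick sketch of the three modifiers side by side. The method names (ReadOnly, Increment, Initialise) are made up for the example.

```csharp
using System;

public static class ParameterModifiers
{
    public static int ReadOnly(in int value)
    {
        // value = 10;  // would not compile: 'in' parameters are read-only
        return value * 2;
    }

    public static void Increment(ref int value)
    {
        value++;        // writes through to the caller's variable
    }

    public static void Initialise(out int value)
    {
        value = 42;     // must be assigned before the method returns
    }

    public static void Main()
    {
        int number = 5;
        Console.WriteLine(ReadOnly(in number)); // 10, and number is still 5

        Increment(ref number);
        Console.WriteLine(number);              // 6

        Initialise(out int created);
        Console.WriteLine(created);             // 42
    }
}
```

Note that ref and out must be written at the call site, whereas in is optional there.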
So it certainly looks like the ref keyword is almost the same as in, except that it allows a variable to change its value. But to check that, let’s run our performance test from earlier but instead add in a ref scenario.
             Method |     Mean |     Error |    StdDev | Scaled | ScaledSD |
------------------- |---------:|----------:|----------:|-------:|---------:|
  DoSomethingInLoop | 23.26 ms | 0.6591 ms | 1.0643 ms |   0.61 |     0.04 |
 DoSomethingRefLoop | 21.10 ms | 0.3985 ms | 0.4092 ms |   0.41 |     0.02 |
    DoSomethingLoop | 51.36 ms | 1.0188 ms | 2.5372 ms |   1.00 |     0.00 |
So it’s close when it comes to performance benchmarking. However again, in is still more explicit than ref because it tells the developer that while it’s allowing a variable to be passed in by reference, it’s not going to change the value at all.
Important Performance Notes For In
While writing the performance tests for this post, I kept running into instances where using in gave absolutely no performance benefit whatsoever compared to passing by value. I was pulling my hair out trying to understand exactly what was going on.
It wasn’t until I took a step back and thought about how in could work under the hood, coupled with a Stack Overflow question or two, that I had the nut cracked. Consider the following code:
```csharp
using System;

struct MyStruct
{
    public int MyValue { get; set; }

    public void UpdateMyValue(int value)
    {
        MyValue = value;
    }
}

class Program
{
    static void Main(string[] args)
    {
        MyStruct myStruct = new MyStruct();
        myStruct.UpdateMyValue(1);
        UpdateMyValue(myStruct);
        Console.WriteLine(myStruct.MyValue);
        Console.ReadLine();
    }

    static void UpdateMyValue(in MyStruct myStruct)
    {
        myStruct.UpdateMyValue(5);
    }
}
```
What will the output be? If you guessed 1, you would be correct! So what’s going on here? We definitely told our struct to set its value to 1, then we passed it by reference via the in keyword to another method, and that told the struct to update its value to 5. Everything compiled and ran happily, so we should be good, right? Surely we should see the output as 5?
The problem is that C# has no way of knowing, when it calls a method (or a getter) on a struct, whether that call will also modify the struct’s values/state (unless the struct is declared readonly). What it does instead is create what’s called a “defensive copy”. When you call a method/getter on a struct passed with in, it creates a clone of the struct and runs the method on the clone instead. This means that the original stays exactly the same as it was passed in, and the caller can still count on the value it passed in not being modified.
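One way to see that the copy is made per call is to invoke a mutating method twice on the same in parameter. This is a sketch with a made-up Counter struct: each call runs against a fresh clone of the original, so neither call sees the other’s change.

```csharp
using System;

public struct Counter
{
    public int Count;

    // Not marked readonly, so the compiler has to assume it may change state.
    public int Increment()
    {
        Count++;
        return Count;
    }
}

public static class DefensiveCopyDemo
{
    public static int IncrementTwice(in Counter counter)
    {
        int first = counter.Increment();  // runs on a temporary copy
        int second = counter.Increment(); // runs on another fresh copy
        return first + second;
    }

    public static void Main()
    {
        Counter counter = new Counter();
        Console.WriteLine(IncrementTwice(in counter)); // 2, not 3
        Console.WriteLine(counter.Count);              // 0
    }
}
```

Without the defensive copies the result would have been 3 (1 + 2), and the caller’s counter would have ended up at 2.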
Where this creates a bit of a jam is that if we are cloning the struct (even as a defensive copy), the performance gain is lost. We still get the design-time niceness of knowing that what we pass in won’t be modified, but at the end of the day we may as well have passed it in by value when it comes to performance. You’ll see that in my tests I only read public fields on the struct rather than calling methods or property getters, which avoids this issue, because reading a field directly doesn’t need a defensive copy.
To me, this could crop up in the future as a bit of a problem. A method may inadvertently call something that modifies the struct’s state, and now it’s running off “different” data than what the caller is expecting. I also question how this will work in multi-threaded scenarios. What if the caller goes away and modifies the struct, expecting the method to get the updated value, but it has created a defensive clone? Plenty to ponder (and code out to test in the future).
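If your struct can be immutable, one option worth knowing about is the readonly struct modifier, which also arrived in C# 7.2. Because the compiler then knows that no member can change the struct’s state, it doesn’t need a defensive copy when you call members through an in parameter. The sketch below uses a made-up Money type; it only shows the shape of the code and doesn’t measure anything.

```csharp
using System;

// 'readonly struct' guarantees that no member mutates the struct's state.
public readonly struct Money
{
    public decimal Amount { get; }

    public Money(decimal amount)
    {
        Amount = amount;
    }

    public Money Add(in Money other)
    {
        return new Money(Amount + other.Amount);
    }
}

public static class ReadonlyStructDemo
{
    public static decimal Total(in Money a, in Money b)
    {
        // Calling Add and Amount on 'a' needs no defensive copy,
        // because Money is declared readonly.
        return a.Add(in b).Amount;
    }

    public static void Main()
    {
        Money first = new Money(1m);
        Money second = new Money(2m);
        Console.WriteLine(Total(in first, in second)); // 3
    }
}
```

The trade-off is that every field and auto-property of a readonly struct has to be read-only, so this only fits types you are happy to treat as immutable.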
Summary
So will there be a huge awakening of using the in keyword in C#? I’m not sure. When Expression-bodied Members (EBM) came along I didn’t think too much of them, but they are used in almost every piece of C# code these days. But unlike EBM, the in keyword doesn’t save typing or cut down on cruft, so it could just be relegated to one of those “best practices” lists. Most of all I’m interested in how in transforms over future versions of C# (because I really think it has to). What do you think?