While helping a friend get used to EF Core, I noticed that he wasn’t using “Projections” in his LINQ code. That is, he was essentially doing a “SELECT *” and grabbing the entire model instead of just getting the few fields he needed.
“You’ll create a lot of Key Lookups doing that though…”
“What’s a Key Lookup?”
I guess I had forgotten that people starting software development these days often think that they don’t need to learn SQL because they have a fan-dangle ORM that does all the heavy lifting for them! So this is for them. Here’s what a Key Lookup is from a developer’s perspective!
This is a pretty light explanation because it reads more like a brain dump trying to help a friend understand what I’m saying rather than a tutorial, but it should be a great kickstart.
An Index Tuning Crash Course
The easiest way to understand a key lookup is to understand how indexes get used. We’ll start that by creating a table that looks like so :
Id INT FirstName NVARCHAR(250) LastName NVARCHAR(250) Birthday DATETIME
You don’t need to actually create this table as we’ll be just looking at the theory more than anything!
Now let’s say that we want to find all lastnames where someone has a firstname of “John”. We know that our SQL would look like :
SELECT LastName FROM Person WHERE FirstName = 'John'
Now we probably already know that if we have a WHERE clause on a non-PK column, we are going to want to create an index so we can find the data easier. A beginner developer might think that the following index is suitable :
CREATE NONCLUSTERED INDEX IX_Person_FirstName ON Person (FirstName)
Afterall, we’ve now created an index on the Person table with “FirstName”. But with a large amount of data in this table, and us running that query again, we can check the execution path and see the following :
So we get our seek on our index, but for some reason we then do what’s called a “Key Lookup” back on our Clustered Index. Why?
If we mouse over the Key Lookup and look at the “Output List”, it tells us what it was trying to find. In this case, it’s the “LastName”.
What has happened is that our query asks for the LastName of a person, where the Firstname is John. It can query on our index to find people named John, but once it finds them, because our index only contains Firstname (And by default the Id), it has to go back to the Clustered Index to actually find the data we want.
What we can instead do is change our index to “Include” the LastName column. The Include keyword tells SQL that we want the “data” of the column included in the index, but it doesn’t actually need to be searchable. Essentially exactly what our query is doing. It’s “searching” on FirstName, but actually needs the “data” LastName.
CREATE NONCLUSTERED INDEX IX_Person_FirstName_I_LastName ON Person (FirstName) INCLUDE (LastName)
Notice that we also renamed the Index to include what columns we are using in the “Include”. This is something I like to do but it can get unweildy as you add columns so your mileage could vary.
If we re-run this query and check the execution plan :
We are now going to the index, getting the exact data we need and returning. We don’t need to go anywhere else to fetch our data because it’s right there in the index.
Over Selecting And SELECT *
So that brings us to what I call Over Selecting and the notorious SELECT * issue. If we think about the above, what if instead of just getting the LastName, let’s say I ran the following :
SELECT * FROM Person WHERE FirstName = 'John'
Well now we are back to doing Key Lookups even if we include LastName in the index because we are missing DateOfBirth in the Index. Since it will be included in the SELECT * result set even if we don’t use it in our C# code, we would need to add it in.
But then we add another column. So we have to add that to the Index. And repeat. But are we even using those extra fields? And our index is getting pretty big now, and BLAH.
This is the key reason (for me atleast), why I only select the exact columns I need and never use SELECT *
Memory Footprint And Data Transfer
While I won’t dive into it too much, SELECT * also has performance implications just because of the amount of wasted data you are both fetching and transfering. Even if it’s in the Index, you are working with data you actually don’t need.
Here’s a great article on the amazingly named blog “Use The Index Luke” : https://use-the-index-luke.com/blog/2013-08/its-not-about-the-star-stupid. It should hopefully explain it 10x better than I ever could.
The thing is, once you start digging into SQL performance you’ll quickly realize that there are all these performance counters that you *could* measure, but you’ll have no idea which ones you *should* measure. A really good overview of all things performance monitoring can be found here in this article : 15 SQL Server Performance Counters to Monitor.
Here is where I would normally write about the zillion ways to “Project Entities” in EF Core so that you don’t run into Key Lookups, but while having a quick google around for what’s out there already, I found this incredible tutorial on Projection Magic For EF Core : https://benjii.me/2018/01/expression-projection-magic-entity-framework-core/
Again, written a heck of a lot better than I could and it’s a great reference any time you are struggling with complex projections.
An Actual SQL Tutorial
Where did I learn all of this? By the SQL Master himself “Pinal Dave” on PluralSight over here : https://www.pluralsight.com/courses/query-tuning-introduction. If the name Pinal Dave doesn’t ring a bell, his website (https://blog.sqlauthority.com/) probably does. You can’t google a SQL error message without his blog showing up on the first page. The course on Pluralsight does a great job of explaining complex SQL tuning topics in bite size chunks (Often less than 5 minutes at a time), and I highly recommend it.