First() vs. Single() for unique columns
Recently during work, my team came up with the question how to query database entries that have a unique key. Should we use First or Single? We already knew that First will make a database call where it will limit the result to 1 row. Single will limit the result to 2 rows, to determine if there is more than one result and if so, can throw an exception.
Of course Single can be useful if we really want to assure that we only get 1 row, where we expect only 1 row. But we also used it for queries where we filtered the primary key of a table:
await dbContext.Users.Where(user => user.Id == '09262314-084A-4EB2-3EA8-08DB3C28EF82').SingleAsync();
Of course we configured the property Id to be a key and therefor to be unique as well. But still the question remained: Does Single also have worse performance than First when we filter on a unique column?
Analyzing performance
To find out I analyzed the queries. We are using MSSQL as a database engine and the execution plan helped us to solve the mystery.
I included the execution plan of a query where the filter is applied to an indexed and unique column. The following query represents the query that is generated by Entity Framework for the First() method:
SELECT TOP 1 * FROM Users WHERE Id = '09262314-084A-4EB2-3EA8-08DB3C28EF82';
As you can see the database engine had to seek only 1 entry and as expected the cost lies only within seeking the index.
Now to the query that Entity Framework will generate for the Single() method:
SELECT TOP 2 * FROM Users WHERE Id = '09262314-084A-4EB2-3EA8-08DB3C28EF82';
Suprise! We get the exact same execution plan. This means even though using TOP 2, the database is smart enough to only seek the one and only entry, because it knows that there can only be one result for this query.
But of course I do not simply trust an execution plan. I also measured the performance with this query:
DECLARE @StartTime datetime2 = (SELECT SYSDATETIME())
DECLARE @cnt INT = 0;
DECLARE @cnt_total INT = 1000;
WHILE @cnt < @cnt_total
BEGIN
SELECT TOP 1 *
FROM Users
WHERE Id = '09262314-084A-4EB2-3EA8-08DB3C28EF82'
SET @cnt = @cnt + 1;
END;
DECLARE @EndTime datetime2 = (SELECT SYSDATETIME())
DECLARE @Result1 VARCHAR(100) = 'TOP 1 ms: ' + CONVERT(VARCHAR, DATEDIFF(ms, @StartTime, @EndTime) / @cnt_total)
SET @StartTime = (SELECT SYSDATETIME())
SET @cnt = 0;
WHILE @cnt < @cnt_total
BEGIN
SELECT TOP 100 *
FROM Users
WHERE Id = '09262314-084A-4EB2-3EA8-08DB3C28EF82'
SET @cnt = @cnt + 1;
END;
SET @EndTime = (SELECT SYSDATETIME())
DECLARE @Result2 VARCHAR(100) ='TOP 100 ms: ' + CONVERT(VARCHAR, DATEDIFF(ms, @StartTime, @EndTime) / @cnt_total)
PRINT @Result1
PRINT @Result2
Before running this query I filled the Users table with approximately 600’000 rows to ensure that the database has something to work with. I also didn’t use “TOP 2” for the second test but “TOP 100” to make it a bit more obvious. Also I made 1000 inserts in a loop for each test. This helps to get a better value which is not affected by other factors and therefor might be random. In the end, each total time was divided by 1000 to get the average value.
After executing the query, the result was again very clear:
TOP 1 ms: 16 TOP 100 ms: 16
You might ask yourself: Why do you even use Single() when you know that there will be only one result? The reason is, that we might forget to set a unique key. This does not only concern primary keys but can also happen for other unique indexes and even composite keys. So with Single() we can ensure that never multiple results are existing where there shouldn’t.
While we have no performance impact for filtering on unique indexes there still is one for non-unique indexes. Because the database won’t know how many possible entries can exist and will search over more rows when using Single().
Conclusion
- Use Single() over First() where it makes sense.
- For filtering on unique indexes Single() can help you to ensure that the configuration is right. It also will not impact performance.
- When filtering on columns that are not unique, Single() will likely be the wrong approach.
- This is also true for SingleOrDefault() and FirstOrDefault()
- If you know there are multiple possible results use First() over Single().