LINQ? IEnumerable? We use them every day for good reasons: performance, abstraction, developer productivity, and more. Normally everything works well, until the “gotchas” of LINQ and IEnumerable take you into hours of “phantom” production debugging. These bugs are hard to spot because they produce no errors and do not interrupt code execution. In this post, I’ll try to explain the deferred nature of LINQ and IEnumerable so we can be more aware when we use them and prevent “phantom” bugs.
The Gotchas
Let’s start with a simple scenario to showcase the deferred “gotchas” of LINQ and IEnumerable. We will have a simple Person model and a method that returns a list of Person, something similar to a repository. After getting the list of Person, we populate its FullName property in two (2) ways:
- Use LINQ’s Select to assign the FirstName and LastName to the FullName property.
- Iterate over the list using foreach and assign null to the FullName property.
These happen in sequence. Lastly, a check throws an error if the FullName property is null; otherwise the value is displayed in the console. This check will highlight the “gotchas”.
Code
Person.cs
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName { get; set; }
}
PersonRepository.cs
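using System.Collections.Generic;
using System.Linq;

public class PersonRepository
{
    // AsQueryable() wraps the list to imitate data coming from a database.
    public IEnumerable<Person> GetPersons() =>
        new List<Person>
        {
            new Person { FirstName = "Hello", LastName = "World" }
        }.AsQueryable();
}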
Note: We’ll address later the need to call .AsQueryable() on the list. Essentially, this is to imitate data that came from a database.
Program.cs
using System;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var repository = new PersonRepository();
        var persons = repository.GetPersons();

        // 1. Populate FullName through Select.
        persons = persons.Select(p =>
        {
            p.FullName = $"{p.FirstName} {p.LastName}";
            return p;
        });

        // 2. Set FullName to null for every person.
        foreach (var p in persons)
        {
            p.FullName = null;
        }

        // 3. Throw if FullName is null, otherwise print it.
        foreach (var p in persons)
        {
            _ = p.FullName ?? throw new NullReferenceException();
            Console.WriteLine(p.FullName);
        }

        Console.ReadKey();
    }
}
Test
Just from looking at the code, we expect it to throw an error: the code is sequential, and the assignment of null was the last operation before the check.
- Gotcha #1: The code actually prints “Hello World” in the console.
Now let’s materialize the result of Select into a list by appending .ToList() to it: persons = persons.Select(...).ToList();
- Gotcha #2: Now the code throws an error.
Pretty weird, right? We get two different behaviors from sequential code that looks familiar at first glance. Let’s now explore the reasons behind them.
The Why
Deferred execution, or lazy evaluation: what does it mean? Put simply, the code does not run at the point where it is written, or in the order we expect from the code structure. This is why the code above seems to skip some code that we expect to have an effect. Two (2) items in the code above are lazily evaluated: IEnumerable and Select.
- IEnumerable - having an IEnumerable does not mean that its items are already in memory. An IEnumerable is more of a promise that items can be produced when it is enumerated. We can see this from the note above about the .AsQueryable() extension. It wraps the list in a query object, so the variable now holds a query to be evaluated rather than a concrete collection.
Peeking at the variable at runtime, the debugger shows “Expanding the Results View will enumerate the IEnumerable”, in other words, it will materialize it.
Removing the .AsQueryable() extension essentially makes it a list again, and peeking at the variable now shows its count and contents. The list is concrete and in memory.
- LINQ - operations are executed when they are needed. Most LINQ operators, such as Select, are deferred: they run only when the sequence is enumerated or materialized, and they run again on every enumeration.
Here the list is materialized using the .ToList() extension.
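To make the timing visible, here is a minimal sketch that logs when each piece actually runs. The names DeferredDemo, Numbers and Log are hypothetical and exist only for this illustration; they are not part of the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Records what happens, in order, so the timing is visible.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical source: an iterator body does not run until enumeration starts.
    private static IEnumerable<int> Numbers()
    {
        Log.Add("source started");
        yield return 1;
        yield return 2;
    }

    public static List<string> Run()
    {
        Log.Clear();

        // Building the query runs neither the source nor the selector.
        var query = Numbers().Select(n =>
        {
            Log.Add($"select {n}");
            return n * 10;
        });
        Log.Add("query built");

        // ToList() enumerates the query, so everything runs here.
        var list = query.ToList();
        Log.Add($"materialized {list.Count} items");

        return new List<string>(Log);
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

In this sketch “query built” is logged first, and the source and selector entries only show up once .ToList() enumerates the query.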
When the list of Person was iterated and the FullName property was set to null, the assignment was actually performed, but it was later overridden by the Select operation. The overriding happened when the second foreach enumerated the sequence for the check, which ran the Select again.
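One way to confirm this is to count how many times the selector runs. The following is a rough, self-contained sketch of the same scenario; ReEnumerationDemo and the selectorRuns counter are hypothetical additions for illustration, and the repository is replaced by a plain in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName { get; set; }
}

public static class ReEnumerationDemo
{
    // Returns how many times the selector ran and the FullName seen by the check.
    public static (int SelectorRuns, string FullName) Run()
    {
        var source = new List<Person>
        {
            new Person { FirstName = "Hello", LastName = "World" }
        };
        var selectorRuns = 0;

        // Deferred: the lambda runs on every enumeration of persons.
        IEnumerable<Person> persons = source.Select(p =>
        {
            selectorRuns++;
            p.FullName = $"{p.FirstName} {p.LastName}";
            return p;
        });

        // First pass: the selector sets FullName, then the loop body nulls it.
        foreach (var p in persons) { p.FullName = null; }

        // Second pass: the selector runs again and overwrites the null.
        string seen = null;
        foreach (var p in persons) { seen = p.FullName; }

        return (selectorRuns, seen);
    }

    public static void Main()
    {
        var (runs, fullName) = Run();
        Console.WriteLine($"selector runs: {runs}, FullName: {fullName}");
        // prints "selector runs: 2, FullName: Hello World"
    }
}
```

The selector runs once per foreach, so the second pass restores the value just before the check reads it.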
“Operations are executed when they are needed.”
When the .ToList() extension was added, the Select operation was performed immediately and only once, so the FullName property was not overridden on the check.
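For comparison, here is a sketch of the materialized variant under the same assumptions: MaterializeDemo and selectorRuns are hypothetical names, and a plain in-memory list stands in for the repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string FullName { get; set; }
}

public static class MaterializeDemo
{
    // Returns how many times the selector ran and whether FullName stayed null.
    public static (int SelectorRuns, bool FullNameIsNull) Run()
    {
        var source = new List<Person>
        {
            new Person { FirstName = "Hello", LastName = "World" }
        };
        var selectorRuns = 0;

        // ToList() enumerates the query right here, so the selector runs once.
        List<Person> persons = source.Select(p =>
        {
            selectorRuns++;
            p.FullName = $"{p.FirstName} {p.LastName}";
            return p;
        }).ToList();

        foreach (var p in persons) { p.FullName = null; }

        // Reading the list again does not re-run the selector.
        var fullNameIsNull = persons.All(p => p.FullName == null);
        return (selectorRuns, fullNameIsNull);
    }

    public static void Main()
    {
        var (runs, isNull) = Run();
        Console.WriteLine($"selector runs: {runs}, FullName is null: {isNull}");
        // prints "selector runs: 1, FullName is null: True"
    }
}
```

Because the list is a snapshot of the query result, the null assignment is the last write, which is why the original check throws in this variant.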
Conclusion
We saw that the “gotchas” of deferred execution, or lazy evaluation, can cause “phantom” bugs. We also explored the reasons behind them. The example is trivial and easy to debug, but in a production environment these pieces of code are separated across classes or even projects, which makes the bug much harder to locate and debug.
How can we make sure that the code we write is evaluated as we expect? When working with deferred execution or lazy evaluation, we can follow the same rule of thumb we use for async/await.
“In async code we await something when we need the value now before proceeding in the code. On the other hand, in IEnumerable or LINQ we can .ToList() something when we need the value now before proceeding in the code.”