My name is Jon Skeet

Why is the number of calls generated by LINQ FirstOrDefault or First so high?

I've noticed some performance issues with our application and have tracked them down to the sheer number of calls to ID properties on our classes.

I've set up a sample to explain.

We have two classes, Person and Address.

I create 10 instances of each, all having ID properties (Person_Id, Address_Id).

In the case of this sample, Person_Id of 1 maps to Address_Id of 1.

To link these together, Person has a read-only 'Address' property that returns the associated address object by running a LINQ query against the collection of addresses. For simplicity I'm returning the address whose Address_Id equals the Person_Id, since both lists have the same number of items and this is only for testing.

public Address Address
{
    get
    {
        return Addresses.FirstOrDefault(a => a.Address_Id == Person_Id);
    }
}

Person_Id is a public property with a private backing field. Very simple.

private int _person_Id;
public int Person_Id 
{
    get
    {
        return _person_Id;
    }
    set
    {
        _person_Id = value;
    }
}

When I track the number of times the getter of Person_Id is called, the count is always higher than the number of person records. In this case I'm iterating the list of person records and outputting the name and state of each person.

foreach (var person in persons)
{
   var name = person.Name;
   var state = person.Address.State;
   Console.WriteLine(name + "\t" + state);
}

Here is how the number of calls breaks down based on the number of person entities iterated:

[Image: calls based on # of records]

Reviewing the math, the total number of Person_Id calls is the running sum of the 'Address' call counts up to and including the current record. For example, if 5 person records are iterated, there are 5 calls to the getter of the 'Address' property and 15 calls to the getter of the 'Person_Id' property. 15 is (5 + 4 + 3 + 2 + 1), the summation of the calls to 'Address'.
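For reference, a minimal self-contained reproduction of the 5-record case could look like the sketch below. The static PersonIdGets counter, the static Addresses list and the Repro class are names made up for this illustration; the rest mirrors the classes described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Address
{
    public int Address_Id { get; set; }
    public string State { get; set; }
}

public class Person
{
    // Hypothetical counter, added only to observe getter calls.
    public static int PersonIdGets;

    // Shared address list; assumed static here to keep the sketch short.
    public static readonly List<Address> Addresses = new List<Address>();

    private int _person_Id;
    public int Person_Id
    {
        get { PersonIdGets++; return _person_Id; }
        set { _person_Id = value; }
    }

    public string Name { get; set; }

    public Address Address
    {
        get { return Addresses.FirstOrDefault(a => a.Address_Id == Person_Id); }
    }
}

public static class Repro
{
    // Builds n persons and n addresses, iterates the persons, and
    // returns how many times the Person_Id getter ran.
    public static int CountPersonIdGets(int n)
    {
        Person.Addresses.Clear();
        var persons = new List<Person>();
        for (int i = 1; i <= n; i++)
        {
            Person.Addresses.Add(new Address { Address_Id = i, State = "State " + i });
            persons.Add(new Person { Person_Id = i, Name = "Person " + i });
        }

        Person.PersonIdGets = 0; // ignore anything counted during setup
        foreach (var person in persons)
        {
            Console.WriteLine(person.Name + "\t" + person.Address.State);
        }
        return Person.PersonIdGets;
    }

    public static void Main()
    {
        Console.WriteLine(CountPersonIdGets(5)); // 15 = 1 + 2 + 3 + 4 + 5
    }
}
```

With this setup the count for n records comes out as n * (n + 1) / 2, which matches the totals above.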

I am curious where these numbers are coming from. It is the same for FirstOrDefault and Find. If I use Single the calls are much higher.

If I instead create a local variable such as this:

int personId = Person_Id;

And then use it in the LINQ query:

return Addresses.Find(a => a.Address_Id == personId);

Then the calls are 1 to 1: one call to Person_Id for each call to Address, as I would have expected from the LINQ query.
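Put together, the local-variable version might look like the sketch below. As before, the PersonIdGets counter, the static Addresses list and the LocalCopyDemo class are hypothetical names used only to make the count observable.

```csharp
using System;
using System.Collections.Generic;

public class Address
{
    public int Address_Id { get; set; }
}

public class Person
{
    // Hypothetical counter, added only to observe getter calls.
    public static int PersonIdGets;

    // Shared address list; assumed static here to keep the sketch short.
    public static readonly List<Address> Addresses = new List<Address>();

    private int _person_Id;
    public int Person_Id
    {
        get { PersonIdGets++; return _person_Id; }
        set { _person_Id = value; }
    }

    public Address Address
    {
        get
        {
            // The getter runs once here; the lambda only sees the copied value.
            int personId = Person_Id;
            return Addresses.Find(a => a.Address_Id == personId);
        }
    }
}

public static class LocalCopyDemo
{
    // Returns how many times the Person_Id getter ran while iterating n persons.
    public static int CountPersonIdGets(int n)
    {
        Person.Addresses.Clear();
        var persons = new List<Person>();
        for (int i = 1; i <= n; i++)
        {
            Person.Addresses.Add(new Address { Address_Id = i });
            persons.Add(new Person { Person_Id = i });
        }

        Person.PersonIdGets = 0; // ignore anything counted during setup
        foreach (var person in persons)
        {
            Console.WriteLine(person.Address.Address_Id);
        }
        return Person.PersonIdGets;
    }

    public static void Main()
    {
        Console.WriteLine(CountPersonIdGets(5)); // 5: one getter call per person
    }
}
```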

Does anyone know why the calls are inflated in this way? I'm interested to learn more as I go through the process of optimizing.

Thanks

You're basically saying "for each address within Addresses, evaluate this predicate, until the predicate returns true, at which point return that address."

The predicate is the lambda expression, which uses the Person_Id property, so it has to evaluate it every time.
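To make "evaluate this predicate until it returns true" concrete, here is a simplified sketch of what an operator like FirstOrDefault does. MyFirstOrDefault is a hypothetical stand-in written for this illustration, not the actual framework source, but the shape of the loop is the point: the delegate is invoked once per element.

```csharp
using System;
using System.Collections.Generic;

public static class SketchExtensions
{
    // Hypothetical stand-in for Enumerable.FirstOrDefault (simplified).
    public static T MyFirstOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            // The delegate is invoked once per element until it returns true.
            if (predicate(item))
            {
                return item;
            }
        }
        return default(T);
    }
}

public static class Demo
{
    // Returns how many times the predicate ran while searching 1..5 for target.
    public static int PredicateCalls(int target)
    {
        int calls = 0;
        var numbers = new List<int> { 1, 2, 3, 4, 5 };
        numbers.MyFirstOrDefault(n => { calls++; return n == target; });
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(PredicateCalls(3)); // 3: elements 1, 2 and 3 were tested
    }
}
```

Anything the lambda body reads, such as a property, is read again on each of those invocations.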

Or to put it another way, suppose you used a normal method instead of a lambda expression in order to create the predicate:

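public Address Address
{
    get
    {
        // FirstOrDefault expects a Func<Address, bool>, so wrap the method in that delegate type.
        Func<Address, bool> predicate = new Func<Address, bool>(AddressIdMatches);
        return Addresses.FirstOrDefault(predicate);
    }
}

// Called once per address tested; reads Person_Id on every call.
private bool AddressIdMatches(Address a)
{
    return a.Address_Id == Person_Id;
}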

Is that clearer? The method will be called once per address, and it's hopefully obvious that every time you call that method, it's going to evaluate Person_Id. That's what the compiler is basically building for you when you use the lambda expression.
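The same reasoning would explain the higher count with Single mentioned in the question: Single can't stop at the first match, because it has to check that no second element matches, so the predicate runs for every address. A small sketch that counts predicate invocations for both operators (the SingleDemo class and its method names are made up for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleDemo
{
    // Counts predicate invocations when FirstOrDefault searches for target.
    public static int CallsWithFirstOrDefault(List<int> ids, int target)
    {
        int calls = 0;
        ids.FirstOrDefault(id => { calls++; return id == target; });
        return calls;
    }

    // Counts predicate invocations when Single searches for target.
    public static int CallsWithSingle(List<int> ids, int target)
    {
        int calls = 0;
        ids.Single(id => { calls++; return id == target; });
        return calls;
    }

    public static void Main()
    {
        var ids = Enumerable.Range(1, 10).ToList();
        Console.WriteLine(CallsWithFirstOrDefault(ids, 3)); // 3: stops at the first match
        Console.WriteLine(CallsWithSingle(ids, 3));         // 10: scans the whole list
    }
}
```

With 10 persons and 10 addresses, that would mean 100 evaluations of Person_Id with Single instead of 55 with FirstOrDefault.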

See more on this question at Stack Overflow