Which is better performance-wise: a Union of several similar lists, or Distinct on one really big list with lots of duplicates?

A little bit of background:

I'm making a small app to demonstrate the use of LINQ, so I'm supposed to use mostly LINQ methods. The app is going to show some info about movies and TV shows and make suggestions based on filters.

I've made three classes: TvShow, Season and Episode. TvShow holds a List of Season, Season holds a List of Episode, and Episode has its own List of Actor, which is the cast for that episode. I want to write a method on TvShow that returns the complete cast of the show, built from the individual episodes' cast lists.
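
To make the question self-contained, here's roughly the shape of the classes (simplified - the real ones have more members). One assumption worth calling out: Actor needs value equality (here it's a record) for Distinct and Union to treat two Actor instances with the same name as the same person.

    using System.Collections.Generic;

    // Simplified sketch of the classes involved - assumed shapes, not the full app.
    // Actor is a record so Distinct/Union compare by value (Name) rather than by reference.
    public record Actor(string Name);

    public class Episode
    {
        public List<Actor> Cast { get; set; } = new List<Actor>();
    }

    public class Season
    {
        public List<Episode> Episodes { get; set; } = new List<Episode>();
    }

    public class TvShow
    {
        private List<Season> seasons = new List<Season>();

        // AllCast() goes here - the two variants are shown below.
    }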

I decided to use either Union or Distinct, but I'm not sure which approach is better performance-wise, since I believe that's the only reason to pick one over the other in this particular example. (I know performance isn't really an issue in an app this small, but I'd like to know how this would behave at a much larger scale.)

Here are both methods:

    public List<Actor> AllCast()
    {
        List<Actor> actors = new List<Actor>();
        foreach (Season s in seasons)
        {
            s.Episodes.ForEach(e => actors.AddRange(e.Cast)); // collect every cast, duplicates included
        }
        return actors.Distinct().ToList();                    // de-duplicate once at the end
    }

OR

    public List<Actor> AllCast()
    {
        List<Actor> actors = new List<Actor>();
        foreach (Season s in seasons)
        {
            // Union yields only the actors not already in the accumulated list
            s.Episodes.ForEach(e => actors = actors.Union(e.Cast).ToList());
        }
        return actors;
    }

What I'm really wondering is: is it better to keep appending several lists onto one big list and then make a single pass over that giant list, returning only the distinct values, OR is it better to walk one small list and one growing list, comparing values to find their union (I assume that's roughly how Union works), and keep adding only the new values to an already-unique list?

P.S. I'm aware of HashSet, but I'd really like to use LINQ here because that's sort of the purpose of my project.
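
(For comparison only, the HashSet version I'm deliberately avoiding would look something like this:)

    // Non-LINQ baseline, shown just for comparison: the HashSet de-duplicates as it goes.
    public List<Actor> AllCast()
    {
        var unique = new HashSet<Actor>();
        foreach (Season s in seasons)
        {
            foreach (Episode e in s.Episodes)
            {
                unique.UnionWith(e.Cast); // adds only actors not already in the set
            }
        }
        return new List<Actor>(unique);   // copy to a List without touching LINQ
    }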

Jon Skeet

Your second approach has to build a new HashSet internally for each episode, comparing that episode's cast against all the actors we've seen before - I'd expect that to be slower than doing a single pass over all the actors, putting them into one set to obtain uniqueness.
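
To make that concrete, here's roughly what Distinct does under the hood - a simplification, not the actual framework source. Union performs the same kind of work over two sequences, so calling it once per episode re-hashes everything accumulated so far on every call.

    // Simplified sketch of Distinct (not the real implementation): one set, one pass.
    static IEnumerable<T> DistinctSketch<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (T item in source)
        {
            if (seen.Add(item)) // Add returns false if the item was already present
            {
                yield return item;
            }
        }
    }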

I'd use SelectMany twice to achieve it all in LINQ, though:

    public List<Actor> AllCast() =>
        seasons                          // All seasons
            .SelectMany(s => s.Episodes) // All episodes as a flat sequence
            .SelectMany(e => e.Cast)     // All actors as a flat sequence
            .Distinct()                  // Remove duplicates in a single pass
            .ToList();
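
One caveat, whichever version you use: Distinct compares actors via their Equals/GetHashCode, so Actor needs a sensible equality implementation, or you can pass an IEqualityComparer<Actor> to Distinct. A hypothetical comparer, assuming Actor exposes a Name property:

    // Hypothetical comparer - assumes Actor has a Name property.
    class ActorNameComparer : IEqualityComparer<Actor>
    {
        public bool Equals(Actor x, Actor y) => x?.Name == y?.Name;
        public int GetHashCode(Actor a) => a?.Name?.GetHashCode() ?? 0;
    }

    // Usage: replace .Distinct() with .Distinct(new ActorNameComparer())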


See more on this question at Stackoverflow