Getting a count of unique strings from a List<string[]> into a dictionary

I want to input a List<string[]> and

The output is a dictionary where the keys are unique strings used for an index and the values is an array of floats with each position in the array representing the count of the key for a string[] in the List<string[]>

So far here is what I attempted

static class CT
{
    //Counts all terms in array
    public static Dictionary<string, float[]> Termfreq(List<string[]> text)
    {
        List<string> unique = new List<string>();

        foreach (string[] s in text)
        {
            List<string> groups = s.Distinct().ToList();
            unique.AddRange(groups);
        }

        string[] index = unique.Distinct().ToArray();

        Dictionary<string, float[]> countset = new Dictionary<string, float[]>();


         return countset;
    }

}



 static void Main()
    {
        /* local variable definition */


        List<string[]> doc = new List<string[]>();
        string[] a = { "That", "is", "a", "cat" };
        string[] b = { "That", "bat", "flew","over","the", "cat" };
        doc.Add(a);
        doc.Add(b);

       // Console.WriteLine(doc);


        Dictionary<string, float[]> ret = CT.Termfreq(doc);

        foreach (KeyValuePair<string, float[]> kvp in ret)
        {
            Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, kvp.Value);

        }


        Console.ReadLine();

    }

I got stuck on the dictionary part. What is the most effective way to implement this?

Jon Skeet
people
quotationmark

It sounds like you could use something like:

var dictionary = doc
    .SelectMany(array => array)
    .Distinct()
    .ToDictionary(word => word,
                  word => doc.Select(array => array.Count(x => x == word))
                             .ToArray());

In other words, first find the distinct set of words, then for each word, create a mapping.

To create a mapping, look at each array in the original document, and find the count of the occurrences of the word in that array. (So each array maps to an int.) Use LINQ to perform that mapping over the whole document, with ToArray creating an int[] for a particular word... and that's the value for that word's dictionary entry.

Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]> - it seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.

people

See more on this question at Stackoverflow