I'm trying to process (in C#) a large data file with some numeric data. Given an array of integers, how can it be split/grouped, so that previous n elements are grouped if next n (two or more) are negative numbers. For example, in the array below, two or more consecutive negative numbers should be used as a hint to group the same number of previous elements:
0
1
4
-99
-99
-99
1
2
7
9
-99
-99
3
6
8
-99
-99
5
Output:
Group 1:
0
1
4
[3 negative numbers were next, so group previous 3]
Group 2:
1
Group 3:
2
Group 4:
7
9
[2 negative numbers were next, so group previous 2, but keep any prior positive numbers (1, 2) as separate groups]
Group 5:
3
Group 6:
6
8
[2 negative numbers were next, so group previous 2, but keep any prior positive numbers (3) as separate groups]
Group 7:
5
What will be the fastest, most efficient way to process such an array of data into a new grouped collection?
I'd probably try to stream the whole thing, maintaining a buffer of "current non-negative numbers" and a count of negative numbers. Here's some code which appears to work... I'd expect it to be at least pretty efficient. I'd start with that, and if it's not efficient enough for you, look into why.
(If you do want to have the whole array in memory, you don't need to build up the numbersToEmit
list, as you can just maintain indexes instead. But given that a single list is reused, I'd expect this to be okay.)
using System;
using System.Linq;
using System.Collections.Generic;
class Program
{
static void Main(string[] args)
{
int[] input = { 0, 1, 4, -99, -99, -99, 1,
2, 7, 9, -99, -99, 3, 6, 8, -99, -99, 5 };
foreach (var group in GroupByNegativeCounts(input))
{
Console.WriteLine(string.Join(", ", group));
}
}
static IEnumerable<int[]> GroupByNegativeCounts(IEnumerable<int> input)
{
List<int> numbersToEmit = new List<int>();
int negativeCount = 0;
foreach (var value in input)
{
// We never emit anything when we see a negative number
if (value < 0)
{
negativeCount++;
}
else
{
// We emit numbers if we've previously seen negative
// numbers, and then we see a non-negative one.
if (negativeCount > 0)
{
int singles = Math.Max(numbersToEmit.Count - negativeCount, 0);
foreach (var single in numbersToEmit.Take(singles))
{
yield return new[] { single };
}
if (singles != numbersToEmit.Count)
{
yield return numbersToEmit.Skip(singles).ToArray();
}
negativeCount = 0;
numbersToEmit.Clear();
}
numbersToEmit.Add(value);
}
}
// Emit anything we've got left at the end.
foreach (var single in numbersToEmit)
{
yield return new[] { single };
}
}
}
See more on this question at Stackoverflow