The input is like this:

    0
    00:00:00,000 --> 00:00:00,000
    Hello world!

    1
    00:00:00,000 --> 00:00:00,000
    Hello world!
    This is my new world.

    2
    00:00:00,000 --> 00:00:00,000
    Hello guys!

Using a clear and fast regex, I want to split that into:

Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world! This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`

I use `(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+` for matching, but the problem is that it does not match text spanning two or more lines (match 3 of the second block in the example above).
Note: If you know a more readable, better-performing way that doesn't use a regex, feel free to suggest it.
Thanks,
Alireza
Well, here's a non-regex approach:

    // Requires System.Collections.Generic and System.IO (for File.ReadLines).
    // Yields one list per group of consecutive non-blank lines.
    public IEnumerable<List<string>> ReadSeparatedLines(string file)
    {
        List<string> lines = new List<string>();
        foreach (var line in File.ReadLines(file))
        {
            if (line == "")
            {
                // Only take action if we've actually got something to return. This
                // handles files starting with blank lines, and also files with
                // multiple consecutive blank lines.
                if (lines.Count > 0)
                {
                    yield return lines;
                    lines = new List<string>();
                }
            }
            else
            {
                lines.Add(line);
            }
        }
        // Check whether we had any trailing lines to return
        if (lines.Count > 0)
        {
            yield return lines;
        }
    }

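Each list that comes back is one block from the file, so getting to the three pieces asked for in the question is a small extra step. The following is only a sketch: `SubtitleBlocks` and `ToSubtitle` are names made up for illustration, and it assumes the first line of each block is the index, the second is the timing, and everything after that is text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class SubtitleBlocks
{
    // Splits one block of non-blank lines into index, timing and text.
    // Assumption: line 0 is the index, line 1 is the timing, and any
    // remaining lines are the text (joined with '\n').
    public static (int Index, string Timing, string Text) ToSubtitle(
        IReadOnlyList<string> block)
    {
        if (block.Count < 2)
        {
            throw new FormatException(
                "Expected at least an index line and a timing line.");
        }

        int index = int.Parse(block[0].Trim(), CultureInfo.InvariantCulture);
        string timing = block[1].Trim();
        string text = string.Join("\n", block.Skip(2));
        return (index, timing, text);
    }
}
```

You would call `SubtitleBlocks.ToSubtitle(block)` for each `block` yielded by `ReadSeparatedLines`. A block with no text lines gives an empty `Text` rather than an error, which may or may not be what you want.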
I would personally find that easier to understand than a regex, but you may have different tastes, of course.
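For comparison, if you do want to stay with a regex, the original pattern stops after one line of text because `.+` doesn't match across a line break. One possible fix, sketched here under the assumption that blocks are separated by a completely empty line, is to make the third group match a run of consecutive non-blank lines. `SubtitleRegex` and `Parse` are illustrative names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing one way to adapt the question's pattern.
public static class SubtitleRegex
{
    // Group 1: index. Group 2: timing.
    // Group 3: first text line plus any directly following non-blank lines.
    // "\r?\n" accepts both Windows and Unix line endings.
    private static readonly Regex BlockPattern = new Regex(
        @"(\d+)\r?\n([\d:,]+\s-->\s[\d:,]+)\r?\n([^\r\n]+(?:\r?\n[^\r\n]+)*)",
        RegexOptions.Compiled);

    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var result = new List<(string Index, string Timing, string Text)>();
        foreach (Match m in BlockPattern.Matches(input))
        {
            result.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
        }
        return result;
    }
}
```

The text group keeps whatever line breaks were in the input, so a two-line subtitle comes back as a single string containing a newline. A "blank" separator line that contains spaces would be treated as text by this pattern.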
See more on this question at Stack Overflow