I have a 1GB file containing 1 string per line.
I have to read the first 100MB such that if the boundary falls in the middle of the string, the whole of the last line gets included in the result.
What is the best way to accomplish this in C#?
One option is to use a StreamReader to read the lines, but check the Position on the underlying stream:
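    // Requires: using System.Collections.Generic; using System.IO;
    const long DataLimit = 100L * 1024 * 1024; // the 100MB from the question

    List<string> lines = new List<string>();
    using (var reader = File.OpenText("file.txt"))
    {
        string line;
        // Stop once the underlying stream has advanced past the limit
        while (reader.BaseStream.Position < DataLimit &&
               (line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }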
There are two problems here:

- StreamReader is likely to buffer data, so the Stream will actually be further on than the data you've read. You'll need to add some extra buffer to your limit in order to cope with this, and even then it's still not going to be very precise.
- Checking Position on each line is likely to slow things down significantly.

Another alternative is to copy as much data as you definitely want to consume into a MemoryStream, then keep reading (and converting to text) until you find the next line break, then append that final partial-line data to the MemoryStream and finally create a StreamReader around the MemoryStream, but again that's quite fiddly.
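A rough sketch of that MemoryStream approach is below. It is simplified to scan raw bytes for the line break rather than converting to text along the way, which assumes an encoding such as ASCII or UTF-8 where the byte 0x0A only ever means '\n' (it would not work for UTF-16). The names LimitedLineReader and ReadLinesUpTo are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LimitedLineReader
{
    // Hypothetical helper: returns the lines covered by the first `limit` bytes
    // of the file, extending past the limit to finish a line cut in the middle.
    public static List<string> ReadLinesUpTo(string path, long limit)
    {
        var buffer = new MemoryStream();
        using (var input = File.OpenRead(path))
        {
            var chunk = new byte[81920];
            long remaining = limit;
            int lastByte = -1; // last byte copied so far, -1 if none
            int read;

            // Step 1: copy the bytes we definitely want.
            while (remaining > 0 &&
                   (read = input.Read(chunk, 0, (int)Math.Min(chunk.Length, remaining))) > 0)
            {
                buffer.Write(chunk, 0, read);
                remaining -= read;
                lastByte = chunk[read - 1];
            }

            // Step 2: if the limit fell mid-line, keep copying up to the next '\n'.
            bool midLine = lastByte != -1 && lastByte != '\n';
            if (midLine)
            {
                int b;
                while ((b = input.ReadByte()) != -1 && b != '\n')
                {
                    buffer.WriteByte((byte)b);
                }
            }
        }

        // Step 3: decode the collected bytes line by line.
        buffer.Position = 0;
        var lines = new List<string>();
        using (var reader = new StreamReader(buffer))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With this you'd call something like LimitedLineReader.ReadLinesUpTo("file.txt", 100L * 1024 * 1024). Note that it holds the whole 100MB in memory before decoding, so it trades memory for precision.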
Yet another alternative would be to create some kind of "length-limiting" stream wrapper where you'd set the limit large enough to definitely include the last line - again, this would overread somewhat, potentially.
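As a sketch of what such a wrapper could look like: a read-only stream that pretends the underlying stream ends after a fixed number of bytes. LengthLimitedStream is a hypothetical class written for this illustration, not something in the framework, and it only covers the synchronous Read path.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: exposes at most `limit` bytes of the inner stream.
sealed class LengthLimitedStream : Stream
{
    private readonly Stream _inner;
    private long _remaining;

    public LengthLimitedStream(Stream inner, long limit)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        if (limit < 0) throw new ArgumentOutOfRangeException(nameof(limit));
        _inner = inner;
        _remaining = limit;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_remaining <= 0) return 0; // report end of stream at the limit
        int toRead = (int)Math.Min(count, _remaining);
        int read = _inner.Read(buffer, offset, toRead);
        _remaining -= read;
        return read;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

You'd then wrap the file stream, e.g. new StreamReader(new LengthLimitedStream(File.OpenRead("file.txt"), limit)), with limit set to your 100MB plus an allowance for the longest line you expect. The final line returned by ReadLine may still be cut off at the wrapper's limit, so you'd need to discard anything beyond the line that straddles the 100MB mark.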