FileInfo.Length != sum of all line lengths

I'm trying to make a progress bar for reading a big file. I set the progress bar's maximum value to FileInfo.Length, read each line using StreamReader.ReadLine, and add up the length of each line (with String.Length) to set the progress bar's current value.

What I noticed is that the file's total length differs from the sum of the line lengths. For example: FileInfo.Length = 25577646, sum of all line lengths = 25510563.

Why is there such a difference?

Thanks for your help!

Jon Skeet

Two problems here:

  • string.Length gives you the number of characters (strictly, UTF-16 code units) in each string, whereas FileInfo.Length gives you the number of bytes in the file. Those can be very different things, depending on the characters and the encoding used.
  • You're not including the line breaks (typically \n or \r\n), as those are removed when reading lines with TextReader.ReadLine.
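
Both effects are easy to reproduce on a tiny file. The following is a minimal sketch, assuming UTF-8 text without a byte order mark and \r\n line breaks; the class name, method name, and sample content are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthDemo
{
    // Sums string.Length over every line, the way the question does.
    public static long SumOfLineLengths(string path, Encoding encoding)
    {
        long sum = 0;
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                sum += line.Length; // chars only; ReadLine has already dropped the line break
            }
        }
        return sum;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        // Hypothetical sample: "café" and "naïve", each followed by \r\n, written without a BOM.
        File.WriteAllText(path, "caf\u00e9\r\nna\u00efve\r\n", new UTF8Encoding(false));

        Console.WriteLine(new FileInfo(path).Length);             // 15 (bytes on disk)
        Console.WriteLine(SumOfLineLengths(path, Encoding.UTF8)); // 9 (chars in the lines)

        File.Delete(path);
    }
}
```

Of the 6 missing bytes, 2 come from the accented characters (each takes two bytes in UTF-8 but counts as one char) and 4 come from the two \r\n line breaks.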

In terms of what to do about this...

  • You presumably know the file's encoding, so you could convert each line back into bytes by calling Encoding.GetBytes to account for that difference. It would be pretty wasteful to do this though.
  • If you know the line break used by the file, you could just add the relevant number of bytes for each line you read.
  • You could keep a reference to the underlying stream and use Stream.Position to detect how far through the file you've actually read. That won't necessarily be the same as the amount of data you've processed though, as the StreamReader will have a buffer. (So you may well "see" that the Stream has read all the data even though you haven't processed all the lines yet.)
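
The first two suggestions can be combined into a per-line byte estimate. This is a rough sketch rather than a definitive implementation: it swaps in Encoding.GetByteCount for Encoding.GetBytes so that no byte array is allocated just to be measured, and it assumes that every line, including the last, ends with the same known line break and that the file has no byte order mark. ByteProgress, BytesConsumed, and the lineBreak parameter are names invented here.

```csharp
using System;
using System.IO;
using System.Text;

static class ByteProgress
{
    // Estimated bytes one line occupied on disk: its text plus its line break.
    public static long BytesConsumed(string line, Encoding encoding, string lineBreak)
    {
        return encoding.GetByteCount(line) + encoding.GetByteCount(lineBreak);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        var encoding = new UTF8Encoding(false); // assumption: UTF-8, no BOM
        File.WriteAllText(path, "caf\u00e9\r\nna\u00efve\r\n", encoding);

        long total = new FileInfo(path).Length;
        long done = 0;
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Assumption: the file uses \r\n throughout.
                done += BytesConsumed(line, encoding, "\r\n");
                Console.WriteLine("{0} of {1} bytes", done, total);
            }
        }
        File.Delete(path);
    }
}
```

On this sample the running total reaches 7 and then 15, matching FileInfo.Length. If the file mixes line breaks or the last line has none, the estimate drifts, which is one reason to prefer the stream-based approach.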

The last idea is probably the cleanest, IMO.
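
A minimal sketch of that approach, assuming the progress bar can be driven by a fraction between 0 and 1. StreamProgress, ReadWithProgress, and the two callbacks are hypothetical names; only File.OpenRead, StreamReader, Stream.Length, and Stream.Position come from the framework.

```csharp
using System;
using System.IO;

static class StreamProgress
{
    // Reads every line and reports the fraction of the file the stream has consumed.
    public static void ReadWithProgress(string path, Action<string> handleLine, Action<double> report)
    {
        using (var stream = File.OpenRead(path))
        using (var reader = new StreamReader(stream))
        {
            long total = stream.Length;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                handleLine(line);
                // Position counts bytes pulled into the reader's buffer, so it can
                // run ahead of the lines actually handled.
                report(total == 0 ? 1.0 : (double)stream.Position / total);
            }
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "first\r\nsecond\r\n"); // hypothetical sample content

        ReadWithProgress(path, line => { }, fraction => Console.WriteLine("{0:P0}", fraction));

        File.Delete(path);
    }
}
```

Because of the buffering, a small file jumps straight to 100% after the first line; on a large file the value advances in buffer-sized steps, which is usually fine for a progress bar.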

