Read new line character across different OS

I am reading a log file and counting the number of lines in it with the following code snippet.

byte[] c = new byte[1024];
long count = 0;
int readChars = 0;
while ((readChars = is.read(c)) != -1) {
    for (int i = 0; i < readChars; ++i) {
        if (c[i] == '\n') {
            ++count;
        }
    }
}

My problem is that when I read most files (CSV, Syslog, or any other wild format), it runs just fine and gives me the right result. But when I read a file that was generated on a Mac, it goes haywire and reports that only a single line was read.

My log file is large, and I know that it contains several thousand lines, but only a single line was counted. When I opened the file in Sublime I could see all the separate lines; however, when I viewed it in Vim, it displayed a single line with a '^M' character at the end of each record (my guess is that the file uses this as the line terminator).

A sample of two lines is below. You can see that Vim displays the ^M character where a new line should have been.

15122,25Dec2013,19:42:25,192.168.5.1,log,allow,,eth0,outbound,Application Control,,Network,Bob(+),Bob(+),,,,59857d77,,,,,,,,570033,,,,,,,,,,,,,192.168.5.7,176.32.96.190,tcp,80,56305,15606,554427,60461741,**,,,,,,,1,**,**,**,**,**,**,**,**,**,Other: Wget/1.13.4 (linux-gnu),Other: Server,192.168.5.7,60461741:1,,,,,,**,**,**,,,**,,,,^M359,23Dec2013,18:54:03,192.168.5.1,log,allow,,eth0,outbound,Application Control,,Network,Charlie(+),Charlie(+),,,,c0fa2dac,,,,,,,,1171362,,,,,,,,,,,,,192.168.5.6,205.251.242.54,tcp,80,45483,31395,1139967,60340847,**,,,,,,,2,**,**,**,**,**,**,**,**,**,Other: Wget/1.13.4 (linux-gnu),Other: Server,192.168.5.6,60340847:1,,,,,,,**,**,**,,,**,,,,^M

Any suggestions on how to tackle this problem?

Jon Skeet

The first problem, even before you get to line breaks, is that you're reading bytes and then treating those as characters. You're effectively assuming an encoding of ISO-8859-1, which may well not be correct. You should be using an InputStreamReader instead.
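To illustrate why scanning raw bytes can mislead, here is a minimal sketch. The class name DecodeDemo, its two helper methods, and the choice of UTF-16BE as the encoding are assumptions made for the example and are not part of the question. In UTF-16BE the character U+010A is encoded as the bytes 0x01 0x0A, so a byte-level scan sees a newline that is not there.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Counts raw 0x0A bytes, the same test the loop in the question performs.
    static int countNewlineBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    // Decodes the bytes with the given charset first, then counts '\n' characters.
    static int countNewlineChars(byte[] data, Charset charset) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                if (ch == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // One U+010A character followed by one real newline: bytes 01 0A 00 0A.
        byte[] data = "\u010A\n".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(countNewlineBytes(data));                            // prints 2
        System.out.println(countNewlineChars(data, StandardCharsets.UTF_16BE)); // prints 1
    }
}
```

For ASCII-compatible encodings such as UTF-8 the two counts agree, so the byte scan only fails outright on encodings like UTF-16; decoding first avoids having to know which case applies.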

Then there's the issue of operating systems having different line breaks... use BufferedReader.readLine() to read a line in a way that handles line breaks of \n, \r or \r\n.

So your code would become:

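// 'is' is the InputStream from the question; 'charset' is the file's encoding,
// e.g. StandardCharsets.UTF_8.
long count = 0;
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(is, charset))) {
    // readLine() returns null at end of stream.
    while (reader.readLine() != null) {
        count++;
    }
}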
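
As a self-contained check of that behaviour, the sketch below runs the same loop over three in-memory samples, one per terminator style. The class name LineCounter, the countLines helper, and the use of UTF-8 are assumptions chosen for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LineCounter {
    // Counts lines using BufferedReader.readLine(), which treats
    // "\n", "\r" and "\r\n" as line terminators.
    static long countLines(InputStream in, Charset charset) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Unix-style, classic Mac-style, and Windows-style terminators.
        String[] samples = {"a\nb\n", "a\rb\r", "a\r\nb\r\n"};
        for (String sample : samples) {
            InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
            System.out.println(countLines(in, StandardCharsets.UTF_8)); // prints 2 each time
        }
    }
}
```

Note that a final line without a terminator is still counted, so "a\nb" also yields 2, whereas counting '\n' characters would give 1.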


See more on this question at Stackoverflow