Java Regex to replace Octal value in string

I have a set of octal values, say 0177 to 0377. Whenever one of these values appears in a string, I have to replace it with ?.

    String a= "sccce¼»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕerferferfer";
    for (int i = 0177; i<= 0377 ; i++)
    {
        char x= (char) i;
        a= a.replaceAll(Character.toString(x), "?");
    }
    System.out.print(a);

This works fine for a small file, but I have to perform this operation on a 1TB file.

How can we use a regex to achieve this?

Jon Skeet

You don't want to do this to the whole file in one go - you need a streaming approach. I'd do something like this:

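    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;

    // TODO: Rename to something more appropriate
    public static void replaceInvalidCharacters(Reader reader, Writer writer)
            throws IOException {
        char[] buffer = new char[16384]; // Adjust if you want
        int charsRead;
        // read() returns -1 at end of stream
        while ((charsRead = reader.read(buffer)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                // Octal 0177-0377 is U+007F to U+00FF
                if (buffer[i] >= 0177 && buffer[i] <= 0377) {
                    buffer[i] = '?';
                }
            }
            // Write only the chars actually read; the last chunk is usually partial
            writer.write(buffer, 0, charsRead);
        }
    }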

So you'd open a reader (with the appropriate encoding) for the current file, a writer (with the appropriate encoding) for the output file, then call the method above. It will read a chunk of data at a time, replace all the "bad" characters in the chunk, then write the chunk out to the writer.
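A minimal end-to-end sketch of that setup, under stated assumptions: the file is treated as single-byte ISO-8859-1 text, so each byte 0x00-0xFF decodes to the char with the same value and the range check matches the raw byte values 0177-0377. The `ReplaceInFile` class and the `replaceInFile` helper are hypothetical names; substitute the charset your file actually uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReplaceInFile {
    // Same chunked loop as above: read a chunk, replace in place, write only what was read.
    static void replaceInvalidCharacters(Reader reader, Writer writer) throws IOException {
        char[] buffer = new char[16384];
        int charsRead;
        while ((charsRead = reader.read(buffer)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                if (buffer[i] >= 0177 && buffer[i] <= 0377) {
                    buffer[i] = '?';
                }
            }
            writer.write(buffer, 0, charsRead);
        }
    }

    // Hypothetical helper: opens both files with the given charset and closes them afterwards.
    static void replaceInFile(Path input, Path output, Charset charset) throws IOException {
        try (Reader reader = Files.newBufferedReader(input, charset);
             Writer writer = Files.newBufferedWriter(output, charset)) {
            replaceInvalidCharacters(reader, writer);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ReplaceInFile <input> <output>");
            return;
        }
        // Assumption: single-byte input, so ISO-8859-1 maps every byte to the same char value.
        replaceInFile(Paths.get(args[0]), Paths.get(args[1]), StandardCharsets.ISO_8859_1);
    }
}
```

Run it as `java ReplaceInFile input.txt output.txt`. Only one 16K buffer of text is in memory at a time, so file size does not matter. With a multi-byte charset such as UTF-8, the reader from `Files.newBufferedReader` throws an `IOException` when it meets a byte sequence that is invalid in that charset, which is one reason the encoding choice matters for a file of unknown origin.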

No need for regular expressions.
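For comparison, if the text is already an in-memory String, the question's 129 separate `replaceAll` calls can be collapsed into one regex pass with a character class. A minimal sketch, assuming the range is exactly U+007F to U+00FF; `RegexRangeReplace` and `replaceRange` are made-up names. The pattern is compiled once, since `String.replaceAll` recompiles its regex on every call.

```java
import java.util.regex.Pattern;

public class RegexRangeReplace {
    // One character class covering U+007F to U+00FF (octal 0177 to 0377).
    // The doubled backslash keeps the Java compiler from decoding the escape;
    // the regex engine decodes it instead.
    private static final Pattern OCTAL_RANGE = Pattern.compile("[\\u007F-\\u00FF]");

    static String replaceRange(String s) {
        // Single pass over the string; "?" has no special meaning in a replacement
        return OCTAL_RANGE.matcher(s).replaceAll("?");
    }

    public static void main(String[] args) {
        // Escapes avoid source-encoding issues: U+00BC to U+00C0 are all in the range
        String a = "sccce\u00BC\u00BD\u00BE\u00BF\u00C0erferferfer";
        System.out.println(replaceRange(a)); // prints sccce?????erferferfer
    }
}
```

This still holds the whole text in memory, so it does not help with the 1TB file; the plain character comparison in the streaming loop does the same job without a regex.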

Note, though, that there are plenty of non-ASCII characters outside that range. If you really want to replace all non-ASCII characters, you'd basically want:

    if (buffer[i] > 126) // Or 127; what do you want to do with U+007F?
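To see the difference between the two checks, here is a sketch that runs both over the same text; `NonAsciiDemo`, `inOctalRange`, `isNonAscii` and `replace` are hypothetical names. U+00E9 (é) falls inside 0177-0377 and is replaced by both checks, while U+20AC (the euro sign) lies outside the range, so only the non-ASCII check replaces it.

```java
public class NonAsciiDemo {
    // The question's range: octal 0177-0377, i.e. U+007F to U+00FF.
    static boolean inOctalRange(char c) {
        return c >= 0177 && c <= 0377;
    }

    // Any non-ASCII char (U+0080 and up); use > 126 to also treat DEL (U+007F) as bad.
    static boolean isNonAscii(char c) {
        return c > 127;
    }

    static String replace(String s, boolean allNonAscii) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (allNonAscii ? isNonAscii(chars[i]) : inOctalRange(chars[i])) {
                chars[i] = '?';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // U+00E9 is inside the octal range, U+20AC is outside it
        String s = "caf\u00E9 costs 5\u20AC";
        System.out.println(replace(s, false)); // range check: keeps the euro sign
        System.out.println(replace(s, true));  // non-ASCII check: replaces both
    }
}
```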


See more on this question at Stackoverflow