Why is regex Matcher.find() not returning a word which meets the criteria?

The regex code is:

import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String longString = " Derek Banas CA 12345 PA (412)555-1212 johnsmith@hotmail.com 412-555-1234 412 555-1234 "; 
        regexChecker("\\s[A-Za-z]{2,20}\\s", longString);
    }

    public static void regexChecker(String theRegex, String str2Check){
        Pattern checkRegex = Pattern.compile(theRegex);     
        Matcher regexMatcher = checkRegex.matcher( str2Check );
        while ( regexMatcher.find() ){
            if (regexMatcher.group().length() != 0){
                System.out.println( regexMatcher.group().trim() );
                System.out.println( "Start Index: " + regexMatcher.start());
                // Prints the end index (exclusive), although the label still reads "Start Index"
                System.out.println( "Start Index: " + regexMatcher.end());
            }
        }
    }
}

The output here is:

Derek
Start Index: 0
Start Index: 7
CA
Start Index: 12
Start Index: 16
PA
Start Index: 21
Start Index: 25

1> Why is the end index for 'Derek' 7 and not 6?
2> Why isn't 'Banas' part of the output?

Any help would be appreciated.

Jon Skeet

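Your pattern includes whitespace after the letters, and that whitespace is consumed as part of the match. That accounts for both of your questions: the space between "Derek" and "Banas" counts as part of the first match (hence the end index of 7), and because find() resumes searching where the previous match ended, that space can't also be part of a match for "Banas" (you'd need another space before "Banas" for it to match).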
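To make the consumed whitespace visible, here is a minimal sketch that wraps each match in brackets. The class name ConsumedSpaceDemo and the helper describeMatches are made up for illustration and are not part of the original code; the shortened input string is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumedSpaceDemo {
    // Hypothetical helper: lists every match as "[text] start-end" so that
    // leading and trailing whitespace inside the match can be seen.
    static List<String> describeMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // end() is exclusive: it is the index just past the last matched character.
            found.add("[" + m.group() + "] " + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints "[ Derek ] 0-7" and "[ CA ] 12-16". "Banas" is skipped because
        // the space at index 6 already belongs to the first match.
        describeMatches("\\s[A-Za-z]{2,20}\\s", " Derek Banas CA ")
                .forEach(System.out::println);
    }
}
```

The brackets show that the first match is " Derek " with both spaces included, so the next search starts at the "B" of "Banas", where there is no leading whitespace left to match.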

If you change both \\s to \\b (giving "\\b[A-Za-z]{2,20}\\b") it will match a word boundary - which doesn't consume the space. You then get output of:

Derek
Start Index: 1
Start Index: 6
Banas
Start Index: 7
Start Index: 12
CA
Start Index: 13
Start Index: 15
PA
Start Index: 22
Start Index: 24
johnsmith
Start Index: 39
Start Index: 48
hotmail
Start Index: 49
Start Index: 56
com
Start Index: 57
Start Index: 60

At that point you're probably getting matches you don't want though, such as the pieces of the email address (johnsmith, hotmail and com).

You could continue to match the whitespace at the front, but remove the one at the end. A pattern like this:

regexChecker("\\s[A-Za-z]{2,20}", longString);

gives output of:

Derek
Start Index: 0
Start Index: 6
Banas
Start Index: 6
Start Index: 12
CA
Start Index: 12
Start Index: 15
PA
Start Index: 21
Start Index: 24
johnsmith
Start Index: 38
Start Index: 48
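Another option, not covered above, is to use lookaround: a lookbehind and a lookahead assert that whitespace is present on each side without consuming it. The following is only a sketch under that assumption; the class name LookaroundDemo and the helper findWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Hypothetical helper: returns "word start-end" for each run of 2 to 20
    // letters that has whitespace immediately before and after it.
    static List<String> findWords(String text) {
        // (?<=\s) and (?=\s) check for whitespace but leave it out of the match,
        // so the same space can delimit two neighbouring words.
        Pattern p = Pattern.compile("(?<=\\s)[A-Za-z]{2,20}(?=\\s)");
        Matcher m = p.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + " " + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String longString = " Derek Banas CA 12345 PA (412)555-1212 johnsmith@hotmail.com 412-555-1234 412 555-1234 ";
        // Prints: Derek 1-6, Banas 7-12, CA 13-15, PA 22-24 (one per line)
        findWords(longString).forEach(System.out::println);
    }
}
```

With this pattern the indices cover only the letters, and the pieces of the email address are not matched because they are bordered by '@' or '.' rather than whitespace.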

See more on this question at Stackoverflow