The given regex code is:
import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        String longString = " Derek Banas CA 12345 PA (412)555-1212 johnsmith@hotmail.com 412-555-1234 412 555-1234 ";
        regexChecker("\\s[A-Za-z]{2,20}\\s", longString);
    }

    // Prints every non-empty match of theRegex in str2Check with its indices.
    public static void regexChecker(String theRegex, String str2Check) {
        Pattern checkRegex = Pattern.compile(theRegex);
        Matcher regexMatcher = checkRegex.matcher(str2Check);
        while (regexMatcher.find()) {
            if (regexMatcher.group().length() != 0) {
                // trim() only affects the printed text, not the indices below
                System.out.println(regexMatcher.group().trim());
                System.out.println("Start Index: " + regexMatcher.start());
                // The label says "Start Index", but end() is the (exclusive) end index
                System.out.println("Start Index: " + regexMatcher.end());
            }
        }
    }
}
The output here is:
Derek
Start Index: 0
Start Index: 7
CA
Start Index: 12
Start Index: 16
PA
Start Index: 21
Start Index: 25
1> Why is the end index reported for 'Derek' 7 and not 6?
2> Why isn't 'Banas' part of the output?
Any help would be appreciated.
Your pattern includes whitespace after the letters, and that accounts for both of your questions. The space between "Derek" and "Banas" is consumed as part of the first match, so that match ends at index 7 rather than 6. The same space is then no longer available to start a match for "Banas" (you'd need another space before "Banas" for it to match).
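To see the consumed space directly, here is a minimal sketch that prints each raw match (untrimmed, in brackets) with its start and end. The class name ConsumedSpaceDemo and the helper spans are made up for illustration, and the input is a shortened version of the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumedSpaceDemo {
    // Hypothetical helper: returns "[match] start-end" for every match of regex in input.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() resumes searching at the end of the previous match,
        // so characters already consumed cannot be matched again.
        while (m.find()) {
            out.add("[" + m.group() + "] " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = " Derek Banas CA ";
        for (String span : spans("\\s[A-Za-z]{2,20}\\s", s)) {
            System.out.println(span);
        }
        // prints:
        // [ Derek ] 0-7
        // [ CA ] 12-16
    }
}
```

The first match includes the space at index 6, so the next search starts at index 7, where "Banas" has no leading whitespace left to match.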
If you change both \\s to \\b, the pattern matches at word boundaries instead, which don't consume the space. You then get output of:
Derek
Start Index: 1
Start Index: 6
Banas
Start Index: 7
Start Index: 12
CA
Start Index: 13
Start Index: 15
PA
Start Index: 22
Start Index: 24
johnsmith
Start Index: 39
Start Index: 48
hotmail
Start Index: 49
Start Index: 56
com
Start Index: 57
Start Index: 60
At that point you're probably getting matches you don't want, though, such as the separate pieces of the email address.
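If the goal is only words that are surrounded by whitespace, one option not mentioned above is to use lookarounds, which check for the whitespace without consuming it. This is a sketch under that assumption, not necessarily what you want; the class name LookaroundDemo and the helper words are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Hypothetical helper: returns the text of every match of regex in input.
    static List<String> words(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String longString = " Derek Banas CA 12345 PA (412)555-1212 johnsmith@hotmail.com 412-555-1234 412 555-1234 ";
        // (?<=\s) requires whitespace before the letters and (?=\s) requires
        // whitespace after them, but neither becomes part of the match.
        String regex = "(?<=\\s)[A-Za-z]{2,20}(?=\\s)";
        System.out.println(words(regex, longString));
        // prints: [Derek, Banas, CA, PA]
    }
}
```

Because the spaces are only asserted, "Banas" can reuse the space that follows "Derek", and the email pieces are skipped because they are not delimited by whitespace on both sides.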
You could continue to capture the whitespace at the front, but remove the one at the end. A pattern like this:
regexChecker("\\s[A-Za-z]{2,20}", longString);
gives output of:
Derek
Start Index: 0
Start Index: 6
Banas
Start Index: 6
Start Index: 12
CA
Start Index: 12
Start Index: 15
PA
Start Index: 21
Start Index: 24
johnsmith
Start Index: 38
Start Index: 48
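Note that with the leading \\s kept, start() points at the space rather than at the first letter. If you want the indices of the word itself, a small sketch is to wrap the letters in a capturing group and ask the matcher for group 1. The class name GroupIndexDemo and the helper wordSpans are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Hypothetical helper: returns "word start-end" using the indices of group 1.
    static List<String> wordSpans(String input) {
        List<String> out = new ArrayList<>();
        // Group 1 holds only the letters; the leading whitespace stays outside it.
        Matcher m = Pattern.compile("\\s([A-Za-z]{2,20})").matcher(input);
        while (m.find()) {
            out.add(m.group(1) + " " + m.start(1) + "-" + m.end(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String longString = " Derek Banas CA 12345 PA (412)555-1212 johnsmith@hotmail.com 412-555-1234 412 555-1234 ";
        for (String span : wordSpans(longString)) {
            System.out.println(span);
        }
        // prints:
        // Derek 1-6
        // Banas 7-12
        // CA 13-15
        // PA 22-24
        // johnsmith 39-48
    }
}
```

This also removes the need for trim(), since group(1) never contains the space.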