Description
Discussed in #4344
Originally posted by jmresler April 8, 2023
Hi All,
I'm working on rewriting a batch process that is quite old.
It processes multiple files which have essentially the same data, but one has an additional text field at the end.
The files are comma-separated values (CSV) files.
Without going too far into it, I was under the impression that PatternMatchingCompositeLineMapper would support full regular expressions. After all, Java has a good pattern matching library in the java.util.regex Pattern and Matcher classes, and String.matches is built on top of them.
What I have discovered, though, is that pattern support is limited to two wildcards: '*' and '?'.
From reviewing the code, it looks like a very limited, Ant-style pattern matching capability.
The result is a very inelegant solution that requires a long run of '?' characters with intermittent '*' to allow for whitespace of unknown length.
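To make the limitation concrete, here is a minimal sketch of how the two file layouts could be told apart with java.util.regex. The three-field and four-field layouts, the class name CsvLineRegexDemo, and the sample lines are assumptions for illustration only; the real files are not shown in this post. A wildcard such as `*,*,*` cannot make this distinction, because '*' also matches commas and therefore accepts a four-field line as well.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the field counts below are assumed, not taken from the real files.
final class CsvLineRegexDemo {

    // Exactly three comma-separated fields (assumed "base" layout).
    static final Pattern BASE = Pattern.compile("[^,]*(,[^,]*){2}");

    // Exactly four comma-separated fields (assumed layout with the extra text field).
    static final Pattern EXTENDED = Pattern.compile("[^,]*(,[^,]*){3}");

    // Returns "base", "extended", or "unknown" depending on which layout the whole line matches.
    public static String classify(String line) {
        if (BASE.matcher(line).matches()) {
            return "base";
        }
        if (EXTENDED.matcher(line).matches()) {
            return "extended";
        }
        return "unknown";
    }
}
```

With these assumed layouts, `CsvLineRegexDemo.classify("1,Smith,2023-04-08")` yields "base" and the same line with a trailing `,free text` yields "extended". Quoted fields that contain commas are deliberately ignored in this sketch.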
Granted, I could write a custom line tokenizer, but according to "The Definitive Guide to Spring Batch", that stretches the object beyond its intended concern and is not recommended. My understanding is that the author of that book is also the head of the Spring Batch project.
Any chance someone would be willing to implement java.util.regex.Pattern matching functionality?
/**
 * Lifted from AntPathMatcher in Spring Core. Tests whether or not a string
 * matches against a pattern. The pattern may contain two special
 * characters:<br>
 * '*' means zero or more characters<br>
 * '?' means one and only one character
 *
 * @param pattern pattern to match against. Must not be <code>null</code>.
 * @param str string which must be matched against the pattern. Must not be
 * <code>null</code>.
 * @return <code>true</code> if the string matches against the pattern, or
 * <code>false</code> otherwise.
 */
public static boolean match(String pattern, String str) {
    int patIdxStart = 0;
    int patIdxEnd = pattern.length() - 1;
    int strIdxStart = 0;
    int strIdxEnd = str.length() - 1;
    char ch;
    boolean containsStar = pattern.contains("*");
    if (!containsStar) {
        // No '*'s, so we make a shortcut
        if (patIdxEnd != strIdxEnd) {
            return false; // Pattern and string do not have the same size
        }
        for (int i = 0; i <= patIdxEnd; i++) {
            ch = pattern.charAt(i);
            if (ch != '?') {
                if (ch != str.charAt(i)) {
                    return false; // Character mismatch
                }
            }
        }
        return true; // String matches against pattern
    }
    if (patIdxEnd == 0) {
        return true; // Pattern contains only '*', which matches anything
    }
    // Process characters before first star
    while ((ch = pattern.charAt(patIdxStart)) != '*' && strIdxStart <= strIdxEnd) {
        if (ch != '?') {
            if (ch != str.charAt(strIdxStart)) {
                return false; // Character mismatch
            }
        }
        patIdxStart++;
        strIdxStart++;
    }
    if (strIdxStart > strIdxEnd) {
        // All characters in the string are used. Check if only '*'s are
        // left in the pattern. If so, we succeeded. Otherwise failure.
        for (int i = patIdxStart; i <= patIdxEnd; i++) {
            if (pattern.charAt(i) != '*') {
                return false;
            }
        }
        return true;
    }
    // Process characters after last star
    while ((ch = pattern.charAt(patIdxEnd)) != '*' && strIdxStart <= strIdxEnd) {
        if (ch != '?') {
            if (ch != str.charAt(strIdxEnd)) {
                return false; // Character mismatch
            }
        }
        patIdxEnd--;
        strIdxEnd--;
    }
    if (strIdxStart > strIdxEnd) {
        // All characters in the string are used. Check if only '*'s are
        // left in the pattern. If so, we succeeded. Otherwise failure.
        for (int i = patIdxStart; i <= patIdxEnd; i++) {
            if (pattern.charAt(i) != '*') {
                return false;
            }
        }
        return true;
    }
    // Process pattern between stars. patIdxStart and patIdxEnd always
    // point to a '*'.
    while (patIdxStart != patIdxEnd && strIdxStart <= strIdxEnd) {
        int patIdxTmp = -1;
        for (int i = patIdxStart + 1; i <= patIdxEnd; i++) {
            if (pattern.charAt(i) == '*') {
                patIdxTmp = i;
                break;
            }
        }
        if (patIdxTmp == patIdxStart + 1) {
            // Two stars next to each other, skip the first one.
            patIdxStart++;
            continue;
        }
        // Find the pattern between patIdxStart & patIdxTmp in str between
        // strIdxStart & strIdxEnd
        int patLength = (patIdxTmp - patIdxStart - 1);
        int strLength = (strIdxEnd - strIdxStart + 1);
        int foundIdx = -1;
        strLoop: for (int i = 0; i <= strLength - patLength; i++) {
            for (int j = 0; j < patLength; j++) {
                ch = pattern.charAt(patIdxStart + j + 1);
                if (ch != '?') {
                    if (ch != str.charAt(strIdxStart + i + j)) {
                        continue strLoop;
                    }
                }
            }
            foundIdx = strIdxStart + i;
            break;
        }
        if (foundIdx == -1) {
            return false;
        }
        patIdxStart = patIdxTmp;
        strIdxStart = foundIdx + patLength;
    }
    // All characters in the string are used. Check if only '*'s are left
    // in the pattern. If so, we succeeded. Otherwise failure.
    for (int i = patIdxStart; i <= patIdxEnd; i++) {
        if (pattern.charAt(i) != '*') {
            return false;
        }
    }
    return true;
}
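For comparison, the two wildcards documented above translate mechanically into a regular expression: '*' becomes `.*`, '?' becomes `.`, and every other character is quoted literally. The sketch below is an assumption about how existing wildcard patterns could be carried over if regex support were added; the class and method names (WildcardPatterns, toRegex) are hypothetical, and I have not verified that it agrees with the method above on every edge case.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring Batch.
final class WildcardPatterns {

    // Translates a pattern that uses only '*' and '?' into an equivalent java.util.regex.Pattern.
    public static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // zero or more characters
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                // Quote everything else so regex metacharacters are taken literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '.' match line terminators too, so '*' and '?' accept any character.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}
```

A translated pattern is used with `matcher(str).matches()`, which requires the whole string to match, the same way the wildcard method treats its input.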
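/**
 * Proposed, but possibly oversimplified, match functionality.
 *
 * @param regex regular expression to match against. Must not be <code>null</code>.
 * @param str string which must be matched against the regular expression.
 * Must not be <code>null</code>.
 * @return <code>true</code> if the entire string matches the regular
 * expression, or <code>false</code> otherwise.
 */
public static boolean matchUsingFullRegex(final String regex, final String str) {
    if (regex == null) {
        throw new NullPointerException("Regular expression cannot be null");
    }
    if (str == null) {
        throw new NullPointerException("String to match cannot be null");
    }
    // Requires: import java.util.regex.Pattern;
    return Pattern.matches(regex, str);
}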
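Since Pattern.matches recompiles the expression on every call, a per-line lookup would probably want to compile each expression once. Below is a minimal sketch of a regex-keyed lookup that returns the first registered delegate whose expression matches the whole line. RegexKeyedLookup and its methods are hypothetical names, not Spring Batch API, and the delegate type is left generic because how this would plug into PatternMatchingCompositeLineMapper is an open question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical class for illustration; not part of Spring Batch.
final class RegexKeyedLookup<T> {

    // Insertion order is preserved, so the first registered match wins.
    private final Map<Pattern, T> delegates = new LinkedHashMap<>();

    // Compiles the expression once and associates it with a delegate.
    public RegexKeyedLookup<T> register(String regex, T delegate) {
        delegates.put(Pattern.compile(regex), delegate);
        return this;
    }

    // Returns the delegate of the first expression that matches the entire line, if any.
    public Optional<T> find(String line) {
        for (Map.Entry<Pattern, T> entry : delegates.entrySet()) {
            if (entry.getKey().matcher(line).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

Registering more specific expressions first keeps the result predictable when several expressions could match the same line.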