CONTENTS
  PREFACE
  COPYRIGHT AND LICENSE
  INTRODUCTION
    Installation
    Acknowledgements
    FAQ
  SYNTAX
    What is a regular expression?
    Perl5 regular expressions
  THE INTERFACES
    Pattern
    PatternCompiler
    PatternMatcher
    MatchResult
  THE CLASSES
    Perl5Pattern
    Perl5Compiler
    Perl5Matcher
    PatternMatcherInput
    Perl5StreamInput
    Util
    Perl5Debug
  SAMPLE PROGRAMS
    MatchResult example
    Difference between matches() and contains()
    Case sensitivity
    Searching an InputStream
    Splits
    Substitutions
  APPENDIX
    Package API reference (javadoc generated)
Syntax

It is beyond the scope of this guide to give a detailed explanation of regular expressions to beginners. The OROMatcher™ package is geared toward programmers who are already familiar with regular expressions, having used them in other languages, and who now want to apply them in their Java programs. However, we shall make a small attempt to cover the basics and summarize the Perl5 syntax supported by the OROMatcher™ Perl5 classes. For a detailed exploration of regular expressions for both beginners and advanced users, we recommend the book Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly & Associates.

What is a regular expression?

(Part of this discussion is based on page 94 of Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman.)

A regular expression is a pattern, written as a sequence of symbols, that represents a state machine or mini-program capable of matching particular sequences of characters. Regular expressions have their roots in lexical analysis and tokenization, where a set of lexemes had to be recognized before being passed on to a parser. Since then, regular expressions have taken on a life of their own, appearing in such languages as AWK, Tcl, and of course Perl, for all sorts of textual data extraction and manipulation purposes.

The most basic regular expression syntax consists of four operations. Let A and B each represent an alphabet (a set of characters), and let s and t represent members of those alphabets.

  A|B   (alternation, or union) matches any s or any t.
  AB    (concatenation) matches any s followed by any t, that is, st.
  A*    (Kleene closure) matches zero or more members of A in sequence.
  A+    (positive closure) matches one or more members of A in sequence.

Using this notation, you can define a regular expression for unsigned integers (strings of one or more digits) as follows:

  digit+

Here digit represents the set of characters 0 through 9. A range of characters like this can be represented in most regular expression languages as [0-9]. Because this is such a common expression, some languages, Perl among them, provide a special escape sequence for it: \d. The whole expression can then be written as [0-9]+ or \d+.
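
The Java sketch below exercises the basic operations and the different spellings of digit+. It uses the JDK's java.util.regex package purely as a stand-in, because it accepts the same Perl-style notation for these constructs; it is not OROMatcher, and the class and method names (PositiveIntegers, digitPlusMachine, fullMatch) are hypothetical. To echo the "state machine" description above, it also hand-codes digit+ as a two-state machine and checks that it agrees with the regex forms.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for OROMatcher here.
class PositiveIntegers {
    // digit+ as a hand-written two-state machine:
    // state 0 = nothing read yet, state 1 = at least one digit read (accepting).
    static boolean digitPlusMachine(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < '0' || c > '9') {
                return false; // no transition on a non-digit: reject
            }
            state = 1; // any digit moves to (or stays in) the accepting state
        }
        return state == 1;
    }

    // True if the regex matches the entire input, not just part of it.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The four basic operations, each applied to one-character alphabets.
        System.out.println(fullMatch("a|b", "b")); // alternation: true
        System.out.println(fullMatch("ab", "ab")); // concatenation: true
        System.out.println(fullMatch("a*", ""));   // Kleene closure allows zero a's: true
        System.out.println(fullMatch("a+", ""));   // positive closure needs at least one a: false

        // Three spellings of digit+ and the state machine agree on every input.
        String[] inputs = { "42", "007", "", "4x2" };
        for (String s : inputs) {
            System.out.println("\"" + s + "\": "
                + fullMatch("[0123456789]+", s) + " "
                + fullMatch("[0-9]+", s) + " "
                + fullMatch("\\d+", s) + " "
                + digitPlusMachine(s));
        }
    }
}
```

For "42" and "007" all four checks print true; for the empty string and "4x2" all four print false. Note that in Java source the backslash in \d must itself be escaped, giving "\\d+".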
Learning a new regular expression language is quite simple once you've learned one, because most of the operations are the same; only the notation changes.

Perl5 regular expressions

Here we summarize the syntax of Perl5 regular expressions, all of which is supported by the OROMatcher™ Perl5 classes. For a definitive reference, however, you should consult the perlre man page that accompanies the Perl5 distribution, and the book Programming Perl, 2nd Edition, from O'Reilly & Associates.
We need to point out that, for efficiency reasons, the character set operator [...] works only on 8-bit characters (Unicode characters 0 through 255, the ISO Latin-1 range), not on the full Unicode range. Other than that restriction, all Unicode characters should be usable in the package's regular expressions.
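
The text gives efficiency as the reason for the 0 through 255 limit but does not say how [...] is implemented. One common technique that produces exactly this kind of limit is a fixed 256-bit membership table, which makes testing a character a single array index and bit mask. The Java sketch below shows that idea; the class ByteCharClass is hypothetical, and this is an assumption for illustration, not a description of OROMatcher's internals.

```java
// Hypothetical sketch: a character class such as [a-z0-9] stored as a
// 256-bit table, one bit per character code 0..255. Not OROMatcher code.
class ByteCharClass {
    private final long[] bits = new long[4]; // 4 * 64 = 256 bits

    // Adds every character from lo to hi inclusive; codes above 255 do not fit.
    ByteCharClass addRange(char lo, char hi) {
        if (hi > 255) {
            throw new IllegalArgumentException("only codes 0..255 fit in the table");
        }
        for (int c = lo; c <= hi; c++) {
            bits[c >>> 6] |= 1L << (c & 63); // word index, then bit within the word
        }
        return this;
    }

    // Constant-time membership test: one array lookup and one bit mask.
    boolean contains(char c) {
        if (c > 255) {
            return false; // in this sketch, characters outside the table never match
        }
        return (bits[c >>> 6] & (1L << (c & 63))) != 0;
    }

    public static void main(String[] args) {
        ByteCharClass alnum = new ByteCharClass().addRange('a', 'z').addRange('0', '9');
        System.out.println(alnum.contains('q'));      // true
        System.out.println(alnum.contains('Q'));      // false
        System.out.println(alnum.contains('\u00e9')); // false: in range 0..255 but not added
        System.out.println(alnum.contains('\u4e2d')); // false: outside the table entirely
    }
}
```

A table covering all 65,536 UTF-16 code units would need 8 KB per character class instead of 32 bytes, which is one plausible reason an implementation might accept the 0 through 255 restriction.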
By default, a quantified subpattern is greedy: it matches as many times as possible while still allowing the rest of the pattern to match. To make a quantifier match the minimum number of times possible, again while still allowing the rest of the pattern to match, follow it immediately with a "?", as in *?, +?, ?? or {n,m}?.
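
The difference between greedy and minimal quantifiers is easiest to see on input where a closing delimiter occurs more than once. The Java sketch below again uses java.util.regex as a stand-in for OROMatcher (the greedy and "?" forms behave the same way in Perl5 syntax); the class GreedyVsReluctant and its helper firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for OROMatcher here.
class GreedyVsReluctant {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b> and <b>two</b>";
        // Greedy: .* first runs to the end of the input, then gives characters
        // back only until the last </b> lets the pattern finish.
        System.out.println(firstMatch("<b>.*</b>", html));  // <b>one</b> and <b>two</b>
        // Minimal: .*? takes as little as possible, so the first </b> ends the match.
        System.out.println(firstMatch("<b>.*?</b>", html)); // <b>one</b>
        // The same rule applies to counted quantifiers.
        System.out.println(firstMatch("a{2,4}", "aaaaa"));  // aaaa
        System.out.println(firstMatch("a{2,4}?", "aaaaa")); // aa
    }
}
```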
Perl5 extended regular expressions, the constructs written with the (?...) syntax, are also fully supported.
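
A few of the Perl5 (?...) constructs in action: non-capturing grouping (?:...), positive lookahead (?=...), negative lookahead (?!...), and the inline case-insensitivity modifier (?i). As in the earlier examples, this Java sketch runs them through java.util.regex as a stand-in for OROMatcher, and the class ExtendedConstructs and its helper found are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for OROMatcher here.
class ExtendedConstructs {
    // True if regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?:...) groups without capturing, so (c) is the only numbered group.
        Matcher m = Pattern.compile("(?:ab)+(c)").matcher("ababc");
        if (m.matches()) {
            System.out.println(m.groupCount() + " " + m.group(1)); // 1 c
        }
        // (?=...) requires "bar" to follow "foo" but does not consume it.
        Matcher la = Pattern.compile("foo(?=bar)").matcher("foobar");
        if (la.find()) {
            System.out.println(la.group()); // foo
        }
        // (?!...) requires that "bar" does NOT follow.
        System.out.println(found("foo(?!bar)", "foobar")); // false
        System.out.println(found("foo(?!bar)", "foobaz")); // true
        // (?i) turns on case-insensitive matching for the rest of the pattern.
        System.out.println(found("(?i)hello", "HeLLo"));   // true
    }
}
```

Perl's (?#text) comment construct is left out of this sketch because java.util.regex does not accept it.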
