The Interfaces


OROMatcher™ defines a basic set of interfaces that its pattern matching classes implement. This makes it easier to add new pattern matching classes supporting different grammars. Basic use of the package consists of creating a PatternCompiler instance and using it to compile your regular expression into a Pattern instance. You then create a PatternMatcher instance to perform pattern searches using the Pattern instance. Pattern matches are accessed through MatchResult instances.

In a non-concurrent program you should only need one PatternCompiler and one PatternMatcher instance to compile all your regular expressions and do all your matching. It is wasteful to create a new PatternCompiler and PatternMatcher every time you need to compile a pattern or search for a match. In a concurrent program we recommend using separate PatternCompiler and PatternMatcher instances for each thread, because synchronization overhead is high in Java.
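The reuse advice above can be sketched with the JDK's java.util.regex classes, used here purely as a stand-in because they ship with Java; they are not the OROMatcher classes, and the class and method names (ReuseSketch, countLinesWithWord) are hypothetical. The sketch compiles the expression once and keeps one matcher per thread instead of creating new objects for every search.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: java.util.regex plays the role of the compiler/matcher pair.
class ReuseSketch {
    // Compiled once; compiling inside the loop would repeat the same work.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // One matcher per thread, so no locking is needed around its use.
    private static final ThreadLocal<Matcher> MATCHER =
        ThreadLocal.withInitial(() -> WORD.matcher(""));

    // Counts how many of the given lines contain at least one word.
    static int countLinesWithWord(String[] lines) {
        Matcher matcher = MATCHER.get();
        int count = 0;
        for (String line : lines) {
            // reset() points the existing matcher at a new input.
            if (matcher.reset(line).find()) {
                count++;
            }
        }
        return count;
    }
}
```

With OROMatcher the same shape would apply to a PatternCompiler and PatternMatcher pair held for the lifetime of the thread.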

The Pattern interface

The Pattern interface represents a compiled regular expression. The only operation exposed to the programmer is the getPattern() method, which retrieves the original string representation of the regular expression. Pattern implementations are not meant to be instantiated directly. They can only be created by the compile() method of a PatternCompiler. You pass a Pattern instance to a PatternMatcher's matches() or contains() method to look for pattern matches.

The Pattern interface allows multiple representations of a regular expression to be defined. In general, different regular expression compilers will produce different types of pattern representations. Some will produce state transition tables derived from syntax trees, others will produce byte code representations of an NFA, etc. The Pattern interface does not impose any specific internal pattern representation, and consequently, Pattern implementations are not meant to be interchangeable among differing PatternCompiler and PatternMatcher implementations. The documentation accompanying a specific implementation will define what other classes a Pattern can interact with.

The PatternCompiler interface

A PatternCompiler instance is used to compile the string representation (either as a String or char[]) of a regular expression into a Pattern instance. The Pattern can then be used in conjunction with the appropriate PatternMatcher instance to perform pattern searches. Specific PatternCompiler implementations such as Perl5Compiler may have variations of the compile() methods that take extra options affecting the compilation of a pattern. The compile() method will throw a MalformedPatternException if the regular expression to be compiled is invalid.
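As a rough illustration of handling an invalid expression at compile time, the sketch below uses java.util.regex from the JDK as a stand-in. There, PatternSyntaxException plays a role comparable to MalformedPatternException, and Pattern.pattern() is comparable to getPattern(). That correspondence is an assumption made for illustration, and CompileSketch and compileOrNull are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Stand-in sketch using JDK classes, not the OROMatcher compiler.
class CompileSketch {
    // Returns the compiled pattern, or null if the expression is invalid.
    static Pattern compileOrNull(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Report the problem instead of letting the exception escape.
            System.err.println("Bad pattern: " + e.getDescription());
            return null;
        }
    }

    public static void main(String[] args) {
        Pattern ok = compileOrNull("a+b");
        // The compiled object still exposes its original string form.
        System.out.println(ok.pattern());
        // An unclosed group is rejected by the compiler.
        System.out.println(compileOrNull("a(b") == null);
    }
}
```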

The PatternMatcher interface

The PatternMatcher interface defines the operations a regular expression matcher must implement. The interface itself does not restrict the types of Pattern implementations a matcher will accept, but typically a PatternMatcher instance will only recognize one specific type of Pattern. For example, Perl5Matcher only recognizes Perl5Pattern instances. None of the PatternMatcher methods are required to throw an exception when given an invalid pattern. This is done for efficiency reasons, although usually a ClassCastException will be thrown by the Java runtime system if you use the wrong Pattern implementation. It is the programmer's responsibility to use the correct Pattern instance with a given PatternMatcher instance. The current version of this package only contains the Perl5 suite of pattern matching classes, but future ones for other regular expression grammars may be added, and users may also create their own implementations of the provided interfaces. Therefore the programmer should be careful not to mismatch classes.
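The mismatch hazard can be shown with a self-contained sketch. Every type below (DemoPattern, DemoMatcher, LiteralPattern, PrefixPattern, LiteralMatcher, MismatchSketch) is hypothetical and exists only for illustration; none of them is part of the package. The matcher downcasts to its own pattern type without checking, so a foreign pattern fails with a ClassCastException from the runtime rather than a declared exception.

```java
// Hypothetical stand-in types; the names are illustrative only.
interface DemoPattern {
    String getPattern();
}

interface DemoMatcher {
    boolean matches(String input, DemoPattern pattern);
}

// A "literal" grammar: the pattern must equal the input exactly.
class LiteralPattern implements DemoPattern {
    final String text;
    LiteralPattern(String text) { this.text = text; }
    public String getPattern() { return text; }
}

// A different grammar with its own internal representation.
class PrefixPattern implements DemoPattern {
    final String prefix;
    PrefixPattern(String prefix) { this.prefix = prefix; }
    public String getPattern() { return prefix + "*"; }
}

class LiteralMatcher implements DemoMatcher {
    public boolean matches(String input, DemoPattern pattern) {
        // Unchecked downcast: skipping the instanceof test saves work,
        // but a foreign pattern type fails here at runtime.
        LiteralPattern p = (LiteralPattern) pattern;
        return p.text.equals(input);
    }
}

class MismatchSketch {
    // Returns true if the wrong pattern type raised ClassCastException.
    static boolean mismatchThrows() {
        DemoMatcher matcher = new LiteralMatcher();
        try {
            matcher.matches("abc", new PrefixPattern("a"));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Keeping each compiler, pattern, and matcher from the same family together, for example in one helper class, is one way to avoid this kind of mix-up.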

The PatternMatcher interface defines three main types of methods:

matches()
The matches() method tests if an entire string exactly matches a pattern.
contains()
The contains() method looks for the first pattern match somewhere inside a string. Used in conjunction with the PatternMatcherInput class, you can search an entire string for all of the matches occurring within it.
getMatch()
The getMatch() method returns a MatchResult instance containing the result of the match found by the last successful call to matches() or contains().

The matches() and contains() methods return true if they find a match, and false if they don't. Typically you will use the contains() method in a while loop in conjunction with a PatternMatcherInput instance to find all the pattern matches in an input string.
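The difference between a whole-input test and a repeated search can be sketched with java.util.regex, again only as a JDK stand-in. Its Matcher.matches() behaves like the matches() described above, and a while loop over Matcher.find() plays the role of looping over contains() with a PatternMatcherInput. FindAllSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using JDK classes, not the OROMatcher matcher.
class FindAllSketch {
    // Whole-input test: true only if the entire string fits the pattern.
    static boolean matchesWhole(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Repeated search: each find() resumes after the previous match,
    // so the loop visits every match in the input in order.
    static List<String> findAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

For the input "a1 b22 c333" and the expression \d+, the whole-input test fails because of the letters and spaces, while the loop collects "1", "22", and "333".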

The MatchResult interface

The MatchResult interface allows PatternMatcher implementors to return results storing match information in whatever format they like, while presenting a consistent way of accessing that information. A MatchResult instance contains a pattern match and its saved groups. You can access the entire match directly using the group(int) method with an argument of 0, or by the toString() method which is defined to return the same thing. Saved groups can be accessed by calling the group(int) method with the appropriate group index. It is also possible to obtain the beginning and ending offsets of a match relative to the input producing the match by using the beginOffset(int) and endOffset(int) methods. The begin(int) and end(int) methods are useful in some circumstances and return the begin and end offsets of the subgroups of a match relative to the beginning of the match.
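The two kinds of offsets can be illustrated with a small sketch built on java.util.regex as a stand-in. It assumes that Matcher.start(int), which is relative to the input, corresponds roughly to beginOffset(int), and that subtracting the start of the whole match gives a value comparable to begin(int). OffsetSketch and groupOffsets are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using JDK classes, not the MatchResult interface itself.
class OffsetSketch {
    // For the first match, returns { start of the group relative to the
    // input, start of the group relative to the match }, or null if the
    // pattern does not occur in the input.
    static int[] groupOffsets(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        int inputRelative = m.start(group);
        // Group 0 is the entire match, so its start is the match origin.
        int matchRelative = m.start(group) - m.start(0);
        return new int[] { inputRelative, matchRelative };
    }
}
```

For the input "id: ab-42" and the expression ([a-z]+)-(\d+), the match "ab-42" begins at index 4 of the input, so group 2 ("42") starts at index 7 of the input and at index 3 of the match.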

You should look at the matchResultExample.java example program to see how to use all of the MatchResult methods.


Copyright © 1997 ORO, Inc. All rights reserved. Original Reusable Objects, ORO, the ORO logo, and "Component software for the Internet" are trademarks or registered trademarks of ORO, Inc. in the United States and other countries.
Java is a trademark of Sun Microsystems. All other trademarks are the property of their respective holders.