Searching Rules
Thus far, we have described regular expressions in terms of the abstract set of strings they generate. In this section, we discuss how Epsilon uses this abstract set when it does a regular expression search.
When you tell Epsilon to perform a forward regex search, it looks forward through the buffer for the first occurrence of a string contained in the generated set. If no such string exists in the buffer, the search fails.
There may exist several strings in the buffer that match a string in the generated set. Which one qualifies as the first one? By default, Epsilon picks the string in the buffer that begins before any of the others. If there exist two or more matches in the buffer that begin at the same place, Epsilon by default picks the longest one. We call this a first-beginning, longest match. For example, suppose you position point at the beginning of the following line,
then do a regex search for the pattern
Here, the underlined sections indicate portions of the buffer that match the description "s followed by a sequence of letters". We could identify 31 different occurrences of such strings on this line. Epsilon picks a match that begins first, and among those, a match that has maximum length. In our example, then, Epsilon would pick the following match:
since it begins as soon as possible, and goes on for as long as possible. The search would position point after the final "s" in "sessions".
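The first-beginning, longest rule can be sketched in Python by enumerating every span of the line that the pattern matches and then choosing among them. This is only an illustration of the selection rule, not how Epsilon implements its search. The sample line, the pattern s[a-z]* (written in Python's regex syntax, not Epsilon's), and the function names are assumptions chosen to fit the description "s followed by a sequence of letters".

```python
import re

def all_matches(pattern, text):
    """Return every (start, end) span of text that the pattern matches exactly."""
    compiled = re.compile(pattern)
    spans = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            # fullmatch succeeds only if the pattern covers text[start:end] entirely
            if compiled.fullmatch(text, start, end):
                spans.append((start, end))
    return spans

def first_beginning_longest(pattern, text):
    """Pick the match that begins first; among those, the longest one."""
    spans = all_matches(pattern, text)
    if not spans:
        return None  # the search fails
    return min(spans, key=lambda span: (span[0], -span[1]))

# Hypothetical sample line; the search starts at the beginning of the line.
line = "when to the sessions of sweet silent thought"
span = first_beginning_longest(r"s[a-z]*", line)
print(line[span[0]:span[1]])  # the chosen match
print(span[1])                # where point would end up
```

With this sample line the chosen match is "sessions", and the end of the span is the position just after its final "s".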
In addition to the default first-beginning, longest match searching, Epsilon provides three other regex search modes. You can specify first-beginning or first-ending searches. For each of these, you can specify shortest or longest matches. Suppose, with point positioned at the beginning of the following line
you did a regex search with the pattern
By default, Epsilon uses first-beginning, longest matching. You can
include directives in the pattern itself to tell Epsilon to use one
of the other techniques. If you include the directive
You can change Epsilon's default regex searching mode. To make Epsilon use, by default, shortest matches, set the variable regex-shortest to a nonzero value. To specify first-ending searches, set the variable regex-first-end to a nonzero value. (Examples of regular expression searching in this documentation assume the default settings.)
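As a rough model of the four modes, the following Python sketch enumerates all matching spans and selects one according to two flags. The parameters shortest and first_end are hypothetical and merely mirror the roles of regex-shortest and regex-first-end; they are not Epsilon's API. The sample line and pattern are likewise assumptions. The sketch also assumes that a first-ending search picks the match that ends earliest and then breaks ties by length.

```python
import re

def regex_search(pattern, text, shortest=False, first_end=False):
    """Return the matched text chosen by the given mode, or None if no match."""
    compiled = re.compile(pattern)
    spans = [
        (begin, end)
        for begin in range(len(text) + 1)
        for end in range(begin, len(text) + 1)
        if compiled.fullmatch(text, begin, end)
    ]
    if not spans:
        return None
    if first_end:
        # Earliest end wins; a later beginning means a shorter match.
        key = lambda s: (s[1], -s[0] if shortest else s[0])
    else:
        # Earliest beginning wins; a later end means a longer match.
        key = lambda s: (s[0], s[1] if shortest else -s[1])
    begin, end = min(spans, key=key)
    return text[begin:end]

# Hypothetical sample line and pattern (Python syntax).
line = "I summon up remembrance of things past"
pattern = r"m.*c|I.*t"

for first_end in (False, True):
    for shortest in (False, True):
        result = regex_search(pattern, line, shortest=shortest, first_end=first_end)
        print(f"first_end={first_end!s:5} shortest={shortest!s:5} -> {result!r}")
```

On this sample the four modes select four different strings: the whole line (first-beginning, longest), "I summon up remembrance of t" (first-beginning, shortest), "mmon up remembranc" (first-ending, longest), and "mbranc" (first-ending, shortest).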
When Epsilon finds a regex match, it sets point to the end of the
match. It also sets the variables matchstart and
matchend to the beginning and end, respectively, of the match.
You can change what Epsilon considers the end of the match
using the "!" directive. For example, if you searched for
Without the "!" directive, the match would consist of the letters "I sought", but because of the "!" directive, the match consists of only the indicated section of the line. Notice that the first three characters of the line also consist of "I s", but Epsilon does not count that as a match. A complete match must first exist in the buffer; only then does Epsilon set point and matchend according to any "!" directive.
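Python's regex syntax has no "!" directive, but a lookahead gives a comparable effect: the rest of the pattern must still be present, yet the reported match ends earlier. The sketch below is an analogy only. The sample line is an assumption, and the Python names matchstart and matchend are used here simply to echo the variables described above.

```python
import re

# Hypothetical sample line that begins with "I s" but contains "I sought" later.
line = "I sigh the lack of many a thing I sought"

# "I s" must be followed by "ought", but "ought" is not part of the match.
match = re.search(r"I s(?=ought)", line)
matchstart, matchend = match.span()

print(matchstart, matchend)
print(repr(line[matchstart:matchend]))
```

The "I s" at the start of the line is skipped because "ought" does not follow it, so the reported span lies inside the later "I sought".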
Overgenerating regex sets
You can use Epsilon's regex search modes to simplify patterns that you write. You can sometimes write a pattern that includes more strings than you really want, and rely on a regex search mode to cut out strings that you don't want.
For example, recall the earlier example of
In C, a comment begins with
In this example, you could have written a more complicated regular expression that generated precisely the set of legal C comments, but this pattern proves easier to write.
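The same idea can be illustrated in Python, where a lazy quantifier stands in for a shortest-match mode. The pattern below deliberately overgenerates: it describes "/*", then anything at all, then "*/", which also covers strings that span several comments. The source string is made up, and the patterns are written in Python's syntax rather than Epsilon's.

```python
import re

# Hypothetical fragment of C source containing two comments.
source = "x = 1; /* first */ y = 2; /* second */"

# Longest match: runs from the first "/*" to the last "*/".
longest = re.search(r"/\*.*\*/", source, re.DOTALL)

# Shortest match: stops at the first "*/", so it is exactly one comment.
shortest = re.search(r"/\*.*?\*/", source, re.DOTALL)

print(repr(longest.group()))
print(repr(shortest.group()))
```

Choosing the shorter match discards the unwanted strings, so the simple pattern is enough to pick out a single comment without describing the set of legal comments precisely.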