Lugaru's Epsilon Programmer's Editor

Searching Rules

Thus far, we have described regular expressions in terms of the abstract set of strings they generate. In this section, we discuss how Epsilon uses this abstract set when it does a regular expression search.

When you tell Epsilon to perform a forward regex search, it looks forward through the buffer for the first occurrence of a string contained in the generated set. If no such string exists in the buffer, the search fails.

There may exist several strings in the buffer that match a string in the generated set. Which one qualifies as the first one? By default, Epsilon picks the string in the buffer that begins before any of the others. If there exist two or more matches in the buffer that begin at the same place, Epsilon by default picks the longest one. We call this a first-beginning, longest match. For example, suppose you position point at the beginning of the following line,

When to the sessions of sweet silent thought

then do a regex search for the pattern s[a-z]*. That pattern describes the set of strings that start with "s", followed by zero or more letters. We can find quite a few strings on this line that match that description. Among them:

 When to the sessions of sweet silent thought
             ^
 When to the sessions of sweet silent thought
             ^^^^
 When to the sessions of sweet silent thought
             ^^^^^^^^
 When to the sessions of sweet silent thought
                ^^^^^
 When to the sessions of sweet silent thought
                         ^^^^^
 When to the sessions of sweet silent thought
                               ^^^

Here, the underlined sections indicate portions of the buffer that match the description "s followed by a sequence of letters". We could identify 31 different occurrences of such strings on this line. Epsilon picks a match that begins first, and among those, a match that has maximum length. In our example, then, Epsilon would pick the following match:

When to the sessions of sweet silent thought
            ^^^^^^^^

since it begins as soon as possible, and goes on for as long as possible. The search would position point after the final "s" in "sessions".
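
The selection rule can be modeled outside Epsilon. The following Python sketch is only an illustration, not Epsilon's implementation: it tests every stretch of the line against the pattern with Python's re module (whose syntax happens to coincide with Epsilon's for this simple pattern), then applies the first-beginning, longest rule. The helper names all_matches and first_beginning_longest are made up for this example.

```python
import re

LINE = "When to the sessions of sweet silent thought"
PATTERN = re.compile(r"s[a-z]*")


def all_matches(pattern, text):
    """Return every (start, end) span of text that the pattern generates."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests the characters from start up to end.
            if pattern.fullmatch(text, start, end):
                spans.append((start, end))
    return spans


def first_beginning_longest(spans):
    """Pick the earliest start; break ties by taking the longest span."""
    return min(spans, key=lambda span: (span[0], -(span[1] - span[0])))


SPANS = all_matches(PATTERN, LINE)
START, END = first_beginning_longest(SPANS)

print(len(SPANS))       # 31
print(LINE[START:END])  # sessions
print(END)              # 20, the offset where point would land
```

The brute-force enumeration is quadratic in the line length and is meant only to make the "generated set" idea concrete; a real search engine would not work this way.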

In addition to the default first-beginning, longest match searching, Epsilon provides three other regex search modes. You can specify first-beginning or first-ending searches. For each of these, you can specify shortest or longest matches. Suppose, with point positioned at the beginning of the following line

I summon up remembrance of things past,

you did a regex search with the pattern m.*c|I.*t. Depending on which regex mode you chose, you would get one of the four following matches:

 I summon up remembrance of things past,  (first-ending shortest)
                 ^^^^^^
 I summon up remembrance of things past,  (first-ending longest)
     ^^^^^^^^^^^^^^^^^^
 I summon up remembrance of things past,  (first-beginning shortest)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 I summon up remembrance of things past,  (first-beginning longest)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
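
As a rough model of the four modes, the Python sketch below enumerates every span of the line that the pattern generates and then ranks the spans by start or end position and by length. It illustrates the selection rules only and assumes nothing about how Epsilon actually searches; the names all_matches and pick are hypothetical.

```python
import re

LINE = "I summon up remembrance of things past,"
PATTERN = re.compile(r"m.*c|I.*t")


def all_matches(pattern, text):
    """Return every (start, end) span of text that the pattern generates."""
    return [
        (start, end)
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
        if pattern.fullmatch(text, start, end)
    ]


def pick(spans, first_end=False, shortest=False):
    """Choose one span according to the search mode; None means the search fails."""
    if not spans:
        return None

    def rank(span):
        start, end = span
        length = end - start
        # Primary key: earliest end (first-ending) or earliest start (first-beginning).
        # Secondary key: smallest length (shortest) or largest length (longest).
        return (end if first_end else start, length if shortest else -length)

    return min(spans, key=rank)


SPANS = all_matches(PATTERN, LINE)
RESULTS = {}
for first_end in (True, False):
    for shortest in (True, False):
        start, end = pick(SPANS, first_end=first_end, shortest=shortest)
        RESULTS[(first_end, shortest)] = LINE[start:end]

print(RESULTS[(True, True)])    # mbranc
print(RESULTS[(True, False)])   # mmon up remembranc
print(RESULTS[(False, True)])   # I summon up remembrance of t
print(RESULTS[(False, False)])  # I summon up remembrance of things past
```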

By default, Epsilon uses first-beginning, longest matching. You can include directives in the pattern itself to tell Epsilon to use one of the other techniques. If you include the directive <Min> anywhere in the pattern, Epsilon will use shortest-matching instead of longest-matching. Putting <FirstEnd> selects first-ending instead of first-beginning. You can also put <Max> for longest-matching, and <FirstBegin> for first-beginning. These last two might come in handy if you've changed Epsilon's default regex mode. The sequences <FE> and <FB> provide shorthand equivalents for <FirstEnd> and <FirstBegin>, respectively. As an example, you could use the following patterns to select each of the matches listed in the previous example:

 <FE><Min>m.*c|I.*t                          (first-ending shortest)
 <FE><Max>m.*c|I.*t  or  <FE>m.*c|I.*t       (first-ending longest)
 <FB><Min>m.*c|I.*t  or  <Min>m.*c|I.*t      (first-beginning shortest)
 <FB><Max>m.*c|I.*t  or  m.*c|I.*t           (first-beginning longest)
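
To make the relationship between directives and modes concrete, here is a small Python sketch that strips the six directives from a pattern string and reports the resulting mode. It is a toy model, not Epsilon's parser. The function name split_directives and the mode dictionary are hypothetical, and two behaviors are assumptions of this sketch rather than documented facts: directive names are matched exactly as spelled above, and when conflicting directives appear, the last one wins.

```python
import re

# Each directive sets one of two mode flags (names chosen for this sketch).
DIRECTIVES = {
    "<Min>": ("shortest", True),
    "<Max>": ("shortest", False),
    "<FirstEnd>": ("first_end", True),
    "<FE>": ("first_end", True),
    "<FirstBegin>": ("first_end", False),
    "<FB>": ("first_end", False),
}
TOKEN = re.compile("|".join(re.escape(name) for name in DIRECTIVES))


def split_directives(pattern):
    """Return (pattern without directives, mode flags)."""
    # Defaults model first-beginning, longest matching.
    mode = {"first_end": False, "shortest": False}
    for token in TOKEN.findall(pattern):
        flag, value = DIRECTIVES[token]
        mode[flag] = value  # assumption: a later directive overrides an earlier one
    return TOKEN.sub("", pattern), mode


print(split_directives("<FE><Min>m.*c|I.*t"))
# ('m.*c|I.*t', {'first_end': True, 'shortest': True})
print(split_directives("m.*c|I.*t"))
# ('m.*c|I.*t', {'first_end': False, 'shortest': False})
```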

You can change Epsilon's default regex searching mode. To make Epsilon use shortest-match searches by default, set the variable regex-shortest to a nonzero value. To make it use first-ending searches by default, set the variable regex-first-end to a nonzero value. (Examples of regular expression searching in this documentation assume the default settings.)

When Epsilon finds a regex match, it sets point to the end of the match. It also sets the variables matchstart and matchend to the beginning and end, respectively, of the match. You can change what Epsilon considers the end of the match using the "!" directive. For example, if you searched for "I s!ought" in the following line, Epsilon would match the underlined section:

I sigh the lack of many a thing I sought,
                                ^^^

Without the "!" directive, the match would consist of the letters "I sought", but because of the "!" directive, the match consists of only the indicated section of the line. Notice that the first three characters of the line also consist of "I s", but Epsilon does not count that as a match. There must first exist a complete match in the buffer. If so, Epsilon will then set point and matchend according to any "!" directive.
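
Python's re module has no "!" directive, but a lookahead assertion gives a comparable result and may help readers who know that syntax. In the sketch below, the text after the "!" becomes a lookahead: it must be present for the match to succeed, yet the reported end of the match stops before it. The names MATCHSTART and MATCHEND only mirror Epsilon's matchstart and matchend variables for readability; this is an analogy, not a description of Epsilon's internals.

```python
import re

LINE = "I sigh the lack of many a thing I sought,"

# "I s" must be followed by "ought", but "ought" is not part of the match.
MATCH = re.search(r"I s(?=ought)", LINE)
MATCHSTART, MATCHEND = MATCH.span()

print(MATCHSTART, MATCHEND)       # 32 35
print(LINE[MATCHSTART:MATCHEND])  # I s
```

The "I s" at the start of the line is skipped here for the same reason as in Epsilon: the complete text "I sought" does not occur there.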

Overgenerating regex sets

You can use Epsilon's regex search modes to simplify patterns that you write. You can sometimes write a pattern that includes more strings than you really want, and rely on a regex search mode to cut out strings that you don't want.

For example, recall the earlier example of /%*(.|<Newline>)*%*/. This pattern generates the set of all strings that begin with /* and end with */. This set includes all the C-language comments, but it includes some additional strings as well. It includes, for example, the following illegal C comment:

/* inside /* still inside */ outside */

In C, a comment begins with /* and ends with the very next occurrence of */. You can effectively get that by modifying the above pattern to specify a first-ending, longest match, with <FE><Max>/%*(.|<Newline>)*%*/. It would match:

/* inside /* still inside */ outside */

In this example, you could have written a more complicated regular expression that generated precisely the set of legal C comments, but this pattern proves easier to write.
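
The same idea can be tried in Python, with one caveat: Python has no first-ending mode, so the sketch below substitutes a lazy quantifier (*?). That is a different mechanism, although it happens to select the same text for this input. The greedy pattern corresponds to the over-generating pattern under default longest matching.

```python
import re

TEXT = "/* inside /* still inside */ outside */"

# Over-generating pattern with greedy (longest) matching: runs to the last "*/".
GREEDY = re.search(r"/\*(?:.|\n)*\*/", TEXT).group(0)

# Same pattern with a lazy quantifier: stops at the very next "*/".
LAZY = re.search(r"/\*(?:.|\n)*?\*/", TEXT).group(0)

print(GREEDY)  # /* inside /* still inside */ outside */
print(LAZY)    # /* inside /* still inside */
```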





Lugaru Copyright (C) 1984, 2012 Lugaru Software Ltd. All Rights Reserved.