Lugaru's Epsilon Programmer's Editor 14.04


Regular Expression Examples

  • The pattern if|else|for|do|while|switch specifies the set of statement keywords in C and EEL.

  • The pattern c[ad]+r specifies strings like "car", "cdr", "caadr", "caaadar". These correspond to compositions of the car and cdr Lisp operations.

  • The pattern c[ad][ad]?[ad]?[ad]?r specifies the strings that represent up to four compositions of car and cdr in Lisp.

  • The pattern [a-zA-Z]+ specifies the set of all sequences of one or more letters. The character class part denotes any upper- or lower-case letter, and the plus operator specifies one or more of those.

    Epsilon's commands to move by words accomplish their task by performing a regular expression search. They use a pattern similar to [a-zA-Z0-9_]+, which specifies one or more letters, digits, or underscore characters. (The actual pattern includes national characters as well.)

  • The pattern (<Newline>|<Return>|<Tab>|<Space>)+ specifies nonempty sequences of the whitespace characters newline, return, tab, and space. You could also write this pattern as <Newline|Return|Tab|Space>+ or as <Wspace|Return>+, using a character class name.

  • The pattern /%*.*%*/ specifies a set that includes all one-line C-language comments. The percent character quotes the first and third stars, so they refer to the star character itself. The middle star applies to the period, denoting zero or more occurrences of any character other than newline. Taken together, the pattern denotes the set of strings that begin with "slash star", followed by any number of non-newline characters, followed by "star slash". You can also write this pattern as /<Star>.*<Star>/.

  • The pattern /%*(.|<Newline>)*%*/ looks like the previous pattern, except that instead of ".", we have (.|<Newline>). So instead of "any character except newline", we have "any character except newline, or newline", or more simply, "any character at all". This set includes all C comments, with or without newlines in them. You could also write this as /%*<Any>*%*/.

  • The pattern <^digit|a-f> matches any character except one of these: 0123456789abcdef.

  • The pattern <alpha&!r&!x-z&!p:softdotted> matches all Latin letters except R, X, Y, Z, I and J (the latter two because the Unicode property SoftDotted, indicating a character with a dot that can be replaced by an accent, matches I and J). It also matches all non-Latin Unicode letters that don't have this property.
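
For readers who want to experiment with these patterns outside Epsilon, the sketch below restates several of them in Python's re syntax. The translations are assumptions rather than part of Epsilon: Python quotes a star with a backslash instead of a percent sign, writes character classes with square brackets, and uses its own matching rules, so results may differ from an Epsilon search. The helper name matches_whole is hypothetical.

```python
import re

# Assumed Python equivalents of some Epsilon patterns from the list above.
KEYWORDS = re.compile(r"if|else|for|do|while|switch")
CAR_CDR = re.compile(r"c[ad]+r")
CAR_CDR_UP_TO_4 = re.compile(r"c[ad][ad]?[ad]?[ad]?r")
ONE_LINE_COMMENT = re.compile(r"/\*.*\*/")    # Epsilon: /%*.*%*/
ANY_COMMENT = re.compile(r"/\*(.|\n)*\*/")    # Epsilon: /%*(.|<Newline>)*%*/
NOT_LOWER_HEX = re.compile(r"[^0-9a-f]")      # Epsilon: <^digit|a-f>


def matches_whole(pattern, text):
    """Return True if the entire text belongs to the set the pattern denotes."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ("car", "cdr", "caadr", "caaadar", "cr"):
        print(word, matches_whole(CAR_CDR, word))
```

With these definitions, matches_whole(ONE_LINE_COMMENT, text) rejects a comment that spans two lines, while matches_whole(ANY_COMMENT, text) accepts it, which mirrors the difference between the two comment patterns described above.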

An advanced example

Let's build a regular expression that includes precisely the set of legal strings in the C programming language. All C strings begin and end with double quote characters. The inside of the string denotes a sequence of characters. Most characters stand for themselves, but newline, double quote, and backslash must appear after a "quoting" backslash. Any other character may appear after a backslash as well.

We want to construct a pattern that generates the set of all possible C strings. To capture the idea that the pattern must begin and end with a double quote, we begin by writing

"something"

We still have to write the something part, to generate the inside of the C strings. We said that the inside of a C string consists of a sequence of characters. The star operator means "zero or more of something". That looks promising, so we write

"(something)*"

Now we need to come up with a something part that stands for an individual character in a C string. Recall that characters other than newline, double quote, and backslash stand for themselves. The pattern <^Newline|"|\> captures precisely those characters. In a C string, a "quoting" backslash must precede the special characters (newline, double quote, and backslash). In fact, a backslash may precede any character in a C string. The pattern \(.|<Newline>) means precisely "backslash followed by any character". Putting those together with the alternation operator (|), we get the pattern <^Newline|"|\>|\(.|<Newline>), which generates either a single "normal" character or any character preceded by a backslash. Substituting this pattern for the something yields

"(<^Newline|"|\>|\(.|<Newline>))*"

which represents precisely the set of legal C strings. In fact, if you type this pattern into a regex-search command (described below), Epsilon will find the next C string in the buffer.
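
The same construction can be tried outside Epsilon. The sketch below is an assumed translation of the final pattern into Python's re syntax, not Epsilon's own: the class <^Newline|"|\> becomes [^\n"\\], and the quoting backslash must itself be doubled because backslash is special in Python patterns. The function name find_c_strings is hypothetical.

```python
import re

# Assumed Python equivalent of "(<^Newline|"|\>|\(.|<Newline>))*":
# a double quote, then zero or more of either
#   - one character that is not newline, double quote, or backslash, or
#   - a backslash followed by any character at all (including newline),
# then a closing double quote.
C_STRING = re.compile(r'"([^\n"\\]|\\(.|\n))*"')


def find_c_strings(source):
    """Return every substring of source that matches the C string pattern."""
    return [m.group(0) for m in C_STRING.finditer(source)]


if __name__ == "__main__":
    sample = 'printf("a \\"quoted\\" word\\n"); puts("");'
    for literal in find_c_strings(sample):
        print(literal)
```

Like the Epsilon pattern, this sketch works purely on characters: it does not know about C comments or character constants, so a double quote inside either of those would still be treated as the start of a string.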





Lugaru Epsilon Programmer's Editor 14.04 manual. Copyright (C) 1984, 2021 by Lugaru Software Ltd. All rights reserved.