
8. Regular expressions and substitute replacement patterns

8.1. Regular expressions

A regular expression specifies a set of strings of characters. A member of this set of strings is said to be matched by the regular expression. Ex remembers two previous regular expressions: the previous regular expression used in a substitute command, and the previous regular expression used elsewhere (referred to as the previous scanning regular expression). The previous regular expression can always be referred to by a null regular expression, e.g. `//' or `??'.

8.2. Magic and nomagic

The regular expressions allowed by ex are constructed in one of two ways depending on the setting of the magic option. The ex and vi default setting of magic gives quick access to a powerful set of regular expression metacharacters. The disadvantage of magic is that the user must remember that these metacharacters are magic and precede them with the character `\' to use them as ``ordinary'' characters. With nomagic, the default for edit, regular expressions are much simpler, there being only two metacharacters. The power of the other metacharacters is still available by preceding the (now) ordinary character with a `\'. Note that `\' is thus always a metacharacter.

The remainder of the discussion of regular expressions assumes that the setting of this option is magic.
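The relationship between the two settings can be sketched as a mechanical rewrite. The Python function below is a minimal illustration, not part of ex: the name nomagic_to_magic is hypothetical, and the set of characters that lose their special meaning under nomagic (`.', `*', `[' and `~') is an assumption drawn from the metacharacter list in section 8.3. A bracketed class, once opened, is copied through unchanged.

```python
# Characters assumed to be ordinary under nomagic unless preceded by `\'.
TOGGLED = set(".*[~")


def nomagic_to_magic(pattern: str) -> str:
    """Rewrite a nomagic-style pattern into the equivalent magic-style one."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            i += 2
            if nxt in TOGGLED:
                # `\.' under nomagic is the metacharacter `.' under magic.
                out.append(nxt)
                if nxt == "[":
                    # Copy the class body verbatim up to the closing `]'.
                    while i < n:
                        c = pattern[i]
                        out.append(c)
                        i += 1
                        if c == "\\" and i < n:
                            out.append(pattern[i])
                            i += 1
                        elif c == "]":
                            break
            else:
                # Any other escape (for example `\<') is kept as written.
                out.append("\\" + nxt)
        elif ch in TOGGLED:
            # An ordinary character under nomagic must be escaped under magic.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the nomagic pattern `a.b\.*' (a literal dot, then any character, then a literal star) becomes `a\.b.\*' under magic, while `^foo$' is the same in both settings.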

8.3. Basic regular expression summary

The following basic constructs are used to construct magic mode regular expressions.

char: An ordinary character matches itself. The characters `^' at the beginning of a line, `$' at the end of line, `*' as any character other than the first, `.', `\', `[', and `~' are not ordinary characters and must be escaped (preceded) by `\' to be treated as such.
^: At the beginning of a pattern forces the match to succeed only at the beginning of a line.
$: At the end of a regular expression forces the match to succeed only at the end of the line.
.: Matches any single character except the new-line character.
\<: Forces the match to occur only at the beginning of a ``variable'' or ``word''; that is, either at the beginning of a line, or just before a letter, digit, or underline and after a character not one of these.
\>: Similar to `\<', but matching the end of a ``variable'' or ``word'', i.e. either the end of the line or before a character which is neither a letter, nor a digit, nor the underline character.
[string]: Matches any (single) character in the class defined by string. Most characters in string define themselves. A pair of characters separated by `-' in string defines the set of characters collating between the specified lower and upper bounds, thus `[a-z]' as a regular expression matches any (single) lower-case letter. If the first character of string is an `^' then the construct matches those characters which it otherwise would not; thus `[^a-z]' matches anything but a lower-case letter (and of course a newline). To place any of the characters `^', `[', or `-' in string you must escape them with a preceding `\'.
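
The constructs above can be tried out with Python's re module, which shares `^', `$', `.', and bracketed classes with ex. This is only an approximation of ex behavior: Python has no `\<' or `\>', so the sketch emulates them with lookaround assertions, and the names WORD_START and WORD_END are introduced here for illustration.

```python
import re

# Emulations of `\<' and `\>' as described above: a position between a
# non-word character (or a line edge) and a letter, digit, or underline.
WORD_START = r"(?<![A-Za-z0-9_])(?=[A-Za-z0-9_])"
WORD_END = r"(?<=[A-Za-z0-9_])(?![A-Za-z0-9_])"

line = "count = count2 + 10"

# `^' and `$' anchor the match to the beginning and end of the line.
starts_with_count = re.search(r"^count", line) is not None
ends_with_10 = re.search(r"10$", line) is not None

# `.' matches any single character.
dot_matches = re.findall(r"c.unt", line)

# `[0-9]' matches one character from the class; `[^a-z]' negates a class.
digits = re.findall(r"[0-9]", line)
only_lower = re.sub(r"[^a-z]", "", line)

# Word anchors keep `count' from matching inside `count2'.
whole_words = [m.start() for m in re.finditer(WORD_START + "count" + WORD_END, line)]

# A metacharacter preceded by `\' is treated as an ordinary character.
literal_dot_at = re.search(r"\.", "a.b").start()
```

Here whole_words holds only the offset of the first `count', because the second occurrence is followed by a digit and so does not end a word.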

8.4. Combining regular expression primitives

The concatenation of two regular expressions matches the leftmost and then longest string which can be divided with the first piece matching the first regular expression and the second piece matching the second. Any of the (single character matching) regular expressions mentioned above may be followed by the character `*' to form a regular expression which matches any number of adjacent occurrences (including 0) of characters matched by the regular expression it follows.
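
The leftmost-then-longest rule and the meaning of `*' can be illustrated with Python's re module. This is a sketch by analogy, not ex itself: Python's matcher is leftmost and greedy, which agrees with leftmost-then-longest for simple patterns such as the ones below.

```python
import re

text = "xabbbabbbbb"

# The leftmost match wins even though a longer run of b's appears later;
# at that starting position the longest possible string is taken.
m = re.search(r"ab*", text)
first = (m.start(), m.group())

# `*' allows zero occurrences, so `b*' matches the empty string at offset 0.
z = re.search(r"b*", text)
zero_match = (z.start(), z.group())

# Scanning on from each match finds every run in turn.
runs = re.findall(r"ab*", text)
```

After this runs, first is (1, 'abbb'), zero_match is (0, ''), and runs is ['abbb', 'abbbbb'].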

The character `~' may be used in a regular expression, and matches the text which defined the replacement part of the last substitute command. A regular expression may be enclosed between the sequences `\(' and `\)' with side effects in the substitute replacement patterns.

8.5. Substitute replacement patterns

The basic metacharacters for the replacement pattern are `&' and `~'; these are given as `\&' and `\~' when nomagic is set. Each instance of `&' is replaced by the characters which the regular expression matched. The metacharacter `~' stands, in the replacement pattern, for the defining text of the previous replacement pattern.

Other metasequences possible in the replacement pattern are always introduced by the escaping character `\'. The sequence `\n' is replaced by the text matched by the n-th regular subexpression enclosed between `\(' and `\)'. The sequences `\u' and `\l' cause the immediately following character in the replacement to be converted to upper- or lower-case respectively if this character is a letter. The sequences `\U' and `\L' turn such conversion on, either until `\E' or `\e' is encountered, or until the end of the replacement pattern.
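
The replacement rules of this section can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not the ex implementation: the function names are hypothetical, the search pattern uses Python syntax (so subexpressions are written with plain parentheses rather than `\(' and `\)'), only the first match on the line is replaced, and `\u' or `\l' is assumed to act on the first character of whatever text follows, including text produced by `&' or `\n'.

```python
import re


def splice_previous(repl: str, prev_repl: str) -> str:
    """Replace each unescaped `~' with the previous replacement pattern."""
    out, i = [], 0
    while i < len(repl):
        if repl[i] == "\\" and i + 1 < len(repl):
            out.append(repl[i:i + 2])  # keep escapes intact for expand()
            i += 2
        elif repl[i] == "~":
            out.append(prev_repl)
            i += 1
        else:
            out.append(repl[i])
            i += 1
    return "".join(out)


def expand(repl: str, match: re.Match) -> str:
    """Interpret `&', `\\n', `\\u', `\\l', `\\U', `\\L', `\\E' and `\\e'."""
    out = []
    one_shot = None  # pending `\u' or `\l'
    mode = None      # active `\U' or `\L'

    def emit(text: str) -> None:
        nonlocal one_shot
        if mode:
            text = mode(text)
        if one_shot and text:
            text = one_shot(text[0]) + text[1:]
            one_shot = None
        out.append(text)

    i = 0
    while i < len(repl):
        c = repl[i]
        if c == "&":
            emit(match.group(0))
            i += 1
        elif c == "\\" and i + 1 < len(repl):
            n = repl[i + 1]
            i += 2
            if n in "123456789":
                emit(match.group(int(n)) or "")
            elif n == "u":
                one_shot = str.upper
            elif n == "l":
                one_shot = str.lower
            elif n == "U":
                mode = str.upper
            elif n == "L":
                mode = str.lower
            elif n in "Ee":
                mode = None
            else:
                emit(n)  # escaped ordinary character, e.g. `\&'
        else:
            emit(c)
            i += 1
    return "".join(out)


def ex_substitute(pattern: str, repl: str, line: str, prev_repl: str = "") -> str:
    """Replace the first match of pattern in line, ex-style."""
    m = re.search(pattern, line)
    if m is None:
        return line
    return line[:m.start()] + expand(splice_previous(repl, prev_repl), m) + line[m.end():]
```

As a usage example, ex_substitute(r"(\w+) (\w+)", r"\U\1\E \2", "hello world") gives `HELLO world', and ex_substitute(r"world", r"\u&", "hello world") gives `hello World'.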