Regular Expressions
Regex characters explained
A regular expression (Regex) is a pattern used to match character combinations in strings.
Character Classes
Characters | Meaning |
---|---|
[xyz] [a-z] | A character class. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character. |
[^xyz] [^a-c] | A negated or complemented character class. That is, it matches anything that is not enclosed in the square brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character. |
. | Matches any single character except line terminators (\n, \r, \u2028, or \u2029). Inside a character class, the dot loses its special meaning and matches a literal dot. |
\d | Matches any digit (Arabic numeral). Equivalent to [0-9]. |
\w | Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_]. |
\s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. |
\t | Matches a horizontal tab. |
\r | Matches a carriage return. |
\n | Matches a linefeed. |
\v | Matches a vertical tab. |
\f | Matches a form-feed. |
[\b] | Matches a backspace. |
\0 | Matches a NUL character. Do not follow this with another digit. |
\ | For characters that are usually treated literally, indicates that the next character is special. For characters that are usually treated specially such as '*', indicates that the next character should be interpreted literally. |
x|y | Matches either "x" or "y". Each component, separated by a pipe (|), is called an alternative. |
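To make the table above concrete, here is a minimal sketch using Python's `re` module. The choice of Python and the sample strings are assumptions, not part of the original notes. The `re.ASCII` flag is passed so that `\d` and `\w` behave like the ASCII equivalents listed above; without it, Python also matches Unicode digits and letters.

```python
import re

# [a-c] matches any one character in the range a..c.
in_range = re.findall(r"[a-c]", "abcdef")          # ['a', 'b', 'c']

# A hyphen placed first or last inside the brackets is a literal hyphen.
with_hyphen = re.findall(r"[a-]", "a-b")           # ['a', '-']

# [^a-c] matches any one character that is NOT in the range.
negated = re.findall(r"[^a-c]", "abcdef")          # ['d', 'e', 'f']

# Inside a character class the dot is a literal dot.
literal_dot = re.findall(r"[.]", "a.b")            # ['.']

# \d is a digit, \w is a letter, digit, or underscore, \s is white space.
digits = re.findall(r"\d", "a1b22", re.ASCII)              # ['1', '2', '2']
words = re.findall(r"\w+", "foo_bar baz-1", re.ASCII)      # ['foo_bar', 'baz', '1']
fields = re.split(r"\s+", "a \t b\nc")                     # ['a', 'b', 'c']

# x|y matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, cow")     # ['cat', 'dog']
```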
Assertions
Characters | Meaning |
---|---|
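^ | Matches the beginning of the input, or the beginning of each line in multiline mode. It matches a position rather than a character. Line-oriented tools such as grep test each line separately, so there it anchors to the start of the line. |
$ | Matches the end of the input, or the end of each line in multiline mode. It matches a position rather than a character. Line-oriented tools such as grep test each line separately, so there it anchors to the end of the line. |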
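The anchors are easiest to see on a multi-line string. The sketch below uses Python's `re` module as a stand-in (an assumption, since the notes do not name a language) and a made-up three-line sample. Note that in Python, without `re.MULTILINE`, `^` and `$` anchor to the whole string rather than to each line.

```python
import re

text = "cat\ncatalog\nbobcat"   # three lines, no trailing newline

# Without MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^cat", text)                    # ['cat']

# With MULTILINE, ^ and $ match at the start and end of every line.
line_starts = re.findall(r"^cat", text, re.MULTILINE)     # 'cat' and 'catalog'
line_ends = re.findall(r"cat$", text, re.MULTILINE)       # 'cat' and 'bobcat'

# Using both anchors requires the whole line to be exactly "cat".
whole_lines = re.findall(r"^cat$", text, re.MULTILINE)    # ['cat']
```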
Quantifiers
Characters | Meaning |
---|---|
x* | Matches the preceding item "x" 0 or more times. |
x+ | Matches the preceding item "x" 1 or more times. |
x? | Matches the preceding item "x" 0 or 1 times. |
x{n} | Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x". |
x{n,} | Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x". |
x{n,m} | Where "n" is 0 or a positive integer, "m" is a positive integer, and m ≥ n, matches at least "n" and at most "m" occurrences of the preceding item "x". |
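A quick way to compare the quantifiers is to test each one against strings of increasing length. This is a minimal sketch in Python (an assumed language); the helper `full_matches` and the sample list are hypothetical names introduced only for illustration. `re.fullmatch` is used so that the whole string must consist of the quantified item.

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = full_matches(r"a*")          # 0 or more: every sample, including ""
plus = full_matches(r"a+")          # 1 or more: everything except ""
optional = full_matches(r"a?")      # 0 or 1: "" and "a"
exactly = full_matches(r"a{2}")     # exactly 2: "aa"
at_least = full_matches(r"a{2,}")   # 2 or more: "aa", "aaa", "aaaa"
between = full_matches(r"a{2,3}")   # 2 to 3: "aa", "aaa"
same_bounds = full_matches(r"a{2,2}")  # equal bounds behave like a{2}
```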
Groups
Characters | Meaning |
---|---|
(x) | Capturing group: Matches x and remembers the match. For example, /(foo)/ matches and remembers "foo" in "foo bar". |
\n | Where "n" is a positive integer (as in \1 or \2): a backreference to the last substring matched by the nth capturing group in the regular expression (counting left parentheses). Not to be confused with the linefeed escape \n. For example, /apple(,)\sorange\1/ matches "apple, orange," in "apple, orange, cherry, peach". |
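The two examples from the table can be reproduced with Python's `re` module (an assumed language; the slash-delimited literals above are written in a JavaScript-like notation). The last line adds a doubled-word check, which is an illustrative extra and not taken from the notes.

```python
import re

# (foo) captures "foo"; group(1) returns what the first group remembered.
m = re.search(r"(foo)", "foo bar")
captured = m.group(1)                      # 'foo'

# \1 must repeat exactly what group 1 matched, here the comma.
m2 = re.search(r"apple(,)\sorange\1", "apple, orange, cherry, peach")
matched = m2.group(0)                      # 'apple, orange,'
comma = m2.group(1)                        # ','

# Illustrative extra: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "it was was late").group(1)   # 'was'
```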
Regex Examples
Using grep, a regular expression, and wc joined by a pipe, count how many words in the /usr/share/dict/words dictionary start with "anti" and end with "n".
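One possible shape for the pattern is `^anti.*n$`: anchor the prefix, allow anything in between, anchor the final letter. The sketch below checks that idea in Python on a small made-up word list (both the language and the list are assumptions; the exercise itself is meant to be solved with grep and wc on the real dictionary, so the count here is not the answer).

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["antigen", "antitoxin", "antic", "nation",
         "anti", "antiperspirant", "Antillean"]

pattern = re.compile(r"^anti.*n$")   # starts with "anti", ends with "n"

matching = [w for w in words if pattern.search(w)]
count = len(matching)                # plays the role of `wc -l`
```

The match is case-sensitive, so "Antillean" is excluded in this sketch.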
Using grep and a regular expression, find all the words in /usr/share/dict/words that start with "tele" and are exactly 7 characters long.
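Since "tele" already accounts for four characters, one way to express "exactly 7 characters" is to require exactly three more before the end of the line. A minimal Python sketch on a made-up word list (language and list are assumptions):

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["telecom", "telefax", "telephone", "teleost", "television", "teles"]

# "tele" + exactly three more characters, anchored at both ends.
pattern = re.compile(r"^tele.{3}$")

seven_letter = [w for w in words if pattern.search(w)]
```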
Use grep to find the words in /usr/share/dict/words that contain each of the vowels in alphabetical order (i.e., first an A, then an E, and so on). How many such words are there?
(you may include words with extra vowels such as adventitious)
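A pattern that places `.*` between the vowels enforces their order while allowing any other letters, including extra vowels, in between. The Python sketch below tries this on a made-up list (language and list are assumptions). It matches lowercase vowels only; a case-insensitive search would be needed to include capitalized words.

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["facetious", "abstemious", "adventitious", "education", "sequoia"]

# a, then e, then i, then o, then u, with anything in between.
pattern = re.compile(r"a.*e.*i.*o.*u")

in_order = [w for w in words if pattern.search(w)]
```

"education" and "sequoia" contain all five vowels but not in alphabetical order, so they are rejected.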
How many words contain the same two-character sequence three times, as in "interlinking" ("in") and "priestesses" ("es")? Use /usr/share/dict/words as your list of possible words and grep to find the answer.
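A capturing group plus two backreferences can express "the same two characters, three times". The sketch below assumes the occurrences need not be adjacent, and uses Python with a made-up word list (both are assumptions, not part of the exercise).

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["interlinking", "priestesses", "linking", "banana"]

# (..) captures any two characters; each \1 demands the same pair again.
pattern = re.compile(r"(..).*\1.*\1")

tripled = [w for w in words if pattern.search(w)]
```

"linking" and "banana" each repeat a pair only twice, so they do not match.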
How many words are five-character palindromes? A palindrome is a word spelled the same way forward and backward, such as "sagas". Use /usr/share/dict/words.
Hint: Use multiple groups and backreferences.
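Following the hint, one approach is to capture the first two characters in separate groups, allow any middle character, and then require the groups again in reverse order. A minimal Python sketch on a made-up list (language and list are assumptions):

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["sagas", "level", "radar", "hello", "noon", "racecar"]

# chars 1 and 2 are captured, char 3 is free, then group 2 and group 1 repeat.
pattern = re.compile(r"^(.)(.).\2\1$")

palindromes = [w for w in words if pattern.search(w)]
```

The anchors keep the match to exactly five characters, which is why "noon" and "racecar" are excluded even though they are palindromes.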
The word "minglingly" contains the same four-character sequence twice ("ingl"). How many such words are there that also begin with a lowercase "m"? Any four-character sequence may be the one repeated.
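This can be treated as two independent conditions, much like two grep commands joined by a pipe: the word starts with "m", and some four-character sequence occurs twice. The Python sketch below assumes the two occurrences need not be adjacent, and uses a made-up word list (language, list, and the two-step split are all assumptions).

```python
import re

# Hypothetical sample, standing in for /usr/share/dict/words.
words = ["minglingly", "murmuring", "Minglingly", "tinglingly", "mama"]

starts_with_m = re.compile(r"^m")          # lowercase "m" only
repeated_four = re.compile(r"(....).*\1")  # any four characters, seen twice

# Apply both filters in sequence, like `grep ... | grep ...`.
matching = [w for w in words
            if starts_with_m.search(w) and repeated_four.search(w)]
```

Keeping the two tests separate means the repeated sequence is free to include the leading "m" itself.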
Regex Practice Questions
Q1: Which of the following strings would match this regular expression:
[abc]{3}
Select one or more:
- cabin
- ABC
- cab
- yucca
- crab
- aaabbbccc
Q2: Which of these commands finds all words that contain two apostrophes?
Select one:
- grep '.*' /usr/share/dict/words
- grep \'.*\' /usr/share/dict/words
- grep /'.*/' /usr/share/dict/words
- grep \'{2} /usr/share/dict/words
Q3: Which grep switch searches for lines that do not match the given pattern?
Select one:
- -n
- -x
- -i
- -V
- -v
Q4: Which of the following strings would match this regular expression:
(a.c).*a.a\1
Select one or more:
- a5ca1aa5c
- a5cMAGICa1aa5c
- a5ca1aa5cmagic
- abc14azaabc
- gagc1aaaaga
Q5: Which of these commands will find all words that contain an "x" and a "y" anywhere in the word?
Select one or more:
- grep x /usr/share/dict/words | grep y
- grep -E 'x.*y | y.*x' /usr/share/dict/words
- grep [xy].*[xy] /usr/share/dict/words
- grep -E [xy]{2} /usr/share/dict/words