Regular Expressions For Regular Folk

Flags

Flags (or “modifiers”) allow us to put regexes into different “modes”.

Flags are the letters that come after the final / in /pattern/, as in /pattern/gi.

Different engines support different flags. We’ll explore some of the most common flags here.

Global (g)

All examples thus far have had the global flag. When the global flag isn’t enabled, the regex doesn’t match anything beyond the first match.

/[aeiou]/g
  • 3 matches: corona
  • 2 matches: cancel
  • 0 matches: rhythm
/[aeiou]/
  • 1 match: corona
  • 1 match: cancel
  • 0 matches: rhythm
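The difference can be sketched in Python. Python has no g flag, so this sketch assumes that re.search (stop at the first match) stands in for the non-global regex and re.findall (collect every match) stands in for the global one. The helper names first_match and all_matches are made up for this illustration.

```python
import re

VOWEL = re.compile(r"[aeiou]")

def first_match(text):
    """Roughly /[aeiou]/ without g: return the first match, or None."""
    m = VOWEL.search(text)
    return m.group(0) if m else None

def all_matches(text):
    """Roughly /[aeiou]/g: return every non-overlapping match."""
    return VOWEL.findall(text)

for word in ("corona", "cancel", "rhythm"):
    print(word, first_match(word), all_matches(word))
```

In engines that do have the flag, such as JavaScript, the same choice is made by adding or dropping g on the regex itself.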

(Case) Insensitive (i)

As the name suggests, enabling this flag makes the regex case-insensitive in its matching.

/#[0-9A-F]{6}/i
  • 1 match: #AE25AE
  • 1 match: #663399
  • 1 match: Even #a2ca2c?
  • 0 matches: #FFF
/#[0-9A-F]{6}/
  • 1 match: #AE25AE
  • 1 match: #663399
  • 0 matches: Even #a2ca2c?
  • 0 matches: #FFF
/#[0-9A-Fa-f]{6}/
  • 1 match: #AE25AE
  • 1 match: #663399
  • 1 match: Even #a2ca2c?
  • 0 matches: #FFF
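Here is a small Python sketch of the same three patterns, assuming re.IGNORECASE as Python's equivalent of the i flag. SAMPLES and count_matches are names invented for this example.

```python
import re

SAMPLES = ["#AE25AE", "#663399", "Even #a2ca2c?", "#FFF"]

def count_matches(pattern, flags=0):
    """Return how many times `pattern` matches in each sample string."""
    regex = re.compile(pattern, flags)
    return [len(regex.findall(s)) for s in SAMPLES]

# Case-insensitive flag: lowercase hex digits are accepted.
insensitive = count_matches(r"#[0-9A-F]{6}", re.IGNORECASE)

# No flag: only uppercase A-F is accepted.
sensitive = count_matches(r"#[0-9A-F]{6}")

# No flag, but both cases spelled out in the character class.
explicit = count_matches(r"#[0-9A-Fa-f]{6}")

print(insensitive, sensitive, explicit)
```

Spelling out both cases in the character class gives the same counts as the flag here. The flag is simply shorter and applies to the whole pattern.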

Multiline (m)

Limited Support

In Ruby, the m flag does something else: it makes . match newlines. Ruby’s ^ and $ always match at line boundaries, with or without the flag.

The multiline flag changes how the anchors ^ and $ behave on “multiline” strings, that is, strings that include newlines (\n). By default, the anchors refer to the start and end of the whole string, so the regex /^foo$/ would match only the string "foo".

We might want it to match foo when it is in a line by itself in a multiline string.

Let’s take the string "bar\nfoo\nbaz" as an example:

bar
foo
baz

Without the multiline flag, the string above would be considered as a single line bar\nfoo\nbaz for matching purposes. The regex ^foo$ would thus not match anything.

With the multiline flag, the input would be considered as three “lines”: bar, foo, and baz. The regex ^foo$ would match the line in the middle—foo.
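The "bar\nfoo\nbaz" example can be tried out in Python, assuming re.MULTILINE as its spelling of the m flag. The variable names are only for illustration.

```python
import re

text = "bar\nfoo\nbaz"

# Without the flag, ^ and $ anchor to the whole string, so nothing matches.
without_m = re.search(r"^foo$", text)

# With the flag, ^ and $ also match at the start and end of each line.
with_m = re.search(r"^foo$", text, re.MULTILINE)

print(without_m)       # None
print(with_m.span())   # (4, 7), the middle line
```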

Dot-all (s)

Limited Support

JavaScript, prior to ES2018, did not support this flag. Ruby does not support the s flag; it uses m for the same purpose.

The . typically matches any character except newlines. With the dot-all flag, it matches newlines too.
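A minimal Python sketch of this behaviour, assuming re.DOTALL as Python's name for the s flag:

```python
import re

text = "foo\nbar"

# By default, . refuses to match the newline between foo and bar.
without_s = re.search(r"foo.bar", text)

# With the dot-all flag, . matches the newline as well.
with_s = re.search(r"foo.bar", text, re.DOTALL)

print(without_s)              # None
print(repr(with_s.group(0)))  # 'foo\nbar'
```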

Unicode (u)

In the presence of the u flag, the regex and the input string will be interpreted in a Unicode-aware way. The details of this are implementation-dependent.
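As one illustration of how implementation-dependent this is, take Python. It does not use a u flag for text patterns at all: they are Unicode-aware by default, and re.ASCII switches that off. The sketch below leans on that assumption, so it shows Python's behaviour only, not that of the u flag in other engines.

```python
import re

# "naïve café", written with escapes so the source stays ASCII-only.
text = "na\u00efve caf\u00e9"

# Default for str patterns in Python 3: \w covers Unicode letters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w only covers [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```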

Whitespace extended (x)

When this flag is set, whitespace in the pattern is ignored (unless escaped or in a character class). Additionally, characters following # on any line are ignored. This allows for comments and is useful when writing complex patterns.

Here’s an example from Advanced Examples, formatted to take advantage of the whitespace extended flag:

^                   # start of line
    (
        [+-]?       # sign
        (?=\.\d|\d) # need a digit next; a lone `.` fails
        (?:\d+)?    # integer part
        (?:\.?\d*)  # fraction part
    )
    (?:             # optional exponent part
        [eE]
        (
            [+-]?   # optional sign
            \d+     # power
        )
    )?
$                   # end of line
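The pattern above can be run as written in an engine that supports the flag. Here is a sketch in Python, assuming re.VERBOSE as its equivalent of x. NUMBER and parse are names invented for this example.

```python
import re

# The same pattern, compiled with the whitespace-extended flag.
NUMBER = re.compile(r"""
^                   # start of line
    (
        [+-]?       # sign
        (?=\.\d|\d) # don't match `.`
        (?:\d+)?    # integer part
        (?:\.?\d*)  # fraction part
    )
    (?:             # optional exponent part
        [eE]
        (
            [+-]?   # optional sign
            \d+     # power
        )
    )?
$                   # end of line
""", re.VERBOSE)

def parse(text):
    """Return (mantissa, exponent) if `text` matches the pattern, else None."""
    m = NUMBER.search(text)
    return (m.group(1), m.group(2)) if m else None

print(parse("3.14e-10"))  # ('3.14', '-10')
print(parse("-.5"))       # ('-.5', None)
print(parse("."))         # None
```

The whitespace and comments are there for the reader only. Without the flag, the same pattern would have to be written on one line with both removed.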