Regular Expressions For Regular Folk

Repetition

Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.

Making things optional

We can make parts of a regex optional using the ? operator.

/a?/g
  • 1 match: (empty string)
  • 2 matches: a
  • 3 matches: aa
  • 4 matches: aaa
  • 5 matches: aaaa
  • 6 matches: aaaaa
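
The counts above may look surprising at first: a? also matches the empty string, so once the last a has been consumed there is still one empty match left at the end of the input. Here is a minimal sketch of this, assuming Python's built-in re module as the engine (the notation in this chapter is not tied to any one language) and re.findall as a stand-in for the g flag. The name optional_matches is made up for this illustration.

```python
import re

def optional_matches(text):
    """Return every match of a? in text, roughly as the g flag would."""
    # Assumption: Python 3.7+ rules for empty matches.
    return re.findall(r"a?", text)

for text in ["", "a", "aa"]:
    print(repr(text), optional_matches(text))
# prints:
# '' ['']
# 'a' ['a', '']
# 'aa' ['a', 'a', '']
```

Each input of n characters therefore yields n + 1 matches: one per a, plus the trailing empty match.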

Here’s another example:

/https?/g
  • 1 match: http
  • 1 match: https
  • 1 match: http/2
  • 1 match: shttp
  • 0 matches: ftp

Here the s following http is optional.

We can also make capturing and non-capturing groups optional.

/url: (www\.)?example\.com/g
  • 1 match: url: example.com
  • 1 match: url: www.example.com/foo
  • 1 match: Here's the url: example.com.
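
When an optional capturing group does not take part in a match, the overall match still succeeds and the group is simply empty. The sketch below shows this, again assuming Python's re module, where an unused group is reported as None. The helper describe is a hypothetical name used only here.

```python
import re

URL = re.compile(r"url: (www\.)?example\.com")

def describe(text):
    """Return (whole match, optional group), or None if nothing matches."""
    m = URL.search(text)
    if m is None:
        return None
    # group(1) is None when the optional (www\.) group was skipped.
    return m.group(0), m.group(1)

print(describe("url: example.com"))          # ('url: example.com', None)
print(describe("url: www.example.com/foo"))  # ('url: www.example.com', 'www.')
print(describe("url: ftp.example.com"))      # None
```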

Zero or more

If we wish to match zero or more of a token, we can suffix it with *.

/a*/g
  • 1 match: (empty string)
  • 2 matches: a
  • 2 matches: aa
  • 2 matches: aaa
  • 2 matches: aaaa
  • 2 matches: aaaaa

Our regex matches even an empty string "".

One or more

If we wish to match one or more of a token, we can suffix it with a +.

/a+/g
  • 0 matches: (empty string)
  • 1 match: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa
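
The difference between * and + comes down to whether an empty match is allowed. A small sketch comparing the two side by side, assuming Python's re module and a made-up helper count_matches:

```python
import re

def count_matches(pattern, text):
    """Count non-overlapping matches, empty ones included."""
    return len(re.findall(pattern, text))

for text in ["", "a", "aaa"]:
    # Columns: input, matches for a*, matches for a+
    print(repr(text), count_matches(r"a*", text), count_matches(r"a+", text))
# prints:
# '' 1 0
# 'a' 2 1
# 'aaa' 2 1
```

The extra match reported for a* is the empty match at the end of the input, which a+ cannot produce.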

Exactly x times

If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to writing the token out x times in a row.

/a{3}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 0 matches: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa

Here’s an example that matches an uppercase six-character hex colour code.

/#[0-9A-F]{6}/g
  • 1 match: #AE25AE
  • 1 match: #663399
  • 1 match: How about #73FA79?
  • 1 match: Part of #73FA79BAC too
  • 0 matches: #FFF
  • 0 matches: #a2ca2c

Here, the repetition operator {6} applies to the character class [0-9A-F] as a whole.
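
To see that {6} really is shorthand for writing the token six times, we can build both forms and compare their results. This is a sketch assuming Python's re module. SHORT, LONG and hex_colours are illustrative names.

```python
import re

SHORT = r"#[0-9A-F]{6}"
# The same pattern with the character class written out six times.
LONG = "#" + "[0-9A-F]" * 6

def hex_colours(text, pattern=SHORT):
    """Return every uppercase six-digit hex colour code found in text."""
    return re.findall(pattern, text)

samples = ["#AE25AE", "Part of #73FA79BAC too", "#FFF", "#a2ca2c"]
for s in samples:
    # Second column: do the short and long patterns agree?
    print(hex_colours(s), hex_colours(s) == hex_colours(s, LONG))
# prints:
# ['#AE25AE'] True
# ['#73FA79'] True
# [] True
# [] True
```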

Between min and max times

If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.

/a{2,4}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa

Warning

There must be no space after the comma in {min,max}.
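
What happens when a space does slip in depends on the engine. As an illustration, assuming Python's re module (other engines may behave differently or reject the pattern), the spaced form is not treated as a repetition operator at all but as literal text. The helper runs is a made-up name.

```python
import re

def runs(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

print(runs(r"a{2,4}", "aaaaa"))     # ['aaaa']
# With a space, the braces are matched literally in Python's re.
print(runs(r"a{2, 4}", "aaaaa"))    # []
print(runs(r"a{2, 4}", "a{2, 4}"))  # ['a{2, 4}']
```

This kind of mistake fails silently, which is why the warning is worth taking seriously.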

At least x times

If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.

/a{2,}/g
  • 0 matches: (empty string)
  • 0 matches: a
  • 1 match: aa
  • 1 match: aaa
  • 1 match: aaaa
  • 1 match: aaaaa
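
The brace forms are general enough to express the shorthand operators too: + behaves like {1,}, * like {0,} and ? like {0,1}. A short sketch checking this, assuming Python's re module, with long_runs as an illustrative helper:

```python
import re

def long_runs(text):
    """Return every run of two or more consecutive a characters."""
    return re.findall(r"a{2,}", text)

print(long_runs("a aa aaa aaaaaaa"))  # ['aa', 'aaa', 'aaaaaaa']

# The shorthand operators can be written in brace form too.
sample = "baaad a"
print(re.findall(r"a+", sample) == re.findall(r"a{1,}", sample))   # True
print(re.findall(r"a*", sample) == re.findall(r"a{0,}", sample))   # True
print(re.findall(r"a?", sample) == re.findall(r"a{0,1}", sample))  # True
```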

A note on greediness

Repetition operators are greedy by default: they attempt to match as much as possible.

/a*/g
  • 2 matches: aaaaaa

/".*"/g
  • 1 match: "quote"
  • 1 match: "quote", "quote"
  • 1 match: "quote"quote"

Suffixing a repetition operator (?, *, +, …) with a ? makes it “lazy”: it then matches as little as possible.

/".*?"/g
  • 1 match: "quote"
  • 2 matches: "quote", "quote"
  • 1 match: "quote"quote"

Here, the same result could also be achieved without laziness by using [^"] instead of . (as is best practice), since [^"] can never run past a closing quote.

/"[^"]*"/g
  • 1 match: "quote"
  • 2 matches: "quote", "quote"
  • 1 match: "quote"quote"
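
The three quote patterns can be compared directly on the same input. This is a sketch assuming Python's re module. The helper quoted is a name invented for the example.

```python
import re

def quoted(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)

text = '"quote", "quote"'
print(quoted(r'".*"', text))     # greedy:  ['"quote", "quote"']
print(quoted(r'".*?"', text))    # lazy:    ['"quote"', '"quote"']
print(quoted(r'"[^"]*"', text))  # negated: ['"quote"', '"quote"']

# The lazy and negated forms are not always interchangeable:
# by default . does not match a newline, while [^"] does.
multiline = '"two\nlines"'
print(quoted(r'".*?"', multiline))    # []
print(quoted(r'"[^"]*"', multiline))  # ['"two\nlines"']
```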

[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more

Andrew S on StackOverflow

/<.+>/g
  • 1 match: <em>g r e e d y</em>

/<.+?>/g
  • 2 matches: <em>lazy</em>

Examples

Bitcoin address

/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
  • 1 match: 3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v
  • 1 match: 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
  • 1 match: 2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ
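
As a usage sketch, the pattern can pull address-shaped strings out of longer text such as a CSV row. This assumes Python's re module. ADDRESS and find_addresses are illustrative names, and the regex checks only the shape of the string, not whether the address is actually valid.

```python
import re

ADDRESS = re.compile(r"([13][a-km-zA-HJ-NP-Z0-9]{26,33})")

def find_addresses(text):
    """Return substrings shaped like an address; no checksum is verified."""
    # With one capturing group, findall returns that group's text.
    return ADDRESS.findall(text)

print(find_addresses("2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ"))
# ['18f1yugoAJuXcHAbsuRVLQC9TezJ']
print(find_addresses("1short"))  # []
```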

YouTube video

/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
  • 1 match: youtube.com/watch?feature=sth&v=dQw4w9WgXcQ
  • 1 match: https://www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: www.youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: youtube.com/watch?v=dQw4w9WgXcQ
  • 1 match: fakeyoutube.com/watch?v=dQw4w9WgXcQ

We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
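
The capturing group in this pattern is what makes it useful in practice: it isolates the video ID. Here is a minimal sketch, assuming Python's re module, with WATCH and video_id as made-up names. Note that the lazy .*? lets other query parameters appear before v=, and that the unanchored pattern still accepts the fake domain.

```python
import re

WATCH = re.compile(
    r"(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*"
)

def video_id(url):
    """Return the value of the v parameter, or None if url does not match."""
    m = WATCH.search(url)
    return m.group(1) if m else None

print(video_id("youtube.com/watch?feature=sth&v=dQw4w9WgXcQ"))       # dQw4w9WgXcQ
print(video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42"))  # dQw4w9WgXcQ
print(video_id("fakeyoutube.com/watch?v=dQw4w9WgXcQ"))               # dQw4w9WgXcQ
print(video_id("example.com/watch?v=dQw4w9WgXcQ"))                   # None
```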