Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
We can make parts of a regex optional using the ? operator.
/a?/g
(empty string) - 1 match
a - 2 matches
aa - 3 matches
aaa - 4 matches
aaaa - 5 matches
aaaaa - 6 matches
Here’s another example:
/https?/g
http - 1 match
https - 1 match
http/2 - 1 match
shttp - 1 match
ftp - 0 matches
Here the s following http is optional.
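The counts above can be reproduced with a few lines of JavaScript. This is a minimal sketch; countMatches is a hypothetical helper written for illustration, not part of any library.

```javascript
// Count how many matches a global regex finds in a string.
// countMatches is a hypothetical helper name used only in this sketch.
function countMatches(regex, text) {
  // matchAll requires the g flag and also yields empty matches.
  return [...text.matchAll(regex)].length;
}

console.log(countMatches(/a?/g, ""));          // 1 (a single empty match)
console.log(countMatches(/a?/g, "aa"));        // 3 ("a", "a", then an empty match at the end)
console.log(countMatches(/https?/g, "shttp")); // 1 (matches the "http" inside "shttp")
console.log(countMatches(/https?/g, "ftp"));   // 0
```

Because a? is allowed to match nothing, it always finds one extra empty match at the end of the string, which is why the counts for /a?/g are one higher than the number of a's.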
We can also make capturing and non-capturing groups optional.
/url: (www\.)?example\.com/g
url: example.com - 1 match
url: www.example.com/foo - 1 match
Here's the url: example.com. - 1 match
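When an optional capturing group does not take part in a match, JavaScript reports it as undefined. The sketch below uses the same pattern without the g flag so that match returns the captured groups; the variable names are only illustrative.

```javascript
// Same pattern as above, without the g flag so that match() exposes groups.
const pattern = /url: (www\.)?example\.com/;

const withWww = "url: www.example.com/foo".match(pattern);
const withoutWww = "url: example.com".match(pattern);

console.log(withWww[0]);    // "url: www.example.com"
console.log(withWww[1]);    // "www."
console.log(withoutWww[1]); // undefined (the optional group matched nothing)
```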
Zero or more
If we wish to match zero or more of a token, we can suffix it with *.
/a*/g
(empty string) - 1 match
a - 2 matches
aa - 2 matches
aaa - 2 matches
aaaa - 2 matches
aaaaa - 2 matches
Our regex matches even an empty string "".
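To see where the second match comes from, we can list each match together with its position. This is a small sketch; describeMatches is a hypothetical helper name.

```javascript
// List every match of a global regex together with its start index.
// describeMatches is a hypothetical helper used only for illustration.
function describeMatches(regex, text) {
  return [...text.matchAll(regex)].map((m) => ({ text: m[0], index: m.index }));
}

console.log(describeMatches(/a*/g, ""));
// [ { text: "", index: 0 } ]
console.log(describeMatches(/a*/g, "aa"));
// [ { text: "aa", index: 0 }, { text: "", index: 2 } ]
```

The first match consumes every a, and the second is the empty match left over at the end of the string.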
One or more
If we wish to match one or more of a token, we can suffix it with a +.
/a+/g
(empty string) - 0 matches
a - 1 match
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
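A short JavaScript sketch of the same idea. Note that String.prototype.match with the g flag returns null rather than an empty array when nothing matches; the input strings here are made up for illustration.

```javascript
// + needs at least one "a", so each run of a's is one match.
const runs = "aa b aaa".match(/a+/g);
console.log(runs); // ["aa", "aaa"]

// With no "a" at all there is nothing to match, and match() returns null.
const none = "".match(/a+/g);
console.log(none); // null
```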
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to writing the token x times in a row.
/a{3}/g
(empty string) - 0 matches
a - 0 matches
aa - 0 matches
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
Here’s an example that matches an uppercase six-character hex colour code.
/#[0-9A-F]{6}/g
#AE25AE - 1 match
#663399 - 1 match
How about #73FA79? - 1 match
Part of #73FA79BAC too - 1 match
#FFF - 0 matches
#a2ca2c - 0 matches
Here, the quantifier {6} applies to the character class [0-9A-F].
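The claim that {x} is equivalent to writing the token x times can be checked directly. In this sketch the long form is built with String.prototype.repeat; the sample sentence is invented for the example.

```javascript
// Short form using the {6} quantifier.
const shortForm = /#[0-9A-F]{6}/g;
// Long form: the character class written out six times.
const longForm = new RegExp("#" + "[0-9A-F]".repeat(6), "g");

// Hypothetical sample text for illustration.
const text = "How about #73FA79? Part of #73FA79BAC too, but not #FFF or #a2ca2c.";

console.log(text.match(shortForm)); // ["#73FA79", "#73FA79"]
console.log(text.match(longForm));  // ["#73FA79", "#73FA79"]
```

Both forms find the same two matches. The second match is only the first seven characters of #73FA79BAC, since nothing in the pattern requires the code to end there.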
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
/a{2,4}/g
(empty string) - 0 matches
a - 0 matches
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
There must be no space after the comma in {min,max}.
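A brief sketch of both points in JavaScript. The behaviour shown for the version with a space applies to JavaScript regex literals without the u flag, where the malformed quantifier is read as literal text; other engines may treat it differently.

```javascript
// {2,4} is greedy: it takes four a's first, then matches the remaining two.
const bounded = "aaaaaa".match(/a{2,4}/g);
console.log(bounded); // ["aaaa", "aa"]

// With a space after the comma, the braces are no longer a quantifier.
// In JavaScript (no u flag) the pattern matches the literal text "a{2, 4}".
const spaced = /a{2, 4}/;
console.log(spaced.test("aaa"));     // false
console.log(spaced.test("a{2, 4}")); // true
```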
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
/a{2,}/g
(empty string) - 0 matches
a - 0 matches
aa - 1 match
aaa - 1 match
aaaa - 1 match
aaaaa - 1 match
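One common use of an open-ended bound is collapsing runs of a repeated character. The sketch below applies {2,} to spaces; the input string is made up for illustration.

```javascript
// Hypothetical input with uneven spacing.
const messy = "one  two   three four";

// Replace every run of two or more spaces with a single space.
// Single spaces are left alone because {2,} needs at least two.
const tidy = messy.replace(/ {2,}/g, " ");
console.log(tidy); // "one two three four"

// With no upper bound, the whole run is taken as one match.
const wholeRun = "aaaaa".match(/a{2,}/g);
console.log(wholeRun); // ["aaaaa"]
```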
A note on greediness
Repetition operators are greedy by default: they attempt to match as much as possible.
/a*/g
aaaaaa - 2 matches
/".*"/g
"quote" - 1 match
"quote", "quote" - 1 match
"quote"quote" - 1 match
We can make a repetition operator (?, *, +, …) “lazy” by suffixing it with a ?.
/".*?"/g
"quote" - 1 match
"quote", "quote" - 2 matches
"quote"quote" - 1 match
Here, the same result can be achieved by using [^"] instead of ., which is the best practice.
/"[^"]*"/g
"quote" - 1 match
"quote", "quote" - 2 matches
"quote"quote" - 1 match
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more
—Andrew S on StackOverflow
/<.+>/g
<em>g r e e d y</em> - 1 match
/<.+?>/g
<em>lazy</em> - 2 matches
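Printing the matched text, rather than just the counts, makes the difference between greedy and lazy repetition easier to see. A minimal sketch in JavaScript, reusing the strings from the examples above:

```javascript
const quotes = '"quote", "quote"';

// Greedy: .* runs to the last quote in the string.
console.log(quotes.match(/".*"/g));    // ['"quote", "quote"']
// Lazy: .*? stops at the first closing quote it can.
console.log(quotes.match(/".*?"/g));   // ['"quote"', '"quote"']
// Negated class: cannot run past a quote in the first place.
console.log(quotes.match(/"[^"]*"/g)); // ['"quote"', '"quote"']

const html = "<em>lazy</em>";
console.log(html.match(/<.+>/g));  // ["<em>lazy</em>"]
console.log(html.match(/<.+?>/g)); // ["<em>", "</em>"]
```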
Examples
Bitcoin address
/([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g
3Nxwenay9Z8Lc9JBiywExpnEFiLp6Afp8v - 1 match
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx - 1 match
2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ - 1 match
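As a sketch of how this pattern might be used, the snippet below pulls the address-shaped substring out of the last sample line. The pattern only checks the shape of the text; it does not verify that the result is a valid address.

```javascript
// Same pattern as above: a leading 1 or 3, then 26 to 33 allowed characters.
const addressShape = /([13][a-km-zA-HJ-NP-Z0-9]{26,33})/g;

const line = "2016-03-09,18f1yugoAJuXcHAbsuRVLQC9TezJ";

// The 1 and 3 inside the date are followed by too few allowed characters,
// so only the part after the comma matches.
const found = line.match(addressShape);
console.log(found); // ["18f1yugoAJuXcHAbsuRVLQC9TezJ"]
```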
YouTube video
/(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm
youtube.com/watch?feature=sth&v=dQw4w9WgXcQ - 1 match
https://www.youtube.com/watch?v=dQw4w9WgXcQ - 1 match
www.youtube.com/watch?v=dQw4w9WgXcQ - 1 match
youtube.com/watch?v=dQw4w9WgXcQ - 1 match
fakeyoutube.com/watch?v=dQw4w9WgXcQ - 1 match
We can adjust this to not match the last broken link using anchors, which we shall encounter soon.
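The capturing group in this pattern holds the video ID, so one possible use is extracting IDs from a block of text. A minimal sketch, assuming the links arrive one per line; the variable names are illustrative.

```javascript
const youtube = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?.*?v=([^&\s]+).*/gm;

// Three of the sample links from above, one per line.
const links = [
  "youtube.com/watch?feature=sth&v=dQw4w9WgXcQ",
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "fakeyoutube.com/watch?v=dQw4w9WgXcQ",
].join("\n");

// Each match object carries the captured video ID in position 1.
const matches = [...links.matchAll(youtube)];
const ids = matches.map((m) => m[1]);
console.log(ids); // ["dQw4w9WgXcQ", "dQw4w9WgXcQ", "dQw4w9WgXcQ"]

// The third match starts at "youtube", skipping the "fake" prefix.
console.log(matches[2][0]); // "youtube.com/watch?v=dQw4w9WgXcQ"
```

The last line shows why the broken link still counts as a match: the pattern is free to start partway through the text.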