Repetition
Repetition is a powerful and ubiquitous regex feature. There are several ways to represent repetition in regex.
Making things optional
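We can make parts of a regex optional using the ? operator, which lets the preceding token match zero times or once.
For example, in the regex "https?", the s following http is optional, so the pattern matches both "http" and "https".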
We can also make capturing and non-capturing groups optional.
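The snippet below is a minimal sketch using Python's re module. It shows ? applied first to a single character and then to groups. The host example.com and the "px" suffix are illustrative choices, not taken from the original examples.

```python
import re

# "s?" makes the single character "s" optional.
scheme = re.compile(r"https?")
print(bool(scheme.fullmatch("http")))    # True
print(bool(scheme.fullmatch("https")))   # True
print(bool(scheme.fullmatch("httpss")))  # False: at most one "s"

# A non-capturing group (?:...) followed by ? makes the whole group optional.
host = re.compile(r"(?:www\.)?example\.com")
print(bool(host.fullmatch("example.com")))      # True
print(bool(host.fullmatch("www.example.com")))  # True

# An optional capturing group that did not take part in the match yields None.
m = re.fullmatch(r"(\d+)(px)?", "12")
print(m.group(1), m.group(2))  # 12 None
```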
Zero or more
If we wish to match zero or more of a token, we can suffix it with *.
Because zero repetitions are allowed, a pattern such as a* matches even the empty string "".
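Here is a small Python sketch of *. It shows that the quantifier accepts zero repetitions, and that it applies only to the token immediately before it. The sample strings are made up for illustration.

```python
import re

# "a*" allows zero or more "a" characters, so it also matches the empty string.
zero_or_more = re.compile(r"a*")
print(bool(zero_or_more.fullmatch("")))     # True
print(bool(zero_or_more.fullmatch("aaa")))  # True
print(bool(zero_or_more.fullmatch("ab")))   # False: "b" is not covered

# * repeats only the preceding token: here just the "o".
print(re.findall(r"go*gle", "ggle gogle google"))  # ['ggle', 'gogle', 'google']
```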
One or more
If we wish to match one or more of a token, we can suffix it with a +.
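The following Python sketch contrasts + with *: with +, the empty string no longer matches, because at least one occurrence is required. The sample text is hypothetical.

```python
import re

one_or_more = re.compile(r"a+")
print(bool(one_or_more.fullmatch("")))     # False: at least one "a" is required
print(bool(one_or_more.fullmatch("aaa")))  # True

# \d+ extracts whole runs of digits rather than single digits.
print(re.findall(r"\d+", "order 66, room 101"))  # ['66', '101']
```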
Exactly x times
If we wish to match a particular token exactly x times, we can suffix it with {x}. This is functionally identical to copy-pasting the token x times.
Here’s an example that matches an uppercase six-character hex colour code. Its core is [0-9A-F]{6}: the quantifier {6} applies to the character class [0-9A-F], so exactly six uppercase hexadecimal digits are required.
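Below is a Python sketch of the hex colour pattern. It assumes the code is given as six bare digits without a leading #; if a # is expected, it would simply be prepended to the pattern. The sketch also checks that {6} behaves the same as writing the class out six times.

```python
import re

# [0-9A-F]{6}: exactly six uppercase hexadecimal digits.
hex_colour = re.compile(r"[0-9A-F]{6}")
print(bool(hex_colour.fullmatch("FF8800")))  # True
print(bool(hex_colour.fullmatch("ff8800")))  # False: lowercase is not in the class
print(bool(hex_colour.fullmatch("FF880")))   # False: only five digits

# {6} is equivalent to repeating the character class six times.
spelled_out = re.compile(r"[0-9A-F]" * 6)
for sample in ["FF8800", "ff8800", "FF880"]:
    assert bool(hex_colour.fullmatch(sample)) == bool(spelled_out.fullmatch(sample))
```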
Between min and max times
If we wish to match a particular token between min and max (inclusive) times, we can suffix it with {min,max}.
There must be no space after the comma in {min,max}.
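A short Python sketch of {min,max}, using \d{2,4} as an arbitrary example. Note the last line: because the quantifier is greedy, findall takes four digits first, and the single leftover digit is too short to form another match.

```python
import re

# \d{2,4}: between two and four digits, inclusive.
two_to_four = re.compile(r"\d{2,4}")
results = {text: bool(two_to_four.fullmatch(text))
           for text in ["1", "12", "123", "1234", "12345"]}
print(results)
# {'1': False, '12': True, '123': True, '1234': True, '12345': False}

print(two_to_four.findall("12345"))  # ['1234']
```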
At least x times
If we wish to match a particular token at least x times, we can suffix it with {x,}. Think of it as {min,max}, but without an upper bound.
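The Python sketch below uses \w{8,} as an illustrative "at least eight word characters" rule. It also checks that {0,} and {1,} behave like * and + on a few sample strings.

```python
import re

# \w{8,}: at least eight word characters, with no upper bound.
at_least_eight = re.compile(r"\w{8,}")
print(bool(at_least_eight.fullmatch("short")))       # False: only five characters
print(bool(at_least_eight.fullmatch("longenough")))  # True
print(bool(at_least_eight.fullmatch("a" * 100)))     # True: no upper limit

# {0,} and {1,} behave like * and + respectively.
for s in ["", "a", "aaaa"]:
    assert bool(re.fullmatch(r"a{0,}", s)) == bool(re.fullmatch(r"a*", s))
    assert bool(re.fullmatch(r"a{1,}", s)) == bool(re.fullmatch(r"a+", s))
```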
A note on greediness
Regular expressions are greedy by default: each repetition operator tries to match as much as possible.
By suffixing a repetition operator (?, *, +, …) with a ?, one can make it “lazy”, so that it matches as little as possible.
When matching text between double quotes, the same result could also be achieved by using [^"] instead of . (as is best practice), since [^"] can never run past a closing quote.
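The Python sketch below compares three ways of pulling double-quoted strings out of a made-up sentence: a greedy .*, a lazy .*?, and the negated class [^"]*.

```python
import re

text = 'say "hello" and "goodbye"'

# Greedy: .* runs on and backtracks only as far as the last closing quote.
greedy = re.findall(r'".*"', text)
print(greedy)   # ['"hello" and "goodbye"']

# Lazy: .*? stops at the first closing quote that completes a match.
lazy = re.findall(r'".*?"', text)
print(lazy)     # ['"hello"', '"goodbye"']

# Negated class: [^"]* cannot cross a quote, so no laziness is needed.
negated = re.findall(r'"[^"]*"', text)
print(negated)  # ['"hello"', '"goodbye"']
```

Both the lazy form and the negated-class form give the same result here. The negated class states the intent directly and avoids the backtracking that the lazy form relies on.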
[…] Lazy will stop as soon as the condition is satisfied, but greedy means it will stop only once the condition is not satisfied any more
—Andrew S on StackOverflow
Examples
Bitcoin address
YouTube Video
We can adjust this to not match the last broken link using anchors, which we shall encounter soon.