Capturing groups
A part of a pattern can be enclosed in parentheses pattern:(…). This is called a “capturing group”.
That has two effects:
1. It allows us to get a part of the match as a separate item in the result array.
2. If we put a quantifier after the parentheses, it applies to the parentheses as a whole.
Examples
Let’s see how parentheses work in examples.
Example: gogogo
Without parentheses, the pattern pattern:go+ means subject:g character, followed by subject:o repeated one or more times. For instance, match:goooo or match:gooooooooo. Parentheses group characters together, so pattern:(go)+ means match:go, match:gogo, match:gogogo and so on.
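A minimal sketch of the difference, runnable in a browser console or Node.js. The sample strings and variable names are only illustrative:

```javascript
// Without parentheses the quantifier applies only to the letter "o".
const single = "goooo now".match(/go+/);
console.log(single[0]); // "goooo"

// With parentheses the quantifier applies to the whole group "go".
const grouped = "gogogo now".match(/(go)+/);
console.log(grouped[0]); // "gogogo"
console.log(grouped[1]); // "go" (the group keeps its last repetition)
```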
Example: domain
Let’s make something more complex: a regular expression to search for a website domain. For example: mail.com, users.mail.com, smith.users.mail.com.
As we can see, a domain consists of repeated words, with a dot after each one except the last. In regular expressions that’s pattern:(\w+\.)+\w+. Note that the dot is escaped, because an unescaped dot would match any character.
The search works, but the pattern can’t match a domain with a hyphen, e.g. my-site.com, because the hyphen does not belong to the class pattern:\w. We can fix it by replacing pattern:\w with pattern:[\w-] in every word except the last one: pattern:([\w-]+\.)+\w+.
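To see the effect of the hyphen fix, here is a small sketch that runs both variants of the domain pattern. The sample text and the variable names are assumptions made for illustration:

```javascript
const text = "site.com my.site.com my-site.com";

// First version: \w does not include the hyphen, so "my-site.com" is cut short.
const plain = text.match(/(\w+\.)+\w+/g);
console.log(plain); // ["site.com", "my.site.com", "site.com"]

// Fixed version: [\w-] allows hyphens in every word except the last one.
const withHyphen = text.match(/([\w-]+\.)+\w+/g);
console.log(withHyphen); // ["site.com", "my.site.com", "my-site.com"]
```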
Example: email
The previous example can be extended. We can create a regular expression for emails based on it.
The email format is: name@domain. Any word can be the name, and hyphens and dots are allowed. In regular expressions that’s pattern:[-.\w]+.
Combining it with the domain pattern gives: pattern:[-.\w]+@([\w-]+\.)+\w+.
That regexp is not perfect, but it mostly works and helps to catch accidental mistypes. The only truly reliable check for an email is to send a message to that address.
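A short sketch of an email search, assuming the name pattern pattern:[-.\w]+ is joined to the domain pattern from the previous example with an @ sign. The sample text is made up:

```javascript
// name@domain: a name of word characters, dots and hyphens, then the domain pattern.
const emailRegexp = /[-.\w]+@([\w-]+\.)+\w+/g;

const text = "my@mail.com @ his@site.com.uk";
const emails = text.match(emailRegexp);
console.log(emails); // ["my@mail.com", "his@site.com.uk"]
```

The lone @ in the middle is skipped because the name part requires at least one character.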
Parentheses contents in the match
Parentheses are numbered from left to right. The search engine memorizes the content matched by each of them and allows us to get it in the result.
The method str.match(regexp), if regexp has no flag pattern:g, looks for the first match and returns it as an array:
1. At index 0: the full match.
2. At index 1: the contents of the first parentheses.
3. At index 2: the contents of the second parentheses.
4. …and so on…
For instance, we’d like to find HTML tags pattern:<.*?> and process them. It would be convenient to have the tag content (what’s inside the angle brackets) in a separate variable.
Let’s wrap the inner content into parentheses, like this: pattern:<(.*?)>.
Now, for a string that starts with an h1 tag, we’ll get both the tag as a whole match:<h1> and its contents match:h1 in the resulting array.
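A minimal sketch of reading the group from the result array. The sample string is an assumption:

```javascript
const str = "<h1>Hello, world!</h1>";

// No flag g: match returns the first match together with its groups.
const tag = str.match(/<(.*?)>/);

console.log(tag[0]); // "<h1>" (the full match)
console.log(tag[1]); // "h1"   (contents of the first parentheses)
```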
Nested groups
Parentheses can be nested. In this case the numbering also goes from left to right.
For instance, when searching for a tag in subject:<span class="my"> we may be interested in:
1. The tag content as a whole: match:span class="my".
2. The tag name: match:span.
3. The tag attributes: match:class="my".
Let’s add parentheses for them: pattern:<(([a-z]+)\s*([^>]*))>.
The groups are numbered left to right, by the opening paren: group 1 is the outer pattern:(([a-z]+)\s*([^>]*)), group 2 is pattern:([a-z]+), and group 3 is pattern:([^>]*).
The zero index of result always holds the full match. Then come the groups, numbered from left to right by their opening paren. The first group is returned as result[1]. Here it encloses the whole tag content. Then result[2] holds the group from the second opening paren pattern:([a-z]+), the tag name, and result[3] holds the attributes: pattern:([^>]*).
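The numbering of nested groups can be checked with a short sketch like this one (variable names are illustrative):

```javascript
const str = '<span class="my">';
const regexp = /<(([a-z]+)\s*([^>]*))>/;

const result = str.match(regexp);

console.log(result[0]); // '<span class="my">' (full match)
console.log(result[1]); // 'span class="my"'   (group 1: whole tag content)
console.log(result[2]); // 'span'              (group 2: tag name)
console.log(result[3]); // 'class="my"'        (group 3: attributes)
```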
Optional groups
Even if a group is optional and doesn’t exist in the match (e.g. has the quantifier pattern:(…)?), the corresponding result array item is present and equals undefined.
For instance, let’s consider the regexp pattern:a(z)?(c)?. It looks for "a" optionally followed by "z" optionally followed by "c".
If we run it on the string with a single letter subject:a, then the result is ["a", undefined, undefined]. The array has the length of 3, but all groups are empty.
And here’s a more complex match for the string subject:ac. The array length is the same: 3. But there’s nothing for the group pattern:(z)?, so the result is ["ac", undefined, "c"].
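Both cases can be reproduced with a few lines (the variable names are only for illustration):

```javascript
const regexp = /a(z)?(c)?/;

// Only "a" is present: both optional groups are undefined.
const onlyA = "a".match(regexp);
console.log(onlyA.length); // 3
console.log(onlyA[0], onlyA[1], onlyA[2]); // "a" undefined undefined

// "ac": the group (z)? did not participate, the group (c)? matched.
const ac = "ac".match(regexp);
console.log(ac.length); // 3
console.log(ac[0], ac[1], ac[2]); // "ac" undefined "c"
```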
Searching for all matches with groups: matchAll
When we search for all matches (flag pattern:g), the match method does not return contents for groups.
For example, if we look for all tags in a string this way, the result is an array of full matches, but without details about each of them. In practice, though, we usually need the contents of capturing groups in the result.
To get them, we should search using the method str.matchAll(regexp).
It was added to the JavaScript language long after match, as its “new and improved version”.
Just like match, it looks for matches, but there are 3 differences:
1. It returns not an array, but an iterable object.
2. When the flag pattern:g is present, it returns every match as an array with groups. The flag is required: without it, matchAll throws a TypeError.
3. If there are no matches, it returns not null, but an empty iterable object.
The first difference is very important. We can’t get the match as results[0], because that object is neither an array nor a pseudoarray. We can turn it into a real Array using Array.from. There are more details about pseudoarrays and iterables in the article info:iterable.
There’s no need for Array.from if we’re looping over the results with for..of, or if we use destructuring.
Every match returned by matchAll has the same format as the one returned by match without the flag pattern:g: it’s an array with additional properties index (match index in the string) and input (source string).
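The sketch below contrasts match and matchAll on the same input. The sample string and variable names are assumptions:

```javascript
const str = "<h1> <h2>";

// match with flag g: only full matches, no group contents.
const tags = str.match(/<(.*?)>/g);
console.log(tags); // ["<h1>", "<h2>"]

// matchAll returns an iterable object, not an array.
const results = str.matchAll(/<(.*?)>/g);
console.log(results[0]); // undefined

// Array.from turns it into a real array of match arrays.
const matches = Array.from(results);
console.log(matches[0][1]); // "h1"
console.log(matches[1][1]); // "h2"
console.log(matches[1].index); // 5
console.log(matches[1].input); // "<h1> <h2>"

// Looping with for..of needs no Array.from; destructuring picks the parts.
const names = [];
for (const [tag, name] of str.matchAll(/<(.*?)>/g)) {
  names.push(name);
}
console.log(names); // ["h1", "h2"]
```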
Named groups
Remembering groups by their numbers is hard. For simple patterns it’s doable, but for more complex ones counting parentheses is inconvenient. We have a much better option: give names to parentheses.
That’s done by putting pattern:?<name> immediately after the opening paren.
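As an illustration, here is a sketch that names the groups with the pattern:(?<name>…) syntax and reads them from the groups property of the match. The date format and the group names year, month and day are assumptions chosen for the example:

```javascript
const dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
const str = "2019-04-30";

const match = str.match(dateRegexp);

// Named groups are available on match.groups...
console.log(match.groups.year);  // "2019"
console.log(match.groups.month); // "04"
console.log(match.groups.day);   // "30"

// ...and they still keep their numbers.
console.log(match[1]); // "2019"
```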
Capturing groups in replacement
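The method str.replace(regexp, replacement), which replaces matches of regexp in str (all of them if the flag pattern:g is set, otherwise only the first), allows us to use the contents of parentheses in the replacement string. That’s done using pattern:$n, where pattern:n is the group number.
For example, with the pattern pattern:(\w+) (\w+) and the replacement pattern:$2, $1, the string subject:John Bull becomes match:Bull, John.
For named parentheses the reference will be pattern:$<name>.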
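A brief sketch of both kinds of references in a replacement string: $n for numbered groups and $<name> for named ones. The sample strings and group names are illustrative assumptions:

```javascript
// Numbered references: swap the two words.
const swapped = "John Bull".replace(/(\w+) (\w+)/, "$2, $1");
console.log(swapped); // "Bull, John"

// Named references: reorder the parts of a date.
const dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
const reordered = "2019-10-30, 2020-01-01".replace(dateRegexp, "$<day>.$<month>.$<year>");
console.log(reordered); // "30.10.2019, 01.01.2020"
```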
Non-capturing groups with ?:
Sometimes we need parentheses to correctly apply a quantifier, but we don’t want their contents in results.
A group may be excluded by adding pattern:?: in the beginning.
For instance, if we want to find pattern:(go)+, but don’t want the parentheses contents (go) as a separate array item, we can write: pattern:(?:go)+.
For example, matching pattern:(?:go)+ (\w+) with the flag pattern:i against subject:Gogogo John! yields only the name match:John as a separate member of the match.
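A minimal sketch showing that a non-capturing group does not appear in the result array (the sample string is an assumption):

```javascript
const str = "Gogogo John!";

// ?: excludes "go" from capturing; only (\w+) is a capturing group.
const regexp = /(?:go)+ (\w+)/i;

const result = str.match(regexp);

console.log(result[0]); // "Gogogo John" (full match)
console.log(result[1]); // "John"
console.log(result.length); // 2 (no extra item for the "go" group)
```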
Summary
Parentheses group together a part of the regular expression, so that the quantifier applies to it as a whole.
Parentheses groups are numbered left-to-right, and can optionally be named with pattern:(?<name>…).
alert( 'Gogogo now!'.match(/(go)+/ig) ); // "Gogogo"