
Grouping in regular expression (Regex) Python…FTC

For the basics of regular expressions in Python, see the post below:
Regular Expressions Regex basic in Python

It also includes a table of contents covering regex symbols and their usage.

For more on the regular expressions module, such as flags, precompiled patterns and search(), see the links at the bottom of this post.

Python makes regular expressions available through the re module.

Grouping is done with parentheses. Calling group() returns a string formed of the matching parenthesized
subgroups.

match.group() # Group without argument returns the entire match found
# Output: '123'

match.group(0) # Specifying 0 gives the same result as specifying no argument
# Output: '123'
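
As a minimal runnable sketch of the two calls above: the pattern and the input string here are assumptions, chosen only so that the match comes out as '123', since the snippet does not show how match was created.

```python
import re

# Assumed pattern and input; only the resulting match '123' comes from the text above.
match = re.match(r"\d+", "123 apples")

whole = match.group()        # no argument: the entire match
also_whole = match.group(0)  # 0 refers to the entire match as well

print(whole)       # 123
print(also_whole)  # 123
```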

Arguments can also be provided to group() to fetch a particular subgroup.

From the docs:

If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument.

Calling groups(), on the other hand, returns a tuple containing all the subgroups of the match.

import re

sentence = "This is a phone number 672-123-456-9910"
pattern = r".*(phone).*?([\d-]+)"
match = re.match(pattern, sentence)
print("groups(): ",match.groups())
# All the parenthesized subgroups as a tuple
# Out: ('phone', '672-123-456-9910')

print("group(): ",match.group())
# The entire match as a string
# Out: 'This is a phone number 672-123-456-9910'

print("group(0): ",match.group(0))
# The entire match as a string
# Out: 'This is a phone number 672-123-456-9910'

print("group(1): ",match.group(1))
# Out: 'phone'
# The first parenthesized subgroup.

print("group(2): ",match.group(2))
# The second parenthesized subgroup.
# Out: '672-123-456-9910'

print("group(1,2): ",match.group(1, 2))
# Multiple arguments give us a tuple.
# Out: ('phone', '672-123-456-9910')

OUTPUT:
groups():  ('phone', '672-123-456-9910')
group():  This is a phone number 672-123-456-9910
group(0):  This is a phone number 672-123-456-9910
group(1):  phone
group(2):  672-123-456-9910
group(1,2):  ('phone', '672-123-456-9910')


ILLUSTRATION

Executed using python3 CLI

Named groups

import re
match = re.search(r'My name is (?P<name>[A-Za-z ]+)', 'My name is John Smith')
print(match.group('name'))
# Output: 'John Smith'

print(match.group(1))
# Output: 'John Smith'

The (?P<name>...) syntax creates a capture group that can be referenced by name as well as by index.
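
When a pattern has several named groups, the match object's groupdict() method returns all of them at once. The sketch below is only an illustration: the pattern with the group names first and last is a hypothetical variation on the example above, not part of the original.

```python
import re

# Hypothetical pattern with two named groups, for illustration only.
pattern = r"My name is (?P<first>[A-Za-z]+) (?P<last>[A-Za-z]+)"
match = re.search(pattern, "My name is John Smith")

by_name = match.group("first", "last")  # tuple, looked up by name
by_index = match.group(1, 2)            # same groups, looked up by index
as_dict = match.groupdict()             # mapping of group name -> matched text

print(by_name)   # ('John', 'Smith')
print(by_index)  # ('John', 'Smith')
print(as_dict)   # {'first': 'John', 'last': 'Smith'}
```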

Non-capturing groups

Using (?:...) creates a group, but the group isn’t captured. This means you can use it for grouping, but it won’t pollute your “group space”.

import re
# match = re.match(r'(\d+)(\+(\d+))?', '11+22')
print(re.match(r'(\d+)(\+(\d+))?', '11+22').groups())
# Output: ('11', '+22', '22')

print(re.match(r'(\d+)(?:\+(\d+))?', '11+22').groups())
# Output: ('11', '22')

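This pattern matches 11+22 or 11 in full, but not 11+ as a whole. That is because the + sign and the second term are grouped together, so they are optional only as a unit. Note that re.match() anchors only at the start of the string, so on 11+ it still matches the leading 11. The + sign itself isn’t captured.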
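
One way to check this behaviour is with re.fullmatch(), which succeeds only when the pattern consumes the whole string. This is a sketch; using fullmatch() here is my choice and not part of the original example, which uses re.match().

```python
import re

pattern = r"(\d+)(?:\+(\d+))?"

results = {}
for text in ["11+22", "11", "11+"]:
    # fullmatch() succeeds only if the pattern consumes the entire string.
    m = re.fullmatch(pattern, text)
    results[text] = m.groups() if m else None
    print(text, "->", results[text])

# re.match() only anchors at the start, so it still finds the leading "11".
prefix = re.match(pattern, "11+").group()
print(prefix)  # 11
```

The second capture group is None for the input 11 because the optional non-capturing group did not take part in the match.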

ILLUSTRATION USING PYTHON3 CLI

For the first part, on using regular expressions to match the beginning of a string, see:
Matching the beginning of a string (Regex) Regular Expressions in python

The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, search() returns a match object; otherwise it returns None.
For the second part, on using regular expressions to search a string, see:
Searching – Regular Expressions (Regex) in Python
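
A short sketch of the two possible outcomes of re.search(), reusing the sample sentence from the grouping example earlier in this post. The second pattern is an assumption, picked simply because it does not occur in the sentence.

```python
import re

sentence = "This is a phone number 672-123-456-9910"  # sample text from earlier

found = re.search(r"[\d-]+", sentence)   # pattern occurs in the middle of the string
missing = re.search(r"email", sentence)  # pattern does not occur at all

# Always check for None before calling group() on the result.
if found:
    print(found.group())  # 672-123-456-9910
print(missing)  # None
```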

Precompiled patterns:
Precompiled patterns – Regular Expression(Regex) in Python
Compiling a pattern allows it to be reused later on in a program.
However, note that Python caches recently-used expressions, so “programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions”.
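
As a rough illustration of reuse, the sketch below compiles a pattern once with re.compile() and applies it to several strings. The pattern and the sample lines are made up for this example.

```python
import re

# Compile once, reuse many times. The pattern is an illustrative assumption.
number_pattern = re.compile(r"[\d-]+")

lines = ["call 672-123", "no digits here", "ext 9910"]
matches = []
for line in lines:
    m = number_pattern.search(line)  # same method names as the module-level functions
    matches.append(m.group() if m else None)

print(matches)  # ['672-123', None, '9910']
```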

Flags in Regex: Flags in Regular Expressions ( Regex ) in Python….FTC
For some special cases we need to change the behavior of the regular expression; this is done using flags. Flags can be set in two ways: through the flags keyword or directly in the expression.
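
The two ways of setting a flag can be sketched as follows, using case-insensitive matching as the example. The sample text is an assumption for illustration.

```python
import re

text = "Hello World"

# 1) Through the flags keyword argument.
with_keyword = re.search(r"hello", text, flags=re.IGNORECASE)

# 2) Inline, at the very start of the expression.
inline = re.search(r"(?i)hello", text)

# Without any flag the lowercase pattern does not match.
no_flag = re.search(r"hello", text)

print(with_keyword.group())  # Hello
print(inline.group())        # Hello
print(no_flag)               # None
```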