Programming is like sex, one mistake and you have to support it for the rest of your life.

Regular Expressions (Regex): Matching Strings

Python makes regular expressions available through the re module.
Regular expressions are combinations of characters that are interpreted as rules for matching sub-strings.

For instance, the expression 'amount\D+\d+' matches any text made up of the word amount, followed by one or more non-digit characters, followed by an integer, such as: amount=100 , amount is 3 , amount is equal to: 33 , etc.
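As a quick check of the pattern above, the following sketch runs it against the three sample strings. It uses re.search() so the pattern may match anywhere in the string; the samples list and the extra non-matching string are illustrative additions.

```python
import re

# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"amount\D+\d+"

# Sample strings from the text, plus one that should not match (illustrative).
samples = ["amount=100", "amount is 3", "amount is equal to: 33", "amount: none"]

# Map each sample to True/False depending on whether the pattern was found.
results = {s: bool(re.search(pattern, s)) for s in samples}
for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The last sample fails because \d+ needs at least one digit after the non-digit run.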

The first argument of re.match() is the regular expression, the second is the string to match:

import re
pattern = r"123"
string = "123zzb"
print(re.match(pattern, string))

OUTPUT:
<_sre.SRE_Match object; span=(0, 3), match='123'>
(On Python 3.7 and later the object is shown as re.Match instead of _sre.SRE_Match.)

match = re.match(pattern, string)
print(match.group())

OUTPUT:
123

You may notice that the pattern variable is a string prefixed with r , which indicates that the string is a raw string literal.

A raw string literal has slightly different syntax from a normal string literal: a backslash \ in a raw string literal means "just a backslash", so there is no need to double up backslashes to keep them from forming escape sequences such as newlines ( \n ), tabs ( \t ), backspaces ( \b ), form feeds ( \f ), and so on. In a normal string literal, a backslash must be doubled up whenever it would otherwise be taken as the start of an escape sequence.

Hence, r"\n" is a string of 2 characters: \ and n . Regex patterns also use backslashes, e.g. \d refers to any digit character. We can avoid having to double-escape our strings ( "\\d" ) by using raw strings ( r"\d" ).
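The difference is easy to confirm by comparing lengths. A minimal sketch; the variable names escaped and raw are only for illustration:

```python
# A normal literal turns \n into a single newline character...
print(len("\n"))   # 1
# ...while the raw literal keeps the backslash and the 'n' as two characters.
print(len(r"\n"))  # 2

# Both spellings below produce the same two-character regex pattern: \d
escaped = "\\d"
raw = r"\d"
print(escaped == raw)  # True
print(len(raw))        # 2
```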

For instance:

import re

string = "\\t123zzb"  # the backslash is escaped, so there's no tab, just '\' and 't'
pattern = "\\t123"
# the regex engine receives \t123, which matches a tab character followed by 123

re.match(pattern, string)
# returns None: the string has no tab, so calling .group() here would raise AttributeError

re.match(pattern, "\t123zzb").group()
# matches '\t123' (a real tab followed by 123)

pattern = r"\\t123"  # regex \\t123: a literal backslash followed by t123
re.match(pattern, string).group()
# matches '\\t123'
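Because re.match() returns None when nothing matches, calling .group() on the result directly can raise AttributeError. One common way to guard against that is to test the result first. This is a minimal sketch; the helper name first_match is hypothetical and not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None if no match."""
    match = re.match(pattern, text)
    # re.match() returns None on failure, so check before calling .group().
    return match.group() if match else None

# Pattern for a tab, string containing a real tab: matches.
print(first_match("\\t123", "\t123zzb"))
# Pattern for a tab, string containing backslash + 't': no match.
print(first_match("\\t123", "\\t123zzb"))   # None
# Pattern for a literal backslash + 't': matches.
print(first_match(r"\\t123", "\\t123zzb"))  # \t123
```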

re.match() only matches at the start of the string. To match anywhere in the string, use re.search() instead:

match = re.match(r"(123)", "a123zzb")
match is None
# Out: True

match = re.search(r"(123)", "a123zzb")
match.group()
# Out: '123'
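To see where re.search() found the pattern, the match object's span(), start(), and end() methods report the position. A small sketch using the same string as above; the variable names are illustrative:

```python
import re

text = "a123zzb"

# re.match() only tries position 0, where the character is 'a'.
at_start = re.match(r"123", text)

# re.search() scans forward and returns the first match it finds.
anywhere = re.search(r"123", text)

print(at_start)         # None
print(anywhere.span())  # (1, 4)
print(text[anywhere.start():anywhere.end()])  # 123
```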

