
Matching the Beginning of a String (Regex): Regular Expressions in Python … FTC

Python makes regular expressions available through the re module.
Regular expressions are combinations of characters that are interpreted as rules for matching sub-strings.

For instance, the expression amount\D+\d+ matches the word amount, followed by one or more non-digit characters (\D+), followed by an integer (\d+), as in: amount=100, amount is 3, amount is equal to: 33, etc.
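Here is a short sketch of that pattern in action. It uses re.match, so each string must start with "amount". The first three samples are the ones listed above; the last one is a hypothetical non-matching string added for contrast.

```python
import re

pattern = r"amount\D+\d+"

samples = [
    "amount=100",
    "amount is 3",
    "amount is equal to: 33",
    "amount: none",  # hypothetical non-matching example: no digits follow
]

for text in samples:
    m = re.match(pattern, text)
    # m is a Match object on success, None otherwise
    print(repr(text), "->", m.group() if m else None)
```

The last string fails because \D+ can consume ": none" but \d+ then finds no digit to match.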

re.match() takes the regular expression as its first argument and the string to match as its second:

import re
pattern = r"123"
string = "123zzb"
print(re.match(pattern, string)) 

OUTPUT:
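<re.Match object; span=(0, 3), match='123'>
(On Python 3.6 and earlier, the same object is shown as <_sre.SRE_Match object; span=(0, 3), match='123'>.)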

match = re.match(pattern, string)
print(match.group())

OUTPUT:
123
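The match object carries more than the matched text. As a small sketch reusing the same pattern and string, the span shown in its printed form can be read back with the start(), end() and span() methods:

```python
import re

pattern = r"123"
string = "123zzb"

match = re.match(pattern, string)
if match:  # re.match returns None when the pattern does not match at the start
    print(match.group())                # the matched text: 123
    print(match.start(), match.end())   # 0 3
    print(match.span())                 # (0, 3)
```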

You may notice that the pattern variable is a string prefixed with r, which indicates that it is a raw string literal.

A raw string literal has slightly different syntax from a normal string literal: a backslash \ in a raw string literal means "just a backslash", so there is no need to double backslashes to keep them from forming escape sequences such as newlines (\n), tabs (\t), backspaces (\b), form feeds (\f), and so on. In normal string literals, each backslash must be doubled to avoid being taken as the start of an escape sequence.
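To see the difference concretely, here is a small comparison of normal and raw literals. len() counts the characters each literal actually produces:

```python
normal = "\n"     # one character: a newline
raw = r"\n"       # two characters: a backslash and the letter n
doubled = "\\n"   # normal literal with the backslash doubled: also backslash + n

print(len(normal), len(raw), len(doubled))  # 1 2 2
print(raw == doubled)                       # True

# the escape sequences named above, each producing a single control character
print("\t" == chr(9), "\b" == chr(8), "\f" == chr(12))  # True True True
```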

Hence, r"\n" is a string of 2 characters: \ and n. Regex patterns also use backslashes; for example, \d matches any digit character. Using raw strings (r"\d") saves us from having to double every backslash in a pattern ("\\d").
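As a sketch of why this matters for patterns, the doubled-backslash form and the raw form below hand exactly the same characters to the regex engine, so they match the same text. The input string is an arbitrary example:

```python
import re

text = "42 apples"  # arbitrary example input

escaped = "\\d+"  # normal literal: backslash doubled
raw = r"\d+"      # raw literal: backslash written once

print(escaped == raw)                    # True: both are the characters \ d +
print(re.match(escaped, text).group())  # 42
print(re.match(raw, text).group())      # 42
```

Writing "\d" in a normal literal happens to work because \d is not a recognized string escape, but since Python 3.12 such unrecognized escapes trigger a SyntaxWarning, which is one more reason to prefer raw strings for patterns.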

For instance:

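import re

string = "\\t123zzb"  # the backslash is escaped: no tab here, just '\' followed by 't'
pattern = "\\t123"
# the regex engine receives \t123, which it reads as a tab character followed by 123

re.match(pattern, string).group()
# no match: re.match returns None, so calling .group() raises AttributeError

re.match(pattern, "\t123zzb").group()
# matches '\t123' (a real tab character followed by 123)

pattern = r"\\t123"
# the engine receives \\t123: an escaped, literal backslash followed by t123
re.match(pattern, string).group()
# matches '\\t123'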
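The spellings from the example above can also be compared side by side. This sketch (the variable names are only illustrative) tries four ways of writing the pattern against both a string starting with a real tab and a string starting with a backslash and t:

```python
import re

tab_string = "\t123zzb"         # starts with a real tab character
backslash_string = "\\t123zzb"  # starts with a backslash, then the letter t

# (label as written in source code, pattern value)
patterns = [
    (r'"\t123"', "\t123"),      # a literal tab character inside the pattern
    (r'"\\t123"', "\\t123"),    # the regex escape \t, which the engine reads as a tab
    (r'r"\t123"', r"\t123"),    # identical to the previous pattern
    (r'r"\\t123"', r"\\t123"),  # an escaped (literal) backslash, then t123
]

for label, pattern in patterns:
    on_tab = re.match(pattern, tab_string) is not None
    on_backslash = re.match(pattern, backslash_string) is not None
    print(label, "| tab string:", on_tab, "| backslash string:", on_backslash)
```

Only the last pattern matches the backslash string; the first three all end up meaning "a tab followed by 123".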

re.match only matches at the start of the string. To find a match anywhere in the string, use re.search instead:

match = re.match(r"(123)", "a123zzb")
match is None
# Out: True

match = re.search(r"(123)", "a123zzb")
match.group()
# Out: '123'
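A related sketch: re.search with a ^ anchor also succeeds only at the start of the string (when re.MULTILINE is not set), so it behaves like re.match. With re.MULTILINE, ^ can also match right after each newline, while re.match still only checks the very beginning of the string:

```python
import re

text = "a123zzb"

print(re.match(r"123", text))            # None: re.match only tries position 0
print(re.search(r"^123", text))          # None: ^ anchors the search to the start
print(re.search(r"123", text).group())   # 123 (found at index 1)

lines = "zzb\n123"
print(re.match(r"123", lines, re.MULTILINE))            # None: still start of string only
print(re.search(r"^123", lines, re.MULTILINE).group())  # 123: ^ matches after the newline
```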
