Regular Expression in Python with Examples | Set 1
Last Updated: 19-10-2020
A Regular Expression (RE) is a pattern that specifies the set of strings matching it; Python supports REs through the module re.
To understand how REs work, it helps to know the metacharacters, which are used throughout the functions of module re.
There are 14 metacharacters in total (counting each bracket of [], {} and () separately); they are discussed as they come up in the functions below:
\    Drops the special meaning of the character following it (discussed below)
[]   Represents a character class
^    Matches the beginning of the string
$    Matches the end of the string
.    Matches any character except newline
?    Matches zero or one occurrence
|    Means OR (matches any of the patterns separated by it)
*    Matches any number of occurrences (including 0 occurrences)
+    Matches one or more occurrences
{}   Indicates the number of occurrences of the preceding RE to match
()   Encloses a group of REs
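The metacharacters above can be tried out with a short sketch. The sample strings and variable names here are made up for illustration, and the module-level function re.findall() is used only to keep the example compact:

```python
import re

text = "color colour colouur"  # hypothetical sample string

zero_or_one = re.findall(r"colou?r", text)    # '?': zero or one 'u'
any_number = re.findall(r"colou*r", text)     # '*': zero or more 'u'
one_or_more = re.findall(r"colou+r", text)    # '+': one or more 'u'
exactly_two = re.findall(r"colou{2}r", text)  # '{}': exactly two 'u'
print(zero_or_one)   # ['color', 'colour']
print(any_number)    # ['color', 'colour', 'colouur']
print(one_or_more)   # ['colour', 'colouur']
print(exactly_two)   # ['colouur']

at_start = re.findall(r"^col", text)  # '^': only at the beginning
at_end = re.findall(r"r$", text)      # '$': only at the end
print(at_start, at_end)  # ['col'] ['r']

either = re.findall(r"cat|dog", "cat dog bird")  # '|': OR
any_char = re.findall(r"c.t", "cat cot c\nt")    # '.': not newline
literal_dot = re.findall(r"3\.14", "3.14 3x14")  # '\': escapes '.'
print(either)       # ['cat', 'dog']
print(any_char)     # ['cat', 'cot']
print(literal_dot)  # ['3.14']

# '()' groups 'ab' so that '+' repeats the whole group.
grouped = re.search(r"(ab)+", "ababab").group()
print(grouped)  # ababab
```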
- Function compile()
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
# The module re is imported with the import statement.
import re
# compile() creates a pattern object for the character class [a-e],
# which is equivalent to [abcde].
# The class [abcde] matches any one of the characters 'a', 'b', 'c', 'd', 'e'.
p = re.compile('[a-e]')
# findall() searches the string for the Regular Expression and returns a list of all matches
print(p.findall("Aye, said Mr. Gibenson Stark"))
Output:
['e', 'a', 'd', 'b', 'e', 'a']
Understanding the Output:
The first match is ‘e’ in “Aye” and not ‘A’, because matching is case-sensitive.
The next match is ‘a’ in “said”, then ‘d’ in “said”, followed by ‘b’ and ‘e’ in “Gibenson”; the last ‘a’ comes from “Stark”.
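As a minimal sketch of the other pattern-object operations mentioned earlier (searching and substitution), the same class can be reused with search() and sub(). The flag re.IGNORECASE is not covered in this article; it is shown here only to contrast with the case-sensitive result above, and the variable names are illustrative:

```python
import re

sentence = "Aye, said Mr. Gibenson Stark"

# Compiling with re.IGNORECASE makes [a-e] match 'A' as well.
p_ci = re.compile("[a-e]", re.IGNORECASE)
matches_ci = p_ci.findall(sentence)
print(matches_ci)  # ['A', 'e', 'a', 'd', 'b', 'e', 'a']

p = re.compile("[a-e]")

# search() returns a match object for the first match, or None.
first = p.search(sentence)
print(first.group(), first.start())  # e 2

# sub() replaces every match; here each matched letter becomes '*'.
masked = p.sub("*", sentence)
print(masked)  # Ay*, s*i* Mr. Gi**nson St*rk
```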
The backslash metacharacter ‘\’ plays a very important role because it signals various special sequences. To use a backslash without its special meaning as a metacharacter, write ‘\\’.
\d   Matches any decimal digit; for ASCII text this is equivalent to the class [0-9].
\D   Matches any non-digit character.
\s   Matches any whitespace character.
\S   Matches any non-whitespace character.
\w   Matches any alphanumeric character or underscore; for ASCII text this is equivalent to the class [a-zA-Z0-9_].
\W   Matches any character that is not alphanumeric or an underscore.
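The sequences above can be checked with a small sketch. The sample string and variable names are invented for illustration; raw strings (r"...") are used so that Python itself does not interpret the backslashes before the module re sees them:

```python
import re

sample = "Room 42, floor_3: open 9am"  # hypothetical sample string

digits = re.findall(r"\d", sample)          # each decimal digit
only_digits = re.sub(r"\D", "", sample)     # drop every non-digit
spaces = re.findall(r"\s", sample)          # each whitespace character
chunks = re.findall(r"\S+", sample)         # runs of non-whitespace
words = re.findall(r"\w+", sample)          # runs of word characters
separators = re.findall(r"\W", sample)      # each non-word character

print(digits)       # ['4', '2', '3', '9']
print(only_digits)  # 4239
print(len(spaces))  # 4
print(chunks)       # ['Room', '42,', 'floor_3:', 'open', '9am']
print(words)        # ['Room', '42', 'floor_3', 'open', '9am']
print(separators)   # [' ', ',', ' ', ':', ' ', ' ']

# A literal backslash is matched by the pattern \\ (written r"\\").
path = r"C:\temp"
backslashes = re.findall(r"\\", path)
print(len(backslashes))  # 1
```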