Python Regex

Regular Expression in Python with Examples | Set 1

Last Updated: 19-10-2020

A regular expression (RE) is a pattern that specifies a set of strings that match it. Python provides regular expressions through the re module.
Metacharacters, which are characters with a special meaning, are central to writing REs and are used throughout the functions of the re module.
There are 14 metacharacters in total (counting each bracket separately). They are discussed below as they come up in the functions:

\   Used to drop the special meaning of the character
    following it (discussed below)
[]  Represent a character class
^   Matches the beginning of the string
$   Matches the end of the string
.   Matches any character except newline
?   Matches zero or one occurrence
|   Means OR (matches any one of the patterns
    separated by it)
*   Any number of occurrences (including 0 occurrences)
+   One or more occurrences
{}  Indicate the number of occurrences of the preceding RE
    to match
()  Enclose a group of REs
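
To make the table above concrete, here is a minimal sketch that exercises most of these metacharacters with re.findall() and re.search(). The sample strings and variable names are made up for illustration and are not part of the original article. Patterns are written as raw strings (r"...") so that Python itself does not interpret any backslashes.

```python
import re

# ^ matches at the beginning of the string, $ at the end.
starts = re.findall(r"^ab", "abcab")    # only the leading "ab"
ends = re.findall(r"ab$", "abcab")      # only the trailing "ab"

# . matches any single character except a newline.
dots = re.findall(r"a.c", "abc a-c a\nc")

# ? means zero or one occurrence of the preceding item.
optional = re.findall(r"colou?r", "color colour")

# * means zero or more occurrences, + means one or more.
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# {m,n} bounds the number of repetitions of the preceding item.
counted = re.findall(r"a{2,3}", "a aa aaaa")

# | means OR between the alternatives on either side.
either = re.findall(r"gray|grey", "gray grey griy")

# () groups a sub-pattern so that a quantifier applies to all of it.
grouped = re.search(r"(ab)+", "ababab c").group()

print(starts, ends, dots, optional)
print(star, plus, counted, either, grouped)
```

Note that "a\nc" is not matched by a.c, because the dot does not match a newline by default.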



  • Function compile() 
    Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. 
     
# Import the regular expression module, re.
import re

# compile() builds a pattern object from the character class [a-e],
# which is equivalent to [abcde].
# The class matches any single one of the characters 'a', 'b', 'c', 'd', 'e'.
p = re.compile('[a-e]')

# findall() scans the string and returns a list of all non-overlapping matches.
print(p.findall("Aye, said Mr. Gibenson Stark"))

Output: 

['e', 'a', 'd', 'b', 'e', 'a']




Understanding the Output: 
The first match is ‘e’ in “Aye” and not ‘A’, because matching is case-sensitive. 
The next matches are ‘a’ and ‘d’ in “said”, followed by ‘b’ and ‘e’ in “Gibenson”. The last ‘a’ is the one in “Stark”.
The backslash metacharacter ‘\’ plays a very important role, as it introduces various special sequences. To match a literal backslash, without its special meaning as a metacharacter, use ‘\\’.
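
Besides findall(), the pattern object returned by compile() offers methods for searching and substitution, as mentioned earlier. The following is a small sketch on the same sentence. The variable names are only illustrative, and the re.IGNORECASE flag is an addition not used in the original example, shown here to contrast with the case-sensitive result.

```python
import re

text = "Aye, said Mr. Gibenson Stark"
p = re.compile('[a-e]')

# search() returns a match object for the first match, or None if there is none.
first = p.search(text)
first_char = first.group()   # the matched text
first_pos = first.start()    # index of the match in the string

# sub() returns a new string with every match replaced.
masked = p.sub('*', text)

# With re.IGNORECASE the class [a-e] also matches the uppercase 'A'.
p_ci = re.compile('[a-e]', re.IGNORECASE)
all_ci = p_ci.findall(text)

print(first_char, first_pos)
print(masked)
print(all_ci)
```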

\d   Matches any decimal digit; for ASCII text this is
     equivalent to the class [0-9].
\D   Matches any non-digit character.
\s   Matches any whitespace character.
\S   Matches any non-whitespace character.
\w   Matches any alphanumeric character or underscore; for
     ASCII text this is equivalent to the class [a-zA-Z0-9_].
\W   Matches any character that is not alphanumeric
     or an underscore.
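
As a rough illustration of the special sequences above, the sketch below applies each one to a short sample string with re.findall(). The sample text, the path string, and the variable names are assumptions made for this example. The last two calls show how a backslash removes the special meaning of a metacharacter, including the backslash itself.

```python
import re

text = "Room 42, floor_3!"

digits = re.findall(r"\d", text)        # single decimal digits
numbers = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)   # runs of non-digit characters
spaces = re.findall(r"\s", text)        # whitespace characters
chunks = re.findall(r"\S+", text)       # runs of non-whitespace characters
words = re.findall(r"\w+", text)        # runs of letters, digits, underscore
others = re.findall(r"\W", text)        # everything that \w does not match

# \. matches a literal dot instead of "any character".
literal_dots = re.findall(r"\.", "a.b.c")

# \\ in the pattern matches one literal backslash in the text.
path = "C:\\temp\\x"                    # the string C:\temp\x
backslashes = re.findall(r"\\", path)

print(digits, numbers, non_digits)
print(spaces, chunks, words, others)
print(literal_dots, len(backslashes))
```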
Author:

My name is Truong Thanh. I graduated with a Master of Information Technology and Artificial Intelligence from Frankfurt University, Germany. I created this blog to share my experiences of life, study and travel with friends who have the same hobbies.
