Python Regular Expressions

Regular expressions are a powerful tool for many kinds of string manipulation. A regular expression is a special sequence of characters that describes a search pattern, used to match, find, or extract information from text such as code, files, logs, and spreadsheets.
Regular expressions form a domain-specific language (DSL) that is available as a library in most modern programming languages. In Python, they are provided by the re module of the standard library.

match() function:

The match() function matches a pattern at the beginning of a string, with optional flags. It has the following syntax:

re.match(pattern, string, flags=0)

This function tries to match the pattern at the start of the string only. It returns a match object on success and None otherwise. The flags argument is optional; some of its values are listed in the following table:

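Flag    Description
re.I    Performs case-insensitive matching
re.M    Makes ^ and $ match at the start and end of each line
re.X    Ignores whitespace in the pattern and allows comments
re.U    Interprets classes such as \w according to the Unicode character set (the default for str patterns in Python 3)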
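
As a minimal sketch of how match() and the flags above behave, the snippet below uses made-up sample strings (text and the date-like string are illustrative assumptions, not taken from any real data set). It shows that match() only succeeds at the start of the string, and how re.I and re.X change the way the pattern is interpreted.

```python
import re

text = "Python regular expressions"  # hypothetical sample string

# re.match() only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"python", text, flags=re.I)  # re.I ignores case
not_at_start = re.match(r"regular", text)         # "regular" is not at index 0

print(at_start.group())  # Python
print(not_at_start)      # None

# re.X lets the pattern be spread over several lines with comments.
verbose = re.match(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """,
    "2024-05",
    flags=re.X,
)
print(verbose.groups())  # ('2024', '05')
```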

search() function:

The search() function searches for the first occurrence of a pattern within a string, with optional flags. If the search is successful, a match object is returned; otherwise None is returned. It has the following syntax:

re.search(pattern, string, flags=0)

Note: re.search() finds a match of a pattern anywhere in the string.
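
The difference from match() can be sketched as follows. The sample sentence is an assumption chosen for illustration; the calls themselves are standard re functions.

```python
import re

text = "Order 66 shipped on day 12"  # hypothetical sample string

# search() scans the whole string and stops at the first match.
first_number = re.search(r"\d+", text)
if first_number:
    print(first_number.group())  # 66
    print(first_number.span())   # (6, 8)

# match() fails here because the string does not start with a digit.
print(re.match(r"\d+", text))    # None

# search() returns None when the pattern occurs nowhere in the string.
missing = re.search(r"xyz", text)
print(missing)                   # None
```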

sub() function:

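The sub() function in the re module searches for a pattern in a string and replaces each match with a replacement string. The optional count argument limits the number of replacements; the default of 0 replaces every match. It has the following syntax:

re.sub(pattern, repl, string, count=0, flags=0)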
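
A short sketch of sub() in use, assuming a date-like sample string chosen only for illustration. It shows a plain replacement, the effect of count, and a replacement string that refers back to captured groups.

```python
import re

date = "2024-05-17"  # hypothetical sample string

# Replace every dash with a slash.
all_replaced = re.sub(r"-", "/", date)
print(all_replaced)    # 2024/05/17

# count=1 replaces only the first match.
first_only = re.sub(r"-", "/", date, count=1)
print(first_only)      # 2024/05-17

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(reordered)       # 17/05/2024
```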

findall() function:

The findall() function searches a string and returns a list of all non-overlapping matches of the pattern. If no match is found, the returned list is empty. It has the following syntax:

matchlist=re.findall(pattern, input_str, flags=0)

Note: re.findall() returns a list of all substrings that match the pattern. If the pattern contains groups, the list holds the group contents instead of the full matches.
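
The behaviour described above can be sketched with a made-up sentence (the text value is an assumption for illustration):

```python
import re

text = "3 apples, 12 pears and 40 plums"  # hypothetical sample string

# Without groups, findall() returns the matched substrings.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['3', '12', '40']

# With several groups, each match becomes a tuple of group contents.
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)    # [('3', 'apples'), ('12', 'pears'), ('40', 'plums')]

# No match gives an empty list rather than None.
nothing = re.findall(r"kiwi", text)
print(nothing)  # []
```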

finditer() function:

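The finditer() function works like findall(), but instead of returning a list of strings it returns an iterator that yields a match object for each match. Each match object gives access to the matched text and to its position in the string, so the iterator can be used to print the index of every match.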
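
A minimal sketch of iterating over matches and reading their positions, using an invented sample string:

```python
import re

text = "cat bat rat"  # hypothetical sample string

# Each item produced by finditer() is a match object.
for m in re.finditer(r"[cbr]at", text):
    print(m.start(), m.end(), m.group())

# Collect (start index, matched text) pairs in one pass.
positions = [(m.start(), m.group()) for m in re.finditer(r"[cbr]at", text)]
print(positions)  # [(0, 'cat'), (4, 'bat'), (8, 'rat')]
```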

Groups:

A group is created by surrounding part of a regular expression with parentheses. A group can also be the operand of metacharacters such as * and ?, in which case the quantifier applies to the whole group.
Example:

import re
pattern = r"gr(ea)*t"
# (ea)* allows the group "ea" to repeat zero or more times
if re.match(pattern, "great"):
    print("Ram is ea")
if re.match(pattern, "greaeaeaeaeaeaeat"):
    print("Ram is greaeaeaeaeaeaeat")

 

Output:
Ram is ea
Ram is greaeaeaeaeaeaeat

Python supports two useful types of groups:
1. Named Group
2. Non-capturing Group

Named Group:

It has the format (?P<name>...), where name is the name of the group and ... is its content. Named groups behave like normal groups but can be accessed by name as well as by number.

Non-capturing Group:

It has the format (?:...). Non-capturing groups are not accessible through the group() method, so they can be added to an existing regular expression without changing the numbering of the other groups.

Example of Named Group and Non-Capturing Group:

import re
# FIRST is a named group, (?:in) is non-capturing, (th) is group 2
pattern = r"Go(?P<FIRST>od)Go(?:in)gPy(th)on"
match = re.match(pattern, "GoodGoingPythonGoodGoingPythonGoodGoingPython")
if match:
    print(match.group("FIRST"))
    print(match.group(1))
    print(match.group(2))
    print(match.groups())

Output:
od
od
th
('od', 'th')

Application of Regular Expressions:

We can use regular expressions to extract dates, times, e-mail addresses, and similar data from text.
Example: An e-mail address has a username made up of characters that may include dots or dashes. The username is followed by the @ sign and the domain name. The domain name may also include characters, dashes, and dots.
Consider the following e-mail address:

[email protected]

Now, the regular expression representing the structure of an e-mail address can be given as:

pattern= r"[\w.-]+@[\w.-]+"

Here [\w.-]+ matches one or more word characters (letters, digits, or underscores), dots, or dashes.

Example:

import re
pattern = r"[\w.-]+@[\w.-]+"
string = " Please write us at [email protected]"
# search() is used because the address is not at the start of the string
match = re.search(pattern, string)
if match:
    print("Email to:", match.group())
else:
    print("No Match")

Output:
Email to: [email protected]
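
The same approach extends to the dates mentioned at the start of this section. The sketch below assumes dates written as YYYY-MM-DD and uses an invented sentence; the group names year, month, and day are illustrative choices. The pattern only checks the shape of the text and does not verify that the month or day values are valid.

```python
import re

text = "Released on 2023-11-02, patched on 2024-01-15."  # hypothetical sample

# Named groups make each part of the date accessible by name.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# groupdict() maps every named group to the text it captured.
dates = [m.groupdict() for m in re.finditer(pattern, text)]
for d in dates:
    print(d["year"], d["month"], d["day"])
```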