Regular Expression

In this post, we will see how easily we can craft powerful, time-saving regular expressions. It covers the most basic concepts, so it should help even if you have never worked with regular expressions before.
The term regular expression comes from mathematics, where it is defined as:

A regular expression is a particular meta-syntax for specifying regular grammars, which has many useful applications.

A regular expression is essentially a pattern that describes a set of strings. It is processed by a piece of software called a regular expression engine, which tries to match the pattern against a given string.

* In this post, words and letters in BLUE denote a regular expression and those in GREEN denote the matches.

Meta-Characters

The following special characters are often called meta-characters: the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket (, the closing round bracket ) and, in most engines, the opening curly brace {. If you need to use any of these characters as a literal in a regular expression, you must escape it with a backslash; otherwise it is treated as a special character.

e.g. in the regular expression a\+b+c, the first plus sign is escaped and is treated as a literal character, while the second keeps its special meaning (one or more of the preceding b).
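The escaping rule can be tried out with a short sketch. The post does not name an engine, so Python's standard re module is assumed here, and the sample strings are made up for illustration.

```python
import re

# a\+b+c: the first plus is escaped (a literal "+"),
# the second is a quantifier applied to "b".
pattern = re.compile(r"a\+b+c")

literal_hit = pattern.fullmatch("a+bbbc") is not None  # literal "+", then several "b"
no_plus_miss = pattern.fullmatch("abc") is None        # the literal "+" is required

# re.escape() adds the backslashes for you when a whole string
# should be matched literally.
escaped = re.compile(re.escape("1+1=2"))
escaped_hit = escaped.search("so 1+1=2 holds") is not None

print(literal_hit, no_plus_miss, escaped_hit)
```

Using a raw string (the r prefix) keeps Python from interpreting the backslash before the regular expression engine sees it.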

Single Character

A period (.) matches any single character except the line break character (\n).
e.g. .ired matches Fired, Hired, Wired, etc.
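As a rough illustration of the dot, the sketch below filters a made-up word list with .ired, again assuming Python's re module. The DOTALL flag is specific to that module; other engines may offer a similar single-line option under a different name.

```python
import re

words = ["Fired", "Hired", "Wired", "ired", "\nired"]

# fullmatch() requires the whole string to match the pattern.
matches = [w for w in words if re.fullmatch(r".ired", w)]

# "ired" fails because the dot must consume exactly one character.
# "\nired" fails because the dot does not match a line break by default.
# With re.DOTALL the dot matches the line break as well.
dotall_match = re.fullmatch(r".ired", "\nired", flags=re.DOTALL) is not None

print(matches, dotall_match)
```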

Character Sets

Square brackets ([]) match a single character from a given set. e.g. to match either Fired or Fared, we can use F[ia]red.

A hyphen inside the square brackets denotes a range of characters. e.g. [0-9] matches a single digit from 0 to 9. More than one range can be used inside one pair of square brackets, as in [0-9a-z].

When a caret (^) is the first character after the opening bracket, as in ‘[^’, the set matches any character except the ones specified.
e.g. q[^x] matches qu in ‘quicktest’.
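The three forms of character set can be compared side by side. This is a minimal sketch assuming Python's re module; the inputs ‘Fored’, ‘A1b-2C’ and ‘Iraq’ are invented for the example.

```python
import re

# A set matches exactly one character out of those listed.
set_hits = [w for w in ["Fired", "Fared", "Fored"] if re.fullmatch(r"F[ia]red", w)]

# Two ranges in one set: any digit or any lowercase letter.
range_hits = re.findall(r"[0-9a-z]", "A1b-2C")

# A negated set still has to consume a character.
negated = re.search(r"q[^x]", "quicktest").group()
needs_a_char = re.search(r"q[^x]", "Iraq") is None  # nothing follows the q

print(set_hits, range_hits, negated, needs_a_char)
```

The last line shows a common surprise: q[^x] means a q followed by some character that is not x, so a q at the very end of the string does not match.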

Repetition

An asterisk (*) matches zero or more occurrences of the preceding character.
e.g. Ple*ase matches Plase, Please, Pleease, Pleeeeeeeeeeeeeeeeease.

A plus sign (+) matches one or more occurrences of the preceding character.
e.g. Ple+ase matches Please, Pleease, Pleeeeeeeeeeeeeeeeease but not Plase.

A question mark (?) matches zero or one occurrence of the preceding character.
e.g. Ple?ase matches Plase or Please only.
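The difference between the three quantifiers is easiest to see by running them over the same inputs. The sketch below assumes Python's re module; the helper name accepted is hypothetical.

```python
import re

candidates = ["Plase", "Please", "Pleease", "Pleeeeease"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r"Ple*ase")      # zero or more "e" after "Pl"
plus = accepted(r"Ple+ase")      # at least one "e"
optional = accepted(r"Ple?ase")  # at most one "e"

print(star)
print(plus)
print(optional)
```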


Grouping

We can place parentheses around multiple tokens to group them together. The enclosed sequence is then treated as a single unit.
e.g. in QuickTest(Professional), the string ‘Professional’ is treated as a single unit, and we can apply a quantifier to this group if required.
QuickTest(Professional)? matches QuickTest or QuickTestProfessional.
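To show why the parentheses matter, the sketch below contrasts the grouped pattern with an ungrouped variant, assuming Python's re module. In that module a group also captures the text it matched, which is reported as None when an optional group did not take part in the match.

```python
import re

grouped = re.compile(r"QuickTest(Professional)?")

short = grouped.fullmatch("QuickTest")
full = grouped.fullmatch("QuickTestProfessional")

short_group = short.group(1)  # the optional group did not participate
full_group = full.group(1)    # the text captured by the group

# Without parentheses the "?" applies only to the final "l".
ungrouped_miss = re.fullmatch(r"QuickTestProfessional?", "QuickTest") is None

print(short_group, full_group, ungrouped_miss)
```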

Alternation

A vertical bar (|) matches any one of the given expressions.
e.g. day|night matches day in ‘for so many days and nights’.
If the regular expression is applied again to the rest of the string, it matches night.
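The idea of applying the expression again can be sketched with Python's re module (an assumption, since the post names no engine): search() stops at the first match, while findall() keeps going through the rest of the string.

```python
import re

text = "for so many days and nights"

# search() returns the first match only.
first = re.search(r"day|night", text).group()

# findall() repeats the search from where the previous match ended.
all_hits = re.findall(r"day|night", text)

print(first, all_hits)
```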

Anchors

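Anchors match a position rather than a character.
^ matches at the beginning of the string.
$ matches at the end of the string.
\b matches at a word boundary: the position between a word character and a non-word character, or at the start or end of the string if the first or last character is a word character.
\B matches every position where \b cannot match.
The related shorthand classes \w and \W are not anchors; they match a character, and they define what counts as a word character for \b.
\w matches any alphanumeric character and the underscore.
\W matches any character other than alphanumeric and underscore.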
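The following sketch tries each of these tokens on small made-up strings, assuming Python's re module. Behaviour at line breaks and for non-ASCII text can differ between engines, so treat it as an illustration rather than a reference.

```python
import re

# ^ and $ tie the match to the start and end of the string.
at_start = re.search(r"^Quick", "QuickTest") is not None
not_at_start = re.search(r"^Test", "QuickTest") is None
at_end = re.search(r"Test$", "QuickTest") is not None

# \w and \W match characters, not positions.
word_chars = re.findall(r"\w", "a_1 !")
non_word_chars = re.findall(r"\W", "a_1 !")

# \b restricts "day" to a whole word; \B requires the opposite
# on the left, so only the "day" inside "today" qualifies.
sample = "day days today"
whole_word = [m.start() for m in re.finditer(r"\bday\b", sample)]
inside_word = [m.start() for m in re.finditer(r"\Bday\b", sample)]

print(at_start, not_at_start, at_end)
print(word_chars, non_word_chars)
print(whole_word, inside_word)
```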

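Usually these operators are combined into a single expression that matches the search criteria we need.
e.g.
[0-9] matches single-digit numbers 0 to 9. [1-9][0-9] matches double-digit numbers 10 to 99.
^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$ matches a date from 1900 to 2099 in yyyy-mm-dd form, where the separator may be a hyphen, space, slash or dot. It does not check that the day is valid for the month.
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b matches most email addresses, provided the match is case-insensitive (as written, it accepts uppercase letters only).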
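The combined expressions above can be exercised as follows. This is a sketch assuming Python's re module; the sample date strings and the address jane.doe@example.com are invented for the example, and the IGNORECASE flag is how that module requests case-insensitive matching.

```python
import re

two_digit = re.compile(r"[1-9][0-9]")
date = re.compile(
    r"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"
)
email = re.compile(
    r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE
)

is_two_digit = two_digit.fullmatch("42") is not None
leading_zero = two_digit.fullmatch("07") is None  # first digit must be 1-9

good_date = date.match("2011-02-28") is not None
bad_month = date.match("2011-13-01") is None
# The pattern checks digit ranges only, so an impossible
# calendar date such as 31 February still matches.
impossible_date = date.match("2011-02-31") is not None

found = email.search("write to jane.doe@example.com today")
address = found.group() if found else None

print(is_two_digit, leading_zero, good_date, bad_month, impossible_date, address)
```

Patterns like these are a first filter; a date that passes still needs a real calendar check if validity matters.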

We are now set to apply regular expressions in our test automation.

More to come on regular expressions in upcoming articles.
