Regular expressions

Regular expressions (also known as regex) are patterns that let us check whether a string of text (for instance, user input) conforms to a predefined format (such as a time or a date). With regex we can confirm that the data a user entered follows the expected format. For instance, this regex checks whether a name consists of a capital letter followed by one or more lowercase letters:

"[A-Z][a-z]+"
Regular expressions are built from a sequence of atoms. The basic atom is a single literal: a letter, a digit, or a special character. We can group such literals with parentheses. We can also use quantifiers, which specify how many times an atom may occur, and the alternation character '|', which matches either the expression on its left or the one on its right. The simplest regular expression is just a sequence of literals:

abcde
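The atoms described above can be tried out with a minimal sketch like the one below. It relies on the standard 'String.matches(regex)' method, which checks the whole string against the pattern; the class name 'AtomDemo' and the helper 'fullMatch' are made up for this illustration.

```java
class AtomDemo {
    // Hypothetical helper: true only when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // A plain sequence of literals matches exactly that text.
        System.out.println(fullMatch("abcde", "abcde"));      // true
        System.out.println(fullMatch("abcde", "abcdef"));     // false (extra 'f')

        // Parentheses group literals; '|' chooses between alternatives.
        System.out.println(fullMatch("(ab|cd)e", "abe"));     // true
        System.out.println(fullMatch("(ab|cd)e", "cde"));     // true
        System.out.println(fullMatch("(ab|cd)e", "abcde"));   // false

        // The name pattern from the introduction.
        System.out.println(fullMatch("[A-Z][a-z]+", "Anna")); // true
        System.out.println(fullMatch("[A-Z][a-z]+", "anna")); // false
    }
}
```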

Quantifiers

Quantifiers specify how many times the preceding atom must occur in the string being matched. Consider this example:

a+bcde
This example is more interesting. The expression matches strings such as "abcde", "aabcde", and "aaabcde". The '+' quantifier matches one or more occurrences of the atom that precedes it, here the letter 'a'. Below you can find the table with popular regex quantifiers:

[Table: Quantifiers]
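As a rough illustration of how quantifiers behave, the sketch below runs a few standard ones ('+', '*', '?', and '{n,m}') against variations of the "a+bcde" example. The class 'QuantifierDemo' and its helper 'fullMatch' are hypothetical names chosen for this example, and 'String.matches' is used here only as a convenient whole-string check.

```java
class QuantifierDemo {
    // Hypothetical helper: true only when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // '+' : one or more occurrences of the preceding atom
        System.out.println(fullMatch("a+bcde", "abcde"));      // true
        System.out.println(fullMatch("a+bcde", "aaabcde"));    // true
        System.out.println(fullMatch("a+bcde", "bcde"));       // false (no 'a')

        // '*' : zero or more occurrences
        System.out.println(fullMatch("a*bcde", "bcde"));       // true

        // '?' : zero or one occurrence
        System.out.println(fullMatch("a?bcde", "aabcde"));     // false (two 'a's)

        // '{2,3}' : between two and three occurrences
        System.out.println(fullMatch("a{2,3}bcde", "aabcde")); // true
        System.out.println(fullMatch("a{2,3}bcde", "abcde"));  // false (only one 'a')
    }
}
```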

Scope and groups

When we talk about scope in regular expressions, we mean: "one of these characters goes here". Such a scope (commonly called a character class) also counts as a single atom. We define it in square brackets, in one of two ways: by listing all allowed characters one after another, with no commas, or by giving a group (a range). The two can also be combined. A group is written as its first and last item separated by a hyphen. It can cover digits (such as 1-3, 0-9, or 1-5) or letters (a-z, A-Z).

Below you will find a table of regex groups and scope usage:

[Table: Scope and groups]
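The three ways of writing a scope described above (listing characters, giving a range, and combining both) might look like this in practice. This is only a sketch: 'ScopeDemo' and 'fullMatch' are illustrative names, and the sample patterns are assumptions chosen to mirror the text.

```java
class ScopeDemo {
    // Hypothetical helper: true only when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Listing characters one after another: exactly one of a, b or c.
        System.out.println(fullMatch("[abc]", "b"));           // true
        System.out.println(fullMatch("[abc]", "d"));           // false

        // A range of letters, repeated one or more times.
        System.out.println(fullMatch("[a-z]+", "hello"));      // true
        System.out.println(fullMatch("[a-z]+", "Hello"));      // false (capital 'H')

        // Two ranges of digits side by side: 1-3 followed by 0-9.
        System.out.println(fullMatch("[1-3][0-9]", "25"));     // true
        System.out.println(fullMatch("[1-3][0-9]", "45"));     // false ('4' is not in 1-3)

        // Combining ranges and single characters in one scope.
        System.out.println(fullMatch("[a-c0-9_]+", "a1_b2"));  // true
        System.out.println(fullMatch("[a-c0-9_]+", "a1-d"));   // false
    }
}
```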

Implementing regular expressions in Java

In Java we accomplish most tasks related to regex by using the 'Pattern' and 'Matcher' classes.

Pattern

The 'Pattern' class represents a compiled regular expression. In other words, the expression has been processed into an internal form in advance, which makes its execution more efficient. We obtain an object representing our expression by calling the static method 'compile(regexAsString)':

Pattern pattern = Pattern.compile("a+bcd");
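Because compilation is the costly step, one common approach is to compile a pattern once and reuse it for many inputs. The sketch below assumes that usage; the class 'CompileOnceDemo' and the helper 'countFullMatches' are hypothetical and not part of the Java API.

```java
import java.util.regex.Pattern;

class CompileOnceDemo {
    // Compiled a single time and shared; Pattern instances are immutable.
    static final Pattern A_BCD = Pattern.compile("a+bcd");

    // Hypothetical helper: counts the inputs that match the pattern in full.
    static long countFullMatches(String... inputs) {
        long count = 0;
        for (String input : inputs) {
            if (A_BCD.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // pattern() returns the source text the Pattern was compiled from.
        System.out.println(A_BCD.pattern()); // a+bcd

        // "abcd" and "aaabcd" match; "bcd" and "abcde" do not.
        System.out.println(countFullMatches("abcd", "aaabcd", "bcd", "abcde")); // 2
    }
}
```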

Matcher

An instance of the 'Pattern' class has a 'matcher()' method, which returns an instance of the 'Matcher' class for the given input:

Matcher matcher = pattern.matcher("aaaaabcd aaaaaaabbcd");

The 'Matcher' object has a method called 'matches()', which tells us whether the entire string passed to 'matcher()' matches our regular expression:

matcher.matches();  // true only if the whole input matches; false for the input above

The 'Matcher' class also has a 'find()' method, which returns 'true' if some part of the input matches the regular expression:

matcher.find();     // true for the input above, because it contains "aaaaabcd"
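The difference between 'matches()' and 'find()' is easy to overlook, so here is a small sketch that puts the two side by side using the pattern and input from above. The class 'MatchVsFindDemo' and its helper methods are illustrative names; 'group()' is the standard 'Matcher' method that returns the text of the most recent match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchVsFindDemo {
    static final Pattern PATTERN = Pattern.compile("a+bcd");

    // True only when the entire input matches the pattern.
    static boolean matchesWhole(String input) {
        return PATTERN.matcher(input).matches();
    }

    // Returns the first matching substring, or null when there is none.
    static String firstMatch(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Counts matches by calling find() until it returns false.
    static int countMatches(String input) {
        Matcher m = PATTERN.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "aaaaabcd aaaaaaabbcd";
        System.out.println(matchesWhole(input));      // false: the whole string is not a+bcd
        System.out.println(firstMatch(input));        // aaaaabcd
        System.out.println(countMatches(input));      // 1: "aaaaaaabbcd" has two b's in a row
        System.out.println(matchesWhole("aaaaabcd")); // true
    }
}
```

Each helper creates a fresh 'Matcher', so the result of one call does not depend on where a previous 'find()' stopped.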