Regular Expressions: How will you know it’s them when you see them?
The term regular expression is commonly shortened to regex or regexp in conversation. We use regular expressions to find or match a pattern of characters within a word or a line of text. For example, we can search for clock in the word clockwork. We can also find lock or work in it.
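As a first taste, here is a minimal sketch that checks the word clockwork for the patterns clock, lock and work. It uses the built-in test() method of a regular expression, which this tutorial doesn't otherwise discuss; it simply returns true if the pattern occurs anywhere in the string. The extra pattern time is made up to show a miss.

```javascript
// Search the word "clockwork" for several character patterns.
// RegExp.prototype.test() returns true if the pattern occurs anywhere.
const word = "clockwork";

const hasClock = /clock/.test(word); // true: "clock" starts the word
const hasLock = /lock/.test(word);   // true: "lock" sits inside "clock"
const hasWork = /work/.test(word);   // true: "work" ends the word
const hasTime = /time/.test(word);   // false: "time" does not occur

console.log(hasClock, hasLock, hasWork, hasTime); // true true true false
```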
Besides letters of the alphabet, we can also match other characters, for example special characters like #, $, % and others. In addition, regular expressions make it possible to match character ranges like A-Z or a-z, as well as digits such as 0, 1, 2 and so on up to 9 (the range 0-9).
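The sketch below shows both ideas on a made-up sample string. It uses the g modifier and the match() method, both explained later in this tutorial. Note that $ is normally a special character in a pattern, but inside square brackets it is treated as a literal dollar sign.

```javascript
const text = "Order #42 costs $15 (20% off)";

// Inside square brackets, #, $ and % are treated as literal characters.
const specials = text.match(/[#$%]/g); // ["#", "$", "%"]

// A range matches any single character between its two endpoints.
const digits = text.match(/[0-9]/g);   // ["4", "2", "1", "5", "2", "0"]
const upper = text.match(/[A-Z]/g);    // ["O"]

console.log(specials, digits, upper);
```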
Regular expressions let us quickly find a character or a sequence of characters within a string of text. They also make it possible to look for a pattern only at the start or only at the end of the string. As you may already know, a string is just text.
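In JavaScript, the start and the end of the string are marked with the ^ and $ anchors, metacharacters this tutorial doesn't cover elsewhere. A small sketch using the word clockwork:

```javascript
const word = "clockwork";

// ^ anchors the pattern to the start of the string, $ to the end.
const startsWithClock = /^clock/.test(word); // true
const startsWithWork = /^work/.test(word);   // false: "work" is not at the start
const endsWithWork = /work$/.test(word);     // true

console.log(startsWithClock, startsWithWork, endsWithWork); // true false true
```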
We can create the RegExp object as shown below:
var regex_obj = new RegExp(pattern, modifiers);
The pattern parameter is the text we want to match. The modifiers parameter describes how we want to match it. For example, it can be set to “g”, as in the following example:
var regex_obj = new RegExp("lock", "g");
The “g” modifier stands for “global”: the entire string is searched, and every match is returned instead of only the first one.
Had the “g” modifier been omitted, the pattern “lock” would match only once, even if the string we test it against contained more instances of the character combination “lock”. Which behavior you want depends on what you are trying to do.
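The difference is easy to see in a short sketch. The sample sentence is made up for illustration; both patterns are built with the RegExp object exactly as above, once without and once with “g”.

```javascript
const text = "The clockwork locks the lock";

// Without "g": the search stops after the first match.
const first = text.match(new RegExp("lock"));
// With "g": every occurrence of "lock" is collected.
const all = text.match(new RegExp("lock", "g"));

console.log(first[0], first.length); // lock 1
console.log(all.length);             // 3 (in clockwork, locks and lock)
```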
In addition to using the RegExp object, we can simply define a pattern as a regular expression literal:
var pattern = /lock/g;
Notice that when we do it this way, we no longer wrap the pattern in double quotes. Instead, we enclose it between two slashes (/), and the g modifier is simply added after the second slash. Don’t neglect the closing semicolon, even though JavaScript’s automatic semicolon insertion usually lets you get away with leaving it out.
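To convince yourself that the two forms are equivalent, here is a minimal sketch comparing them. It relies on the standard source and flags properties of a RegExp object; the sample string is made up.

```javascript
const fromConstructor = new RegExp("lock", "g");
const fromLiteral = /lock/g;

// Both describe the same pattern text and the same modifiers.
const sameSource = fromConstructor.source === fromLiteral.source; // true
const sameFlags = fromConstructor.flags === fromLiteral.flags;    // true

// They also find the same matches in a sample string.
const text = "clockwork locks";
const a = text.match(fromConstructor);
const b = text.match(fromLiteral);
console.log(sameSource, sameFlags, a.length, b.length); // true true 2 2
```

The constructor form is handy when the pattern text is only known at runtime (for example, when it is built from user input), while the literal form is shorter for fixed patterns.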
By the way, two other commonly used modifiers are i for case-insensitive matching and m for multi-line matching. The m modifier matters when the string you are matching the pattern against contains line breaks: it makes ^ and $ match at the start and end of each line, instead of only at the start and end of the whole string. You can combine modifiers; for example, to match characters globally and case-insensitively, use this pattern:
var pattern = /lock/gi;
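Here is a small sketch of the i and m modifiers at work. The modifier for multi-line matching is written m in JavaScript; the strings below are made up for illustration, and ^ (start of a line) is a metacharacter this tutorial doesn't otherwise cover.

```javascript
const text = "CLOCKWORK Lock lock";

// Without "i", only the all-lowercase "lock" matches.
const caseSensitive = text.match(/lock/g);    // ["lock"]
// With "i", uppercase and mixed-case versions match too.
const caseInsensitive = text.match(/lock/gi); // ["LOCK", "Lock", "lock"]

const lines = "tick\ntock";
// Without "m", ^ only matches at the very start of the string.
const singleLine = lines.match(/^tock/);      // null
// With "m", ^ also matches right after each line break.
const multiLine = lines.match(/^tock/m);      // multiLine[0] is "tock"

console.log(caseSensitive, caseInsensitive, singleLine, multiLine[0]);
```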
The Rules: The Pattern Parameter
It’s important to note that I refer to the pattern parameter as a character combination and not as a string. You see, each character in the pattern is like a command to the regular expression algorithm. It’s like a language in itself.
In regular expressions, commands are specified using special characters like parentheses (abc) or square brackets [cat]. Many other special characters are used to match things in very specific ways. Herein lies the power of regular expressions.
An example of that is the pair of square brackets [ and ], also known as the bracket expression. Whenever we place a sequence of characters, such as “abc”, within square brackets, it changes the meaning of our search pattern.
The [ and ] brackets themselves are not matched at all, only the characters within them. The bracket expression is just one part of the regular expression language. Inside the pattern parameter, it means the following: match any single character within the brackets.
So, while the pattern “abc” matches exactly abc in that order, the pattern [abc] matches either a, b or c. Just one character. Here is an example of how one would instantiate such a pattern using the RegExp object:
var regex_obj = new RegExp("[abc]");
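Some other bracket patterns are shown in the following table:
| Expression | Description |
|---|---|
| [abc] | Find any character between the brackets |
| [^abc] | Find any character not between the brackets |
| [0-9] | Find any digit from 0 to 9 |
| [A-Z] | Find any character from uppercase A to uppercase Z |
| [a-z] | Find any character from lowercase a to lowercase z |
| [A-z] | Find any character from uppercase A to lowercase z (this range also includes six punctuation characters that sit between Z and a in ASCII, such as the caret ^ and the underscore _) |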
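The sketch below runs each pattern from the table against one made-up word, using match() with the g modifier (both covered later in this tutorial). It also shows why [A-z] is a trap: the underscore falls inside that range, so [A-Za-z] is the safer way to say “any letter”.

```javascript
const word = "Cab_2";

const anyOfAbc = word.match(/[abc]/g);   // ["a", "b"] (uppercase C is not in the set)
const notAbc = word.match(/[^abc]/g);    // ["C", "_", "2"]
const digit = word.match(/[0-9]/g);      // ["2"]
const upper = word.match(/[A-Z]/g);      // ["C"]
const lower = word.match(/[a-z]/g);      // ["a", "b"]

// [A-z] also covers the punctuation between "Z" and "a" in ASCII, like "_".
const aToZMixed = word.match(/[A-z]/g);  // ["C", "a", "b", "_"]
// [A-Za-z] matches letters only.
const letters = word.match(/[A-Za-z]/g); // ["C", "a", "b"]

console.log(anyOfAbc, notAbc, digit, upper, lower, aToZMixed, letters);
```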
Parentheses can be combined with the | symbol to express “or” conditions:
(clock|work|alex) matches any one of the three words.
Note that if the g modifier is added to the end of this pattern, every occurrence of clock, work or alex will be matched, whether it is a separate word or part of a longer word such as clockwork. Without the g modifier, only one of the alternatives (separated by the | character, also known as the “OR” character) will be matched: whichever one is found first in the target string.
As a quick side note, just before we see how strings are actually matched against this pattern, keep in mind that match() returns its results as an array, or null when nothing matches.
In the case of the parenthesized example with the g modifier, the resulting array would be [ “clock”, “work”, “alex” ] if all three words were found in the target string in that order; the matches are always listed in the order they appear. Make a mental note that the match() function (and it is literally called match(), as you will see in a moment) belongs to every string object, which we are about to discuss, and returns an array of matches even if only one match was found, for example [ “clock” ]. That value is stored at index 0 of the array, as in match[0]. If found, the results work and alex would be stored at match[1] and match[2], respectively.
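A minimal sketch of the alternation pattern with and without the g modifier, on a made-up target string. One detail worth knowing: without g, the parentheses also make match() store the matched group at index 1, so index 1 does not hold the next match in that case.

```javascript
const target = "alex fixed the clockwork";

// Without "g": only the first alternative found, reading left to right.
const firstOnly = target.match(/(clock|work|alex)/);
// firstOnly[0] is "alex". Because of the parentheses, firstOnly[1] also
// holds "alex" (the captured group), not the next match.

// With "g": every occurrence, in the order it appears in the string.
const everyMatch = target.match(/(clock|work|alex)/g);
// everyMatch[0] is "alex", everyMatch[1] is "clock", everyMatch[2] is "work".

console.log(firstOnly[0], everyMatch.join(", ")); // alex alex, clock, work
```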
There are many more matching tools at our disposal when using regular expressions, including metacharacters and quantifiers. You can look them up; I just want to keep this tutorial simple. I’ll probably cover them in a future tutorial.
Matching the Target String
Okay, we now know how to create some of the basic patterns, but how do you actually match them with real strings? This is shown in the example below.
Remember how we had two different ways of creating a regular expression pattern? One was using the RegExp object, and the other was writing the pattern between slashes and storing it in a variable. Here is how you would execute these patterns:
Using the RegExp object, find the word light in the word enlightenment:
var pattern = new RegExp("light");
Alternatively, we can create the pattern like this:
var pattern = /light/;
Now that we have the pattern, let’s create the string we want to search and match it against the pattern stored in this variable. Notice that I use a member function called match. It belongs to the string object, which I conveniently stored in a variable called str:
var str = "enlightenment";
var matched = str.match(pattern);
Objects such as the String object come with special functions (methods) that operate on them. When we define a string with double quotes, JavaScript creates a string value and automatically wraps it in a String object whenever we call a method on it, so all of the String methods are available. One of these built-in methods is called match. This is exactly what we use to match a regular expression pattern against a string.
We can also pass a regular text string to the match function, but be careful with special characters like the dot (.) or the [ and ] brackets (and many others). The string is converted into a regular expression, so those characters keep their special meaning. If your pattern contains a dot (.), the regular expression will not treat it as a literal period in the text string.
As part of a regular expression, the dot (period) character has a special meaning: it matches any single character (except a line break). In order to match an actual period, we need to put a backslash in front of it, as in \., or alternatively place the dot inside square brackets: [.]
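Here is a minimal sketch of the three ways a dot can behave, using a made-up string that contains both 1x5 and 1.5. It also shows that a plain string passed to match() is treated as a pattern, and that in a string literal the backslash itself has to be doubled.

```javascript
const text = "v1x5 or v1.5";

// An unescaped dot matches any single character, so "1x5" matches first.
const loose = text.match(/1.5/);           // loose[0] is "1x5"
// Escaping the dot matches only a literal period.
const escaped = text.match(/1\.5/);        // escaped[0] is "1.5"
// So does putting the dot inside square brackets.
const bracketed = text.match(/1[.]5/);     // bracketed[0] is "1.5"

// A plain string is converted into a regular expression first,
// so its dot is special too.
const fromString = text.match("1.5");      // fromString[0] is "1x5"
// In a string literal, the backslash must be doubled to escape the dot.
const escapedString = text.match("1\\.5"); // escapedString[0] is "1.5"

console.log(loose[0], escaped[0], bracketed[0], fromString[0], escapedString[0]);
```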
The function match is referred to as a member function (or method) of the string object. Objects of other types, for example the Math object, have their own member functions, such as Math.round().
Because JavaScript wraps strings in String objects automatically, we can create a string and call a member function on it right away. For example, the string “enlightenment” itself behaves like an object, and you could immediately call the member function on it like this:
var pattern = "light";
var matched = "enlightenment".match(pattern);
Or you can do it this way:
var matched = "enlightenment".match("light");
Or this way:
var matched = "enlightenment".match(/light/);
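The three variants above are interchangeable for a simple pattern like light. This sketch compares their results; it also reads the standard index property of the result array, which records where the match starts (not otherwise discussed in this tutorial).

```javascript
const pattern = "light";

const a = "enlightenment".match(pattern); // pattern stored in a variable
const b = "enlightenment".match("light"); // string passed directly
const c = "enlightenment".match(/light/); // regular expression literal

// All three find the same text at the same position.
console.log(a[0], b[0], c[0]);          // light light light
console.log(a.index, b.index, c.index); // 2 2 2
```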
What was matched will be stored in the matched variable. Let’s display it:
alert(matched);
The alert function will display the text “light”. (Strictly speaking, matched holds an array, but alert shows a one-element array as just that element.) You may not find this example very impressive. We used the pattern “light” only to find out that it matched the word enlightenment, and the result is the same string, “light”.
Well, this example only shows the basics of how regular expressions work! Had the string not been matched, a value of null would have been returned.
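Because a failed match returns null rather than an empty array, indexing the result directly would throw an error. A minimal sketch of guarding against that; firstMatch is a hypothetical helper written for this example, not a built-in.

```javascript
// Hypothetical helper: returns the first match, or a fallback message.
function firstMatch(text, pattern) {
  const result = text.match(pattern);
  // match() returns null when nothing matches, so check before indexing.
  if (result === null) {
    return "no match";
  }
  return result[0];
}

console.log(firstMatch("enlightenment", /light/)); // light
console.log(firstMatch("enlightenment", /dark/));  // no match
```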
Moreover, the result of a more complex pattern may not look like the pattern itself. For example, if you passed the pattern [aet], the returned result would be e. Why’s that? Because the expression in square brackets matches only one character out of the entire set. In this case, the string is enlightenment, and its first letter is e. Because e is part of the [aet] set, it is matched and returned immediately; the search stops there, and the letters a and t are no longer considered. However, if the g modifier is added to the end of the pattern, every character from the set is matched in the order it appears, and the result is [“e”, “t”, “e”, “e”, “t”].
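Running the [aet] example both ways makes the difference concrete. The letters of enlightenment are e, n, l, i, g, h, t, e, n, m, e, n, t, so with the g modifier every e and t is returned in that order.

```javascript
const word = "enlightenment";

// Without "g": stops at the first character that belongs to the set.
const firstOnly = word.match(/[aet]/);
// With "g": every character from the set, in the order it appears.
const every = word.match(/[aet]/g);

console.log(firstOnly[0]);     // e
console.log(every.join(", ")); // e, t, e, e, t
```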
I was listening to a song while writing this tutorial, so I decided to use some of its lyrics as the target string for the examples.
Let’s consider the lyrics from Just One Kiss by The Cure as the source for our string. You can listen to the song here: http://www.youtube.com/watch?v=7j7IarXmAAo
Here are a few examples of regular expression matches. You can print this diagram and keep it on your desk as a quick reference. Regular expressions can be a pain to memorize, and a reference can’t hurt: