
Find and replace complex patterns on Linux, Mac or Windows using regex. We'll look at some examples, from very easy to quite complex, using Dreamweaver, BBEdit, Vim and Sed.
Back in my innocent early days as a web developer, when I was just starting to dabble in PHP, I had no idea about regular expressions. Then one function changed everything: preg_replace(). I was so excited when I discovered it. I used it for everything: pulling blogs from other sites, reformatting HTML to add links automatically, removing unwanted tags. In this article, I want to introduce you to regular expressions and show you some useful examples of using them in different programs.
Example #1 (Dreamweaver)
"I've got this drop-down list of names, but it's so long I really don't want to go and change it entry by entry. Can I do a global search and replace?"
Here's what the list looks like:
<option value="http://www.wanttoknow.info/#Baer">Baer, Robert - Case Officer, 21 Years in CIA, Career Intelligence Medal</option>
<option value="http://www.wanttoknow.info/#Bowman">Bowman, Col. Robert - Director of Advanced Space Programs Development under Ford and Carter</option>
<option value="http://www.wanttoknow.info/#Burks">Burks, Fred - State Department Interpreter for Presidents George W. Bush and Bill Clinton</option>
<option value="http://www.wanttoknow.info/#Christison">Christison, William - Director of the CIA's Office of Regional and Political Analysis</option>
<option value="http://www.wanttoknow.info/#Cleland">Cleland, Senator Max - U.S. Senator from Georgia. Member of 9/11 Commission</option>
We want to extract the parts in between the <option> tags. The challenge here is that the <option...> tag is different every time. This is easily accomplished through regular expression search and replace.
Let's use Dreamweaver this time.
<option[^>]*>
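This pattern looks for "<option", then any number of characters other than ">", followed by a ">". Search for it with the regular expression option turned on, leave the replacement empty, and every opening tag is deleted. The closing </option> tags are identical on every line, so an ordinary search and replace takes care of them.
Note that even though we used Dreamweaver here, you can use BBEdit or Vim to accomplish the same thing.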
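If you'd rather try this outside an editor, here is a minimal sketch of the same idea in Python, which is not one of the tools covered in this article. The function name strip_option_tags is made up for illustration, and the pattern is widened with an optional slash (/?) so that it also catches the closing </option> tags, which the pattern above leaves alone.

```python
import re

# Matches an opening <option ...> tag or a closing </option> tag.
# The optional slash (/?) is an addition to the article's pattern.
OPTION_TAG = re.compile(r"</?option[^>]*>")


def strip_option_tags(html: str) -> str:
    """Return html with every <option ...> and </option> tag removed."""
    return OPTION_TAG.sub("", html)


line = (
    '<option value="http://www.wanttoknow.info/#Burks">'
    "Burks, Fred - State Department Interpreter for Presidents "
    "George W. Bush and Bill Clinton</option>"
)
print(strip_option_tags(line))
```

The text between the tags is left exactly as it was, which is what we need for the next example.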
Example #2 (BBEdit)
Now we have a nice list:
Baer, Robert - Case Officer, 21 Years in CIA, Career Intelligence Medal
Bowman, Col. Robert - Director of Advanced Space Programs Development under Ford and Carter
Burks, Fred - State Department Interpreter for Presidents George W. Bush and Bill Clinton
Christison, William - Director of the CIA's Office of Regional and Political Analysis
Cleland, Senator Max - U.S. Senator from Georgia. Member of 9/11 Commission
However, the data is the reverse of what we want: the names come first and the descriptions second, and we'd like to swap them. Luckily for us, there is a dash (-) between the two, so we can use that to tell them apart. Doing this by hand on a 2000-entry list would take forever, but thanks to regular expressions and back-references, it will be a breeze.
Let's use BBEdit for this.
Find: (.*) - (.*)
Replace: \2 - \1
... and voila! Our list looks like this:
Case Officer, 21 Years in CIA, Career Intelligence Medal - Baer, Robert
Director of Advanced Space Programs Development under Ford and Carter - Bowman, Col. Robert
State Department Interpreter for Presidents George W. Bush and Bill Clinton - Burks, Fred
Director of the CIA's Office of Regional and Political Analysis - Christison, William
U.S. Senator from Georgia. Member of 9/11 Commission - Cleland, Senator Max
In this case, we enclose the two patterns in parentheses "()", separated by a dash (-) with a space on either side. Note that if the program you're using doesn't like literal spaces, you can use \s instead, which matches any whitespace character. The dot (.) stands for any one character, and the asterisk (*) following the dot tells it that the character can be repeated 0 or more times.
Under Replace, we use what is called a back-reference. The contents of the first set of parentheses can be called back with \1, and the contents of the second with \2. The space-dash-space separator in the replacement can be anything you like.
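The same swap can be sketched in Python, again only as an illustration and not as part of the BBEdit workflow. The name swap_entry is hypothetical, and I've added line anchors (^ and $) plus the MULTILINE flag so the function can handle a whole list at once.

```python
import re

# Two capture groups separated by space-dash-space, one match per line.
ENTRY = re.compile(r"^(.*) - (.*)$", re.MULTILINE)


def swap_entry(text: str) -> str:
    """Turn each 'name - description' line into 'description - name'."""
    # \2 and \1 in the replacement are back-references to the two groups.
    return ENTRY.sub(r"\2 - \1", text)


names = (
    "Baer, Robert - Case Officer, 21 Years in CIA, Career Intelligence Medal\n"
    "Burks, Fred - State Department Interpreter for Presidents "
    "George W. Bush and Bill Clinton"
)
print(swap_entry(names))
```

One thing to keep in mind: the first (.*) is greedy, so if a description itself contained another " - ", the split would happen at the last one on the line. Lines without the separator are left untouched.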
Example #3 (Vim and Sed)
Some projects that I work on have Drupal development servers. It's a good practice not to use the live server for development and experimentation. Sometimes it's great to be able to just run a script, which copies the database and files from the live server onto the mirror development server, thus creating a mirror on which I can play around and try all kinds of experiments. (I won't include the complete script here, but I can post it if there's interest.)
As a part of the process I need to do a database replacement of the URLs from www.site.com to dev.site.com, otherwise all the links will just point to the live site. This is easily accomplished using regex. Since we'll be working in a Linux (bash) environment, we'll take a look at two ways of doing it: using Vim and Sed.
First let's use Vim:
:%s/www\.site\.com/dev.site.com/g
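This is what everything stands for: the colon (:) opens Vim's command line; % applies the command to every line in the file; s is the substitute command; www\.site\.com is the pattern to find, with each dot escaped as \. so that it matches a literal dot rather than any character; dev.site.com is the replacement text; and the trailing g replaces every occurrence on a line instead of only the first.
This is a very powerful feature of Vim, an already amazing cross-platform (Linux, Mac and Windows) editor.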
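The backslashes in front of the dots in www\.site\.com are easy to overlook, so here is a small Python sketch of what they change. Python is my choice for the demonstration, not something Vim needs, and the look-alike string is invented for the test.

```python
import re

unescaped = re.compile(r"www.site.com")    # each dot matches any one character
escaped = re.compile(r"www\.site\.com")    # each dot matches only a literal "."

lookalike = "wwwXsite-com"  # made-up string, not a real host name

print(bool(unescaped.search(lookalike)))  # True: the dots matched "X" and "-"
print(bool(escaped.search(lookalike)))    # False: literal dots are required
```

On a real database dump the unescaped version would usually do the same job, but escaping the dots makes sure you only ever touch the exact host name.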
Now let's take a look at Sed.
Sed is amazing in that it can be used as a part of a bash script. In our backup/mirror solution it's obvious that we want as little human intervention as possible. Whereas Vim is great for custom searches, once we know the set pattern that needs to be replaced every time, Sed can be integrated into the script and do it automatically.
For this example, let's assume that we did a mysql dump of a database into a file called db.sql. We'll now use Sed to automatically replace all the occurrences of "www.site.com" with "dev.site.com".
sed -r "s/www\.site\.com/dev.site.com/g" db.sql > db-dev.sql
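Where: -r turns on extended regular expressions in GNU sed (this simple pattern works without it too); s/www\.site\.com/dev.site.com/g is the same substitute syntax we used in Vim; db.sql is the input file; and > db-dev.sql redirects the result into a new file, leaving db.sql untouched.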
I use this as a part of my development server migration bash script, and it works wonderfully.
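If your migration script happens to be written in Python rather than bash, the same step might look roughly like this. This is a sketch under a few assumptions: the helper name rewrite_dump is hypothetical, the dump is assumed to be UTF-8 text, and the whole file is read into memory, which may not suit very large dumps.

```python
import re
from pathlib import Path

# Same pattern as the sed command: dots escaped so they match literally.
LIVE_HOST = re.compile(r"www\.site\.com")


def rewrite_dump(src: str, dst: str) -> int:
    """Copy src to dst with the live host replaced; return the replacement count."""
    text = Path(src).read_text(encoding="utf-8")
    new_text, count = LIVE_HOST.subn("dev.site.com", text)
    Path(dst).write_text(new_text, encoding="utf-8")
    return count


if __name__ == "__main__":
    # File names follow the article's example; nothing happens if db.sql is absent.
    if Path("db.sql").exists():
        print(rewrite_dump("db.sql", "db-dev.sql"), "replacements")
```

Like the sed command, it writes to a separate file and leaves the original dump alone. Returning the count is just a convenience for logging.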
Besides these examples, regular expressions (regex or reg-ex) can be found in PHP and almost every other programming language.
Enjoy!