GREP

OK, so you’ve got an exported XML file of all the posts from a trusty old WordPress blog. But before you magically dump all this content into your shiny new website, your client wants all the images removed from the posts, because nobody bothered about copyright before, but suddenly they do.

These are not featured images either. They are embedded inline, so even if you unattached each image and deleted it from the media library, the code would still be there in the post body content, with the alt text showing…

Ooh, I thought, I’ll do a search and replace… but hang on, all the images have different file names… hmmm. I need a tool to search and replace content between HTML tags.

Never fear, GREP is here…
(globally search a regular expression and print)

All you need is TextWrangler (a free download) or BBEdit.

And a special bit of code in the search and replace box.

Find images with this code:

<img (.*?)/>

Find links with this code:

<a(.*?)</a>

Just tick the GREP box…
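If you would rather script it than click through an editor, the same two patterns can be tried in Python. This is only a rough sketch: the function names and the sample string are made up for illustration, and it runs with Python's default matching, where `.` does not cross line breaks, so a tag split over several lines would be missed.

```python
import re

# The same two patterns as in the post, compiled with Python's re module.
# `.*?` is lazy, so each match stops at the first closing `/>` or `</a>`.
IMG_PATTERN = re.compile(r"<img (.*?)/>")
LINK_PATTERN = re.compile(r"<a(.*?)</a>")


def remove_images(html: str) -> str:
    """Delete every self-closing <img ... /> tag (hypothetical helper)."""
    return IMG_PATTERN.sub("", html)


def remove_links(html: str) -> str:
    """Delete every <a ...>...</a> element, link text included (hypothetical helper)."""
    return LINK_PATTERN.sub("", html)


# Made-up sample post body, not taken from a real export.
sample = '<p>Hello <img src="cat.jpg" alt="A cat" /> world, <a href="https://example.com">click</a>!</p>'

print(remove_images(sample))
print(remove_links(sample))
```

One thing to watch: `<a(.*?)</a>` starts matching at any tag whose name begins with `a`, such as `<abbr>`, so `<a\b(.*?)</a>` is a stricter variant if your posts contain those.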

CLEAN WITH GREP

How to strip HTML code using ‘Grep’ in TextWrangler:

1. Open document to be stripped of code;
2. Click: Search => Find …
3. In the Find field put:

</?[^>]*>

4. Select: Grep
5. Leave the Replace field empty and click Replace All.

Courtesy of Art Upton. Thanks, man.
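For anyone who prefers a script, here is a minimal Python sketch of the same tag-stripping idea. The `strip_tags` name and the sample text are hypothetical, and like the editor version it simply deletes anything that looks like a tag, so it is a blunt tool and not a real HTML parser.

```python
import re

# Same pattern as in step 3: an opening `<`, an optional `/`,
# then anything that is not `>`, up to the closing `>`.
TAG_PATTERN = re.compile(r"</?[^>]*>")


def strip_tags(html: str) -> str:
    """Replace every tag with nothing, keeping the text in between (hypothetical helper)."""
    return TAG_PATTERN.sub("", html)


# Made-up sample, not taken from a real export.
sample = '<p>Some <strong>bold</strong> text and a <a href="#">link</a>.</p>'

print(strip_tags(sample))  # Some bold text and a link.
```

Because `[^>]` also matches line breaks, this pattern still catches a tag that is split across two lines.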

If you want to know more about GREP, try Cari D. Burstein’s excellent explanation from 2004, or if all else fails, let physicist Richard Feynman try and explain.
