Useful Regex for Yahoo Pipes and RSS

March 8th, 2010 by dontai


Regex is an acquired taste: initially bewildering and bitter, but given time it grows on you, and eventually you begin to appreciate its bouquet. Small, efficient, and powerful, Regex epitomizes good code. What follows is an advanced collection of Regex I have found useful in Pipes. If you are new to either, read up and play first.

Other reference material includes Regular Expressions in Yahoo Pipes, though it is a tad dated. The Pipes forum is also very helpful, especially hapdaniel.

Each entry below gives the purpose, then the pattern to replace, then what to replace it with.
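Truncate a naughty item.description efficiently: replace [(?s)^(.{200}).*$] with [$1] (g flag). The (?s) lets . match line breaks, so multi-line descriptions are truncated too. A description of 200 characters or fewer does not match and stays as it is. Note that the Regex will fail on a string longer than ~200k. The alternative is an inefficient loop plus a substring, but it always works.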
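A minimal Python sketch of the truncation rule, handy for testing it outside Pipes. The names TRUNCATE and truncate and the sample strings are illustrative assumptions. Python writes the backreference as \1 where Pipes uses $1.

```python
import re

# (?s) lets "." match line breaks; group 1 keeps the first 200 characters
# and ".*$" swallows the remainder, which the substitution then drops.
TRUNCATE = re.compile(r"(?s)^(.{200}).*$")

def truncate(description: str) -> str:
    # Strings of 200 characters or fewer do not match and come back unchanged.
    return TRUNCATE.sub(r"\1", description)

long_text = "line one\n" + "x" * 500
print(len(truncate(long_text)))  # 200
print(truncate("short"))         # short
```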
Remove all HTML: replace [(?s)<.*?>] or, alternately, [<[^>]*>] with [] (g flag).
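Both HTML-stripping patterns can be compared in a short Python sketch; the sample markup is made up. Python's re.sub replaces every match by default, which corresponds to the g flag.

```python
import re

html = '<p>Hello <b>world</b></p>\n<img src="a.png"\n alt="pic">Bye'

# Lazy version: (?s) lets ".*?" cross a line break inside a tag.
lazy = re.sub(r"(?s)<.*?>", "", html)
# Negated-class version: "[^>]*" matches line breaks anyway, so no flag is needed.
negated = re.sub(r"<[^>]*>", "", html)

print(repr(lazy))     # 'Hello world\nBye'
print(repr(negated))  # 'Hello world\nBye'
```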
Remove hyperlinks, link text included: replace [(?s)<a.*?</a>] with [] (g flag).
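A quick Python check of the hyperlink rule on an invented description. The second link has a line break inside its opening tag, which is why the (?s) matters.

```python
import re

desc = ('Read <a href="http://example.com/1">this</a> and\n'
        '<a\n href="http://example.com/2">that</a> today.')

# Lazy ".*?" stops at the first </a>, so each link is removed separately
# instead of everything between the first <a and the last </a>.
cleaned = re.sub(r"(?s)<a.*?</a>", "", desc)
print(repr(cleaned))  # 'Read  and\n today.'
```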
Decode Google URLs: copy [item.guid.content] to [item.link], then in [item.link] replace [^.*?cluster=] with [].
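The two steps can be sketched in Python with the item as a dict. The guid value is hypothetical; only the "...cluster=" prefix matters to the rule.

```python
import re

# Hypothetical item; the real guid format is whatever the feed supplies.
item = {"guid": {"content": "tag:example.com,2010:cluster=http://www.example.com/story.html"}}

# Step 1: copy item.guid.content to item.link.
item["link"] = item["guid"]["content"]
# Step 2: strip everything up to and including the first "cluster=".
item["link"] = re.sub(r"^.*?cluster=", "", item["link"])
print(item["link"])  # http://www.example.com/story.html
```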
Clean up Google News descriptions, which come in two layouts:

  1. Layout 1, no image: (url, title, /url, source)
  2. Layout 2, with image: left (url, img, source, /url), right (url, title, /url, source)
Replace [(?s)<a.*?</a>] (both layouts) and [^.*?<br\s/.*?</b>] (layout 1 and the right side of layout 2) with [] *
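A Python sketch applying the two Google News rules in sequence, followed by the remove-all-HTML rule. The markup below is a hypothetical, simplified layout-1 description (url, title, /url, source, then a summary), not a copy of a real Google News item.

```python
import re

# Hypothetical, simplified layout-1 description.
desc = ('<a href="http://www.example.com/story"><b>Story title</b></a><br />'
        '<font size="-1"><b><font color="#6f6f6f">Example Source</font></b></font><br />'
        '<font size="-1">Summary text</font>')

# Rule 1 (both layouts): drop every hyperlink, link text included.
desc = re.sub(r"(?s)<a.*?</a>", "", desc)
# Rule 2 (layout 1): drop everything from the start through the first </b>
# after the first "<br /", which in this sample closes the source name.
desc = re.sub(r"^.*?<br\s/.*?</b>", "", desc)
# Optional clean-up: remove the leftover tags.
desc = re.sub(r"<[^>]*>", "", desc)
print(desc)  # Summary text
```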
Add a link to the end of the description: replace [$] with [&nbsp;<a href="${link}">More</a>]. Note the &nbsp; before the opening <a.
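A Python sketch of the append-a-link rule. Pipes fills in ${link} itself; here the anchor is built directly from a made-up item dict. The sketch uses \Z rather than $, because Python's $ also matches just before a trailing newline.

```python
import re

item = {"link": "http://www.example.com/story", "description": "Story summary."}

more = '&nbsp;<a href="' + item["link"] + '">More</a>'
# \Z matches only at the very end. On "text\n", $ would match twice
# (before and after the newline) and insert the link twice.
# A function replacement keeps re from interpreting backslashes in the URL.
item["description"] = re.sub(r"\Z", lambda m: more, item["description"])
print(item["description"])
# Story summary.&nbsp;<a href="http://www.example.com/story">More</a>
```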
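Remove text between brackets, brackets included. Escape special characters with a backslash (\): replace “\[[^\]]+\]” (square brackets) or “\([^\)]+\)” (parentheses) with [], g flag on to catch every occurrence. Escaping the ] inside the character class keeps the pattern portable, since not every regex flavor accepts an unescaped ] there.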
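Both bracket patterns in Python, run on an invented headline so the effect of each is visible. Any doubled spaces left behind are untouched by these rules.

```python
import re

text = "Headline [Reuters] (updated) more [photo]"

# "\[" and "\]" are literal square brackets; "[^\]]+" is one or more non-"]" characters.
square = re.sub(r"\[[^\]]+\]", "", text)
# Same idea for parentheses.
paren = re.sub(r"\([^\)]+\)", "", text)

print(repr(square))  # 'Headline  (updated) more '
print(repr(paren))   # 'Headline [Reuters]  more [photo]'
```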
Access RSS fields in a replacement with [${field name}], for example [${description}].
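To mimic the ${field name} substitution outside Pipes, here is a hypothetical helper, expand_fields. It is not a Pipes API. It assumes a flat dict of fields, and it turns unknown fields into empty strings, which is a guess about Pipes' behavior.

```python
import re

def expand_fields(template: str, item: dict) -> str:
    """Hypothetical stand-in for Pipes' ${field name} substitution.

    Each ${...} is replaced with the matching item field; unknown fields
    become empty strings (an assumption, not documented Pipes behavior).
    """
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(item.get(m.group(1), "")), template)

item = {"title": "Story title", "description": "Story summary."}
print(expand_fields("${title}: ${description}", item))  # Story title: Story summary.
```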
Match between HTML tags with a negated character class: replace ["your HTML start tag"([^<]+).*] with [$1]. [^<]+ matches one or more characters that are not “<” (the ^ inside the brackets negates the class), so the capture runs from your start tag up to the first “<”, and .* discards what follows. Text before the start tag is kept. To drop it as well, begin the pattern with (?s)^.*? (the (?s) also lets .* run across line breaks).
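A Python sketch of the between-tags rule. It assumes <b> as the start tag purely for illustration, and compares the pattern with and without the ^.*? prefix.

```python
import re

desc = "Intro text <b>Headline</b> rest of item\nsecond line"

# Pattern as given (plus (?s)): text before <b> survives, everything after the capture goes.
kept_prefix = re.sub(r"(?s)<b>([^<]+).*", r"\1", desc)
# With the ^.*? prefix, text before the start tag is dropped too.
only_capture = re.sub(r"(?s)^.*?<b>([^<]+).*", r"\1", desc)

print(kept_prefix)   # Intro text Headline
print(only_capture)  # Headline
```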
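Remove everything after the first line: replace [(?s)^(.*?)[\n\r].*$] with [$1]. The lazy .*? stops at the first line break, and (?s) lets the final .* consume the remaining lines. To cut at some phrase instead, put the phrase in place of [\n\r]; the phrase is removed along with everything after it.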
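The first-line rule and the phrase variant can be sketched in Python. The sample strings and the " Read more" phrase are made up.

```python
import re

FIRST_LINE = re.compile(r"(?s)^(.*?)[\n\r].*$")

text = "First line\r\nSecond line\nThird line"
first = FIRST_LINE.sub(r"\1", text)
print(first)  # First line

# Cutting at a phrase: the phrase and everything after it are removed.
cut = re.sub(r"(?s)^(.*?) Read more.*$", r"\1", "Summary here. Read more at the site")
print(cut)  # Summary here.
```

A single-line string contains no line break, so it does not match and is left unchanged.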
Append a string, for example “?track=count” after every “.html”: replace [(\.html)] with [$1?track=count], g flag on. Escape the dot; an unescaped . matches any character.
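A Python comparison of the escaped and unescaped dot on a made-up URL. The URL is chosen so that "xhtml" in the path exposes the difference.

```python
import re

url = "http://www.example.com/xhtml/page.html"

# "\." is a literal dot, so only ".html" matches.
escaped = re.sub(r"(\.html)", r"\1?track=count", url)
# Unescaped "." matches any character, so "xhtml" in the path matches too.
unescaped = re.sub(r"(.html)", r"\1?track=count", url)

print(escaped)    # http://www.example.com/xhtml/page.html?track=count
print(unescaped)  # http://www.example.com/xhtml?track=count/page.html?track=count
```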

*Use only what is inside the [ ]; remove everything outside the brackets.

Feed Validator: if a feed gives you trouble, run it through FeedBurner first, then Pipe it.
YQL Help: in the YQL module, use the query select * from html where url='put_your_url_here'
