Useful Regex for Yahoo Pipes and RSS

Regex is an acquired taste: initially bewildering and bitter, but given time it grows on you. Eventually you begin to appreciate its bouquet. Small, efficient, and powerful, Regex epitomizes good code. This is an advanced discussion of Regex I have found useful in Pipes. If you are new to either, then read up and play.

Other reference material includes Regular Expressions in Yahoo Pipes, though it is a tad dated. The Pipes forum is also very helpful, especially hapdaniel.

| Purpose | Replace | With |
| --- | --- | --- |
| Truncate a naughty item.description efficiently. Note that the Regex will fail on a string longer than ~200k. The alternative is an inefficient loop and a substring, but it always works. | [^(.{200}).*$] (g flag) or [^([^>]+>.{80}.*?)\b.*] | [$1] |
| Remove all HTML | [(?s)<.*?>] or alternately [<[^>]*>] | [] (g flag) |
| Remove hyperlinks | [(?s)&lt;a.*?&lt;/a&gt;] | [] (g flag) |
| Decode Google URLs. Use [item.guid.content] copied to [item.link] | [^.*?cluster=] | [] |
| Google News. Layout 1, no image: (url, title, /url, source). Layout 2, image: left (url, img, source, /url), right (url, title, /url, source) | [(?s)&lt;a.*?&lt;/a&gt;] for both layouts, and [^.*?&lt;br\s/.*?&lt;/b&gt;] for layout 1 and layout 2 right | [] * |
| Add a link to the end of the description | [$] | [ &lt;a href="${link}"&gt;More&lt;/a&gt;] (note the &nbsp; before the first &) |
| Remove text between brackets. Escape special characters with a backslash (\). | "\[[^]]+]" or "\([^\)]+\)" | [] |
| Access RSS fields | [${field name}], for example [${description}] | |
| Match between HTML tags, using a character class | ["your HTML start tag"([^<]+).*] The character class [^<] matches any character except "<" (the ^ excludes it), and the + repeats it, so the group captures everything up to the first "<". | [$1] Keeps the text following your HTML start tag, up to the next "<", and drops the rest of the match. |
| Remove everything before a double "//" | [^(.*?)//(.*)] | [$2] |
| Remove everything after "/" | [^(.*?)[/](.*)] | [$1] |
| Remove everything after the first line, or some phrase | [(?s)^(.*?)[\n\r].*$] | [$1] |
| Append a string. For example, append "?track=count" to .html | [(\.html)] | [$1?track=count] (g flag on) |
| Remove everything before a "/" and lowercase | [^.*?/] | [\L] |
| Capitalize the first character | [^] | [\u] |

*Remove all outside []
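
Several of these patterns can be tried outside Pipes. What follows is only a rough sketch, assuming Python's re module behaves closely enough to the Pipes Regex module for these particular expressions. The function names and sample strings are my own inventions for illustration. Note that Python writes the replacement $1 as \1, and that the \L and \u replacements have no direct equivalent in re, so they are left out.

```python
import re

def strip_html(text):
    # "Remove all HTML": delete every <...> tag (re.sub is global by default).
    return re.sub(r"<[^>]*>", "", text)

def truncate(text, limit=200):
    # "Truncate": keep the first `limit` characters and drop the rest.
    # (?s) is added here so "." also matches newlines inside the text.
    return re.sub(r"(?s)^(.{%d}).*$" % limit, r"\1", text)

def remove_bracketed(text):
    # "Remove text between brackets": the leading \[ is an escaped literal bracket.
    return re.sub(r"\[[^]]+]", "", text)

def add_tracking(url):
    # "Append a string": the dot is escaped so only a literal ".html" matches.
    return re.sub(r"(\.html)", r"\1?track=count", url)

def before_slash(path):
    # "Remove everything after /": keep group 1, the text before the first "/".
    return re.sub(r"^(.*?)[/](.*)", r"\1", path)

def first_line(text):
    # Keep only the first line: (?s) lets the trailing .* swallow later lines.
    return re.sub(r"(?s)^(.*?)[\n\r].*$", r"\1", text)

if __name__ == "__main__":
    print(strip_html("<p>Hello <b>world</b></p>"))
    print(add_tracking("http://example.com/page.html"))
    print(first_line("Line one\nLine two\nLine three"))
```

For example, strip_html("<p>Hello <b>world</b></p>") gives "Hello world", and truncate leaves any string shorter than the limit untouched because the pattern simply does not match.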

Feed Validator: If there's trouble, run the feed through Feedburner, then Pipe it.
YQL Help: In the YQL module, use the query: select * from html where url='put_your_url_here'

Regex is a bear to learn. Further help: Tutorial; Possible HTML removal help.

2019 Jan 09 The Rise and Demise of RSS

2 thoughts on “Useful Regex for Yahoo Pipes and RSS”

  1. Koi

    Hello,

    I took the article from the feed, but the content contains base64-encoded links. How do I decode these links in Yahoo Pipes? Thanks
