Feedly: Somewhat Schizophrenic so Please Settle Down

A dear friend uses Feedly to monitor my site. He complained that he was getting 403 Banned errors, and asked why. Well, I have found that Feedly usually only takes my RSS feed, but sometimes, not often, it scrapes me mercilessly. Once I see a bot start scraping, I ban it. I moved him over to the better-behaved Feedburner by Google.

Here are the Feedly user agents:

Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
FeedlyBot/1.0 (http://feedly.com)

The latter, FeedlyBot, runs off WZComm, and had previously scraped me, so I banned it. WZComm also runs the surdotly bot, which is also banned. The former, Feedly/1.0, runs off Level 3, and seems well behaved.
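Since both user agents start with "Feedly", a careless match on that word would ban the well behaved fetcher along with the scraper. Here is a small Python sketch of how I would tell the two apart when going through a log. The function name and the labels it returns are my own invention for illustration, not anything Feedly publishes.

```python
def classify_feedly_agent(user_agent):
    """Return 'bot', 'fetcher', or None for a User-Agent string.

    'bot' and 'fetcher' are hypothetical labels used only in this sketch.
    """
    ua = user_agent.lower()
    # Check the more specific token first: "feedlybot/" also starts with "feedly".
    if "feedlybot/" in ua:
        return "bot"
    if ua.startswith("feedly/"):
        return "fetcher"
    return None


if __name__ == "__main__":
    for ua in (
        "Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)",
        "FeedlyBot/1.0 (http://feedly.com)",
    ):
        print(classify_feedly_agent(ua), "<-", ua)
```

The same idea applies to a ban rule: match on the full "FeedlyBot" token, not on "Feedly" alone.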

New Host Provider, No IP Bans for 21 hrs

Moved, I did, from Site5 to A2. The last 21 hrs were a wet and wild ride, all without the protection of my trusty .htaccess file, the one with my IP ban list. Within those 21 hrs, I received a total of 33 spam comments. Usually I receive only one or two. It is clear that without protection I would be inundated by comment spam.

Of course these IPs are only the ones that comment spammed me. There are many more that use their bots to scrape content, try to break into my site, trick my host provider, etc. There are too many to list.

Keeping Pinterest in an Ocean of AWS Bots

Big Weed told me to not ban Pinterest. While I am not a huge Pinterest fan, she is/was so I listen to her. The problem is that Pinterest is hosted on Amazon Web Services (AWS), a cloud host provider infamous for hosting bad bots. Here are the IP ranges to ban AWS but keep Pinterest coming back.

# AWS –
deny from
# Pinterest
deny from
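Working out "all of AWS except Pinterest" by hand is tedious, so here is a minimal Python sketch of the subtraction using the standard `ipaddress` module. It assumes you already have both lists as IPv4 CIDR blocks. The ranges in the example are reserved documentation addresses that I made up for illustration, not real AWS or Pinterest ranges, and the function names are hypothetical.

```python
import ipaddress


def subtract_ranges(ban, keep):
    """Return the CIDR blocks that cover `ban` minus everything in `keep`."""
    result = [ipaddress.ip_network(b) for b in ban]
    for k in keep:
        keep_net = ipaddress.ip_network(k)
        remaining = []
        for net in result:
            if net.subnet_of(keep_net):
                # The whole banned block is inside a kept block: drop it.
                continue
            if keep_net.subnet_of(net):
                # Carve the kept block out of the banned block.
                remaining.extend(net.address_exclude(keep_net))
            else:
                remaining.append(net)
        result = remaining
    return sorted(result)


def to_htaccess(networks):
    """Format networks as Apache 2.2 style 'deny from' lines."""
    return ["deny from " + str(net) for net in networks]


if __name__ == "__main__":
    # Placeholder ranges, NOT real AWS or Pinterest ranges.
    cloud = ["198.51.100.0/24"]
    friend = ["198.51.100.64/26"]
    print("\n".join(to_htaccess(subtract_ranges(cloud, friend))))
```

The output is a list of smaller blocks that together cover the banned range with a hole where the kept range sits, so no separate allow line is needed.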

Why I Banned Amazon Web Services AWS

My friend was surprised when I told him that I banned all IP ranges of Amazon Web Services (AWS) from my site. It is particularly ironic considering that we both had recently attended an AWS Cloud Computing IoT presentation, which was well done and interesting to both of us.

AWS accounts for a huge share of the world’s cloud computing, and my decision to ban all of its IP ranges did not come lightly. I simply could not keep up with all the comment spammers and scrapers coming out of AWS. It seems I am not alone: others have had the same experience.

Odd htaccess Observations with ISP Site5

Puzzling, it is at times, that my htaccess does not always behave as intended. As a computer scientist I expect my programs and input files to produce consistent, stable and reliable results immediately. This is not the case with my htaccess file, hosted on Site5, my internet service provider.

Delays in htaccess Implementation

When I make certain changes to my htaccess, there may be a delay of a day or two before they take effect. This is very odd to me, because supposedly the htaccess is checked on every server request. Maybe there is some caching that I do not know about. Nevertheless it seems like the htaccess has a unique personality. I know that I should not anthropomorphize a computer, much less a security file such as htaccess on an Apache server, but it is difficult not to.

tanyadokterkeluarga.blogspot Referrer Spam: Research, Ban

tanyadokterkeluarga.blogspot is a persistent referrer spammer. They use a huge number of IP addresses that do not repeat the third octet. It has similar strategies to kosmetik-freaks.blogspot, in fact sharing identical IP ranges. They are sister referrer spammers. Neither is stopped by an HTTP_REFERER ban in htaccess. If you kill one you kill the other, a nice double prize. As with the sister, this spammer runs out of Indonesia.
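When the third octet never repeats, banning single addresses or single /24 blocks gets you nowhere, so it helps to see how the hits cluster at the /16 level. The following is a rough Python sketch of that grouping, assuming you have already pulled the offending IPv4 addresses out of your log. The function name is hypothetical and the sample addresses are private placeholders, not the spammer's real ones.

```python
import ipaddress
from collections import defaultdict


def summarize_by_slash16(ips):
    """Map each /16 block to the number of distinct /24 blocks seen inside it."""
    groups = defaultdict(set)
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # strict=False lets us build the enclosing network from a host address.
        net16 = ipaddress.ip_network(f"{addr}/16", strict=False)
        net24 = ipaddress.ip_network(f"{addr}/24", strict=False)
        groups[net16].add(net24)
    return {str(net): len(subnets) for net, subnets in groups.items()}


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    sample = ["10.1.2.5", "10.1.3.9", "10.1.4.77", "10.2.0.1"]
    for block, count in summarize_by_slash16(sample).items():
        print(block, "distinct /24 blocks:", count)
```

A /16 with many distinct /24 blocks behind the same referrer is a candidate for a wider ban, though whether that is acceptable depends on how many real visitors share the range.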

These are the referrers:

hvvc.us Content Scraper: Research, Ban

There are some scrapers and there are others that are ridiculous. I just got scraped hard by 209-133-216-182.static.hvvc.us, with 105 server entries and 7 unique user agent names. Excessive, to say the least.

Here are the UAs used:

Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+ (KHTML, like Gecko) Version/ Mobile Safari/534.11+
Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20160721-2308 +http://www.exif-search.com)
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201
Opera/12.02 (Android 4.1; Linux; Opera Mobi/ADR-1111101157; U; en-US) Presto/2.9.201 Version/12.02
UniversalFeedParser/3.3 +http://feedparser.org/
Windows-Media-Player/11.0.5721.5145

Network: NOC4Hosts, Hivelocity Network

I have sent an email to their ISP, abuse@hivelocity.net.
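Counting entries and user agents per host by eye gets old fast. Here is a minimal Python sketch of the tally, assuming the raw access log is in Apache's combined log format with the host in the first field. The function names and the thresholds are my own choices for illustration, and the log lines in the example are invented.

```python
import re
from collections import defaultdict

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def profile_hosts(lines):
    """Return {host: (request_count, unique_user_agent_count)}."""
    hits = defaultdict(int)
    agents = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        hits[match["host"]] += 1
        agents[match["host"]].add(match["ua"])
    return {host: (hits[host], len(agents[host])) for host in hits}


def flag_scrapers(profile, min_hits=100, min_agents=3):
    """List hosts with many requests AND several different user agents."""
    return sorted(
        host
        for host, (count, ua_count) in profile.items()
        if count >= min_hits and ua_count >= min_agents
    )


if __name__ == "__main__":
    # Invented log lines for illustration only.
    sample = [
        'host-a.example - - [01/Jan/2017:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "AgentOne/1.0"',
        'host-a.example - - [01/Jan/2017:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "AgentTwo/2.0"',
        'host-b.example - - [01/Jan/2017:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "AgentOne/1.0"',
    ]
    print(flag_scrapers(profile_hosts(sample), min_hits=2, min_agents=2))
```

One host rotating through several unrelated user agents, as in the list above, is a stronger signal than request volume alone.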

kosmetik-freaks.blogspot Referrer Spam: Research, Ban

This kosmetik-freaks.blogspot is a referrer spammer that has been harassing me for quite a long time. I have tried to ban them with an HTTP_REFERER ban but this does not work. My ISP, Site5, will not help me. They are predominantly out of Indonesia. They are pretty sophisticated to have evaded my detection for so long.

The sister referrer spammer is tanyadokterkeluarga.blogspot, which uses the identical method and largely shares the same IP ranges. When you kill one you kill the other. Almost all these UAs are mobile devices, leading me to believe these are mobile customers that have downloaded the same spam app.

kwpublisher.com Referrer Spam: Research, Ban

kwpublisher.com is a long-time referrer spammer that I would like to remove. I have tried to ban them with an HTTP_REFERER ban but this does not work. My ISP, Site5, will not help me. They seem to have a similar method to kosmetik-freaks.blogspot. They seem to be out of Pakistan mostly, but have gone to Indonesia and China. I am now tracking them closely.

Conclusion: Tracked down the code hotlinking to my site. Complained to their domain name provider. Then they disappeared. Goodbye.

x 4 – Pakistan Tel

Host Name 0 Zero or localhost in your Raw Access Log

Does your raw access log display a host name of “0”, or zero? Very odd, is it not? I have been struggling with this for a couple of months, and my ISP Site5 had no answers. It turns out that one of my spammers, NFORCE_ENTERTAINMENT, puts an unprintable character into their host table, so that when my ISP looks them up, the unprintable character shows up in my log as “0”.

Trying to control your site’s spam can be challenging. If you try to ban an IP that is simply 0, or a host name of “0”, you will fail, because there is no zero in their host name, but an unprintable character. Ban these guys instead.
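If you want to check your own log for this trick, here is a small Python sketch that reads log lines as raw bytes and flags any host field containing a byte outside printable ASCII. It assumes the raw log file still holds the actual unprintable byte in the first field, which may not be true on every host, and the function name is hypothetical.

```python
def find_odd_hosts(raw_lines):
    """Return host fields (as bytes) that contain non-printable bytes."""
    odd = []
    for raw in raw_lines:
        # The host is assumed to be the first space-separated field.
        host = raw.split(b" ", 1)[0]
        # Printable, non-space ASCII runs from 0x21 to 0x7E.
        if any(byte < 0x21 or byte > 0x7E for byte in host):
            odd.append(host)
    return odd


if __name__ == "__main__":
    # Invented log lines for illustration only.
    sample = [
        b'0 - - [01/Jan/2017:00:00:01 +0000] "GET / HTTP/1.1" 200 1',
        b'\x00 - - [01/Jan/2017:00:00:02 +0000] "GET / HTTP/1.1" 200 1',
        b'good.example - - [01/Jan/2017:00:00:03 +0000] "GET / HTTP/1.1" 200 1',
    ]
    for host in find_odd_hosts(sample):
        print(repr(host))
```

Printing with `repr` shows the hidden byte as an escape sequence, so you can tell a real "0" from something that only looks like one.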