A dear friend uses Feedly to monitor my site. He complained that he was getting 403 Banned responses, and asked why. Well, I have found that Feedly usually only takes my RSS feed, but sometimes, not often, it scrapes me mercilessly. Once I see a bot start scraping, I ban it. I moved him over to the better-behaved Feedburner by Google.
Here are the Feedly user agents:
Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
The latter, FeedlyBot, runs off WZComm, and had previously scraped me, so I banned it. WZComm also runs the surdotly bot, which is also banned. The former, Feedly/1.0, runs off Level 3, and seems well behaved.
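To illustrate telling the two agents apart, here is a minimal Python sketch. It assumes the scraping agent carries the token FeedlyBot somewhere in its User-Agent string (the full string is not shown above), and the function name classify_feedly_agent is hypothetical.

```python
def classify_feedly_agent(user_agent: str) -> str:
    """Return 'ban', 'allow', or 'other' for a User-Agent header value."""
    ua = user_agent.lower()
    # Assumption: the scraper's UA contains the token "FeedlyBot".
    if "feedlybot" in ua:
        return "ban"
    # The feed fetcher identifies itself as "Feedly/1.0 (...)".
    if ua.startswith("feedly/"):
        return "allow"
    return "other"


fetcher = "Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)"
print(classify_feedly_agent(fetcher))
```

The FeedlyBot check comes first so that a string mentioning both names is still banned.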
This is a preview of "Feedly: Somewhat Schizophrenic so Please Settle Down". Read the full post (136 words, 0 images, estimated 33 secs reading time)
Moved, I did, from Site5 to A2. The last 21 hrs were a wet and wild ride, all without the protection of my trusty .htaccess file, the one with my IP ban list. Within those 21 hrs, I received a total of 33 spam comments. Usually I receive only one or two. It is clear that without protection I would be inundated by comment spam.
Of course these IPs are only the ones that comment spammed me. There are many more that use their bots to do content scraping, trying to break into my site, trick my host provider, etc. There are too many to list.
Big Weed told me to not ban Pinterest. While I am not a huge Pinterest fan, she is/was so I listen to her. The problem is that Pinterest is hosted on Amazon Web Services (AWS), a cloud host provider infamous for hosting bad bots. Here are the IP ranges to ban AWS but keep Pinterest coming back.
# AWS 126.96.36.199 – 188.8.131.52 184.108.40.206/11
deny from 220.127.116.11/13 18.104.22.168/16 22.214.171.124/17 126.96.36.199/18 188.8.131.52/19 184.108.40.206/20 220.127.116.11/21
# Pinterest 18.104.22.168/24 22.214.171.124/24
deny from 126.96.36.199/23 188.8.131.52/22 184.108.40.206/15 220.127.116.11/14 18.104.22.168/12
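The long run of CIDR blocks above looks like what you get when a small range is carved out of a larger one. Here is a minimal Python sketch of that carving using the standard ipaddress module. The addresses are placeholders from private address space, not the real AWS or Pinterest ranges, and the function names are my own for illustration.

```python
import ipaddress


def deny_ranges(blocked: str, allowed: str):
    """Split `blocked` into CIDR blocks that cover everything except `allowed`."""
    blocked_net = ipaddress.ip_network(blocked)
    allowed_net = ipaddress.ip_network(allowed)
    # address_exclude raises ValueError if `allowed` is not inside `blocked`.
    return sorted(blocked_net.address_exclude(allowed_net))


def is_denied(ip: str, ranges) -> bool:
    """True if `ip` falls inside any of the deny ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


# Placeholder ranges: a large hosting block and a small range to keep reachable.
ranges = deny_ranges("10.0.0.0/11", "10.0.4.0/23")
print("deny from " + " ".join(str(net) for net in ranges))
print(is_denied("10.0.5.9", ranges))   # inside the range that was carved out
print(is_denied("10.20.0.1", ranges))  # elsewhere in the large block
```

Carving one /23 out of a /11 gives one block for each prefix length from /12 to /23, which is why the deny list gets so long.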
My friend was surprised when I told him that I banned all IP ranges of Amazon Web Services (AWS) from my site. It is particularly ironic considering that we both had recently attended an AWS Cloud Computing IoT presentation, which was well done and interesting to both of us.
AWS accounts for a huge chunk of the world’s cloud computing platform, and my decision to ban all IP ranges did not come lightly. I simply could not keep up with all the comment spammers and scrapers coming out of AWS. It seems like I am not alone. This has been my experience as well. There are others.
Puzzling, it is at times, that my htaccess does not always behave as intended. As a computer scientist I expect that my programs and file input should output consistent, stable and reliable results immediately. This is not the case with my htaccess file, hosted on Site5, my internet service provider.
Delays in htaccess Implementation
When I make certain changes to my htaccess, there may be delays of a day or two before they take effect. This is very odd to me, because supposedly the htaccess is checked on every server request. Maybe there is some caching that I do not know about. Nevertheless it seems like the htaccess has a unique personality. I know that I should not anthropomorphize a computer, much less a security file such as htaccess on an Apache server, but it is difficult not to.
tanyadokterkeluarga.blogspot is a persistent referrer spammer. They use a huge number of IP addresses that do not repeat the third octet. They use similar strategies to kosmetik-freaks.blogspot, in fact sharing identical IP ranges. They are sister referrer spammers. Neither is stopped by an HTTP_REFERER ban in htaccess. If you kill one you kill the other, a nice double prize. As with the sister, this spammer runs out of Indonesia.
These are the referrers:
This is a preview of "tanyadokterkeluarga.blogspot Referrer Spam: Research, Ban". Read the full post (499 words, 0 images, estimated 2:00 mins reading time)
There are some scrapers and there are others that are ridiculous. I just got scraped hard by 22.214.171.124, 209-133-216-182.static.hvvc.us, with 105 server entries and 7 unique user agent names. Excessive, to say the least.
Here are the UA’s used:
Mozilla/5.0 (BlackBerry; U; BlackBerry 9900; en) AppleWebKit/534.11+ (KHTML, like Gecko) Version/126.96.36.1996 Mobile Safari/534.11+
Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20160721-2308 +http://www.exif-search.com)
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201
Opera/12.02 (Android 4.1; Linux; Opera Mobi/ADR-1111101157; U; en-US) Presto/2.9.201 Version/12.02
188.8.131.52 – 184.108.40.206 220.127.116.11/19
NOC4Hosts, HIvelocity Network
I have sent an email to their ISP, email@example.com.
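A tally like the one above (server entries and unique user agents per IP) can be pulled out of a raw access log with a few lines of Python. This is only a sketch. It assumes the Apache "combined" log format, with the client address first and the user agent as the last quoted field. The function names, the threshold, and the sample addresses (documentation ranges) are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: combined log format, IP first, user agent is the last quoted field.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def tally_by_ip(lines):
    """Map each IP to (number of entries, number of distinct user agents)."""
    hits = defaultdict(int)
    agents = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        ip, user_agent = match.groups()
        hits[ip] += 1
        agents[ip].add(user_agent)
    return {ip: (hits[ip], len(agents[ip])) for ip in hits}


def flag_scrapers(stats, max_agents=3):
    """IPs that used more distinct user agents than a real visitor plausibly would."""
    return sorted(ip for ip, (_, n_agents) in stats.items() if n_agents > max_agents)


sample_log = [
    '192.0.2.10 - - [21/Jul/2016:23:08:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"',
    '192.0.2.10 - - [21/Jul/2016:23:08:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201"',
    '192.0.2.10 - - [21/Jul/2016:23:08:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20160721-2308 +http://www.exif-search.com)"',
    '192.0.2.20 - - [21/Jul/2016:23:09:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"',
]
stats = tally_by_ip(sample_log)
print(stats)
print(flag_scrapers(stats, max_agents=2))
```

A single address cycling through many user agents in a short time is the pattern to look for. The threshold is a judgment call.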
This kosmetik-freaks.blogspot is a referrer spammer that has been harassing me for quite a long time. I have tried to ban them with an HTTP_REFERER ban but this does not work. My ISP, Site5, will not help me. They are predominantly out of Indonesia. They are pretty sophisticated to have evaded my detection for so long.
The sister referrer spammer is tanyadokterkeluarga.blogspot, which uses the identical method and largely shares the same IP ranges. When you kill one you kill the other. Almost all these UAs are mobile devices, leading me to believe these are mobile customers that have downloaded the same spam app.
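Since the referrer ban does not catch them, one way to research the shared IP ranges is to pull every client address that sent the spam referrer and group them by network. The Python sketch below assumes the Apache "combined" log format (referrer and user agent as the last two quoted fields) and groups by /16, because the third octet keeps changing. The function name and the sample addresses are placeholders, not the spammers' real ranges.

```python
import ipaddress
import re
from collections import Counter

# Assumption: combined log format, ending in "referer" "user-agent".
REF_RE = re.compile(r'^(\S+) .*"([^"]*)" "[^"]*"\s*$')


def networks_for_referrer(lines, needle, prefix=16):
    """Count hits per network for log entries whose referrer contains `needle`."""
    counts = Counter()
    for line in lines:
        match = REF_RE.match(line)
        if not match or needle not in match.group(2):
            continue
        try:
            net = ipaddress.ip_network(f"{match.group(1)}/{prefix}", strict=False)
        except ValueError:
            continue  # first field was a host name, not an IP address
        counts[str(net)] += 1
    return counts


sample_log = [
    '10.1.2.3 - - [01/Jan/2017:00:00:01 +0000] "GET / HTTP/1.1" 200 100 "http://kosmetik-freaks.blogspot.com/" "Mozilla/5.0 (Android)"',
    '10.1.77.9 - - [01/Jan/2017:00:00:02 +0000] "GET / HTTP/1.1" 200 100 "http://kosmetik-freaks.blogspot.com/" "Mozilla/5.0 (Android)"',
    '10.1.200.4 - - [01/Jan/2017:00:00:03 +0000] "GET / HTTP/1.1" 200 100 "http://tanyadokterkeluarga.blogspot.com/" "Mozilla/5.0 (Android)"',
    '10.9.0.1 - - [01/Jan/2017:00:00:04 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(networks_for_referrer(sample_log, "kosmetik-freaks.blogspot"))
print(networks_for_referrer(sample_log, ".blogspot"))
```

Networks that show up for both referrers are the ones where a single ban takes out both sisters.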
This is a preview of "kosmetik-freaks.blogspot Referrer Spam: Research, Ban". Read the full post (1077 words, 0 images, estimated 4:18 mins reading time)
kwpublisher.com is a long-time referrer spammer that I would like to remove. I have tried to ban them with an HTTP_REFERER ban but this does not work. My ISP, Site5, will not help me. This guy seems to have a similar method to kosmetik-freaks.blogspot. They seem to be out of Pakistan mostly, but have gone to Indonesia and China. I am now tracking them closely.
Conclusion: Tracked down the code hotlinking to my site. Complained to their domain name provider. Then they disappeared. Goodbye.
22.214.171.124 x 4 126.96.36.199 – 188.8.131.52 Pakistan Tel
Does your raw access log display a host name of “0”, or zero? Very odd, is it not? I have been struggling with this for a couple of months, and my ISP Site5 had no answers. It turns out that one of my spammers, NFORCE_ENTERTAINMENT, puts an unprintable character into their host table, so that when my ISP looks them up, they display the unprintable character in my log as “0”.
Trying to control your site’s spam can be challenging. If you try to ban an IP that is simply 0, or a host name of “0” you will fail, because there is no zero in their host name, but an unprintable character. Ban these guys instead.
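To check whether a host name in your own log hides an unprintable character, you can read the log as raw bytes and escape anything outside printable ASCII. This is a minimal Python sketch. It assumes the host is the first space-separated field of each line, and the NUL byte in the sample is purely illustrative, since which character the spammer uses is not stated here. Both function names are hypothetical.

```python
def reveal_host(raw_line: bytes) -> str:
    """Return the first field of a raw log line with non-printable bytes escaped."""
    host = raw_line.split(b" ", 1)[0]
    return "".join(chr(b) if 0x21 <= b <= 0x7E else f"\\x{b:02x}" for b in host)


def has_hidden_chars(raw_line: bytes) -> bool:
    """True if the host field contains any byte outside printable ASCII."""
    host = raw_line.split(b" ", 1)[0]
    return any(not (0x21 <= b <= 0x7E) for b in host)


# Illustrative lines only: one with a hidden byte, one with a genuine "0".
hidden = b'\x00 - - [01/Jan/2017:00:00:01 +0000] "GET / HTTP/1.1" 200 100'
plain = b'0 - - [01/Jan/2017:00:00:02 +0000] "GET / HTTP/1.1" 200 100'
print(reveal_host(hidden), has_hidden_chars(hidden))
print(reveal_host(plain), has_hidden_chars(plain))
```

Open the log in binary mode (for example open(path, "rb")) so that the bytes reach the check unchanged.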
This is a preview of "Host Name 0 Zero or localhost in your Raw Access Log". Read the full post (900 words, 0 images, estimated 3:36 mins reading time)