Why I Banned Amazon Web Services (AWS)

My friend was surprised when I told him that I had banned all IP ranges of Amazon Web Services (AWS) from my site. It was particularly ironic considering that we had both recently attended an AWS Cloud Computing IoT presentation, which was well done and interesting to both of us.

AWS accounts for a huge chunk of the world’s cloud computing capacity, and I did not take the decision to ban all of its IP ranges lightly. I simply could not keep up with all the comment spammers and scrapers coming out of AWS. I am not alone: other site operators report the same experience.

sjwright:
I run http://whirlpool.net.au and I religiously check the Amazon EC2 forum announcements for new IP ranges to ban.

sjwright:

  • Shitloads of rogue bots doing “social media monitoring”.
  • Shitloads of rogue bots stealing content for black-hat SEO.
  • Shitloads of rogue bots harvesting email addresses.
  • Shitloads of rogue bots submitting spammy replies.

sjwright:

Unlike the assumptions you’re limited to making, I know how much of my AWS traffic is human, and it’s really very very very small. The sad reality is I’m sick and tired of rogue bots, and the tiny sliver of collateral damage can fill out the CAPTCHA validation every so often.

source

This is from StackExchange:

Amazon cloud services are blocked from accessing anything but the API due to a good deal of abuse coming from those services – spammers, scrapers, bots that don’t compress their requests and ask for all of our sites causing a good deal of load on our services.

We may re-visit this at some point in the future, but I doubt the limitation will be lifted.

In March 2016 I filed a complaint with AWS about a couple of content scrapers on its network, with part of my access log as proof. Here is their reply:

Hello,

Thank you for your report, we appreciate your assistance in helping to identify potentially abusive content on our networks.

We’ve reviewed your report and at this time, the content appears to be no longer active or available. If you have any evidence otherwise, please let us know.

Reported Content: http://www.profound.net/domainappender

Best regards,
Amazon EC2 Abuse Team

From then on I realized that AWS had a business model that catered to these spammers and scrapers and they were unwilling to remove them. AWS is the ground zero of cloud computing black hat bots.

It started slowly. At first I banned individual AWS spammer IPs, then widened each ban to cover the whole last octet. This quickly grew to encompass the third octet as well, and eventually I was banning whole IP ranges. Every day I was so overwhelmed by the number and breadth of comment spammers and content scrapers that I had no choice but to impose a wide-scale ban.
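The escalation from single addresses to whole ranges can be sketched in a few lines of Python. This is only an illustration, not the configuration I actually run: the helper names `widen` and `is_banned` are made up for this example, and the addresses are a few of the blocked ones from the access log further down.

```python
import ipaddress

def widen(ip, prefix_len):
    """Return the network of the given prefix length that contains ip."""
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)

def is_banned(ip, networks):
    """True if ip falls inside any of the banned networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# A few offending addresses (taken from the log excerpt in this post).
offenders = ["54.210.101.142", "54.210.101.148", "54.210.102.118", "54.210.97.205"]

# Step 1: ban the whole last octet around each offender (a /24).
by_24 = sorted({widen(ip, 24) for ip in offenders})

# Step 2: ban the third octet as well (a /16).
by_16 = sorted({widen(ip, 16) for ip in offenders})

print([str(n) for n in by_24])
print([str(n) for n in by_16])
```

With the /24 bans, a neighbour such as 54.210.99.189 still gets through. With the /16 ban it does not, which is why the bans kept getting wider.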

There are exceptions to my bans. DuckDuckGo, Pinterest and a few others that use AWS are not banned, because they have never abused their access to my site. If I see a user agent (UA) that I recognize as legitimate and that would benefit my site, I will certainly unban it and let it in. Making exceptions to a large IP block ban does mean more complex work for me.
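The logic of an exception inside a block ban looks roughly like the Python sketch below. Everything in it is an assumption for illustration: the banned range, the UA tokens, and the function name `decide` are hypothetical and are not my real rules. A UA string is also trivial to fake, so a real exception should be tied to specific addresses as well.

```python
import ipaddress

# Hypothetical banned range, chosen only because the Pinterest request
# in the log excerpt comes from inside it.
BANNED_NETWORKS = [ipaddress.ip_network("54.236.0.0/16")]

# Illustrative UA tokens for the services that are let through.
ALLOWED_UA_TOKENS = ("Pinterest", "DuckDuckBot")

def decide(ip, user_agent):
    """Return the HTTP status a request would get: 200 or 403."""
    addr = ipaddress.ip_address(ip)
    in_banned_range = any(addr in net for net in BANNED_NETWORKS)
    if not in_banned_range:
        return 200  # not part of any block ban
    if any(token in user_agent for token in ALLOWED_UA_TOKENS):
        return 200  # exception carved out of the block ban
    return 403      # everything else in the range stays banned

print(decide("54.236.1.11", "Pinterest/0.2 (+http://www.pinterest.com/)"))
print(decide("54.236.1.12", "Mozilla/5.0"))
```

Every exception adds one more rule that has to be checked before the ban applies, and that is the extra work.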

I used to let Feedspot take my feed, but then realized that they had also started scraping my site, so I banned them. They stopped scraping, so I let them take my feed again, but they returned to their scraping behaviour, so the ban is now permanent. It was not for lack of trying on my part, but I don’t control their bad behaviour.

As with most bots today, including white hat bots, they do not uniquely identify themselves, preferring the anonymity of Mozilla such and such. This makes it harder to determine their identity and thus easier to ban them. The white hat bots usually have a unique UA, but some of them also crawl under a generic browser UA.

AWS amazes me on a regular basis. I have a very popular page that is regularly referenced by Pinterest, which is fine, and I welcome those visitors to my site. Like clockwork, for every referral from Pinterest I get three times as many content scrapers from AWS. I have days where 15 people are referred to this single post from Pinterest, followed by 45 separate attempts to scrape my site by AWS bots. Thankfully these AWS bots are banned and receive 403 errors. The AWS bots that get through go on my ban list for the next day.

Today is a typical day. I received 1 Pinterest referral and also received 30 AWS requests for the same post. Thankfully all the AWS addresses were already banned. It is not practical for anyone to go back to AWS and have these content scrapers stopped. There are too many to combat, and I receive this many every day.

54.236.1.11 [05/Oct/2016:04:33:31 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 200 43559 – Pinterest/0.2 (+http://www.pinterest.com/)

52.90.94.107 [04/Oct/2016:08:43:30 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 635
54.165.24.101 [04/Oct/2016:07:25:05 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.172.246.155 [04/Oct/2016:09:23:04 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.172.246.155 [04/Oct/2016:13:00:52 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.173.168.233 [04/Oct/2016:09:06:44 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.173.168.233 [04/Oct/2016:12:18:26 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.173.168.233 [05/Oct/2016:00:21:08 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.175.100.106 [04/Oct/2016:14:12:22 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.175.100.106 [05/Oct/2016:00:17:34 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.175.100.113 [04/Oct/2016:12:10:58 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.175.84.147 [04/Oct/2016:22:09:39 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.175.91.205 [04/Oct/2016:16:22:05 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.175.91.205 [04/Oct/2016:20:56:43 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.175.91.34 [05/Oct/2016:04:24:41 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 635
54.209.47.180 [05/Oct/2016:01:18:17 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.210.101.142 [04/Oct/2016:21:24:07 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.210.101.142 [05/Oct/2016:00:01:03 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.210.101.148 [04/Oct/2016:10:29:47 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.210.101.148 [04/Oct/2016:18:10:19 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.210.102.118 [04/Oct/2016:16:12:29 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.210.209.38 [04/Oct/2016:10:48:18 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.210.93.172 [05/Oct/2016:05:04:13 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.210.97.205 [04/Oct/2016:12:52:42 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.210.97.205 [04/Oct/2016:16:18:01 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.210.99.189 [04/Oct/2016:10:11:28 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
54.242.15.11 [04/Oct/2016:15:47:30 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 635
54.86.137.210 [04/Oct/2016:23:55:30 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 636
107.22.44.0 [04/Oct/2016:08:32:33 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 634
107.22.44.0 [04/Oct/2016:09:10:24 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 634
107.22.97.6 [05/Oct/2016:06:45:34 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 634
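Counting entries like these by hand gets old quickly. Below is a minimal Python sketch for tallying them, assuming the trimmed log layout shown above (IP, timestamp, method, path, protocol, status, bytes, separated by whitespace). The sample holds only four of the lines, and the helper name `parse` is made up for this example.

```python
from collections import Counter

# Four lines copied from the excerpt above.
SAMPLE = """\
54.236.1.11 [05/Oct/2016:04:33:31 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 200 43559 – Pinterest/0.2 (+http://www.pinterest.com/)
52.90.94.107 [04/Oct/2016:08:43:30 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 635
54.172.246.155 [04/Oct/2016:09:23:04 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
54.172.246.155 [04/Oct/2016:13:00:52 GET /wp/2015/06/15/motorcycle-headlight-configuration-conspicuity/ HTTP/1.1 403 637
"""

def parse(line):
    """Split one trimmed log line into (ip, status)."""
    parts = line.split()
    # Field order: ip, [timestamp, method, path, protocol, status, bytes, ...
    return parts[0], int(parts[5])

statuses = Counter()      # how many requests got each status code
blocked_ips = Counter()   # how many 403s each address collected

for line in SAMPLE.splitlines():
    if not line.strip():
        continue
    ip, status = parse(line)
    statuses[status] += 1
    if status == 403:
        blocked_ips[ip] += 1

print(dict(statuses))
print(blocked_ips.most_common())
```

Any address that shows up with a 200 and no recognizable UA is a candidate for the next day's ban list.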

I am excited about the cloud computing model and wish it well in the future. We may all go there. But as long as cloud computing is financed by content spammers and scraper bots gone wild, I have no choice but to ban the complete AWS platform. I am sure there are reputable companies that might help my site, and I will make exceptions for them. Until AWS changes its behaviour and bans these black hat bots, the ban will continue.
