Tag: bot

Bot Strategy: Fetch, Scrape, Change IP, repeat

Four IPs scraped my site in identical ways: each fetched the most recent document, then scraped parts of the rest of the site. Then the IP changes and the pattern repeats. Every IP fetches the same document first, then scrapes a different part of my site, but only for images.

I’ll keep an eye on this activity and see if I can pin down something more definite.

UA: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727), which does not seem to be unique
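
One way to look for this fetch-then-images pattern in a log is sketched below. It is only a rough illustration: the function name, the list of image extensions, and the sample requests (which use reserved documentation IP addresses) are all assumptions, not data from my logs. It also assumes the log has already been reduced to (IP, path) pairs in time order and that paths carry no query strings.

```python
from collections import defaultdict

# Assumed set of image extensions; adjust to whatever the site serves.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_image(path):
    """True if the request path ends in one of the assumed image extensions."""
    return path.lower().endswith(IMAGE_EXTS)


def fetch_then_images(requests):
    """Return {ip: first_path} for every IP whose first request is a
    non-image document and whose remaining requests are all images."""
    by_ip = defaultdict(list)
    for ip, path in requests:
        by_ip[ip].append(path)

    flagged = {}
    for ip, paths in by_ip.items():
        first, rest = paths[0], paths[1:]
        if rest and not is_image(first) and all(is_image(p) for p in rest):
            flagged[ip] = first
    return flagged


# Hypothetical sample, not real log data.
SAMPLE = [
    ("192.0.2.10", "/latest-post/"),
    ("192.0.2.10", "/images/a.jpg"),
    ("192.0.2.10", "/images/b.png"),
    ("198.51.100.7", "/latest-post/"),
    ("198.51.100.7", "/uploads/c.gif"),
    ("203.0.113.5", "/about/"),
    ("203.0.113.5", "/contact/"),
]

flagged = fetch_then_images(SAMPLE)
for ip, first in flagged.items():
    print(ip, "started at", first)
```

Several flagged IPs sharing the same first path would be a hint that they belong to the same bot.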

Dual IP Comment Spammers with Canadian Content

We Canadians are always overshadowed by the US, which is 10 times larger in population. If at all possible, I like to highlight our accomplishments, or in this case, sophisticated comment spamming from Canada. Bad, Canada.

Comment spammers on my site usually use a single IP to first read the post, determine if they can submit spam, then submit the spam comment. This shows up in my Akismet spam comments. They are simple to identify and ban.

Attack from spbot OpenLinkProfiler.org

Today I received a massive 1,000-line scraper attack from spbot, from OpenLinkProfiler.org. The IP address is 138.197.47.148, a Digital Ocean IP, which I have banned. I’ve also added spbot to my robots.txt. I sent a complaint letter to Digital Ocean at abuse@digitalocean.com:

Hi there,
Today I received a 1000 line scrape from one of your IP addresses:
138.197.47.148

The UA is Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )

Please have them cease their scraping activity as it unnecessarily uses up my bandwidth and CPU time.

I have included today’s log entry with their activity:

Thanks, Don
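
The post does not show the robots.txt entry itself. As a hedged sketch, the two rule lines below are an assumed way of disallowing spbot site-wide, checked with Python's standard `urllib.robotparser`. The test path `/any-page` and the second bot name are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set; the actual robots.txt is not shown in the post.
ROBOTS_LINES = [
    "User-agent: spbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# The parser matches on the bot's short name, not the full UA string.
spbot_allowed = parser.can_fetch("spbot", "/any-page")
other_allowed = parser.can_fetch("SomeOtherBot", "/any-page")

print("spbot allowed:", spbot_allowed)
print("other bots allowed:", other_allowed)
```

robots.txt is only advisory, so a rule like this helps only with bots that choose to honor it. The IP ban is what actually blocks the requests.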

DomainCrawler Attack using 5 IP addresses

DomainCrawler hit my server with a 500-transaction attack today, using 5 IP addresses, all from Sweden. They scraped me hard! Their user agent is “DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/dontai.com)”. I have banned all these IP addresses with their last octet. Good riddance.

80.248.225.142 Internetbolaget Se domaincrawler
80.248.227.107 Internetbolaget Se domaincrawler
176.74.192.36 Tralex Se domaincrawler
193.183.102.178 Internetbolaget Se domaincrawler

Documenting A Referrer Spam Campaign

I get a lot of referrer spam on my site. I’m pretty sure every site does; it is ubiquitous. Usually I have already banned the spammers, and they are usually from Russia, such as xrus, dealing with lovely, nubile, young Russian women. These I treat like background noise: I glance at the error 403 and move on. Then occasionally, about once a month, I get a bona fide referrer spam marketing campaign, where someone really wants to make a negative impression on both my Google Analytics and myself. I then find and ban them.

Keeping Pinterest in an Ocean of AWS Bots

Big Weed told me not to ban Pinterest. While I am not a huge Pinterest fan, she is/was, so I listen to her. The problem is that Pinterest is hosted on Amazon Web Services (AWS), a cloud hosting provider infamous for hosting bad bots. Here are the IP ranges that ban AWS but keep Pinterest coming back.

# AWS 52.192.0.0 – 52.223.255.255 52.192.0.0/11
deny from 52.192.0.0/13 52.200.0.0/16 52.201.0.0/17 52.201.128.0/18 52.201.192.0/19 52.201.224.0/20 52.201.240.0/21
# Pinterest 52.201.248.0/24 52.201.249.0/24
deny from 52.201.250.0/23 52.201.252.0/22 52.202.0.0/15 52.204.0.0/14 52.208.0.0/12
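
The deny list above is the 52.192.0.0/11 block with the two Pinterest /24 ranges carved out. As a cross-check, here is a small sketch using Python's standard `ipaddress` module. It treats the two Pinterest /24s as the single equivalent 52.201.248.0/23, and the variable names are mine.

```python
import ipaddress

# The AWS block and the Pinterest ranges from the htaccess comments above.
aws_block = ipaddress.ip_network("52.192.0.0/11")
# 52.201.248.0/24 + 52.201.249.0/24 together form one /23.
pinterest = ipaddress.ip_network("52.201.248.0/23")

# address_exclude() yields the networks covering aws_block minus pinterest.
deny = sorted(aws_block.address_exclude(pinterest))

print("deny from " + " ".join(str(net) for net in deny))
```

The same approach should work for carving any other allowed range out of a larger banned block, as long as the allowed range sits entirely inside it.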

Why I Banned Amazon Web Services AWS

My friend was surprised when I told him that I banned all IP ranges of Amazon Web Services (AWS) from my site. It is particularly ironic considering that we both had recently attended an AWS Cloud Computing IoT presentation, which was well done and interesting to both of us.

AWS accounts for a huge chunk of the world’s cloud computing platform, and my decision to ban all IP ranges did not come lightly. I simply could not keep up with all the comment spammers and scrapers coming out of AWS. It seems like I am not alone. This has been my experience as well. There are others.

tanyadokterkeluarga.blogspot Referrer Spam: Research, Ban

tanyadokterkeluarga.blogspot is a persistent referrer spammer. They use a huge number of IP addresses that do not repeat the third octet. It uses strategies similar to kosmetik-freaks.blogspot, and in fact shares identical IP ranges. They are sister referrer spammers. Neither is banned by the HTTP_REFERER in htaccess. If you kill one you kill the other, a nice double prize. As with the sister, this spammer runs out of Indonesia.

These are the referrers:
tanyadokterkeluarga.blogspot.ca
tanyadokterkeluarga.blogspot.co.id
tanyadokterkeluarga.blogspot.com
tanyadokterkeluarga.blogspot.in
tanyadokterkeluarga.blogspot.my
tanyadokterkeluarga.blogspot.sg
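
When researching a campaign like this, it can help to pick every country variant of the referrer out of a log in one pass. The sketch below is one assumed way to do that with a regular expression on the referrer's hostname. The pattern, the function name, and the non-matching test referrers are illustrations only, and this is not the htaccess rule from my site.

```python
import re
from urllib.parse import urlsplit

# Matches tanyadokterkeluarga.blogspot.<tld>, including two-part
# suffixes such as co.id. The pattern is an assumption for this sketch.
SPAM_HOST = re.compile(
    r"^(?:www\.)?tanyadokterkeluarga\.blogspot\.[a-z]{2,}(?:\.[a-z]{2,})?$"
)


def is_spam_referrer(referrer):
    """True if the referrer's hostname is one of the blogspot variants."""
    if "//" not in referrer:
        # Bare hostnames need a leading // for urlsplit to see a host.
        referrer = "//" + referrer
    host = urlsplit(referrer).hostname or ""
    return bool(SPAM_HOST.match(host))


# The six referrers listed above.
REFERRERS = [
    "tanyadokterkeluarga.blogspot.ca",
    "tanyadokterkeluarga.blogspot.co.id",
    "tanyadokterkeluarga.blogspot.com",
    "tanyadokterkeluarga.blogspot.in",
    "tanyadokterkeluarga.blogspot.my",
    "tanyadokterkeluarga.blogspot.sg",
]

for ref in REFERRERS:
    print(ref, is_spam_referrer(ref))
```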