Four IPs scraped my site in identical ways: Fetch the most recent document, then scrape parts of the rest of the site. The IP changes, and they repeat. They fetch the same identical document, but then scrape different parts of my site but only for images.
I’ll keep my eye on such activity and see if I further pin down something more definite.
UA: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727), which seems to be not unique
We Canadians are always overshadowed by the 10 larger in population US. If at all possible I like to highlight our accomplishments, or in this case, sophisticated comment spamming from Canada. Bad, Canada.
Comment spammers on my site usually use a single IP to first read the post, determine if they can submit spam, then submit the spam comment. This shows up in my Akismet spam comments. They are simple to identify and ban.
It is always good to see international cooperation amongst different nations in this great world. However, when China, India and Russia cooperate to try to break into my site, forgive me when I get a little upset. While I usually file complaints to internet host providers, in this case the complaint would fall on deaf ears: hosts in China, India and Russia ignore abuse emails. Then most hosts from all over the world ignore abuse emails.
Number of login attempts: 417
All the user agent names are the same: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0
Today I received a massive 1,000 line scraper attack from spbot, from OpenLinkProfiler.org. The ip address is 138.197.47.148, a Digital Ocean IP, which I have banned. I’ve also added spbot to by robots.txt. Sent a complaint letter to Digital Ocean at abuse@digitalocean.com:
Hi there,
Today I received a 1000 line scrape from one of your IP addresses:
138.197.47.148
The UA is Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )
Please have them cease their scraping activity as it unnecessarily uses up my bandwidth and CPU time.
I have included today’s log entry with their activity:
Domain Crawler hit my server a 500 transaction attack today, using 5 IP addresses, all from Sweden. They scraped me hard! Their user agent is “DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/dontai.com)”. I have banned all these IP addresses with their last octet. Good riddance.
80.248.225.142 Internetbolaget Se domaincrawler
80.248.227.107 Internetbolaget Se domaincrawler
176.74.192.36 Tralex Se domaincrawler
193.183.102.178 Internetbolaget Se domaincrawler
These five lot came on my site with a innocent but fake User Agent name of “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, scraped some documents, and then proceeded to try to break into my site’s security. Cheeky bastards.
Seven attempts at document scraping, followed by 9 attempted logins. The interesting thing is that when you use a computer to do these campaigns, if you are not clever they really do look like a computer generated attempt and are thus easy to identify. Which user would have this behaviour? Of course they have all been banned.
Get, I do, a lot of referrer spam on my site. I’m pretty sure that every site gets referrer spam, they are ubiquitous. Usually I have already banned them and they are usually from Russia, such as xrus, dealing with lovely, nubile, young Russian women. These I treat like background noise: I glance at the error 403 and move on. Then occasionally, about once a month, I get a bona fide referrer spam marketing campaign, where someone really wants to make a negative impression on both my Google Analytics and myself. I then find and ban them.
Big Weed told me to not ban Pinterest. While I am not a huge Pinterest fan, she is/was so I listen to her. The problem is that Pinterest is hosted on Amazon Web Services (AWS), a cloud host provider infamous for hosting bad bots. Here are the IP ranges to ban AWS but keep Pinterest coming back.
My friend was surprised when I told him that I banned all IP ranges of Amazon Web Services (AWS) from my site. It is particularly ironic considering that we both had recently attended an AWS Cloud Computing IoT presentation, which was well done and interesting to both of us.
AWS accounts for a huge chunk of the world’s cloud computing platform, and my decision to ban all IP ranges did not come lightly. I just simply could not keep up with all the comment spammers and scrapers coming out of AWS. It seems like I am not alone. This has been by experience as well. There are others.
tanyadokterkeluarga.blogspot is a persistent referrer spammer. They use a huge amount of Ip addresses that do not repeat the third octet. It has similar strategies to kosmetik-freaks.blogspot, in fact sharing identical IP ranges. They are sister referrer spammers. Both are not banned by the HTTP_REFERER in htaccess. If you kill one you kill the other, a nice double prize. As with the sister, this spammer runs out of Indonesia.
These are the referrers:
tanyadokterkeluarga.blogspot.ca
tanyadokterkeluarga.blogspot.co.id
tanyadokterkeluarga.blogspot.com
tanyadokterkeluarga.blogspot.in
tanyadokterkeluarga.blogspot.my
tanyadokterkeluarga.blogspot.sg