Hit I was, by a terribly time-wasting spambot pushing weight loss ads. Yes, my reCAPTCHA did send them to my spam folder for analysis, but it was still a lot. I just wished they would simply stop. All the comment spammers were pushing weight loss. I’m sure they are telling me something about my slightly widening girth, but I am already making amends. There is no need for added pressure, nor for wasting bandwidth and technology.
Fool, it would, an automated anti-bot system, because humans are more intelligent than bots. They are innovative, in their evil genius way. Computer security is all about the arms race. The better the methods, the better the countermeasures, and then it repeats. No security measure is foolproof for very long.
IPVNow.com has a slew of host names that all resolve successfully when you look them up, and all point to the same IP address, 126.96.36.199. This misdirection is what would fool the anti-bot software, because this IP is real and it points to a valid company, Trellian, which owns IPVNow.com. But banning this single IP does not stop the content scraping. Each host name has its own IP address, hosted by the ISPs Ubiquity and Nobis. These are the IPs you need to ban.
This host name is constantly scraping my site, but when I look it up it does not resolve. Searches on Google suggest that they change their IP address very often. Many other sites are getting spammed and content scraped by this host. I have no alternative but to ban the whole IP range of customer.worldstream.nl.
I read my raw access log, and the first column gives me an IP address or host name. This first column is usually enough to target the specific IP that is errant, and I then ban the whole last octet, which covers all 256 addresses in that block.
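The log-reading routine can be sketched in Python. This is only an illustration, assuming a log format where the client address is the first whitespace-separated column; the function names, the `min_hits` threshold, and the sample lines (which use reserved documentation addresses, not real offenders) are all made up for the example.

```python
import ipaddress
from collections import Counter


def client_field(line):
    """Return the first whitespace-separated column of an access log line."""
    parts = line.split()
    return parts[0] if parts else None


def busiest_blocks(lines, min_hits=2):
    """Count hits per /24 block (256 addresses) and keep the busy ones."""
    hits = Counter()
    for line in lines:
        field = client_field(line)
        if field is None:
            continue
        try:
            addr = ipaddress.IPv4Address(field)
        except ipaddress.AddressValueError:
            # A host name rather than an IPv4 address: look it up by hand.
            continue
        # strict=False zeroes the last octet, e.g. 203.0.113.7 -> 203.0.113.0/24
        block = ipaddress.ip_network(f"{addr}/24", strict=False)
        hits[block] += 1
    return [(str(block), n) for block, n in hits.most_common() if n >= min_hits]


# Hypothetical sample lines for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2015:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.99 - - [01/Jan/2015:00:00:02 +0000] "GET /feed HTTP/1.1" 200 512',
    '198.51.100.4 - - [01/Jan/2015:00:00:03 +0000] "GET / HTTP/1.1" 200 512',
    'crawler.example.net - - [01/Jan/2015:00:00:04 +0000] "GET / HTTP/1.1" 200 512',
]
print(busiest_blocks(sample))
```

Grouping by /24 matches the idea of banning the whole last octet. It is a blunt tool, since it also blocks any innocent neighbours in the same block.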
These host names try hard to hide their IP addresses so that they can scrape content and sometimes break into web sites. They have specifically scraped mine, and so I hunted them down and banished them. Oftentimes the Unix host command returns nothing, so research is required. This usually works.
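A small Python sketch of the same triage: try to resolve each host name, and set aside the ones that return nothing for manual research. It uses `socket.gethostbyname` from the standard library in place of the Unix host command. The helper names and the stand-in resolver are assumptions for illustration, and the demo deliberately avoids touching the network.

```python
import socket


def lookup(host, resolver=socket.gethostbyname):
    """Return the IPv4 address for host, or None when the name does not resolve."""
    try:
        return resolver(host)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised for unknown names.
        return None


def triage(hosts, resolver=socket.gethostbyname):
    """Split host names into those that resolve and those needing research."""
    resolved, unresolved = {}, []
    for host in hosts:
        ip = lookup(host, resolver)
        if ip is None:
            unresolved.append(host)
        else:
            resolved[host] = ip
    return resolved, unresolved


def stand_in_resolver(host):
    """Hypothetical offline resolver so the demo runs without DNS."""
    table = {"scraper.example.net": "192.0.2.10"}
    if host in table:
        return table[host]
    raise socket.gaierror("name does not resolve")


resolved, unresolved = triage(
    ["scraper.example.net", "hidden.example.org"], resolver=stand_in_resolver
)
print("ban candidates:", resolved)
print("needs research:", unresolved)
```

Dropping the `resolver` argument makes `triage` perform real DNS lookups. The unresolved list is where searching by hand takes over.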
0x667.crypt.gy came back with a host lookup of 188.8.131.52, OVH. I cannot verify this IP address. Research is inconclusive. This guy uses a Microsoft server error code “1639 (0x667). Invalid command line argument” in his hostname.
These user agents, or bots, somehow fool and subvert my .htaccess user agent rules and continue to scrape my site. I’ve looked at my .htaccess user agent rule many times and still don’t know why they get through. The next step is to ban their IP.
AhrefsBot is a large content scraper that hits my site hard. It reads robots.txt but ignores it, and it fools my .htaccess rules. The bot identifies itself as “Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)”
OVH 184.108.40.206 – 220.127.116.11
OVH 18.104.22.168 – 22.214.171.124
OVH 126.96.36.199 – 188.8.131.52
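Many ban tools want CIDR blocks rather than start and end addresses. As a rough sketch, a range written as "first – last" can be converted with `ipaddress.summarize_address_range` from the Python standard library. The function name `range_to_cidrs` is made up here, the ISP label is assumed to be stripped beforehand, and the addresses in the example are reserved documentation addresses rather than the ranges listed above.

```python
import ipaddress
import re


def range_to_cidrs(text):
    """Convert 'first – last' (en dash or hyphen) into a list of CIDR strings."""
    first_text, last_text = re.split(r"\s*[–-]\s*", text.strip())
    first = ipaddress.IPv4Address(first_text)
    last = ipaddress.IPv4Address(last_text)
    # Raises ValueError if the first address is greater than the last.
    networks = ipaddress.summarize_address_range(first, last)
    return [str(net) for net in networks]


# Hypothetical ranges for illustration only.
print(range_to_cidrs("192.0.2.0 – 192.0.2.255"))
print(range_to_cidrs("192.0.2.64 - 192.0.2.191"))
```

A range whose start is higher than its end raises `ValueError`, which is a useful sanity check before pasting a range into a ban list.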
Spammers are never welcome, clutter up your comments and are a pain in the arse. They blow through your bandwidth, which then gets your ISP on your tail asking you to upgrade your account type. This costs you money. Here are some dual IP strategies that I found in analyzing my WordPress site’s comments.
Pain in the butt, no doubt, is this spammer. He’s been spamming my blog for the last 6 months and whatever I did in my ban manager, it would not ban. I got mad enough to track him down, figure out how he does it, and hopefully ban him. Take a look at the audit trail he left me in my WordPress Akismet anti-spam filter. I am very thankful that Akismet stopped him from wrecking my blog, and I’ll be more careful and vigilant from now on.