Tag Archives: robots

Strange Host Names that I Cracked

These host names try hard to evade detection of their IP addresses, in order to scrape content and sometimes break into from web sites. They have specifically scraped mine and so I hunted them down and banished them. Often times the unix host command returns nothing, so research is required. This usually works.

User Agents I Could not Ban with htaccess

These user agents, or bots, somehow fool and subvert my .htaccess user agent rules and continue to scrape my site. I’ve looked at my htaccess user agent rule many times and don’t know why. The next step is to ban their IP.

AhrefsBot is a large content scraper that hits my site hard, reads robots.txt but ignores it, fools my htaccess, bot is “Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)”
OVH 51.254.0.0 – 51.255.255.255
51.255.65.0/24
51.255.66.0/24
OVH 151.80.16.0 – 151.80.31.255
151.80.31.0/24
OVH 164.132.0.0 – 164.132.255.255
164.132.161.0/24

Reducing your Bandwidth for WordPress and Drupal

Busy I have been recently, with not much time for my blog, but it was all for a good cause. My internet service provider (ISP) informed me that I was taking up too much CPU time on their shared service and banned me. I am a good guy and generally follow the rules, so getting banned is out of character. After a frantic email they restored my account so that I could figure out what happened. I truly am a “less is more” type of guy, and that includes IT resources, and my online sites are pretty consistent, so a propensity of new content was not the issue. Eventually I took some steps to rein in the numerous bots that were scraping and doing whatever to my site, wasting my CPU usage on my tab, and eventually getting me banned. If your site is suffering the same fate, you may glean some hints and tips for reducing your CPU usage.