These user agents, or bots, somehow get past my .htaccess user agent rules and continue to scrape my site. I’ve reviewed those rules many times and still don’t know why they fail. The next step is to ban their IP addresses.
AhrefsBot is a large content scraper that hits my site hard. It reads robots.txt but ignores it, and it gets past my .htaccess rules. Its user agent string is “Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)”.
OVH 18.104.22.168 – 22.214.171.124
OVH 126.96.36.199 – 188.8.131.52
OVH 184.108.40.206 – 220.127.116.11
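For reference, here is a minimal sketch of what a combined user agent and IP block could look like in .htaccess. It is not my actual rule set. It assumes Apache 2.4 with mod_setenvif and mod_authz_core enabled, and a host that allows these overrides in .htaccess. The variable name bad_bot is made up for illustration, and 192.0.2.0/24 is a reserved documentation range standing in for a real one, not an OVH block.

```apache
# Flag any request whose User-Agent header contains "AhrefsBot" (case-insensitive).
# "bad_bot" is a hypothetical variable name chosen for this sketch.
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot

<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except requests flagged by the user agent rule above.
    Require not env bad_bot
    # ...and except a whole address range.
    # 192.0.2.0/24 is a placeholder; substitute the range you want to ban.
    Require not ip 192.0.2.0/24
</RequireAll>
```

On Apache 2.2 the older Order/Allow/Deny directives would be needed instead. The IP line is the fallback for a bot that changes or hides its user agent string, since the address check does not depend on anything the bot sends in its headers.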
The web is said to be about free access, and I certainly agree. When China’s Great Firewall entered a more rigorous phase and Google decided to leave China, some said that free access to information on the internet was a basic human right; I disagreed. Still, here in Toronto, Canada, I do appreciate open internet access. There are limits, however, when certain people take advantage of your hospitality. People scrape your site for their own purposes, they try to break in and use your site to launch their own malicious activity, and they spam your comments to boost their link and trackback stats. There are all kinds of schemes that cost the site owner bandwidth, and eventually money. The site owner is forced to upgrade his level of service from his ISP (or get kicked off his shared service), or move to another ISP. This is not a zero-sum issue: the site owner loses financially.