Ad fraud software in action. Target website, randomized referrers, browser agents and proxy IP addresses. That is enough to spoof anti-bot software.
It is no secret that I battle and ban bad bots on my site. If a bot is not a well-known search engine and does not provide me some type of service, then I usually ban it. Sure, it can visit my site, but it will receive a blank page. But why do they visit? Who is paying them? Welcome to the world of Online Ad Fraud.
Want you do, to go to a concert, but just after the supposed start time for ticket sales, all the tickets are gone. You are, again, out of luck. Minutes later these tickets are all available on reseller sites for double the price. It really does sound like a scam. While the US just enacted a federal law, here in Ontario we are just starting the investigation phase. I hope that we can adopt something as strong as the US law in order to keep pace with bot technology and keep online shopping safe.
It is always warming to see the two Chinas, the PRC and Taiwan, getting along. Today they ganged up and tried to break into my site.
60.217.64.210: China Unicom Shandong, level 10 risk, malware Spam Zero-Day
60.248.0.230: Hinet Chunghwa Tel Taiwan, known for bots and infected zombie computers
183.167.228.134: Chinanet Anhui, level 10 risk, malware Spam Zero-Day
218.21.43.238: Dou shi-BAR Yin chuan Ningxia, level 10 risk, malware Spam Zero-Day
The last one, from Ningxia, looks surprisingly small as compared to the usually huge number of IP addresses for Chinanet or China Unicom, but they are part of Chinanet Ningxia, which is large.
Lose, humans will, in a competition with a bot. Smarter people in the stock market know this and acknowledge that bots play a major role in their online trading system. This has yet to occur in the online ticket sales area. The CBC interviewed a former bot operator about how bots rig the system. I do agree that something must be done about it. Tragically Hip concert tickets selling out to bots before humans means that fans will pay a huge premium for tickets, and none of this premium will go to the artists. This is simply not right.
These hosts try hard to hide who is behind their IP addresses, in order to scrape content from web sites and sometimes break into them. They have specifically scraped mine, and so I hunted them down and banished them. Oftentimes the unix host command returns nothing for these addresses, so further research is required. This usually works.
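The reverse lookup that the unix host command performs can also be scripted. The following is a minimal sketch, not the method used on this site: the function name `reverse_lookup` and its `resolver` parameter are hypothetical, and it returns None when no host name comes back, which is the signal that manual research is needed.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS host name for ip, or None if the lookup fails.

    The resolver parameter is a hypothetical hook so the function can be
    exercised without a network connection; by default it uses the
    standard library's socket.gethostbyaddr.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        # This is the "host returns nothing" case.
        return None
    return hostname
```

Calling `reverse_lookup("60.217.64.210")` would perform a real DNS query, so the result depends on whatever the address owner has published at that moment.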
These user agents, or bots, somehow fool and subvert my .htaccess user agent rules and continue to scrape my site. I’ve looked at my htaccess user agent rule many times and don’t know why. The next step is to ban their IP.
AhrefsBot is a large content scraper that hits my site hard. It reads robots.txt but ignores it, and it fools my htaccess. Its user agent is “Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)”
OVH 51.254.0.0 – 51.255.255.255
51.255.65.0/24
51.255.66.0/24
OVH 151.80.16.0 – 151.80.31.255
151.80.31.0/24
OVH 164.132.0.0 – 164.132.255.255
164.132.161.0/24
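To check whether a visiting address falls inside one of these blocks, Python's standard ipaddress module is enough. This is a rough sketch that assumes the /24 blocks listed above are the ones being banned; the names `BANNED_NETWORKS`, `is_banned` and `range_to_cidrs` are made up for illustration.

```python
import ipaddress

# The /24 blocks from the list above (assumed to be the banned ones).
BANNED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "51.255.65.0/24",
        "51.255.66.0/24",
        "151.80.31.0/24",
        "164.132.161.0/24",
    )
]


def is_banned(ip):
    """Return True if ip falls inside any of the banned networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BANNED_NETWORKS)


def range_to_cidrs(first, last):
    """Convert a first-to-last address range into CIDR notation."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(network) for network in networks]
```

For example, `range_to_cidrs("51.254.0.0", "51.255.255.255")` gives `["51.254.0.0/15"]`, which is the first OVH range above written in the notation that most ban rules expect.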
Busy I have been recently, with not much time for my blog, but it was all for a good cause. My internet service provider (ISP) informed me that I was taking up too much CPU time on their shared service and banned me. I am a good guy and generally follow the rules, so getting banned is out of character. After a frantic email they restored my account so that I could figure out what happened. I truly am a “less is more” type of guy, and that includes IT resources, and my online sites are pretty consistent, so a profusion of new content was not the issue. Eventually I took some steps to rein in the numerous bots that were scraping and doing whatever to my site, running up CPU usage on my tab, and eventually getting me banned. If your site is suffering the same fate, you may glean some hints and tips for reducing your CPU usage.