Humans are slow and somewhat unpredictable, at least compared to a bot scraping this website. I actually like that. After all, my posts are meant to be read by humans, not bots. I welcome bots only if they provide a route for humans to my site, as search engines do. All other bots, whether their intent is known or unknown, will receive a 403 if I can help it.
Recently I have been observing a different WordPress spam technique that uses WP trackbacks. This technique has some interesting characteristics that are unlike other types of spam, so my usual clues to origin and my usual banning methods did not work. Fortunately, this technique also has some unique characteristics that can be used to ban it.
When one WP site links to another WP site, the two sites communicate using a mechanism called trackbacks. The first site sends a trackback request to the second site. The second site posts the trackback as a special comment, which invites readers to click through to the first site. Trackbacks are automated, which makes them convenient for both sites.
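The trackback request itself is simple. Under the TrackBack specification, the sending site makes an HTTP POST of form-encoded fields (url is required; title, excerpt and blog_name are optional) to the receiving post's trackback URL, and the receiver replies with a small XML document whose error element is 0 on success. Here is a minimal Python sketch of building such a ping and reading the reply. It is an illustration of the protocol, not WordPress's own code, and the URLs in the usage note are placeholders.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_trackback_ping(ping_url, url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping as an HTTP POST request."""
    fields = {"url": url}  # the only required field
    if title:
        fields["title"] = title
    if excerpt:
        fields["excerpt"] = excerpt
    if blog_name:
        fields["blog_name"] = blog_name
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def parse_trackback_response(xml_text):
    """Return (ok, message) from the receiving site's XML reply."""
    root = ET.fromstring(xml_text)
    ok = root.findtext("error", default="1").strip() == "0"
    return ok, root.findtext("message", default="")
```

For example, `build_trackback_ping("https://example.com/wp-trackback.php?p=1", "https://blog.example/post", title="Hi")` produces a request that `urllib.request.urlopen` could send. A spammer needs nothing more than this, which is why trackbacks are easy to abuse for content spamming.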
This is a preview of WordPress Trackback Spam Technique for Content Spamming. Read the full post (1127 words, 1 image, estimated 4:30 mins reading time)
188.8.131.52 24/Aug/2018:21:04:47 to 24/Aug/2018:21:21:05. You made 434 login attempts. I see you. I know when you visited and that you are trying to break into my site. You have been logged and sent packing with 403s. I have 2,425 of your header logs. Do not do this again.
184.108.40.206 – 220.127.116.11
org-name: Webmaster Agency Ltd
person: Dmitry V. Volkov
address: REALTY.RU LTD
address: 1, Kurchatov Sq.
address: 107005, Moscow
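The timestamps alone show that this was not a human. A short Python sketch of the arithmetic, assuming the timestamps use the day/month/year:time format shown in the log note:

```python
from datetime import datetime

# Timestamp layout used in the log note, e.g. 24/Aug/2018:21:04:47
APACHE_TIME = "%d/%b/%Y:%H:%M:%S"


def attempt_rate(first, last, attempts):
    """Return (window in seconds, attempts per second) for a burst of requests."""
    start = datetime.strptime(first, APACHE_TIME)
    end = datetime.strptime(last, APACHE_TIME)
    seconds = (end - start).total_seconds()
    return seconds, attempts / seconds


window, rate = attempt_rate("24/Aug/2018:21:04:47", "24/Aug/2018:21:21:05", 434)
print(f"{window:.0f} s, {rate:.2f} attempts per second")
```

The window is 978 seconds, so 434 attempts work out to roughly 0.44 attempts per second, or one login attempt about every 2.3 seconds for over 16 minutes.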
(89 words, 0 images, estimated 21 secs reading time)
When a requester, such as a person or a bot, requests a resource from your server, Apache records the request in its raw access log. The requester also sends some information about itself, called HTTP request headers. Apache does not log these by default, but with a little PHP added to the page's HTML, this extra information can be logged and examined to help determine whether the requester is a bot or a human.
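The author does this with PHP. As a language-neutral illustration only, here is the same idea in Python, assuming a WSGI application: the server passes request headers in the environ dictionary under HTTP_-prefixed keys, with Content-Type and Content-Length as the two exceptions. The function name is hypothetical.

```python
def request_headers(environ):
    """Collect the HTTP request headers from a WSGI environ dict."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> User-Agent, HTTP_ACCEPT_LANGUAGE -> Accept-Language
            headers[key[5:].replace("_", "-").title()] = value
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH") and value:
            # These two headers are stored without the HTTP_ prefix
            headers[key.replace("_", "-").title()] = value
    return headers
```

Server variables such as REQUEST_METHOD are skipped, so only what the requester actually sent is kept for comparison.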
Because a new file is created every day, I put these files into a subdirectory. The headers are logged, one per line, into a headers-yyyymmdd.log file. The contents are free form, since different requesters send different sets of headers.
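A minimal sketch of the daily file layout in Python. It assumes "one per line" means one request per line, and it writes each request's headers as a JSON object (JSON Lines). Both the JSON encoding and the function names are assumptions made here, chosen so that the free-form, differing header sets can still be parsed back later.

```python
import json
from datetime import date
from pathlib import Path


def header_log_path(log_dir, day):
    """Path of the daily log file, e.g. <log_dir>/headers-20180824.log."""
    return Path(log_dir) / f"headers-{day:%Y%m%d}.log"


def log_headers(log_dir, headers, day=None):
    """Append one request's headers as a single JSON line to that day's file."""
    day = day or date.today()
    path = header_log_path(log_dir, day)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the subdirectory if needed
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(headers, sort_keys=True) + "\n")
    return path
```

Because each line is self-contained, a bot that sends only a User-Agent and a browser that sends a dozen headers end up in the same file without any fixed column layout.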
I received a message on my site that, on the surface, looked like it came from a human. Though it had grammar errors, there was enough there to pass. On further analysis, I believe it came from a bot.
hey hai this is ashok , i have lg optimusp768 with rooted, unlocked bootloader and also cwm , but i cant find custom roms any wheere please prepare one custom rom , or atleast one stock rom with more features
The comment was on topic. The English, which had grammar and spelling mistakes, was passable.
Hacked By An0n 3xPloiTeR And 8B0K3N H34R7 Team Pak Cyber Ghosts [P.C.G], main message screen with running footer 1
This hack got the hosting account and the website suspended as a malware-infected account. The hack set up a malware attack, specifically targeting Windows, for anyone who visited the site. I am still trying to figure out how they got in. This is a Pakistani-based attack, or so their message says. I'll try to document as much as I can to help others in the same situation.
This is a preview of Hacked By An0n 3xPloiTeR, 8B0K3N H34R7, Team Pak Cyber Ghosts: Cyber Hack Forensic Examination. Read the full post (1149 words, 6 images, estimated 4:36 mins reading time)
A dear friend uses Feedly to monitor my site. He complained that he was getting 403 Banned responses and asked why. Well, I have found that Feedly usually takes only my RSS feed, but sometimes, not often, it scrapes me mercilessly. Once I see a bot start scraping, I ban it. I moved him over to the better-behaved Feedburner by Google.
Here are the Feedly user agents:
Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
The latter, FeedlyBot, runs off WZComm, and had previously scraped me, so I banned it. WZComm also runs the surdotly bot, which is also banned. The former, Feedly/1.0, runs off Level 3, and seems well behaved.
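The post does not say how these bans are enforced, and the mention of WZComm suggests they may be by network rather than by name. Purely as an illustration of the decision described, here is a hypothetical Python user-agent filter that refuses FeedlyBot and surdotly while letting Feedly/1.0 through. User-agent strings are trivially spoofed, so a rule like this is a first line of defence at best.

```python
# Hypothetical ban list modeled on the decisions above (not the author's actual rules)
BANNED_TOKENS = ("FeedlyBot", "surdotly")


def status_for(user_agent):
    """Return the HTTP status a request with this User-Agent would receive."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in BANNED_TOKENS):
        return 403  # banned: known scraper
    return 200  # everything else, including Feedly/1.0, is served normally
```

The case-insensitive substring match means "FeedlyBot" matches regardless of capitalization, while "Feedly/1.0" does not contain "feedlybot" and is served normally.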
This is a preview of Feedly: Somewhat Schizophrenic so Please Settle Down. Read the full post (136 words, 0 images, estimated 33 secs reading time)
China is a sovereign country, the same as any other independent nation, and the world must respect this. What is unique about China is its willingness to use any means to exert its influence far beyond Chinese jurisdiction. I see that here in Canada, and there are reports of the same tactics being used in Australia and New Zealand.
Funding education programs that have a pro-Chinese viewpoint
There is great concern here in Canada about their funding tactics. While it is great to encourage the study of Mandarin language, China is using this platform to teach a pro-Chinese viewpoint to very young kids. More than worrisome, this is meddling in the internal affairs of Canada. The Toronto District School Board had signed an agreement with this group, but the decision was reversed.
I call this a Russian referrer bot only because it predominantly uses Russian referrers for referrer spam. In fact, I have no evidence of its origin. The 46 unique requesting IPs come from around the world, seemingly at random. While it is easy to ban these IPs, there is no way to find the originator of this bot.
Referrer spam is unique in that the originating IP does not care about the data returned. All the request aims to do is carry its referrer info. That referrer is passed on to, and pollutes, your Google Analytics. Because the requesting IPs want nothing in return, they could be from anywhere and could well be faked.
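Since the requesting IPs are scattered, the referrer is the more useful signal. A minimal Python sketch, assuming Apache's standard combined log format and using a .ru referrer host as a stand-in for "predominantly Russian referrers", that groups requesting IPs by the referrer hosts they sent. The function name and the suffix rule are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrer_spam_ips(lines, suffix=".ru"):
    """Map each IP whose referrer host ends with `suffix` to the hosts it sent."""
    hits = {}
    for line in lines:
        m = COMBINED.match(line)
        if not m or m["referrer"] in ("", "-"):
            continue  # unparseable line or no referrer at all
        host = urlsplit(m["referrer"]).hostname or ""
        if host.endswith(suffix):
            hits.setdefault(m["ip"], set()).add(host)
    return hits
```

The number of keys in the result is the count of unique requesting IPs to ban. Because those IPs may be faked, the list describes only what reached the log, not where the bot really lives.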
Ad fraud software in action. Target website, randomized referrers, browser agents and proxy IP addresses. That is enough to spoof anti-bot software.
It is no secret that I battle and ban bad bots on my site. If a bot is not a well-known search engine and does not provide me some type of service, I usually ban it. Sure, it can visit my site, but it will receive a blank page. But why do they visit? Who is paying them? Welcome to the world of Online Ad Fraud.