Because a new file is created daily, I opted to put these files into a subdirectory. The headers, one per line, are logged to a headers-yyyymmdd.log file. The format is effectively free form, since different requesters send different sets of headers.
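The daily header log described above could be sketched roughly as follows. This is a Python illustration, not the code running on this site; the function name `log_headers`, the `header-logs` directory name, and the blank line between requests are all assumptions made for the example.

```python
from datetime import date
from pathlib import Path


def log_headers(headers, log_dir="header-logs", today=None):
    """Append one request's headers, one per line, to that day's log file.

    `log_dir` and the blank-line separator are illustrative choices only.
    """
    today = today or date.today()
    directory = Path(log_dir)
    # Keep the daily files together in their own subdirectory.
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"headers-{today:%Y%m%d}.log"
    with path.open("a", encoding="utf-8") as log:
        # Free form: write whatever headers this requester sent.
        for name, value in headers.items():
            log.write(f"{name}: {value}\n")
        log.write("\n")  # blank line separates one request from the next
    return path
```

Because each requester sends its own set of headers, the sketch writes whatever it is given rather than a fixed set of columns.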
I received this message on my site, which on the surface looked like it came from a human. Though it had grammar errors, there was enough there to pass. After further analysis, I believe it came from a bot.
hey hai this is ashok , i have lg optimusp768 with rooted, unlocked bootloader and also cwm , but i cant find custom roms any wheere please prepare one custom rom , or atleast one stock rom with more features
The comment was on topic. The English, which had grammar and spelling mistakes, was passable.
A dear friend uses Feedly to monitor my site. He complained that he was getting 403 Banned responses and asked why. I have found that Feedly usually takes only my RSS feed, but occasionally it scrapes me mercilessly. Once I see a bot start scraping, I ban it. I moved him over to the better-behaved Feedburner by Google.
Here are the Feedly user agents:
Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
The latter, FeedlyBot, runs off WZComm, and had previously scraped me, so I banned it. WZComm also runs the surdotly bot, which is also banned. The former, Feedly/1.0, runs off Level 3, and seems well behaved.
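A ban like this can be keyed on the user agent string rather than the IP. The sketch below is a Python illustration only, not how this site actually does it. The names `BANNED_UA_TOKENS`, `is_banned` and `status_for` are hypothetical, and the substring matching is an assumption.

```python
# Tokens taken from the bots named above as banned; matching is case-insensitive.
BANNED_UA_TOKENS = ("feedlybot", "surdotly")


def is_banned(user_agent):
    """Return True if the user agent contains any banned token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BANNED_UA_TOKENS)


def status_for(user_agent):
    """Return the HTTP status to send: 403 for banned bots, 200 otherwise."""
    return 403 if is_banned(user_agent) else 200
```

With these tokens, the well-behaved Feedly/1.0 fetcher still gets through, because its user agent string does not contain "FeedlyBot".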
22.214.171.124 strider.delmarvagroup.com, from the MCI Communications block, you really need to put some smarts into your bot. What are you thinking?
126.96.36.199 – 188.8.131.52 MCI Communications
I’m not sure why you are doing this, but please stop. I don’t have a contact form at that location.
Four IPs scraped my site in identical ways: fetch the most recent document, then scrape parts of the rest of the site. The IP changes, and they repeat. Each one fetches the same document, then scrapes different parts of my site, but only for images.
I’ll keep my eye on such activity and see if I can pin down something more definite.
UA: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727), which does not seem to be unique
We Canadians are always overshadowed by the US, with its ten times larger population. If at all possible I like to highlight our accomplishments or, in this case, sophisticated comment spamming from Canada. Bad, Canada.
It is always good to see international cooperation amongst the nations of this great world. However, when China, India and Russia cooperate to try to break into my site, forgive me if I get a little upset. While I usually file complaints with internet host providers, in this case the complaint would fall on deaf ears: hosts in China, India and Russia ignore abuse emails. Then again, most hosts all over the world ignore abuse emails.
Number of login attempts: 417
All the user agent names are the same: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0
Today I received a massive 1,000-line scraper attack from spbot, from OpenLinkProfiler.org. The IP address is 184.108.40.206, a Digital Ocean IP, which I have banned. I’ve also added spbot to my robots.txt. I sent a complaint letter to Digital Ocean at firstname.lastname@example.org:
Today I received a 1000 line scrape from one of your IP addresses:
The UA is Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )
Please have them cease their scraping activity as it unnecessarily uses up my bandwidth and CPU time.
I have included today’s log entry with their activity: