A dear friend uses Feedly to monitor my site. He complained that he was getting 403 Banned responses and asked why. Well, I have found that Feedly usually takes only my RSS feed, but sometimes, not often, it scrapes me mercilessly. Once I see a bot start scraping, I ban it. I moved him over to the better-behaved Feedburner by Google.
Here are the Feedly user agents:
Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
The latter, FeedlyBot, runs off WZComm, and had previously scraped me, so I banned it. WZComm also runs the surdotly bot, which is also banned. The former, Feedly/1.0, runs off Level 3, and seems well behaved.
This is a preview of Feedly: Somewhat Schizophrenic so Please Settle Down. Read the full post (136 words, 0 images, estimated 33 secs reading time)
Playing, I am, with the Nikto web server scanning package. I scanned my own site, just for fun. While it does take some time, it did finish. I wondered how it would look from my site’s raw access log viewpoint. In summary, Nikto is not stealthy at all. It is also easily detected and banned mid-scan, as it takes a long time to complete.
Essentially, you start a Terminal and type “nikto -h “. There are lots of options, such as writing the output to a log. The Nikto output highlights web site vulnerabilities and cross-references them with a database of known hacks. Using this tool, you can find your site’s weaknesses and then harden it against hackers.
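To give a feel for why a scan like this stands out in a raw access log, here is a minimal Python sketch. It assumes the log is in the usual Common/Combined Log Format and that a scanner probing for known weak spots produces a burst of 404 responses from one IP. The function name `not_found_heavy_clients`, the threshold, and the sample addresses (documentation ranges) are all made up for illustration; this is not how my own banning works.

```python
import re
from collections import Counter

# Start of a Common/Combined Log Format line:
# client IP, two ignored fields, [timestamp], "request", status code.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" (\d{3})')


def not_found_heavy_clients(lines, threshold=50):
    """Return {ip: count} for clients with at least `threshold` 404 responses."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(3) == "404":
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    probe = ('192.0.2.1 - - [10/Oct/2017:13:55:36 -0400] '
             '"GET /cgi-bin/test.cgi HTTP/1.1" 404 209 "-" "ExampleScanner"')
    visit = ('198.51.100.7 - - [10/Oct/2017:13:55:40 -0400] '
             '"GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')
    print(not_found_heavy_clients([probe] * 5 + [visit], threshold=3))
```

Because the scan takes a long time, a check like this run every few minutes would likely flag the IP well before the scan finishes.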
220.127.116.11 strider.delmarvagroup.com, from the MCI Communications block, you really need to put some smarts into your bot. What are you thinking?
18.104.22.168 – 22.214.171.124 MCI Communications
I’m not sure why you are doing this, but please stop. I don’t have a contact form at that location.
This is a preview of strider.delmarvagroup.com 126.96.36.199 really wants to contact me. Read the full post (371 words, 0 images, estimated 1:29 mins reading time)
City of Toronto internet scraper bot scrapes my site a couple of times per month. Why? Toronto, Canada
I live in the City of Toronto, and write about Toronto-related subjects. What is surprising is that the City of Toronto has an internet bot that randomly scrapes content from my site a couple of times each month. The bot started scraping me near the end of January 2017.
What is interesting is that I, a concerned citizen, actually emailed them because I thought they had a Zombie PC taken over by a bot, or some other security issue. I sent the City a log of the relevant entries related to their IP address. Was I naive! Here is their reply (email@example.com):
Four IPs scraped my site in identical ways: fetch the most recent document, then scrape parts of the rest of the site. The IP changes, and they repeat. They all fetch the same document, then scrape different parts of my site, but only for images.
I’ll keep my eye on such activity and see if I can pin down something more definite.
UA: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727), which does not seem to be unique
We Canadians are always overshadowed by the US, with 10 times our population. If at all possible I like to highlight our accomplishments, or in this case, sophisticated comment spamming from Canada. Bad, Canada.
Comment spammers on my site usually use a single IP to first read the post, determine if they can submit spam, then submit the spam comment. This shows up in my Akismet spam comments. They are simple to identify and ban.
It is always good to see international cooperation amongst different nations in this great world. However, when China, India and Russia cooperate to try to break into my site, forgive me when I get a little upset. While I usually file complaints to internet host providers, in this case the complaint would fall on deaf ears: hosts in China, India and Russia ignore abuse emails. Then again, most hosts from all over the world ignore abuse emails.
Number of login attempts: 417
All the user agent names are the same: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0
Today I received a massive 1,000-line scraper attack from spbot, from OpenLinkProfiler.org. The IP address is 188.8.131.52, a Digital Ocean IP, which I have banned. I’ve also added spbot to my robots.txt. I sent a complaint letter to Digital Ocean at firstname.lastname@example.org:
Today I received a 1000 line scrape from one of your IP addresses:
The UA is Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )
Please have them cease their scraping activity as it unnecessarily uses up my bandwidth and CPU time.
I have included today’s log entry with their activity:
Domain Crawler hit my server with a 500-transaction attack today, using 5 IP addresses, all from Sweden. They scraped me hard! Their user agent is “DomainCrawler/3.0 (email@example.com; http://www.domaincrawler.com/dontai.com)”. I have banned all these IP addresses with their last octet. Good riddance.
184.108.40.206 Internetbolaget Se domaincrawler
220.127.116.11 Internetbolaget Se domaincrawler
18.104.22.168 Tralex Se domaincrawler
22.214.171.124 Internetbolaget Se domaincrawler
This lot of five came onto my site with an innocent but fake User Agent name of “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, scraped some documents, and then proceeded to try to break into my site’s security. Cheeky bastards.
Seven attempts at document scraping, followed by 9 attempted logins. The interesting thing is that when you use a computer to run these campaigns, if you are not clever the traffic really does look computer generated and is thus easy to identify. What real user would behave like this? Of course they have all been banned.
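A user agent string is only a claim, so a fake Googlebot can be checked with the reverse-then-forward DNS test that Google itself describes for verifying its crawler. Below is a minimal Python sketch of that check, not the method I actually used on these five. The function name `is_real_googlebot` and the injectable lookup parameters are my own inventions for illustration, and the default forward lookup only handles IPv4.

```python
import socket

# Hostname suffixes expected for genuine Googlebot reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google host that resolves back to `ip`.

    The lookup functions are parameters so the check can be exercised
    without network access; by default they use the socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # PTR record does not point at Google at all
        # Forward-confirm: the claimed hostname must map back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # No PTR record, or the forward lookup failed: treat as unverified.
        return False
```

An IP that sends the Googlebot user agent but fails this check is safe to treat as an impostor, whatever it calls itself.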
This is a preview of Bad Bot: Cheeky Scraper campaign, then login attempts. Read the full post (1522 words, 1 image, estimated 6:05 mins reading time)