Tag: scraper

City of Toronto Internet Scraper Bot

City of Toronto internet scraper bot scrapes my site a couple of times per month. Why? Toronto, Canada

I live in the City of Toronto, and write about Toronto-related subjects. What is surprising is that the City of Toronto has an internet bot that randomly scrapes content from my site a couple of times each month. The bot started scraping me near the end of January 2017.

What was interesting is that I, a concerned citizen, actually emailed them because I thought they had a zombie PC taken over by a bot, or some other security issue. I sent the City the log entries related to their IP address. Was I naive! Here is their reply (isg@toronto.ca):

This is a preview of City of Toronto Internet Scraper Bot. Read the full post (409 words, 1 image, estimated 1:38 mins reading time)
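Pulling out just the entries for one address, as in the log sent to the City, takes only a short filter. This is a minimal sketch that assumes an Apache/Nginx common or combined log format, where the client IP is the first field. The function name and the sample lines (documentation-range addresses) are hypothetical.

```python
from typing import Iterable, List


def lines_for_ip(log_lines: Iterable[str], ip: str) -> List[str]:
    """Return the log lines whose first field (the client IP) equals `ip`.

    Assumes common/combined log format; other formats need another parser.
    """
    matches = []
    for line in log_lines:
        fields = line.split(maxsplit=1)
        # Compare the whole field so 203.0.113.5 does not also match 203.0.113.50.
        if fields and fields[0] == ip:
            matches.append(line.rstrip("\n"))
    return matches


# Hypothetical sample lines using documentation-range addresses.
sample = [
    '203.0.113.5 - - [10/Oct/2000:13:55:36 -0700] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"\n',
    '203.0.113.50 - - [10/Oct/2000:13:55:38 -0700] "GET / HTTP/1.1" 200 9800 "-" "Mozilla/5.0"\n',
]
for entry in lines_for_ip(sample, "203.0.113.5"):
    print(entry)
```

On a server, the same function can be given an open file, e.g. `with open("access.log") as f: lines_for_ip(f, ip)`, where the log path is an assumption.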

Attack from spbot OpenLinkProfiler.org

Today I received a massive 1,000-line scraper attack from spbot, from OpenLinkProfiler.org. The IP address is 138.197.47.148, a Digital Ocean IP, which I have banned. I’ve also added spbot to my robots.txt, and sent a complaint letter to Digital Ocean at abuse@digitalocean.com:

Hi there,
Today I received a 1,000-line scrape from one of your IP addresses:
138.197.47.148

The UA is Mozilla/5.0 (compatible; spbot/5.0.3; +http://OpenLinkProfiler.org/bot )

Please have them cease their scraping activity as it unnecessarily uses up my bandwidth and CPU time.

I have included today’s log entry with their activity:

Thanks, Don
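The robots.txt change above can be checked before relying on it. Python’s standard `urllib.robotparser` parses a robots.txt body and answers whether a given crawler may fetch a URL. The rules below are a hypothetical file that blocks only spbot, and example.com stands in for the real site. robots.txt is only honoured by crawlers that choose to obey it, so it complements the IP ban rather than replacing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block spbot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: spbot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("spbot", "https://example.com/some-post/"))      # False
print(parser.can_fetch("Googlebot", "https://example.com/some-post/"))  # True
```

Python’s parser matches on the part of the user-agent string before the first `/`, so query it with the product token `spbot`. The full header `Mozilla/5.0 (compatible; spbot/5.0.3; ...)` reduces to `Mozilla` and falls through to the `*` rules.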

This is a preview of Attack from spbot OpenLinkProfiler.org. Read the full post (125 words, 0 images, estimated 30 secs reading time)

DomainCrawler Attack using 5 IP addresses

DomainCrawler hit my server with a 500-transaction attack today, using 5 IP addresses, all from Sweden. They scraped me hard! Their user agent is “DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/dontai.com)”. I have banned all these IP addresses with their last octet. Good riddance.

80.248.225.142 Internetbolaget Se domaincrawler
80.248.227.107 Internetbolaget Se domaincrawler
176.74.192.36 Tralex Se domaincrawler
193.183.102.178 Internetbolaget Se domaincrawler
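When a crawl is spread over several addresses, counting requests per IP can understate it. Grouping by user agent shows the whole campaign at once. This is a minimal sketch assuming the Apache/Nginx combined log format. The regular expression, function name and sample lines (documentation-range IPs, with the DomainCrawler user agent quoted above) are illustrative and not taken from the real log.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Combined log format: ip ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def requests_by_agent(log_lines: Iterable[str]) -> Dict[str, Tuple[int, Set[str]]]:
    """Map each user agent to (request count, set of client IPs that used it)."""
    counts: Dict[str, int] = defaultdict(int)
    ips: Dict[str, Set[str]] = defaultdict(set)
    for line in log_lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in any other format
        counts[m["ua"]] += 1
        ips[m["ua"]].add(m["ip"])
    return {ua: (counts[ua], ips[ua]) for ua in counts}


DC_UA = ("DomainCrawler/3.0 (info@domaincrawler.com; "
         "http://www.domaincrawler.com/dontai.com)")
T = "[10/Oct/2000:13:55:36 -0700]"
sample = [
    f'192.0.2.10 - - {T} "GET /a/ HTTP/1.1" 200 4312 "-" "{DC_UA}"',
    f'192.0.2.11 - - {T} "GET /b/ HTTP/1.1" 200 3900 "-" "{DC_UA}"',
    f'192.0.2.10 - - {T} "GET /c/ HTTP/1.1" 200 5100 "-" "{DC_UA}"',
    f'198.51.100.7 - - {T} "GET / HTTP/1.1" 200 9800 "-" "Mozilla/5.0"',
]
stats = requests_by_agent(sample)
# Busiest user agents first, with how many distinct IPs each one used.
for ua, (count, addrs) in sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{count:5d}  {len(addrs)} IPs  {ua}")
```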

Permanent link to this post (58 words, 0 images, estimated 14 secs reading time)

Bad Bot: Cheeky Scraper campaign, then login attempts

These five came onto my site with an innocent-looking but fake user agent, “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, scraped some documents, and then tried to break through my site’s security. Cheeky bastards.
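A user-agent string is just text, so anyone can claim to be Googlebot. Google documents a way to check the claim: do a reverse DNS lookup on the IP, confirm the host name is in googlebot.com or google.com (Google’s documentation lists the currently accepted domains), then do a forward lookup on that host name and confirm it resolves back to the same IP. The sketch below implements that check with Python’s `socket` module. The function name is hypothetical, the resolvers are parameters so they can be swapped out for testing, and `gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, List, Tuple

# Both socket.gethostbyaddr and socket.gethostbyname_ex return (name, aliases, addresses).
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]

# Leading dots stop look-alikes such as "evilgooglebot.com" from passing.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip: str,
                      reverse: Lookup = socket.gethostbyaddr,
                      forward: Lookup = socket.gethostbyname_ex) -> bool:
    """Reverse-then-forward DNS check for a request claiming to be Googlebot."""
    try:
        hostname = reverse(ip)[0]  # step 1: PTR lookup on the client IP
    except (socket.herror, socket.gaierror):
        return False  # no reverse DNS at all
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False  # step 2: name is not in a Google crawler domain
    try:
        addresses = forward(hostname)[2]  # step 3: A lookup on that name
    except (socket.herror, socket.gaierror):
        return False
    return ip in addresses  # step 4: must map back to the original IP
```

Called with only an IP, it uses the system resolver, so the answer depends on live DNS.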

That was seven attempts at document scraping, followed by nine attempted logins. The interesting thing is that when a campaign like this is run by a computer, and not cleverly, it really does look computer-generated and is therefore easy to identify. What human user would behave like this? Of course, they have all been banned.
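This pattern, with requests claiming to be Googlebot followed by repeated login attempts from the same address, can be flagged mechanically, since a genuine search crawler has no reason to submit a login form. This is a minimal sketch assuming the combined log format and a WordPress-style login URL (`/wp-login.php`). The login path, the threshold of three attempts and the function name are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Combined log format: ip ident user [time] "METHOD path proto" status size "referer" "ua"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
LOGIN_PATH = "/wp-login.php"  # assumption: a WordPress-style login URL


def suspicious_ips(log_lines: Iterable[str], min_logins: int = 3) -> Set[str]:
    """IPs that claimed to be Googlebot and also POSTed to the login page."""
    claims_googlebot: Set[str] = set()
    login_posts: Counter = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue  # ignore lines in other formats
        if "Googlebot" in m["ua"]:
            claims_googlebot.add(m["ip"])
        if m["method"] == "POST" and m["path"].startswith(LOGIN_PATH):
            login_posts[m["ip"]] += 1
    return {ip for ip in claims_googlebot if login_posts[ip] >= min_logins}


# Hypothetical sample: one fake Googlebot that logs in, one that only reads,
# and one ordinary visitor who mistypes a password three times.
GB_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
T = "[10/Oct/2000:13:55:36 -0700]"
sample = (
    [f'203.0.113.9 - - {T} "GET /post-{i}/ HTTP/1.1" 200 4000 "-" "{GB_UA}"' for i in range(2)]
    + [f'203.0.113.9 - - {T} "POST /wp-login.php HTTP/1.1" 200 1500 "-" "{GB_UA}"'] * 3
    + [f'192.0.2.20 - - {T} "GET / HTTP/1.1" 200 9800 "-" "{GB_UA}"']
    + [f'198.51.100.30 - - {T} "POST /wp-login.php HTTP/1.1" 302 0 "-" "Mozilla/5.0"'] * 3
)
print(suspicious_ips(sample))  # {'203.0.113.9'}
```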

This is a preview of Bad Bot: Cheeky Scraper campaign, then login attempts. Read the full post (1522 words, 1 image, estimated 6:05 mins reading time)

Strange Host Names that I Cracked

These hosts try hard to avoid being identified from their IP addresses, in order to scrape content from web sites and sometimes break into them. They specifically scraped mine, so I hunted them down and banished them. Often the Unix host command returns nothing for their IP addresses, so research is required, and it usually works.
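When the Unix `host` command comes back empty for an address, it usually means there is no PTR (reverse DNS) record for it. The sketch below shows the name that such a reverse lookup actually queries, using Python’s standard `ipaddress` module, and a lookup helper that returns `None` in the no-record case instead of raising. Both function names are hypothetical. With no PTR record, a common next step is a WHOIS or RDAP lookup of the address block, which names the network that owns it.

```python
import ipaddress
import socket
from typing import Callable, List, Optional, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def ptr_name(ip: str) -> str:
    """DNS name a reverse lookup queries: octets reversed under in-addr.arpa (IPv4)."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Host name from the PTR record, or None when there is none."""
    try:
        return resolver(ip)[0]
    except (socket.herror, socket.gaierror):
        return None  # no PTR record: the case where `host` finds no name


print(ptr_name("138.197.47.148"))  # 148.47.197.138.in-addr.arpa
```

Calling `reverse_lookup("138.197.47.148")` goes to the system resolver, so its result depends on live DNS at the time.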

This is a preview of Strange Host Names that I Cracked. Read the full post (2264 words, 0 images, estimated 9:03 mins reading time)