City of Toronto Internet Scraper Bot

City of Toronto internet scraper bot scrapes my site a couple of times per month. Why? Toronto, Canada

I live in the City of Toronto and write about Toronto-related subjects. Surprisingly, the City of Toronto has an internet bot that scrapes content from my site a couple of times each month, at irregular intervals. The bot started scraping my site near the end of January 2017.

What is interesting is that I, a concerned citizen, actually emailed the City because I thought they had a zombie PC taken over by a bot, or some other security issue. I sent the City a log of the entries related to their IP addresses. Was I ever naive. Here is their reply (from isg@toronto.ca):

According to our analysis of internet traffic, these requests are from corporate internet proxies that are trying to prefetch (similar to googlebot) the WordPress pages for fast retrieval.
If you wish to disable/disallow WordPress embeds you can use plugins or remove references in HTML head.

Regards,
Internet Security Support
City of Toronto

OK, so the City of Toronto wants to prefetch my content. For what purpose? I am a citizen and ratepayer of the City of Toronto, so what is up with the prefetch?

IP Ranges: 204.187.67.0/24, 206.130.160.0 – 206.130.174.255
IPs: 206.130.173.25, 206.130.173.28, 206.130.173.29, 206.130.174.27
Bot name: anonymous
Host Name: none
Net Name: TORNET-03
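
For anyone who wants to test an address from their own logs against the ranges listed above, here is a minimal Python sketch using the standard `ipaddress` module. The function name `is_city_address` is made up for illustration, and the ranges are simply copied from the listing above, so they may be incomplete or out of date.

```python
import ipaddress

# Ranges copied from the listing above (not verified against any registry).
CITY_NETWORKS = [ipaddress.ip_network("204.187.67.0/24")]

# 206.130.160.0 - 206.130.174.255 is not a single CIDR block, so let the
# standard library split it into the CIDR networks that cover it exactly.
CITY_NETWORKS += ipaddress.summarize_address_range(
    ipaddress.ip_address("206.130.160.0"),
    ipaddress.ip_address("206.130.174.255"),
)


def is_city_address(ip_text):
    """Return True if ip_text falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in net for net in CITY_NETWORKS)
```

Calling `is_city_address("206.130.173.25")` with one of the addresses from the list returns `True`, while an address just past the end of the range, such as `206.130.175.1`, returns `False`.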

User Agents

  • Mozilla/4.0 (compatible;)
  • Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
  • Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
  • Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
  • Mozilla/5.0 (Windows NT 6.1; rv:44.0) Gecko/20100101 Firefox/44.0
  • Mozilla/5.0 (Windows NT 6.1; rv:51.0) Gecko/20100101 Firefox/51.0
  • Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
  • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36
  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586
  • Mozilla/5.0 (Windows NT 6.1; rv:43.0) Gecko/20100101 Firefox/43.0

As far as bots go, this one is anonymous, written to fly below the radar and go unnoticed. The user agent strings change, are nondescript, and do not identify the City of Toronto. The bot scrapes everything: content, images, JavaScript, and plugins. It is your typical anonymous internet scraper bot.

I will continue to monitor my log and track their odd behaviour.
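
Monitoring of this kind can be scripted. The following is a rough sketch, not a description of any particular setup: it assumes the web server writes the common "combined" access log format (client IP as the first field, user agent as the last quoted field), and the name `count_city_hits` is hypothetical. It tallies requests per IP and user agent for the ranges listed earlier in this post.

```python
import ipaddress
import re
from collections import Counter

# Assumption: "combined" log format. The client IP is the first field and
# the user agent is the last double-quoted field on the line.
LINE_RE = re.compile(r'^(\S+) .* "([^"]*)"\s*$')

# Ranges copied from this post; the second one is split into CIDR blocks.
CITY_NETWORKS = [ipaddress.ip_network("204.187.67.0/24")] + list(
    ipaddress.summarize_address_range(
        ipaddress.ip_address("206.130.160.0"),
        ipaddress.ip_address("206.130.174.255"),
    )
)


def count_city_hits(lines):
    """Count requests per (ip, user_agent) pair coming from the listed ranges."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # line does not look like a combined-format entry
        ip_text, agent = match.groups()
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            continue  # first field was a host name or something else
        if any(ip in net for net in CITY_NETWORKS):
            hits[(ip_text, agent)] += 1
    return hits
```

Passing an open log file to it, for example `count_city_hits(open("access.log"))` with whatever path your server actually uses, gives a tally that makes changing user agents from the same address easy to spot.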

City of Toronto, why do you operate a scraper bot?

Addendum 2017-apr-21: They turned off their host name, so a lookup now returns “not found”. They also changed their net name to tornet-02. They are trying to evade detection.
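
A host name lookup like the one mentioned in the addendum can be reproduced with a reverse DNS query. This is only a sketch using Python's standard `socket` module; the helper name `reverse_lookup` and its `resolver` parameter are made up for illustration, and the result depends on whatever the address owner currently publishes in DNS.

```python
import socket


def reverse_lookup(ip_text, resolver=socket.gethostbyaddr):
    """Return the host name registered for ip_text, or None if there is none.

    The resolver argument exists only so the lookup can be swapped out when
    testing without network access; by default the system resolver is used.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip_text)
        return hostname
    except (socket.herror, socket.gaierror):
        # herror: no reverse record found; gaierror: address could not be resolved
        return None
```

A return value of `None` corresponds to the “not found” result described above. The net name comes from a WHOIS query rather than DNS, so it is not covered by this sketch.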
