Tag: content scraper

vnpt.vn Content Scraper: Research, Ban

static.vnpt.vn does not resolve as a host name, but since they scraped my site I will track them down. They are pretty tricky. One of their tactics is to use the host name “localhost”, which looks odd in the access log. Tech staff cannot find the actual IP address.

As I work with these IP ranges it is clear that this content scraper is doing real harm to Viet Nam. His IPs are spread so widely that banning them would force me to ban pretty much the whole country. For an emerging country this would be very bad, all for the greed and selfishness of a single bot maker. I know that content thieves, like any thieves, have no morals, but at this stage of Viet Nam’s development this bot maker could easily damage the country.

dps.gov.co Content Scraper: Research, Ban

lyncdiscover.dps.gov.co has nothing to do with the Government of Colombia, and a good thing, because it is a content scraper bot.

dps.gov.co is the Departamento para la Prosperidad Social, part of the Colombian Government. I am unsure how a content scraper legally got hold of a host name under a Colombian Government domain.

As this is a Government site I have contacted their tech contact, but they do not look too sophisticated. At least I have done my part to try to stop this abuse of the dps.gov.co host name.

Research:
186.170.31.134 186.170.0.0 /15 COLOMBIA TEL

hdesknet.com.br Content Scraper: Research, Ban

pool.hdesknet.com.br is part of the fix-website-errors.com content scraper campaign by Semalt SEO. I wish they would just stop scraping my site. This botnet is huge, very annoying, and shows no sign of ending. It started with keywords-monitoring-success and free-video-tool.com, and then pulled in Virtua and megared.net.mx. The vast majority of these content scraper bots reside in Brazil and the rest of South America, but there are others from Italy and the US.

Thankfully, banning a single IP range covers this host.

Observed:
pool.hdesknet.com.br

Research:
177.67.176.0 177.67.176.0 – 177.67.183.255 177.67.176.0/21 HELP DESK Br
177.67.176.129
177.67.176.131
177.67.177.192
177.67.177.0
177.67.177.228
177.67.178.1
177.67.178.88
177.67.178.158
177.67.179.126
177.67.179.167
177.67.179.181
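
To double-check that one range really covers every hit, the start and end addresses can be turned into CIDR notation and each observed IP tested against it. This is a minimal sketch using Python's standard ipaddress module; the variable and function names are my own, and the addresses are the ones listed above.

```python
import ipaddress

# Start and end of the HELP DESK Br allocation from the research line above.
first = ipaddress.IPv4Address("177.67.176.0")
last = ipaddress.IPv4Address("177.67.183.255")

# summarize_address_range yields the smallest set of CIDR blocks for the range.
blocks = list(ipaddress.summarize_address_range(first, last))

# IPs observed in the access log (copied from the list above).
observed = [
    "177.67.176.129", "177.67.176.131", "177.67.177.192", "177.67.177.0",
    "177.67.177.228", "177.67.178.1", "177.67.178.88", "177.67.178.158",
    "177.67.179.126", "177.67.179.167", "177.67.179.181",
]

def covered(ip, networks):
    """Return True if the IP falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)

# Any IP left in this list would need a separate ban.
uncovered = [ip for ip in observed if not covered(ip, blocks)]

print("CIDR blocks:", [str(b) for b in blocks])
print("Not covered:", uncovered)
```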

fix-website-errors.com by Semalt: Research, Ban

fix-website-errors.com is a new content scraper campaign from Semalt. It follows the keywords-monitoring-your-success.com and free-video-tool.com campaign, which I have already banned. That botnet was huge, and it involved Virtua in Brazil as well. Damn them.

Anyway, they hit your site, you track them down, ban them, rinse and repeat.

bb.sky.com Content Scraper: Research, Ban

bb.sky.com is a regular content scraper on my site, so I have decided to track them down. I finally figured out that the first part of each host name is the IP address written in hex, so I can target ranges better.

Sky is a very large TV and internet provider in the UK. They have a huge range of IPs.

Site hits:
5ad4e517.bb.sky.com 90.212.229.12 90.212.0.0 – 90.213.255.255
027e2f4c.bb.sky.com 2.126.47.76 2.126.0.0 – 2.126.255.255
5ad00af4.bb.sky.com 90.208.10.244 90.208.0.0 – 90.209.255.255
b0fb523c.bb.sky.com 176.251.82.60 176.248.0.0 – 176.251.255.255
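
The hex trick can be sketched in a few lines of Python. This assumes the first label of the host name is the IPv4 address as 8 hex digits, which holds for hits such as 027e2f4c.bb.sky.com and 2.126.47.76; the function name decode_sky_host is my own invention.

```python
import ipaddress

def decode_sky_host(hostname):
    """Decode the 8-hex-digit first label of a bb.sky.com host name
    into an IPv4 address (assumes the label is the IP in hex)."""
    label = hostname.split(".", 1)[0]
    if len(label) != 8:
        raise ValueError(f"expected 8 hex digits, got {label!r}")
    # int(..., 16) raises ValueError if the label is not valid hex.
    return ipaddress.IPv4Address(int(label, 16))

for host in ("027e2f4c.bb.sky.com", "5ad00af4.bb.sky.com", "b0fb523c.bb.sky.com"):
    print(host, decode_sky_host(host))
```

Once the IP is known, a whois lookup on it gives the range to ban.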

megared.net.mx: Research, Ban

This is part of the keywords-monitoring-your-success.com and free-video-tool.com Semalt botnet that spread to other South American hosts, but they have changed the referrer name slightly to keywords-monitoring-success.com. This host is tricky because the host name only provides the last 2 octets of the IP address, leaving me to guess the first two.

Here is my clue: customer-qro-199-67.megared.net.mx

Other megared.net.mx hosts appear to follow the same pattern, with a variety of first 2 octets combined with the last 2 from the host name. While I only have this one IP as a content scraper, their reputation is one of an email spammer. I guess they moved into a newer but related business model.
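
The guessing step can be sketched in Python: pull the last 2 octets out of the host name, then pair them with each candidate pair of first octets. The regular expression assumes every host name looks like the one clue above, and the prefixes in the usage lines are made-up placeholders from private address space, not real megared.net.mx allocations. Both function names are hypothetical.

```python
import re
import ipaddress

# Assumed pattern, based only on the single clue: customer-<area>-<octet3>-<octet4>
HOST_RE = re.compile(r"^customer-[a-z]+-(\d{1,3})-(\d{1,3})\.megared\.net\.mx$")

def last_two_octets(hostname):
    """Return the last two octets encoded in the host name, or None."""
    match = HOST_RE.match(hostname.lower())
    if not match:
        return None
    third, fourth = int(match.group(1)), int(match.group(2))
    if third > 255 or fourth > 255:
        return None
    return third, fourth

def candidate_ips(hostname, prefixes):
    """Combine each guessed 'a.b' prefix with the octets from the host name."""
    octets = last_two_octets(hostname)
    if octets is None:
        return []
    return [ipaddress.IPv4Address(f"{p}.{octets[0]}.{octets[1]}") for p in prefixes]

# Placeholder prefixes for illustration only; real ones would come from whois research.
for ip in candidate_ips("customer-qro-199-67.megared.net.mx", ["10.1", "10.2"]):
    print(ip)
```

Each candidate can then be checked with a reverse DNS lookup to see whether it maps back to the host name from the access log.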

hosted-by.snel.com Content Scraper: Research, Ban

This bot comes around and scrapes content pretty much every week. It is not rampant but still annoying. I banned it.

Observations:
5.104.224.7 hosted-by.snel.com 2016-oct-12

These are the most common to ban:
78.41.202.116 78.41.200.0 – 78.41.207.255 78.41.200.0/21
128.204.207.19 128.204.207.0/24
77.95.224.121 77.95.224.0 – 77.95.231.255 77.95.224.0/21
77.95.225.0/24
77.95.229.0/24
37.148.160.27 37.148.160.0 – 37.148.167.255 37.148.160.0/21
193.33.61.64 193.33.60.0 – 193.33.61.255 193.33.60.0/23
128.204.203.103 128.204.192.0 – 128.204.207.255 128.204.192.0/20
128.204.207.19 128.204.207.0/24
89.207.130.11 89.207.128.0 – 89.207.135.255 89.207.128.0/21

These are less common:
5.104.224.0/24 5.104.224.0/21
176.124.255.0/24
185.62.56.0/22
195.20.204.0/23
193.34.166.0 – 193.34.167.255 193.34.166.0/23
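
Several of the ranges above overlap or repeat (for example 128.204.207.0/24 sits inside 128.204.192.0/20), so the ban list can be trimmed before it goes into a server config. A minimal sketch with Python's standard ipaddress module, using the CIDR blocks listed above; the variable names are my own.

```python
import ipaddress

# CIDR blocks copied from the two lists above, duplicates and all.
bans = [
    "78.41.200.0/21", "128.204.207.0/24", "77.95.224.0/21", "77.95.225.0/24",
    "77.95.229.0/24", "37.148.160.0/21", "193.33.60.0/23", "128.204.192.0/20",
    "128.204.207.0/24", "89.207.128.0/21",
    "5.104.224.0/24", "5.104.224.0/21", "176.124.255.0/24",
    "185.62.56.0/22", "195.20.204.0/23", "193.34.166.0/23",
]

networks = [ipaddress.ip_network(cidr) for cidr in bans]

# collapse_addresses drops duplicates and any block already covered by a larger one.
collapsed = list(ipaddress.collapse_addresses(networks))

for net in collapsed:
    print(net)
```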

virtua.com.br Content Scraper: Research, Ban

Persistent this botnet is. It’s like a virus that mutates but does not go away, or an itch that does not stop no matter how much you scratch. virtua.com.br has a content scraping bot going at my site that I need to stop. It is part of a large Semalt-led botnet I am trying to remove. They have no website. The host addresses I receive in my access log do not resolve, and there’s nothing specific on Google. I’m just giving this a simple domain ban to see how it goes. They also have a huge number of IP blocks, as they are connected to Akamai in the US.

keywords-monitoring-your-success.com and free-video-tool.com: Semalt Botnet

Both keywords-monitoring-your-success.com and free-video-tool.com are Semalt tools for content scraping. This botnet is pretty extensive and tiring to kill.

The raw access log entries look legit at first glance, but since they were referred from the two Semalt tools, they could not be real users.

These host names and IP addresses, masquerading as valid browsers, took up a lot of my bandwidth. This botnet mainly used companies from Brazil such as TELEFÔNICA BRASIL, Vivo, Global Village, Brasil Telecom, Yawl and portalmail, but also a bunch of Italian and US companies.
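
Picking these hits out of the log by referrer can be sketched in Python. This assumes the access log is in the common Combined Log Format, which may not match every server setup. The two sample lines are made up for illustration with documentation-only IPs, and the name semalt_hosts is my own.

```python
import re

# Referrer domains named in these posts.
SEMALT_REFERRERS = (
    "keywords-monitoring-your-success.com",
    "keywords-monitoring-success.com",
    "free-video-tool.com",
    "fix-website-errors.com",
)

# Combined Log Format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

def semalt_hosts(lines):
    """Return the set of client hosts whose referrer is a Semalt tool."""
    hosts = set()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        referrer = match.group("referrer").lower()
        if any(domain in referrer for domain in SEMALT_REFERRERS):
            hosts.add(match.group("host"))
    return hosts

# Made-up sample lines, not real log entries.
sample_lines = [
    '192.0.2.10 - - [12/Oct/2016:06:25:24 +0000] "GET / HTTP/1.1" 200 5120 '
    '"http://free-video-tool.com/" "Mozilla/5.0"',
    '192.0.2.20 - - [12/Oct/2016:06:26:01 +0000] "GET /about HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0"',
]

print(semalt_hosts(sample_lines))
```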

Virtua.com.br continues to content scrape for Semalt. I have a separate research report on them.

hosted-ny.securefastserver.com Content Scraper: Research and Ban

This one is difficult. They are elusive. They use partial IP ranges that start at seemingly random addresses, like a disk that needs defragmenting. This masks their use of larger IP ranges. The names James Prado and Private Layer are always involved. What they do is bury the hosted-ny.securefastserver.com hosts in small IP segments, while the IP ranges before and after are owned by the same company but registered under the Private Layer or James Prado name. Tricky. Just ban the complete range, as it is the same company.

DNS Record:
Fast Serv Inc. d.b.a. QHoster.com
1 Mapp Str.
Belize City, Belize