Both keywords-monitoring-your-success.com and free-video-tool.com are Semalt tools for content scraping. This botnet is pretty extensive and tiring to kill.
The raw access log entries look legitimate, but because they are referred from the two Semalt tools, they cannot be real users.
These host names and IP addresses, masquerading as valid browsers, took up a lot of my bandwidth. This botnet mainly used companies from Brazil such as TELEFÔNICA BRASIL, Vivo, Global Village, Brasil Telecom, Yawl and portalmail, but it also used a number of Italian and US companies.
Virtua.com.br continues to content scrape for Semalt. I have a separate research report on them.
This one is difficult. They are elusive. They use partial IP ranges that start randomly, like a disk that needs defragmenting. This masks their use of larger IP ranges. The names James Prado and Private Layer are always involved. What they do is bury the hosted-ny.securefastserver host names in small IP segments. The IP ranges just before and after those segments are owned by the same company, only registered under the Private Layer or James Prado name. Tricky. Just ban the complete range, as it is the same company.
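Banning the complete range means joining the small neighbouring segments into one block. Here is a minimal sketch of that idea using Python's ipaddress module. The segments below are placeholder documentation addresses, not the real Private Layer ranges, and merge_segments is a name I made up for illustration.

```python
import ipaddress

def merge_segments(segments):
    """Collapse adjacent or overlapping CIDR segments into the fewest blocks."""
    nets = [ipaddress.ip_network(s) for s in segments]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Placeholder segments (documentation range), standing in for the small
# pieces that turn out to belong to the same company.
segments = ["198.51.100.0/26", "198.51.100.64/26", "198.51.100.128/25"]
merged = merge_segments(segments)
print(merged)
```

If the segments really are contiguous, the result is a single larger block that can go into the ban list as one entry.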
DNS Record:
Fast Serv Inc. d.b.a. QHoster.com
1 Mapp Str.
Belize City, Belize
This content scraper pinspb.ru is a regular on my site and I’d like to ban it. Very mysterious and hard to pin down. Not much on the DNS record. At least they have a web site. They look like an ISP. They have a lot of IP blocks.
My content scraper host name was 98-68.furanet.com. It looks like their pattern is to put the last 2 octets of the IP address into the host name in reverse order, leaving out the first 2 octets. Looking at their IP range I would guess 93.93.64.0/21, which covers the 68 of 98-68.furanet.com. From my Google search I’ve added 91.192.108.0/22, which they also commonly use.
Ban these most commonly used IPs:
91.192.108.0/22
93.93.64.0/21
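To check whether a visitor falls inside these two ranges, something like the following sketch could be used. It relies only on Python's standard ipaddress module. The function name is_banned is my own, and the test address 93.93.68.98 is my guess at what 98-68.furanet.com stands for, not a confirmed lookup.

```python
import ipaddress

# The two ranges listed above.
BANNED = [
    ipaddress.ip_network("91.192.108.0/22"),
    ipaddress.ip_network("93.93.64.0/21"),
]

def is_banned(ip):
    """Return True if the IPv4 address lies inside any banned range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED)

print(is_banned("93.93.68.98"))   # guessed address for 98-68.furanet.com
print(is_banned("93.93.72.1"))    # just past the end of the /21
```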
My site has been getting content and image scraped by bb-81-107.018.net.il and bb-153-46.018.net.il, but these two host names do not resolve. Furthermore there is very little on the internet on them. My next step is to ban their complete IP range.
Pattern:
If there are 4 octets in the host name, then reverse the octets. If there are only 2 octets then these are the last 2 of the IP. You will need to use the host command and try the first 2 octets of their common ranges.
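The pattern above can be sketched in Python. This is only a rough helper under my own assumptions: the octets sit in the first label of the host name, and for 2-octet names both orderings are offered as candidates, since I cannot tell from the name alone which one is right. The function names and the prefixes argument are hypothetical; the prefixes are the first 2 octets you want to try.

```python
import re

def octets_from_hostname(hostname):
    """Pull the decimal numbers out of the first label of a host name."""
    first_label = hostname.split(".")[0]
    return [int(n) for n in re.findall(r"\d+", first_label)]

def candidate_ips(hostname, prefixes=()):
    """Return candidate IPs to verify with the host command."""
    octets = octets_from_hostname(hostname)
    if any(o > 255 for o in octets):
        return []
    if len(octets) == 4:
        # 4 octets in the name: reverse them.
        return [".".join(str(o) for o in reversed(octets))]
    if len(octets) == 2:
        a, b = octets
        # 2 octets: try each prefix with both orderings.
        return [ip for p in prefixes for ip in (f"{p}.{a}.{b}", f"{p}.{b}.{a}")]
    return []

print(candidate_ips("98-68.furanet.com", prefixes=["93.93"]))
```

Each candidate still has to be checked with the host command, since this only produces guesses.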
454a986e.cst.lightpath.net is a content scraper bot that has been visiting my site, so I would like to remove the welcome mat.
lightpath.net seems to change the front part of their host names often, as a search on Google did not yield an exact match, only many variants.
Pattern:
Take the hex digits before “.cst.lightpath.net” and convert each pair of digits from hex to decimal, giving you the 4 octets of the IP address.
lightpath.net resolves to 216.2.192.141, Optimum Online or Cablevision Systems, XO Communications (ISP), but they have no website. cablevisionlightpath.org also resolves to the same IP address.
The hex in 454a986e.cst.lightpath.net converts to 69.74.152.110, Cablevision Systems.
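The hex conversion is easy to script. A minimal sketch, assuming the first label of the host name is exactly the 8 hex digits of the IPv4 address (hex_label_to_ip is a name I made up):

```python
import ipaddress

def hex_label_to_ip(hostname):
    """Convert a host name like 454a986e.cst.lightpath.net to a dotted IP.

    Raises ValueError if the first label is not a valid 32-bit hex number.
    """
    label = hostname.split(".")[0]
    return str(ipaddress.IPv4Address(int(label, 16)))

print(hex_label_to_ip("454a986e.cst.lightpath.net"))  # 69.74.152.110
```

Each pair of hex digits is one octet: 45 is 69, 4a is 74, 98 is 152 and 6e is 110.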
This would fool an automated anti-bot system, because the humans behind the bots are more intelligent than the software. They are innovative, in their evil genius way. Computer security is all about the arms race. The better the methods, the better the countermeasures, and then it repeats. No security measure is foolproof for very long.
IPVNow.com has a slew of host names that when you look them up, resolve successfully and all point to the same IP address, 103.224.182.241. This misdirection is what would fool the anti-bot software, because this IP is real and it points to a valid company, Trellian, which owns IPVNow.com. But banning this single IP does not stop the content scraping. Each host name has its own IP address that uses ISPs Ubiquity and Nobis. These are the IPs you need to ban.
This host name is constantly scraping my site, but when I look it up it does not resolve. Searches on Google reveal that they seem to change their IP address very often. Many other sites are getting spammed and content scraped by this host. I have no alternative but to ban the whole IP range of customer.worldstream.nl.
I read my raw access log and the first column provides me with an IP address or host name. This first column is usually enough to target the specific IP that is errant, and I ban the entire last octet, all 256 addresses.
These host names try hard to evade detection of their IP addresses, in order to scrape content from web sites and sometimes break into them. They have specifically scraped mine and so I hunted them down and banished them. Often the unix host command returns nothing, so research is required. This usually works.
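Here is a small sketch of that step: take the first column of a log line and widen it to the 256-address block. The sample log line is made up for illustration (only the IP comes from my notes), and block_for is a hypothetical name. When the first column is a host name instead of an IP, the function returns None, because that case needs a lookup first.

```python
import ipaddress

def block_for(log_line):
    """Return the /24 block (256 addresses) for the first column of a log line.

    Returns None when the first column is missing or is a host name.
    """
    parts = log_line.split()
    if not parts:
        return None
    try:
        addr = ipaddress.IPv4Address(parts[0])
    except ValueError:
        return None  # a host name, not an IP: look it up first
    # strict=False lets us pass a host address and get its enclosing network.
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))

# Made-up sample entry in the usual access log layout.
sample = '94.23.147.30 - - [01/Jan/2015:00:00:00 +0000] "GET / HTTP/1.1" 200 512'
print(block_for(sample))  # 94.23.147.0/24
```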
0x667.crypt.gy came back with a host lookup of 94.23.147.30, OVH. I cannot verify this IP address. Research is inconclusive. This guy uses a Microsoft server error code “1639 (0x667). Invalid command line argument” in his hostname.
server.crypt.gy 188.165.211.48