Content Scraper: Research, Ban

You never know what you will find in your travels. was content scraping me, so I decided to target it. It is part of the large Semalt botnet that started with and free-video-tool.com and then continued with fix-website-errors, with a sprinkling of buttons-for-websites thrown in.

Its host name is unique in that it is numerically very long. I could see remnants of a decimal IP address, but there was something odd.

Their pattern is not as predictable as a computer would require, but that is precisely the point: they want to fool anti-bot software while still letting their admin staff figure it out. If staff make a couple of errors, it is no problem.

Friend DI pointed me to an IBM web page on Entity Resolution, specifically recognition. This is a machine recognition problem. I will never know if the Colombians purposely used this system, if they are just sloppy, or if the person creating the host names has an arts background!

My observation is a great example: 181500198200. Is the IP but what about the last 2 zeros? but what about the last 200? seems to be the best answer, but the third octet has a leading zero. This would throw off a machine. On only these first 2 octets did they add a leading zero to the 3-digit third octet. Odd for computers and computer people, and this is the point.

As I have 3 other examples of a third octet written as a triple-digit number with a leading zero, this must be a strategy.

Because this pattern is ambiguous, I can see it causing them many problems when managing their server farm.
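To make the ambiguity concrete, here is a minimal Python sketch that lists every way the start of a digit string can be read as four octets. The function name `candidate_ips` and the `strict` flag are my own inventions for illustration; this is not how the botnet operators or any anti-bot product necessarily work, just a way to count the possible readings.

```python
from itertools import product


def candidate_ips(digits, strict=False):
    """Yield (dotted_ip, leftover) for every way the start of `digits`
    can be read as four octets of 1 to 3 digits, each with value 0-255.

    With strict=True, octets written with a leading zero (such as "019")
    are rejected, which is what a cautious machine parser might do.
    """
    for lengths in product((1, 2, 3), repeat=4):
        if sum(lengths) > len(digits):
            continue  # not enough digits for this split
        octets, pos = [], 0
        for n in lengths:
            octets.append(digits[pos:pos + n])
            pos += n
        if any(int(o) > 255 for o in octets):
            continue  # not a valid octet value
        if strict and any(len(o) > 1 and o[0] == "0" for o in octets):
            continue  # leading zero not allowed in strict mode
        yield ".".join(octets), digits[pos:]


if __name__ == "__main__":
    host_digits = "181500198200"  # the digit string observed in the host name
    for ip, leftover in candidate_ips(host_digits):
        print(ip, "leftover digits:", leftover or "(none)")
```

Running it on the observed string prints several plausible addresses, each with different leftover digits, and the list changes again when `strict=True` throws out the leading-zero readings. A person who knows the address range can pick the right one at a glance; a parser has no such hint.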

Observed: seems to resolve to

Research: – 181.48/13 Telmex Colombia * * * – 181.56/13
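For the ban itself, a small sketch of how the two ranges above could be checked in Python with the standard `ipaddress` module. The list name `BLOCKED`, the helper `is_blocked`, and the sample addresses are assumptions made for illustration. The ranges are written out in full because `ipaddress` does not accept the shorthand `181.48/13`.

```python
import ipaddress

# The two ranges from the research notes, in full dotted form.
BLOCKED = [
    ipaddress.ip_network("181.48.0.0/13"),  # 181.48.0.0 - 181.55.255.255
    ipaddress.ip_network("181.56.0.0/13"),  # 181.56.0.0 - 181.63.255.255
]


def is_blocked(addr):
    """Return True if the dotted IPv4 string `addr` falls in a blocked range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED)


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to show both outcomes.
    for sample in ("181.50.0.1", "181.64.0.0"):
        print(sample, "blocked" if is_blocked(sample) else "allowed")
```

Since the two /13 blocks sit side by side, they can also be expressed as the single range 181.48.0.0/12 if the firewall or server config prefers one rule; `ipaddress.collapse_addresses(BLOCKED)` performs that merge.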
