Fine-Tuning Access to Your Web Site

The web is said to be about free access, and I certainly agree. When China’s Great Firewall entered a more rigorous phase and Google decided to leave China, some said that free access to information on the internet was a basic human right; I disagreed. Still, here in Toronto, Canada, I do appreciate open internet access. There are limits, however, when certain people take advantage of your hospitality. People scrape your site for their own purposes, they try to break in and use your site to launch their own malicious doings, and they spam you so that your site’s comments increase their link and trackback stats. There are all kinds of schemes that cost the site owner bandwidth, and eventually money. The site owner is forced to buy a higher level of service from the ISP (or get kicked off a shared service), or move to another ISP. This is not a zero-sum issue: the site owner loses financially.

So what can a site owner do? There are a few things you can do, all at almost no expense other than your time and effort. You will need to be able to use some services from your ISP and edit some files on your account. You will also need to use Google search, which is pretty easy.

The first couple of tools are available on your account from your ISP: AWStats and the IP Ban Manager. AWStats shows you accumulated statistics on your site, including who is eating up your bandwidth and how much. Copy a heavy user’s IP address and put it into Google Search along with keywords such as “spam”. If the search comes up with postings that say the address belongs to bad people or is not a legitimate search engine, then go to your IP ban manager and ban it. The IP ban manager simply excludes that IP from accessing your site, through a file called .htaccess. You need not edit this file directly, but you could. This single step, over time, will markedly reduce your bandwidth usage.
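If you do decide to edit .htaccess by hand, the lines a ban manager writes usually look like the output of the sketch below. This is only an illustration: it assumes your host runs Apache and still accepts the older "Deny from" syntax (newer Apache versions prefer "Require" rules), the function name is made up, and the addresses are reserved documentation addresses, not real offenders.

```python
import ipaddress


def render_ban_lines(addresses):
    """Return .htaccess lines that block the given IP addresses.

    Assumes Apache's older Order/Allow/Deny access-control syntax.
    """
    # ip_address() raises ValueError on anything that is not a valid IP,
    # so a typo never ends up in the file.
    parsed = {ipaddress.ip_address(a) for a in addresses}
    ordered = sorted(parsed, key=lambda ip: (ip.version, int(ip)))
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ordered]
    return lines


if __name__ == "__main__":
    # Example addresses from the reserved documentation ranges.
    print("\n".join(render_ban_lines(["203.0.113.5", "198.51.100.7"])))
```

Keep a backup of .htaccess before changing it; a syntax error in that file can take the whole site offline.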

The second thing you can do is create a robots.txt file that tells bots to stay out of certain parts of your website. The less of your site the bots crawl, the less bandwidth you will use. For example, if you have a test web site that you never use, exclude it so that bots will not waste your time and bandwidth. The robots.txt file is pretty easy to create and maintain, but it must sit at the top level of your site (for example /robots.txt); bots do not look for it inside subdirectories, and each subdomain needs its own copy.
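Before uploading a robots.txt, you can check that it blocks what you think it blocks. The sketch below uses Python's standard urllib.robotparser module; the rules and the /test/ directory are made-up examples standing in for your own.

```python
from urllib.robotparser import RobotFileParser

# Example rules: every bot is asked to stay out of a hypothetical /test/ area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(robots_txt, path, user_agent="*"):
    """Return True if the rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/test/index.html", "/blog/post.html"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Remember that robots.txt is only a request. Polite bots honour it; the rest have to be dealt with through bans.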

Your ISP will also give you access to your Raw Access Log. This file is large and somewhat intimidating, but really useful. Download it and unzip it to a folder of your choice. You will need to rename it so that it has a .txt extension. Then open Excel and import it.
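If the Excel import splits the columns badly, one alternative is to convert the log to a CSV file first. The sketch below assumes the log is in Apache's "combined" format, which is common on shared hosts but not guaranteed; the column names are my own labels, and lines that do not fit the pattern are skipped.

```python
import csv
import io
import re

# Assumed layout: Apache "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

FIELDS = ["ip", "time", "method", "path", "status", "size"]


def log_to_csv(lines):
    """Convert access-log lines to CSV text, skipping lines that do not match."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            writer.writerow([match.group(name) for name in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 -0400] '
        '"GET /test/index.html HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
    ]
    print(log_to_csv(sample), end="")
```

Save the result with a .csv extension and Excel will open it with one field per column.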

Once you are browsing your raw access log in Excel, sort the file by the first column, the IP address. It is from this file that you can determine a lot about how people are accessing your site. Firstly, if anyone is accessing the files you have excluded in your robots.txt, then ban them. A polite bot should read the robots.txt file and comply. If they do not then they deserve to be banned.
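The robots.txt check can also be done with a short script instead of by eye. This is a rough sketch, again assuming the combined log format; it matches paths by simple prefix rather than applying the full robots.txt rules, and the function name and /test/ prefix are illustrative only.

```python
import re

# Assumed layout: Apache "combined" log format (only the leading fields are used).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)'
)


def robots_violators(lines, disallowed_prefixes):
    """Map each IP to the disallowed paths it requested."""
    prefixes = tuple(disallowed_prefixes)
    violators = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(prefixes):
            violators.setdefault(match.group("ip"), []).append(match.group("path"))
    return violators


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 -0400] "GET /test/a.html HTTP/1.1" 200 512',
        '198.51.100.7 - - [10/Oct/2023:13:56:01 -0400] "GET /blog/ HTTP/1.1" 200 2048',
    ]
    for ip, paths in robots_violators(sample, ["/test/"]).items():
        print(ip, paths)
```

A human visitor who follows a link into an excluded area will also show up here, so look at the user agent and the number of hits before banning.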

Browse the file in Excel. If you see large chunks of your site being accessed by one IP address, such as screen after screen of requests in a short period of time, search Google for this IP address to see whether it is a legitimate search engine or simply wasting your resources. All sorts of people and companies, both private and public, can visit your site. If you look them up and they are a known spammer, then ban them. Google Search will tell you. Reputation is everything on the internet, as in life.
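Spotting those bursts is easier with a count of requests per IP per minute. Here is a minimal sketch, assuming the combined log format; the threshold of 60 requests a minute is an arbitrary starting point of mine, not a recommendation from any host, so tune it to your own traffic.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format (only IP and timestamp are used).
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')


def busy_ips(lines, max_per_minute=60):
    """Return {ip: busiest-minute request count} for IPs above the threshold."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            # "10/Oct/2023:13:55:36 -0400"[:17] -> "10/Oct/2023:13:55"
            minute = match.group("time")[:17]
            hits[(match.group("ip"), minute)] += 1
    peaks = {}
    for (ip, _minute), count in hits.items():
        if count > max_per_minute:
            peaks[ip] = max(count, peaks.get(ip, 0))
    return peaks


if __name__ == "__main__":
    sample = [
        f'203.0.113.5 - - [10/Oct/2023:13:55:{s:02d} -0400] "GET /p{s} HTTP/1.1" 200 512'
        for s in range(5)
    ]
    print(busy_ips(sample, max_per_minute=3))
```

The script only produces a shortlist. Each address on it still deserves the Google lookup, since a legitimate search engine can be busy too.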

In one of the columns you will see GET or POST. GET is a request to access and read your content. POST sends data to your site, which is how login and comment forms are submitted. If you see a lot of POSTs to URLs such as your site’s login screen, you know they are up to no good, so ban them. If you see a wall of POSTs to your site’s comments, you know they are trying to spam you, so ban them.
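Form submissions normally arrive as POST requests, so counting POSTs per IP against your login and comment URLs gives a quick list of suspects. The sketch below assumes the combined log format, and the two paths are placeholders I made up; substitute the real login and comment URLs of your own site.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format (only the leading fields are used).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)'
)

# Hypothetical paths; replace with your site's login and comment URLs.
SENSITIVE_PATHS = ("/login", "/comments")


def post_counts(lines, sensitive_paths=SENSITIVE_PATHS):
    """Count POST requests per IP that target the sensitive paths."""
    prefixes = tuple(sensitive_paths)
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if (
            match
            and match.group("method") == "POST"
            and match.group("path").startswith(prefixes)
        ):
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 -0400] "POST /login HTTP/1.1" 200 512',
        '203.0.113.5 - - [10/Oct/2023:13:55:37 -0400] "POST /login HTTP/1.1" 200 512',
    ]
    for ip, count in post_counts(sample).most_common():
        print(ip, count)
```

A handful of POSTs is normal, since real visitors log in and comment as well. It is the IPs with dozens or hundreds that are worth banning.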

These steps are actually not difficult to do, but they can take time. The result is that your resources, and therefore your money, are not wasted on those who do neither you nor the greater Internet community any favours. Kill off all abusers of your site.

You need not do this every day, just once in a while. You will see results as your bandwidth goes down and your ISP’s happiness level goes up. Of course these people will always be looking for new ways of wasting your bandwidth, or breaking into your site, so keep up your guard and be vigilant.
