Odd htaccess Observations with ISP Site5

Puzzling it is, at times, that my htaccess does not always behave as intended. As a computer scientist I expect my programs and file input to produce consistent, stable, reliable results, immediately. This is not the case with my htaccess file, hosted on Site5, my hosting provider.

Delays in htaccess Implementation

When I make certain changes to my htaccess, there can be delays of a day or two before they take effect. This is very odd to me, because the htaccess file is supposedly read on every server request. Perhaps there is some caching that I do not know about. Nevertheless, it seems like the htaccess has a personality of its own. I know I should not anthropomorphize a computer, much less a security file such as htaccess on an Apache server, but it is difficult not to.

Immediate Results:

  • Any IP bans take effect immediately, as expected.
  • RewriteCond %{HTTP_USER_AGENT} or RewriteCond %{HTTP_REFERER} bans, if they work at all, work right away.
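For reference, the two kinds of immediate bans look something like this in an htaccess file (a sketch only; the IP address and bot name below are placeholders, not entries from my actual file):

```apache
# IP ban (Apache 2.2 syntax) -- takes effect immediately.
# 203.0.113.42 is a placeholder address.
Order Allow,Deny
Allow from all
Deny from 203.0.113.42

# User-agent ban -- [NC] matches case-insensitively, [F] returns 403 Forbidden.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC]
RewriteRule .* - [F,L]
```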

Delay of 1-2 days:
Any RewriteCond statements not related to HTTP_USER_AGENT or HTTP_REFERER seem to take a day or two before they fully kick in.

HTTP_USER_AGENT and HTTP_REFERER bans often do not work
Should they not work all the time? I see the user agents and referrers in my logs, so why are such bans often ineffective? I have asked Site5 tech staff on many occasions, and they do not seem to know. They tell me that some requests are able to subvert my htaccess. This seems odd.
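One possible explanation, sketched below: RewriteCond patterns are regular expressions and match case-sensitively by default, so a ban can silently fail against a bot that varies its capitalization, or when a dot in a domain is left unescaped. The referrer domain here is a placeholder:

```apache
# Escape literal dots and add [NC] for a case-insensitive match,
# otherwise "SurveyBot" will not match "surveybot", etc.
# badreferrer.example is a placeholder domain.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?badreferrer\.example [NC]
RewriteRule .* - [F,L]
```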

Difference between Apache return code 500 and 403
In general, an HTTP_USER_AGENT or HTTP_REFERER ban results in a 500 (internal server error), and an IP ban results in a 403 (forbidden). I do notice exceptions.
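For what it is worth, my understanding is that a well-formed ban rule carrying the [F] flag should itself return a 403, while a 500 usually means Apache could not parse the htaccess at all, in which case no ban logic runs and every request to the directory errors out. A sketch, with SurveyBot as the example pattern:

```apache
# A matching rule with [F] returns 403 Forbidden:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC]
RewriteRule .* - [F,L]
# A 500 typically indicates a parse failure in the file itself
# (bad flag, unescaped regex, unsupported directive), not a ban firing.
```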

216.145.5.42 www.whois.sc

[29/Sep/2016:04:22:11 GET /robots.txt HTTP/1.1 403 635 – Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)
216.145.5.42
[29/Sep/2016:04:22:11 GET / HTTP/1.1 301 231 http://whois.domaintools.com/dontai.com Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)
216.145.5.42
[29/Sep/2016:04:22:11 GET /root/ HTTP/1.1 200 40923 http://whois.domaintools.com/dontai.com Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)

The three lines for Whois’ SurveyBot received three different Apache return codes: 403, 301 and 200. The dates and times were identical: 29/Sep/2016:04:22:11, and all three used an identical user agent (UA). While I used a UA ban for SurveyBot, the first request received a 403, Forbidden; I was expecting a 500. The next request resulted in a 301, redirect, and the third received a 200, OK.

Not only did I inadvertently ban Whois, the ban worked and then did not work, all within the same second. I am confused.

Requester IP shows up as “localhost” or “0” (zero)
I eventually did figure this out. My htaccess contained some error that the Apache server did not like, which changed column 1 of my log from all IP addresses to a mix of IPs and hostnames. When I fixed the file back to what the Apache server accepted, the “localhost” and “0” entries were gone, replaced by IP addresses. It turns out that some requesters deliberately present themselves as “localhost” or as an unprintable character, which shows up in the logs as “0”. I have since found these people out and banned their IPs. They still visit me. Site5 tech staff could not help me track these spammers down.

Permissions for robots.txt
I’m still undecided whether it is OK to give everyone read permission for robots.txt. Only the white hat bots obey it, while the black hat bots use it to find out where they should not go, and then go anyway. I tried removing world-read permission, but that stopped the white hat bots from reading it as well, so I went back to read permission for all. It turns out that requesters banned by the htaccess cannot read robots.txt anyway. This does make sense, as it stops banned bots from reading robots.txt and then scraping the folders I wish to keep uncrawled.
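For completeness, if one did want otherwise-banned bots to still be able to fetch robots.txt, the exemption could be written as an extra condition ahead of the ban (a sketch; SurveyBot is just an example pattern):

```apache
# Skip the ban for robots.txt itself; apply it to everything else.
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC]
RewriteRule .* - [F,L]
```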

RewriteRule to limit images to specific sites
Four sites, three on blogspot plus KWPublisher.com, were giving me a lot of hotlink and referrer spam. I tried banning them by referrer, without result. I then tried banning their IPs, with mixed but mostly bad results. I put in a rule to stop image hotlinking, and two days later they were served the meta information but not my images. This greatly cuts down on the theft of my bandwidth.
ilpuntoantico.blogspot.*: using my images, hotlinking
KWPublisher.com: referrer spam
kosmetik-freaks.blogspot.*: referrer spam
tanyadokterkeluarga.blogspot.co.id: referrer spam
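A hotlink-blocking rule for these sites might look roughly like the following (a sketch; dontai.com is my own domain, as seen in the log excerpt above, and the patterns are illustrative):

```apache
# Block image requests whose referrer is one of the offending sites.
RewriteEngine On
RewriteCond %{HTTP_REFERER} (ilpuntoantico\.blogspot\.|kwpublisher\.com|kosmetik-freaks\.blogspot\.|tanyadokterkeluarga\.blogspot\.) [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC,L]
```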

Blogspot is run by Google, which calls the service “Blogger” and serves its content from GoogleUserContent.com, so I tried contacting them. markmonitor.com is their domain name registrar, which does not control content. Google sent me their complaint process, but said that hotlinking is not illegal. That is just wrong. Google, therefore, would do nothing for me.

I continue to observe, my friend htaccess.
