Googlebot Errors Caused by an Errant htaccess Rule

This afternoon was a difficult one, but I am learning. After 7 days of Googlebot reporting 500 crawl errors, I traced the cause to a single bad regex line in my htaccess. Hopefully the errors will now go away.

To shorten my htaccess, I tried to compress several HTTP_USER_AGENT mod_rewrite conditions into a single line, from:

RewriteCond %{HTTP_USER_AGENT} ^.*Go\ 1\.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go\!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go\-Ahead\-Got\-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go\-http [NC,OR]

to:
RewriteCond %{HTTP_USER_AGENT} ^.*Go[\ 1\.1|\!Zilla|\-Ahead\-Got\-It|\-http] [NC,OR]

Little did I know that “Google” or even “Goo” would match this regex, which means Googlebot's user agent matched it too. I used a regex tester and, at first, could not understand the results. “Go” on its own does not match, but all of these do:

Goa
God
Goe
Goh
Goi
Gol
Goo
Gop
Got
Go-
Go!
“Go ”

It seems that “Go” followed by any single character listed inside the [ ] will match. The more I played with the regex, the more confused I got. You think something works one way, and then you are proven wrong. I changed my htaccess rule back to the long form, and the Google fetch tool confirmed that the page now fetches without error.

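OK, so square brackets define a character set, which matches any single character in the set. Inside a set, | is just another literal character, not an “or”. I should have used parentheses (curved brackets) to group the alternatives, and that version now works.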
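
A minimal sketch of the grouped form, assuming the same five conditions as the long form above. The exact line may need adjusting for your own rules, and the trailing OR assumes more RewriteCond lines follow it:

```apache
# Sketch only: parentheses group the alternatives, so | means "or" here.
# "Go" must be followed by one complete alternative, so "Goo" and "Google" do not match.
# [NC] now applies to every alternative; in the long form only two of the lines had it.
RewriteCond %{HTTP_USER_AGENT} ^.*Go(\ 1\.1|!Zilla|-Ahead-Got-It|-http) [NC,OR]
```

Inside parentheses, ! and - need no backslash. The space is still escaped so that Apache does not treat it as the end of the pattern.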

Damn, that was difficult.
