Why web client accesses lower-case URLs?

Fri Jul 17 06:17:58 UTC 2009

On Thu, Jul 16, 2009 at 03:37:28PM -0700, Yong Huang wrote:
> 
> [Excuse me for a message not really related to Red Hat Linux.]
> 
> I have a bunch of static html and txt files using mixed case, say,
> SomeFile.html, linked to from a main page.  Apache access log often shows
> that some clients, which could be from anywhere in the world, try to access
> the file somefile.html and of course get 404 return code.  Yesterday one
> single client tried to access quite a number of these in all lower-case (and
> failed).  All successfully retrieved pages happened to be files that are
> indeed all lower-case.  Since I can't find the real user from the client IP,
> is there anything I can do on my end to solve the problem, short of renaming
> all files to all lower-case or creating symbolic links (or perhaps moving the
> files to a Microsoft IIS server)? What kind of web browser is the client
> possibly using?

The path part of a URL (everything after the host name) is case sensitive;
only the scheme and the host name are case insensitive.  *No* popular browser
automatically folds case -- there are far too many mixed-case URLs on the
internet.

So the real question is how the client got those URLs in the first place.
Three possibilities spring to mind -- 1) a list was published somewhere, and
that list erroneously used all lowercase; 2) it's not a browser at all, but
instead is some poorly written ad hoc spider; 3) something happened to the
pages in question to turn all the embedded links into lower case.
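
One way to narrow down which of these applies is to pull the case-mismatched
404s out of the access log and group them by client, Referer, and User-Agent.
What follows is only a rough sketch in Python, assuming Apache's "combined"
log format. The function name and the idea of passing in a set of the real
URL paths are my own illustration, not anything Apache provides.

```python
import re
from collections import Counter

# Matches the usual Apache "combined" log format (an assumption here):
# host ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ '
    r'"([^"]*)" "([^"]*)"'
)

def case_mismatch_404s(log_lines, real_paths):
    """Count 404s whose path differs from a real path only by case.

    real_paths is a set of URL paths that actually exist, e.g.
    {"/SomeFile.html"}. Returns a Counter keyed by (ip, referer, agent).
    """
    lowered = {p.lower() for p in real_paths}
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not in the assumed format, skip it
        ip, _method, path, status, referer, agent = m.groups()
        if status == "404" and path not in real_paths and path.lower() in lowered:
            hits[(ip, referer, agent)] += 1
    return hits
```

A non-empty Referer would point at a page carrying the bad links (cases 1 or
3), while a lone odd User-Agent with a "-" Referer looks more like a spider
(case 2).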

You could change all your filenames to lowercase, but then someone might
come along with all-uppercase names.  You could set up rewrite rules in
Apache (mod_rewrite) to fold the requested path to lower case, but that only
helps once the files themselves are lowercase, and it's really a pain to
maintain.
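
To show what a case-insensitive lookup has to do, and why it gets messy, here
is a hedged sketch in Python rather than actual Apache configuration. The
names build_case_map and resolve are made up for illustration. It assumes the
files live under one document root and that URL paths map directly onto file
paths.

```python
import os

def build_case_map(doc_root):
    """Map each lowercased URL path to the real URL paths under doc_root."""
    case_map = {}
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            url_path = "/" + rel.replace(os.sep, "/")
            case_map.setdefault(url_path.lower(), []).append(url_path)
    return case_map

def resolve(case_map, requested):
    """Return the real path for a request, ignoring case, or None."""
    candidates = case_map.get(requested.lower(), [])
    if requested in candidates:
        return requested       # exact match always wins
    if len(candidates) == 1:
        return candidates[0]   # unambiguous case-insensitive match
    return None                # no such file, or several differ only by case
```

Note the last branch: as soon as two files differ only by case there is no
right answer, and the map has to be rebuilt whenever files are added or
renamed. That is the maintenance pain in a nutshell.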

Personally, unless you have some really unusual situation, I think you should
ignore it. 

Kent