
Re: Why web client accesses lower-case URLs?



On Thu, Jul 16, 2009 at 03:37:28PM -0700, Yong Huang wrote:
>>
>> I have a bunch of static html and txt files using mixed case, say,
>> SomeFile.html, linked to from a main page.  Apache access log often shows
>> that some clients, which could be from anywhere in the world, try to access
>> the file somefile.html and of course get a 404 return code.  Yesterday one
>> single client tried to access quite a number of these in all lower-case (and
>> failed).  All successfully retrieved pages happened to be files that are
>> indeed all lower-case.  Since I can't find the real user from the client IP,
>> is there anything I can do on my end to solve the problem, short of renaming
>> all files to all lower-case or creating symbolic links (or perhaps moving the
>> files to a Microsoft IIS server)? What kind of web browser is the client
>> possibly using?
> 
> The non-domain name part of a URL is explicitly case sensitive, and *no*
> popular browser automatically folds case -- there are far too many mixed case
> URLs on the internet.
> 
> So the real question is how the client got those URLs in the first place.
> Three possibilities spring to mind -- 1) a list was published somewhere, and
> that list erroneously used all lowercase; 2) it's not a browser at all, but
> instead is some ad hoc poorly written spider; 3) something happened to the
> pages in question to turn all the embedded links into lower case.
> 
> You could change all your filenames to lowercase, but then someone might
> come along with all upper case names.  You could set up rewrite
> rules in Apache to convert to lower case, but that's really a pain to
> maintain.
> 
> Personally, unless you have some really unusual situation, I think you should
> ignore it.
> 
> Kent

Thanks. I think the most likely explanation is 2), a poorly written spider. I'll ignore it for now. I made a symbolic link for only one file, because a text article of mine elsewhere contains its URL. People who read that article offline and then type the URL by hand might lazily enter somefile.html instead of SomeFile.html.
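
If more than a handful of files ever needed the symbolic-link treatment, it could be scripted. What follows is only a minimal sketch under assumptions, not something from this thread: the function name link_lowercase_names, its parameters, and the example path are all hypothetical, and it assumes a case-sensitive filesystem such as a typical Linux document root.

```python
import os
from pathlib import Path


def link_lowercase_names(directory, suffixes=(".html", ".txt"), dry_run=False):
    """Create an all-lower-case symlink beside each mixed-case file.

    Hypothetical helper for illustration. Returns the list of alias
    names that were (or, with dry_run=True, would be) created.
    """
    root = Path(directory)
    created = []
    for path in sorted(root.iterdir()):
        # Only regular files with a wanted suffix; skip existing symlinks.
        if path.is_symlink() or not path.is_file():
            continue
        if path.suffix.lower() not in suffixes:
            continue
        lower = path.name.lower()
        if lower == path.name:
            continue  # the name is already all lower-case
        alias = root / lower
        if os.path.lexists(alias):
            continue  # never overwrite an existing file or link
        if not dry_run:
            # Relative target, so the link survives moving the directory.
            alias.symlink_to(path.name)
        created.append(lower)
    return created


if __name__ == "__main__":
    # "/var/www/html" is an assumed document root; adjust before use.
    for name in link_lowercase_names("/var/www/html", dry_run=True):
        print("would create", name)
```

With dry_run=True nothing is changed, which makes it easy to review the list first. Apache serves such links only if the directory allows it (Options FollowSymLinks or SymLinksIfOwnerMatch), and this covers just the all-lower-case variant, so it does nothing for the all-upper-case requests Kent mentioned.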

Yong Huang


      


