Looking for some apache config help to block evil spiders
Sharpe, Sam J
sam.sharpe+lists.redhat at gmail.com
Sat Oct 10 20:39:33 UTC 2009
2009/10/10 Steven W. Orr <steveo at syslang.net>:
> On 10/10/09 14:37, quoth Steven W. Orr:
>> RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [OR]
>> RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT} ^NaverBot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT} ^Sogou-Test-Spider.*
>> RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0.*
>> RewriteCond %{HTTP_USER_AGENT} ^T-Mobile Dash.*
>> RewriteRule .* - [F,L]
>> The goal is to see the spiders bouncing off.
> On 10/10/09 14:55, quoth Sharpe, Sam J:
>> Are you actually missing the [OR] at the end of the 4th and 5th
>> RewriteCond lines, or is that a mispaste...
>
> Yes, thanks, I missed that, but that isn't the problem. The problem is that I
> want to be able to see what gets rejected in the log files.
Your rule couldn't match anything, because it ANDs together mutually
exclusive conditions; that was my point.
A User-Agent can't start with both Mozilla and Sogou (or both Mozilla
and T-Mobile). It has to be one or the other, so you would never have
seen anything in the logs.
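For illustration, here is a minimal sketch of the same rule set with [OR] on every condition except the last, so a match on any single pattern is enough to return 403 Forbidden. It assumes Apache 2.2 mod_rewrite in server or virtual-host context. A few changes are mine, not from the original post: the space in the T-Mobile pattern is escaped (unescaped, Apache reads the text after the space as the flags argument), the `.` in `4.0` is escaped, and the redundant trailing `.*` is dropped.

```apache
RewriteEngine On
# Every condition except the last carries [OR], so any one
# matching User-Agent pattern is enough to trigger the rule.
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^NaverBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou-Test-Spider [OR]
# Caution: this also matches many ordinary browsers (e.g. older Internet Explorer).
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0 [OR]
# The space is backslash-escaped so it stays part of the pattern.
RewriteCond %{HTTP_USER_AGENT} ^T-Mobile\ Dash
# Leave the URL unchanged (-), answer 403 Forbidden, stop processing.
RewriteRule .* - [F,L]
```

Requests rejected this way should then show up with status 403 in the normal access log.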
Without seeing ALL of your rewrite rules, I can't tell whether
lines such as:
>> 72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
>> [vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass through /d1/fn
are hits on the rule set you posted above or on some other rewrite
rule, but I don't see any evidence that the spider-matching rule is
generating those lines either.
You might also try raising RewriteLogLevel above 1 to see more detail
about how each request is processed.
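As a sketch, assuming Apache 2.2 (where RewriteLog and RewriteLogLevel exist; both must go in server or virtual-host context, not .htaccess), the logging setup might look like this. The log path is a hypothetical example. On Apache 2.4 these directives were replaced by per-module log levels, e.g. `LogLevel alert rewrite:trace3`.

```apache
# Hypothetical path: send mod_rewrite tracing to its own file.
RewriteLog "/var/log/httpd/rewrite_log"
# 0 = off; 1 = minimal (the "(1) pass through" lines above come from
# this level); up to 9. Levels above 2 slow the server noticeably,
# so use them only while debugging.
RewriteLogLevel 3
```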
--
Sam