wget

Andrew Bacchi bacchi at rpi.edu
Mon Jun 30 12:45:38 UTC 2008


I've already sent you a link that provides an explanation and examples.  I
don't mind pointing someone in the right direction, but I won't sit here
and solve all your problems for you.  Try searching Google.

Joy Methew wrote:
> Bacchi
>
> how do I use "robots.txt"? Please explain with an example.
>
> Daniel......
>
> it's working for "wget", but we can still download with other utilities
> like "DownloadStudio".
>
> On 6/27/08, Daniel Carrillo <daniel.carrillo at gmail.com> wrote:
>   
>> 2008/6/27 Joy Methew <ml4joy at gmail.com>:
>>
>>     
>>> Hi all,
>>>
>>> We can download any site with the "wget -r" option.
>>> If I want to stop my site from being downloaded from the web server,
>>> how can I do this?
>>>       
>> You can configure Apache to refuse connections whose User-Agent is
>> "wget", but note that wget can send any User-Agent string (see the
>> --user-agent option).
>>
>> SetEnvIfNoCase User-Agent "^wget" blacklist
>> <Location />
>>   ...
>>   your options
>>   ...
>>   Order allow,deny
>>   Allow from all
>>   Deny from env=blacklist
>> </Location>
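Daniel's point about the User-Agent option is worth illustrating: wget will
present any string you give it, so the "^wget" match above only stops the
default. A sketch (the host and User-Agent value are placeholders, and the
command is echoed rather than run):

```shell
# Sketch only: the wget invocation that would slip past the "^wget"
# SetEnvIf rule by masquerading as a browser.
ua='Mozilla/5.0 (X11; Linux x86_64)'
cmd="wget -r --user-agent=\"$ua\" http://www.example.com/"
echo "$cmd"
```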
>>
>> BTW: robots.txt can only stop crawling by "good" crawlers, like
>> Google, Yahoo, Alexa, etc.
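On the robots.txt question from earlier in the thread: it is just a plain
text file served from the site root (typically your DocumentRoot, so it is
reachable at http://yoursite/robots.txt), and well-behaved crawlers fetch
it before crawling. A minimal example that asks every robot to skip the
whole site:

```
User-agent: *
Disallow: /
```

As Daniel says, this is purely advisory: wget honors it by default in
recursive mode (unless invoked with -e robots=off), but nothing forces a
client to obey it.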
>>
>>
>> --
>> redhat-list mailing list
>> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
>> https://www.redhat.com/mailman/listinfo/redhat-list
>>
>>     

-- 
veritatis simplex oratio est
        -Seneca

Andrew Bacchi
Systems Programmer
Rensselaer Polytechnic Institute
phone: 518.276.6415  fax: 518.276.2809

http://www.rpi.edu/~bacchi/



