Wget question...

Gordon Messmer yinyang at eburg.com
Thu Jun 29 23:57:35 UTC 2006


bruce wrote:
> 
> I know wget isn't as robust as Nutch, but can someone tell me if wget keeps
> track of the URLs that it's been through so it doesn't repeat/get stuck in
> a never-ending process...

I don't know the implementation details, but if I create two pages that 
link to each other and tell wget to download them recursively, it does 
not loop.  Perhaps it could loop if there are references that can't be 
detected by examining the "stack" of links leading back to the first page.
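For example, here's the minimal test I'm describing (the file names and 
local URL are just placeholders):

    a.html:  <html><body><a href="b.html">to b</a></body></html>
    b.html:  <html><body><a href="a.html">to a</a></body></html>

    wget -r http://localhost/looptest/a.html

With those two files served from a web directory, wget fetches a.html 
and b.html once each and then stops, rather than bouncing between them.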

You may want to look at the section of the man page detailing the "-nc" 
option.  I use the options "-r -nc" when downloading a complex set of pages.
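As a rough sketch (the URL is only a placeholder), that looks like:

    wget -r -nc http://www.example.com/docs/

The -nc (--no-clobber) option tells wget not to re-retrieve files that 
already exist locally, so if the run is interrupted you can repeat the 
same command and it will skip everything it has already saved.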



