mirroring websites solution? wget problems

Cameron Simpson cs at zip.com.au
Wed Mar 30 04:40:59 UTC 2005


On 29Mar2005 14:15, Rick Stevens <rstevens at vitalstream.com> wrote:
| I'm assuming the web site owners want you to mirror their sites.  If so,
| why not just set up rsync and be done with it?

Well, for me, some mirror sites (planetmirror, aarnet) offer MUCH more
via HTTP than via rsync. For planetmirror it's a side effect of their
paying-members-get-more-bandwidth system (which only works for HTTP,
not rsync). For AARNet it seems to be some kind of maintenance or
policy issue. Let me state that I MUCH prefer rsync for mirroring.

At any rate, some things are only available via HTTP.

I have my own problems with wget, though not with authentication, since
I'm mirroring public repositories. My invocation is:

    wget --mirror -D hostname -nH --cut-dirs=ncut -P local-dir http://blah...

where ncut is set to get the right portion of the tree. It seems to leak
out of the site (or, possibly, out of the HTTP subtree) into other data,
producing a corrupt mirror at my end.

Are people aware of problems with --mirror? Are there options I shouldn't
use with it? Is there a standard recipe for mirroring http://blah/sub/path
into an arbitrary subdir _without_ the leading cruft dirs wget normally
prepends (thus the --cut-dirs)?
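
For illustration, here is a rough sketch of the kind of recipe being asked
about. It is not a confirmed fix. It assumes the leak is wget following
links up into parent directories, which --no-parent (-np) is meant to
prevent, and that the URL ends in a trailing slash so wget treats the last
component as a directory. The helper name ncut_for, the URL and local-dir
are placeholders. The script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: count the directory components in a URL's path.
# That count is the value --cut-dirs needs to strip all of them.
# Assumes the URL names a directory (e.g. ends in "/"), not a file.
ncut_for() {
    path=${1#*://}                  # drop the scheme ("http://")
    case $path in
        */*) path=${path#*/} ;;     # drop the hostname
        *)   path= ;;               # bare host, no path at all
    esac
    path=${path%/}                  # drop one trailing slash
    n=0
    while [ -n "$path" ]; do        # peel off one component per pass
        n=$((n + 1))
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
    done
    echo "$n"
}

url='http://blah/sub/path/'         # placeholder URL, as in the post
dest='local-dir'                    # placeholder target directory

# Dry run: show the command instead of executing it.
# --no-parent is the assumed remedy for leaking out of the subtree.
echo wget --mirror --no-parent -nH \
    --cut-dirs="$(ncut_for "$url")" -P "$dest" "$url"
```

With the placeholder values this prints a command with --cut-dirs=2, so
files under /sub/path/ should land directly in local-dir.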

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

We would've believed it was an accidental shooting if he hadn't changed
magazines ......TWICE   - suicide at will.apana.org.au



