[Spacewalk-list] Spacewalk Proxy misbehaving with metadata....

Miroslav Suchý msuchy at redhat.com
Fri May 21 10:46:56 UTC 2010


On 05/21/2010 11:23 AM, James Hogarth wrote:
> 2010/5/21 James Hogarth<james.hogarth at gmail.com>:
>> 2010/5/21 Miroslav Suchý<msuchy at redhat.com>:
>>> On 05/20/2010 03:46 PM, James Hogarth wrote:
>>>>
>>>>   up until I get in the office tomorrow and can pull
>>>> some logs to get a better picture but putting it out there now in case
>>>> anyone sees similar....
>>>>
>>>> Since upgrading to spacewalk 1.0 it has become a much more common
>>>> occurrence to find clients on the far side of the spacewalk-proxy to
>>>> report checksum didn't match metadata or some such error on yum
>>>> installs/updates.
>>>
>>> Just guess... does this happen for clients which use SHA256 for their rpm?
>>> I.e Fedora 12 and 13?
>>>
>>> --
>>> Miroslav Suchy
>>> Red Hat Satellite Engineering
>>>
>>
>> Hi Miroslav - back in the office today so digging through log files...
>>
>> All our systems are Centos 5 64bit across the board.
>>
>> Looking through the squid access.log I think what happened was that
>> the repomd.xml file had a TCP_REFRESH_MISS and thus got a new version.
>> Meanwhile the primary.xml.gz got a TCP_MEM_HIT but was of course
>> slightly older and thus out of sync with the repomd.xml (and also
>> wouldn't have had our new package).
>>
>> Been a while since I last dug through the squid documentation but can
>> we perhaps either reduce the caching time on any repo metadata related
>> information (ie anything on a /repodata path) or provide a way via CLI
>> (so it can be scripted) to invalidate that path before a yum
>> makecache?
>>
>> Rather than messing around with my config file figured it'd be best to
>> discuss it here for others implementing the proxy to see and to get
>> any changes upstream with you guys if it is worthwhile to do so...
>>
>> James
>>
>
> Further to the above.....
>
> The squid.conf installed by the spacewalk proxy has the following directives...
>
> refresh_pattern  \.rpm$  10080 100% 525960 override-expire override-lastmod ignore-reload reload-into-ims
> refresh_pattern         .               0       100%    525960
>
> What are the thoughts on a line of...
>
> refresh_pattern \/repodata\/.*$ 0 100% 10
>
> My understanding of the docs is that line should give a max 10 minute
> time limit on the repo metadata being considered fresh.... or perhaps


10 minutes seems too little to me. The repodata is usually several MB,
so I think Spacewalk Proxy should do better than just cache it for a
few minutes.

> proxy behaviour is unlikely to cause refresh problems... But the repo
> metadata is pretty dynamic and important to have in line with the
> package information... but not that large for a basic refresh. During
> a refresh of data won't squid check for a 304 not modified from the
> host before requesting new data anyway?

I just dug up what the fresh and stale states mean. I found good information here:
http://www.david-guerrero.com/papers/squid/squid.htm

From the article:
A document in a cache server can have three different states: FRESH, 
NORMAL and STALE. When an object is FRESH, it is served normally when a 
request for it arrives without checking the source to see if the object 
has been modified since its last retrieval. If it’s in NORMAL state, an 
If-Modified-Since GET request is sent to the source, so the cache server 
only downloads the object from the source if it has changed since its 
last retrieval. A STALE document is no longer valid, and it’s retrieved 
from the source again.

Normally, when a web server sends a document, it adds an HTTP header 
called Last-Modified containing the date the object was created or last 
modified. This data is used by cache servers to heuristically calculate 
how much time may pass for the object to still be considered FRESH. 
Usually, a proportion of the time elapsed between the date the document 
was last modified and the date when the document was received is used. A 
normal proportion is 10%-30% of this time. If this proportion is set to 
20%, a document modified ten days before it was received stays FRESH for 
two days before being checked for changes.

End of verbatim.
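
To make the arithmetic in the quoted passage concrete, here is a minimal
Python sketch of that heuristic. The function name freshness_lifetime and
its parameters are made up for illustration; this is not code from Squid
or from the article.

```python
from datetime import datetime, timedelta


def freshness_lifetime(last_modified, received, percent):
    """Return how long an object stays FRESH under the Last-Modified heuristic.

    The lifetime is `percent` percent of the object's age at the moment it
    was received, i.e. of (received - last_modified).
    """
    age_when_received = received - last_modified
    return age_when_received * percent / 100


# Example dates are arbitrary; only the ten-day gap matters.
received = datetime(2010, 5, 21, 10, 0, 0)
last_modified = received - timedelta(days=10)

# 20% of ten days
lifetime = freshness_lifetime(last_modified, received, 20)
print(lifetime)
```

With a ten-day-old document and 20%, the computed lifetime is two days,
matching the example in the article.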

So if I understand it correctly, with the current setting:
refresh_pattern        .               0       100%    525960
a file is treated as follows:
younger than 0 minutes (which never happens): it is FRESH and sent
without validation
older than 525960 minutes (365d): it is STALE and always retrieved
from the parent
older than 0 minutes and younger than 365d:
   - if the file was N minutes old when fetched (response date minus
     last modified), it is considered FRESH for N minutes and served
     without validation against the parent
   - otherwise we ask the parent whether the file was modified and
     expect a 304 if it was not.
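
The reading above can be written down as a small decision function. This
is only a sketch of the rules as I described them, assuming min, percent
and max behave as stated; the name classify, its parameters and the
REVALIDATE label are hypothetical, and it is not taken from the Squid
source.

```python
def classify(age_min, lm_age_min, min_min=0, percent=100, max_min=525960):
    """Classify a cached object per the three rules described above.

    age_min:    minutes since the object was fetched into the cache
    lm_age_min: minutes between Last-Modified and the fetch, or None
                if the response carried no Last-Modified header
    The defaults mirror "refresh_pattern . 0 100% 525960".
    """
    if age_min > max_min:
        return "STALE"          # older than max: always fetched again
    if age_min < min_min:
        return "FRESH"          # younger than min: served unchecked
    if lm_age_min is not None and age_min < lm_age_min * percent / 100:
        return "FRESH"          # still inside the Last-Modified window
    return "REVALIDATE"         # ask the parent, hoping for a 304


# A file that was 60 minutes old when fetched, now cached for 30 minutes.
print(classify(30, 60))
print(classify(30, 60, percent=10))
```

Under these assumptions, a file that was 60 minutes old when fetched is
served unchecked for the next 60 minutes at 100%, but only for 6 minutes
at 10%.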

I'm not sure whether the server sends a Last-Modified header for
repodata. I would have to check, but I doubt it is sent.
If this header is sent, we can safely change the 100% to 10% in general
and even to 1% for /repodata/*.
But if it is not, we should come up with some clever idea, because
lowering the percentage would effectively mean downloading those files
again.
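
One way to check for the header would be a HEAD request against a
repodata file. A minimal Python sketch, using only the standard library;
the helper names fetch_headers and has_last_modified are made up here,
and the URL to test has to be supplied by whoever runs it.

```python
import urllib.request


def has_last_modified(headers):
    """Return True if the given response headers include Last-Modified.

    Header names are compared case-insensitively.
    """
    return any(name.lower() == "last-modified" for name in headers.keys())


def fetch_headers(url):
    """Send a HEAD request to `url` and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers
```

Calling has_last_modified(fetch_headers(url)) with the URL of a
repomd.xml served through the proxy would show whether the header is
present.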

-- 
Miroslav Suchy
Red Hat Satellite Engineering



