proposal: best practices for cron jobs in RPM packages

Florin Andrei florin at andrei.myip.org
Thu Jun 10 19:32:54 UTC 2004


(sorry for the cross-post, but the potential audience for the proposal
is quite large)

(i'll start with a particular case, then develop the issue towards more
general aspects)

I'm looking at the clamav-db-0.72-1.1.fc2.dag package. It's pretty
cleanly built (which is a feature of that entire repository, kudos to
Dag Wieers!), however, my pet peeve with it is the freshclam cron job.

By default, the package relies on the /etc/cron.daily/freshclam script
to perform the update. That means the job runs whenever cron.daily
itself is scheduled to run.
FC2 by default sets cron.daily to run at 04:02 every day.

Due to timezone dispersion, the clamav-db users of Dag's repository will
run that cron job pretty much around the clock, but all of them will
still run it at XX:02, two minutes past the hour. Moreover, most of them
are probably bunched in a few timezones: North America (3 timezones),
Europe (about 2), parts of Asia (2...3 timezones), so the timezone
dispersion is perhaps not as even as it seems.
This will create artificial spikes on the ClamAV servers, making it more
difficult for the ClamAV team to continue to provide the free service to
the community.
The problem is somewhat alleviated by the fact that
database.clamav.net is a distributed database. Still, the load spike
problem remains (it's just not as big).
The solution to it is to spread the load created by all those
.dag.i386.rpm users even further.

However, changing the schedule of a job in /etc/cron.daily cannot be
done by the sysadmin without affecting all other jobs in that directory.

I think i have an improvement.

Why use a file in /etc/cron.daily, which has all those disadvantages
(it creates load spikes worldwide, it's hard to change), and not just
stick a file in /etc/cron.d? After all, that's what the FC2 MRTG package
does, and it does it successfully.

It's much easier to change: each file in /etc/cron.d can be adjusted to
its own schedule.
It's a lot more fine-grained: a sysadmin could configure it to run every
hour, every half hour, every 5 minutes on Friday the 13th or other weird
combinations.
Better yet, the %post scriptlet of the RPM could, at install time, fill
in a random 0...59 value for the minutes field and a random 0...23
value for the hour field, this way making things a lot easier for the
ClamAV database servers.
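
To make the randomization idea concrete, here is a minimal sketch of
what the body of such a scriptlet could look like. It is only an
illustration: the function name write_random_cron, the cron file path
and the freshclam command line are placeholders i made up, not anything
taken from the actual clamav-db package.

```shell
#!/bin/sh
# Sketch of a %post scriptlet body. All names and paths are illustrative.

# Write a cron.d entry that runs once a day at a randomly chosen time.
#   $1 = cron file to create (hypothetical: /etc/cron.d/freshclam)
#   $2 = command to run      (hypothetical: /usr/bin/freshclam --quiet)
write_random_cron() {
    cronfile=$1
    command=$2

    # Never overwrite an existing file, so that a package upgrade keeps
    # whatever schedule is already in place (random or set by the admin).
    if [ -e "$cronfile" ]; then
        return 0
    fi

    # Plain sh has no $RANDOM, so read two bytes from /dev/urandom instead.
    minute=$(( $(od -An -N2 -tu2 /dev/urandom | tr -d ' ') % 60 ))
    hour=$(( $(od -An -N2 -tu2 /dev/urandom | tr -d ' ') % 24 ))

    # /etc/cron.d format: minute hour day-of-month month day-of-week user command
    printf '%s %s * * * root %s\n' "$minute" "$hour" "$command" > "$cronfile"
}
```

In a spec file this might be called from %post as
write_random_cron /etc/cron.d/freshclam "/usr/bin/freshclam --quiet".
Skipping the write when the file already exists is a design choice of
the sketch: otherwise every upgrade would reshuffle the schedule and
discard any change the sysadmin made.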

In fact, /etc/cron.d seems to be the ideal spot to stick cron jobs for
RPM packages. There are very few applications that need cron jobs to be
run exactly once a day/week/month in every conceivable situation, and
most of those are actually within tightly interlocked sequences such as:
run logwatch/webalizer first, then anacron, then finally logrotate.

So, what i'm saying is, perhaps the authors of custom RPM packages (and
even the non-custom ones, the ones included with the distribution)
should use /etc/cron.d more aggressively to run the cron jobs for their
packages.
You get more flexibility, ease of administration, more intelligent
behaviour of the system overall (see the automatic randomization of the
schedule after installing the package).
/etc/cron.daily confines the packages to quite tight limits; sometimes
that's good (see the mention of cron sequences above) but i believe it's
usually not optimal.

Sure, the sysadmin can change everything in a package, including the
ways cron jobs are set. But, as i painfully noticed after upgrading my
home server to FC2, the more you deviate from the trodden path, the more
work you end up doing.

Note: I believe (but i'm not sure) that crond must be notified after
adding/removing a file to/from /etc/cron.d - if that's true, then any
package adding/removing a file to/from /etc/cron.d should do a
"service crond reload" in the %post and %postun scriptlets. That's not
hard and it's actually an elegant solution.
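
If such a notification does turn out to be necessary, a defensive
version of the reload step could look roughly like the sketch below.
The function name reload_crond is made up, and the default init script
path /etc/init.d/crond and its "reload" action are assumptions about a
Red Hat style system that should be checked against the target
distribution.

```shell
#!/bin/sh
# Sketch for use in both %post and %postun. The init script path is an
# assumption and can be overridden by passing a different one as $1.
reload_crond() {
    initscript=${1:-/etc/init.d/crond}

    # Nothing to do if the init script is missing or not executable,
    # for example in a chroot or on a system without cron installed.
    [ -x "$initscript" ] || return 0

    # Ignore the exit status: a failed reload (crond not running, say)
    # should not make the RPM scriptlet itself fail.
    "$initscript" reload >/dev/null 2>&1 || :
}
```

Swallowing the exit status is deliberate here, since a non-zero exit
from a scriptlet is reported as a package error even though the cron
file itself was installed or removed correctly.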

-- 
Florin Andrei

http://florin.myip.org/






More information about the fedora-devel-list mailing list