anaconda performance thoughts [Re: ANNOUNCE: Severn Test 2 Anaconda Updates Image Available]

James Olin Oden joden at malachi.lee.k12.nc.us
Thu Oct 2 13:31:42 UTC 2003


On Wed, 1 Oct 2003, Ingo Molnar wrote:

<snip> 
> it's not the HD that is keeping up things - it's the non-overlap of CDROM
> and HD IO that hurts. We use the CDROM, then we use the HD to install the
> rpm, then we use the CDROM again, etc. - instead of using them in parallel
> and cutting latencies into half.
> 
Not that it would be impossible to do, but any approach that interleaves
cdrom and hd i/o is going to have to take into account the switching of
cdroms.  Also, correct me if I am wrong, but all the rpms are installed
as one transaction using librpm.  Basically, a transaction containing all
the rpms you need is built and then run.  Anaconda passes librpm a
callback so that anaconda can report status and switch cdroms for rpm
(sick but it's true (-;).  In order to interleave the cdrom access (which
is mainly for reading rpms) I think you would have to drill something
into rpm to allow for this, as rpm is ultimately the one reading from the
cdrom and then installing to the system.  If you followed the pattern of
what is done for the cdrom change, you would have to add another callback
entry point that gets called before the rpm is opened for reading and
passes a filehandle back to rpm (this is what is done to allow for cdrom
changing).  This is of course convoluted, and unless it was made
optional, such that it only occurred in the anaconda environment (i.e.
anaconda turned this "feature" on), it would break everything else that
uses librpm to install packages.

The other approach would be to add threading to rpm.  Now I know Jeff
Johnson has been looking into doing that in order to do parallel
installs of packages, but I don't think he was thinking of having a
separate thread to read an rpm and another to write it out to the disk
(simplified, I know).
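
To make the two ideas above concrete, here is a minimal sketch in Python.
It is not librpm or anaconda code: run_transaction, plain_opener,
PrefetchingOpener and the MEDIA table are all hypothetical names made up
for illustration.  The toy transaction asks a caller-supplied hook for an
open handle before each package, and the opt-in hook uses a reader thread
so that "cdrom" reads can overlap with "disk" installs.

```python
import io
import queue
import threading

# Hypothetical stand-in for the install media: package name -> payload.
MEDIA = {"glibc": b"a" * 8, "bash": b"b" * 4, "rpm": b"c" * 6}


def run_transaction(names, open_package, install):
    """Toy transaction: ask the caller for a handle, then install from it."""
    for name in names:
        handle = open_package(name)   # caller-supplied "open" hook
        install(name, handle.read())  # stands in for unpacking to the disk
        handle.close()


def plain_opener(name):
    """Serial behaviour: the read happens only when the package is needed."""
    return io.BytesIO(MEDIA[name])


class PrefetchingOpener:
    """Opt-in hook: a reader thread fetches packages ahead of the installer."""

    def __init__(self, names, depth=2):
        self._ready = queue.Queue(maxsize=depth)  # bounds memory use
        self._thread = threading.Thread(
            target=self._reader, args=(list(names),), daemon=True)
        self._thread.start()

    def _reader(self, names):
        for name in names:
            # put() blocks once `depth` packages are waiting to be installed.
            self._ready.put((name, MEDIA[name]))

    def __call__(self, name):
        fetched, payload = self._ready.get()
        if fetched != name:
            raise RuntimeError(f"expected {name}, prefetched {fetched}")
        return io.BytesIO(payload)


def demo(opener_factory=PrefetchingOpener):
    order = ["glibc", "bash", "rpm"]
    installed = []
    run_transaction(order, opener_factory(order),
                    lambda name, data: installed.append((name, len(data))))
    return installed
```

Callers that pass plain_opener keep the serial behaviour, which is the
point of making the hook optional: only the installer that wants the
overlap has to turn it on.  The sketch ignores disc switching entirely,
which is where the real complications would start.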

Anyway, all I am really trying to say is that the road to what you want
would ultimately produce convolutions that make rpm or anaconda or both
harder to support, for very limited gains.  Probably Jeff's work towards
parallel installs will actually give you some of what you want, as the
psm (package state machine) threads would likely, most of the time, be
interleaving cdrom and disk i/o, though not on purpose.  The path Jeff
is taking has its own set of gotchas, mainly centered around the fact
that scriptlets that modify things on the system need to employ some
sort of locking mechanism to make sure no two scriptlets touch the same
file at the same time.  This, though, is not a problem for rpm to solve,
as scriptlets are opaque (as they should be), but one for the designers
of rpms to solve.
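
As a rough illustration of the kind of locking a packager would have to
do, here is a sketch using an advisory lock (flock) around a
read-modify-write of a shared file.  Real scriptlets are normally shell,
so treat this Python as a stand-in; locked_increment and demo are
hypothetical names, and the threads merely play the part of scriptlets
running in parallel.  It assumes a Unix system where fcntl is available.

```python
import fcntl
import os
import tempfile
import threading


def locked_increment(path):
    """Read-modify-write a counter file while holding an exclusive lock."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            value = int(f.read() or "0")
            f.seek(0)
            f.truncate()
            f.write(str(value + 1))
            f.flush()                  # make the update visible before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def demo(workers=8):
    """Run `workers` concurrent updates and return the final counter value."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"0")
    os.close(fd)
    try:
        threads = [threading.Thread(target=locked_increment, args=(path,))
                   for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path) as f:
            return int(f.read())
    finally:
        os.remove(path)
```

The lock is only advisory: it protects the file just as long as every
scriptlet that touches it takes the same lock, which is exactly why this
has to be a convention among packagers rather than something rpm can
enforce.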

Well, I have probably said too much and not enough, but I hope this sheds
at least some light.

Cheers...james

> ext2/ext3 only speeds up the HD access (by a very small amount). Also,
> ext2/ext3 mostly differs in CPU overhead not IO overhead - and the
> install-to-hd process is mostly limited by IO latencies (disk seeks).
> 
> 	Ingo




