
Fri Mar 2 19:33:15 UTC 2007

At 9:50 AM +0100 3/2/07, Karel Zak wrote:
>On Fri, Mar 02, 2007 at 01:07:05AM -0500, Tony Nelson wrote:
>> Also, if it were to always run:
>>
>> Readahead-collector allocates memory in big chunks.  It uses lots of memory
>> -- when I ran it, 39 MB of /var/log/readahead-rac.log (which produced about
>> .33 MB of /etc/readahead.d/custom.* -- but see bz 230687).  (I note that
>> readahead-collector will collect without limit, but that readahead will
>> only use the first 32K entries.)  Thus, while readahead-collect uses too
>> much memory now to run every time, if it used a better data structure, say
>> a balanced tree, and parsed the audit data into the tree as the data
>> arrived, it could use about 2% of what is currently does.
>
> It's not so easy. My first implementation collected only paths, but
> that approach is not reliable. You need to collect all events and parse
> them with libauparse, because every syscall produces three events (syscall,
> cwd and path) and the collector requires data from all three events. The
> order of events could be *random*, and before parsing you need to have
> all events for the syscall.

I suggested parsing as the events are received.  The order may vary, but
all events for a particular file come in a group, according to parse_events
and man auparse_next_record.  It's the same event while the strings match
up to the first ")" ("audit(1234567890.123:1): ").  Collect strings until
it changes and then auparse the preceding event.
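
A minimal sketch of that grouping step, in Python rather than the collector's
own code. It assumes each record arrives as one string containing an
"audit(timestamp:serial)" token; the names event_id and group_events and the
sample records are made up for illustration, and the hand-off to auparse is
left out.

```python
import re
from itertools import groupby

# Matches the "audit(1234567890.123:1)" token and captures what is inside
# the parentheses. Assumption: every record carries exactly this token.
_EVENT_ID = re.compile(r"audit\(([^)]*)\)")


def event_id(record):
    """Return the 'timestamp:serial' part of a record, or None if absent."""
    match = _EVENT_ID.search(record)
    return match.group(1) if match else None


def group_events(records):
    """Yield (event_id, [records]) for each run of consecutive records
    that share the same event id.

    groupby() is lazy, so a group is handed over as soon as the id
    changes; only one event's records are held at a time.
    """
    for key, run in groupby(records, key=event_id):
        yield key, list(run)


if __name__ == "__main__":
    # Made-up sample records, not real audit output.
    sample = [
        "audit(1234567890.123:1): syscall record",
        "audit(1234567890.123:1): cwd record",
        "audit(1234567890.123:1): path record",
        "audit(1234567890.456:2): syscall record",
    ]
    for key, run in group_events(sample):
        print(key, len(run))
```

Each completed group is what would be passed to the parser, so memory use is
bounded by the largest single event rather than by the whole log.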

(For "balanced tree" read whatever mapping type you prefer.)

> I think a simple solution is to reduce the number of fields in events and
> store simplified event strings in memory. I hope libauparse
> doesn't care about the number of fields. This could save 80% of
> used memory (I think). I'll try to implement it.

That would help, but it seems to rely on more of auparse's internals than
would looking at the start of the string.

> Frankly, I'm not sure that 30MB of RAM is such a big problem, given that
> readahead is an effective solution for machines that have a lot of
> memory for kernel cache.

ISTM that it is too much memory if readahead-collector is to run every boot.

> But you're right that there is a place for optimization.
>
>> Neither program seems to take account of the memory used by the files that
>> are read, though readahead can report it. (Possibly readahead-collect
>> should avoid the largest files, as they probably aren't mostly used and
>> don't cause so much seeking.)
>
> Any example of a really large file (during boot)?

default.early
54204 KB  /usr/lib/locale/locale-archive

default.later
54204 KB  /usr/lib/locale/locale-archive
46556 KB  /var/lib/rpm/Packages
13748 KB  /usr/share/icons/Bluecurve/icon-theme.cache

custom.early
54204 KB  /usr/lib/locale/locale-archive
10240 KB  /var/lib/mysql/ibdata1
 7373 KB  /usr/share/fonts/japanese/TrueType/sazanami-gothic.ttf
 5120 KB  /var/lib/mysql/ib_logfile0
 5120 KB  /var/lib/mysql/ib_logfile1

custom.later
54204 KB  /usr/lib/locale/locale-archive
25254 KB  /usr/share/icons/crystalsvg/icon-theme.cache
13748 KB  /usr/share/icons/Bluecurve/icon-theme.cache
 7373 KB  /usr/share/fonts/japanese/TrueType/sazanami-gothic.ttf
 7356 KB  /usr/lib/firefox-1.5.0.10/components/libgklayout.so
 6518 KB  /usr/share/icons/gnome/icon-theme.cache
 4756 KB  /etc/gconf/gconf.xml.defaults/%gconf-tree.xml
 4623 KB  /usr/share/icons/hicolor/icon-theme.cache

(I wrote a tool at <http://georgeanelson.com/readaheadsize.py>, but this is
hand-copied.)
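
A listing like the ones above could be produced along these lines. This is
not the tool at that URL, only a minimal sketch; it assumes the caller
already has the paths as a list of strings, and the names largest_files and
format_report are hypothetical.

```python
import os


def largest_files(paths, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first.

    Paths that cannot be stat'ed (missing, no permission) are skipped.
    """
    sizes = []
    for path in paths:
        try:
            sizes.append((os.path.getsize(path), path))
        except OSError:
            continue
    sizes.sort(reverse=True)
    return sizes[:limit]


def format_report(entries):
    """Format (size, path) pairs as 'NNNNN KB  path', rounding KB up."""
    return ["%5d KB  %s" % ((size + 1023) // 1024, path)
            for size, path in entries]
```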

Note that locale-archive is read twice, in both early and later.  The early
files list should be subtracted from the later files list, by a merge after
sorting.  (Ask me to do it?)
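
A minimal sketch of that subtraction, assuming both lists are available as
plain lists of path strings (how they are read from the list files is left
out, and subtract_sorted is a made-up name). Both inputs must already be
sorted the same way; the result comes out in path order, so any other
ordering the list needs would have to be reapplied afterwards.

```python
def subtract_sorted(later, early):
    """Return the entries of `later` that do not appear in `early`.

    Both arguments must be sorted in ascending order. The two lists are
    walked in step, as in the merge phase of a merge sort, so the cost is
    linear in the combined length.
    """
    result = []
    i = 0
    for path in later:
        # Advance through `early` until it catches up with `path`.
        while i < len(early) and early[i] < path:
            i += 1
        if i < len(early) and early[i] == path:
            continue  # already read in the early phase, so drop it
        result.append(path)
    return result


if __name__ == "__main__":
    early = sorted(["/usr/lib/locale/locale-archive"])
    later = sorted(["/usr/lib/locale/locale-archive",
                    "/var/lib/rpm/Packages"])
    print(subtract_sorted(later, early))
```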

Packages is for yum-updatesd, which I'm not running, and which, as a
daemon, doesn't need to be sped up anyway.  The mysql stuff is also for a
daemon.  Probably all daemons' files should be skipped?

I think it unlikely that reading an icon-theme.cache is as useful as its
size.  I'm using gnome and Bluecurve, but I have some KDE stuff installed,
so that's where the crystalsvg stuff comes from.  It's clearly not worth
its weight.

Another issue is files opened for writing and not reading.  There's no use
reading them in at all.  I don't have any examples right away.  I expect
that the open mode is in the message somewhere, but I can't read it well
enough.

>> Readahead-collector runs for 5 minutes, so its output might need pruning if
>> it ran each boot.  When run manually, one knows to start stuff up and then
>> wait for readahead to finish.  BTW, the collection loop has a 30 second
>> timeout that isn't being used.  It might be reasonable to stop collecting
>> if no event has come in in that time.
>
> Good idea, but I doubt there is ever a 30s period when the system doesn't
> call open() :-)

In that case, readahead isn't going to help anyway. 8-b
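
The idle-timeout idea can be sketched as follows. This is not the
collector's loop; it assumes events arrive on a file descriptor that
select() can wait on (Unix only), and collect_until_idle with its parameters
is a hypothetical name. The 30 second and 5 minute defaults are the figures
mentioned above.

```python
import os
import select
import time


def collect_until_idle(fd, idle_timeout=30.0, max_duration=300.0):
    """Read from `fd` until one of three things happens:

    - nothing arrives for `idle_timeout` seconds,
    - `max_duration` seconds have passed in total, or
    - the other end closes the descriptor (EOF).

    Returns everything read, as bytes.
    """
    chunks = []
    deadline = time.monotonic() + max_duration
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # overall time limit reached
        # Wait for data, but never past the overall deadline.
        readable, _, _ = select.select([fd], [], [],
                                       min(idle_timeout, remaining))
        if not readable:
            break  # idle for the whole timeout, or deadline reached
        data = os.read(fd, 65536)
        if not data:
            break  # EOF
        chunks.append(data)
    return b"".join(chunks)
```

On a system that never goes 30 seconds without an open(), the loop simply
runs to the overall limit, which is the current behaviour.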

>> If readahead-collect could run automatically, readahead might request it
>> for the next boot if "too many" files are not found (say, after a firefox
>> update).
>
> Very good point.
>
> TODO updated:
>
>
>http://git.fedoraproject.org/?p=hosted/readahead;a=blob_plain;f=TODO;hb=HEAD
>
>
> Thanks.

You're welcome.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>