Using madvise for ld.so

Andrew Morton akpm at osdl.org
Wed Mar 17 20:58:17 CET 2004


Lubos Lunak <l.lunak at suse.cz> wrote:
>
>  I have a somewhat opposite question, related to the 'yet more preloading' 
> post. Just like this ld.so patch (and the hack that started this) speeds up 
> loading of large binaries by loading them sequentially in a large chunk, is 
> there some possibility to do the equivalent for inodes?

yup.  This is specific to ext2 and ext3, but that's not a bad thing to be
optimising for.

The inodes are laid out on disk as bunches of up to 32768 inodes, separated
by 128MB.  That's one inode table per blockgroup.  They are ordered by
ascending inode number.  So it's basically an array with gaps in it.
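
To make the "array with gaps" picture concrete, here is a minimal sketch of the arithmetic. It assumes the usual ext2/ext3 convention that inode numbers start at 1 and fill the block groups in order. The name locate_inode is made up for illustration, and 32768 is only the upper bound quoted above; the real inodes-per-group value has to come from the filesystem's superblock.

```python
INODES_PER_GROUP = 32768  # assumption: the upper bound quoted above, not read from a real superblock


def locate_inode(ino, inodes_per_group=INODES_PER_GROUP):
    """Return (block_group, index_in_group) for an ext2/ext3 inode number.

    Hypothetical helper: it only does the index arithmetic and does not
    look up where the group's inode table starts on disk.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    return (ino - 1) // inodes_per_group, (ino - 1) % inodes_per_group


print(locate_inode(1))      # (0, 0): first slot of the first group
print(locate_inode(32769))  # (1, 0): first slot of the second group
```

Turning the index into a disk offset still needs the inode size and the start of that group's inode table, both of which live in the filesystem metadata.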

If you have a large number of small files which need to be read you should
run readdir() against each directory, reading the *entire* directory.  This
will give you all the inode numbers.  Then you should sort the pathnames by
inode number and then stat each file.  This will cause all the inodes to be
pulled into memory in ascending disk offset order.
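
The procedure above could be sketched roughly as follows, in Python for brevity. The function name stat_in_inode_order is an invention for this example. It assumes the whole tree sits on one filesystem and that the directory entries carry a file type, so that is_dir() does not itself have to stat anything.

```python
import os


def stat_in_inode_order(root):
    """Read every directory under root in full, then stat in inode order."""
    entries = []  # (inode number, path) pairs taken from readdir data
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as it:
            for entry in it:  # iterate to the end: the *entire* directory
                entries.append((entry.inode(), entry.path))
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)

    entries.sort()  # ascending inode number, i.e. ascending disk offset

    results = []
    for _ino, path in entries:
        try:
            results.append((path, os.lstat(path)))
        except OSError:
            pass  # the file vanished between readdir and stat
    return results
```

Note that this collects the inode numbers for the whole tree before the first stat, so memory use grows with the number of files.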

>  I.e. I used the scan binary to simply recursively search and stat every file 
> under the given path. Would it be possible to simply do a large read from 
> the HDD that would read all that information?

Yes, it would.  If you can locate the inode tables' disk blocks then reading
those blocks via /dev/hda1 will cause a subsequent stat() or open() to find
the inode in pagecache.  But the inode number -> disk offset translation is
tricky, and quite filesystem-specific.  You'd need to poke around in
e2fsprogs to find the appropriate code.
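
If the inode table locations are already known, the read itself is the easy part. The sketch below assumes somebody has worked out the tables as (byte offset, length) pairs by other means, for example from the "Inode table at" lines that dumpe2fs prints, multiplied by the block size. The function name prefetch_extents and the extent list are hypothetical, and reading a real partition such as /dev/hda1 needs suitable permissions.

```python
import os


def prefetch_extents(device_path, extents, chunk=1 << 20):
    """Read each (offset, length) byte range and discard the data.

    The point is only the side effect of pulling those blocks into the
    page cache. Returns the number of bytes actually read.
    """
    total = 0
    fd = os.open(device_path, os.O_RDONLY)
    try:
        for offset, length in sorted(extents):  # ascending disk offset
            end = offset + length
            while offset < end:
                data = os.pread(fd, min(chunk, end - offset), offset)
                if not data:
                    break  # hit the end of the device
                offset += len(data)
                total += len(data)
    finally:
        os.close(fd)
    return total
```

This does not solve the inode number -> disk offset translation; it only shows what to do once that translation has been done.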

> > > Which reminds me of a question I asked myself: If I open the file,
> > > fadvise a WILLNEED on it, all its pages are scheduled for IO.  Completion
> > > is not waited for.  So, if the program then exits, the file gets closed,
> > > and assuming that was the last ref to that inode: will the not yet
> > > completed IO be canceled or will it still be completed?
> >
> > The I/O will not be cancelled.
> 
>  Is this the case also with mmap+madvise+munmap? It seems a bit strange, 
> although for these preloading tricks it's indeed handy.
> 

umm, yes, MADV_WILLNEED is also async, and will proceed after application
exit.
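
For reference, a minimal sketch of both preloading calls, using Python's wrappers around posix_fadvise() and madvise(). It assumes Linux and Python 3.8 or later (for mmap.madvise); the two function names are made up for this example. Both functions return as soon as the hint has been issued, without waiting for the reads to finish.

```python
import mmap
import os


def fadvise_willneed(path):
    """open + fadvise(WILLNEED) + close, without waiting for the I/O."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset 0, length 0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)


def madvise_willneed(path):
    """mmap + madvise(WILLNEED) + munmap, without waiting for the I/O."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            m.madvise(mmap.MADV_WILLNEED)
        # leaving the inner "with" block unmaps the file again
```

Neither function touches the file data itself, so any speedup comes entirely from the readahead the kernel starts in response to the hint.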




More information about the Kde-optimize mailing list