[Nepomuk] Review Request: Reduce number of watches created by nepomukfilewatch

Simeon Bird bladud at gmail.com
Mon Oct 22 21:54:30 UTC 2012



> On Oct. 8, 2012, 12:15 p.m., Vishesh Handa wrote:
> > services/filewatch/kinotify.cpp, line 157
> > <http://git.reviewboard.kde.org/r/106086/diff/2/?file=81961#file81961line157>
> >
> >     Why wouldn't we want to walk the directory tree in subfolders we aren't indexing?
> >     
> >     They could have tags/rating and other metadata which we need to preserve on file move events, and delete on a file delete event.
> 
> Simeon Bird wrote:
>     We still watch the directory tree for subfolders we aren't indexing - it is only for subfolders on the filter list that we do not. 
>     
>     This is a speed optimisation (see below for my reasoning). The idea is to treat folders on the exclude list like hidden folders.
> 
> Simeon Bird wrote:
>     Also we're going to need this for symlinks to work properly again, I think.
> 
> Vishesh Handa wrote:
>     How is this related to systemlink handling?
>     
>     Also, as much as I like speed improvments, I don't like the idea of sacrificing correctness over speed. Specially since this may be a very common case. Eg - You index your home directory, but do not index one particular sub-folder. You then have tags/ratings in that sub-folder.
> 
> Simeon Bird wrote:
>     symlinks: At the moment, if we have ~/dir/A/la and ~/dir/symlink-to-A, we index and watch ~/dir/A/la twice. 
>     We really want to detect the symlink and only watch/index it once (I have a patch to do this now, but untested). 
>     But, to preserve sanity, that means we need to not walk or watch ~/dir/sylink-to-A/* (we've done that already). 
>     So for that we need part of the patch - but not the contentious part that stops watching the exclude filter list.
>     
>     To be clear, in the case you mention, this patch will still watch the folder. The code still watches folders which are not indexed and walks their directory trees. It only changes behaviour for folders which are explicitly on the exclude filter list. You have to actually tag CMakeFiles/libnepomukcore.so/resultqueryiterator.o to notice. I think in fact that CMakeFiles and autom4te are the only non-hidden folder entries on the default exclude list, so for anything else at all the patch has no effect.
>     
>     Would you consider this if I also added a checkbox (unchecked by default) to the kcm saying "I faithfully swear that I have no tags or ratings in folders matching the exclude filters list. Allow nepomuk to ignore them"?
> 
> Simeon Bird wrote:
>     Actually, no, you're right, this *is* a bad idea. 
>     
>     Instead, how about a version of the patch which changes the directory walking as here, so that it stops walking the directory tree if filterWatch ever returns false, BUT where filterWatch always returns true. So there is no change to behaviour at all (yet). 
>     
>     Then I write a second patch so that filterWatch returns false for files which are symlinks to already watched directories.
>     
>     Then a third patch implementing PRUNEFS, to deal with my "mounting sshfs makes nepomuk go crazy" problem. 
>     
>     Then perhaps a fourth patch reorganising the kcm so that PRUNEFS and exclude filters are in an 'advanced' section, separate from "folders to index" (as per your task somewhere in the bug tracker).
>     
>     Is that a good plan?
> 
> Vishesh Handa wrote:
>     Patch 1 and 2 - Do it. After thinking about it more, I'm actually okay with not watching directories such as CMakeFiles and all. So maybe that could be incorporated into patch 2?
>     
>     (3) and (4) also sound like a good idea. But I'm not that aware of PRUNEFS.
>     
>

Ok - the first I'll try to do tonight. 

2 and 3 I tried, but I'm not sure I got the behaviour quite right yet. Maybe we can talk on IRC about it at some point...

PRUNEFS in locate just means "don't index files on these filesystems". Usually it means, amongst others, tmpfs, sysfs or procfs.


- Simeon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106086/#review20070
-----------------------------------------------------------


On Aug. 28, 2012, 5:42 a.m., Simeon Bird wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/106086/
> -----------------------------------------------------------
> 
> (Updated Aug. 28, 2012, 5:42 a.m.)
> 
> 
> Review request for Nepomuk, Vishesh Handa and Sebastian Trueg.
> 
> 
> Description
> -------
> 
> Current master nepomukfilewatcher installs watches on all sub-folders of a watched folder.
> 
> This is problematic:
> 
>  - It means we have to walk the entire directory tree, even for not-indexed folders.
> This is quite a lot of work if you happen to have a large complex directory structure
> mounted over a network in your $HOME (as I do)
> 
> - It means we get inotify watches for directories which are on the filter list; eg, on this computer
> $HOME/build/nepomuk-core/services/filewatch/CMakeFiles/nepomukfilewatch.dir/__/fileindexer
> is watched, causing the filewatcher to go nuts every time I build something.
> 
> - It means we install many more watches than we need to, vastly increasing the probability  
> of hitting the inotify limit.
> 
> This code instead walks the tree until it finds a folder we don't want to index and then STOPS. 
> I couldn't find a way to avoid walking the whole tree with QDirIterator and QDir::Subdirectories, 
> so I use QDirIterator without subdirectories, then create a new QDirIterator for each subfolder to index.
> 
> I can see two objections to this change: 
> 
> 1) If someone moves a file into an ignored directory, they will now presumably lose their metadata. 
> This is true, in my opinion not a big problem; the default configuration is to watch
> $HOME minus temporary build directories. If people are moving files into temporary 
> directories they should probably lose the metadata, and if people manually add directories
> to the ignore list they probably have a good reason and expect nepomuk to ignore them. 
> 
> 2) I changed filterWatch from always returning true to returning true if we want to watch the
> file and false otherwise. I couldn't work out the reason for it always returning true before, 
> so whatever it was, I've probably broken it. 
> 
> Bonus fixes:
> 
>  - Properly pass the return value of addWatch up the tree, so that if we run out of watches, 
> we stop trying to add more.
> 
> - Check for inotify on kernels that have a two-number version string, like 3.0
> 
> - To find the length of event->name, qstrlen was used. If an event is returned 
> for a file outside a watched directory, event->name will not be allocated, and qstrlen 
> may read beyond the end of allocated memory, causing chaos, anarchy and confusion. 
> Use qstrnlen instead.
> 
> Thanks, let me know what you think.
> 
> 
> Diffs
> -----
> 
>   services/filewatch/kinotify.h ab12d66 
>   services/filewatch/kinotify.cpp 4a744d4 
>   services/filewatch/nepomukfilewatch.cpp 9fd5d9c 
> 
> Diff: http://git.reviewboard.kde.org/r/106086/diff/
> 
> 
> Testing
> -------
> 
> Compiled, run, used for a couple of days, checked which files were actually watched, timed the filewatch service's startup.
> 
> 
> Thanks,
> 
> Simeon Bird
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20121022/2fcae2f8/attachment.html>


More information about the Nepomuk mailing list