[Nepomuk] Review Request: Reduce number of watches created by nepomukfilewatch
Simeon Bird
bladud at gmail.com
Tue Aug 21 03:59:52 UTC 2012
> On Aug. 20, 2012, 9:10 a.m., Sebastian Trueg wrote:
> > This is an absolute no-go for several reasons:
> > 1. Even non-indexed files can have annotations like tags or comments or download locations, etc.
> > 2. The move-into-unwatched dir problem you already mention.
> > 3. Nepomuk's index-folder system is more complex. It allows to index sub-folders of non-indexed folders.
> >
> > Sadly there is currently no way around installing this many watches. It needs to be fixed in the kernel.
> >
> > The qstrnlen and kernel version fixes are good though.
>
> Simeon Bird wrote:
> Well - is there some way to compromise a bit here?
>
> Sorry to be argumentative, I imagine you have to explain this a lot.
>
> The problem I'm trying to solve is not so much the number of watches,
> as to provide finer control over which directories are watched.
>
> There are (I think) good reasons for not wanting to watch certain specific directories;
> for pretty much the same reason as there is already an option to not watch removable drives.
>
> In my case ~/sshfs was (basically) a removable network drive, which just happened
> to be mounted under $HOME. CMakeFiles is similarly a temporary directory, whose contents can change rapidly.
>
> I know, because I made these directories, that they are just temporary storage mounted in an unusual way,
> so they don't need to be watched, but I need a way to tell nepomuk this.
> Currently nepomuk watches $HOME but not hidden folders (I think), which is already a heuristic;
> what I want to do is add a way to tune this heuristic by hand.
>
> I actually always (mis-)understood (from long before I read the code) that "index these folders" meant
> "these folders are the ones nepomuk cares about" which is why I tried to re-use the index list
> for this feature. However, from what you say this is not a good choice.
> Perhaps one could add another list of "do not watch these folders,
> they are really removable drives" to the kcm?
>
> Or (better) perhaps re-use the filter list? The default for that contains
> mostly temporary build files already.
>
> So then we would have:
> - Folders on the index list are watched and indexed.
> - All other folders in $HOME not on the filter list are watched
> - Folders on the filter/do-not-watch list and their subfolders are neither watched nor indexed.
>
> What do you think? Could you consider accepting something like that?
> I agree that fixing the problem in the kernel is the right way of doing things,
> but until that happens we're stuck with heuristics and best guess behaviour, so
> we might as well try to make the best guesses as good as we can.
>
> Thanks,
> Simeon
PS: qstrnlen and kernel versions pushed and cherry-picked to 4.9, thanks.
- Simeon
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106086/#review17745
-----------------------------------------------------------
On Aug. 19, 2012, 5:50 p.m., Simeon Bird wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/106086/
> -----------------------------------------------------------
>
> (Updated Aug. 19, 2012, 5:50 p.m.)
>
>
> Review request for Nepomuk, Vishesh Handa and Sebastian Trueg.
>
>
> Description
> -------
>
> Current master nepomukfilewatcher installs watches on all sub-folders of a watched folder.
>
> This is problematic:
>
> - It means we have to walk the entire directory tree, even for not-indexed folders.
> This is quite a lot of work if you happen to have a large complex directory structure
> mounted over a network in your $HOME (as I do)
>
> - It means we get inotify watches for directories which are on the filter list; eg, on this computer
> $HOME/build/nepomuk-core/services/filewatch/CMakeFiles/nepomukfilewatch.dir/__/fileindexer
> is watched, causing the filewatcher to go nuts every time I build something.
>
> - It means we install many more watches than we need to, vastly increasing the probability
> of hitting the inotify limit.
>
> This code instead walks the tree until it finds a folder we don't want to index and then STOPS.
> I couldn't find a way to avoid walking the whole tree with QDirIterator and QDir::Subdirectories,
> so I use QDirIterator without subdirectories, then create a new QDirIterator for each subfolder to index.
>
> I can see two objections to this change:
>
> 1) If someone moves a file into an ignored directory, they will now presumably lose their metadata.
> This is true, in my opinion not a big problem; the default configuration is to watch
> $HOME minus temporary build directories. If people are moving files into temporary
> directories they should probably lose the metadata, and if people manually add directories
> to the ignore list they probably have a good reason and expect nepomuk to ignore them.
>
> 2) I changed filterWatch from always returning true to returning true if we want to watch the
> file and false otherwise. I couldn't work out the reason for it always returning true before,
> so whatever it was, I've probably broken it.
>
> Bonus fixes:
>
> - Properly pass the return value of addWatch up the tree, so that if we run out of watches,
> we stop trying to add more.
>
> - Check for inotify on kernels that have a two-number version string, like 3.0
>
> - To find the length of event->name, qstrlen was used. If an event is returned
> for a file outside a watched directory, event->name will not be allocated, and qstrlen
> may read beyond the end of allocated memory, causing chaos, anarchy and confusion.
> Use qstrnlen instead.
>
> Thanks, let me know what you think.
>
>
> Diffs
> -----
>
> services/filewatch/kinotify.h ab12d66
> services/filewatch/kinotify.cpp 509abff
> services/filewatch/nepomukfilewatch.cpp 9fd5d9c
>
> Diff: http://git.reviewboard.kde.org/r/106086/diff/
>
>
> Testing
> -------
>
> Compiled, run, used for a couple of days, checked which files were actually watched, timed the filewatch service's startup.
>
>
> Thanks,
>
> Simeon Bird
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20120821/6395a1f6/attachment.html>
More information about the Nepomuk
mailing list