config option for KDirWatch method

Evgeny Egorochkin phreedom.stdin at gmail.com
Tue Aug 21 20:23:50 BST 2007


On Tuesday 21 August 2007 19:16:43 Aaron J. Seigo wrote:
> On Tuesday 21 August 2007, Dirk Mueller wrote:
> > On Tuesday, 21. August 2007, Flavio Castelli wrote:
> > > We watch recursively all indexed directories. In this way Strigi's
> > > index will be up-to-date and user searches will be consistent (we won't
> > > return deleted or stale content, nor omit valid results).
> >
> > As I tried to express previously: that's an extremely stupid idea. strigi
>
> only so long as it is implemented stupidly or you don't actually care about
> having a useful search system on your computer. i grant it will be tricky
> to get right and should not be used on all parts of the file system.

It is definitely possible to get this right by giving power users some 
configurability.
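To make "some configurability" concrete: the simplest form is a user-editable exclude list consulted before a directory is watched or indexed. This is only a sketch; the function name, the patterns, and the config shape are all hypothetical, not Strigi's actual API.

```python
# Hypothetical sketch of a user-configured exclude list for the indexer.
from fnmatch import fnmatch

# Patterns a power user might configure (illustrative values only).
EXCLUDED_PATTERNS = [
    "*/.svn/*",         # SCM metadata
    "/home/*/build/*",  # large generated trees
]

def should_index(path: str) -> bool:
    """Return False if the path matches any user-configured exclude."""
    return not any(fnmatch(path, pat) for pat in EXCLUDED_PATTERNS)
```

With such a list, the recursive watcher simply skips subtrees the user has opted out of, instead of the watch method being all-or-nothing.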

> > should not interfere while my "svn update" touches 15000 files. it can do
> > that when I'm away and do not care about my computer, but it shouldn't do
> > that while I'm sitting in front of the machine and wait for something to
> > finish.

In fact, those files are best indexed while they are still in the cache. It 
will work just fine as long as you don't svn up over a 1 Gbit pipe on a 
500 MHz machine. Also, you could optimize the process by turning off or 
deferring hashing and potentially some other expensive analyzers. Indexing 
of text files is not as expensive as you might think. Also consider that 
SSDs will soon finally materialize on the power-user desktop.

As for me, having an up-to-date index to navigate the code right after svn up 
is a very useful feature. A single brute-force search wastes a good deal of 
resources, precisely while you are sitting in front of the machine.

> but it should know that something needs to be updated and schedule that.
> the reality check here is that most people don't touch 15k files at once.
> *most* people save a file here or there, open a file here or there, get 15
> or 20 emails, chat with a couple of friends (creating log histories).

Exactly. Having up-to-date stuff is the #1 priority. More so when you want to 
search different sources like chat logs, emails etc. at once.

> things like chat logs and probably even emails should not be handled
> via "watch the file system for changes". best would be that index updates
> are fed to the storage system by the application that is processing the
> data: it's already in memory and being processed, so it's the perfect time
> to also index it.

Not only that: changes are not necessarily flushed to disk instantly, and 
reindexing the whole log is certainly worse than applying small incremental 
additions.
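For append-only files like chat logs, the incremental approach amounts to remembering the last indexed byte offset and feeding only the new tail to the analyzer. A minimal sketch, where `index_text` is a hypothetical stand-in for the real analyzer entry point:

```python
# Sketch of incremental indexing for append-only files (e.g. chat logs).
import os

_offsets = {}  # path -> last indexed byte offset (hypothetical bookkeeping)

def index_appended(path, index_text):
    """Index only the bytes appended since the last call."""
    last = _offsets.get(path, 0)
    size = os.path.getsize(path)
    if size < last:  # file truncated or rewritten: start over
        last = 0
    with open(path, "rb") as f:
        f.seek(last)
        new_data = f.read()
    if new_data:
        index_text(path, new_data.decode("utf-8", errors="replace"))
    _offsets[path] = size
```

Of course, having the chat or mail application push the text directly, as suggested above, avoids even this bookkeeping and the flush-to-disk delay entirely.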

> pretty much everything else can and should be indexed when (or shortly
> after being) changed, with the ability to exclude areas on disk for those
> crazy people like us who svn up and change a few thousand files all at
> once. in a beautiful world, we'd get scm projects to add optional
> on-the-fly indexing as well.

The only concern here is the notification system. SCM repos are the first 
thing I'd add to the indexer list.

> indexing should also be throttled when operating on battery power.

So as not to bump the CPU up to higher voltage modes. Still, the situation 
where you constantly write a lot of information is a very unlikely one.
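Battery-aware throttling can be as simple as stretching the pause between indexing batches when on battery, so the CPU can stay in low-power states. The sysfs path and the delay values below are illustrative guesses, not a specification; the policy function takes the power state as an argument so it stays testable.

```python
# Hedged sketch of power-aware throttling for the indexer.
def indexing_delay(on_battery: bool, cpu_busy: bool) -> float:
    """Seconds to sleep between indexing batches (illustrative values)."""
    if on_battery:
        return 30.0 if cpu_busy else 5.0  # keep the CPU in low-power modes
    return 1.0 if cpu_busy else 0.0

def on_battery_linux(path="/sys/class/power_supply/AC/online"):
    """Best-effort check via sysfs; the exact path varies across machines."""
    try:
        with open(path) as f:
            return f.read().strip() == "0"
    except OSError:
        return False
```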

> as i said, not trivial to get perfect, but doable.
>
> > I've tried to talk about this with Jos already and it seems we have
> > conflicting goals here.
>
> if by "we" you mean "us as developers and the rest of the world as users"
> i'd agree.

Config files are known to resolve a lot of conflicts.

> > However, from experience with beagle I know that
> > the number 1 complaint isn't that it doesn't find not-yet indexed
> > documents, but that it drains system resources like crazy.
>
> i think those two things are linked. the more out of date a system is, the
> less useful it is. if i can't find items that are new or updated in the
> last N hours, then i have to fall back to other means (e.g. manual file
> system searching in konqi/dolphin) which makes the search system less
> useful. less useful == worth less, which in turn means that every bit of
> resources that search system uses is more and more annoying.
>
> alternatively, if that search system is "always" accurate and has what i
> need at my fingertips it becomes insanely more useful, and i'm much more
> forgiving towards its resource usage.
>
> usefulness == willingness to forgive.

The question is just how adequate these complaints are.

First of all, there's a price to pay for features. Not all people realize 
this.

People's perception of crazy or significant may differ. Some people find that 
their average resource usage has gone up from 2% to 7% and say OMG.

Beagle has had quite some time to gather complaints, and computers are much 
more powerful today.

Beagle is written in a managed language and has to drag a huge runtime with 
it. Nobody has said it is optimally written either.

> that said, this is all a bit moot because kernel devs, such as those that
> work on the linux OS, can't seem to figure out how to do these things
> efficiently. either it's a tough problem at the kernel level, they don't
> care about the needs of the desktop or they are just plain stupid when it
> comes to solving these kinds of problems.

This bothers me a lot, since it's such obviously useful functionality, 
probably on both desktop and server.

Has there been an "official" request from KDE for this functionality? Are 
Linux devs aware of our concerns? 

-- Evgeny




More information about the kde-core-devel mailing list