Baloo - Not Indexing everything by default

Mark Gaiser markg85 at gmail.com
Fri Oct 17 22:47:56 UTC 2014


On Thu, Oct 16, 2014 at 1:20 PM, Vishesh Handa <me at vhanda.in> wrote:
> Hey guys
>
> While Baloo performs better than Nepomuk, it does have its share of problems
> - mostly large text files and high IO usage. Additionally, users on Linux
> often seem to have the craziest files. Currently, we do not index plain text
> files which do not have a `.txt` extension, because otherwise we land up
> indexing genome data and other strange files. (Actual bugs)
>
> I've been thinking about actually disabling the file indexing by default.
> However, that might be too radical. Instead, we could only index -
>
> * $HOME - Not including any subfolders.
> * Desktop, Documents, Videos, Pictures and Music. All of these are xdg user
> directories.
>
> Gnome Tracker actually does something quite similar.
>
> Comments?

Hi Vishesh,

First of all, please don't even consider turning it off by default.
Baloo might have some issues (I can't really speak to the Plasma 5.x
experience since I'm still on KDE 4.x), but having "a" desktop search
application is really worth a lot! It's just a difficult area to get
right.

Also, I don't think you should limit the indexer to the XDG folders. I
personally have quite a lot of data in subfolders of $HOME.

So it sounds like the indexer job needs to learn some new tricks then.
Here's an idea off the top of my head that "might" work.

1. Delay indexing the content of any file that an application already
has open. I don't know which C++/Qt function you can use for this, but
the idea here is to not index the files/folders that you see when you
type "iostat". Store that list for later indexing. To still have some
data there, just index the filenames.

2. What's left is everything _not_ currently in iostat which you could
potentially index. Here you should filter out any file types that you
don't have indexers for.

3. Now you're left with a list of files which could potentially be
indexed. Here you should probably filter out those that are bigger
than 5 MB.

4. Everything that's left: index!

5. While indexing, periodically look at iostat to see if any process
(other than the Baloo processes) has new files/folders open, like a
compile job. If something like that is detected, then you should
probably follow step 1 again :)
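
Steps 1 to 4 amount to a per-file decision, which could be sketched in
C++ roughly as below. This is only an illustration under stated
assumptions, not Baloo code: `Candidate`, `Action`, `classify` and
`kMaxContentBytes` are hypothetical names, the set of open paths is
assumed to come from whatever mechanism ends up doing the "iostat"
lookup, and the set of supported types stands in for the list of
available indexers.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Hypothetical description of a file the indexer is considering.
struct Candidate {
    std::string path;
    std::string mimeType;      // e.g. "text/plain"
    std::uint64_t sizeBytes;
};

// What to do with the file's content. The filename is assumed to be
// indexed in every case.
enum class Action {
    DeferContent,  // step 1: open elsewhere, retry later
    Skip,          // steps 2 and 3: no indexer, or too large
    IndexContent   // step 4: everything that is left
};

// Size limit from step 3 (5 MB, taken here as 5 * 1024 * 1024 bytes).
constexpr std::uint64_t kMaxContentBytes = 5ull * 1024 * 1024;

Action classify(const Candidate& file,
                const std::set<std::string>& openElsewhere,
                const std::set<std::string>& supportedTypes)
{
    // Step 1: another application has the file open.
    if (openElsewhere.count(file.path) > 0)
        return Action::DeferContent;
    // Step 2: no indexer exists for this file type.
    if (supportedTypes.count(file.mimeType) == 0)
        return Action::Skip;
    // Step 3: file is larger than the size limit.
    if (file.sizeBytes > kMaxContentBytes)
        return Action::Skip;
    // Step 4: index the content.
    return Action::IndexContent;
}
```

Step 5 would then be a matter of refreshing `openElsewhere`
periodically and running `classify` again on the files still queued.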


This would be a nice indexer, right?
The only problem I see here is doing an "iostat" call in code. I have
no clue if there is a POSIX C/C++ function for that.
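
On that last question: as far as I know, POSIX has no portable call
that lists the files other processes have open, and iostat itself
reports per-device statistics rather than per-file ones. On Linux the
per-file view (the kind of list lsof prints) is exposed as symlinks
under /proc/<pid>/fd, so a Linux-only sketch could look like the one
below. `filesOpenUnder`, `isUnder` and `isPidName` are hypothetical
helper names, and the function only sees processes whose fd directory
the current user is allowed to read.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <set>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// True if `name` is all digits, i.e. looks like a PID directory.
bool isPidName(const std::string& name)
{
    return !name.empty() &&
           std::all_of(name.begin(), name.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// True if `path` equals `root` or lies somewhere below it.
bool isUnder(const std::string& path, const std::string& root)
{
    if (root.empty() || path.compare(0, root.size(), root) != 0)
        return false;
    return path.size() == root.size() || root.back() == '/' ||
           path[root.size()] == '/';
}

// Collects the targets of /proc/<pid>/fd/* that lie under `root`,
// skipping the process with PID `selfPid` (the indexer itself).
// `procDir` is a parameter only so the function can be tested.
std::set<std::string> filesOpenUnder(const std::string& root, long selfPid,
                                     const fs::path& procDir = "/proc")
{
    std::set<std::string> result;
    std::error_code ec;
    for (fs::directory_iterator proc(procDir, ec), end;
         !ec && proc != end; proc.increment(ec)) {
        const std::string name = proc->path().filename().string();
        if (!isPidName(name) || name == std::to_string(selfPid))
            continue;
        // Unreadable fd directories (other users' processes) are skipped.
        std::error_code fdEc;
        for (fs::directory_iterator fd(proc->path() / "fd", fdEc), fdEnd;
             !fdEc && fd != fdEnd; fd.increment(fdEc)) {
            std::error_code linkEc;
            const fs::path target = fs::read_symlink(fd->path(), linkEc);
            if (!linkEc && isUnder(target.string(), root))
                result.insert(target.string());
        }
    }
    return result;
}
```

An indexer could call something like `filesOpenUnder("/home/user",
getpid())` before each batch and treat the result as the "open
elsewhere" list from step 1. Scanning /proc is itself not free, so how
often to repeat it would need measuring.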

