Baloo - Not Indexing everything by default
Martin Steigerwald
Martin at lichtvoll.de
Thu Oct 16 11:39:27 UTC 2014
On Thursday, 16 October 2014, 13:20:57, Vishesh Handa wrote:
> Hey guys
Hi Vishesh,
> While Baloo performs better than Nepomuk, it does have its share of
> problems - mostly large text files, and high IO usage. Additionally, users
> on linux often seem to have the craziest files. Currently, we do not index
> plain text files which do not have a `.txt` extension, because otherwise we
> land up indexing genome data and other strange files. (Actual bugs)
How about limiting size for problematic files? I.e. only smaller text files?
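Just as a rough sketch of what I mean by a size limit (Python, not Baloo code;
the function name and the 10 MiB cap are made up for illustration):

```python
import os

# Hypothetical cap for plain text files; not an existing Baloo setting.
MAX_TEXT_BYTES = 10 * 1024 * 1024


def should_index_text(path, max_bytes=MAX_TEXT_BYTES):
    """Return True if the file is small enough to be indexed as text."""
    try:
        return os.path.getsize(path) <= max_bytes
    except OSError:
        # Unreadable or vanished file: do not try to index it.
        return False
```

With something like this, a multi-gigabyte genome dump would be skipped, while
ordinary notes and READMEs without a `.txt` extension could still be indexed.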
Here Baloo runs quite well. But I'd like it to also index *.txt files.
Anything else that can be done to make it more efficient? In my experience it's
already a lot more efficient than Nepomuk. It indexed a lot of text files here,
about a million or more. My mails, that is :).
> I've been thinking about actually disabling the file indexing by default.
> However, that might be too radical. Instead, we could only index -
>
> * $HOME - Not including any subfolders.
> * Desktop, Documents, Videos, Pictures and Music. All of these are xdg user
> directories.
>
> Gnome Tracker actually does something quite similar.
Hmmm, I actually don't use these, except for an images folder. I store my files
in categories / directories I want. I usually don't sort by file type, but by
purpose. Okay, I have an images folder, but mostly for Digikam, and music and
audio meditations I already have split into two main directories. Thus, for
me, the above structure just doesn't fit.
> Comments?
I'd rather like Baloo to be *intelligent* about errors, i.e.:
If an indexer fails on a file, skip that file next time. Optionally, at some
point present the user with a list of files it failed to index, maybe via a
non-intrusive summary notification at the end of an indexing cycle. And report
each failed file just once in it.
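To make that a bit more concrete, here is a minimal sketch in Python. It is
not how Baloo works internally, and all names are hypothetical. Retrying a file
once its modification time has changed is my own assumption on top of the idea
above:

```python
class FailureLog:
    """Remembers files an indexer failed on and reports each one only once."""

    def __init__(self):
        self._failed = {}      # path -> mtime at the time of the failure
        self._unreported = []  # failed paths not yet shown to the user

    def should_skip(self, path, mtime):
        # Skip only while the file is unchanged since it last failed
        # (assumption: a modified file deserves another attempt).
        return self._failed.get(path) == mtime

    def record_failure(self, path, mtime):
        # Queue the path for the summary only the first time it fails.
        if path not in self._failed:
            self._unreported.append(path)
        self._failed[path] = mtime

    def summary(self):
        """Return failures not reported before and mark them as reported."""
        pending, self._unreported = self._unreported, []
        return pending
```

The indexer would call `should_skip()` before touching a file, and
`summary()` once at the end of an indexing cycle to feed the notification.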
Extra points for offering to report a bug with the file. But that is a bit
difficult, because it may well be a private file the user does not want to share.
Actually I'd also like to have advanced configuration options. On my Debian the
settings are very simplistic: I can just say where not to search. No extension
list, no file size restrictions, no nothing. I think such options could help
users who have problems with extra large text files.
But… I think advanced error handling, i.e. not trying on a file that is known
to fail, again and again and again, might be able to circumvent the need for
further configuration options.
I'd like to scan it for text files and source files though. Just probably with
some delay… to avoid I/O load during git checkout or compile runs. Right now
I do not seem to be able to set anything. I'd also like to see what file types
it actually indexes. I wonder whether it indexes OpenDocument files, for
example, or PDF files. From my files it seems to find less than Nepomuk. OK,
PDF it does seem to find.
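The delay I have in mind could look roughly like this: a file is only handed
to the indexer after it has been left alone for a while. Again just a Python
sketch with made-up names and a made-up 30 second default, not anything Baloo
offers today:

```python
import time


class QuietPeriodQueue:
    """Holds changed files back until they have been quiet for a while."""

    def __init__(self, quiet_seconds=30.0, clock=time.monotonic):
        self._quiet = quiet_seconds
        self._clock = clock
        self._last_change = {}  # path -> time of the most recent change

    def file_changed(self, path):
        # Every new change restarts the quiet period for that file.
        self._last_change[path] = self._clock()

    def ready(self):
        """Return and forget the files whose quiet period has passed."""
        now = self._clock()
        due = [p for p, t in self._last_change.items()
               if now - t >= self._quiet]
        for p in due:
            del self._last_change[p]
        return due
```

During a git checkout or a compile run the same files change over and over, so
they would stay in the queue and only be indexed once things settle down.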
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
More information about the Plasma-devel
mailing list