[gcompris-devel] Search engine

Bruno Coudoin bruno.coudoin at gcompris.net
Sat Nov 2 00:45:04 UTC 2013


Le 02/11/2013 00:11, Nicolas Adenis-Lamarre a écrit :
> Hi,
>
> i'm trying to implement a search engine for gcompris activities.
> For that, i plan to use the sqlite fts4 feature.
> In 2 words, it's text indexing in sqlite.
>
> The idea is first, to build a gcompris_search.db sqlite database (a
> separate database).
> This dabase would be created before releasing a new gcompris version, to
> avoid
> people to create it.
> I've created a python script that does the job. It runs in about 10 seconds.
> Parsing the 144 activities xml, and loading the 74 translations (title,
> description, goal, manual).
>
> It works nicely and very quickly:
>
> sqlite> select board_name, language, title, description from boards_texts
> where language = 'fr' and boards_texts match 'train';
> memory_sound_tux|fr|Jeu de memory auditif, contre Tux.|Joue au memory
> auditif, contre Tux.
> railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
> memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
> musicales en cliquant dessus pour les écouter.
> CPU Time: user 0.004000 sys 0.000000
> sqlite>
>
> sqlite> select board_name, language, title, description from boards_texts
> where language = 'de' and boards_texts match 'geld';
> money|de|Geld|Übe die Verwendung von Geld.
> money_back_cents|de|Gib Tux sein Wechselgeld, einschließlich Cents|Übe die
> Verwendung von Geld durch Rückgabe des Wechselgelds an Tux
> money_cents|de|Geld|Übe die Verwendung von Geld inklusive Cents.
> money_back|de|Gebe Tux sein Wechselgeld|Übe die Verwendung von Geld durch
> Rückgabe des Wechselgelds an Tux
> CPU Time: user 0.000000 sys 0.000000
> sqlite>
>
> sqlite> select board_name, language, title, description from boards_texts
> where language = 'fr' and boards_texts match 'train mémoire';
> railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
> CPU Time: user 0.000000 sys 0.000000
> sqlite>
>
> sqlite> select board_name, language, title, description from boards_texts
> where language = 'fr' and boards_texts match 'musical* écout*';
> memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
> musicales en cliquant dessus pour les écouter.
> CPU Time: user 0.004000 sys 0.000000
> sqlite>
>
> There are however 2 downsides.
> So before continuing my work, i would like a feedback about them, that
> could make my work undesirable :
>
> 1) it is not as sweet as google is. You can ignore case, but é is not e.
> You can use *, but music is not musics (but music* include musics, but
> people should not use the star)
> 2) the database is 18mo. So, it would make grow a lot the gcompris package.
>

Hi,

First I don't think it is worthless in our case to use sqlite for this. 
In fact we already have all the data in memory. As a first 
implementation a simple g_strstr_len would do the trick. To not be case 
sensitif you can pass the strings through g_utf8_casefold first. If you 
want to support a 'like' mode then you have to search for an 
implementation (gpl compliant) for this and import it in our code.

Bruno.






More information about the Gcompris-devel mailing list