[gcompris-devel] Search engine

Nicolas Adenis-Lamarre nicolas.adenis.lamarre at gmail.com
Fri Nov 1 23:11:11 UTC 2013


Hi,

I'm trying to implement a search engine for GCompris activities.
For that, I plan to use SQLite's FTS4 feature.
In short, it's full-text indexing in SQLite.

The idea is first to build a gcompris_search.db SQLite database (a
separate database).
This database would be created before each new GCompris release, so that
users don't have to build it themselves.
I've written a Python script that does the job. It runs in about 10 seconds,
parsing the 144 activity XML files and loading the 74 translations (title,
description, goal, manual).
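
To give an idea of the loading step, here is a minimal sketch, not the actual script. It assumes the texts have already been extracted from the activity XML and translation files into plain tuples; the function names build_search_db and search, and the sample rows, are made up for illustration. Only the table name boards_texts and its columns come from the queries in this mail.

```python
import sqlite3


def build_search_db(path, rows):
    """Create an FTS4 table and load it with
    (board_name, language, title, description, goal, manual) rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE boards_texts USING fts4("
        "board_name, language, title, description, goal, manual)"
    )
    conn.executemany(
        "INSERT INTO boards_texts VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def search(conn, language, query):
    """Return (board_name, title) pairs matching an FTS query in one language."""
    cur = conn.execute(
        "SELECT board_name, title FROM boards_texts "
        "WHERE language = ? AND boards_texts MATCH ? "
        "ORDER BY board_name",
        (language, query),
    )
    return cur.fetchall()


# Hypothetical sample rows; the real script would produce these from the
# activity XML files and the translation catalogs.
sample_rows = [
    ("railroad", "fr", "Chemin de fer",
     "Un jeu de mémoire basé sur des trains", "", ""),
    ("money", "de", "Geld", "Übe die Verwendung von Geld.", "", ""),
]

conn = build_search_db(":memory:", sample_rows)
print(search(conn, "fr", "train*"))
```

Passing a file name such as gcompris_search.db instead of ":memory:" would write the index to disk. This requires an SQLite build with FTS3/FTS4 enabled.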

It works nicely and very quickly:

sqlite> select board_name, language, title, description from boards_texts
where language = 'fr' and boards_texts match 'train';
memory_sound_tux|fr|Jeu de memory auditif, contre Tux.|Joue au memory
auditif, contre Tux.
railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
musicales en cliquant dessus pour les écouter.
CPU Time: user 0.004000 sys 0.000000
sqlite>

sqlite> select board_name, language, title, description from boards_texts
where language = 'de' and boards_texts match 'geld';
money|de|Geld|Übe die Verwendung von Geld.
money_back_cents|de|Gib Tux sein Wechselgeld, einschließlich Cents|Übe die
Verwendung von Geld durch Rückgabe des Wechselgelds an Tux
money_cents|de|Geld|Übe die Verwendung von Geld inklusive Cents.
money_back|de|Gebe Tux sein Wechselgeld|Übe die Verwendung von Geld durch
Rückgabe des Wechselgelds an Tux
CPU Time: user 0.000000 sys 0.000000
sqlite>

sqlite> select board_name, language, title, description from boards_texts
where language = 'fr' and boards_texts match 'train mémoire';
railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
CPU Time: user 0.000000 sys 0.000000
sqlite>

sqlite> select board_name, language, title, description from boards_texts
where language = 'fr' and boards_texts match 'musical* écout*';
memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
musicales en cliquant dessus pour les écouter.
CPU Time: user 0.004000 sys 0.000000
sqlite>

There are, however, 2 downsides.
Before continuing, I would like feedback on them, since they
could make this work undesirable:

1) It is not as forgiving as Google. Case is ignored, but é does not match e.
You can use *, but music does not match musics (music* does match musics, but
users should not have to type the star).
2) The database is 18 MB, so it would make the gcompris package a lot bigger.
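
For the accent problem in 1), one possible workaround (a sketch only, not something the script does today) would be to strip accents in Python both when indexing and when querying, so that é and e end up as the same token. The names fold, folded_texts and search_folded are made up for illustration, and the original accented text would still have to be kept somewhere else for display. This does nothing for the music/musics problem.

```python
import sqlite3
import unicodedata


def fold(text):
    """Lowercase and strip combining accents: 'Mémoire' becomes 'memoire'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the accent-folded text used for matching.
conn.execute(
    "CREATE VIRTUAL TABLE folded_texts USING fts4(board_name, language, body)"
)
conn.execute(
    "INSERT INTO folded_texts VALUES (?, ?, ?)",
    ("railroad", "fr", fold("Un jeu de mémoire basé sur des trains")),
)


def search_folded(conn, language, query):
    """Fold the query the same way as the indexed text, then run the match."""
    cur = conn.execute(
        "SELECT board_name FROM folded_texts "
        "WHERE language = ? AND folded_texts MATCH ?",
        (language, fold(query)),
    )
    return [row[0] for row in cur]


# Both spellings now find the same activity.
print(search_folded(conn, "fr", "memoire"))
print(search_folded(conn, "fr", "mémoire"))
```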

Nicolas Adenis-Lamarre


