[gcompris-devel] Search engine

Nicolas Adenis-Lamarre nicolas.adenis.lamarre at gmail.com
Sat Nov 2 11:13:04 UTC 2013


Hi,

thanks for your feedback.
However, i don't understand why in a first time you prefer a g_strstr_len
implementation.
g_strstr_len implementation is far from being perfect, and i noticed no
advantage compared to the sqlite database.
The word "first implementation" is strange in fact in your mail, and i've
not really associated a meaning for the moment.

functionnally :
contrary to the sqlite one, it doesn't analyse words, just letter sequence,
for example, "train" matches for "entrainement" (french)
sqlite implementation doesn't require extra prerequisite (sqlite already
there) ; it includes a word analyser library specially done for word
searches (exactly the need).
Personnally i prefer letting library do their job than reimplementing the
wheel. (ok, very simple in this case, so can be discussed).

Technically :
The implementation difficulty is not complicated.
The fact that data are already in memory is something that could/should be
removed in the future.
it works because gcompris has a view activities, but it makes the gcompris
startup O(n), and memory occupation O(n) where n
is the number of activities which is not a problem because the number of
activities is 144 at the moment. With 2000 activities, it would make things
more complicated.
Gcompris uses the boards sqlite table only as a backup, not as a database :
it's loaded at startup in memory, and then, gcompris implements functions
to look into that memory, not quering the database on time (saving memory
and cpu).
Basically,
what i mean, is not that the memory loadind should be changed, because the
number of activities in gcompris increase slowly, and everything works
nicely and will work nicely for several years again,
but doing the g_strstr_len implementation would add a requirement to keep
data in memory to not break the search feature if we want to change the
memory management to make gcompris eat less memory.

sqlite database is a bit less than 20MB. However, i was not precise, once
compressed in a package, it's about 500KB.

I'm new on gcompris, so basically, i think that a lot of things that i can
say are simplyly wrong, or not linked to the projects goals.
Please correct me if needed ; i'm just trying to make exchanges to choose
the better solution before trying to implement.

Nicolas Adenis-Lamarre


2013/11/2 Bruno Coudoin <bruno.coudoin at gcompris.net>

> Le 02/11/2013 00:11, Nicolas Adenis-Lamarre a écrit :
> > Hi,
> >
> > i'm trying to implement a search engine for gcompris activities.
> > For that, i plan to use the sqlite fts4 feature.
> > In 2 words, it's text indexing in sqlite.
> >
> > The idea is first, to build a gcompris_search.db sqlite database (a
> > separate database).
> > This dabase would be created before releasing a new gcompris version, to
> > avoid
> > people to create it.
> > I've created a python script that does the job. It runs in about 10
> seconds.
> > Parsing the 144 activities xml, and loading the 74 translations (title,
> > description, goal, manual).
> >
> > It works nicely and very quickly:
> >
> > sqlite> select board_name, language, title, description from boards_texts
> > where language = 'fr' and boards_texts match 'train';
> > memory_sound_tux|fr|Jeu de memory auditif, contre Tux.|Joue au memory
> > auditif, contre Tux.
> > railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
> > memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
> > musicales en cliquant dessus pour les écouter.
> > CPU Time: user 0.004000 sys 0.000000
> > sqlite>
> >
> > sqlite> select board_name, language, title, description from boards_texts
> > where language = 'de' and boards_texts match 'geld';
> > money|de|Geld|Übe die Verwendung von Geld.
> > money_back_cents|de|Gib Tux sein Wechselgeld, einschließlich Cents|Übe
> die
> > Verwendung von Geld durch Rückgabe des Wechselgelds an Tux
> > money_cents|de|Geld|Übe die Verwendung von Geld inklusive Cents.
> > money_back|de|Gebe Tux sein Wechselgeld|Übe die Verwendung von Geld durch
> > Rückgabe des Wechselgelds an Tux
> > CPU Time: user 0.000000 sys 0.000000
> > sqlite>
> >
> > sqlite> select board_name, language, title, description from boards_texts
> > where language = 'fr' and boards_texts match 'train mémoire';
> > railroad|fr|Chemin de fer|Un jeu de mémoire basé sur des trains
> > CPU Time: user 0.000000 sys 0.000000
> > sqlite>
> >
> > sqlite> select board_name, language, title, description from boards_texts
> > where language = 'fr' and boards_texts match 'musical* écout*';
> > memory_sound|fr|Jeu de memory auditif|Essaie d'apparier des cartes
> > musicales en cliquant dessus pour les écouter.
> > CPU Time: user 0.004000 sys 0.000000
> > sqlite>
> >
> > There are however 2 downsides.
> > So before continuing my work, i would like a feedback about them, that
> > could make my work undesirable :
> >
> > 1) it is not as sweet as google is. You can ignore case, but é is not e.
> > You can use *, but music is not musics (but music* include musics, but
> > people should not use the star)
> > 2) the database is 18mo. So, it would make grow a lot the gcompris
> package.
> >
>
> Hi,
>
> First I don't think it is worthless in our case to use sqlite for this.
> In fact we already have all the data in memory. As a first
> implementation a simple g_strstr_len would do the trick. To not be case
> sensitif you can pass the strings through g_utf8_casefold first. If you
> want to support a 'like' mode then you have to search for an
> implementation (gpl compliant) for this and import it in our code.
>
> Bruno.
>
>
>
>
> ------------------------------------------------------------------------------
> Android is increasing in popularity, but the open development platform that
> developers love is also attractive to malware creators. Download this white
> paper to learn more about secure code signing practices that can help keep
> Android apps secure.
> http://pubads.g.doubleclick.net/gampad/clk?id=65839951&iu=/4140/ostg.clktrk
> _______________________________________________
> gcompris-devel mailing list
> gcompris-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gcompris-devel
>



More information about the Gcompris-devel mailing list