[Nepomuk] Automated testing in Nepomuk

Vishesh Handa me at vhanda.in
Wed Aug 22 07:18:04 UTC 2012


Hey guys

I've been lately working on better automated testing for Nepomuk, and since
I've never attempted stuff like this before, I'm not sure if I'm going in
the right direction.

Nepomuk has a server-client architecture: a 'nepomukstorage' process hosts
the Virtuoso database, and all other Nepomuk processes communicate with it
via DBus + a local socket. Most of the client libraries are just thin
wrappers around these socket/DBus calls, along with some caching. So, in
order to test any of the client libraries, we need the Nepomuk server
components running.

I wrote a small library that creates a fake DBus + KDE session and then
starts a proper Nepomuk environment. This environment is created afresh
before each test.

Problem -

* Unit tests are very slow - each one requires a fresh Nepomuk instance.
We cannot test any one class in isolation, since they all generally need
to communicate with the database and require the ontologies to be
installed.

Approximate time to run a single test: 3-5 minutes. Maybe someone could
look into the code (nepomuk-core/autotests/lib/tools/)? They are just a
couple of shell scripts.

Installing the library
----------------------------

Should I be installing the library? It's purely for testing, but it would
be useful if someone else wanted to write Nepomuk-enabled tests.

Query Testing
--------------------
Nepomuk has a query library that provides a C++ interface for building
queries, which it then converts to SPARQL. The existing tests simply
compare the string output of the query library against hand-crafted SPARQL
queries. Maintaining these tests is hard, since a slight optimization can
change the generated SPARQL even though the results are the same.
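To make the brittleness concrete, here is a minimal Python sketch (the real code is C++/Qt, and the store, query strings, and `run_query` helper here are all hypothetical stand-ins, not Nepomuk API). It shows a harmless serialization change breaking a string-based test, while a result-based test keeps passing:

```python
# Two semantically equivalent SPARQL serializations (hypothetical examples):
old = "select distinct ?r where { ?r a nco:Contact . }"
new = "select distinct ?r where { ?r a nco:Contact }"  # optimizer dropped the dot
assert old != new  # a string-comparison test would now fail for no real reason

# Result-based testing compares what the query returns instead.
# Tiny in-memory store standing in for the Virtuoso database:
store = [
    {"uri": "res:1", "type": "nco:Contact", "name": "Alice"},
    {"uri": "res:2", "type": "nmo:Email", "subject": "hi"},
]

def run_query(store, rdf_type):
    """Stand-in for executing a query against the database."""
    return {item["uri"] for item in store if item["type"] == rdf_type}

# The test asserts on results, so any SPARQL serialization that
# returns the same rows passes.
assert run_query(store, "nco:Contact") == {"res:1"}
```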

In order to improve this situation, I started writing proper tests for the
query library which actually check whether the correct results are
returned. For these tests to work, we need to push data into Nepomuk,
which requires the entire unit-testing environment described above. It
also requires injecting data into Nepomuk, which, depending on the
quantity, can take some time.

I wrote a simple DataGenerator class, which creates contacts, files,
emails, and other data in Nepomuk. The queries are then run against this
data, and the results are checked. Is this the right way to go about it?
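For anyone unfamiliar with the idea, this is roughly the shape of such a generator, sketched in Python (the real DataGenerator is C++ and pushes resources into Nepomuk; the class name aside, every method and field here is hypothetical and only illustrates the pattern):

```python
import itertools

class DataGenerator:
    """Hypothetical sketch of a test-fixture generator. The real class
    would store resources in Nepomuk; here we only build the resource
    descriptions, so the shape of the API is visible."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.resources = []  # everything generated so far

    def _add(self, rdf_type, **props):
        res = {"uri": f"res:{next(self._counter)}", "type": rdf_type, **props}
        self.resources.append(res)
        return res

    def create_contacts(self, n):
        return [self._add("nco:Contact", name=f"Contact {i}") for i in range(n)]

    def create_emails(self, n):
        return [self._add("nmo:Email", subject=f"Subject {i}") for i in range(n)]

# A query test would generate a known data set, then assert on results:
gen = DataGenerator()
gen.create_contacts(5)
gen.create_emails(3)
assert len(gen.resources) == 8
```

The point of generating data programmatically (rather than shipping a fixed dump) is that each test can state exactly what it expects to find, at whatever scale it needs.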

Backup-Restore Testing
----------------------------------
The exact same problem: we need data and a test environment, and both take
a lot of time.

Benchmarking
---------------------
I have been thinking about using this data generator to quantify
improvements. How fast is a search for an email when you have 100 emails?
1k? 100k? The same applies to pushing that kind of data into Nepomuk. It's
fairly slow right now, and we need proper measurements.
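The measurement loop I have in mind looks roughly like this, sketched in Python (the store and `search_emails` are hypothetical stand-ins for a real Nepomuk query; only the scale-up-and-time pattern is the point):

```python
import time

def search_emails(store, word):
    """Stand-in for a Nepomuk query over email subjects."""
    return [r for r in store if word in r["subject"]]

# Time the same query at increasing data-set sizes.
for n in (100, 1_000, 10_000):
    store = [{"subject": f"Subject {i}"} for i in range(n)]
    start = time.perf_counter()
    hits = search_emails(store, "Subject 1")
    elapsed = time.perf_counter() - start
    print(f"{n:>6} emails: {len(hits)} hits in {elapsed * 1000:.2f} ms")
```

In an actual Qt autotest the timed section would presumably sit inside a QBENCHMARK block so QTestLib handles the repetition and reporting, but the shape is the same: generate N resources, run the query, record the time.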

Where do these benchmarks go? Are they supposed to be in the main repo?
They also require this entire test environment.
-----

And finally - Are these really unit tests? We aren't testing one class at a
time. Should these reside in the autotests directory?

Overall, I'd just like to know that I'm not doing something stupid :)

-- 
Vishesh Handa