[Nepomuk] Automated testing in Nepomuk

Christian Mollekopf chrigi_1 at fastmail.fm
Wed Aug 22 09:20:11 UTC 2012


On Wednesday 22 August 2012 12.48:04 Vishesh Handa wrote:
> Hey guys
> 
> I've been lately working on better automated testing for Nepomuk, and since
> I've never attempted stuff like this before, I'm not sure if I'm going in
> the right direction.
> 
> Nepomuk has a server-client architecture where there is a 'nepomukstorage'
> process which hosts the virtuoso database, and all other nepomuk
> processes communicate with this process via dbus + local socket. Most of
> the client libraries are just thin wrappers around these socket/dbus calls
> along with some caching. So, in order to test any of the client libraries,
> we need to have the nepomuk server components running.
> 
> I wrote a small library to create a fake dbus + kde session and then start
> a proper nepomuk environment. This environment is created before each of
> the tests.
> 

We recently did some work in akonadi to support another approach, instead of 
the "start a separate dbus session + server and make sure they're connecting 
to the right session" approach, which I think is somewhat messy.

Instead we allow each instance to have an identifier, and then prefix every 
dbus interface and the mysql db with that identifier. It is thus possible to 
run several instances in parallel on the same dbus session and the same mysql 
server, which is IMHO a lot nicer (and faster).

That might be a viable option for you too.
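
Just as a rough illustration of what I mean (names made up, none of this is 
the actual akonadi code):

// Hypothetical sketch: derive every resource name from an instance
// identifier, so several instances can share one dbus session and one
// database server without stepping on each other.
#include <QString>

class InstanceConfig
{
public:
    explicit InstanceConfig(const QString &identifier)
        : m_identifier(identifier) {}

    // e.g. base = the storage service name, identifier = "testinstance1"
    QString dbusServiceName(const QString &base) const {
        return m_identifier.isEmpty() ? base : base + QLatin1Char('.') + m_identifier;
    }

    // e.g. "nepomuk_testinstance1" on the shared database server
    QString databaseName(const QString &base) const {
        return m_identifier.isEmpty() ? base : base + QLatin1Char('_') + m_identifier;
    }

private:
    QString m_identifier;
};

The server registers itself under the prefixed name, and the client library 
reads the same identifier (e.g. from an environment variable) and connects to 
the matching service, so tests never touch the user's real instance.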

> Problem -
> 
> * Unit tests are very slow - They require a fresh nepomuk instance to be
> run. We cannot independently test any one class since they all generally
> need to communicate with the database, and require the ontologies to be
> installed.
> 
> Approximate time running a test = 3 - 5 minutes. Maybe someone could look
> into the code (nepomuk-core/autotests/lib/tools/)? They are just a couple
> of shell scripts.
> 
> Installing the library
> ----------------------------
> 
> Should I be installing the library? It's purely for testing, but it would
> be useful if someone else wanted to write a nepomuk-enabled test.
> 
> Query Testing
> --------------------
> Nepomuk has a query library which provides a C++ interface to write
> queries, which it then converts to sparql. The current existing tests
> simply test the string output of the query library with hand crafted sparql
> queries. Maintaining these tests is hard since a slight optimization might
> change the sparql query, even though the results are the same.
> 
> In order to improve this situation, I started writing proper tests for the
> QueryLibrary which actually check if the correct results are returned. In
> order for these tests to work, we need to push data into Nepomuk, which
> requires the entire unit testing environment described above. It also
> requires injection of data into Nepomuk, which depending on the quantity,
> could take some time.
> 
> I wrote a simple DataGenerator class, which is supposed to create contacts,
> files, emails and other data into Nepomuk. The queries are then run against
> this data, and the results are checked. Is this the right way to go about
> it?
> 

I think these are valuable integration tests, but it would be nice to have 
unit tests as well. Basically it should be possible to unit-test on the 
sparql level without having to change the tests in more than one place if you 
optimize something. You just need to make sure that every part is only tested 
once (so ideally you break just one test if you change something, or, if you 
break many tests, they all break in the same, shared test code).
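
E.g. something along these lines (just a sketch, the names are made up):

// Shared helper that all query unit tests go through: instead of comparing
// against one full hand-crafted sparql string per test, each test only lists
// the structural patterns it cares about. If an optimization reformats the
// generated query, at most this one helper (or its data) needs adjusting.
#include <QtTest>
#include <QString>
#include <QStringList>

static void verifySparql(const QString &generated, const QStringList &requiredPatterns)
{
    foreach (const QString &pattern, requiredPatterns) {
        QVERIFY2(generated.contains(pattern),
                 qPrintable(QString::fromLatin1("missing pattern: %1").arg(pattern)));
    }
}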

Another idea would be a visitor pattern to evaluate the Query, which would 
allow you to hook in a test-visitor...
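
Very rough sketch of the visitor idea (all names hypothetical):

// The query term tree accepts a visitor; the real implementation would
// serialize to sparql, while a test visitor just records what was built,
// which lets you test the query construction without any string comparison.
#include <QStringList>

class ComparisonTerm;
class AndTerm;

class TermVisitor
{
public:
    virtual ~TermVisitor() {}
    virtual void visit(const ComparisonTerm &term) = 0;
    virtual void visit(const AndTerm &term) = 0;
};

class RecordingVisitor : public TermVisitor   // used only by the tests
{
public:
    void visit(const ComparisonTerm &) { visited << QLatin1String("comparison"); }
    void visit(const AndTerm &) { visited << QLatin1String("and"); }
    QStringList visited;
};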

> Backup-Restore Testing
> ----------------------------------
> Exact same problem, we need data and a test environment. Both take a lot of
> time.
> 
> Benchmarking
> ---------------------
> I have been thinking about using this data generator in order to be able to
> quantify improvements. How fast are searches for an email when you have 100
> emails? 1k? 100k? The same is the case when pushing that kind of data into
> Nepomuk. It's fairly slow right now, and we need proper measurements.
> 
> Where do these benchmarks go? Are they supposed to be in the main repo?
> They also require this entire test environment.

On that I'm fairly lost as well, because absolute numbers don't seem to make a 
whole lot of sense to test. Tell me if you figure something out.

But I do think they should go in the main repo and be run regularly, to catch 
introduced performance regressions as quickly as possible.
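
One way might be to express them with QBENCHMARK and compare the numbers 
between runs, rather than asserting absolute values. A sketch (the 
DataGenerator and its createEmails() are from your description, I'm just 
guessing the interface):

// Benchmarks the email query for growing data set sizes via QTest data rows.
#include <QtTest>
#include <QObject>

class QueryBenchmark : public QObject
{
    Q_OBJECT
private slots:
    void emailSearch_data() {
        QTest::addColumn<int>("count");
        QTest::newRow("100 emails") << 100;
        QTest::newRow("1k emails") << 1000;
        QTest::newRow("100k emails") << 100000;
    }
    void emailSearch() {
        QFETCH(int, count);
        // DataGenerator generator;
        // generator.createEmails(count);   // hypothetical call
        QBENCHMARK {
            // run the email query against the test instance and fetch results
        }
    }
};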

> -----
> 
> And finally - Are these really unit tests? We aren't testing one class at a
> time. Should these reside in the autotests directory?
> 

Well, I think they are more integration tests than unit tests, but it's 
understandable that you're having a hard time separating them from the rest 
of the system.

In order to create unit tests you'd probably have to separate the tests from 
the db and possibly also dbus, e.g. by writing a mock Soprano::Model which 
checks that the right calls have been made. Of course that would not replace 
the tests which do use the db, since that part needs checking as well.
I realize though that this gets messy once you have to start writing your own 
sparql parser in a mock object =P

But my approach generally is to extract the parts which access a remote 
system; that way I can fake that class and deliver pre-built responses, thus 
separating the class under test from the rest of the system.
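
Roughly like this (all names made up, this is not existing API):

// The code under test talks to a small interface instead of the real
// db/dbus connection; the unit test supplies a fake that records the calls
// and returns canned results, no running server needed.
#include <QString>
#include <QStringList>

class DataAccess
{
public:
    virtual ~DataAccess() {}
    virtual QStringList executeQuery(const QString &sparql) = 0;
};

class FakeDataAccess : public DataAccess   // used by the unit tests
{
public:
    QStringList executeQuery(const QString &sparql) {
        executedQueries << sparql;   // lets the test verify the calls made
        return cannedResults;        // pre-built response instead of the db
    }
    QStringList executedQueries;
    QStringList cannedResults;
};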

If you are able to create a set of small/fast unit tests, it probably makes 
sense to be able to run them separately from the slow integration tests.

Cheers,
Christian


> Overall, I'd just like to know that I'm not doing something stupid :)

