[Nepomuk] Automated testing in Nepomuk

Milian Wolff mail at milianw.de
Wed Aug 22 14:21:32 UTC 2012


On Wednesday 22 August 2012 12:48:04 Vishesh Handa wrote:
> Hey guys
> 
> I've been lately working on better automated testing for Nepomuk, and since
> I've never attempted stuff like this before, I'm not sure if I'm going in
> the right direction.
> 
> Nepomuk has a server-client architecture where there is a 'nepomukstorage'
> process which hosts the Virtuoso database, and all other nepomuk
> processes communicate with this process via dbus + local socket. Most of
> the client libraries are just thin wrappers around these socket/dbus calls
> along with some caching. So, in order to test any of the client libraries,
> we need to have the nepomuk server components running.
> 
> I wrote a small library to create a fake dbus + kde session and then start
> a proper nepomuk environment. This environment is created before each of
> the tests.
> 
> Problem -
> 
> * Unit tests are very slow - They require a fresh nepomuk instance to be
> run. We cannot independently test any one class since they all generally
> need to communicate with the database, and require the ontologies to be
> installed.
> 
> Approximate time running a test = 3 - 5 minutes. Maybe someone could look
> into the code (nepomuk-core/autotests/lib/tools/)? They are just a couple
> of shell scripts.

For a single test? Or for all tests? Generally, are you running these tests on
dummy data, or on your users' data? Have you figured out where the time is
spent?

> Installing the library
> ----------------------------
> 
> Should I be installing the library? It's purely for testing, but it would
> be useful if someone else wanted to write a nepomuk-enabled test.

The best way would be to not install it, but from my experience in KDevelop
I know that this is not always possible. If you use plugins, for example, then
they must be known to kbuildsycoca4 and hence must be installed...

> Query Testing
> --------------------
> Nepomuk has a query library which provides a C++ interface for writing
> queries, which it then converts to SPARQL. The existing tests simply
> compare the string output of the query library against hand-crafted SPARQL
> queries. Maintaining these tests is hard since a slight optimization might
> change the generated SPARQL query, even though the results are the same.

Imo, you should have a dedicated test for the optimizations of the query
builder which compares strings and which, of course, must be updated whenever
you change the format of the generated queries. On the other hand, that allows
you to keep that test simple and limited to a specific part of nepomuk.
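
Roughly something like this - a minimal sketch, where the includes and the
exact query API calls (LiteralTerm, toSparqlQuery()) are from memory and may
need adjusting to the actual nepomuk-core headers, and the expected string is
only illustrative:

// querystringtest.cpp - checks only the generated SPARQL string,
// no running nepomuk environment required.
#include <QtTest>
#include <Nepomuk2/Query/Query>        // assumed header locations
#include <Nepomuk2/Query/LiteralTerm>

class QueryStringTest : public QObject
{
    Q_OBJECT
private slots:
    void testLiteralQuery()
    {
        // build a simple full-text query via the C++ interface
        Nepomuk2::Query::Query query(
            Nepomuk2::Query::LiteralTerm(QLatin1String("nepomuk")));

        // hand-crafted expected string (illustrative only); it must be
        // updated whenever the query generation format changes
        const QString expected = QLatin1String(
            "select distinct ?r where { ?r ?p ?o . ?o bif:contains \"nepomuk\" . }");

        QCOMPARE(query.toSparqlQuery(), expected);
    }
};

QTEST_MAIN(QueryStringTest)
#include "querystringtest.moc"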

> In order to improve this situation, I started writing proper tests for the
> QueryLibrary which actually check whether the correct results are returned.
> For these tests to work, we need to push data into Nepomuk, which requires
> the entire unit testing environment described above. It also requires
> injecting data into Nepomuk, which, depending on the quantity, could take
> some time.

Other than the above, you should indeed also test whether the data is properly 
returned. If it takes time, optimize it. If it takes too long to save a few 
entries and read them again, then you probably have an issue to solve, no?

> I wrote a simple DataGenerator class, which is supposed to create contacts,
> files, emails and other data in Nepomuk. The queries are then run against
> this data, and the results are checked. Is this the right way to go about
> it?

While it sometimes makes sense to use a big database to run tests against, I
would personally recommend isolating these things. I.e., just before you read
data, write it into the DB and ensure that the previously written data is
returned. Tests should not depend on each other and should in theory be able
to run in parallel.
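
Roughly like this - just a sketch, where startFreshNepomukEnvironment(),
createContact() and runContactQuery() are hypothetical helpers standing in
for whatever your test library and the client API actually provide:

// Each test function writes the data it needs right before querying it,
// so the tests do not depend on each other and could in principle run in
// parallel against separate nepomuk instances.
#include <QtTest>

class ContactQueryTest : public QObject
{
    Q_OBJECT
private slots:
    void init()
    {
        // fresh nepomuk test environment before *each* test function
        // (hypothetical helper from the test library)
        startFreshNepomukEnvironment();
    }

    void queryReturnsJustWrittenContact()
    {
        // 1. write: create a single contact (hypothetical helper wrapping
        //    the real data management client)
        const QUrl contact = createContact(QLatin1String("Alice"));

        // 2. read: run the query-library query for contacts named "Alice"
        //    (hypothetical helper wrapping the real query client)
        const QList<QUrl> results = runContactQuery(QLatin1String("Alice"));

        // 3. verify: exactly the data written above is returned
        QCOMPARE(results.size(), 1);
        QCOMPARE(results.first(), contact);
    }
};

QTEST_MAIN(ContactQueryTest)
#include "contactquerytest.moc"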

> Backup-Restore Testing
> ----------------------------------
> Exact same problem, we need data and a test environment. Both take a lot of
> time.

Profile it then, and improve the start-up times and so on? As I said above, in
principle this should not take much time. If it does, then find out why.
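
If the time really goes into the environment setup, even something crude like
this in the setup code shows which step is slow (the start*() calls are
placeholders for whatever your scripts/library actually do; if the setup stays
in the shell scripts, running `time` on the individual commands tells you the
same):

#include <QElapsedTimer>
#include <QDebug>

// crude timing of the individual setup steps
QElapsedTimer timer;
timer.start();
startFakeDBusSession();      // placeholder
qDebug() << "fake dbus/kde session:" << timer.restart() << "ms";
startNepomukStorage();       // placeholder
qDebug() << "nepomukstorage + ontology installation:" << timer.restart() << "ms";
waitForStorageReady();       // placeholder
qDebug() << "storage ready:" << timer.elapsed() << "ms";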

> Benchmarking
> ---------------------
> I have been thinking about using this data generator in order to be able to
> quantify improvements. How fast are searches for an email when you have 100
> emails? 1k? 100k? The same is the case when pushing that kind of data into
> Nepomuk. It's fairly slow right now, and we need proper measurements.
> 
> Where do these benchmarks go? Are they supposed to be in the main repo?
> They also require this entire test environment.

Use QTest + QBENCHMARK?
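
Something along these lines - a sketch only, where the DataGenerator usage is
guessed and runEmailQuery() is a placeholder for the real query-library call:

// Data-driven benchmark: measure how query time scales with the data size.
#include <QtTest>

class QueryBenchmark : public QObject
{
    Q_OBJECT
private slots:
    void benchmarkEmailQuery_data()
    {
        QTest::addColumn<int>("emailCount");
        QTest::newRow("100 emails") << 100;
        QTest::newRow("1k emails") << 1000;
        QTest::newRow("100k emails") << 100000;
    }

    void benchmarkEmailQuery()
    {
        QFETCH(int, emailCount);

        // fill the fresh nepomuk instance with generated data
        // (DataGenerator API is assumed, adjust to the real class)
        DataGenerator generator;
        generator.createEmails(emailCount);

        QBENCHMARK {
            runEmailQuery(); // placeholder for the real query call
        }
    }
};

QTEST_MAIN(QueryBenchmark)
#include "querybenchmark.moc"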

> -----
> 
> And finally - Are these really unit tests? We aren't testing one class at a
> time. Should these reside in the autotests directory?

Nominally they are integration tests, but since they run automatically, why 
not leave them in autotests?

> Overall, I'd just like to know that I'm not doing something stupid :)
-- 
Milian Wolff
mail at milianw.de
http://milianw.de