[Nepomuk] Loads of Questions

Mon Sep 13 10:42:22 CEST 2010

Hi Vishesh and all,

I finally got around to your email. It is a big one, and I suppose I
will not answer everything right now. But I will dig deep into your code
and try to understand most of it to be able to comment.

Let me start with some comments:
* Currently you have the IdentificationRequest class, which is very
complex, big and hard to understand. But actually each of the
resources is identified separately. Thus, I suppose it
would make more sense to have a smaller, simpler class which identifies
only one resource. And then based on that you can create an identifier
that can use a changelog as input.
Also you could have another class which simply keeps a hash of already
identified resources. This could be used as optional input to an
identification process.
Actually it would be even better to have one container of already
identified resources which you can ask to identify another one. Thus,
the IdentificationRequest becomes something like a ResourceIdentificator
which does not know about syncfile or changelog or anything but simply
has a hash of already identified resources and a method to identify
another one.
Then based on that you could create the class that can use a changelog
as input instead of a model - a derived class which you probably only
need in your service and not in the lib.
As for addIdentify(): IMHO this should not be done by the identification
class but by a client to that class because it would be a reaction to a
failed identification.

* Also if your API is synchronous IMHO there is no need for signals
informing about the result.
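
A rough C++17 sketch of the two points above, just to make the shape
concrete. All names apart from ResourceIdentificator are made up for
illustration, std::string stands in for the resource URI type, and the
lookup callback is an assumption standing in for whatever logic matches
one resource against the store. It is not the actual library API.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative only: std::string stands in for a resource URI
// (real code would more likely use KUrl/QUrl and QHash).
using Uri = std::string;

class ResourceIdentificator {
public:
    // Assumption: the matching logic is passed in as a callback that
    // tries to find the local counterpart of one resource.
    using Lookup = std::function<std::optional<Uri>(const Uri&)>;

    explicit ResourceIdentificator(Lookup lookup)
        : m_lookup(std::move(lookup)) {}

    // Identify a single resource. The call is synchronous, so the
    // result is simply the return value and no signal is involved.
    std::optional<Uri> identify(const Uri& source) {
        auto it = m_identified.find(source);
        if (it != m_identified.end())
            return it->second;            // already identified earlier
        std::optional<Uri> match = m_lookup(source);
        if (match)
            m_identified.emplace(source, *match);
        return match;                     // empty means "not identified"
    }

    // Lets a client feed a mapping back in, e.g. after it created a
    // resource itself or after a manual identification.
    void addMapping(const Uri& source, const Uri& local) {
        m_identified[source] = local;
    }

    bool isIdentified(const Uri& source) const {
        return m_identified.count(source) != 0;
    }

private:
    Lookup m_lookup;
    std::unordered_map<Uri, Uri> m_identified;  // source URI -> local URI
};
```

Nothing in here knows about syncfiles or changelogs. A class taking a
changelog as input could be layered on top of it in the service.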

Let us discuss this first and then continue - just in case I
misunderstood something.

But there is more to read below anyway. :)

On 09/06/2010 09:10 PM, Vishesh Handa wrote:
> 1. *Design question -* Generally when syncing something, it is done in 2
> steps -> Identification and Merging. During the Identification all the
> resources that could not be identified and are NOT of type
> nfo:FileDataObject are created. This is done *during the Identification
> process*. However Artem made me realize that this might not be ideal. In
> his case he likes to check if the identification was successful and
> accordingly merge the resources. In that case the IdentificationRequest
> would land up creating some Resources which would be ignored if he
> chooses not to merge. 
> 
> One solution to this is to have some kind of placeholder uri which the
> IdentificationRequest maps the resources to when creating new
> Resources, and the actual creation is done during the merge process. Do
> you approve? Or should I just let it be the way it is?

If we go the way I drafted above then no resources are created in the
identification process anyway. It is up to the client. In this case that
is Artem's code. In your case it is the syncing code which would create
resources that could not be identified and maybe put them back into the
identifier so it can make use of them in the next id request.
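
As a sketch of that split (C++17, all names hypothetical): the
identifier only answers "known or not", and the syncing client decides
to create a resource on failure and feeds the mapping back. A client
like Artem's would just call identify() and never create anything.
KnownMappings below is a minimal stand-in for the identifier, and the
create callback is an assumption standing in for the real resource
creation.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative only: std::string stands in for a resource URI.
using Uri = std::string;

// Minimal stand-in for the identifier: just the table of mappings.
struct KnownMappings {
    std::unordered_map<Uri, Uri> map;   // source URI -> local URI

    std::optional<Uri> identify(const Uri& source) const {
        auto it = map.find(source);
        if (it == map.end())
            return std::nullopt;        // identification failed
        return it->second;
    }
};

// Client-side policy of the syncing code: if identification fails,
// create the resource and put the mapping back into the identifier
// so the next request can make use of it.
Uri identifyOrCreate(KnownMappings& known, const Uri& source,
                     const std::function<Uri(const Uri&)>& create) {
    if (auto local = known.identify(source))
        return *local;
    Uri created = create(source);
    known.map[source] = created;
    return created;
}
```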

> 2. *The addition of vital properties and required properties for
> identification -* I still haven't gotten around to doing this, mainly
> because it would be a huge change that would break everything ( The
> existing syncfile format, and the internals of IR ). Do you think it is
> required? I didn't feel like starting it during the gsoc period as if I
> broke too many things I wouldn't have anything proper to submit. I'm
> CCing Artem.
>
> Btw, in case you don't remember - Vital properties would be those
> properties without which the identification would fail. And optional
> would be those which are nice if they get identified but don't
> contribute to the actual score (How would that be useful?)

I think Artem made a good point here, right?

> *1. Nepomuk database identifier -*  Syncing works pretty well when you
> do it once, but if you do it multiple times ( as most people would ) you
> have to perform the identification every time. This is not acceptable if
> the identification fails and the user has to do it manually. I was
> hoping to have some kind of identifier which could be used to uniquely
> identify a Nepomuk database. That way I could save the resource mappings
> after the initial identification and avoid re-identifying the same
> resources each time.

But if the user identifies a resource A manually as resource B in the
remote store then A will be synced on top of B - thus, the next time all
identifying properties will be available. At least in most cases. Or not?

> *2. Some way to store the last sync date - *Initially I was going to
> store the date in some config file, but that wouldn't be appropriate as
> I need to store the last sync date for each Nepomuk database. So we
> probably need some kind of ontology which could help us represent this
> internally.

What do you need the last sync date for?

> *3. Communication Medium - *The Nepomuk enabled machines need to
> communicate with each other in order to generate the correct sync files,
> and transfer them to each other. How do I go about doing that?
> Telepathy? I was looking into Avahi, but I thought I'll just ask before
> I start trying something.

You need to elaborate a bit here. Why do they need to communicate? You
mean during the sync?

> *4. The GUI - *Any ideas or anything you can think of would be great!

We will think of something nice after the rest is solved. :)

> After GSoC
> =========
> 
> Now that gsoc is over I would ideally like to start the process of
> polishing the Sync Library so that it can be moved to kdelibs. That way
> I could start using it in all the places it should be used, namely,
> Strigi, Removable Media stuff and the Akonadi Feeders. I was hoping you
> could go through the library API, and just help out. 

I did that. Comments are at the beginning of the email.

> *KUrls - *I know how you prefer KUrls over QUrls ( rightly so! ), and I
> think I should use KUrls everywhere in the library. It would however
> break the current API, which isn't a big deal since Artem and I are the
> only ones using it. Do you think we should go for it? Or should we just
> use KUrl internally? 

Use KUrl. There is no need to think about BC just yet.

> Metadata Sharing 
> ==============
> 
> Our meeting during Akademy was cut short, and there are still many
> things that need to be finalized. The main thing that needs to be taken
> care of is the new ontology for security and a couple of other things.
> Maybe we could set a date to discuss this?

I am at the Telepathy sprint next weekend. So the week after that would
be good.

Cheers,
Sebastian