[Nepomuk] Loads of Questions

Mon Sep 13 16:59:16 CEST 2010

Hey!

On Mon, Sep 13, 2010 at 2:12 PM, Sebastian Trüg <trueg at kde.org> wrote:

> Hi Vishesh and all,
>
> I finally got to your email. This is in fact a big one and I suppose I
> will not answer everything right now. But I will dig deep into your code
> and try to understand most of it to be able to comment.
>
> Let me start with some comments:
> * Currently you have the IdentificationRequest class which is very
> big, complex and hard to understand. But actually each of the
> resources to identify is identified separately. Thus, I suppose it
> would make more sense to have a smaller, simpler class which identifies
> only one resource.

I had thought of that earlier, but the problem is that identifying one
resource may depend on another, so it's easier to group them together.

> And then based on that you can create an identifier
> that can use a changelog as input.
> Also you could have another class which simply hashes already identified
> resources. This could be used as optional input to an identification
> process.
>

What do you mean by "simply hashes already identified resources"?

> Actually it would be even better to have one container of already
> identified resources which you can ask to identify another one. Thus,
> the IdentificationRequest becomes something like a ResourceIdentificator
> which does not know about syncfile or changelog or anything but simply
> has a hash of already identified resources and a method to identify
> another one.
>

This seems like a nice idea.
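The ResourceIdentifier idea above can be sketched roughly as follows. The class and method names (ResourceIdentifier, identify(), addIdentified()) are hypothetical, plain std::string stands in for KUrl/QUrl so the example compiles on its own, and the matching rule (first local candidate carrying every identifying property wins, no scoring) is a simplifying assumption. It also shows one way to handle resources that depend on each other: once a parent resource has been identified and fed back in, references to it are translated before matching.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types: plain strings stand in for KUrl/QUrl and Nepomuk
// property values so the sketch compiles on its own.
using Uri = std::string;
using Properties = std::multimap<std::string, std::string>;  // property -> value

struct Resource {
    Uri uri;
    Properties properties;  // identifying properties only
};

class ResourceIdentifier {
public:
    // Records a known mapping, e.g. a resource the client created or the
    // user identified manually after a failed identification.
    void addIdentified(const Uri& remoteUri, const Uri& localUri) {
        m_identified[remoteUri] = localUri;
    }

    // Looks for a local resource matching `res`. Never creates anything:
    // on failure it returns std::nullopt and the caller decides what to do.
    std::optional<Uri> identify(const Resource& res,
                                const std::vector<Resource>& localStore) const {
        if (auto it = m_identified.find(res.uri); it != m_identified.end())
            return it->second;
        for (const Resource& candidate : localStore) {
            if (matches(res, candidate))
                return candidate.uri;
        }
        return std::nullopt;
    }

private:
    // A candidate matches if it carries every identifying property of `res`.
    // Values referring to already-identified resources are translated first,
    // which is how identifying one resource helps identify another.
    bool matches(const Resource& res, const Resource& candidate) const {
        if (res.properties.empty())
            return false;  // nothing to identify by
        for (const auto& [prop, value] : res.properties) {
            const Uri expected = translate(value);
            auto [first, last] = candidate.properties.equal_range(prop);
            bool found = false;
            for (auto it = first; it != last && !found; ++it)
                found = (it->second == expected);
            if (!found)
                return false;
        }
        return true;
    }

    // Maps a remote URI to its local counterpart if known; other values
    // (literals, unknown URIs) are returned unchanged.
    Uri translate(const std::string& value) const {
        auto it = m_identified.find(value);
        return it == m_identified.end() ? value : it->second;
    }

    std::map<Uri, Uri> m_identified;  // remote URI -> local URI
};
```

Since identify() never creates resources, the client (the sync service, or Artem's code) decides what to do on failure and can register the result with addIdentified() so later requests benefit from it.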

> Then based on that you could create the class that can use a changelog
> as input instead of a model - a derived class which you probably only
> need in your service and not in the lib.
>

The inputs it should be able to take are: a model, a syncfile (actually
just the identFile; the changelog really isn't required) and a list of
statements for Strigi, which is basically a model.

It doesn't make any sense for other users of the library to be encumbered
with the knowledge of changelogs and identification sets. That's why I had
added convenience functions and wrappers, but I think your approach would be
better. I'll try something.

> As for addIdentify(): IMHO this should not be done by the identification
> class but by a client to that class because it would be a reaction to a
> failed identification.
>

I agree.

>
> * Also if your API is synchronous IMHO there is no need for signals
> informing about the result.
>
> Let us discuss this first and then continue - just in case I
> misunderstood something.
>
> But there is more to read below anyway. :)
>
> On 09/06/2010 09:10 PM, Vishesh Handa wrote:
> > 1. *Design question -* Generally when syncing something, it is done in 2
> > steps -> Identification and Merging. During the Identification all the
> > resources that could not be identified and are NOT of type
> > nfo:FileDataObject are created. This is done *during the Identification
> > process*. However Artem made me realize that this might not be ideal. In
> > his case he likes to check if the identification was successful and
> > merge the resources accordingly. In that case the IdentificationRequest
> > would end up creating some resources which would be ignored if he
> > chooses not to merge.
> >
> > One solution to this is to have some kind of placeholder uri which the
> > IdentificationRequest maps the resources to when creating new
> > resources, and the actual creation is done during the merge process. Do
> > you approve? Or should I just let it be the way it is?
>
> If we go the way I drafted above then no resources are created in the
> identification process anyway. It is up to the client. In this case that
> is Artem's code. In your case it is the syncing code which would create
> resources that could not be identified and maybe put them back into the
> identifier so it can make use of them in the next id request.
>
> > 2. *The addition of vital properties and required properties for
> > identification -* I still haven't gotten around to doing this, mainly
> > because it would be a huge change that would break everything (the
> > existing syncfile format and the internals of IR). Do you think it is
> > required? I didn't want to start it during the GSoC period because if I
> > broke too many things I wouldn't have anything proper to submit. I'm
> > CCing Artem.
> >
> > Btw, in case you don't remember - Vital properties would be those
> > properties without which the identification would fail. And optional
> > would be those which are nice if they get identified but don't
> > contribute to the actual score (How would that be useful?)
>
> I think Artem made a good point here, right?
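The vital/optional distinction from point 2 could be expressed as a scoring function along these lines. This is only a sketch: PropertyKind, identificationScore() and the exact rules (a mismatching vital property rejects the candidate, only "scored" properties count toward the score, optional ones never do) are assumptions made for illustration, not what IdentificationRequest currently does.

```cpp
#include <map>
#include <string>

// Hypothetical: one value per property keeps the sketch short.
using Properties = std::map<std::string, std::string>;

enum class PropertyKind { Vital, Scored, Optional };

// Returns -1.0 if a vital property does not match (identification fails),
// otherwise the fraction of "scored" properties that match, in [0, 1]
// (1.0 if there are no scored properties at all).
// Properties missing from `kinds` are treated as Scored.
double identificationScore(const Properties& remote, const Properties& candidate,
                           const std::map<std::string, PropertyKind>& kinds) {
    int scored = 0;
    int matched = 0;
    for (const auto& [prop, value] : remote) {
        const auto k = kinds.find(prop);
        const PropertyKind kind = (k == kinds.end()) ? PropertyKind::Scored : k->second;
        const auto c = candidate.find(prop);
        const bool match = (c != candidate.end() && c->second == value);
        if (kind == PropertyKind::Vital && !match)
            return -1.0;
        if (kind == PropertyKind::Scored) {
            ++scored;
            if (match)
                ++matched;
        }
        // Optional properties never affect the score.
    }
    return scored == 0 ? 1.0 : static_cast<double>(matched) / scored;
}
```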
>
> > *1. Nepomuk database identifier -*  Syncing works pretty well when you
> > do it once, but if you do it multiple times (as most people would) you
> > have to perform the identification every time. This is not acceptable if
> > the identification fails and the user has to do it manually. I was
> > hoping to have some kind of identifier which could be used to uniquely
> > identify a Nepomuk database. That way I could save the resource mappings
> > after the initial identification and avoid re-identifying the same
> > resources each time.
>
> But if the user identifies a resource A manually as resource B in the
> remote store then A will be synced on top of B - thus, the next time all
> identifying properties will be available. At least in most cases. Or not?
>

In some cases, yes, but mostly no. For example, take a file whose only
identifying properties are its URL and filename. If both of them have
changed, the user would have to re-identify the file each time. This is a
fairly common case for files that Strigi has not indexed or cannot index.
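A per-database mapping cache along these lines would avoid asking the user to identify the same file twice. It assumes that some unique identifier per Nepomuk database exists (which is exactly what is being asked for above); MappingStore and its methods are hypothetical, and the sketch only keeps mappings in memory, whereas real code would have to persist them somewhere.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical: remembers identifications per remote Nepomuk database,
// keyed by (assumed) database identifier and remote resource URI.
class MappingStore {
public:
    void remember(const std::string& databaseId, const std::string& remoteUri,
                  const std::string& localUri) {
        m_mappings[{databaseId, remoteUri}] = localUri;
    }

    // Returns the local URI saved for this remote resource, if any.
    std::optional<std::string> lookup(const std::string& databaseId,
                                      const std::string& remoteUri) const {
        auto it = m_mappings.find({databaseId, remoteUri});
        if (it == m_mappings.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> m_mappings;
};
```

Keying on the database identifier matters because the same remote URI could, in principle, come from different machines.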

>
> > *2. Some way to store the last sync date - *Initially I was going to
> > store the date in some config file, but that wouldn't be appropriate as
> > I need to store the last sync date for each Nepomuk database. So we
> > probably need some kind of ontology which could help us represent this
> > internally.
>
> What do you need the last sync date for?
>

I need the last sync date to generate a proper sync file when syncing. Only
the modifications made to the database since the last sync date would be
used; it doesn't make any sense to transmit all the changes every time.
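Filtering the changelog by the last sync date could look roughly like this. ChangeRecord is a hypothetical stand-in for whatever the changelog actually stores, and treating the boundary as "strictly after the last sync" is an assumption.

```cpp
#include <chrono>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical changelog entry; the real changelog format may differ.
struct ChangeRecord {
    Clock::time_point time;
    bool added;  // true = statement added, false = statement removed
    std::string subject, predicate, object;
};

// Keeps only the changes made strictly after the last sync with a peer,
// preserving their original order.
std::vector<ChangeRecord> changesSince(const std::vector<ChangeRecord>& changelog,
                                       Clock::time_point lastSync) {
    std::vector<ChangeRecord> result;
    for (const ChangeRecord& r : changelog) {
        if (r.time > lastSync)
            result.push_back(r);
    }
    return result;
}
```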

>
> > *3. Communication Medium - *The Nepomuk enabled machines need to
> > communicate with each other in order to generate the correct sync files,
> > and transfer them to each other. How do I go about doing that?
> > Telepathy? I was looking into Avahi, but I thought I'd just ask before
> > I start trying something.
>
> You need to elaborate a bit here. Why do they need to communicate? You
> mean during the sync?
>
>
They would need to communicate once in order to convey their respective sync
dates and to transfer the generated syncfiles. In the worst case we can
have the user transfer the syncfiles manually, but I would prefer having
a proper mechanism in place.

> > *4. The GUI - *Any ideas or anything you can think of would be great!
>
> We will think of something nice after the rest is solved. :)
>
>
> > After GSoC
> > =========
> >
> > Now that GSoC is over I would ideally like to start the process of
> > polishing the Sync Library so that it can be moved to kdelibs. That way
> > I could start using it in all the places it should be used, namely,
> > Strigi, Removable Media stuff and the Akonadi Feeders. I was hoping you
> > could go through the library API, and just help out.
>
> I did that. Comments are in the beginning of the email.
>
> > *KUrls - *I know how you prefer KUrls over QUrls ( rightly so! ), and I
> > think I should use KUrls everywhere in the library. It would however
> > break the current API, which isn't a big deal since Artem and I are the
> > only ones using it. Do you think we should go for it? Or should we just
> > use KUrl internally?
>
> Use KUrl. There is no need to think about BC just yet.
>
>
Okay!

> > Metadata Sharing
> > ==============
> >
> > Our meeting during Akademy was cut short, and there are still many
> > things that need to be finalized. The main thing that needs to be taken
> > care of is the new ontology for security and a couple of other things.
> > Maybe we could set a date to discuss this?
>
> I am on the Telepathy sprint next weekend. So the week after that would
> be good.
>

Enjoy the sprint!

- Vishesh Handa

> Cheers,
> Sebastian
>