Fwd: follow up from our talk / response to your paper

Thu Oct 21 02:55:00 CEST 2004

Hmm, tried sending this half an hour ago and it never went through, trying 
again.

-----------------------------------------------------------

Hi folks -- I'm now back from my vacation in Chile.  I ended up doing one talk 
on this stuff at one of the universities there, and as it turned out Ricardo 
Baeza (author of Modern Information Retrieval) is a professor at another 
university in Santiago (Universidad de Chile) and invited me out for coffee 
while I was there.

He recommended one of his papers to me and asked that I respond to it.  I've 
put a copy of it here:

http://developer.kde.org/~wheeler/files/baeza-1.pdf

The stuff below is my response to the paper and our conversation.

I'm just starting to get back into working on KLink and whatnot.  I'd like to 
set up a mailing list so that I don't have to dig out the list of CC's in the 
future.  The list would be public, but I don't have any plans on announcing 
it as I'd like to keep it fairly quiet at the moment.  Is there anyone 
currently getting this that would prefer to not be added to the list?

Cheers,

-Scott

----------  Forwarded Message  ----------

Subject: follow up from our talk / response to your paper
Date: Tuesday 19 October 2004 13:25
From: Scott Wheeler <wheeler at kde.org>
To: rbaeza at dcc.uchile.cl

Hi Ricardo --

Ok -- so I finally am getting settled back into my normal life and got around 
to reading your paper.  As you probably expected, there are many points which 
I find interesting and probably worth some commentary.

First, I'll start with what's clearly common to the things that we've been 
looking at.  We're both looking at ways to move the desktop 
towards a search centric interface -- and what some of the steps are in 
making that meaningful.  We're both also focusing on relationships between 
sets of attributes and "data" (though you're working more on blurring those 
lines).  The other thing that really resonated with me was the fact that it 
was repeated a few times that the current method of organizing information 
completely based on arbitrary bits of a user's memory is completely broken.  
I've repeated something very similar in my recent talks.

I guess there are a number of observations that seem relevant -- some 
academic, some practical -- and I'm probably freely mixing concepts from 
different parts of the paper.  Hopefully I'll manage with enough clarity to 
be useful.

=== Approaches ============================================

I think the first notable difference is that we're starting from different 
places.  One of the things that you're explicitly building on is the idea of 
"what if we could throw everything away [...]".  I think a lot of the ideas 
there hinge on getting things right at the lower layers.

From my side I've worked backwards from the interface.  I've taken the 
perspective of "What do I want to see in the interface?" and worked backwards 
from that question and towards building the necessary bits of technology to 
make something like that interface possible.

I've also focused on the idea of relationships of information, because at 
least for the set of interface ideas that I'm working with that's the most 
crucial element.  Of course from an implementation standpoint the system that 
I'm working on maps more naturally to an AVS than an HFS, but because I can't 
work from the perspective of being able to throw everything away, I'm working 
at a higher level and looking at what can be bolted onto the current HFS 
structures that will make contextual navigation and search more natural.

=== Domains ===============================================

Because I'm working with a web analogy -- represented as a graph -- there's 
also the notion of domains, called a NodeGroup in the current API, that 
represents a similar construct.

However, one of the things that I've been thinking about there -- and this 
applies to domains as well -- is whether that's best left as an emergent 
property of a graph of connected information rather than an explicit 
grouping.  I'm not 
even sure if that's possible or practical in the set of applications that I'm 
looking at, but insofar as I've been attempting to use the WWW as an analogy 
to the infrastructure that I'm creating, I'm tempted to say that domains are 
almost ready to be deprecated.

I think more of what we find on the web are more dynamic groupings based on 
contextual linkage that tend to define much looser "domains" -- I think the 
notion of a grouping of information is still useful in the abstract, but I'm 
not yet quite certain that explicit grouping is an idea that's going to stay 
around indefinitely.  At least on the web at the present moment explicit 
domains have largely become useless, whereas tightly coupled bits of 
information that emerge from a set of relationships much more often 
represent a conceptual "domain" for a given set of information than an 
explicit grouping does.
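
As a rough illustration of a grouping that emerges from relationships rather 
than being declared, the sketch below derives loose "domains" as connected 
components over sufficiently strong links.  The function name, the 
(a, b, weight) link tuples and the 0.5 threshold are all assumptions made for 
the example; none of them come from the KLink API or from the paper.

```python
from collections import defaultdict


def emergent_groups(links, min_weight=0.5):
    """Group nodes that are connected by links of at least min_weight.

    links is an iterable of (a, b, weight) tuples (hypothetical format).
    Returns a list of sets, one per emergent grouping.
    """
    # Build an undirected adjacency map from the strong links only.
    neighbors = defaultdict(set)
    for a, b, weight in links:
        if weight >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen = set()
    groups = []
    for node in neighbors:
        if node in seen:
            continue
        # Depth-first walk collects everything reachable from this node.
        group = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Illustrative data: the weak link between "c" and "d" is ignored, so two
# groupings emerge without anyone having declared them.
example_links = [
    ("a", "b", 0.9),
    ("b", "c", 0.8),
    ("c", "d", 0.1),
    ("d", "e", 0.7),
]
groups = emergent_groups(example_links)
```

Nothing here is stored as a group; the groupings are recomputed from the 
links, which is the sense in which they are emergent.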

I think our current representations of Domains (or for us NodeGroups) solve 
similar problems -- they represent logical groupings of information -- we've 
introduced them to solve things like mime-type associations or other 
information that's necessary, but I've already been wondering if the need for 
such is indicative of weaknesses elsewhere in the framework.

This is one that I'm still not sure about, but I'd be interested in your 
opinions on it.

=== Domains & Documents vs. Objects =======================

Another thing that I wondered about in your paper was that given the starting 
point -- that everything could be thrown out -- and the desire to move away 
from arbitrary constructs why there was still the inclusion of a separation 
between documents and domains.

Was there a reason to not simply use a generalized "Object" abstraction where 
an attribute of an object could be another object or list of objects?
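
To make the question concrete, here is a minimal sketch of such a generalized 
object, assuming attributes are held in a plain dictionary.  The names Obj 
and Value are hypothetical and chosen only for the example; they are not 
taken from the paper or from KLink.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical type: an attribute value is either a plain value, another
# object, or a list of objects.
Value = Union[str, int, float, "Obj", List["Obj"]]


@dataclass
class Obj:
    """A generalized object: nothing but a bag of named attributes."""
    attributes: Dict[str, Value] = field(default_factory=dict)


# Two "documents" in the traditional sense.
paper = Obj({"title": "baeza-1.pdf", "kind": "paper"})
notes = Obj({"title": "response notes", "kind": "mail"})

# A "domain" is then just an object whose attribute holds other objects.
research = Obj({"name": "research", "members": [paper, notes]})

# An attribute can also point directly at a single other object.
notes.attributes["in_reply_to"] = paper
```

In this form the document/domain distinction disappears: both are the same 
type, and grouping is expressed through ordinary attributes.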

This was something that my thoughts tended towards when I was thinking on how 
I could map some of my own ideas to the idea of generalized domains (more on 
that in a moment) and found that I would like domains to have properties.  
So -- in a system where all assumptions can be thrown out, what is the 
advantage of differentiating between documents and domains?

=== Lack of Addressability ================================

When looking at how to build a useful search based interface framework I 
stumbled across a few issues, which led to other issues, and so on.

Initially I was working with two different conceptual problems -- two things 
that I thought were missing on the modern desktop:  useful search and 
linkage.  After a bit of thought and talking to others at the first 
conference where I presented some of these ideas it became clear that these 
were really part of the same problem.

At least from my (admittedly small) knowledge of modern web-based search 
systems, notably Google, relevance is seen as an emergent property of context 
-- in a nutshell, something's more important if it has a bunch of related 
things pointing to it.  This idea seemed fairly natural to me and immediately 
pointed to one of the other deficiencies (specifically one that I was already 
interested in) on the modern desktop:  there is no generalized way to link 
information on the desktop even in the somewhat crude way that we're able to 
on the WWW.  On the other hand the idea of linked and related information 
makes a lot more sense on the desktop than it even does on the WWW.  So, what 
do we need to get there?

Well, the first thing that was missing was an idea of resource addressability.  
We needed some way to point from a specific place in one resource to a 
specific place in another.

Once there was the abstraction for addressability this made it pretty easy to 
build the idea of directional links between two addresses.  These will be 
weighted in terms of how they're generated.  The idea in such a system is 
that there will be a mix of types of links -- some gathered from metadata 
that points back to its source, some from explicit user or developer 
connection and some gathered based on usage patterns.
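
A minimal sketch of what addresses and weighted, directional links could look 
like is below.  Everything in it is an assumption for illustration: the 
class names, the resource/fragment split and the specific weights are 
invented here and are not the KLink design.

```python
from dataclasses import dataclass
from enum import Enum


class LinkOrigin(Enum):
    """How a link was generated (hypothetical categories)."""
    METADATA = "metadata"   # gathered from metadata pointing at its source
    EXPLICIT = "explicit"   # created by a user or developer
    USAGE = "usage"         # inferred from usage patterns


# Assumed default weights; the real values would need tuning.
DEFAULT_WEIGHT = {
    LinkOrigin.EXPLICIT: 1.0,
    LinkOrigin.METADATA: 0.6,
    LinkOrigin.USAGE: 0.3,
}


@dataclass(frozen=True)
class Address:
    """A specific place in a resource."""
    resource: str        # e.g. a file path or URL
    fragment: str = ""   # location inside the resource, empty for the whole


@dataclass(frozen=True)
class Link:
    """A directional, weighted link between two addresses."""
    source: Address
    target: Address
    origin: LinkOrigin
    weight: float


def make_link(source, target, origin):
    """Create a link whose weight depends on how it was generated."""
    return Link(source, target, origin, DEFAULT_WEIGHT[origin])


mail = Address("mail/inbox/42", "line=10")
contact = Address("contacts/ricardo")
explicit = make_link(mail, contact, LinkOrigin.EXPLICIT)
observed = make_link(mail, contact, LinkOrigin.USAGE)
```

Keeping the origin on the link, rather than only the weight, leaves room to 
re-weight whole classes of links later without regenerating them.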

Once that's there a lot of other things come for free; the original idea was 
to build such a system to make search more feasible, but as a side effect we 
also get the ability to link information across applications and documents.  
With the goal of breaking down current interface hierarchies this is a nice 
bonus.

But of course this also gives us a place to start building a search algorithm 
that uses graph traversal and path lengths / weights to determine relevance.  
At least that's the idea.  Get all of the information we can into a 
graph-like structure (likely stored in a relational database, but that's just 
an implementation detail) and do cool things with it.
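
One way such a traversal could work is sketched below, purely as an 
assumption: each link costs 1/weight to cross, the cheapest path to each node 
is found with Dijkstra's algorithm, and relevance falls off as 
1 / (1 + cost).  The function name, the cost formula and the max_cost cutoff 
are illustrative choices, not a description of an existing implementation.

```python
import heapq


def relevance_from(graph, start, max_cost=5.0):
    """Score nodes by how cheaply they can be reached from start.

    graph maps a node to a list of (neighbor, weight) pairs, with weight in
    (0, 1].  Stronger links are cheaper to cross (cost = 1 / weight).
    Returns a dict of node -> relevance in (0, 1), excluding start.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + 1.0 / weight
            if new_cost > max_cost:
                continue  # too far away to be considered related
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return {n: 1.0 / (1.0 + c) for n, c in best.items() if n != start}


# Illustrative graph: node names and weights are made up.
example_graph = {
    "mail": [("contact", 1.0), ("photo", 0.5)],
    "contact": [("document", 1.0)],
}
scores = relevance_from(example_graph, "mail")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "contact" ranks first because it is one strong link away, while "photo" 
(one weak link) and "document" (two strong links) end up with the same score.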

Now, going back to the original point of this section -- it seems that the 
idea of addressability is missing in your paper.  This may be seen as 
something that's above the layers that you're interested in, but in looking 
at how to build user interfaces based on this kind of stuff it has been 
pretty fundamental to my thinking.

Beyond that, linkage also seems to be missing.  (Again, possibly for similar 
reasons.)  Domains are similar in many ways -- but this is where I got back 
to the idea of domains needing properties in order to map a link onto a 
domain (i.e. a domain with two members and some attributes describing the 
relationship).

=== Network Issues ========================================

One of the things that I think is a problem with both of our systems -- at 
least as I see them -- is how to deal with networks and moving resources 
across networks without destroying the context that they typically exist in.

In one sense, if we can assume network persistence, we could simply address 
objects across the network and have references to them stored in our local 
resource store.  That's at least the best solution that I've been able to 
come up with at the moment.

However, since I've still got the backdrop of traditional files, there's 
still *something* that the users can do with the data once it's been moved 
across a network.  This would seem to break down even more with the idea of 
a purely "document composed of attributes" system -- especially since those 
attributes aren't particular to a specific document.

I guess the entire group of related objects could be sent across in some kind 
of encapsulated transport or something, but here I'm just throwing around 
ideas.
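
Purely to illustrate the "encapsulated transport" idea, the sketch below 
bundles an object together with everything within a given number of links of 
it and serializes the result as JSON.  The function name, the dictionary 
layout and the hop limit are assumptions for the example; it ignores the 
harder questions of identity and merging on the receiving side.

```python
import json
from collections import deque


def bundle(root, objects, links, max_hops=1):
    """Serialize root plus its nearby context for transport.

    objects maps an id to a dict of attributes; links maps an id to a list
    of ids it points to (both hypothetical formats).  Only objects within
    max_hops links of root are included.
    """
    # Breadth-first walk, recording how many hops away each object is.
    hops = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue
        for neighbor in links.get(current, []):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)

    payload = {
        "root": root,
        "objects": {oid: objects[oid] for oid in hops},
        # Keep only links whose target is also travelling in the bundle.
        "links": {
            oid: [t for t in links.get(oid, []) if t in hops] for oid in hops
        },
    }
    return json.dumps(payload, sort_keys=True)


# Illustrative data: ids and attributes are made up.
example_objects = {
    "report": {"title": "Q3 report"},
    "author": {"name": "Scott"},
    "employer": {"name": "Example Corp"},
}
example_links = {"report": ["author"], "author": ["employer"]}
wire = bundle("report", example_objects, example_links, max_hops=1)
```

With max_hops=1 the report travels with its author but not with the author's 
employer, so how much context survives the move is an explicit trade-off.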

This is one of the rather nasty issues that's stuck around for me at the 
moment; if you've got some sort of magical solution to such, I'd certainly be 
interested in hearing it.  :-)

===========================================================

Ok -- so those were the major groups of issues that jumped out at me.  One 
other person from the KDE project and I have worked out an informal design 
document, which I'll probably formalize and kind of structure in a paper-like 
way (though if I publish anything on such it would probably be in a "light" 
format for the Linux-geared press, since that's really the only thing that 
I'm connected to at the moment).

Anyway -- looking forward to your thoughts if you have time.  Again thanks for 
meeting with me in Santiago.

Cheers,

-Scott

-------------------------------------------------------