KLink::Group

Mon Nov 15 21:36:45 CET 2004

On Monday 15 November 2004 18:56, Aaron J. Seigo wrote:
> > It's only when it's in relation to a music file that it becomes
> > an artist. [...]
>
> well, the graph that says "Bob Dylan is a musical artist" may actually look
> like: [...]

Yes, I didn't mean that exclusively.  Said generally it would have been "it's 
only when the node is related to some concrete resource that it has 
meaningful context".

> in the actual data representation, the links are implicitly grouped (by
> arriving at or leaving from the same node (or set of nodes?))...

I don't agree there, actually.  The *nodes* are grouped by sharing linkage to 
another node.  For instance a node the represents a group "head" as we called 
it groups other nodes by them being linked to that "head", but it doesn't 
really say anything about the links themselves.

The second part -- "set of nodes" -- could be more useful, but I'm not sure 
that it's the best way to express node grouping.  I don't think it makes 
sense to have links that terminate at multiple places.  I think that would 
just be a hack to make things kind of fit without introducing an additional 
structure.

I think I like explicit node grouping more than links with multiple endpoints.

> in the API it would be useful to give the programmer easy access to this
> group of links that extend from a node for traversal, as this is certainly
> a way to show the implicit structure in the graph.

Yes, I think I was fumbling around with this idea in the KLink::Search API, 
but hadn't really developed it much since in the last couple of weeks I've 
been focusing on the bits that will need to be there to get the indexer 
working.  Once that's in place it'll be natural to build up the parts of the 
API that are needed for search / traversal.

But indeed.  I think something like this will be needed:

KLink::Link::List incomingLinks() const;
KLink::Link::List outgoingLinks() const;

> we're starting to get into "how do we show the concept of a graph via the
> API?" i'd really like to see a class that represents a "star" datatype
> (node with multiple radiating, directional links) that supports the
> following things:
>
> 	o link enumeration/iteration ("let's list our links")
> 	o traversal ("move along this link")
>
>  this is a logical extension of  a Node, really. and i think the first
> point above (along with the Graph class below) may be what you are looking
> for when it comes to link groups?

I don't think so, because it still doesn't describe the links, really.  The 
paths to and from a node describe the node, rather than being meta-data about 
the type of link that it is.

I'm thinking mostly from a search point of view here with full text indexing 
in mind.  Let's say we have the word "foo".  That's a node.  We want to 
figure out what relates to "foo".  We start from the node "foo" and traverse 
the links from there (using methods like the above).  But they're going to 
lead to all sorts of other nodes -- there's not a single source for "foo".  

If we want some way to restrict the domain of the search / traversal we need  
to know something additional about those links though.  We can store that bit 
of information either in the link itself (i.e. multiple destinations) or we 
can have an external grouping mechanism and work on intersections of those 
groups.  I prefer the latter really.

This would lead to something like:

class KLink::LinkGroup;

KLink::Link::List incomingLinks(const KLink::LinkGroup::List &domains) const;
KLink::Link::List outgoingLinks(const KLink::LinkGroup::List &domains) const;

Where the intersection of the list of groups would be the search domain.  We 
might want to set up some stuff for intersections / unions, but that's easy 
enough.

> and a collection class that manages a collection of these things that
> supports, call it perhaps Graph:
>
> 	o transparent node loading ("i'm going to extend the graph in the
> direction you are traversing, so you can keep traversing in that
> direction") o frame shifting, where a frame is defined as having a "trunk"
> node (an arbitrary, and even temporary, distinction) and the links/nodes
> around it, allowing one to move the subgraph by traversing
>
> Node "iterators" that operate on a Graph, and an ABC that provides the
> basis for operations on graphs which takes a Graph and does things to it,
> e.g. maximal flow or fingerprinting.
>
> thoughts?

Iterators normally assume something that is linear and finite, so the mapping 
would be a bit quirky (since for all practical purposes the graph will be 
neither), but while at first I didn't like the idea, I may like it in some 
cases.

Specifically a generalized iterator might be an interesting way to return 
search results.  The real question is how much "look ahead" we'll need to get 
accurate results, but that would solve the problem of having a bounded set of 
results returned and figuring out what to do when you want more.  It would 
also give a pretty natural way to represent relevance and to encapsulate 
different search mechanisms.

Generalized (unsorted) traversal will yield some interesting structures, but I 
don't think it's that hard, really.

> eventually we may want to add the frills of a NodeCache and reference
> counted Nodes, opportunistic loading in Graphs, etc.. but that's all
> completely unnecessary to the initial goal, just something to keep in mind
> (or in the archives so i can remember after i forget)

Yeah -- I actually removed the d-pointers for most of the stuff for the 
moment.  At first I decided that I'd like to implement everything as being 
queried from the database on demand.  Depending on the performance of such we 
may need to do some caching later, but well, optimizing already is a bit 
silly.

-Scott

-- 
If the answer is "more lawyers" then the question shouldn't have been asked.