Small (un-representative) benchmark on sqlite with blobs

Tue Jun 26 08:23:10 UTC 2007

On 6/25/07, David Nolden <david.nolden.kdevelop at art-master.de> wrote:
> On Monday 25 June 2007 15:59:16 Jens Dagerbo wrote:
> > On 6/25/07, Andreas Pakulat <apaku at gmx.de> wrote:
> > But.. in KDev3 the database was actually used for queries.. if you are
> > saying it should all go into a blob, why not simply use a file on
> > disk? What does the database give you at all in this case?
> >
> > And while on topic.. all data in one blob?? You're only interested in
> I was thinking about the file-on-disk thing too. But no one is talking about
> storing everything in one blob here, of course it would be many many separate
> blobs, that could be loaded each on demand.
>
> > persistence here? It seems to suggest that you will have all of the
> > data (duchain, whatever) in memory during normal operation, but surely
> > that will use up way too much memory? KDev3 kept (and persisted) only
> > the project PCS in memory, and used bdb for lookups against external
> > libraries. This to keep the memory usage down. (And with large
> > projects, this was a bit too heavy too.)
> >
> > // jens
>
> The general idea is that every du-chain ever parsed is stored in a database on
> disk, and can be loaded on-demand when needed. That way, after some usage, we
> would have du-chains for most possible versions of all commonly used
> header-files in the database, and parsing could become lightning fast even on
> a new project, while still being correct.
>
> The du-chain database would consist of this:
> - A mapping from absolute file-names to a list of environment-matching nodes
> - One environment-matching node for each du-chain
> - All the du-chains
>
> The environment-matching nodes will be needed to decide whether one of the
> stored du-chains can be used in a given environment.
>
> We cannot store it all in a flat file, because everything may change on the
> fly as code changes, including the environment-matching nodes and the
> du-chains. Any node should also be deletable from the database at any time
> (for example, when a file changes, all other versions of the file, including
> the matching-nodes, can be deleted). Also, it would be hard to load just one
> specific needed du-chain from a flat file.
>
> The only option would probably be storing everything in separate files. But
> that might either become very slow when a lot of small du-chains need to be
> loaded, or we would need to implement some complex logic to group du-chains
> that commonly appear together into bigger files, so we could load them all at
> once.
>
> I think the best solution might be storing it all in a very simple and
> efficient database, because the management-complexity would stay within a
> stable database-implementation and we wouldn't need to care much about it.
>
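
To check that I read the proposed layout correctly, here is a rough sketch
of it using Python's sqlite3 module. Everything in it is an assumption made
for illustration: the table and column names, the helper functions, and
above all the use of a plain string (env_key) with exact equality as a
stand-in for the real environment-matching logic.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not an actual KDevelop format.
SCHEMA = """
CREATE TABLE env_node (
    id        INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,   -- absolute file name
    env_key   TEXT NOT NULL    -- stand-in for the environment-matching data
);
CREATE INDEX env_node_by_path ON env_node(file_path);
CREATE TABLE duchain (
    node_id INTEGER PRIMARY KEY REFERENCES env_node(id),
    data    BLOB NOT NULL      -- opaque serialized du-chain
);
"""

def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def store_chain(db, file_path, env_key, data):
    """Add one environment-matching node plus its du-chain blob."""
    cur = db.execute(
        "INSERT INTO env_node (file_path, env_key) VALUES (?, ?)",
        (file_path, env_key))
    node_id = cur.lastrowid
    db.execute("INSERT INTO duchain (node_id, data) VALUES (?, ?)",
               (node_id, data))
    db.commit()
    return node_id

def load_chain(db, file_path, env_key):
    """Load a single blob on demand; None if no stored version matches."""
    row = db.execute(
        "SELECT d.data FROM env_node n JOIN duchain d ON d.node_id = n.id "
        "WHERE n.file_path = ? AND n.env_key = ?",
        (file_path, env_key)).fetchone()
    return None if row is None else bytes(row[0])

def invalidate_file(db, file_path):
    """Drop every stored version of a file, e.g. after the file changed."""
    db.execute(
        "DELETE FROM duchain WHERE node_id IN "
        "(SELECT id FROM env_node WHERE file_path = ?)", (file_path,))
    db.execute("DELETE FROM env_node WHERE file_path = ?", (file_path,))
    db.commit()
```

Note that the database only ever looks at the file name and the matching
key here; the du-chain itself stays opaque to it.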

If I understand this correctly, it means that if you want to use
this system to implement something like KDev3's Quick Open function or
the Ctags lookup, you would still need to suck all the data into memory
before you can perform the query?

I guess this may or may not be sufficient, depending on what features
you want to support.
An example where I don't expect it to work: "Goto declaration" for a
referenced function in a correctly included header would be reasonably
cheap, but a cool feature would be if you could ask KDevelop "given
this here printf() function, what header do I need to include?". With
a real query frontend to a database, that could be a really cheap
operation; with what you suggest, it probably can't be.
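
A small sketch of the difference, again with Python's sqlite3 module. The
schema and function names are made up for illustration, and a JSON list of
names stands in for a serialized du-chain. The point is only that a
separate, indexed declaration table can answer the question with one
lookup, while a blob-only store has to load and decode every blob.

```python
import json
import sqlite3

def build_db():
    db = sqlite3.connect(":memory:")
    # Blob-only storage: one opaque blob per header.
    db.execute("CREATE TABLE duchain (header TEXT PRIMARY KEY, data BLOB)")
    # Queryable side table (hypothetical): one row per declared name.
    db.execute("CREATE TABLE declaration (name TEXT, header TEXT)")
    db.execute("CREATE INDEX declaration_by_name ON declaration(name)")
    return db

def add_header(db, header, names):
    blob = json.dumps(names).encode()  # stand-in for a serialized du-chain
    db.execute("INSERT INTO duchain VALUES (?, ?)", (header, blob))
    db.executemany("INSERT INTO declaration VALUES (?, ?)",
                   [(name, header) for name in names])

def headers_via_blobs(db, name):
    # Without a query frontend: read and decode every stored blob.
    rows = db.execute("SELECT header, data FROM duchain")
    return sorted(h for h, data in rows if name in json.loads(data))

def headers_via_query(db, name):
    # With a query frontend: a single indexed lookup.
    rows = db.execute("SELECT header FROM declaration WHERE name = ?",
                      (name,))
    return sorted(h for (h,) in rows)
```

Both functions return the same answer; the first one just gets more
expensive with every header that is stored.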

// jens