Small (un-representative) benchmark on sqlite with blobs

Mon Jun 25 18:01:57 UTC 2007

On Monday 25 June 2007 15:59:16 Jens Dagerbo wrote:
> On 6/25/07, Andreas Pakulat <apaku at gmx.de> wrote:
> But.. in KDev3 the database was actually used for queries.. if you are
> saying it should all go into a blob, why not simply use a file on
> disk? What does the database give you at all in this case?
>
> And while on topic.. all data in one blob?? You're only interested in

I was thinking about the file-on-disk thing too. But no one is talking about
storing everything in one blob here; of course it would be many, many separate
blobs, each of which could be loaded on demand.

> persistence here? It seems to suggest that you will have all of the
> data (duchain, whatever) in memory during normal operation, but surely
> that will use up way too much memory? KDev3 kept (and persisted) only
> the project PCS in memory, and used bdb for lookups against external
> libraries. This to keep the memory usage down. (And with large
> projects, this was a bit too heavy too.)
>
> // jens

The general idea is that every du-chain ever parsed is stored in a database on
disk and can be loaded on demand. That way, after some usage, the database
would hold most possible versions of all commonly used header files, and
parsing could become lightning fast even on a new project, while still being
correct.

The du-chain database would consist of this:
- A mapping from absolute file-names to a list of environment-matching nodes
- One environment-matching node for each du-chain
- All the du-chains

The environment-matching nodes will be needed to decide whether one of the 
stored du-chains can be used in a given environment.
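
To make the layout above a bit more concrete, here is a minimal sketch using
Python's sqlite3 module. All table, column and function names are made up for
illustration, and the serialized nodes and du-chains are just opaque blobs
here. How an environment-matching node actually decides whether it fits is
left to the caller, since that logic is not part of this sketch.

```python
import sqlite3

# Hypothetical schema: names are illustrative only, not an existing KDevelop format.
SCHEMA = """
CREATE TABLE IF NOT EXISTS env_node (
    id        INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL,   -- absolute file name of the parsed file
    env_data  BLOB NOT NULL    -- serialized environment-matching node
);
CREATE INDEX IF NOT EXISTS env_node_by_file ON env_node(file_name);
CREATE TABLE IF NOT EXISTS duchain (
    node_id INTEGER PRIMARY KEY REFERENCES env_node(id),
    data    BLOB NOT NULL      -- serialized du-chain, one per matching node
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def store(db, file_name, env_data, chain_data):
    """Store one du-chain together with its environment-matching node."""
    with db:  # single transaction for both rows
        cur = db.execute(
            "INSERT INTO env_node (file_name, env_data) VALUES (?, ?)",
            (file_name, env_data),
        )
        node_id = cur.lastrowid
        db.execute(
            "INSERT INTO duchain (node_id, data) VALUES (?, ?)",
            (node_id, chain_data),
        )
    return node_id

def candidates(db, file_name):
    """Return (node_id, env_data) for every stored version of a file.

    Only the matching nodes are read here, not the du-chain blobs.
    """
    return db.execute(
        "SELECT id, env_data FROM env_node WHERE file_name = ? ORDER BY id",
        (file_name,),
    ).fetchall()

def load_chain(db, node_id):
    """Load a single du-chain blob on demand, or None if it is not stored."""
    row = db.execute(
        "SELECT data FROM duchain WHERE node_id = ?", (node_id,)
    ).fetchone()
    return row[0] if row else None
```

The intended usage would be: call candidates() for a file, check each returned
matching node against the current environment, and call load_chain() only for
the node that fits.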

We cannot store it all in a flat file, because everything may change on the
fly as code changes, including the environment-matching nodes and the
du-chains. Any node should also be deletable from the database at any time (for
example, when a file changes, all other versions of that file, including their
matching nodes, can be deleted). It would also be hard to load just one
specific du-chain from a flat file.
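
As a rough illustration of the deletion case, the following sketch drops every
stored version of a file, matching nodes and du-chains alike, in one
transaction. It again uses Python's sqlite3 module, and the schema and the
names add_version and invalidate_file are assumptions made up for this example.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create a small hypothetical schema: matching nodes plus du-chain blobs."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS env_node (
            id        INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL,
            env_data  BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS duchain (
            node_id INTEGER PRIMARY KEY,
            data    BLOB NOT NULL
        );
    """)
    return db

def add_version(db, file_name, env_data, chain_data):
    """Store one parsed version of a file (matching node + du-chain)."""
    with db:
        cur = db.execute(
            "INSERT INTO env_node (file_name, env_data) VALUES (?, ?)",
            (file_name, env_data),
        )
        db.execute(
            "INSERT INTO duchain (node_id, data) VALUES (?, ?)",
            (cur.lastrowid, chain_data),
        )
    return cur.lastrowid

def invalidate_file(db, file_name):
    """Delete all stored versions of a changed file.

    Both deletes run in one transaction, so the database never holds a
    du-chain without its matching node or the other way round.
    Returns the number of versions removed.
    """
    with db:
        db.execute(
            "DELETE FROM duchain WHERE node_id IN "
            "(SELECT id FROM env_node WHERE file_name = ?)",
            (file_name,),
        )
        cur = db.execute(
            "DELETE FROM env_node WHERE file_name = ?", (file_name,)
        )
    return cur.rowcount
```

Entries for other files are untouched, which is the part that would be awkward
to get right with one flat file.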

The only file-based option would probably be to store everything in separate
files. But that might either become very slow when a lot of small du-chains
need to be loaded, or we would need to implement some complex logic to group
du-chains that commonly appear together into bigger files, so that we could
load them all at once.

I think the best solution might be storing it all in a very simple and 
efficient database, because the management-complexity would stay within a 
stable database-implementation and we wouldn't need to care much about it.

greetings, David