Small (un-representative) benchmark on sqlite with blobs

Mon Jun 25 08:49:37 UTC 2007

Hi,

On 25 Jun 2007, at 02:40, Andreas Pakulat wrote:

> Hi,
>
> David, Kris and myself had a (short) discussion about how to persist
> duchain data. Especially with multi-projects in mind we might get quite
> some data.

My problem with SQL is its textual representation. In general the
engines are pretty fast, but unfortunately you have to generate and
parse SQL statements for every single operation. Think about the
resulting SQL statement when you ask for all the symbols in the global
namespace ;-) That takes quite a bit of resources, especially in a
real-time environment. KDevelop is not a compiler; you don't need to
store a lot of information in its persistent storage. I think you can
use an approach similar to the per-project PCS file I did for
KDevelop 3. The project's PCS file is a dump of the Code Model, and
KDevelop uses it to speed up project loading. My feeling is that you
just need a better file format for the PCS file. What do you need to
store in the PCS file? For sure:

  * The macro definitions
  * The type table
  * The name table
  * The file table
  * The symbol table (variables, functions, classes, uses, ...)
  * The scope chain

You can encode the type table as a vector of unique type ids. In
general the type table is very compact. You have a fixed set of
primitive types (fewer than 20), the function signatures (I think in Qt
we have about 2000 different signatures), the class definitions, and
then a sequence of arrays, references, pointers, and pointers to members.
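
To make the "vector of unique type ids" idea concrete, here is a minimal
sketch. All the names (TypeKind, TypeEntry, TypeTable, intern) are
hypothetical and are not KDevelop classes; it only assumes that a
derived type can be described by its kind plus the id of the type it
refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical names for illustration only; not KDevelop's real types.
enum class TypeKind : std::uint8_t { Primitive, Pointer, Reference, Array };

struct TypeEntry {
    TypeKind kind;
    std::uint32_t payload;  // primitive code, or id of the pointee/element type

    bool operator<(const TypeEntry& other) const {
        return std::tie(kind, payload) < std::tie(other.kind, other.payload);
    }
};

class TypeTable {
public:
    // Returns the id of an identical existing entry, or appends a new one.
    // The id is simply the entry's position in the vector.
    std::uint32_t intern(TypeEntry entry) {
        auto it = index_.find(entry);
        if (it != index_.end())
            return it->second;
        auto id = static_cast<std::uint32_t>(entries_.size());
        entries_.push_back(entry);
        index_.emplace(entry, id);
        return id;
    }

    const TypeEntry& at(std::uint32_t id) const {
        assert(id < entries_.size());
        return entries_[id];
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<TypeEntry> entries_;            // this is what gets written to disk
    std::map<TypeEntry, std::uint32_t> index_;  // in-memory only, rebuilt on load
};
```

Since entries refer to each other by id rather than by pointer, the
vector can be written out as it is; only the lookup index has to be
rebuilt after loading.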

The symbol table is just a sequence of symbols, so you can store the
array of symbols. Same thing for the name and file tables.

The scope chain is more interesting. A scope in C++ has stack-like
access, and for each symbol you need to know the original scope of
the symbol and the symbols it shadows. You can pretty much encode the
scope with a pointer to the previous scope, an array of the symbols
introduced in the scope, and an array of buckets (linked lists of
symbols stored in reverse order). You find your bucket using the name
id's hash value. The first symbol in the bucket with name `id' is the
visible definition of `id'. The other symbols in the bucket with name
`id' and the symbols in the previous scopes with name `id' are
shadowed symbols. Hmm, two arrays and a pointer to the previous
scope. You can store that ;-)
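
A rough sketch of that scope layout, to show how little is needed. The
names (Symbol, Scope, Found, lookup, kNone) are made up for this mail,
and using the name id modulo the bucket count as the "hash" is only an
assumption to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t kNone = -1;  // end of a bucket chain

struct Symbol {
    std::uint32_t nameId;
    std::int32_t nextInBucket;  // symbol declared earlier in the same bucket, or kNone
};

struct Scope {
    const Scope* previous;              // enclosing scope, nullptr for the global one
    std::vector<Symbol> symbols;        // symbols introduced in this scope
    std::vector<std::int32_t> buckets;  // head of each chain, newest symbol first

    explicit Scope(const Scope* prev, std::size_t bucketCount = 16)
        : previous(prev), buckets(bucketCount, kNone) {
        assert(bucketCount > 0);
    }

    // Adds a symbol and pushes it at the front of its bucket, so chains
    // end up in reverse declaration order.
    std::int32_t declare(std::uint32_t nameId) {
        std::size_t bucket = nameId % buckets.size();
        auto index = static_cast<std::int32_t>(symbols.size());
        symbols.push_back({nameId, buckets[bucket]});
        buckets[bucket] = index;
        return index;
    }
};

struct Found {
    const Scope* scope;
    std::int32_t index;  // position in scope->symbols
};

// Returns every symbol called nameId that is reachable from `scope`.
// The first element is the visible definition, the rest are shadowed.
std::vector<Found> lookup(const Scope* scope, std::uint32_t nameId) {
    std::vector<Found> result;
    for (const Scope* s = scope; s != nullptr; s = s->previous) {
        std::int32_t i = s->buckets[nameId % s->buckets.size()];
        while (i != kNone) {
            const Symbol& symbol = s->symbols[static_cast<std::size_t>(i)];
            if (symbol.nameId == nameId)
                result.push_back({s, i});
            i = symbol.nextInBucket;
        }
    }
    return result;
}
```

The chain links are indices into the symbols array, not pointers, so
the two arrays can go to disk unchanged; only `previous` needs to be
replaced by a scope id.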

Now, the whole thing here is in the order of thousands of elements.
You don't need an SQL engine for that ;-) The data structures used
are trivial (structs, arrays, and pointers). You just dump this stuff
to a file, replacing pointers with ids (like we do in kdev3's PCS
files), and you are in pretty good shape for KDevelop4 :-)
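
Here is a small sketch of the "replace pointers with ids" step. This is
not the kdev3 PCS format; Node, dump, load and the record layout are
invented for illustration. It writes integers in native byte order and
does no error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <unordered_map>
#include <vector>

struct Node {            // hypothetical stand-in for a code model item
    std::uint32_t nameId;
    const Node* parent;  // nullptr for a root item
};

constexpr std::uint32_t kNullId = 0xFFFFFFFFu;  // written in place of nullptr

void writeU32(std::ostream& out, std::uint32_t value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

std::uint32_t readU32(std::istream& in) {
    std::uint32_t value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Writes a count, then one (nameId, parentId) record per node. A node's
// id is its position in `nodes`; every parent must be in `nodes` too.
void dump(const std::vector<std::unique_ptr<Node>>& nodes, std::ostream& out) {
    std::unordered_map<const Node*, std::uint32_t> ids;
    for (std::uint32_t i = 0; i < nodes.size(); ++i)
        ids[nodes[i].get()] = i;

    writeU32(out, static_cast<std::uint32_t>(nodes.size()));
    for (const auto& node : nodes) {
        writeU32(out, node->nameId);
        writeU32(out, node->parent ? ids.at(node->parent) : kNullId);
    }
}

// Reads the records back, then turns the ids into pointers in a second
// pass, once all nodes exist.
std::vector<std::unique_ptr<Node>> load(std::istream& in) {
    std::uint32_t count = readU32(in);
    std::vector<std::uint32_t> parentIds(count);
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t nameId = readU32(in);
        parentIds[i] = readU32(in);
        nodes.push_back(std::unique_ptr<Node>(new Node{nameId, nullptr}));
    }
    for (std::uint32_t i = 0; i < count; ++i) {
        if (parentIds[i] != kNullId)
            nodes[i]->parent = nodes.at(parentIds[i]).get();
    }
    return nodes;
}
```

Any std::ostream/std::istream pair works, for example a
std::stringstream for testing or file streams opened in binary mode.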

One last thing. You want to be able to load and unload PCS files (and
maybe parts of a PCS file) on demand, so you can write a little cache
of PCS files ;-) That helps in case you want to load 4-5 projects.

ciao robe