[KPhotoAlbum] Optimization of index.xml

Jesper K. Pedersen blackie at blackie.dk
Mon Aug 28 02:53:28 BST 2006


On Sunday 27 August 2006 21:47, Robert L Krawitz wrote:
|    From: "Jesper K. Pedersen" <blackie at blackie.dk>
|    Date: Sun, 27 Aug 2006 20:43:32 -0400
|
|    Hmmm rather undecided on this. It would indeed make it harder to
|    write scripts and other similar things against the index.xml file.
|
| I don't see how any of the suggestions I made would make it harder to
| write scripts against it, but I agree that storing option values as
| IDs would make it harder.
Well, scripts can't assume that a given item is there. I'm not saying it
makes things much harder, but a little.

|    Here are a few random thoughts:
|    - if it really would change speed, then I would be rather interested
|
|    - My long-term goal is to get away from the index.xml and rather
|    use a database (no no, breathe again, please, and read the rest of
|    the sentence :) This database should be something which does not
|    require any installation, like sqlite. In addition there would be
|    an option to export the db to the index.xml format on exit, and
|    an option to import from this file, so that people would still have
|    this safety net. The reason for this move would be to free resources
|    from maintaining two backends.
|
| Actually, I have nothing really against a database back end per se,
| other than the fact that it seems like overkill.  My intuition may not
| be correct, however.  Certainly a flat file is very expensive if
| you're typically doing only a few updates, and that may be a very
| common way of doing things.
Well, there are two issues.
1) Loading everything into memory is bad when you have a big DB. Heck, someone
sent me an index.xml file the other day that I could not open on my laptop
with 500 MB of RAM.
2) Loading everything into memory means that only one person can access the db
at a time. Therefore you and your wife (or you and your coworkers) can't
annotate images at the same time.
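
A minimal sketch of the database idea discussed above, assuming Python's
standard sqlite3 module. The table layout, the column names and the
"images"/"image" element names are hypothetical and are not KPhotoAlbum's
actual schema or file format. It shows a single-record update that does not
need the whole collection in memory, plus an XML export/import as the safety
net:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per image, with defaults for unused fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    file        TEXT PRIMARY KEY,
    description TEXT NOT NULL DEFAULT '',
    angle       INTEGER NOT NULL DEFAULT 0
)
"""

def open_db(path=":memory:"):
    """Open (or create) the database; SQLite needs no server installation."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def annotate(conn, file, description):
    """Update one record without loading or rewriting the others."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO images (file) VALUES (?)", (file,))
        conn.execute(
            "UPDATE images SET description = ? WHERE file = ?",
            (description, file),
        )

def export_xml(conn):
    """Dump every row into an XML string (element names are assumptions)."""
    root = ET.Element("images")
    query = "SELECT file, description, angle FROM images ORDER BY file"
    for file, description, angle in conn.execute(query):
        ET.SubElement(
            root,
            "image",
            {"file": file, "description": description, "angle": str(angle)},
        )
    return ET.tostring(root, encoding="unicode")

def import_xml(conn, xml_text):
    """Load rows back from an XML string produced by export_xml."""
    root = ET.fromstring(xml_text)
    with conn:
        for image in root.iter("image"):
            conn.execute(
                "INSERT OR REPLACE INTO images (file, description, angle) "
                "VALUES (?, ?, ?)",
                (
                    image.get("file"),
                    image.get("description", ""),
                    int(image.get("angle", "0")),
                ),
            )
```

Passing a file name instead of ":memory:" to open_db() keeps the data on
disk, so several processes could open the same database file.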

|    - the compressed index.xml option in the settings menu actually
|    only saves indexes for each image, did you try that?
|
| Given the history of the compressed option, no.  If the compressed
| index.xml isn't simply a zip or gzip or bzip2 of the index.xml file
| (and it apparently isn't, given what you say here and what other
| people have reported), I'm not touching it with a ten foot pole.
| Anything that increases the number of code paths through the save code
| is asking for trouble.
OK, here is a very good reason for using this if you think that way:
*I* am using the compressed option.

Basically, what it does is save a less readable index.xml (indexes instead of
the real names). This index.xml loads approximately twice as fast as the
full index.xml.
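
To illustrate the "indexes vs. the real names" idea, here is a rough sketch
in Python. It is not the actual KPhotoAlbum save code or on-disk format; the
function names and the data layout (a dict mapping file names to lists of
option names) are assumptions made for the example:

```python
def compress(images):
    """Replace option names by small integer indexes.

    images maps a file name to the list of option names set on that image.
    Returns (table, compact): table[i] is the name for index i, and compact
    maps each file name to a list of indexes into table.
    """
    ids = {}
    compact = {}
    for file, names in images.items():
        # setdefault assigns the next free index the first time a name is seen
        compact[file] = [ids.setdefault(name, len(ids)) for name in names]
    table = sorted(ids, key=ids.get)  # position in the list == index
    return table, compact

def decompress(table, compact):
    """Inverse of compress: turn indexes back into the real names."""
    return {file: [table[i] for i in idxs] for file, idxs in compact.items()}
```

Each name is then stored once in the table and referenced by number
everywhere else, which is also why the result is harder to read by hand.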
|
|    On Sunday 27 August 2006 20:34, Robert L Krawitz wrote:
|    | I think we could further optimize the index.xml file by removing data
|    | that either has obvious defaults or can otherwise be computed easily
|    | without having to look at the actual image file.
|    |
|    | 1) What's the purpose of storing both a startDate and an endDate in
|    |    the index.xml file?  Is this for videos (and if so, my index.xml
|    |    shows identical start and end dates for my videos)?  Would it make
|    |    sense to store only the startDate unless the endDate differs?
|    |
|    | 2) All images have a "description", even though I rarely use it.
|    |    Would it make more sense to not insert the description unless it's
|    |    actually present?
|    |
|    |    Also, would it make more sense for the description to be a child of
|    |    the image, rather than an attribute?  That way it could be free
|    |    text.
|    |
|    | 3) The angle is always stored, even though for most people it's 0
|    |    (landscape format) for most images.  Again, would it make more
|    |    sense to only store this if needed?
|    |
|    | 4) Finally, the label is usually (if not always) simply the basename
|    |    of the image.  Would it be better to not actually store this and
|    |    simply find it when loading the file?  It could be found
|    |    efficiently while parsing the folder -- simply skip beyond the last
|    |    separator and search for the final . in the filename.
|    |
|    | Some stats for my current index.xml:
|    |
|    |                   Size      % vs. snapshot   % vs. SVN
|    | Last snapshot:    8801509       100.0           N/A
|    | Current SVN:      7108972        80.8         100.0
|    | (1):              6616631        75.2          93.1
|    | (2):              6372641        72.4          89.6
|    | (3):              6232615        70.8          87.7
|    | (4):              5953336        67.6          83.8
|    |
|    | We could save more by storing option values as their IDs rather than
|    | in actual text form.  That would offer the potential of quite
|    | substantial savings, but I'm not so sure that we should do that
|    | because it's a lot riskier if something goes wrong -- if index numbers
|    | get mixed up, it could be very hard to unscramble -- and because it
|    | makes it harder for someone to examine the file.  On the other hand, I
|    | don't really see why we need to have the index numbers stored for each
|    | value as opposed to simply building up the list of values as the file
|    | is loaded (is it to preserve ordering in the attribute lists?).
|
|    --
|    Having trouble finding a given image in your collection containing
|    thousands of images?
|
|    http://www.kphotoalbum.org might be the answer.
|
|    _______________________________________________
|    KPhotoAlbum mailing list
|    KPhotoAlbum at kdab.net
|    http://mail.kdab.net/mailman/listinfo/kphotoalbum
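
Suggestions (1) to (4) quoted above amount to writing an attribute only when
it differs from a default that the loader can reconstruct. A minimal Python
sketch under that assumption follows; the function names are hypothetical,
and the attribute names are taken from the discussion rather than from the
actual KPhotoAlbum code:

```python
import os

def default_label(file):
    """Suggestion (4): the label defaults to the basename without extension."""
    return os.path.splitext(os.path.basename(file))[0]

def image_attributes(file, start_date, end_date=None, description="",
                     angle=0, label=None):
    """Return only the attributes that cannot be derived when loading."""
    attrs = {"file": file, "startDate": start_date}
    if end_date is not None and end_date != start_date:      # (1)
        attrs["endDate"] = end_date
    if description:                                          # (2)
        attrs["description"] = description
    if angle != 0:                                           # (3)
        attrs["angle"] = str(angle)
    if label is not None and label != default_label(file):   # (4)
        attrs["label"] = label
    return attrs

def read_image(attrs):
    """Restore the omitted defaults when loading."""
    file = attrs["file"]
    return {
        "file": file,
        "startDate": attrs["startDate"],
        "endDate": attrs.get("endDate", attrs["startDate"]),
        "description": attrs.get("description", ""),
        "angle": int(attrs.get("angle", "0")),
        "label": attrs.get("label", default_label(file)),
    }
```

A script reading such a file has to apply the same defaults, which is the
"can't assume that a given item is there" cost mentioned earlier in the
thread.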




