[KPhotoAlbum] Optimization of index.xml

Robert L Krawitz rlk at alum.mit.edu
Mon Aug 28 02:47:47 BST 2006


   From: "Jesper K. Pedersen" <blackie at blackie.dk>
   Date: Sun, 27 Aug 2006 20:43:32 -0400

   Hmmm rather undecided on this. It would indeed make it harder to
   write scripts and other similar things against the index.xml file.

I don't see how any of the suggestions I made would make it harder to
write scripts against it, but I agree that storing option values as
IDs would make it harder.

   Here are a few random thoughts:
   - if it really would change speed, then I would be rather interested

   - My long-term goal is to get away from the index.xml and rather
   use a database (no no, breathe again, please, and read the rest of
   the sentence :) this database should be something which does not
   require any installation, like sqlite. In addition there would be
   an option to export the db to the index.xml format on exit, and
   an option to import from this file, so that people would still have
   this safety net. The reason for this move would be to free resources
   from maintaining two backends.

Actually, I have nothing really against a database back end per se,
other than the fact that it seems like overkill.  My intuition may not
be correct, however.  Certainly a flat file is very expensive if
you're typically doing only a few updates, and that may be a very
common way of doing things.

   - the compressed index.xml option in the settings menu actually
   saves only indexes for each image; did you try that?

Given the history of the compressed option, no.  If the compressed
index.xml isn't simply a zip or gzip or bzip2 of the index.xml file
(and it apparently isn't, given what you say here and what other
people have reported), I'm not touching it with a ten-foot pole.
Anything that increases the number of code paths through the save code
is asking for trouble.

   On Sunday 27 August 2006 20:34, Robert L Krawitz wrote:
   | I think we could further optimize the index.xml file by removing data
   | that either has obvious defaults or can otherwise be computed easily
   | without having to look at the actual image file.
   |
   | 1) What's the purpose of storing both a startDate and an endDate in
   |    the index.xml file?  Is this for videos (and if so, my index.xml
   |    shows identical start and end dates for my videos)?  Would it make
   |    sense to store only the startDate unless the endDate differs?
   |
   | 2) All images have a "description", even though I rarely use it.
   |    Would it make more sense to not insert the description unless it's
   |    actually present?
   |
   |    Also, would it make more sense for the description to be a child of
   |    the image, rather than an attribute?  That way it could be free
   |    text.
   |
   | 3) The angle is always stored, even though for most people it's 0
   |    (landscape format) for most images.  Again, would it make more
   |    sense to only store this if needed?
   |
   | 4) Finally, the label is usually (if not always) simply the basename
   |    of the image.  Would it be better to not actually store this and
   |    simply find it when loading the file?  It could be found
   |    efficiently while parsing the folder -- simply skip beyond the last
   |    separator and search for the final . in the filename.
   |
   | Some stats for my current index.xml:
   |
   |                 Size (bytes)  % vs. snapshot  % vs. SVN
   | Last snapshot:       8801509           100.0        N/A
   | Current SVN:         7108972            80.8      100.0
   | (1):                 6616631            75.2       93.1
   | (2):                 6372641            72.4       89.6
   | (3):                 6232615            70.8       87.7
   | (4):                 5953336            67.6       83.8
   |
   | We could save more by storing option values as their IDs rather than
   | in actual text form.  That would offer the potential of quite
   | substantial savings, but I'm not so sure that we should do that
   | because it's a lot riskier if something goes wrong -- if index numbers
   | get mixed up, it could be very hard to unscramble -- and because it
   | makes it harder for someone to examine the file.  On the other hand, I
   | don't really see why we need to have the index numbers stored for each
   | value as opposed to simply building up the list of values as the file
   | is loaded (is it to preserve ordering in the attribute lists?).
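
To illustrate the last point in the quoted message, here is a rough
sketch of building the list of option values while the file is loaded,
rather than reading stored index numbers.  It is not KPhotoAlbum's
actual loader: the function name, the category names and the dict
layout are all made up for the example.  Each value simply gets the
next free ID within its category the first time it is seen.

```python
from collections import defaultdict


def build_value_ids(images):
    """Assign per-category IDs to option values in order of first appearance.

    `images` is a hypothetical stand-in for the parsed file: one dict per
    image, mapping a category name to the list of values set on that image.
    """
    ids = defaultdict(dict)
    for options in images:
        for category, values in options.items():
            table = ids[category]
            for value in values:
                if value not in table:
                    # First time this value is seen: give it the next free ID.
                    table[value] = len(table)
    return dict(ids)


# Hypothetical sample data; the category names are only illustrative.
sample = [
    {"Keywords": ["beach", "sunset"]},
    {"Keywords": ["sunset", "family"], "Locations": ["Boston"]},
]
value_ids = build_value_ids(sample)
```

With this approach the IDs depend only on the order in which values
appear in the file, so any ordering that the stored index numbers
currently preserve would have to be recorded some other way.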

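Suggestions (1) through (4) in the quoted message all come down to the
same rule: write an attribute only when it differs from a value the
loader can work out for itself.  A minimal sketch of that rule follows.
The attribute names startDate, endDate, description, angle and label
come from the message; the "file" attribute, the function names and
the dict layout are assumptions for illustration, and it assumes the
default label is the basename with its final extension removed.

```python
import xml.etree.ElementTree as ET


def default_label(path):
    """Basename without its final extension, as suggestion (4) describes."""
    base = path[path.rfind("/") + 1:]  # skip beyond the last separator
    dot = base.rfind(".")              # position of the final '.' in the name
    return base[:dot] if dot > 0 else base


def image_element(info):
    """Build an <image> element, leaving out attributes that equal defaults."""
    elem = ET.Element("image", {"file": info["file"],
                                "startDate": info["startDate"]})
    if info.get("endDate", info["startDate"]) != info["startDate"]:
        elem.set("endDate", info["endDate"])          # (1) only if it differs
    if info.get("description"):
        elem.set("description", info["description"])  # (2) only if present
    if info.get("angle", 0) != 0:
        elem.set("angle", str(info["angle"]))         # (3) only if rotated
    label = info.get("label")
    if label and label != default_label(info["file"]):
        elem.set("label", label)                      # (4) only if customised
    return elem


def load_image(elem):
    """Inverse of image_element: fill omitted attributes back in."""
    path = elem.get("file")
    start = elem.get("startDate")
    return {
        "file": path,
        "startDate": start,
        "endDate": elem.get("endDate", start),
        "description": elem.get("description", ""),
        "angle": int(elem.get("angle", "0")),
        "label": elem.get("label", default_label(path)),
    }
```

An image with no description, no rotation, identical start and end
dates and the default label is then written with only the file and
startDate attributes, and load_image() restores the rest.
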
   -- 
   Having trouble finding a given image in your collection containing
   thousands of images?

   http://www.kphotoalbum.org might be the answer.

   _______________________________________________
   KPhotoAlbum mailing list
   KPhotoAlbum at kdab.net
   http://mail.kdab.net/mailman/listinfo/kphotoalbum




