Tiles data format

Tue Jun 15 17:49:52 CEST 2010

Not sure why but I almost missed your mail :(

On Sunday 13 June 2010, Dmitry Kazakov wrote:
> He-he :) Right at the moment i was planning swapping and compression
> architecture =).
> 
> Well, i have several points:
> 1) If we change the format, then the first thing we should do is add
> VERSION field to the beginning of the file. Just to make our lives easier.

Sure.

> 2) I was considering swapping, and i wanted to use LZO compression
> algorithm for this. According to some papers it is the fastest. (But those
> papers didn't cover LZF). [0]
Well my experimentation show that LZO and LZF are of the same speed order, but 
that LZO compress better. I was leaning toward LZF because LZO seems to have a 
very complex code.

> 3) For swapping it will surely be *not* efficient to work with a single
> tile. So i was going to join them into slabs and transport to disk at once,
> though compression will most likely be by-tile.
> 
> 4) The thing that worries me the most is "compression sliding window". This
> term relates to most of LZx algorithms. LZO uses ~8-64KiB (do not know
> exactly). Regular tile has a size of 4*64*64=16KiB. It means that the tile
> can be fit into this window fully. This (not sure) means that the algorithm
> will work less efficient(!), than it can.
> 
> We must understand this effect fully before writing anything, or we can get
> many problems later.
> 
> 
> More than that, effect you were exploring in March can (not sure, again) be
> a consequence of this. Original LZ77 algorithm does not depend on the size
> of input and almost linear in time (when the size of the input is much
> bigger than the size of the sliding window).

Wikipedia pages on KZ77 [1] explain clearly the meaning of a sliding window : 
"The encoder and decoder must both keep track of some amount of the most 
recent data, such as the last 2 kB, 4 kB, or 32 kB. The structure in which 
this data is held is called a sliding window, which is why LZ77 is sometimes 
called sliding window compression. The encoder needs to keep this data to look 
for matches, and the decoder needs to keep this data to interpret the matches 
the encoder refers to. This is why the encoder can use a smaller size sliding 
window than the decoder, but not vice-versa."

It just means that the compression algorithm won't look for similar pattern on 
more than 64kb, it explains why in my March exploration the result with LZ77 
(zip/libz) is similar when we compress all the tiles at the same time, or only 
one.

So yes, in theory, it is more efficient to compress a set of tile than an 
individual tile, since we would be able to use data from different tile to find 
patterns. But then we would need to be able to stream compression and 
decompression, and lose the possibility to access only some of the tiles. 
Also, as my experimentation has shown, the gain is marginal.

[1] http://en.wikipedia.org/wiki/LZ77_and_LZ78
> 
> If that is so, this non-linearity can be a flaw of libLZF. Maybe, some
> post-processing with Haffman trees.
> 
> [0] - http://www.oberhumer.com/opensource/lzo/
> 
> 
> My conclusion:
> 1) I don't see any reason of writing tile-save-compression system
> separately and before the one that will be used in a swapper. We'll just
> make many code-duplications.
Apart from the compression method, I don't see many code-duplication (and if 
you want my opinion, tile-save-compression is way more important than swapper 
:) it is the last thing that truly annoys me in krita 2.2)

> 2) We can make kra-format more extensible before this. Yes. And if we
> define new format well I can easily fix KisTiledDataManager::write/read
> accordingly.
I have no objection to your proposed changed.

-- 
Cyrille Berger
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.kde.org/pipermail/kimageshop/attachments/20100615/58ad1a20/attachment.htm