[Kde-imaging] Lossless image compression

Christoph Feck christoph at maxiom.de
Wed Feb 8 12:56:14 UTC 2012


On Tuesday 07 February 2012 14:58:26 Dmitry Kazakov wrote:
> I can tell what we currently have.
> 
> Swapping
> We have tiling, pooling, and swapping at a level higher than the
> compression itself. Every tile, which is 64 by 64 pixels, is
> compressed independently. The compressed data is written to a file
> mapped to RAM, and the OS decides when this mapped data should
> actually be swapped out to the hard drive. Of course, the size of
> the mapped region is limited, so the data is flushed automatically
> when the "sliding window" of the mapped region moves forward in
> the file. The size of the window is 16 MiB, chosen to be large
> enough to hold an 8-bit RGBA image of 2048x2048 pixels (not
> counting the reduction in size due to compression). No benchmarks
> were actually done, so the window size might not be optimal.
> Actually, I think we can make this window larger on modern
> machines, but I haven't finished the benchmark yet.
> 
> Compression
> For the actual compression of the tile data we use the LZF
> algorithm. I did some research on the compression times of LZO and
> LZF [0], [1]. It turned out that LZF has better times: it
> compresses the image about twice as fast while keeping almost the
> same compression rate. One more thing I found is that it is faster
> to "linearize" the colors before compression (RGBARGBA ->
> RRGGBBAA) than to compress the image directly. It boosts both the
> speed and the compression rate, and the time of the
> "linearization" itself is negligible compared with the boost it
> gives, so now we do it this way.
> 

Thanks for the description, Dmitry.
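As a quick cross-check of the numbers in the swapping description above (tile size, pixel size, and window size all taken directly from it):

```python
# Sizes from Dmitry's description: 64x64 tiles, 8-bit RGBA pixels,
# and a 16 MiB mapped window sized for a 2048x2048 image.
TILE_SIDE = 64
BYTES_PER_PIXEL = 4          # 8-bit RGBA
IMAGE_SIDE = 2048

image_bytes = IMAGE_SIDE * IMAGE_SIDE * BYTES_PER_PIXEL
tile_bytes = TILE_SIDE * TILE_SIDE * BYTES_PER_PIXEL
tiles_per_image = (IMAGE_SIDE // TILE_SIDE) ** 2

print(image_bytes // 2**20)  # 16 (MiB), matching the window size
print(tile_bytes // 1024)    # 16 (KiB per uncompressed tile)
print(tiles_per_image)       # 1024 tiles fit in the window uncompressed
```

So the window holds exactly 1024 uncompressed tiles; with compression it holds correspondingly more.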

> API
> The compression is done at the level of the datamanager, so it
> doesn't know the colorspace we work in. It knows the size of a

Well, having that information is needed to get a good compression 
rate. Otherwise, you are simply interpreting all values as bytes, 
without taking into account that small pixel differences are likely to 
occur more often. I will have to check whether the compression can have 
access to raw tile data, so that linearization is not needed.
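For reference, the RGBARGBA -> RRGGBBAA "linearization" Dmitry describes is just a planar reordering of the tile buffer. A minimal sketch (the function name is mine, not Krita's API):

```python
def linearize(pixels: bytes, channels: int = 4) -> bytes:
    """Reorder interleaved pixel data (RGBARGBA...) into planes (RR..GG..BB..AA..).

    Grouping each channel together places similar byte values next to
    each other, which helps a byte-oriented compressor such as LZF find
    longer matches -- hence the better speed and compression rate.
    """
    return b"".join(pixels[c::channels] for c in range(channels))

# Two pixels, interleaved RGBA:
data = bytes([1, 2, 3, 255, 4, 5, 6, 255])
print(linearize(data))  # b'\x01\x04\x02\x05\x03\x06\xff\xff'
```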

> tile and the size of a pixel. The algorithm is split into two
> classes:  KisTileCompressor2 [2] and KisLzfCompression [3]. The
> benchmarks are placed in [4].

Links [3] and [4] are identical.

> [0] - http://dimula73.narod.ru/lzo_vs_lzf.pdf
>        In the "Linearized" column you see the total time of
>        linearization + compression.

Do you still have that 640x441 RGBA image? Which memcpy 
implementation? Also, why is memcpy of a single tile three times as 
fast as memcpy of the complete image? Did you iterate multiple times 
and measure cache performance? :)
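One way the single-tile numbers could come out three times faster is cache residency: iterating over the same 16 KiB buffer keeps it in L1/L2, while a megabyte-sized image streams through the cache hierarchy. A toy illustration of the effect (not Dmitry's benchmark code; the image dimensions are the 640x441 from the PDF):

```python
import time

def per_byte_copy_time(buf: bytearray, iterations: int = 200) -> float:
    """Average per-byte cost of repeatedly copying `buf`.

    Repeated copies of a small buffer stay cache-hot, so a tile-sized
    buffer can look several times faster per byte than a whole image.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        bytes(buf)  # forces a full copy of the buffer
    return (time.perf_counter() - start) / (iterations * len(buf))

tile = bytearray(64 * 64 * 4)      # one tile: 16 KiB, cache-resident
image = bytearray(640 * 441 * 4)   # benchmark image: ~1.1 MiB
print(per_byte_copy_time(tile), per_byte_copy_time(image))
```

The absolute numbers are machine-dependent; the point is only that a tight loop over one tile measures cache bandwidth, not memory bandwidth.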

From the compression rate, it looks like compression is effectively 
only achieved for the alpha channel, whereas the RGB data remains 
nearly uncompressed. I am not sure how often the data actually gets 
written to disk.

If it does get written to disk, a better compression rate will yield 
the speedup, not a faster compression method. If you are mostly doing 
in-memory operations, the fastest method available will be beneficial, 
but given the marginal improvement you get, I would not bother 
compressing the RGB data at all, only the alpha.
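The alpha-only idea can be sketched as follows: split off the alpha plane, compress only that, and store the RGB bytes raw. This uses zlib purely as a stand-in codec (LZF has no standard Python binding), and the function name is mine:

```python
import zlib  # stand-in for LZF; the point is the split, not the codec

def compress_alpha_only(pixels: bytes, channels: int = 4) -> tuple[bytes, bytes]:
    """Store RGB raw; compress only the alpha plane.

    Alpha is often constant (e.g. fully opaque) over large areas, so it
    compresses extremely well, while photographic RGB data barely shrinks.
    """
    rgb = b"".join(pixels[i:i + 3] for i in range(0, len(pixels), channels))
    alpha = pixels[3::channels]
    return rgb, zlib.compress(alpha)

# A fully opaque 64x64 tile of one flat color:
tile = bytes([10, 20, 30, 255]) * (64 * 64)
rgb, packed_alpha = compress_alpha_only(tile)
print(len(rgb), len(packed_alpha))  # raw RGB stays 12288 bytes; the
                                    # constant alpha plane shrinks drastically
```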

IZ compression is symmetrical, i.e. it needs roughly the same time for 
compression and decompression. From looking at your numbers, 
compression speed for IZ should be similar to LZF's, but decompression 
will be much slower. I can try to extract the LZF code and make a 
comparison on my machine if you have representative test data (Krita 
isn't really for photos, which is what I used for all my benchmarks).

