[Digikam-devel] [Bug 283013] Accelerating writing metadata back to image files

Gerhard Kulzer gerhardk at gmx.ch
Fri Sep 30 09:05:03 BST 2011


https://bugs.kde.org/show_bug.cgi?id=283013





--- Comment #4 from Gerhard Kulzer <gerhardk gmx ch>  2011-09-30 08:05:03 ---
First, thank you very much, Andreas, for this detailed explanation; it is worth
remembering.

Concerning how rsync works, I found this description in the Wikipedia article
on rsync:


"The rsync utility uses an algorithm invented by the Australian computer
programmer Andrew Tridgell for efficiently transmitting a structure (such as a
file) across a communications link when the receiving computer already has a
similar, but not identical, version of the same structure.

The recipient splits its copy of the file into fixed-size non-overlapping
chunks and computes two checksums for each chunk: the MD4 hash, and a weaker
'rolling checksum'. (Version 30 of the protocol, released with rsync version
3.0.0, now uses MD5 hashes rather than MD4.[14]) It sends these checksums to
the sender.

The sender computes the rolling checksum for every chunk of size S in its own
version of the file, even overlapping chunks. This can be calculated
efficiently because of a special property of the rolling checksum: if the
rolling checksum of bytes n through n + S − 1 is R, the rolling checksum of
bytes n + 1 through n + S can be computed from R, byte n, and byte n + S
without having to examine the intervening bytes. Thus, if one had already
calculated the rolling checksum of bytes 1–25, one could calculate the rolling
checksum of bytes 2–26 solely from the previous checksum, and from bytes 1 and
26.

The rolling checksum used in rsync is based on Mark Adler's Adler-32 checksum,
which is used in zlib, and is itself based on Fletcher's checksum.

The sender then compares its rolling checksums with the set sent by the
recipient to determine if any matches exist. If they do, it verifies the match
by computing the hash for the matching block and by comparing it with the hash
for that block sent by the recipient.

The sender then sends the recipient those parts of its file that did not match
the recipient's blocks, along with information on where to merge these blocks
into the recipient's version. This makes the copies identical."
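To make the rolling-checksum property concrete, here is a minimal Python sketch of an Adler-style weak checksum and its one-byte update. It assumes the two-sum form from Tridgell's description of rsync with a modulus of 2^16. zlib's Adler-32 differs in detail (it uses the prime 65521 and starts the first sum at 1). The names `weak_checksum` and `roll` are illustrative, not rsync's own.

```python
M = 1 << 16  # assumed modulus (2**16) for both sums


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two sums (a, b) of a window from scratch, in O(S)."""
    size = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the window's end: size, size-1, ..., 1
    b = sum((size - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Slide the window one byte to the right in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M  # uses the already-updated a
    return a, b


data = b"The quick brown fox jumps over the lazy dog"
S = 25
# Checksum of bytes 1-25 (indices 0..24), then roll it to bytes 2-26 (indices 1..25)
a, b = weak_checksum(data[0:S])
a, b = roll(a, b, data[0], data[S], S)
assert (a, b) == weak_checksum(data[1:S + 1])
```

This mirrors the "bytes 1-25 to bytes 2-26" example in the quote: only the outgoing byte, the incoming byte, and the previous sums are needed, so the sender can test a candidate block at every offset without rereading the window.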

There is a long but nice interview with Andrew Tridgell, the creator of rsync,
here: http://oceanpark.com/webmuseum/rsync.html

So rsync works on blocks, which, according to various sources, seem to be
500-1000 bytes in size. In any case, judging from my rsync logs, the changed
portion is usually less than 1% of an image, although that portion can of
course span several blocks.
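The block-level behavior described above can be illustrated with a simplified Python sketch of the whole exchange. The recipient builds a signature of fixed-size blocks (a weak `zlib.adler32` checksum plus an MD5 hash per block), the sender looks for matching blocks at every byte offset of its own file, and the recipient rebuilds the new file from copied blocks plus literal bytes. For brevity, this sketch recomputes `zlib.adler32` for each window instead of rolling it, skips the final partial block, and assumes a 700-byte block size. The instruction format (`("copy", index)` / `("data", bytes)`) and the simulated metadata edit are hypothetical.

```python
import hashlib
import random
import zlib

BLOCK = 700  # assumed block size, within the 500-1000 byte range mentioned above


def signature(old: bytes, block: int = BLOCK) -> dict[int, list[tuple[bytes, int]]]:
    """Recipient side: map weak checksum -> [(MD5 digest, block index)] for each full block."""
    sig: dict[int, list[tuple[bytes, int]]] = {}
    for start in range(0, len(old) - block + 1, block):
        chunk = old[start:start + block]
        sig.setdefault(zlib.adler32(chunk), []).append(
            (hashlib.md5(chunk).digest(), start // block))
    return sig


def delta(new: bytes, sig: dict[int, list[tuple[bytes, int]]],
          block: int = BLOCK) -> list[tuple[str, object]]:
    """Sender side: emit ("copy", index) for matched blocks and ("data", bytes) for the rest."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i = 0
    while i + block <= len(new):
        window = new[i:i + block]
        match = None
        # Cheap weak-checksum lookup first, then confirm candidates with the strong hash
        for strong, index in sig.get(zlib.adler32(window), []):
            if hashlib.md5(window).digest() == strong:
                match = index
                break
        if match is None:
            literal.append(new[i])  # no block starts here: send this byte as-is
            i += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block
    literal += new[i:]  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple[str, object]], block: int = BLOCK) -> bytes:
    """Recipient side: rebuild the new file from its own blocks plus the literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(200_000))  # stand-in for an image file
new = old[:1000] + b"X" * 40 + old[1000:]  # hypothetical 40-byte metadata insertion
ops = delta(new, signature(old))
assert patch(old, ops) == new
sent = sum(len(v) for kind, v in ops if kind == "data")
print(f"literal bytes sent: {sent} of {len(new)}")
```

Because matches are searched at every offset, inserting bytes does not invalidate the rest of the file: the blocks after the edit are found again even though they have shifted, so only the data around the edit (plus the short tail that this sketch does not match) is sent literally.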
