[Digikam-users] backup and data integrity

Gerhard Kulzer gerhardkgmx at gmail.com
Wed Jan 23 16:28:31 GMT 2008


On Tuesday 22 January 2008, Arnd Baecker wrote:
> On Mon, 21 Jan 2008, Gerhard Kulzer wrote:
>
> [...]
>
> > Arnd, can you send me the script? I'd like to try too.
>
> Done (off-list, it is really not meant for general consumption ... ;-)
>
> > I just read that strigi is exactly doing what we want, comparing files
> > with sha1. Maybe sha1 is faster than md5?
>
> No idea. Maybe we should do a speed test at some point ;-)
>
> > Strigi creates a sha1 of every file and stores it in its DB. Then it checks
> > for file date changes and, if there are any, runs sha1 to see if the file
> > really has changed before grepping it thoroughly.
>
> Looking at
> http://strigi.sourceforge.net/?q=features
> it does not seem to support images?
>
> I don't yet fully see how strigi will finally fit into
> "the" solution, but this is definitely something to look at in more detail!
> Thanks for the pointer.
>
> Best, Arnd

Hi Arnd,
I'll try to summarize what we said last night on IRC, just as a public memo.

The aim is to
a) prevent corrupt images from being saved to disk, and to
b) detect existing corrupt files on disk
  (to avoid overwriting potentially good backups)

Strategies like DIF and HARD will not be available in the consumer market for 
another couple of years, but given the increase in size, speed and 
complexity of systems, consumer systems will implement some kind of ECC 
(horizon ~ 3y).

Protection at file system level, as provided by zfs and btrfs, is good but 
insufficient, as it protects the disk only and not the transmission chain 
application - OS - I/O controller - fs.

So we have to do it 'by hand' (meaning in digikam).
When saving a file after modification (problem a):
1. keep it in memory
2. save it to disk
3. flush the disk cache
3a. (optional) make sure all disk-internal buffers are cleared by reading 
other data the size of the disk buffer
5. run a CRC checksum on the file on disk and on the file in memory, and 
compare the two
5a. alternative: store the checksum in the metadata beforehand and save it 
with the file
6. on mismatch, rewrite the file and repeat the procedure
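
The steps above could be sketched roughly like this in Python. This is only 
an illustration, not digikam code: the names save_verified and crc32_of_file, 
the retry limit and the use of CRC32 from zlib are my assumptions. Note that 
reading the file back right after fsync may well be served from the OS cache, 
so the comparison alone does not prove that the data on the platter is good; 
that is what the optional step 3a is meant to address.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of the file at `path`, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def save_verified(path, data, max_attempts=3):
    """Write `data` (bytes held in memory) to `path` and verify the result.

    Hypothetical helper: returns the number of attempts that were needed and
    raises IOError if the file on disk never matches the copy in memory.
    """
    # Step 1: the reference copy stays in memory; checksum it once.
    expected = zlib.crc32(data) & 0xFFFFFFFF
    for attempt in range(1, max_attempts + 1):
        # Step 2: save to disk.
        with open(path, "wb") as f:
            f.write(data)
            # Step 3: push Python's buffer to the OS, then ask the OS to
            # write its cache out to the device.
            f.flush()
            os.fsync(f.fileno())
        # Step 5: checksum the file on disk and compare with memory.
        if crc32_of_file(path) == expected:
            return attempt
        # Step 6: mismatch, so loop around and rewrite the file.
    raise IOError("checksum mismatch after %d attempts: %s"
                  % (max_attempts, path))
```

A caller would pass the encoded image bytes, e.g. 
save_verified("/tmp/test.jpg", jpeg_bytes), and treat an IOError as the 
signal to alert the user instead of silently keeping a bad file.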

For problem b):
7. if 5a was used, a simple scrubbing scan can be launched, manually or 
scheduled at frequency X
7a. try to open files and look for errors produced (but this method is not 
reliable: I have images that show only the upper part, are corrupt, and 
produce no error message. However, the more severe errors can be found)
8. generate a user alert so that one can manually compare backup and 
original.
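
A scrubbing scan along the lines of step 7 might look like the sketch below. 
It deviates from 5a in one respect, and that is my assumption: instead of 
reading the checksum from the image metadata, it keeps the checksums in a 
separate manifest (a plain dict that could be dumped to a file), because that 
is easier to show in a few lines. The function names are made up, and SHA-1 
is used only because it came up in this thread.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-1 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def scrub(root, manifest):
    """Return the sorted relative paths that are missing or have changed.

    Every entry in the result is a candidate for the user alert of step 8.
    """
    suspects = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha1_of_file(full) != expected:
            suspects.append(rel)
    return sorted(suspects)
```

The manifest would have to be refreshed whenever a file is changed on 
purpose, otherwise every legitimate edit shows up as a suspect in the next 
scan.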

This method may seem tedious, but it has the advantage of being independent 
of OS and file system, and it works on nfs as well.

Gerhard

-- 
><((((º> ¸.·´¯`·... ><((((º> ¸.·´¯`·...¸ ><((((º>
http://www.gerhard.fr