[Digikam-users] backup and data integrity

Gerry Patterson thedeepvoice at gmail.com
Mon Jan 21 20:25:35 GMT 2008


On Jan 21, 2008 1:44 PM, Arnd Baecker <arnd.baecker at web.de> wrote:

> On Thu, 17 Jan 2008, Arnd Baecker wrote:
>
> [...]
>
> > Would some checksum system, integrated into digikam, be useful,
> > in view of ensuring data integrity for backups?
> > I think it wouldn't be too difficult to implement something like
> > this (I briefly discussed with Marcel on the IRC and
> > with digikam >=0.10 such additions to the database will be easy).
> > Note that it might come with a bit of a speed penalty when
> > images/metadata get changed; however, this could be made
> > configurable.
>
> So in order to not just talk about stuff, but to try it out, I
> set up two python scripts which
> A) Generate a recursive tree which contains
>   for each file below digikam's root (e.g. ~/Pictures)
>   a corresponding md5sum *.hash file
>
> B) Perform a check for each file in the backup
>   if the checksum matches.
>
> Interestingly, in my case this already revealed
> around 500 files which did not match.
> (In this particular case it was essentially a user
> error, because I changed the metadata (GPS info) for
> those files, but without changing the file date.
> As I used rsync such that it would not copy over these
> files, the back-up went out of sync).
>
> So without a hash comparison, I would have never realized
> the inconsistency!
>
> Well, in my opinion we should get some tools to
> enable the check of data-integrity into digikam itself ...
>
> Any thoughts/comments/suggestions/... are welcome
> to flesh out the ideas of what would be necessary/what makes sense/...!
>
> Best, Arnd
>

Hello Arnd,

What options are you passing to rsync?  If you give it the '-c' option,
rsync decides which files to skip by comparing checksums instead of
modification time and size.  This would at least keep your backup
consistent with your master.  However, it would not avoid the issue you
brought up earlier, where the original is corrupted and then backed up.
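
To illustrate the difference between the two comparisons, here is a
minimal Python sketch.  It is not how rsync is implemented; the function
names are made up for this example, and MD5 is used only because your
scripts already use md5sum.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_by_quick_check(src, dst):
    """Compare size and whole-second modification time only."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def same_by_checksum(src, dst):
    """Compare file contents via their MD5 digests."""
    return file_md5(src) == file_md5(dst)
```

A file whose content changed while its size and date stayed the same
passes the quick check but fails the checksum comparison, so only the
second test would flag it for copying.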

As I think about this, it sounds like implementing an SCM.  Basically, you
want to know whether a file has changed on disk, with or, in your case,
without intention.  In theory, when you have a new file you would 'check
it in' to the picture repository.  If you make changes, you 'check in' the
new version of the file.  In your case a 'check-in' would be to create a
checksum of the file.  This leads me to think about the "Versioned image"
request that is already in digikam.  Perhaps a single solution would
handle both cases?
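
As a rough sketch of what I mean by 'check-in', assuming a plain
dictionary as the "repository" of recorded checksums (nothing here is
digikam code; all names are hypothetical):

```python
import hashlib
import os


def checksum(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Map each file path (relative to root) to its current checksum."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = checksum(full)
    return result


def check_in(manifest, root, relpath):
    """Record the current checksum of one file as its accepted state."""
    manifest[relpath] = checksum(os.path.join(root, relpath))


def status(manifest, root):
    """Return (modified, missing, untracked) paths relative to root."""
    current = scan(root)
    modified = sorted(p for p in manifest
                      if p in current and current[p] != manifest[p])
    missing = sorted(p for p in manifest if p not in current)
    untracked = sorted(p for p in current if p not in manifest)
    return modified, missing, untracked
```

An intentional edit would be followed by another check_in(); any file
that status() reports as modified without such a check-in changed
without intention and should be looked at before the next backup.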

Best Regards,

 Gerry