[Digikam-users] backup and data integrity

Arnd Baecker arnd.baecker at web.de
Tue Jan 22 19:57:23 GMT 2008


On Mon, 21 Jan 2008, Gerry Patterson wrote:

> On Jan 21, 2008 1:44 PM, Arnd Baecker <arnd.baecker at web.de> wrote:
>
> > On Thu, 17 Jan 2008, Arnd Baecker wrote:
> >
> > [...]
> >
> > > Would some checksum system, integrated into digikam, be useful,
> > > in view of ensuring data integrity for backups?
> > > I think it wouldn't be too difficult to implement something like
> > > this (I briefly discussed with Marcel on IRC and
> > > with digikam >=0.10 such additions to the database will be easy).
> > > Note that it might come with a bit of a speed penalty when
> > > images/metadata get changed; however, this could be made
> > > configurable.
> >
> > So in order to not just talk about stuff, but to try it out, I
> > set up two python scripts which
> > A) Generate a recursive tree which contains
> >   for each file below digikam's root (e.g. ~/Pictures)
> >   a corresponding md5sum *.hash file
> >
> > B) Perform a check for each file in the backup
> >   if the checksum matches.
> >
> > Interestingly, in my case this already revealed
> > around 500 files which did not match.
> > (In this particular case it was essentially a user
> > error, because I changed the metadata (GPS info) for
> > those files, but without changing the file date.
> > As I used rsync such that it would not copy over these
> > files, the back-up went out of sync).
> >
> > So without a hash comparison, I would have never realized
> > the inconsistency!
> >
> > Well, in my opinion we should get some tools to
> > enable the check of data-integrity into digikam itself ...
> >
> > Any thoughts/comments/suggestions/... are welcome
> > to flesh out the ideas of what would be necessary/what makes sense/...!
> >
> > Best, Arnd
> >
>
> Hello Arnd,
>
> What options are you passing to rsync?  If you give it the '-c' option rsync
> will skip based on a checksum instead of mod-time and size.  This would at
> least make your backup consistent with your master.

Yes, I should have used that.
I did not do so because I feared that this would take
much longer, but never verified this belief ...
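
For reference, the two scripts (A and B) quoted above could look roughly
like the following. This is only a minimal sketch, not the actual scripts:
the function names and the idea of keeping the *.hash tree in a separate
directory (hash_root) are assumptions made for illustration.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_tree(root, hash_root):
    """Step A: mirror root under hash_root with one <name>.hash per file.

    Assumption: hash_root lies outside root, so the hash files
    themselves are not picked up by the walk.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        target_dir = os.path.normpath(os.path.join(hash_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            with open(os.path.join(target_dir, name + ".hash"), "w") as out:
                out.write(md5_of(os.path.join(dirpath, name)))


def verify_backup(backup_root, hash_root):
    """Step B: list backup files whose checksum is missing or differs."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            hash_file = os.path.join(hash_root, rel_dir, name + ".hash")
            try:
                with open(hash_file) as stored:
                    expected = stored.read().strip()
            except FileNotFoundError:
                # No stored checksum for this file: report it as well.
                mismatches.append(rel_path)
                continue
            if md5_of(os.path.join(dirpath, name)) != expected:
                mismatches.append(rel_path)
    return sorted(mismatches)
```

Calling write_hash_tree() on the master tree and then verify_backup() on
the backup returns the relative paths that differ, regardless of whether
the file dates were changed.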

> However, it would not
> avoid the original-corrupted-then-backup issue you brought up earlier.

It seems to be something which happens more often
than one thinks. At least Gerhard told me that he
has this problem frequently ...

> As I think about this, it sounds like implementing a SCM.  Basically, you
> want to know if a file has changed on disc with or, in your case, without
> intention.  In theory, when you have a new file you would 'check it in' to
> the picture repository.  If you make changes you 'check in' the new version
> of the file.  In your case a "check-in" would be to create a check-sum of
> the file.

Yes, this sounds like what we will need!
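
To make the "check-in" idea a bit more concrete, here is a minimal sketch.
It assumes the checksums are kept in a simple JSON manifest; the manifest
file and all function names are hypothetical and not part of digikam.

```python
import hashlib
import json
import os


def file_digest(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path):
    """Load the path -> checksum mapping, or an empty one if none exists."""
    if not os.path.exists(manifest_path):
        return {}
    with open(manifest_path) as handle:
        return json.load(handle)


def check_in(manifest_path, image_path):
    """Record the current checksum of image_path as the intended version."""
    manifest = load_manifest(manifest_path)
    manifest[image_path] = file_digest(image_path)
    with open(manifest_path, "w") as out:
        json.dump(manifest, out, indent=2, sort_keys=True)


def changed_since_check_in(manifest_path):
    """Return checked-in files that are missing or no longer match."""
    changed = []
    for image_path, recorded in load_manifest(manifest_path).items():
        if not os.path.exists(image_path) or file_digest(image_path) != recorded:
            changed.append(image_path)
    return sorted(changed)
```

An intentional edit would be followed by another check_in(); anything
that changed_since_check_in() reports without such a check-in changed
on disc without intention.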

> This leads me to thinking about the "Versioned image" request
> that is already in digikam.  Perhaps a single solution would handle both
> cases?

It depends a lot on how the versioning of images
will be realized. But this should definitely be kept in mind!

Thanks a lot for your comments!

Best, Arnd


