[KPhotoAlbum] I moved some photos....

Robert Krawitz rlk at alum.mit.edu
Sun Aug 20 03:22:25 BST 2017


On Sun, 20 Aug 2017 12:48:04 +1200, Kerry Sainsbury wrote:
> I recently rearranged photos on the filesystem and was surprised to see
> that KPA had lost all the metadata about the photos.

KPA uses the filename as the key; if the file is no longer at that
path, there's not much it can easily do.

> I've since read this thread
> <http://kphotoalbum.kdab.narkive.com/xXVk5JPC/recalc-checksum-upon-image-modification>
> from 9 years ago and understand my mistake, but is there really no way to
> improve this?
>
> Would a background thread that looks for files that have changed since 'the
> last time' and (re)calculated the checksum on such files really be such a
> problem in 2017?

Yes; CPUs aren't all *that* much faster than they were in 2008.
Maybe 10-20x (counting both parallelism and per-core), but it's still
going to bog down very badly recomputing checksums if you have a big
database (and if that isn't a problem, I/O will be, unless you're
storing data on NVMe, which is very cost-inefficient).

If you have a multi-terabyte image database as I do, it simply won't
be practical with any plausible combination of hardware to do that.

> It might not be ideal for network filesystems, but perhaps this proposed
> function is configurable.

It would be absolutely awful on network filesystems.  On local
filesystems a simple-minded check (name, size, mod time) wouldn't be
too bad; I was pleasantly surprised to find that stat'ing all of the
files on my images filesystem (about 224,000) via find only took about
3 seconds (on conventional spinning rust), but if you have files
scattered about in a lot of directories, it might be rather less
efficient.  It took 9 seconds to stat all of the files on my root
filesystem (SSD, about 1.7M files); on a spinning disk, that took
about 127 seconds.  That is a lot better than it was in 2008, when
I was using ReiserFS, which was not tuned for that kind of
application (ext4 is a lot better).
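
For illustration, a simple-minded check of that kind might look roughly
like the sketch below.  This is not KPA code; the function names and the
idea of keeping a saved snapshot between runs are assumptions made for
the example.  It only calls stat on each file and never reads file
contents.

```python
import os
import stat
from typing import Dict, List, Tuple

# (size in bytes, modification time in nanoseconds)
Signature = Tuple[int, int]


def snapshot(root: str) -> Dict[str, Signature]:
    """Record (size, mtime) for every regular file under root.

    Keys are paths relative to root.  Only metadata is read.
    """
    result: Dict[str, Signature] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is a broken symlink
            if stat.S_ISREG(st.st_mode):
                rel = os.path.relpath(full, root)
                result[rel] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(
    old: Dict[str, Signature], new: Dict[str, Signature]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

A file that was moved shows up here as one "removed" path plus one
"added" path; the stat data alone cannot prove they are the same image.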

Regardless, actually recomputing checksums on all of your files is not
something I'd want to contemplate.
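
To make the cost concrete, here is a minimal sketch of checksumming one
file with Python's hashlib.  It is not how KPA computes its checksums;
the function name and chunk size are assumptions.  The point is that
every byte of every image has to be read from disk, unlike the stat
check above.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    The whole file is read, so the cost scales with total image size
    rather than with the number of files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```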

> Losing all that metadata was really quite frustrating, so if there's
> something that can be done to stop it biting someone else, that would be
> awesome.

I contributed a script that's now in git named "kpa-merge" whose
purpose in life is to merge the metadata from one database into
another.  It uses filename as its join key rather than MD5 checksum,
but you could modify it to use checksum (just beware of hash
collisions, perhaps from duplicate files) if you want.  But that won't
solve the other problem you mentioned, modifying the image.
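
As a rough sketch of what joining on checksum instead of filename could
look like (this is not the kpa-merge code; the data layout and names
below are hypothetical), the main thing to handle is a checksum that
appears more than once on either side:

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def match_by_checksum(
    old_db: Dict[str, Tuple[str, Any]],
    new_files: Dict[str, str],
) -> Tuple[Dict[str, Any], Dict[str, Tuple[List[str], List[str]]], List[str]]:
    """Match files at new paths to old metadata using checksums.

    old_db:    {old_path: (checksum, metadata)}   (hypothetical layout)
    new_files: {new_path: checksum}
    Returns (matched, ambiguous, unmatched):
      matched:   {new_path: metadata} for one-to-one checksum matches
      ambiguous: {checksum: (old_paths, new_paths)} for duplicates
      unmatched: new paths whose checksum is not in the old database
    """
    old_by_sum: Dict[str, List[Tuple[str, Any]]] = defaultdict(list)
    for path, (checksum, meta) in old_db.items():
        old_by_sum[checksum].append((path, meta))

    new_by_sum: Dict[str, List[str]] = defaultdict(list)
    for path, checksum in new_files.items():
        new_by_sum[checksum].append(path)

    matched: Dict[str, Any] = {}
    ambiguous: Dict[str, Tuple[List[str], List[str]]] = {}
    unmatched: List[str] = []
    for checksum, new_paths in new_by_sum.items():
        olds = old_by_sum.get(checksum, [])
        if not olds:
            unmatched.extend(new_paths)
        elif len(olds) == 1 and len(new_paths) == 1:
            matched[new_paths[0]] = olds[0][1]
        else:
            # Duplicate files (or a collision): leave for manual review.
            ambiguous[checksum] = (
                sorted(p for p, _ in olds),
                sorted(new_paths),
            )
    return matched, ambiguous, sorted(unmatched)
```

Anything in the ambiguous set is left alone rather than guessed at,
since copying metadata onto the wrong duplicate is worse than not
copying it.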

You'd also have to update the EXIF database; that would be a bit
faster because it doesn't have to read the entire image to snarf the
EXIF data.

In any event, I think this kind of thing should be done manually
rather than automatically.  Adding a significant startup cost to
handle a rare event is probably not the right thing to do; making
this an operation you have to explicitly invoke when you rearrange
your filesystem makes more sense IMO.
-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton


