KPhotoAlbum ignores the 'Do not read RAW files if a matching JPEG/TIFF file exists' setting

Fri Jan 10 15:33:46 GMT 2025

On 1/10/25 10:04, Andreas Schleth via KPhotoAlbum wrote:
> Hi Robert,
> 
> speeding up things is always nice (and too often ignored nowadays :-) - thanks for that!
> 
> So, would the speed take a drastic hit if the import made two passes over the file list?
> Only if the setting to NOT import the raws is set: find all JPEGs and TIFFs, then remove the
> corresponding raws from the list.

No, as long as you do the filtering only on the filename.  It wouldn't be entirely free, but if you
don't have a huge number of files in any given import and are intelligent about the matching
algorithm, it should be cheap.  The performance issue is simply about looking at the file contents
to determine the MIME type vs. trusting the filename.

(Note that although matching on filenames in this way is superlinear overall, i.e. more than
constant time per file, and looking at file contents is linear, i.e. constant time per file, the
constant cost of inspecting file contents is so great that for any practical number of files it will
still dominate.  Complexity analysis alone is not sufficient!)
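
To make the filename-only idea concrete, here is a minimal sketch in Python. It is not KPA's actual
code (KPA is C++/Qt). The suffix sets and the function name are assumptions chosen for illustration,
and "matching" is taken to mean "same path and base name, different suffix". It makes two passes
over the list and never touches the disk:

```python
import os

# Hypothetical suffix sets, for illustration only; KPA's real lists may differ.
RAW_SUFFIXES = {".rw2", ".dng", ".cr2", ".nef", ".arw"}
PROCESSED_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff"}


def drop_raws_with_processed_match(filenames):
    """Return filenames minus any RAW file that has a JPEG/TIFF sibling.

    Only the names are inspected; no file is opened or stat()'ed.
    """
    # Pass 1: remember the stem (path without suffix) of every JPEG/TIFF.
    processed_stems = set()
    for name in filenames:
        stem, suffix = os.path.splitext(name)
        if suffix.lower() in PROCESSED_SUFFIXES:
            processed_stems.add(stem)

    # Pass 2: keep everything except RAW files whose stem was seen in pass 1.
    kept = []
    for name in filenames:
        stem, suffix = os.path.splitext(name)
        if suffix.lower() in RAW_SUFFIXES and stem in processed_stems:
            continue
        kept.append(name)
    return kept
```

Because the stem includes the directory part, a RAW file is only dropped when the JPEG/TIFF sits in
the same folder.  The set lookup is one possible choice of matching algorithm; sorting the list and
comparing neighbours would work too, at somewhat higher cost.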

> The list inspection would not need any disk access (therefore, quite fast) and importing fewer files
> would even make things faster.

Correct.

> But, I guess, this is more like a "feature request" ...
> Anyway, as my recommendation would be to keep raws and jpegs in separate folder trees, I could live
> with the current situation.

The more common scenario is probably people wanting to ignore JPEGs if there is a matching RAW, but
the principle is the same.

> @Andrew: My workflow has the raw files separated from the developed JPEGs in parallel folder
> structure (YYYY/MM_Event/original_filenames.RW2). I do a lot of sifting in Darktable and only a
> subset of the images make it to my KPA folder. In the rare event that I want to revisit my raw
> development, I can find the respective file just by navigating to the corresponding folder. However,
> I see the use case for importing both formats from the camera.

It's all a matter of one's personal workflow.

> For the RAW folder I do archiving on external disk (and delete older files after a few years from
> the local SSD). My KPA folder is on NFS (spinning rust) and has a real backup. Using /
> browsing the images is more or less read-only (except for tagging, but that is just the index.xml).

Ouch.  You *really* want to avoid unnecessary I/O in that case!

> Am 10.01.25 um 14:40 schrieb Robert Krawitz:
>> On 1/10/25 07:42, Andreas Schleth via KPhotoAlbum wrote:
>>> Hi Andrew,
>>>
>>> I can confirm this behaviour on openSUSE Leap 15.6 (KPA 5.13.0). Most probably it is not an OS
>>> problem but one of KPA itself.
>>>
>>> I tried this:
>>> a) copy an existing *.jpg to *.RW2, *.dng, *.DNG - KPA imported all 4 and shows them side by side
>>> b) copy an RW2 file into the same folder as the corresponding *.jpg - KPA does not import or show
>>> the RW2 (Note: the jpg was imported earlier)
>>> c) copy a *.jpg and the corresponding *.RW2 into the same folder (new name) - KPA imports both
>>>
>>> My conclusion:
>>> a) It seems that KPA does not look into the files to determine the MIME type before import but
>>> goes by file extension.
>> Just a word of caution about changing this: actually inspecting the files before deciding whether to
>> import them would dramatically slow down import.  Indeed, even stat()'ing the file is very
>> expensive, particularly on higher latency media, and a NAS is high latency, even with SSD storage,
>> due to the network round trip.  Reading a directory is usually pretty fast, since there can be lots
>> of filenames and inode numbers in one disk block or a sequential set of blocks, but stat() requires
>> finding the inode on disk and reading it in.  Actually opening files would only increase the
>> overhead that much more.
>>
>> The Qt5 directory class always stat()'s every file in the directory so the information would be
>> available.  I did a lot of performance work on KPA 5 some years ago and found this to be a major
>> bottleneck when doing an import into a cold filesystem.  I wound up writing a fast directory class
>> that just returns the list of files in the directory, which sped up importing a lot.  Of course,
>> just doing readdir() means that it doesn't even know whether the entry is a directory or a file,
>> so if a directory has a name ending in an unknown suffix (as opposed to the more common case of no
>> suffix at all) it will be skipped.  So a directory named mydir.ods or the like would not get
>> examined, but that's a small price to pay for much faster import.
>>
>
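
For illustration, the name-only classification described in the quoted message above could be
sketched like this in Python.  This is not the actual KPA directory class: the suffix set, the
function names and the three labels are assumptions, and os.listdir() merely stands in for a plain
readdir() loop because it returns entry names only.

```python
import os

# Hypothetical set of suffixes the importer recognises; illustration only.
KNOWN_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".rw2", ".dng"}


def classify_by_name(name):
    """Guess what a directory entry is from its name alone (no stat())."""
    suffix = os.path.splitext(name)[1].lower()
    if suffix in KNOWN_IMAGE_SUFFIXES:
        return "image"
    if suffix == "":
        # No suffix: the common case for a directory, so look inside later.
        return "maybe_directory"
    # Unknown suffix: skipped, even if the entry is really a directory.
    return "skipped"


def partition_names(names):
    """Split entry names into import candidates and entries to descend into."""
    images = [n for n in names if classify_by_name(n) == "image"]
    subdirs = [n for n in names if classify_by_name(n) == "maybe_directory"]
    return images, subdirs


def list_candidates(directory):
    """List one directory by name only and partition the entries."""
    return partition_names(os.listdir(directory))
```

With this rule an entry named mydir.ods is classified as "skipped" and never examined, which is the
trade-off mentioned in the quoted text.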