KDirWatch bug and the analysis. Help is welcome!

Mark markg85 at gmail.com
Fri Aug 2 14:07:19 BST 2013


On Fri, Aug 2, 2013 at 12:15 PM, David Faure <faure at kde.org> wrote:
> On Thursday 01 August 2013 15:06:05 Mark wrote:
>> On Thu, Aug 1, 2013 at 10:41 AM, David Faure <faure at kde.org> wrote:
>> > On Thursday 01 August 2013 01:30:08 Mark wrote:
>> >> However, we have been given the power of inotify which gives more
>> >> detailed
>> >> signals and lets us know which files have been created/added/modified
>> >> which
>> >> we should be used imho.
>> >
>> > OK. First let's imagine that it's not a hidden file. Say you create "foo"
>> > file (from the command line) in a directory currently shown in dolphin.
>> > When using inotify, we could get a "foo was created" signal, but then
>> > what? KDirLister is going to need more details anyway (size, mimetype,
>> > date, icon, etc.). To get that, it re-lists the directory. Don't say it
>> > could just KIO::stat the new file, it becomes very slow if many files get
>> > created/modified, and it creates much more complex code paths in
>> > kdirlister
>> > which is already complex (it would also need to handle deletion
>> > separately). Instead we have a single reaction to "something changed in
>> > this directory" : re-list it and update it to show the changes (after
>> > basically diff'ing the new listing and the old listing).
>>
>> This part is very interesting! Thus far i didn't test it out to see
>> what happens, till now. Like you, i was not expecting a stat call
>> hence you probably wrote the above. However, i've been proved wrong by
>> the code. I created a new file on the command line to exclude possible
>> special dolphin handling. So all i would expect was a inotify signal
>> about a new file, certainly no stat calls for that new file.
>
> Well, I didn't say there was no stat call at all, especially in KDirWatch
> itself.
>
> I meant that KDirListerCache would not just get the data for new file(s) using
> KIO::stat on the new files, but using KIO::listDir on the whole dir.
>
> Don't confuse stat() and KIO::stat(), I wasn't talking about the first one.

Just for my view on this. What's the difference between stat() and kio::stat()?
>
>> However, i've been proven wrong in that assumption. Twice even. When a
>> new file is created there are actually two stat calls being done. I'il
>> walk you though it.
>> 1. create a new file
>> 2. an IN_CREATE event is being send by inotify and is being handled in
>> in kdirwatch [1] starting from line 346.
>> 3. That path ends up calling clientsForFileOrDir on line 362 [1]
>> 4. clientsForFileOrDir is then being handled from line 532 [1] and
>> executes the first stat call to determine if a folder is added.
>> 5. then somehow somewhere a timer fires and slotRescan is being
>> executed. That ends up calling scanEntry on line 1427 [1]
>> 6. scanEntry then ends up doing another stat call to determine of the
>> file exists and ultimately return "Changed" on line 1283 [1]
>>
>> I'm guessing this will even surprise you? ;)
>
> Not that much, sorry :)
>
>> What i think needs to happen is send a special - new - signal from
>> KDirWatch that tells KDirListener which files got
>> added/changed/deleted in a folder that KDirListener is listening to.
>> It's not that difficult to make something like that and seems a lot
>> saner to me. Specially provided that the stat calls you want to
>> prevent are already done. At least twice for as far as i can tell.
>
> No, I have nothing against KDirWatch calling stat() if it needs to find out
> more about what happened. If you see ways to make sure it doesn't need to do
> that, in the inotify-backend case, great, feel free to improve kdirwatch.
>
>> Or follow your initial approach and not let KDirWatch do these stat
>> calls for new files and just re-list.
>
> You're confusing two layers. KDirLister[Cache] is the one doing the re-
> listing. But that's one layer above KDirWatch (which has other users than
> KDirLister).

I actually wrote it down poorly. KDirWatch is sending a signal on
which KDirLister responds and re-lists the whole folder.
>
>> > The other reason why things work that way is that inotify isn't available
>> > everywhere. On systems without inotify, all we have is a regular stat() on
>> > the directory. So when anything changes in the directory, all we get from
>> > KDirWatch is "something changed", no details => relist.
>> > There's another technology available, FAM, but IIRC it gives us the same
>> > (directory-level signal).
>>
>> Yes, i understand that. But i think both KDirWatch and KDirListener
>> should be able to be smart enough if the file system monitor backend
>> can provide more detailed information. It can't be the intention to
>> make KDirWatch and KDirListener as dumb as the dumbest file monitoring
>> backend. That kinda beats point of using inotify at all.
>
> You have a point there. What's difficult is that there's a common API in
> between. Code gets complex if a given API (kdirwatch signals) behaves
> differently depending on the backend it uses. This is bad design in general,
> since it breaks encapsulation. But everything can be solved by an indirection
> layer, as they say, so maybe all that's needed is to query kdirwatch for
> "supports file-level notifications [for a given path...]" and then listen to
> created()/deleted() signals rather than dirty().

It actually is a big mess already. All backend implementations are in
one file with a lot of ifdef lines :(
Funny thing is that KDirWatch already has the proper API for this. It
just doesn't seem to be used.

It has the signals:
created(QString)
deleted(QString)
dirty(QString)

where dirty is a bit abused i guess ;)

>
> If you work on this, ensure the unittests (not only kdirwatchtest but
> especially kdirlistertest and kdirmodeltest in kio) still pass, before you
> even ask for reviews :-)

ehh, or i remove functionality without knowing it ;-)
Yes, i will pay very close attention to specially those things.

>
>> > We could, however, fix this by filtering out hidden files ("dot files") in
>> > the KDirWatch inotify event handling, optionally. (since this depends on
>> > whether the kdirlister is showing hidden files or not). But with care in
>> > case multiple kdirwatch clients specify a different value for the "notify
>> > for hidden files" new flag, for the same directory. Feel free to look
>> > into that.
>>
>> If you still want that after the responses in the points above.. Sure :)
>
> OK, I guess the idea of more fine-grained communication between kdirwatch and
> kdirlister is somewhat better than just filtering out hidden files in
> kdirwatch. It takes care of more scenarios (other kinds of filtering, handling
> deletions without relisting, etc.). So let's forget about the hidden-file-
> filtering idea, I guess.

Ok, awesome.
>
>> Specially since i'm now fully on KIO since today (even though
>> KDirWatch isn't in kio.. Why isn't it anyway?)
>
> It used to be. Then it moved down to kdecore because, well, it's not only
> useful in relation to KIO (the network transparency technology).
> Just like QFileSystemWatcher is also in QtCore.

So where is it going to end up in KF5? Since i've seen a message
somewhere (by you if i'm not mistaken) that kdecore should be gone by
the end of this year. Back to KIO i guess?


+ a little idea. While monitoring the inotify events i noticed that
you can have quite a bit of stat overhead just by updating a file in a
way that dolphin is doing it. The .directory. It is probably the
safest way with minimal risk of data loss.
So now i was thinking, if i try to improve this code to actually emit
the create and delete signals as well then in this case you will end
up with some needless signals. Lets take my initial example, these are
send from inotify:

(CREATE signal) /home/mark/massive_folder/.directory.lock
(MODIFY signal) /home/mark/massive_folder/.directory.lock
(CREATE signal) /home/mark/massive_folder/.directoryc13357.new
(MODIFY signal) /home/mark/massive_folder/.directoryc13357.new
(DELETE signal) /home/mark/massive_folder/.directoryc13357.new
(CREATE signal) /home/mark/massive_folder/.directory
(DELETE signal) /home/mark/massive_folder/.directory.lock

in a fixed KDirWatch that would become:

(CREATED signal) /home/mark/massive_folder/.directory.lock
(DIRTY signal) /home/mark/massive_folder/.directory.lock
(CREATED signal) /home/mark/massive_folder/.directoryc13357.new
(DIRTY signal) /home/mark/massive_folder/.directoryc13357.new
(DELETED signal) /home/mark/massive_folder/.directoryc13357.new
(CREATED signal) /home/mark/massive_folder/.directory
(DELETED signal) /home/mark/massive_folder/.directory.lock

That's just in a few miliseconds while the end result is a dirty file:
/home/mark/massive_folder/.directory

The temporary file and the lock file both get deleted and where not
there in the first place so there is no need to send any signal about
those files.

That makes me wonder if we could prevent some signals to be send by
delaying the signal sending for - lets say - 100ms. If during that
time another inotify event comes by that makes it pointless to send a
created signal (for example an inotify that tells us the file is
deleted again, like the lock file) then we simply don't send any
signal for that file. This would be somewhat the same like the
batching i implemented for listDir. It would be sending
created/deleted/dirty signals if needed for every 100ms as long as
inotify events stream in.

This will obviously not prevent every case. If a file got added or
removed just after the 100ms trigger then you'd end up with the same
signals as in the above example. But it has the potential to prevent
signals from being send for small file changes.

Note, the 100ms can very well be 300ms like in listDir.

>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Working on KDE, in particular KDE Frameworks 5
>




More information about the kde-core-devel mailing list