Critical Denial of Service bugs in Discover

Aleix Pol aleixpol at kde.org
Wed Feb 9 19:20:00 GMT 2022


On Tue, Feb 8, 2022 at 7:00 PM Ben Cooksley <bcooksley at kde.org> wrote:
>
> On Tue, Feb 8, 2022 at 4:24 AM Aleix Pol <aleixpol at kde.org> wrote:
>>
>> On Sat, Feb 5, 2022 at 10:16 PM Ben Cooksley <bcooksley at kde.org> wrote:
>> >
>> > Hi all,
>> >
>> > Over the past week or so Sysadmin has been dealing with an extremely high volume of traffic directed towards both download.kde.org and distribute.kde.org.
>> >
>> > This traffic volume is curious in that it is directed at two paths specifically:
>> > - distribute.kde.org/khotnewstuff/fonts-providers.xml
>> > - download.kde.org/ocs/providers.xml
>> >
>> > The first path is an "internal only" host which we were redirecting a legacy path to prior to the resource being relocated to cdn.kde.org. The second path has been legacy for numerous years now (more than 5) and is replaced by autoconfig.kde.org.
>> > It is of extreme concern that these paths are still in use - especially the ocs/providers.xml one.
>> >
>> > The volume of traffic has reached the point that, to prevent the server disk filling up, we have had to disable logging for those two sites. Although it varies with the time of day, the server is currently dealing with the following volume of requests, which is far outside normal specifications:
>> >
>> > 555 requests/sec - 4.5 MB/second - 8.3 kB/request - .739199 ms/request
>> >
>> > Analysis of a fragment of logs (comprising just a few minutes of traffic) reveals the following:
>> >
>> >      63 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.89.0-discoverupdate/5.23.5"
>> >      64 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.89.0-discoverupdate/5.23.4"
>> >     104 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.90.0-discoverupdate/5.23.90"
>> >     105 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.88.0-discoverupdate/5.23.5"
>> >    1169 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.86.0-plasma-discover-update/"
>> >    1256 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "KNewStuff/5.90.0-discoverupdate/5.23.5"
>> >    2905 "GET /ocs/providers.xml HTTP/1.1" 301 6585 "-" "Mozilla/5.0"
>> >
>> >      86 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 200 6773 "-" "Mozilla/5.0"
>> >     130 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.89.0-discoverupdate/5.23.5"
>> >     136 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.89.0-discoverupdate/5.23.4"
>> >     197 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.88.0-discoverupdate/5.23.5"
>> >     199 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.90.0-discoverupdate/5.23.90"
>> >    2624 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.86.0-plasma-discover-update/"
>> >    2642 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "KNewStuff/5.90.0-discoverupdate/5.23.5"
>> >    6117 "GET /khotnewstuff/fonts-providers.xml HTTP/1.1" 304 6132 "-" "Mozilla/5.0"
>> >
>> > This indicates that the bug lies solely within Plasma's Discover component - more precisely its updater.
>> >
>> > Examining the origin of these requests has indicated that some clients are making requests to these paths well in excess of several times a minute, with a number of IP addresses appearing more than 60 times in a 1-minute sample window.
>> >
>> > Given that Sysadmin has raised issues with this component and its behaviour in the past, it appears that issues regarding the behaviour of the OCS componentry within Discover remain unresolved.
>> >
>> > Due to the level of distress this is causing our systems, I am therefore left with no option other than to direct the Plasma Discover developers to create and release without delay patches for all versions in support, as well as for all those currently present in any actively maintained distributions, that disable all OCS functionality in the Discover updater. Distributions are requested to treat these patches as security patches and to distribute them to users without delay.
>> >
>> > In 24 hours' time Sysadmin will be making a posting to kde-announce requesting that users immediately cease use of the Discover update client as it is creating a Denial of Service attack on our infrastructure.
>>
>> I feel like your response here is out of proportion.
>>
>> Last time we had this conversation, my impression was that the problem
>> was addressed for the most part. If you wanted people working on
>> KNewStuff, Attica or OCS to take any action, we needed to at the very
>> least have information about what you are complaining about before you
>> burst out into mailing lists and the like.
>
>
> Based on the information I had to hand back in September it was solved, yes.
> Our server monitoring system indicated that this issue did not exist back in September - so this is new, although in the same Discover code.
>
>>
>>
>> In terms of actual solutions, these would probably help to some
>> extent. We never merged them because they're not great design, but good
>> results are more important than good design. At the moment they're on
>> their way in, but it will take time until they reach users.
>> https://invent.kde.org/frameworks/knewstuff/-/merge_requests/141/
>> https://invent.kde.org/plasma/discover/-/merge_requests/165/
>
>
> Yes, the delay in rolling stuff out to users is always the biggest pain point.
>
> That is partly why the fixes process should include bug fixes to previously released versions we can ask distributions to pick up.
>
>>
>>
>>
>> These were of course not the only mitigations put into place
>> back then. In fact many of them were geared towards giving more
>> information about what was happening, and I have yet to get any
>> feedback there. Without information it's impossible for us to come up
>> with any solutions.
>
>
> Correct, and that appears to have borne results as noted in the user agent extracts I provided above.
>
>>
>>
>> For what it's worth, I looked into what was happening within Qt and
>> indeed using download.kde.org instead of autoconfig.kde.org is not
>> hitting the cache paths. If this is something that worries us, maybe
>> we should put together tools that help us detect this kind of problem,
>> as exploding in here does nothing to help this issue, it only stresses
>> people out. Personally, I found it very demotivating how the problem was
>> presented once again.
>
>
> That would certainly explain why download.kde.org is getting hammered while autoconfig.kde.org is not.
>
> Once all the fixes are in could we please:
> a) Run diagnostic tools such as Wireshark/strace/etc against Discover on some developer systems to ensure that it is behaving as we expect it to and there isn't another codepath we've missed?
> b) Get the fixes backported and released to distributions so they can start moving towards user systems immediately rather than waiting for users to update (which will take months/years)?
>
> Note that some of the endpoints being hit are part of the v1 iteration of the GHNS/KNS system (with the current OCS system being v3) so it would be nice if we could ensure that the fixes are working for both v1 and the OCS version.
> (the fonts kcm being a notable user of v1)

What we still haven't discussed here is how to prevent this problem from
happening again.

If we don't have information about what is happening, we cannot fix problems.

Is there anything that could be done on this front? The issue here
could have been addressed months ago; we just never knew it was
happening.
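
As a rough illustration of the kind of detection tooling meant here, below
is a minimal sketch that takes a short sample of access log lines, counts
requests per path and user agent, and lists the IP addresses that exceed a
per-minute limit. It assumes the Apache "combined" log format; the function
names and the default limit of 60 are hypothetical and not taken from any
existing KDE tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache "combined" log format. Adjust the pattern if the
# servers log in a different format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return (ip, timestamp, path, user_agent), or None if unparseable."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("path"), m.group("agent")


def summarise(lines, paths, per_minute_limit=60):
    """Count hits per (path, user agent) and find IPs above the limit.

    An IP is reported if it made more than per_minute_limit requests to
    the watched paths within any single calendar minute.
    """
    by_agent = Counter()
    by_ip_minute = Counter()
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # skip lines that do not match the assumed format
        ip, ts, path, agent = rec
        if path not in paths:
            continue
        by_agent[(path, agent)] += 1
        by_ip_minute[(ip, ts.replace(second=0, microsecond=0))] += 1
    noisy = {ip for (ip, _), n in by_ip_minute.items() if n > per_minute_limit}
    return by_agent, noisy
```

Run periodically over a few minutes of logs for the legacy paths, something
like this could raise an alert when a new client version starts misbehaving,
rather than us finding out months later.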

Aleix


More information about the Plasma-devel mailing list