Jenkins-kde-ci (many CI failures)
Ben Cooksley
bcooksley at kde.org
Sat May 14 06:00:46 UTC 2016
On Tue, May 10, 2016 at 7:39 PM, David Faure <faure at kde.org> wrote:
> On Sunday 08 May 2016 23:06:02 Ben Cooksley wrote:
>> On Sun, May 8, 2016 at 2:44 AM, David Faure <faure at kde.org> wrote:
>> > kdewebkit just failed with "Broken pipe" (the TCP error you mentioned)
>> > (and kxmlrpcclient failed again with an anongit error). This is like playing whack-a-mole...
>>
>> Yeah :( Fortunately the Broken Pipe error is the least common one.
>>
>> >
>> > I thought TCP was more robust than that. Would it help to increase some
>> > TCP-related timeout somewhere?
>>
>> TCP should definitely be more reliable, I agree.
>> I suspect the root cause of the Broken Pipe issue will be the same as
>> the Temporary failure in name resolution error.
>>
>> The /etc/hosts fix should be deployed shortly - the images are rebuilding now.
>
> kmediaplayer job #63 failed with
> ssh: Could not resolve hostname build.kde.org: Temporary failure in name resolution
> at 12:56 yesterday (CI system time).
>
> Is build.kde.org missing from /etc/hosts?
Turns out Docker replaces /etc/hosts with its own version.
I've now adjusted the system configuration to have build.kde.org in /etc/hosts.
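
A quick way to confirm from inside a container that the entry survived is to parse the hosts file directly. This is only a minimal sketch: the helper name hosts_has_entry is made up for illustration, and the address in the example data is a documentation placeholder, not the real one.

```python
def hosts_has_entry(hosts_text: str, hostname: str) -> bool:
    """Return True if hostname appears as a name or alias in hosts-file text."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # The first field is the address; the remaining fields are names.
        if wanted in (name.lower() for name in fields[1:]):
            return True
    return False


# Example data only; 192.0.2.10 is a placeholder address (assumption).
sample = "127.0.0.1 localhost\n192.0.2.10 build.kde.org\n"
print(hosts_has_entry(sample, "build.kde.org"))
```

Inside a running container the same check would be hosts_has_entry(open("/etc/hosts").read(), "build.kde.org").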
>
>> The only thing I can think of at the moment is some kind of traffic
>> storm on the network bridge which disrupts arp or something similar at
>> that level when one or more containers start/stop in a short amount of
>> time. This could very well be Docker itself determining which IP / MAC
>> addresses it can use for the newly starting container - with
>> connections being broken and data lost when it steps on one that is in
>> use. I do seem to recall having the issue, albeit to a lesser extent
>> with the KVM setup as well. We definitely didn't have it with the LXC
>> containers though, but those all had public IP addresses of some form
>> or another (one was Public IPv6 only, with NAT IPv4)
>>
>> The current setup (using one machine as an example, they're all
>> identical except for the IP ranges used):
>>
>> - Normal Linux bridges, set up using Debian's /etc/network/interfaces
>> and bridge utilities.
>> - Host takes 10.150.85.1/25 (br0) and 10.150.85.129/25 (br1)
>>
>> - Docker containers are allocated the rest of the 10.150.85.1/25 IP
>> block, and are connected to the corresponding bridge (br0)
>> - Windows virtual machines are allocated static IP addresses in the
>> 10.150.85.129/25 block, on the corresponding bridge (br1)
>>
>> - VPN connection is established using OpenVPN, with the OpenVPN server
>> routing 10.150.85.0/24 to the VPN client. Only traffic within the
>> 10.150.85.0/16 subnet will be sent over the VPN. This is done to
>> permit secure communication with the Docker management daemons, and to
>> permit easy+secure access to the Windows VMs.
>>
>> - Public network access is handled on the host (not the VPN server) using NAT.
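
For anyone trying to picture the addressing quoted above, the layout can be sanity-checked with Python's standard ipaddress module. This is just a sketch using the example machine's numbers as quoted; the variable names are mine, and reading "10.150.85.0/16" as the enclosing /16 network is an assumption.

```python
import ipaddress

# Numbers taken from the quoted description of one example machine.
routed = ipaddress.ip_network("10.150.85.0/24")         # routed to the VPN client
br0 = ipaddress.ip_interface("10.150.85.1/25")          # host address on br0
vm_block = ipaddress.ip_interface("10.150.85.129/25")   # Windows VM block on br1

# A /24 splits into exactly two /25 halves, one per bridge.
halves = list(routed.subnets(new_prefix=25))
print(halves)

# Each quoted address falls in a different half, so the bridges do not overlap.
print(br0.network == halves[0], vm_block.network == halves[1])
print(br0.network.overlaps(vm_block.network))

# "10.150.85.0/16" has host bits set; strict=False normalises it to the
# enclosing network (assumption: that is what was meant).
vpn_scope = ipaddress.ip_network("10.150.85.0/16", strict=False)
print(vpn_scope)
```

Each /25 holds 128 addresses, so after the host's own address there is room for a little over a hundred containers or VMs per bridge.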
>
> I'm afraid I'm not enough of a network sysadmin to be able to find out what
> might be wrong in this setup, if anything.
And I think I might have found our culprit.
http://backreference.org/2010/07/28/linux-bridge-mac-addresses-and-dynamic-ports/
That would neatly fit the issues we've hit.
I'll implement a fix for that... fingers crossed this is the cause.
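
As far as I understand the article, a Linux bridge without an explicitly configured MAC address takes the lowest MAC among its attached ports, so the bridge's own address can change whenever a port with a lower MAC is attached (for example, when a container starts). The toy model below only illustrates that selection rule; it is not kernel code, the function names are made up, and the MAC addresses are hypothetical.

```python
def mac_key(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' into an integer for numeric comparison."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs, pinned=None):
    """Toy model: a pinned address wins, otherwise the lowest port MAC."""
    if pinned is not None:
        return pinned
    return min(port_macs, key=mac_key) if port_macs else None


# Hypothetical addresses, for illustration only.
ports = ["52:54:00:aa:bb:01"]            # an existing port on the bridge
before = bridge_mac(ports)

ports.append("02:42:0a:96:55:02")        # a newly started container's port
after = bridge_mac(ports)

# Without pinning, the bridge address changed when the new port appeared.
print(before, after, before != after)

# With an explicitly set address, attaching ports makes no difference.
print(bridge_mac(ports, pinned="52:54:00:aa:bb:01"))
```

If that model is right, peers that cached the old bridge MAC would briefly send traffic to an address that no longer answers, which would match connections dropping when containers start or stop in quick succession.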
>
>> I've bumped the limits on each anongit node so hopefully that will solve it.
>> The limit was a bit on the conservative side anyway.
>>
>> If Jenkins is making that number of Git connections at one moment...
>> I'd be quite surprised.
>
> I think I saw more anongit errors yesterday, but I didn't write them down. Let's see.
Okay.
>
>> Indeed. The main reason for performing many builds at once is to
>> ensure the small projects don't get blocked up when big items (like
>> Qt, PIM and Calligra) do a build.
>> They've all been known to tie up a builder for more than an hour per
>> build, and that has led to a large pile of other builds blocking up
>> behind them (which I've received complaints about as well).
>
> I know, but this was a smaller problem than false positives IMHO :-)
>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Working on KDE Frameworks 5
>
Cheers,
Ben