[Bug 231402] textproc/kf5-syntax-highlighting: does not build on systems with VLAN interfaces

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Sun Sep 16 16:43:34 BST 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231402

            Bug ID: 231402
           Summary: textproc/kf5-syntax-highlighting: does not build on
                    systems with VLAN interfaces
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: kde at FreeBSD.org
          Reporter: lantw44 at gmail.com
             Flags: maintainer-feedback?(kde at FreeBSD.org)

The kf5-syntax-highlighting build fails with an undefined symbol error on a
FreeBSD 11.2 system that has at least one VLAN network interface. I know it is
odd for the network configuration of a system to affect a build, but that is
what I found after 3 days of debugging. Here are the error messages:

[94/132] cd
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data &&
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/bin/katehighlightingindexer
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data/index.katesyntax
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/syntax-highlighting-5.49.0/data/schema/language.xsd
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data/syntax-data.qrc
FAILED: data/index.katesyntax 
cd /tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data &&
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/bin/katehighlightingindexer
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data/index.katesyntax
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/syntax-highlighting-5.49.0/data/schema/language.xsd
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/data/syntax-data.qrc
/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so: Undefined symbol
"_ZN17QNetworkInterfaceC1ERKS_@Qt_5"
ninja: build stopped: subcommand failed.

I guess this is a memory corruption issue in the Qt5 network module, which may
pass the kernel a bad pointer and cause the kernel to overwrite data belonging
to the runtime linker. The symbol '_ZN17QNetworkInterfaceC1ERKS_' does exist in
/usr/local/lib/qt5/libQt5Network.so.5, and
/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so correctly lists
libQt5Network.so.5 as a dependency with NEEDED, but the runtime linker
rejects the symbol in libQt5Network.so.5 when comparing version tags.

Steps to reproduce the problem:

1. Install FreeBSD 11.2 amd64 and download the ports tree. Whether it is a
physical machine or a virtual machine doesn't matter.
2. Create a VLAN network interface. It can be done with command 'ifconfig vlan3
create vlan 3 vlandev re0' where 're0' is your network interface.
3. Make sure the runtime linker /libexec/ld-elf.so.1 is compiled with the -O2
option. This is the default, so you don't have to do anything in this step
unless you don't use binaries distributed by the FreeBSD project.
4. Install the textproc/qt5-xmlpatterns port with portmaster.
5. Build textproc/kf5-syntax-highlighting.

It was tested on FreeBSD 11.2-RELEASE-p3 amd64 with ports revision 479821. I
could reproduce it on 3 systems (physical machine, virtual machine, jail on
virtual machine) and each of them runs on different hardware.

I mentioned qt5-xmlpatterns above because it is an optional dependency of
kf5-syntax-highlighting. kf5-syntax-highlighting can be built without problems
when qt5-xmlpatterns is not installed, but it also means that it doesn't link
to qt5-network. kf5-syntax-highlighting automatically picks up qt5-xmlpatterns
during the configure phase and it is qt5-xmlpatterns that causes
kf5-syntax-highlighting to load qt5-network during the build.

The following are the results of my debugging. I haven't found the root cause
of the problem, but I think these notes may be useful for further debugging.

I started by checking symbol tables of both libqgenericbearer.so and
libQt5Network.so.5.

$ pkg which /usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so
/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so was installed by package
qt5-network-5.11.1
$ readelf -aW /usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so
Symbol table (.dynsym) contains 140 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    69: 0000000000000000    21 FUNC    GLOBAL DEFAULT  UND
_ZN17QNetworkInterfaceC1ERKS_@Qt_5 (2)

$ pkg which /usr/local/lib/qt5/libQt5Network.so.5
/usr/local/lib/qt5/libQt5Network.so.5 was installed by package
qt5-network-5.11.1
$ readelf -aW /usr/local/lib/qt5/libQt5Network.so.5
Symbol table (.dynsym) contains 2161 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
  1245: 00000000000c7790    21 FUNC    GLOBAL DEFAULT   12
_ZN17QNetworkInterfaceC1ERKS_@@Qt_5 (3)
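
A note on reading the two readelf entries: in a versioned symbol name, a single
'@' marks a reference to a specific version, and '@@' marks the default
definition of that version in the defining object. The number in parentheses is
an index into that object's own version table, so '(2)' in the plugin and '(3)'
in the library can both refer to Qt_5. A small sketch of the naming convention
(split_versioned_symbol is a made-up helper, not part of any tool):

```python
def split_versioned_symbol(text):
    """Split 'name@VER' or 'name@@VER' into (name, version, is_default)."""
    name, sep, version = text.partition("@")
    if not sep:
        return name, None, False          # unversioned symbol
    is_default = version.startswith("@")  # '@@' means default definition
    return name, version.lstrip("@"), is_default

# The reference in the plugin and the definition in libQt5Network.so.5
print(split_versioned_symbol("_ZN17QNetworkInterfaceC1ERKS_@Qt_5"))
print(split_versioned_symbol("_ZN17QNetworkInterfaceC1ERKS_@@Qt_5"))
```

Both entries name the same symbol and the same version string, so the tables
themselves give the linker no reason to reject the binding.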

The plugin links to libQt5Network.so.5 properly:

$ ldd
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/bin/katehighlightingindexer 
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/bin/katehighlightingindexer:
        libQt5XmlPatterns.so.5 => /usr/local/lib/qt5/libQt5XmlPatterns.so.5
(0x800a00000)
        libQt5Network.so.5 => /usr/local/lib/qt5/libQt5Network.so.5
(0x801033000)
        libQt5Core.so.5 => /usr/local/lib/qt5/libQt5Core.so.5 (0x801400000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0x801aec000)
        ...

$ ldd /usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so
/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so:
        libQt5Network.so.5 => /usr/local/lib/qt5/libQt5Network.so.5
(0x80120c000)
        libQt5Core.so.5 => /usr/local/lib/qt5/libQt5Core.so.5 (0x801600000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0x801cec000)
        ...

But the program which throws the undefined symbol error,
katehighlightingindexer, doesn't link to libqgenericbearer.so. It suggests that
libqgenericbearer.so is loaded by calling dlopen.

I set a breakpoint on dlopen in GDB, and yes, it calls it with:
dlopen("/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so", RTLD_NODELETE
| RTLD_LAZY);
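
As a cross-check, the numeric mode=4097 that GDB prints for this call is the
same pair of flags. A minimal sketch, assuming the flag values from FreeBSD's
<dlfcn.h> (RTLD_LAZY = 1, RTLD_NODELETE = 0x1000). The helper name
decode_dlopen_mode is made up for illustration:

```python
# Flag values assumed from FreeBSD's <dlfcn.h>, not read from a live system.
RTLD_FLAGS = {
    "RTLD_LAZY": 0x0001,
    "RTLD_NOW": 0x0002,
    "RTLD_GLOBAL": 0x0100,
    "RTLD_NODELETE": 0x1000,
    "RTLD_NOLOAD": 0x2000,
}

def decode_dlopen_mode(mode):
    """Return the sorted names of the known flags set in a dlopen() mode."""
    return sorted(name for name, bit in RTLD_FLAGS.items() if mode & bit)

print(decode_dlopen_mode(4097))
```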

The return value of dlopen is correct. It is properly loaded, and the hash of
the version entry is 363045.

(gdb) b dlopen
Function "dlopen" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlopen) pending.

(gdb) r 1 2 3
Starting program:
/tmp/wrkdirs/usr/ports/textproc/kf5-syntax-highlighting/work/.build/bin/katehighlightingindexer
1 2 3
[New LWP 101325 of process 74133]

Thread 1 hit Breakpoint 1, dlopen (name=0x805415498
"/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so", mode=4097) at
/usr/src/libexec/rtld-elf/rtld.c:3193
warning: Source file is more recent than executable.
3193            return (rtld_dlopen(name, -1, mode));

(gdb) finish
Run till exit from #0  dlopen (name=0x805415498
"/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so", mode=4097) at
/usr/src/libexec/rtld-elf/rtld.c:3193
0x000000080165a731 in ?? () from /usr/local/lib/qt5/libQt5Core.so.5
Value returned is $2 = (void *) 0x80067e000

(gdb) p ((Obj_Entry *)(0x80067e000))->vertab[2]
$3 = {hash = 363045, flags = 0, name = 0x807202678 "Qt_5", file = 0x8072025de
"libQt5Network.so.5"}
(gdb) p ((Obj_Entry *)(0x80067e000))->path
$8 = 0x800634f40 "/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so"

The number '2' seems to come from the '(2)' suffix in the output of readelf. I
assume it means the version tag used by the symbol has index 2.
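
The value 363045 is not arbitrary: it is what the classic System V ELF hash
function yields for the string "Qt_5", which is the value a version entry's
hash field is expected to hold. A sketch of that function, written from the ELF
specification rather than copied from rtld (the name elf_hash is illustrative):

```python
def elf_hash(name):
    """System V ELF hash, as specified for symbol and version names."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF   # clear the top four bits
    return h

print(elf_hash("Qt_5"))
```

This confirms that the hash seen right after dlopen returns is the intact one,
so any other value in that field means the entry was overwritten afterwards.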

(gdb) b _rtld_bind if $_streq(obj->path,
"/usr/local/lib/qt5/plugins/bearer/libqgenericbearer.so") &&
obj->vertab[2].hash != 363045
Breakpoint 3 at 0x80060f907: file /usr/src/libexec/rtld-elf/rtld.c, line 810.

(gdb) c
Continuing.
[Switching to LWP 101325 of process 74133]

Thread 2 hit Breakpoint 3, _rtld_bind (obj=0x80067e000, reloff=1272) at
/usr/src/libexec/rtld-elf/rtld.c:810
810         rlock_acquire(rtld_bind_lock, &lockstate);

(gdb) p obj->vertab[2]
$17 = {hash = 32, flags = 0, name = 0x807202678 "Qt_5", file = 0x8072025de
"libQt5Network.so.5"}

The value of the hash field of the version entry has changed from 363045 to 32.
The value '32' isn't random. I always get the same value here. If you follow
the execution of the correct _rtld_bind call, you will find it fails to match
the version tag at file /usr/src/libexec/rtld-elf/rtld.c, function
matched_symbol, line 4329:

4329                 if (obj->vertab[verndx].hash != req->ventry->hash ||
4330                     strcmp(obj->vertab[verndx].name, req->ventry->name)) { 
4331                         /*
4332                          * Version does not match. Look if this is a
4333                          * global symbol and if it is not hidden. If
4334                          * global symbol (verndx < 2) is available,
4335                          * use it. Do not return symbol if we are
4336                          * called by dlvsym, because dlvsym looks for
4337                          * a specific version and default one is not
4338                          * what dlvsym wants.
4339                          */
4340                         if ((req->flags & SYMLOOK_DLSYM) ||
4341                             (verndx >= VER_NDX_GIVEN) ||
4342                             (obj->versyms[symnum] & VER_NDX_HIDDEN))
4343                                 return (false);
4344                 }

verndx is 2, and req->ventry->hash is 363045. If obj->vertab[2].hash hasn't
been modified, the runtime linker will pick this symbol and the execution can
continue.
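
The effect of the overwritten hash on the quoted check can be modelled in a few
lines. This is a simplified sketch of the comparison only, not the rtld code:
VerEntry and version_accepts are hypothetical names, the flag tests are reduced
to booleans, and VER_NDX_GIVEN = 2 is taken from the "verndx < 2" remark in the
comment above.

```python
from collections import namedtuple

VerEntry = namedtuple("VerEntry", "hash name")

VER_NDX_GIVEN = 2  # indices below this denote unversioned global symbols

def version_accepts(candidate, requested, verndx,
                    from_dlsym=False, hidden=False):
    """Return True if the symbol survives the version check."""
    if candidate.hash != requested.hash or candidate.name != requested.name:
        # Mismatch: only an unversioned, non-hidden global may still be used.
        if from_dlsym or verndx >= VER_NDX_GIVEN or hidden:
            return False
    return True

intact = VerEntry(363045, "Qt_5")
clobbered = VerEntry(32, "Qt_5")   # same name, hash overwritten

print(version_accepts(intact, intact, verndx=2))
print(version_accepts(clobbered, intact, verndx=2))
```

With matching hashes the symbol is accepted. With one hash replaced by 32 the
names still match, but the check fails on the hash and, because verndx is 2,
the symbol is rejected, which is consistent with the "Undefined symbol" error.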

I tried to set a hardware watchpoint on obj->vertab[2].hash in GDB, but the
watchpoint was never hit. I also tried a software watchpoint on the same
address, with inconsistent results. Most of the time it ran forever and I
interrupted it after a few minutes, but sometimes it stopped at instructions
that should not modify memory, such as 'mov r15,QWORD PTR fs:0x10' and
'mov r15,rdi'. I therefore suspected that the hash value was being modified by
the kernel, but the 'catch syscall' command in GDB didn't work for me: GDB
kept printing 'Thread 2 received signal SIGSYS, Bad system call.' and made the
program behave abnormally. I decided to use DTrace to track changes to the hash
value instead:

# dtrace -n 'syscall:::entry, syscall:::return /pid == 99608/ { printf("%s %u
==> %x %x %x %x", probefunc, *(unsigned int *)copyin(0x801242230, 4), arg0,
arg1, arg2, arg3); }'

dtrace: description 'syscall:::entry, syscall:::return ' matched 2168 probes
CPU     ID                    FUNCTION:NAME
  1  80243                      ioctl:entry ioctl 363045 ==> 8 c0306938
7fffdfffd770 0
  1  80244                     ioctl:return ioctl 32 ==> 0 0 0 0

0x801242230 was the address of the hash variable obtained from GDB. It seems
an 'ioctl(8, SIOCGIFMEDIA, 0x7fffdfffd730)' call was what changed the value. 8
was a socket file descriptor created by calling 'socket(PF_INET, SOCK_DGRAM |
SOCK_CLOEXEC, 0)'. 0x7fffdfffd730 looked like a pointer on the stack, as
'procstat -v' said this region grew down. I stopped debugging here and
temporarily removed the VLAN interface with 'ifconfig vlan3 destroy' to let
portmaster upgrade kf5-syntax-highlighting and hundreds of other ports for me.
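
The request number c0306938 in the DTrace output can be decoded by hand to
confirm that it is SIOCGIFMEDIA. The sketch below assumes the BSD ioctl
encoding from <sys/ioccom.h>: direction in the top three bits, a 13-bit
parameter length, then an 8-bit group and an 8-bit command number. The helper
decode_ioctl is a made-up name:

```python
# Constants assumed from FreeBSD's <sys/ioccom.h>.
IOC_OUT = 0x40000000    # kernel copies the argument back out to userland
IOC_IN = 0x80000000     # kernel copies the argument in from userland
IOCPARM_MASK = 0x1FFF   # 13-bit parameter length

def decode_ioctl(request):
    """Split a BSD ioctl request number into its encoded fields."""
    direction = []
    if request & IOC_IN:
        direction.append("in")
    if request & IOC_OUT:
        direction.append("out")
    return {
        "direction": direction,
        "length": (request >> 16) & IOCPARM_MASK,
        "group": chr((request >> 8) & 0xFF),
        "number": request & 0xFF,
    }

print(decode_ioctl(0xC0306938))
```

The result is an in/out request in group 'i', command 56, with a 48-byte
argument, which matches _IOWR('i', 56, struct ifmediareq). The kernel therefore
both reads and writes the structure passed by the caller, and struct
ifmediareq also carries a pointer field (ifm_ulist) to a caller-supplied list.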

The conclusion is that I probably have to read the code of qt5-network in order
to figure out what really happens. I found three ways to work around the
problem on affected systems:

1. Remove all VLAN interfaces, which may not be possible if your networking
environment requires them.
2. Use Clang 6 shipped with FreeBSD base to recompile /libexec/ld-elf.so.1 with
-O1, -O0, or -DDEBUG.
3. Use GCC 8 from ports to recompile /libexec/ld-elf.so.1 with -O0. Using -O1
or -DDEBUG doesn't help when using GCC.

In fact, I didn't replace /libexec/ld-elf.so.1 on the system because that is
risky. I tested by either running the compiled ld-elf.so.1 under
/usr/src/libexec/rtld-elf directly as an executable or changing the
interpreter path stored in the katehighlightingindexer executable with the
'patchelf --set-interpreter' command.
