How to solve GLIBC incompatibility in ClickHouse: the final part

Keywords: OLAP

In the previous article, we introduced how to package shared libraries together with the binary to solve the GLIBC incompatibility problem at clickhouse run time. However, in practice we found that this scheme has side effects that are fatal in a production environment: they cause the clickhouse process to crash. This article therefore overturns the scheme from the previous article and uses ClickHouse's own mechanism to solve the GLIBC incompatibility problem completely.

Side effects of packaging shared libraries with the binary

test_getaddrinfo.cpp

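#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>

int main()
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    // Zero every field first, including ai_addrlen, ai_addr, ai_canonname and ai_next.
    memset(&hints, 0, sizeof hints);
    hints.ai_flags = AI_NUMERICSERV;   // 1024 (0x400): the service string is a numeric port
    hints.ai_family = AF_UNSPEC;       // 0: accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;   // 1
    hints.ai_protocol = IPPROTO_TCP;   // 6
    int err = ::getaddrinfo("localhost", "8000", &hints, &res);
    printf("%d\n", err);               // 0 on success, a nonzero EAI_* code on failure
    if (err == 0)
        freeaddrinfo(res);             // release the result list
    return 0;
}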

Following the previous article, we compile test_getaddrinfo.cpp in an Ubuntu 20 (u20) environment, package it together with the shared libraries it depends on, and deploy it to an Ubuntu 16 (u16) environment. The specific steps are as follows:

  • In the u20 environment, compile the program, copy the shared libraries, and modify the ELF information:
$ g++ test_getaddrinfo.cpp  -o test_getaddrinfo
$ ldd ./test_getaddrinfo                                        
 linux-vdso.so.1 (0x00007ffc539ea000)
 libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007fc38ae95000)
 /lib64/ld-linux-x86-64.so.2 (0x00007fc38b08e000)
$ copylib.sh ./test_getaddrinfo
$ patchelf --set-rpath  /home/liyang/lib  test_getaddrinfo
$ patchelf --set-interpreter /home/liyang/lib/ld-linux-x86-64.so.2 test_getaddrinfo 

  • Copy test_getaddrinfo and the lib files to the u16 runtime environment.

  • Run test_getaddrinfo in the u16 environment. The program misbehaves: ::getaddrinfo does not return zero (-11 is EAI_SYSTEM in glibc's netdb.h).

$ ./test_getaddrinfo 
-11

So what's the problem? As a control group, we compile and run test_getaddrinfo.cpp directly in the u16 environment, and it returns zero.

We traced both binaries with strace; in the screenshot below, the left side corresponds to the binary compiled on u16 and the right side to the binary compiled on u20. The right side loads many more .so files, and these .so files cannot be packaged by the copylib tool. My guess is that, because of this, test_getaddrinfo does not fully shed its dependence on the u20 environment at run time, which makes ::getaddrinfo return an error.

image-20211205114506800
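
The extra .so files seen under strace can also be observed from inside the process. The following is only a minimal sketch, assuming Linux with glibc: it uses dl_iterate_phdr to list the shared objects mapped before and after a getaddrinfo call. The helper names (collect_name, loaded_objects, objects_loaded_by_getaddrinfo) are made up for illustration. Libraries that glibc loads on demand, such as NSS modules, are not listed by ldd, which is one plausible reason a packaging tool based on ldd would miss them; whether any extra objects show up depends on the glibc version.

```cpp
// Sketch (Linux-specific): which shared objects appear only after getaddrinfo()?
#include <link.h>        // dl_iterate_phdr, struct dl_phdr_info
#include <netdb.h>       // getaddrinfo, freeaddrinfo, AI_NUMERICSERV
#include <sys/socket.h>  // SOCK_STREAM
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>
#include <string>

// Callback for dl_iterate_phdr: record the path of each loaded object.
static int collect_name(struct dl_phdr_info* info, size_t /*size*/, void* data)
{
    assert(data != nullptr);
    auto* names = static_cast<std::set<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
        names->insert(info->dlpi_name);  // the main program reports an empty name
    return 0;                            // returning 0 keeps the iteration going
}

// Names of all shared objects currently mapped into this process.
std::set<std::string> loaded_objects()
{
    std::set<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}

// Hypothetical helper: objects that were not mapped before the lookup.
std::set<std::string> objects_loaded_by_getaddrinfo()
{
    const std::set<std::string> before = loaded_objects();

    struct addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_flags = AI_NUMERICSERV;
    hints.ai_socktype = SOCK_STREAM;
    struct addrinfo* res = nullptr;
    if (::getaddrinfo("localhost", "8000", &hints, &res) == 0)
        ::freeaddrinfo(res);

    std::set<std::string> added;
    for (const std::string& name : loaded_objects())
        if (before.count(name) == 0)
            added.insert(name);
    return added;
}
```

Printing the result of objects_loaded_by_getaddrinfo() on the build machine gives a list of candidate files that would have to be shipped in addition to what ldd reports.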

New scheme

Cause analysis

Given the counterexample above, the scheme of packaging libs with the binary clearly does not work. Let us go back to the original problem: a clickhouse binary compiled on u20 reports an error when started in a u16 environment: version GLIBC_2.27 not found

Let's see which GLIBC versions u16 supports. 2.27 is indeed not among them:

$ strings /lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_ 
GLIBC_2.2.5
GLIBC_2.2.6
GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3.3
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.5
GLIBC_2.6
GLIBC_2.7
GLIBC_2.8
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
GLIBC_2.13
GLIBC_2.14
GLIBC_2.15
GLIBC_2.16
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_PRIVATE
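
The listing above shows the symbol versions that libc.so.6 defines. As a complementary check, a program can ask glibc for its release at run time and compare it with the version it needs. This is a minimal sketch, assuming glibc (gnu_get_libc_version comes from the glibc-only header gnu/libc-version.h); the Version struct and the helper names are made up for illustration.

```cpp
// Sketch: compare the running glibc release with a required version.
#include <gnu/libc-version.h>  // gnu_get_libc_version (glibc-only)
#include <cassert>
#include <cstdio>
#include <string>

struct Version {
    int major_num;
    int minor_num;
};

// Parse the leading "X.Y" of strings such as "2.23" or "GLIBC_2.27".
Version parse_version(const std::string& text)
{
    Version v = {0, 0};
    const std::string prefix = "GLIBC_";
    const std::string digits =
        text.compare(0, prefix.size(), prefix) == 0 ? text.substr(prefix.size()) : text;
    std::sscanf(digits.c_str(), "%d.%d", &v.major_num, &v.minor_num);
    return v;  // stays 0.0 if the text does not start with a number
}

// True when `have` is the same as or newer than `need`.
bool at_least(Version have, Version need)
{
    if (have.major_num != need.major_num)
        return have.major_num > need.major_num;
    return have.minor_num >= need.minor_num;
}

// True when the glibc this process is running on satisfies `required`.
bool runtime_glibc_at_least(const std::string& required)
{
    const char* running = gnu_get_libc_version();  // a string such as "2.23"
    assert(running != nullptr);
    return at_least(parse_version(running), parse_version(required));
}
```

With the highest version from the listing, at_least(parse_version("2.23"), parse_version("GLIBC_2.27")) is false, which matches the startup error.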

Which dynamic library introduces the GLIBC 2.27 dependency in clickhouse?

$ objdump -x ./clickhouse
Version References:
  required from libdl.so.2:
    0x09691a75 0x00 02 GLIBC_2.2.5
  required from libpthread.so.0:
    0x09691a75 0x00 03 GLIBC_2.2.5
    0x09691974 0x00 04 GLIBC_2.3.4
  required from libm.so.6:
    0x09691a75 0x00 05 GLIBC_2.2.5
    0x06969187 0x00 06 GLIBC_2.27  
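
Reading the Version References section by eye works for a short listing; for longer output the same lookup can be scripted. The following is a rough sketch that takes the text printed by objdump -x as a string and returns the libraries that require a given version. The function name libraries_requiring is made up for illustration, and the parser assumes the layout shown above ("required from <lib>:" followed by indented entries whose last field is the version name).

```cpp
// Sketch: find which libraries a binary requires a given symbol version from.
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> libraries_requiring(const std::string& objdump_output,
                                             const std::string& version)
{
    assert(!version.empty());
    std::vector<std::string> result;
    const std::string marker = "required from ";
    std::string current_lib;

    std::istringstream in(objdump_output);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type pos = line.find(marker);
        if (pos != std::string::npos) {
            // Header row: remember the library name without the trailing ':'.
            current_lib = line.substr(pos + marker.size());
            while (!current_lib.empty() &&
                   (current_lib.back() == ':' ||
                    std::isspace(static_cast<unsigned char>(current_lib.back()))))
                current_lib.pop_back();
            continue;
        }
        // Entry row: the last whitespace-separated field is the version name.
        std::istringstream fields(line);
        std::string token, last;
        while (fields >> token)
            last = token;
        // Exact comparison, so "GLIBC_2.2" does not match "GLIBC_2.27".
        if (!current_lib.empty() && last == version &&
            (result.empty() || result.back() != current_lib))
            result.push_back(current_lib);
    }
    return result;
}
```

Fed the listing above, libraries_requiring(text, "GLIBC_2.27") yields only libm.so.6.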

The libm library provides mathematical functions, so which symbol depends on GLIBC 2.27?

$ readelf -s clickhouse | grep  "GLIBC_2.27"
   148: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND powf@GLIBC_2.27 (3)

It turns out that the powf function introduces the dependency on GLIBC 2.27. This brings to mind a module in ClickHouse that exists specifically to deal with glibc incompatibility:

base/glibc-compatibility/glibc-compatibility.c

/** Allows to build programs with libc 2.27 and run on systems with at least libc 2.4,
  *  such as Ubuntu Hardy or CentOS 5.
  *
  * Also look at http://www.lightofdawn.org/wiki/wiki.cgi/NewAppsOnOldGlibc
  */

How does base/glibc-compatibility solve the incompatibility problem? The principle is very simple: it overrides the glibc functions that ClickHouse depends on. However, base/glibc-compatibility in ClickHouse version 20.3 does not override powf, so the ClickHouse build links against the local libm shared library of u20, which introduces the dependency on GLIBC 2.27.

Solution

Once the cause is clear, so is the solution: implement an override of powf in base/glibc-compatibility. Fortunately, the latest version of ClickHouse already overrides powf, so you can copy the relevant code directly and recompile.

image-20211205123202493

Let's verify whether the clickhouse binary still depends on GLIBC 2.27 after overriding powf:

$ objdump -x ./clickhouse | grep "GLIBC_2.27"

$ readelf -s  clickhouse  | grep powf
372223: 0000000028bb549c   727 FUNC    GLOBAL DEFAULT   15 powf
460362: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS powf.c
460380: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS powf_data.c
460381: 00000000124f7900   296 OBJECT  LOCAL  HIDDEN    11 __powf_log2_data
1274523: 0000000028bb549c   727 FUNC    GLOBAL DEFAULT   15 powf


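The powf symbol in clickhouse no longer depends on GLIBC_2.27: the objdump grep prints nothing, and readelf shows powf defined inside the binary itself (section index 15) instead of as an undefined (UND) reference to powf@GLIBC_2.27.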
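
The before and after readelf rows differ in two columns: Ndx (UND versus a section index) and the version suffix after the @ sign. A small sketch of that check follows, assuming the eight-column layout readelf -s prints in this article; SymbolInfo and both function names are made up for illustration.

```cpp
// Sketch: decide from a `readelf -s` row whether a symbol is still an
// undefined reference to a specific glibc symbol version.
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct SymbolInfo {
    bool parsed;          // false if the line is not a symbol-table row
    bool defined;         // false when the Ndx column is UND
    std::string name;     // for example "powf"
    std::string version;  // for example "GLIBC_2.27"; empty if unversioned
};

SymbolInfo parse_readelf_symbol(const std::string& line)
{
    SymbolInfo info = {false, false, "", ""};
    std::istringstream in(line);
    std::vector<std::string> f;
    for (std::string t; in >> t;)
        f.push_back(t);
    // Expected columns: Num: Value Size Type Bind Vis Ndx Name [extra]
    if (f.size() < 8 || f[0].empty() || f[0].back() != ':')
        return info;
    info.parsed = true;
    info.defined = (f[6] != "UND");
    const std::string::size_type at = f[7].find('@');
    info.name = f[7].substr(0, at);
    if (at != std::string::npos) {
        // Skip "@" or "@@" and keep the version name that follows.
        const std::string::size_type start = f[7].find_first_not_of('@', at);
        if (start != std::string::npos)
            info.version = f[7].substr(start);
    }
    return info;
}

// True when the row is an undefined reference to symbol@version.
bool needs_external_version(const SymbolInfo& s, const std::string& symbol,
                            const std::string& version)
{
    assert(!symbol.empty());
    return s.parsed && !s.defined && s.name == symbol && s.version == version;
}
```

Applied to the row from the original binary, needs_external_version reports true for powf and GLIBC_2.27; applied to the rows from the rebuilt binary, it reports false.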

Summary

In this article, we first showed that the scheme from the previous article does not work for clickhouse, and then analyzed why clickhouse introduces a dependency on GLIBC 2.27. Finally, we completely solved the compatibility problem for glibc <= 2.27 by adding the powf override to base/glibc-compatibility.

Incidentally, a recent PR from the community has implemented hermetic builds. It makes the clickhouse build rely only on the libc bundled with clickhouse, completely removes the dependence on the libc of the build environment, and fundamentally solves the compatibility problem for glibc > 2.27. I can't help but marvel that ClickHouse really is a treasure of a community, one that takes every engineering detail to the extreme. The problems we run into unexpectedly have either already been solved there or are on the way to being solved.

References

http://www.lightofdawn.org/wiki/wiki.cgi/NewAppsOnOldGlibc


Posted by zeno on Sun, 05 Dec 2021 19:01:36 -0800