Author: Tian Jie
It's not uncommon for a long query execution time to cause an application aware "jam" in the daily support and use of the database, but the SQL execution "jam" caused by the time zone setting is still an interesting phenomenon, which has not been specifically concerned before.
This customer's meticulous and insistence let us find the source of the problem.
1. Explanation of terms
serial number | noun | Explain |
---|---|---|
1 | CPU usage | Percentage of non idle CPU time. |
2 | User CPU usage | The percentage of CPU time consumed by user space application code. |
3 | Sys CPU usage | System space kernel code consumes CPU time. |
4 | Futex | Fast user state lock / semaphore provided by Linux kernel; it runs completely in user space in the non competitive scenario, but it will cause system call in the competitive scenario. |
2. Problem phenomenon
There are a lot of active connection stacks in the customer MySQL 8.0 instance from 22:03 to 22:04 on March 19, 2020. There are a lot of low-cost queries in the slow log. The CPU utilization rate is not high, but the system SYS CPU utilization rate fluctuates abnormally.
3. Troubleshooting
3.1 OS level
Let's consider what factors may cause carton:
• physical machine OS level fluctuation (excluded by IO ﹣ wait index).
• MySQL's own mechanism.
3.2 MySQL layer
Excluding the OS level exception class factors, we began to focus on the analysis of mysqld process call stack.
In order to better analyze MySQL's behavior, Alibaba database provides bianque system to track, count and display the process internal method calls within a certain time.
When we analyze the above figure, we can see that 40.5% of the CPU time is spent on the call of the time zone system:: GMT sec to time() method, which is the following code.
void Time_zone_system::gmt_sec_to_TIME(MYSQL_TIME *tmp, my_time_t t) const { struct tm tmp_tm; time_t tmp_t = (time_t)t; localtime_r(&tmp_t, &tmp_tm); localtime_to_TIME(tmp, &tmp_tm); tmp->time_type = MYSQL_TIMESTAMP_DATETIME; adjust_leap_second(tmp); }
If you read this code carefully, you will find that Localtime? To? Time() and adjust? Leap? Second() are both simple format conversion and calculation, and do not involve system calls.
While Localtime? R() involves the? Localtime? R() method in glibc, the code is as follows
/* Return the `struct tm' representation of *T in local time, using *TP to store the result. */ struct tm * __localtime_r (t, tp) const time_t *t; struct tm *tp; { return __tz_convert (t, 1, tp); } weak_alias (__localtime_r, localtime_r)
Let's continue to drill down to see the implementation of ﹣ TZ ﹣ convert(), the code is as follows
/* Return the `struct tm' representation of *TIMER in the local timezone. Use local time if USE_LOCALTIME is nonzero, UTC otherwise. */ struct tm * __tz_convert (const time_t *timer, int use_localtime, struct tm *tp) { long int leap_correction; int leap_extra_secs; if (timer == NULL) { __set_errno (EINVAL); return NULL; } __libc_lock_lock (tzset_lock); /* Update internal database according to current TZ setting. POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname. This is a good idea since this allows at least a bit more parallelism. */ tzset_internal (tp == &_tmbuf && use_localtime, 1); if (__use_tzfile) __tzfile_compute (*timer, use_localtime, &leap_correction, &leap_extra_secs, tp); else { if (! __offtime (timer, 0, tp)) tp = NULL; else __tz_compute (*timer, tp, use_localtime); leap_correction = 0L; leap_extra_secs = 0; } if (tp) { if (! use_localtime) { tp->tm_isdst = 0; tp->tm_zone = "GMT"; tp->tm_gmtoff = 0L; } if (__offtime (timer, tp->tm_gmtoff - leap_correction, tp)) tp->tm_sec += leap_extra_secs; else tp = NULL; } __libc_lock_unlock (tzset_lock); return tp; }
Notice that there are lock and unlock operations in the code. Now let's take a look at the definition of 65104; libc ﹐ lock ﹐ lock(), and the code is as follows
#if IS_IN (libc) || IS_IN (libpthread) # ifndef __libc_lock_lock # define __libc_lock_lock(NAME) \ ({ lll_lock (NAME, LLL_PRIVATE); 0; }) # endif #else # undef __libc_lock_lock # define __libc_lock_lock(NAME) \ __libc_maybe_call (__pthread_mutex_lock, (&(NAME)), 0) #endif
Continue to trace the implementation of lll_lock(), the code is as follows
static inline void __attribute__ ((always_inline)) __lll_lock (int *futex, int private) { int val = atomic_compare_and_exchange_val_24_acq (futex, 1, 0); if (__glibc_unlikely (val != 0)) { if (__builtin_constant_p (private) && private == LLL_PRIVATE) __lll_lock_wait_private (futex); else __lll_lock_wait (futex, private); } } #define lll_lock(futex, private) __lll_lock (&(futex), private)
You can see that the code attempts to lock futex using atomic ABCD compare ABCD and ABCD exchange ABCD Val ABCD 24 ABCD acq().
futex, as a memory area shared by multiple threads, will cause system call and enter system state in the scenario of multiple client thread s (multiple sessions / queries) competition, which will increase the CPU utilization of SYS system state.
Moreover, the lock mechanism of the critical area protection limits the concurrency of the time zone conversion method ﹣ TZ ﹣ convert(), and then multiple sessions / queries wait for the lock to enter the critical area. When the conflict is fierce, it will trigger the Caton
So what triggered it
For the time zone system:: GMT sec to time() call, go back to the field timestamp:: get date internal() method. The code is as follows
bool Field_timestampf::get_date_internal(MYSQL_TIME *ltime) { THD *thd = table ? table->in_use : current_thd; struct timeval tm; my_timestamp_from_binary(&tm, ptr, dec); if (tm.tv_sec == 0) return true; thd->time_zone()->gmt_sec_to_TIME(ltime, tm); return false; }
In this method, the virtual function gmt_sec_to_TIME() of the base class Time_zone is invoked to transform the time to time format with time zone. Combined with the name of Field_timestampf::get_date_internal(), we can infer that the query should involve the access of timestamp data type.
Based on the above conjecture, we verify the query and data type of Caton
# Slow query SELECT id, ...... create_time, update_time, ...... FROM mytab WHERE duid IN (?,?,?,?,? ) and (state in (2, 3) or ptype !=0) # Table involved in query CREATE TABLE `mytab` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `duid` char(32) NOT NULL, ...... `state` tinyint(2) unsigned NOT NULL DEFAULT '0', `ptype` tinyint(4) NOT NULL DEFAULT '0', `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, ......, PRIMARY KEY (`id`), ) ENGINE=InnoDB
From the above information, we can see that both the create time and update time fields are of timestamp data type, which verifies the previous guess.
4. Problem solving
On the basis of the above analysis, we can see that the OS level futex lock competition introduced by the call to time() results in low-cost query execution stuck.
To avoid calling this method, you can adjust the time zone parameter value from "system" to the local time zone in the instance console, such as "+ 8:00" in the East 8 time zone of China.
After modification, time ﹣ zone ﹣ offset:: GMT ﹣ sec ﹣ to ﹣ time() will be called to directly calculate at the MySQL level, so as to avoid the OS level unlock caused by accessing glibc functions.
Modification effect comparison (compare the completion time of executing the same number of timestamp data type queries)
time_zone='system ', it takes about 15 minutes to complete
Time "zone = '+ 8:00', about 5 minutes
5. Best practices
If high-frequency timestamp type data access is involved in high concurrency applications:
• if you really want to use the timestamp type, it is recommended that the console set the time zone parameter to UTC/GMT offset format, such as East 8 '+ 8:00', which can effectively reduce the high concurrent query execution cost and response time RT.
• since the Datatime type after MySQL 5.7 supports the default value of the Timestamp type and supports the property of "on update current" Timestamp, it is recommended to replace the Timestamp type with the Datetime type.