Disk I/O Performance Optimization in Practice

RAID Card Cache Policy Adjustment

You can change the RAID card cache policy from No Write Cache if bad BBU to Write Cache OK if bad BBU, so that the write cache is not turned off while the battery is charging or discharging, which preserves I/O performance. However, this carries a risk of data loss, so evaluate it carefully before making the change.

Detailed reasons:

The RAID cards in our servers are equipped with rechargeable batteries (BBU), which slowly self-discharge even when idle. When the charge drops to a low level, the RAID controller "discharges" the battery once, releasing the remaining charge, and then "charges" it again (MegaCli calls this a learn cycle). This is a battery-protection mechanism, and it also ensures that the RAID card's cache remains reliably available.

By default, when the battery charge falls below a certain threshold, the RAID card firmware considers the battery unavailable and, to keep data safe, disables the RAID write cache (falling back from WriteBack to WriteThrough). This default is reasonable, but with the cache disabled the RAID card's I/O capability drops sharply. A charge/discharge cycle (discharge, then recharge) typically lasts several hours. For I/O-intensive applications the resulting degradation can be severe: I/O latency rises, requests pile up in the queue, the application slows down, and the entire system may even crash.

There are two ways to solve this problem:

Note: The operations below apply to servers based on LSI MegaRAID cards.

  • Method 1: Monitor the battery status and take control of the charge/discharge cycle, triggering it manually at a planned time.

    A RAID card battery's charge/discharge (learn) cycle is commonly around 90 days, but it varies with the controller and its configuration (the sample output below shows 27 days), so check it with the command below. Then, shortly before the next scheduled cycle, force a charge/discharge manually at a time you choose, so that the cycle does not start at an unpredictable moment and cause an unexpected performance drop.

View battery charge and discharge cycles:

MegaCli -AdpBbuCmd -getBbuProperties -aALL|egrep 'Period|Next'

Output sample:

  Auto Learn Period: 27 Days
  Next Learn time: Tue Sep 18 05:52:27 2018
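If you want to track the learn schedule automatically (for example, to alert before the next cycle), the two fields can be extracted with sed. This is a minimal sketch: the function names are hypothetical, and it assumes the labels look exactly like the sample above, which may differ across MegaCli and firmware versions.

```shell
# Hypothetical helpers: parse `MegaCli -AdpBbuCmd -getBbuProperties` output from stdin.
# Assumes the "Auto Learn Period" / "Next Learn time" labels shown in the sample output.

# Print the learn period as a bare number of days, e.g. "27"
learn_period_days() {
  sed -n 's/^[[:space:]]*Auto Learn Period:[[:space:]]*\([0-9][0-9]*\) Days.*/\1/p'
}

# Print the next scheduled learn time, e.g. "Tue Sep 18 05:52:27 2018"
next_learn_time() {
  sed -n 's/^[[:space:]]*Next Learn time:[[:space:]]*//p'
}

# Example:
#   MegaCli -AdpBbuCmd -getBbuProperties -aALL | next_learn_time
```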

Manually force a charge/discharge (learn) cycle:

MegaCli -AdpBbuCmd -BbuLearn -a0
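The "force a learn cycle shortly before the scheduled one" step can be scripted, for example from a cron job that runs during off-peak hours. The sketch below is a hypothetical helper, not part of MegaCli: it assumes GNU date (its -d option parses the "Next Learn time" value) and a window in days that you pick yourself.

```shell
# Hypothetical helper: decide whether to force a BBU learn cycle now.
# Usage: should_learn_now "NEXT_LEARN_TIME" WINDOW_DAYS [NOW_EPOCH]
# Returns 0 if the next scheduled learn falls within WINDOW_DAYS from now
# (or is already overdue), 1 otherwise, and 2 if the time cannot be parsed.
# Requires GNU date for `date -d`.
should_learn_now() {
  local next_epoch now_epoch window_secs
  next_epoch=$(date -d "$1" +%s 2>/dev/null) || return 2
  now_epoch=${3:-$(date +%s)}
  window_secs=$(( $2 * 86400 ))
  [ $(( next_epoch - now_epoch )) -le "$window_secs" ]
}

# Example (as root, during an off-peak window):
#   next=$(MegaCli -AdpBbuCmd -getBbuProperties -a0 | sed -n 's/^[[:space:]]*Next Learn time:[[:space:]]*//p')
#   should_learn_now "$next" 3 && MegaCli -AdpBbuCmd -BbuLearn -a0
```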

  • Method 2: Change the RAID card policy so that the cache is not disabled while the battery is charging or discharging.

    With this policy, the write cache stays on during automatic charge/discharge cycles, so I/O performance does not degrade. However, if the server loses power during such a cycle, data still held in the RAID card cache may not reach the disk, resulting in data loss. The probability of a power failure occurring exactly during a battery charge/discharge cycle should be very low, but it remains a risk that must be assessed: be cautious with database-class applications that require strong data safety. For scenarios where I/O throughput matters most and data consistency requirements are modest, this adjustment is highly recommended.

View the current cache policy of the RAID card:

MegaCli -LDGetProp -Cache -LAll -aAll

Sample output:

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 8(target id: 8): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 9(target id: 9): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 10(target id: 10): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU

Exit Code: 0x00

Note: This server has 11 VDs (virtual drives), so 11 lines are displayed. The cache policy is No Write Cache if bad BBU, which turns off the write cache while the battery is charging or discharging.
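On servers with many VDs, it is easier to list only the VDs that use a given policy than to read every line. A minimal sketch with a hypothetical helper name, assuming the line format shown in the sample output above:

```shell
# Hypothetical helper: read `MegaCli -LDGetProp -Cache -LAll -aAll` output on stdin
# and print the VD number of every policy line that contains the given text.
vds_with_policy() {
  awk -v pol="$1" 'index($0, "Cache Policy:") && index($0, pol) {
    # "VD 10" -> print the digits after "VD "
    if (match($0, /VD [0-9]+/)) print substr($0, RSTART + 3, RLENGTH - 3)
  }'
}

# Example: VDs that still disable the write cache during a learn cycle
#   MegaCli -LDGetProp -Cache -LAll -aAll | vds_with_policy "No Write Cache if bad BBU"
```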

Adjust the cache policy so that write caches are not turned off when charging or discharging:

MegaCli -LDSetProp CachedBadBBU -lall -a0

Output sample:

Set Write Cache OK if bad BBU on Adapter 0, VD 0 (target id: 0) success
Set Write Cache OK if bad BBU on Adapter 0, VD 1 (target id: 1) success
Set Write Cache OK if bad BBU on Adapter 0, VD 2 (target id: 2) success
Set Write Cache OK if bad BBU on Adapter 0, VD 3 (target id: 3) success
Set Write Cache OK if bad BBU on Adapter 0, VD 4 (target id: 4) success
Set Write Cache OK if bad BBU on Adapter 0, VD 5 (target id: 5) success
Set Write Cache OK if bad BBU on Adapter 0, VD 6 (target id: 6) success
Set Write Cache OK if bad BBU on Adapter 0, VD 7 (target id: 7) success
Set Write Cache OK if bad BBU on Adapter 0, VD 8 (target id: 8) success
Set Write Cache OK if bad BBU on Adapter 0, VD 9 (target id: 9) success
Set Write Cache OK if bad BBU on Adapter 0, VD 10 (target id: 10) success

Confirm the result by checking the current cache policy of the RAID card again (MegaCli -LDGetProp -Cache -LAll -aAll):

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 8(target id: 8): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 9(target id: 9): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 10(target id: 10): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU

Exit Code: 0x00

Note:

The cache policy has changed to Write Cache OK if bad BBU, which means that the cache is not turned off when the battery is charged or discharged.

The command above changes all VDs (-lall); depending on the scenario, you can target a specific VD instead, for example -l0 for VD 0.

To restore the original cache policy, run:

MegaCli -LDSetProp NoCachedBadBBU -lall -a0
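When only some VDs should keep the write cache during a learn cycle (as in the ELK example below), the per-VD commands can be wrapped in a small function. This is a hypothetical helper: like the commands above it hard-codes adapter 0 (-a0), and it assumes the binary is on PATH as MegaCli unless the MEGACLI variable points elsewhere (many installations name it MegaCli64).

```shell
# Hypothetical wrapper: apply a BBU-related cache policy to selected VDs on adapter 0.
# Usage: set_badbbu_policy CachedBadBBU|NoCachedBadBBU VD...
set_badbbu_policy() {
  local policy=$1 vd
  shift
  case $policy in
    CachedBadBBU|NoCachedBadBBU) ;;
    *) echo "unsupported policy: $policy" >&2; return 2 ;;
  esac
  for vd in "$@"; do
    # Stop at the first VD that MegaCli fails to update
    "${MEGACLI:-MegaCli}" -LDSetProp "$policy" -L"$vd" -a0 || return 1
  done
}

# Example: keep the cache on for data VDs 1-10, protect the system disk (VD 0)
#   set_badbbu_policy CachedBadBBU $(seq 1 10)
#   set_badbbu_policy NoCachedBadBBU 0
```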

Worked Example

On our ELK machines, two disks form a RAID1 array that serves as the system disk, and each of the ten data disks is configured as a single-disk RAID0. All VDs had previously been set to CachedBadBBU; we now switch the system disk's VD back to NoCachedBadBBU to keep its data safe.

# Adjust the cache policy of the VD0 where the system disk is located to NoCachedBadBBU
[root@BJSH-ELK-137-114.meitu-inc.com ~]# MegaCli -LDSetProp NoCachedBadBBU -l0 -a0

Set No Write Cache if bad BBU on Adapter 0, VD 0 (target id: 0) success

Exit Code: 0x00
[root@BJSH-ELK-137-114.meitu-inc.com ~]#

# View the VD0 where the system disk is located
[root@BJSH-ELK-137-114.meitu-inc.com ~]# MegaCli -LDGetProp -Cache -L0 -aAll

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU

Exit Code: 0x00

# View all VDs
[root@BJSH-ELK-137-114.meitu-inc.com ~]# MegaCli -LDGetProp -Cache -LAll -aAll

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 8(target id: 8): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 9(target id: 9): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU
Adapter 0-VD 10(target id: 10): Cache Policy:WriteBack, ReadAhead, Cached, Write Cache OK if bad BBU

Exit Code: 0x00
[root@BJSH-ELK-137-114.meitu-inc.com ~]#

I/O Scheduling Algorithm

Currently the default scheduler is cfq, a balanced general-purpose choice. For SSDs, noop is usually a better fit; for mechanical disks, benchmark alternatives such as deadline against your actual workload. For applications such as databases, deadline is recommended because it prevents requests from starving.
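The scheduler is selected per block device through sysfs. A minimal sketch under these assumptions: the helper names are hypothetical, the path follows the standard /sys/block/<device>/queue/scheduler layout, and the scheduler names match legacy (non-blk-mq) kernels like the cfq-era ones described here; blk-mq kernels offer names such as none and mq-deadline instead.

```shell
# Reading the sysfs file lists the available schedulers with the active one in
# brackets, e.g. "noop deadline [cfq]". Writing a name to it (as root) switches
# the scheduler immediately; the change does not persist across reboots.

# Hypothetical helper: print the active scheduler from the file contents on stdin
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: set_scheduler DEVICE SCHEDULER [SYSFS_BLOCK_DIR]
set_scheduler() {
  local f="${3:-/sys/block}/$1/queue/scheduler"
  # Refuse names the kernel does not offer for this device
  sed 's/[][]//g' "$f" | tr -s ' ' '\n' | grep -qx -- "$2" || {
    echo "$2 is not available for $1" >&2
    return 1
  }
  echo "$2" > "$f"
}

# Examples:
#   active_scheduler < /sys/block/sda/queue/scheduler
#   set_scheduler sdb deadline
```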

File system journal

The file system journal is enabled by default; it can be turned off or adjusted temporarily if needed, at the cost of weaker crash consistency.
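For ext4, whether the journal is enabled can be read from the superblock feature list. A minimal sketch with a hypothetical helper name; it does not apply to XFS, whose metadata log cannot be turned off. Without a journal, an ext4 file system needs a full fsck after a crash.

```shell
# Hypothetical helper for ext4: read `dumpe2fs -h DEVICE` output on stdin and
# succeed (exit 0) if the "Filesystem features" line lists has_journal.
fs_has_journal() {
  awk -F: '/^Filesystem features:/ {
             n = split($2, f, " ")
             for (i = 1; i <= n; i++) if (f[i] == "has_journal") found = 1
           }
           END { exit !found }'
}

# Examples (ext4 only; the file system must be unmounted to change the journal):
#   dumpe2fs -h /dev/sdc1 2>/dev/null | fs_has_journal && echo "journal enabled"
#   umount /data1 && tune2fs -O ^has_journal /dev/sdc1   # remove the journal
#   tune2fs -O has_journal /dev/sdc1                      # add it back
```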

Disk mount parameters

To improve disk I/O performance, consider adjusting the disk mount parameters to async,noatime,data=writeback,barrier=0,nobh.

Parameter meaning:

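async: perform all I/O to the file system asynchronously, i.e. writes are buffered (this is already the default)

noatime: do not update file access times (atime) when files are read, avoiding extra metadata writes and improving file system read/write performance

data=writeback (ext3/ext4): journal only metadata and do not force data blocks to disk before the metadata is committed; this improves write performance, but after a crash recently written files may contain stale data

barrier=0: disable write barriers; this is only reasonably safe when the disk or RAID write cache is battery-backed

nobh (ext3/ext4, used with data=writeback): do not attach buffer_head structures to page-cache pages, reducing per-page overhead on large I/O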

Worked Example

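Example: adjusting the mount parameters of the ELK server's data directories. Note that these directories are on XFS, whereas data=writeback and nobh are ext3/ext4 mount options that XFS does not support.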

[root@ELK-133-10 ~]# mount|grep data
/dev/sdc1 on /data1 type xfs (rw,noatime,nodiratime)
/dev/sdd1 on /data2 type xfs (rw,noatime,nodiratime)
/dev/sde1 on /data3 type xfs (rw,noatime,nodiratime)
/dev/sdf1 on /data4 type xfs (rw,noatime,nodiratime)
/dev/sdg1 on /data5 type xfs (rw,noatime,nodiratime)
/dev/sdb1 on /data6 type xfs (rw,noatime,nodiratime,barrier=1)
[root@ELK-133-10 ~]#

# Generate remount command
[root@ELK-133-10 ~]# mount|grep data|awk '{print "mount "$1" "$3" -o remount,rw,noatime,data=writeback,barrier=0,nobh"}'
mount /dev/sdc1 /data1 -o remount,rw,noatime,data=writeback,barrier=0,nobh
mount /dev/sdd1 /data2 -o remount,rw,noatime,data=writeback,barrier=0,nobh
mount /dev/sde1 /data3 -o remount,rw,noatime,data=writeback,barrier=0,nobh
mount /dev/sdf1 /data4 -o remount,rw,noatime,data=writeback,barrier=0,nobh
mount /dev/sdg1 /data5 -o remount,rw,noatime,data=writeback,barrier=0,nobh
mount /dev/sdb1 /data6 -o remount,rw,noatime,data=writeback,barrier=0,nobh
[root@ELK-133-10 ~]#

# Execute the remount command
[root@ELK-133-10 ~]# mount|grep data|awk '{print "mount "$1" "$3" -o remount,rw,noatime,data=writeback,barrier=0,nobh"}'|bash
[root@ELK-133-10 ~]#

# Confirm remount results
[root@ELK-133-10 ~]#  mount|grep data
/dev/sdc1 on /data1 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
/dev/sdd1 on /data2 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
/dev/sde1 on /data3 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
/dev/sdf1 on /data4 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
/dev/sdg1 on /data5 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
/dev/sdb1 on /data6 type xfs (rw,noatime,data=writeback,barrier=0,nobh)
[root@ELK-133-10 ~]#
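Two refinements may help when repeating this procedure. First, grep data matches the string anywhere in the line, so it can also pick up other file systems whose options include data=ordered (an ext4 root partition, for example); selecting by mount point avoids that. Second, /proc/mounts shows the options the kernel actually applied, whereas on systems where /etc/mtab is a regular file, mount may simply echo the requested options. A minimal sketch with hypothetical helper names, both reading /proc/mounts-format lines on stdin:

```shell
# Hypothetical: print remount commands for mounts whose mount point (field 2)
# matches a regex, instead of grepping for "data" anywhere in the line.
# Usage: remount_cmds MOUNTPOINT_REGEX OPTIONS < /proc/mounts
remount_cmds() {
  awk -v re="$1" -v opts="$2" '$2 ~ re { print "mount -o remount," opts " " $2 }'
}

# Hypothetical: print each expected option that is NOT in the effective options.
# Usage: missing_mount_opts MOUNTPOINT OPTION... < /proc/mounts
missing_mount_opts() {
  local mp=$1 opts want
  shift
  # Use the last matching entry in case several mounts are stacked on mp
  opts=$(awk -v mp="$mp" '$2 == mp { o = $4 } END { print o }')
  if [ -z "$opts" ]; then
    echo "$mp is not mounted" >&2
    return 2
  fi
  for want in "$@"; do
    case ",$opts," in
      *",$want,"*) ;;        # option is active
      *) echo "$want" ;;    # option missing or silently ignored
    esac
  done
}

# Examples (review the generated commands before piping them to bash):
#   remount_cmds '^/data[0-9]+$' 'noatime' < /proc/mounts
#   missing_mount_opts /data1 noatime data=writeback barrier=0 nobh < /proc/mounts
```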

Performance data comparison

(to be added)

Posted by dolcezza on Tue, 06 Aug 2019 10:44:00 -0700