Ubuntu 18.04升级内核以及内存性能研究

2020/11/17 ubuntu

Ubuntu 18.04升级内核以及内存性能研究

自从用上了 AMD 7002 CPU之后,服务器上出现了 EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?)

通过 suse 的论坛支持,确定这是因为 “AMD ROMA” 这一代太高级了,Linux Kernel 没跟上,所以会出现这个问题。Ubuntu 18.04 的内核版本是 4.15.0-55-generic;Ubuntu 20.04 的内核版本是 5.4.0-33-generic;官网提供的稳定版内核版本是 5.7.2-050702-generic。

那么就要评估要不要升级内核了?是直接升级操作系统还是仅升级内核?

要不要升级内核?

这需要判断这个错误对 性能稳定性 的影响程度。

先整理出两台机器:

  • mtest01:7542, 64G 2933 DDR4 * 8,内核 4.15.0-55-generic
  • mtest02: 7542, 64G 2933 DDR4 * 8,内核 5.7.2-050702-generic

重启后,通过 dmesg 检测,发现 mtest01 出现 EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?) 问题,而 mtest02 没出现这个问题。

memtester 测试

# mtest01
memtester 5G 2
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 5120MB (5368709120 bytes)
got  5120MB (5368709120 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Loop 2/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Done.
# mtest02
memtester 5G 2
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 5120MB (5368709120 bytes)
got  5120MB (5368709120 bytes), trying mlock ...locked.
Loop 1/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Loop 2/2:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok

Done.

sysbench 测试

# mtest01
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16384KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 6400 (  869.54 per second)

102400.00 MiB transferred (13912.57 MiB/sec)


General statistics:
    total time:                          7.3591s
    total number of events:              6400

Latency (ms):
         min:                                  0.99
         avg:                                  1.15
         max:                                 13.56
         95th percentile:                      1.32
         sum:                               7353.34

Threads fairness:
    events (avg/stddev):           6400.0000/0.00
    execution time (avg/stddev):   7.3533/0.00
# mtest02
sysbench --test=memory --memory-block-size=16M --memory-total-size=100G run
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16384KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 6400 (  662.80 per second)

102400.00 MiB transferred (10604.75 MiB/sec)


General statistics:
    total time:                          9.6551s
    total number of events:              6400

Latency (ms):
         min:                                  0.95
         avg:                                  1.51
         max:                                 13.93
         95th percentile:                      2.00
         sum:                               9647.11

Threads fairness:
    events (avg/stddev):           6400.0000/0.00
    execution time (avg/stddev):   9.6471/0.00

mbw 测试

数据暂未获取

总结

从三项数据上看,低版本内核的内存性能可能更好,EDAC amd64: Error: F0 not found, device 0x1460 (broken BIOS?) 也未导致明显错误。

细节:更新内核

# 仅针对 Ubuntu
# 打开 https://link.zhihu.com/?target=https%3A//kernel.ubuntu.com/~kernel-ppa/mainline/,查找确定需要的内核,并下载
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.3/amd64/linux-headers-5.7.3-050703-generic_5.7.3-050703.202006171531_amd64.deb
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.3/amd64/linux-headers-5.7.3-050703_5.7.3-050703.202006171531_all.deb
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.3/amd64/linux-image-unsigned-5.7.3-050703-generic_5.7.3-050703.202006171531_amd64.deb
$ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7.3/amd64/linux-modules-5.7.3-050703-generic_5.7.3-050703.202006171531_amd64.deb
$ sudo dpkg -i *.deb
$ reboot

$ uname -r

更新内核后,NVIDIA 显卡驱动无法安装。

细节:edac 到底是什么

Error Detection And Correction。Linux 2.6.16 引入系统。该内核模块的目的是发现并报告硬件错误。

EDAC 有一个工具 edac-util,但根据我的测试,在兼容机上执行该指令只会出现 edac-util: Error: No memory controller data found.。经过查询,现已使用 mcesys 来判断硬件信息。

参考

Search

    Table of Contents