DM (Dameng) Core Analysis Tools: gdb + dmrdc

2022-12-21 16:01 · Original · DM (Dameng)
Author: dave

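Earlier posts covered Dameng (DM) database core files and the tooling around them:

DM (Dameng) Database Core Dump Files Explained
https://www.cndba.cn/dave/article/3732
Overview of GDB, the Linux Program Debugging Command
https://www.cndba.cn/dave/article/3731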

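1 Core Files

A DM instance failure means the database process dmserver has misbehaved: it terminated abnormally, or the process still exists but is unresponsive or refuses logins. All of these are serious failures, and in general we should collect as much of the relevant information as possible for analysis.

Tools and terms that may come up: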

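  1. core file: the complete memory image of a process, saved by the operating system when the program fails.
  2. gdb: a tool for debugging executables or core files.
  3. Stack: the runtime state of a program, including the function call chain and the data associated with it.
  4. dmrdc: a small tool shipped with the DM database for simple core file analysis. Given a core file as its input parameter, dmrdc extracts the SQL statements of all sessions that were active when the failure occurred.

Enabling core files: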

[dave@www.cndba.cn bin]$ ulimit -c
0
[dave@www.cndba.cn bin]$ ulimit -c unlimited
[dave@www.cndba.cn bin]$ ulimit -a
core file size          (blocks, -c) unlimited
……
[dave@www.cndba.cn bin]$ ulimit -c
unlimited
[dave@www.cndba.cn bin]$

Once core dumps are enabled in the shell that starts dmserver, an abnormal dmserver exit produces a core file under $DM_HOME/bin.
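As a quick sanity check before relying on core dumps, the sketch below reports whether the current shell allows them and prints the kernel's core naming pattern. The helper name core_status is illustrative and not part of DM. /proc/sys/kernel/core_pattern is the standard Linux setting that controls how core files are named, and its value differs from system to system.

```shell
# core_status is a hypothetical helper name, not a DM or system command.
# It reports whether the current shell's core size limit allows core dumps.
core_status() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core dumps disabled"
    else
        echo "core dumps enabled (limit: $limit)"
    fi
}

core_status

# core_pattern controls the name (and possibly the directory) of new core files.
if [ -r /proc/sys/kernel/core_pattern ]; then
    cat /proc/sys/kernel/core_pattern
fi
```

Run it in the same shell session that will start dmserver, since the limit is inherited by child processes.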

Note: core files are very large, so this feature is normally left disabled in production. If you enable it, keep an eye on disk usage. For more on core files, see the earlier post:

Linux: ulimit -c unlimited Does Not Generate a Core File (Fixes)
https://www.cndba.cn/dave/article/116405

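2 Analyzing an Existing Core File with GDB and dmrdc

When the database crashes and produces a core file, analyze the core with GDB to determine what brought the database down.

Relevant GDB commands: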

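  1. bt: show the stack of the current thread.
  2. thread apply all bt: print the stacks of all threads. This is typically used to check whether your own classes or .so libraries appear in the stacks. The output is usually saved to a file such as thread_info.txt.
  3. thread apply ID1 ID2 … bt: show the stacks of the listed threads only, for example thread apply 1 2 bt.
  4. thread N: switch to thread N.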
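The same stack dump can be captured without an interactive session. The following is a minimal sketch, assuming gdb is on the PATH. The function name dump_all_stacks and the output file name are illustrative. It relies on gdb's -batch and -ex options; batch mode also disables pagination, so no "Type <return> to continue" prompts interrupt the output.

```shell
# dump_all_stacks is a hypothetical helper name.
# Arguments: executable, core file, output file.
# Writes the backtrace of every thread in the core to the output file.
dump_all_stacks() {
    exe=$1
    core=$2
    out=$3
    gdb -batch -ex "thread apply all bt" "$exe" "$core" > "$out" 2>&1
}

# Example call (paths taken from this article's environment):
# dump_all_stacks ./dmserver core-dmserver-5214-8 thread_info.txt
```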

2.1 Locate the generated core file

[dave@www.cndba.cn bin]$ pwd
/dm/dmdbms/bin
[dave@www.cndba.cn bin]$ ll core*
-rw------- 1 dmdba dinstall 2851975168 12月 21 14:12 core-dmserver-5214-8

2.2 Load the core file in gdb

[dave@www.cndba.cn bin]$ gdb ./dmserver core-dmserver-5214-8
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-neokylin-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dm/dmdbms/bin/dmserver...Missing separate debuginfo for /dm/dmdbms/bin/dmserver
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/74/1278afdf90615cb73c5240b653cf8952e42525.debug
(no debugging symbols found)...done.
[New LWP 5214]
[New LWP 5219]
[New LWP 5220]
[New LWP 5221]
[New LWP 5222]
……
[New LWP 5297]
[New LWP 5277]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/dm/dmdbms/bin/dmserver path=/dm/dmdbms/data/DCP/dm.ini -noconsole'.
Program terminated with signal 8, Arithmetic exception.
#0  0x000000000169a6c7 in assert_fun ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.ns7.02.x86_64 libgcc-4.8.5-36.el7.ns7.01.x86_64 libstdc++-4.8.5-36.el7.ns7.01.x86_64
(gdb)

2.3 Set the file name for logging the stacks

(gdb) set logging file core1.txt
(gdb) set logging on
Copying output to core1.txt.
(gdb)

2.4 Record the stacks of all threads at the time of the crash

(gdb) thread apply all bt

Thread 78 (Thread 0x7f1243d10700 (LWP 5277)):
#0  0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000043ed9e in os_event2_wait_timeout ()
#2  0x000000000083f718 in purg2_thread ()
#3  0x00007f12f63cfdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f12f58ebead in clone () from /lib64/libc.so.6

Thread 77 (Thread 0x7f123a53b700 (LWP 5297)):
#0  0x00007f12f63d6e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000000450e99 in os_thread_sleep_low ()
#2  0x00000000016b3f6a in nsvr_schedule_thread ()
#3  0x00007f12f63cfdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f12f58ebead in clone () from /lib64/libc.so.6
……
Thread 2 (Thread 0x7f12680ad700 (LWP 5219)):
#0  0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000450d52 in os_semaphore_p ()
#2  0x00000000016b8f79 in nsvr_quit_thread ()
#3  0x00007f12f63cfdd5 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#4  0x00007f12f58ebead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f12f69f1740 (LWP 5214)):
#0  0x000000000169a6c7 in assert_fun ()
#1  0x00000000016b3590 in sigterm_handler ()
#2  <signal handler called>
#3  0x00007f12f63d3963 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000000043eefc in os_event2_wait ()
#5  0x00000000016bd488 in main ()

2.5 Turn off logging to the file

(gdb) set logging off
Done logging to core1.txt.
(gdb)

[dave@www.cndba.cn bin]$ pwd
/dm/dmdbms/bin
[dave@www.cndba.cn bin]$ cat core1.txt |more

Thread 78 (Thread 0x7f1243d10700 (LWP 5277)):
#0  0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000043ed9e in os_event2_wait_timeout ()
#2  0x000000000083f718 in purg2_thread ()
#3  0x00007f12f63cfdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f12f58ebead in clone () from /lib64/libc.so.6

Thread 77 (Thread 0x7f123a53b700 (LWP 5297)):
#0  0x00007f12f63d6e3d in nanosleep () from /lib64/libpthread.so.0
#1  0x0000000000450e99 in os_thread_sleep_low ()
#2  0x00000000016b3f6a in nsvr_schedule_thread ()
#3  0x00007f12f63cfdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f12f58ebead in clone () from /lib64/libc.so.6

2.6 Record the stack of the crashing thread

(gdb) bt
#0  0x000000000169a6c7 in assert_fun ()
#1  0x00000000016b3590 in sigterm_handler ()
#2  <signal handler called>
#3  0x00007f12f63d3963 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x000000000043eefc in os_event2_wait ()
#5  0x00000000016bd488 in main ()
(gdb)

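2.7 Record the crashing thread's number

Run info threads to record the number of the crashing thread. The line marked with * is the current thread, and the number after LWP is the OS thread ID:

(gdb) info threads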
  Id   Target Id         Frame
  78   Thread 0x7f1243d10700 (LWP 5277) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  77   Thread 0x7f123a53b700 (LWP 5297) 0x00007f12f63d6e3d in nanosleep () from /lib64/libpthread.so.0
  76   Thread 0x7f124bdd2700 (LWP 5247) 0x00007f12f63d69dd in connect () from /lib64/libpthread.so.0
  75   Thread 0x7f1262729700 (LWP 5236) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  74   Thread 0x7f1242a4f700 (LWP 5278) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  73   Thread 0x7f124b222700 (LWP 5259) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  72   Thread 0x7f124c4d9700 (LWP 5240) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  71   Thread 0x7f12581e0700 (LWP 5237) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  70   Thread 0x7f124c5da700 (LWP 5239) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  69   Thread 0x7f124c3d8700 (LWP 5241) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  68   Thread 0x7f124c2d7700 (LWP 5242) 0x00007f12f63d3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  67   Thread 0x7f123a43a700 (LWP 5298) 0x00007f12f58e2f73 in select () from /lib64/libc.so.6
  66   Thread 0x7f124bed3700 (LWP 5246) 0x00007f12f58e2f73 in select () from /lib64/libc.so.6
……
  6    Thread 0x7f1263436700 (LWP 5223) 0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7f1263537700 (LWP 5222) 0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7f1263638700 (LWP 5221) 0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f1263739700 (LWP 5220) 0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f12680ad700 (LWP 5219) 0x00007f12f63d3965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7f12f69f1740 (LWP 5214) 0x000000000169a6c7 in assert_fun ()
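With dozens of threads, finding the one that hit a particular function is easier to script against the log saved in steps 2.3 to 2.5. The sketch below assumes the log has the "Thread N (Thread 0x… (LWP n)):" header format shown above. The helper name threads_with_func is illustrative.

```shell
# threads_with_func is a hypothetical helper name.
# Arguments: function name, saved "thread apply all bt" log.
# Prints the gdb thread number and LWP of every thread whose
# backtrace contains a frame in the given function.
threads_with_func() {
    func=$1
    log=$2
    awk -v f="$func" '
        /^Thread [0-9]+ / {
            tid = $2
            lwp = ""
            if (match($0, /LWP [0-9]+/))
                lwp = substr($0, RSTART + 4, RLENGTH - 4)
            seen = 0
            next
        }
        /^#/ && !seen && index($0, " in " f " ") {
            print "thread " tid " LWP " lwp
            seen = 1
        }
    ' "$log"
}

# Example call against the log written earlier:
# threads_with_func assert_fun core1.txt
```

The LWP it reports is the same thread ID that info threads shows, so it can be used to line up the stack with other per-thread information.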

2.8 Extract the SQL statements with dmrdc

[dave@www.cndba.cn bin]$ ll core*
-rw-r--r-- 1 dmdba dinstall      30088 12月 21 15:36 core1.txt
-rw------- 1 dmdba dinstall 2851975168 12月 21 14:12 core-dmserver-5214-8
[dave@www.cndba.cn bin]$ ./dmrdc sfile=core-dmserver-5214-8
dmrdc V8

Analysing: 0/2851975168
Analysing: 31457268/2851975168
Analysing: 62914536/2851975168
Analysing: 94371804/2851975168
Analysing: 125829072/2851975168
……
Analysing: 2799696852/2851975168
Analysing: 2831154120/2851975168
all the process spent total    4.291 s

[dave@www.cndba.cn bin]$

2.9 The output file is core-dmserver-5214-8_tmp

[dave@www.cndba.cn bin]$ ll -lrth core*
-rw------- 1 dmdba dinstall 2.7G 12月 21 14:12 core-dmserver-5214-8
-rw-r--r-- 1 dmdba dinstall  30K 12月 21 15:36 core1.txt
-rw-r--r-- 1 dmdba dinstall    0 12月 21 15:40 core-dmserver-5214-8_tmp
[dave@www.cndba.cn bin]$

2.10 View the SQL statements from the core

Match the SQL statements in the dmrdc output against the crashing thread number reported by info threads, then start analyzing that statement.

[dave@www.cndba.cn bin]$ cat core-dmserver-5214-8_tmp

This is a test environment and no SQL was running at the time, so the file is empty.

3 Generating a Core from a Running DM Instance

When the system misbehaves but dmserver has not crashed, no core is generated automatically and one has to be generated by hand.

In a cluster environment, stop the dmmonitor process first and then the dmwatcher process before attaching.

3.1 Start gdb on dmserver

[dave@www.cndba.cn bin]$ gdb dmserver
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-neokylin-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dm/dmdbms/bin/dmserver...Missing separate debuginfo for /dm/dmdbms/bin/dmserver
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/74/1278afdf90615cb73c5240b653cf8952e42525.debug
(no debugging symbols found)...done.
(gdb)

3.2 Attach to the process by PID

[dave@www.cndba.cn ~]# ps -ef|grep dms
dmdba     6213     1  0 15:34 pts/1    00:00:00 /dm/dmdbms/bin/dmserver path=/dm/dmdbms/data/DCP/dm.ini -noconsole
dmdba     6311  4273  0 15:34 pts/1    00:00:00 gdb ./dmserver core-dmserver-5214-8
dmdba     6525  6339  0 15:44 pts/2    00:00:00 gdb dmserver
[dave@www.cndba.cn ~]#


(gdb) attach 6213
Attaching to program: /dm/dmdbms/bin/dmserver, process 6213
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 6297]
[New LWP 6296]
……

Reading symbols from /usr/lib64/gconv/UTF-16.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/gconv/UTF-16.so
0x00007f20ad1ea965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7.ns7.02.x86_64 libgcc-4.8.5-36.el7.ns7.01.x86_64 libstdc++-4.8.5-36.el7.ns7.01.x86_64
(gdb)

3.3 Generate the core file manually

(gdb) generate-core-file
warning: target file /proc/6213/cmdline contained unexpected null characters
Saved corefile core.6213
(gdb)

[dave@www.cndba.cn bin]$ pwd
/dm/dmdbms/bin
[dave@www.cndba.cn bin]$ ll core*
-rw-r--r-- 1 dmdba dinstall      30088 12月 21 15:36 core1.txt
-rw-r--r-- 1 dmdba dinstall 3052825992 12月 21 15:46 core.6213
-rw------- 1 dmdba dinstall 2851975168 12月 21 14:12 core-dmserver-5214-8
-rw-r--r-- 1 dmdba dinstall          0 12月 21 15:40 core-dmserver-5214-8_tmp
[dave@www.cndba.cn bin]$

3.4 Detach and quit (leave the process)

(gdb) detach
Detaching from program: /dm/dmdbms/bin/dmserver, process 6213
(gdb) quit
[dave@www.cndba.cn bin]$

When the analysis is finished, start dmwatcher first and then dmmonitor.
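The attach, generate-core-file, and detach steps can also be run in one non-interactive command. The following is a sketch rather than a DM-recommended procedure. The helper name dump_core_of is illustrative, and it uses gdb's -p, -batch, and -ex options together with the generate-core-file command shown above. The target process is paused while the core is being written, so the cluster precautions above still apply.

```shell
# dump_core_of is a hypothetical helper name.
# Arguments: PID of the running process, name of the core file to write.
# Attaches to the process, writes a core file, then detaches.
dump_core_of() {
    pid=$1
    out=$2
    gdb -p "$pid" -batch -ex "generate-core-file $out" -ex "detach"
}

# Example call (PID taken from this article's environment):
# dump_core_of 6213 core.6213
```

Check free disk space first: the core in this article's small test instance was already about 3 GB.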

4 Dumping Thread Stacks from a Running DM Instance

[dave@www.cndba.cn ~]# ps -ef|grep dms
dmdba     6213     1  0 15:34 pts/1    00:00:00 /dm/dmdbms/bin/dmserver path=/dm/dmdbms/data/DCP/dm.ini -noconsole
root      6562  4171  0 15:47 pts/0    00:00:00 grep --color=auto dms
[dave@www.cndba.cn ~]#

4.1 Start gdb on dmserver

[dave@www.cndba.cn bin]$ gdb dmserver
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-neokylin-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dm/dmdbms/bin/dmserver...Missing separate debuginfo for /dm/dmdbms/bin/dmserver
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/74/1278afdf90615cb73c5240b653cf8952e42525.debug
(no debugging symbols found)...done.

4.2 Attach to the process by PID

(gdb) attach 6213
Attaching to program: /dm/dmdbms/bin/dmserver, process 6213
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 6297]
[New LWP 6296]

4.3 Set the file name for logging the stacks

(gdb) set logging file cndba.txt
(gdb) set logging on
Copying output to cndba.txt.

4.4 Record the stacks of all threads (keep pressing Enter until no more output appears)

(gdb) thread apply all bt

Thread 78 (Thread 0x7f208a3fc700 (LWP 6216)):
#0  0x00007f20ad1ead12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000043ee0f in os_event2_wait_timeout ()
#2  0x00000000004d552d in tlog_flush_thread ()
#3  0x00007f20ad1e6dd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f20ac702ead in clone () from /lib64/libc.so.6
……

4.5 Turn off logging, detach, and quit

(gdb) set logging off
Done logging to cndba.txt.
(gdb) detach
Detaching from program: /dm/dmdbms/bin/dmserver, process 6213
(gdb) quit

5 Printing the Stack of a Single DM Thread

[dave@www.cndba.cn ~]# ps -ef|grep dms
dmdba     6213     1  0 15:34 pts/1    00:00:00 /dm/dmdbms/bin/dmserver path=/dm/dmdbms/data/DCP/dm.ini -noconsole
root      6618  4171  0 15:51 pts/0    00:00:00 grep --color=auto dms
[dave@www.cndba.cn ~]#

Use top -H -p <PID> to see resource usage per thread. Inside top, Shift+H toggles the thread display on and off.

[dave@www.cndba.cn ~]# top -Hp 6213
top - 15:51:56 up  2:13,  4 users,  load average: 0.00, 0.02, 0.00
Threads:  78 total,   0 running,  78 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1987044 total,   100068 free,   697492 used,  1189484 buff/cache
KiB Swap:  3014652 total,  3014384 free,      268 used.  1177380 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6213 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.13 dmserver
 6216 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_sqllog_thd
 6218 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_quit_thd
 6219 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_io_thd
 6220 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_io_thd
 6221 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_io_thd
 6222 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_io_thd
 6223 dmdba     20   0 3180560 420176  49064 S  0.0 21.1   0:00.00 dm_io_thd
……

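Use pstack <ID> to display a thread's stack. Here the ID is LWP 6223, a thread ID taken from the top output for illustration. In a real incident, pick the thread with the highest CPU usage and analyze its stack to find out what is causing the load.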

[dave@www.cndba.cn ~]# pstack 6223
Thread 1 (process 6223):
#0  0x00007f20ad1ea965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000450d52 in os_semaphore_p ()
#2  0x000000000044c41a in os_io_thread_sema ()
#3  0x000000000044c8db in os_io_thread ()
#4  0x00007f20ad1e6dd5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f20ac702ead in clone () from /lib64/libc.so.6
[dave@www.cndba.cn ~]#
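If a full stack log has already been saved with gdb (such as cndba.txt from section 4), the stack of a thread ID seen in top can be pulled out of that log instead of attaching again. The sketch below assumes the "Thread N (Thread 0x… (LWP n)):" header format shown in the gdb output above. The helper name stack_of_lwp is illustrative.

```shell
# stack_of_lwp is a hypothetical helper name.
# Argument: the LWP (thread ID as shown by top -H).
# Reads a "thread apply all bt" log on stdin and prints only
# the header and frames of the thread with that LWP.
stack_of_lwp() {
    lwp=$1
    awk -v id="$lwp" '
        /^Thread [0-9]+ / { show = (index($0, "(LWP " id ")") > 0) }
        show
    '
}

# Example call against the log written in section 4:
# stack_of_lwp 6223 < cndba.txt
```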

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission.
