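I'm a little frustrated with finding "gdb examples" online that show the commands but not their output. gdb is the GNU Debugger, the standard debugger on Linux. I was reminded of this when watching Greg Law's CppCon 2015 talk, Give me 15 minutes and I'll change your view of GDB, which, thankfully, included output! It's well worth the 15 minutes.

It also inspired me to share a full gdb debugging example, with the output of every step, including dead ends. This isn't a particularly interesting or exotic issue, it's just a routine gdb debugging session. But it covers the basics and could serve as a tutorial of sorts, bearing in mind that there's a lot more to gdb than I used here.

I'll be running the following commands as root, since I'm debugging a tool that needs root access (for now). Substitute a non-root user and sudo as desired. You also aren't expected to read through all of this: I've enumerated each step so you can browse them and find the ones of interest.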

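1. The Problem

The bcc collection of BPF tools had a pull request for cachetop, which shows page cache statistics in a top-like display. Great! However, when I tested it, I hit a segfault: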

# ./cachetop.py
Segmentation fault

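Note that it says "Segmentation fault" and not "Segmentation fault (core dumped)". I'd like a core dump to debug this. (A core dump is a copy of process memory, the name coming from the era of magnetic core memory, and it can be investigated using a debugger.)

Core dump analysis is one approach to debugging, but not the only one. I could run the program live in gdb to inspect the issue. I could also use an external tracer to grab data and stack traces on segfault events. We'll start with core dumps.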

2. Fixing Core Dumps

I'll check the core dump settings:

# ulimit -c
0
# cat /proc/sys/kernel/core_pattern
core

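ulimit -c shows the maximum size of core dumps that may be created, and it's set to zero: core dumps are disabled (for this process and its children).

/proc/.../core_pattern is set to just "core", which will drop a core dump file called "core" in the current directory. That would be ok for now, but I'll show how to set this up for a global location: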

# ulimit -c unlimited
# mkdir /var/cores
# echo "/var/cores/core.%e.%p" > /proc/sys/kernel/core_pattern

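You can customize core_pattern further; for example, %h is the hostname and %t is the time of the dump. The options are documented in the Linux kernel source, under Documentation/sysctl/kernel.txt.

To make the core_pattern permanent, so that it survives reboots, you can set it via "kernel.core_pattern" in /etc/sysctl.conf.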
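
To make the pattern syntax concrete, here is a minimal Python sketch of how such a template expands. It only handles the specifiers mentioned above (%e, %p, %h, %t) plus a literal %%; the function name, the hostname "host1" and the timestamp are made up for illustration, and the kernel's own handling of other specifiers is not modeled.

```python
import re

def expand_core_pattern(pattern, exe, pid, host, epoch):
    """Expand a subset of core_pattern specifiers: %e, %p, %h, %t and %%."""
    values = {"e": exe, "p": str(pid), "h": host, "t": str(epoch), "%": "%"}
    # Specifiers this sketch does not know about are left untouched.
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)

# Hypothetical host name and timestamp; the executable and PID match this post.
print(expand_core_pattern("/var/cores/core.%e.%p", "python", 30520, "host1", 1470608100))
```

With the pattern used in this post, the result is the path /var/cores/core.python.30520.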

Trying again:

# ./cachetop.py
Segmentation fault (core dumped)
# ls -lh /var/cores
total 19M
-rw------- 1 root root 20M Aug  7 22:15 core.python.30520
# file /var/cores/core.python.30520 
/var/cores/core.python.30520: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python ./cachetop.py'

That's better: we have our core dump.

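3. Starting GDB

Now I'll run gdb with the target program location (using shell substitution, "`", although you should specify the full path unless you're sure that will work), followed by the core dump file: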

# gdb `which python` /var/cores/core.python.30520
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 30520]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.
Core was generated by `python ./cachetop.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f0a37aac40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5

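The last two lines are especially interesting: they tell us it's a segmentation fault in the doupdate() function from the libncursesw library. That's worth a quick web search in case it's a well-known issue. I took a quick look but didn't find a single common cause.

I can already guess what libncursesw is for, but if it were foreign to you, then being under "/lib" and ending in ".so.*" shows it's a shared library, which might have a man page, a website, a package description, etc.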

# dpkg -l | grep libncursesw
ii  libncursesw5:amd64                  6.0+20160213-1ubuntu1                    amd64
     shared libraries for terminal handling (wide character support)

I happen to be debugging this on Ubuntu, but the Linux distro shouldn't matter for gdb usage.

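4. Back Trace

Stack back traces show how we arrived at the point of failure, and are often enough to help identify a common problem. It's usually the first command I use in a gdb session: bt (short for backtrace):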

(gdb) bt
#0  0x00007f0a37aac40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007f0a37aa07e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007f0a37a99616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007f0a37a9a325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007f0a37cc6ec3 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#5  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#6  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#7  0x00000000004def08 in ?? ()
#8  0x00000000004b1153 in PyObject_Call ()
#9  0x00000000004c73ec in PyEval_EvalFrameEx ()
#10 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#11 0x00000000004caf42 in PyEval_EvalFrameEx ()
#12 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#13 0x00000000004c2ba9 in PyEval_EvalCode ()
#14 0x00000000004f20ef in ?? ()
#15 0x00000000004eca72 in PyRun_FileExFlags ()
#16 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#17 0x000000000049e18a in Py_Main ()
#18 0x00007f0a3be10830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7ffd33d94838, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffd33d94828) at ../csu/libc-start.c:291
#19 0x000000000049da19 in _start ()

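Read from the bottom up, to go from parent to child. The "??" entries are where symbol translation failed. Stack walking, which produces the stack trace, can also fail. In that case you'll likely see a single valid frame, then a small number of bogus addresses. If symbols or stacks are too badly broken to make sense of the stack trace, there are usually ways to fix it: installing debug info packages (giving gdb more symbols, and letting it do DWARF-based stack walks), or recompiling the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g). Many of the "??" entries above can be fixed by adding the python-dbg package.

This particular stack doesn't look very helpful: frames 5 to 17 (indexed on the left) are Python internals, although we can't see the Python methods (yet). Then frame 4 is the _curses library, and then we're in libncursesw. It looks like wgetch() -> wrefresh() -> doupdate(). Just based on the names, I'd guess this is a window refresh. Why would that core dump?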
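
If you save bt output to a file, reading it parent-first and spotting the "??" frames is easy to script. The following is a small Python sketch, not a gdb feature: the regular expression and the function name parse_frames are my own, and they only cover the frame formats that appear in the trace above.

```python
import re

# Matches "#N  [0xADDR in ]name (" as printed by bt in this session.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")

def parse_frames(bt_text):
    """Return a list of (frame_number, function_name) tuples."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames

SAMPLE = """\
#0  0x00007f0a37aac40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007f0a37aa07e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007f0a37a99616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007f0a37a9a325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007f0a37cc6ec3 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#5  0x00000000004c4d5a in PyEval_EvalFrameEx ()
"""

frames = parse_frames(SAMPLE)
unresolved = [n for n, name in frames if name == "??"]
# Print the call chain parent-first (bottom of the trace to the top).
print(" -> ".join(name for _, name in reversed(frames)))
print("unresolved frames:", unresolved)
```

For the six sample frames, this reports frames 2 and 4 as unresolved, which are the ones a debug info package would be expected to fill in.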

5. Disassembly

I'll start by disassembling the function we segfaulted in, doupdate():

(gdb) disas doupdate
Dump of assembler code for function doupdate:
   0x00007f0a37aac2e0 <+0>:   push   %r15
   0x00007f0a37aac2e2 <+2>:   push   %r14
   0x00007f0a37aac2e4 <+4>:   push   %r13
   0x00007f0a37aac2e6 <+6>:   push   %r12
   0x00007f0a37aac2e8 <+8>:   push   %rbp
   0x00007f0a37aac2e9 <+9>:   push   %rbx
   0x00007f0a37aac2ea <+10>:  sub    $0xc8,%rsp
[...]
---Type <return> to continue, or q <return> to quit---
[...]
   0x00007f0a37aac3f7 <+279>: cmpb   $0x0,0x21(%rcx)
   0x00007f0a37aac3fb <+283>: je     0x7f0a37aacc3b <doupdate+2395>
   0x00007f0a37aac401 <+289>: mov    0x20cb68(%rip),%rax        # 0x7f0a37cb8f70
   0x00007f0a37aac408 <+296>: mov    (%rax),%rsi
   0x00007f0a37aac40b <+299>: xor    %eax,%eax
=> 0x00007f0a37aac40d <+301>: mov    0x10(%rsi),%rdi
   0x00007f0a37aac411 <+305>: cmpb   $0x0,0x1c(%rdi)
   0x00007f0a37aac415 <+309>: jne    0x7f0a37aac6f7 <doupdate+1047>
   0x00007f0a37aac41b <+315>: movswl 0x4(%rcx),%ecx
   0x00007f0a37aac41f <+319>: movswl 0x74(%rdx),%edi
   0x00007f0a37aac423 <+323>: mov    %rax,0x40(%rsp)
[...]

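Output truncated. (I could also have just typed "disas" and it would have defaulted to doupdate.)

The "=>" arrow points to our segfault address, which is doing a mov 0x10(%rsi),%rdi: a move from the memory pointed to by the %rsi register plus an offset of 0x10, into the %rdi register. I'll check the state of the registers next.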

6. Check Registers

Printing register state using i r (short for info registers):

(gdb) i r
rax            0x0  0
rbx            0x1993060    26816608
rcx            0x19902a0    26804896
rdx            0x19ce7d0    27060176
rsi            0x0  0
rdi            0x19ce7d0    27060176
rbp            0x7f0a3848eb10   0x7f0a3848eb10 <SP>
rsp            0x7ffd33d93c00   0x7ffd33d93c00
r8             0x7f0a37cb93e0   139681862489056
r9             0x0  0
r10            0x8  8
r11            0x202    514
r12            0x0  0
r13            0x0  0
r14            0x7f0a3848eb10   139681870703376
r15            0x19ce7d0    27060176
rip            0x7f0a37aac40d   0x7f0a37aac40d <doupdate+301>
eflags         0x10246  [ PF ZF IF RF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0

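Well, %rsi is zero. There's our problem! Zero is unlikely to be a valid address, and this type of segfault is a common software bug: dereferencing an uninitialized or NULL pointer.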

7. Memory Mappings

You can double check whether zero is valid using i proc m (short for info proc mappings):

(gdb) i proc m
Mapped address spaces:

      Start Addr           End Addr       Size     Offset objfile
        0x400000           0x6e7000   0x2e7000        0x0 /usr/bin/python2.7
        0x8e6000           0x8e8000     0x2000   0x2e6000 /usr/bin/python2.7
        0x8e8000           0x95f000    0x77000   0x2e8000 /usr/bin/python2.7
  0x7f0a37a8b000     0x7f0a37ab8000    0x2d000        0x0 /lib/x86_64-linux-gnu/libncursesw.so.5.9
  0x7f0a37ab8000     0x7f0a37cb8000   0x200000    0x2d000 /lib/x86_64-linux-gnu/libncursesw.so.5.9
  0x7f0a37cb8000     0x7f0a37cb9000     0x1000    0x2d000 /lib/x86_64-linux-gnu/libncursesw.so.5.9
  0x7f0a37cb9000     0x7f0a37cba000     0x1000    0x2e000 /lib/x86_64-linux-gnu/libncursesw.so.5.9
  0x7f0a37cba000     0x7f0a37ccd000    0x13000        0x0 /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
  0x7f0a37ccd000     0x7f0a37ecc000   0x1ff000    0x13000 /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
  0x7f0a37ecc000     0x7f0a37ecd000     0x1000    0x12000 /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
  0x7f0a37ecd000     0x7f0a37ecf000     0x2000    0x13000 /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
  0x7f0a38050000     0x7f0a38066000    0x16000        0x0 /lib/x86_64-linux-gnu/libgcc_s.so.1
  0x7f0a38066000     0x7f0a38265000   0x1ff000    0x16000 /lib/x86_64-linux-gnu/libgcc_s.so.1
  0x7f0a38265000     0x7f0a38266000     0x1000    0x15000 /lib/x86_64-linux-gnu/libgcc_s.so.1
  0x7f0a38266000     0x7f0a3828b000    0x25000        0x0 /lib/x86_64-linux-gnu/libtinfo.so.5.9
  0x7f0a3828b000     0x7f0a3848a000   0x1ff000    0x25000 /lib/x86_64-linux-gnu/libtinfo.so.5.9
[...]

The first valid virtual address is 0x400000. Anything below that is invalid and, if referenced, will trigger a segmentation fault.
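
The same check can be written as a few lines of Python. This is only a sketch: MAPPINGS copies the first four ranges from the output above (the rest are omitted), and is_mapped is a name I made up. The faulting access is %rsi + 0x10 with %rsi equal to zero, so the address tested is 0x10.

```python
# (start, end) pairs copied from the first rows of "i proc m"; end is exclusive.
MAPPINGS = [
    (0x400000, 0x6e7000),              # /usr/bin/python2.7
    (0x8e6000, 0x8e8000),              # /usr/bin/python2.7
    (0x8e8000, 0x95f000),              # /usr/bin/python2.7
    (0x7f0a37a8b000, 0x7f0a37ab8000),  # libncursesw.so.5.9
]

def is_mapped(addr, mappings=MAPPINGS):
    """True if addr falls inside one of the listed address ranges."""
    return any(start <= addr < end for start, end in mappings)

rsi = 0x0
fault_addr = rsi + 0x10  # effective address of mov 0x10(%rsi),%rdi
print(hex(fault_addr), is_mapped(fault_addr))
print(hex(0x7f0a37aac40d), is_mapped(0x7f0a37aac40d))  # the faulting instruction itself
```

The instruction address is inside the libncursesw mapping, while the data address it tries to read is below every mapping.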

At this point there are several different ways to dig further. I'll start with some instruction stepping.

8. Breakpoints

Back to the disassembly:

   0x00007f0a37aac401 <+289>:   mov    0x20cb68(%rip),%rax        # 0x7f0a37cb8f70
   0x00007f0a37aac408 <+296>:   mov    (%rax),%rsi
   0x00007f0a37aac40b <+299>:   xor    %eax,%eax
=> 0x00007f0a37aac40d <+301>:   mov    0x10(%rsi),%rdi

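Reading these four instructions: it looks like it's loading a pointer from a %rip-relative memory location into %rax, then dereferencing %rax into %rsi, then setting %eax to zero (the xor is an optimization, instead of doing a mov of $0), and then dereferencing %rsi with an offset, although we already know %rsi is zero. This sequence is for walking data structures. Maybe %rax would be interesting, but it's been set to zero by the prior instruction, so we can't see it in the core dump register state.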
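
To spell that sequence out, here is a toy Python model of the four instructions running against a dictionary that stands in for memory. It is a sketch under assumptions: SLOT is the address from the disassembly comment (0x7f0a37cb8f70), while CUR_TERM_ADDR is an assumed address for the variable that the slot points to, and the names SegFault, load and run_sequence are invented for this example.

```python
class SegFault(Exception):
    """Raised by the toy model when an unmapped address is read."""

def load(memory, addr):
    """Read a 64-bit value from the toy memory, faulting on unmapped addresses."""
    if addr not in memory:
        raise SegFault(hex(addr))
    return memory[addr]

def run_sequence(memory, slot):
    regs = {}
    regs["rax"] = load(memory, slot)                 # mov 0x20cb68(%rip),%rax
    regs["rsi"] = load(memory, regs["rax"])          # mov (%rax),%rsi
    regs["rax"] = 0                                  # xor %eax,%eax
    regs["rdi"] = load(memory, regs["rsi"] + 0x10)   # mov 0x10(%rsi),%rdi
    return regs

SLOT = 0x7f0a37cb8f70           # from the "# 0x7f0a37cb8f70" disassembly comment
CUR_TERM_ADDR = 0x7f0a3848e948  # assumed address of the pointer variable
memory = {SLOT: CUR_TERM_ADDR, CUR_TERM_ADDR: 0x0}  # the variable holds NULL

try:
    run_sequence(memory, SLOT)
except SegFault as e:
    print("fault at", e)
```

The model faults on the fourth instruction when it reads address 0x10, and by then %rax has already been overwritten, which is why the core dump alone can't tell us where the NULL came from.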

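I can set a breakpoint on doupdate+289, then single-step through each instruction to see how the registers are set and change. First, I need to launch gdb so that we're executing the program live: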

# gdb `which python`
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.

Now to set a breakpoint using b (short for break):

(gdb) b *doupdate + 289
No symbol table is loaded.  Use the "file" command.

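Oops. I wanted to show this error to explain why we often start out with a breakpoint on main, at which point the symbols are likely loaded, and only then set the real breakpoint of interest. I'll go straight to the doupdate function entry, run the program, then set the offset breakpoint once it hits the function: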

(gdb) b doupdate
Function "doupdate" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (doupdate) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.

Breakpoint 1, 0x00007ffff34ad2e0 in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) b *doupdate + 289
Breakpoint 2 at 0x7ffff34ad401
(gdb) c
Continuing.

Breakpoint 2, 0x00007ffff34ad401 in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5

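We've arrived at our breakpoint.

If you haven't done this before, the r (run) command takes arguments that will be passed to the gdb target we specified earlier on the command line (python). So this ends up running "python cachetop.py".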

9. Stepping

I'll step one instruction (si, short for stepi) then inspect registers:

(gdb) si
0x00007ffff34ad408 in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) i r
rax            0x7ffff3e8f948   140737285519688
rbx            0xaea060 11444320
rcx            0xae72a0 11432608
rdx            0xa403d0 10748880
rsi            0x7ffff7ea8e10   140737352732176
rdi            0xa403d0 10748880
rbp            0x7ffff3e8fb10   0x7ffff3e8fb10 <SP>
rsp            0x7fffffffd390   0x7fffffffd390
r8             0x7ffff36ba3e0   140737277305824
r9             0x0  0
r10            0x8  8
r11            0x202    514
r12            0x0  0
r13            0x0  0
r14            0x7ffff3e8fb10   140737285520144
r15            0xa403d0 10748880
rip            0x7ffff34ad408   0x7ffff34ad408 <doupdate+296>
eflags         0x202    [ IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0
(gdb) p/a 0x7ffff3e8f948
$1 = 0x7ffff3e8f948 <cur_term>

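Another clue. So the NULL pointer we're dereferencing looks like it's in a symbol called "cur_term" (p/a is short for print/a, where "/a" means format as an address). Given this is ncurses, is our TERM environment variable set to something odd?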

# echo $TERM
xterm-256color

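I tried setting that to vt100 and running the program, but it hit the same segfault.

Note that I've inspected just the first invocation of doupdate(), but it could be called multiple times, and the issue may be in a later invocation. I can step through each by running c (short for continue). That will be ok if it's only called a few times, but if it's called a few thousand times I'll want a different approach. (I'll get back to this in section 15.)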
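
As an aside, the <doupdate+296> and <cur_term> labels in the output above come from mapping an address to the nearest symbol at or below it. Here is a rough Python sketch of that lookup, assuming a two-entry symbol table built from addresses printed in this session; real symbol tables also carry sizes, which this sketch ignores, and symbolize is a made-up name.

```python
import bisect

# (start address, name) pairs taken from this live session's output.
SYMBOLS = sorted([
    (0x7ffff34ad2e0, "doupdate"),  # function entry, from "Breakpoint 1"
    (0x7ffff3e8f948, "cur_term"),  # from the p/a output
])

def symbolize(addr, symbols=SYMBOLS):
    """Return "name" or "name+offset" for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # below the first known symbol
    start, name = symbols[i]
    offset = addr - start
    return name if offset == 0 else "%s+%d" % (name, offset)

print(symbolize(0x7ffff34ad408))  # the %rip value after the first si
print(symbolize(0x7ffff3e8f948))  # the %rax value
```

This reproduces doupdate+296 for %rip and cur_term for %rax.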

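10. Reverse Stepping

gdb has a great feature called reverse stepping, which Greg Law included in his talk. Here's an example.

I'll start a python session again, to show this from the beginning: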

# gdb `which python`
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python...(no debugging symbols found)...done.

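Now I'll set a breakpoint on doupdate as before, but once it's hit, I'll enable recording, then continue the program and let it crash. Recording adds considerable overhead, so I don't want to enable it from main.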

(gdb) b doupdate
Function "doupdate" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (doupdate) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.

Breakpoint 1, 0x00007ffff34ad2e0 in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) record
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff34ad40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5

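At this point I can reverse-step through lines or instructions. It works by playing back register state from our recording. I'll step back two instructions, then print registers: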

(gdb) reverse-stepi
0x00007ffff34ad40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) reverse-stepi
0x00007ffff34ad40b in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) i r
rax            0x7ffff3e8f948   140737285519688
rbx            0xaea060 11444320
rcx            0xae72a0 11432608
rdx            0xa403d0 10748880
rsi            0x0  0
rdi            0xa403d0 10748880
rbp            0x7ffff3e8fb10   0x7ffff3e8fb10 <SP>
rsp            0x7fffffffd390   0x7fffffffd390
r8             0x7ffff36ba3e0   140737277305824
r9             0x0  0
r10            0x8  8
r11            0x302    770
r12            0x0  0
r13            0x0  0
r14            0x7ffff3e8fb10   140737285520144
r15            0xa403d0 10748880
rip            0x7ffff34ad40b   0x7ffff34ad40b <doupdate+299>
eflags         0x202    [ IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0
(gdb) p/a 0x7ffff3e8f948
$1 = 0x7ffff3e8f948 <cur_term>

So, we're back to the "cur_term" clue. I really want to read the source code at this point, but I'll start with debug info.

11. Debug Info

This is libncursesw, and I don't have its debug info installed (Ubuntu):

# apt-cache search libncursesw
libncursesw5 - shared libraries for terminal handling (wide character support)
libncursesw5-dbg - debugging/profiling libraries for ncursesw
libncursesw5-dev - developer's libraries for ncursesw
# dpkg -l | grep libncursesw
ii  libncursesw5:amd64                  6.0+20160213-1ubuntu1                    amd64        shared libraries for terminal handling (wide character support)

I'll add it:

# apt-get install -y libncursesw5-dbg
Reading package lists... Done
Building dependency tree       
Reading state information... Done
[...]
After this operation, 2,488 kB of additional disk space will be used.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu xenial/main amd64 libncursesw5-dbg amd64 6.0+20160213-1ubuntu1 [729 kB]
Fetched 729 kB in 0s (865 kB/s)          
Selecting previously unselected package libncursesw5-dbg.
(Reading database ... 200094 files and directories currently installed.)
Preparing to unpack .../libncursesw5-dbg_6.0+20160213-1ubuntu1_amd64.deb ...
Unpacking libncursesw5-dbg (6.0+20160213-1ubuntu1) ...
Setting up libncursesw5-dbg (6.0+20160213-1ubuntu1) ...
# dpkg -l | grep libncursesw
ii  libncursesw5:amd64                  6.0+20160213-1ubuntu1                    amd64        shared libraries for terminal handling (wide character support)
ii  libncursesw5-dbg                    6.0+20160213-1ubuntu1                    amd64        debugging/profiling libraries for ncursesw

Good, those versions match. So how does our segfault look now?

# gdb `which python` /var/cores/core.python.30520
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
[...]
warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.
Core was generated by `python ./cachetop.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ClrBlank (win=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
1129        if (back_color_erase)
(gdb) bt
#0  ClrBlank (win=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
#1  ClrUpdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1147
#2  doupdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1010
#3  0x00007f0a37aa07e6 in wrefresh (win=win@entry=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_refresh.c:65
#4  0x00007f0a37a99499 in recur_wrefresh (win=win@entry=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:384
#5  0x00007f0a37a99616 in _nc_wgetch (win=win@entry=0x1993060, result=result@entry=0x7ffd33d93e24, use_meta=1)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:491
#6  0x00007f0a37a9a325 in wgetch (win=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:672
#7  0x00007f0a37cc6ec3 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#8  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#9  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#10 0x00000000004def08 in ?? ()
#11 0x00000000004b1153 in PyObject_Call ()
#12 0x00000000004c73ec in PyEval_EvalFrameEx ()
#13 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#14 0x00000000004caf42 in PyEval_EvalFrameEx ()
#15 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#16 0x00000000004c2ba9 in PyEval_EvalCode ()
#17 0x00000000004f20ef in ?? ()
#18 0x00000000004eca72 in PyRun_FileExFlags ()
#19 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#20 0x000000000049e18a in Py_Main ()
#21 0x00007f0a3be10830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7ffd33d94838, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffd33d94828) at ../csu/libc-start.c:291
#22 0x000000000049da19 in _start ()

The stack trace looks a bit different: we aren't really in doupdate(), but in ClrBlank(), which has been inlined into ClrUpdate(), which in turn has been inlined into doupdate().

Now I really want to see the source.

12. Source Code

With the debug info package installed, gdb can list the source along with the assembly:

(gdb) disas/s
Dump of assembler code for function doupdate:
/build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:
759 {
   0x00007f0a37aac2e0 <+0>:   push   %r15
   0x00007f0a37aac2e2 <+2>:   push   %r14
   0x00007f0a37aac2e4 <+4>:   push   %r13
   0x00007f0a37aac2e6 <+6>:   push   %r12
[...]
   0x00007f0a37aac3dd <+253>: jne    0x7f0a37aac6ca <doupdate+1002>

1009        if (CurScreen(SP_PARM)->_clear || NewScreen(SP_PARM)->_clear) {   /* force refresh ? */
   0x00007f0a37aac3e3 <+259>: mov    0x80(%rdx),%rax
   0x00007f0a37aac3ea <+266>: mov    0x88(%rdx),%rcx
   0x00007f0a37aac3f1 <+273>: cmpb   $0x0,0x21(%rax)
   0x00007f0a37aac3f5 <+277>: jne    0x7f0a37aac401 <doupdate+289>
   0x00007f0a37aac3f7 <+279>: cmpb   $0x0,0x21(%rcx)
   0x00007f0a37aac3fb <+283>: je     0x7f0a37aacc3b <doupdate+2395>

1129        if (back_color_erase)
   0x00007f0a37aac401 <+289>: mov    0x20cb68(%rip),%rax        # 0x7f0a37cb8f70
   0x00007f0a37aac408 <+296>: mov    (%rax),%rsi

1128        NCURSES_CH_T blank = blankchar;
   0x00007f0a37aac40b <+299>: xor    %eax,%eax

1129        if (back_color_erase)
=> 0x00007f0a37aac40d <+301>: mov    0x10(%rsi),%rdi
   0x00007f0a37aac411 <+305>: cmpb   $0x0,0x1c(%rdi)
   0x00007f0a37aac415 <+309>: jne    0x7f0a37aac6f7 <doupdate+1047>

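Great! See the "=>" arrow and the line of code above it. So we're segfaulting on "if (back_color_erase)"? That doesn't seem possible.

At this point I double checked that I had the right debug info version, and re-ran the application to segfault it in a live gdb session. Same place.

Is there something special about back_color_erase? We're in ClrBlank(), so I'll list that source code: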

(gdb) list ClrBlank
1124    
1125    static NCURSES_INLINE NCURSES_CH_T
1126    ClrBlank(NCURSES_SP_DCLx WINDOW *win)
1127    {
1128        NCURSES_CH_T blank = blankchar;
1129        if (back_color_erase)
1130            AddAttr(blank, (AttrOf(BCE_BKGD(SP_PARM, win)) & BCE_ATTRS));
1131        return blank;
1132    }
1133    

Ah, that's not defined in the function, so is it a global?

13. TUI

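It's worth showing how this looks in the gdb text user interface (TUI), which I haven't used that much but was inspired to try after seeing Greg's talk.

You can launch it using --tui: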

# gdb --tui `which python` /var/cores/core.python.30520
   ┌───────────────────────────────────────────────────────────────────────────┐
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │             [ No Source Available ]                                       │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   │                                                                           │
   └───────────────────────────────────────────────────────────────────────────┘
None No process In:                                                L??   PC: ?? 
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
---Type <return> to continue, or q <return> to quit---

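It's complaining about having no Python source. I could fix that, but we're crashing in libncursesw. Hitting enter lets it finish loading, at which point it loads the libncursesw source code via the debug info: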

   ┌──/build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c──────┐
   │1124                                                                       │
   │1125    static NCURSES_INLINE NCURSES_CH_T                                 │
   │1126    ClrBlank(NCURSES_SP_DCLx WINDOW *win)                              │
   │1127    {                                                                  │
   │1128        NCURSES_CH_T blank = blankchar;                                │
  >│1129        if (back_color_erase)                                          │
   │1130            AddAttr(blank, (AttrOf(BCE_BKGD(SP_PARM, win)) & BCE_ATTRS)│
   │1131        return blank;                                                  │
   │1132    }                                                                  │
   │1133                                                                       │
   │1134    /*                                                                 │
   │1135    **      ClrUpdate()                                                │
   │1136    **                                                                 │
   └───────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7f0a3c5e87 In: doupdate            L1129 PC: 0x7f0a37aac40d 
warning: JITed object file architecture unknown is not compatible with target ar
chitecture i386:x86-64.
---Type <return> to continue, or q <return> to quit---
Core was generated by `python ./cachetop.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ClrBlank (win=0x1993060)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
(gdb) 

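Awesome!

The arrow ">" shows the line of code that we crashed in. It gets even better: with the layout split command we can follow the source and the disassembly in separate windows: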

   ┌──/build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c──────┐
  >│1129        if (back_color_erase)                                          │
   │1130            AddAttr(blank, (AttrOf(BCE_BKGD(SP_PARM, win)) & BCE_ATTRS)│
   │1131        return blank;                                                  │
   │1132    }                                                                  │
   │1133                                                                       │
   │1134    /*                                                                 │
   │1135    **      ClrUpdate()                                                │
   └───────────────────────────────────────────────────────────────────────────┘
  >│0x7f0a37aac40d <doupdate+301>   mov    0x10(%rsi),%rdi                     │
   │0x7f0a37aac411 <doupdate+305>   cmpb   $0x0,0x1c(%rdi)                     │
   │0x7f0a37aac415 <doupdate+309>   jne    0x7f0a37aac6f7 <doupdate+1047>      │
   │0x7f0a37aac41b <doupdate+315>   movswl 0x4(%rcx),%ecx                      │
   │0x7f0a37aac41f <doupdate+319>   movswl 0x74(%rdx),%edi                     │
   │0x7f0a37aac423 <doupdate+323>   mov    %rax,0x40(%rsp)                     │
   │0x7f0a37aac428 <doupdate+328>   movl   $0x20,0x48(%rsp)                    │
   │0x7f0a37aac430 <doupdate+336>   movl   $0x0,0x4c(%rsp)                     │
   └───────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7f0a3c5e87 In: doupdate            L1129 PC: 0x7f0a37aac40d 

chitecture i386:x86-64.
Core was generated by `python ./cachetop.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
---Type <return> to continue, or q <return> to quit---
#0  ClrBlank (win=0x1993060)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
(gdb) layout split

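Greg demonstrated this with reverse stepping, so you can imagine following both code and assembly execution at the same time (I'd need a video to demonstrate that here).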

14. External: cscope

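I still want to learn more about back_color_erase, and I could try gdb's search command, but I've found I'm quicker using an external tool: cscope. cscope is a text-based source code browser from Bell Labs in the 1980s. If you have a modern IDE that you prefer, use that instead.

Setting up cscope: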

# apt-get install -y cscope
# wget http://archive.ubuntu.com/ubuntu/pool/main/n/ncurses/ncurses_6.0+20160213.orig.tar.gz
# tar xvf ncurses_6.0+20160213.orig.tar.gz
# cd ncurses-6.0-20160213
# cscope -bqR
# cscope -dq

cscope -bqR builds the lookup database. cscope -dq then launches cscope.

Searching for the back_color_erase definition:

Cscope version 15.8b                                   Press the ? key for help

Find this C symbol:
Find this global definition: back_color_erase
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
Find assignments to this symbol:

Hitting enter:

[...]
#define non_dest_scroll_region         CUR Booleans[26]
#define can_change                     CUR Booleans[27]
#define back_color_erase               CUR Booleans[28]
#define hue_lightness_saturation       CUR Booleans[29]
#define col_addr_glitch                CUR Booleans[30]
#define cr_cancels_micro_mode          CUR Booleans[31]
[...]

Oh, a #define. (They could have at least capitalized it, as is a common style with #defines.)

Ok, so what's CUR? Looking up definitions in cscope is a breeze.

#define CUR cur_term->type.                                                     

At least that #define is capitalized!
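
Chaining the two #defines shows why a line as innocent as "if (back_color_erase)" can segfault. The Python sketch below does a simple textual substitution using only the two definitions shown above. It is not how the C preprocessor really works (that is token-based, with more rules), and the expand function is just an illustration.

```python
import re

# The two definitions found with cscope, as plain text replacements.
MACROS = {
    "back_color_erase": "CUR Booleans[28]",
    "CUR": "cur_term->type.",
}

def expand(text, macros=MACROS, max_passes=10):
    """Repeatedly replace whole-word macro names until nothing changes."""
    for _ in range(max_passes):
        new = text
        for name, body in macros.items():
            new = re.sub(r"\b%s\b" % re.escape(name), body, new)
        if new == text:
            break
        text = new
    return text

print(expand("if (back_color_erase)"))
```

The expanded condition reads cur_term->type. Booleans[28], so evaluating it dereferences cur_term, and that will fault if cur_term is NULL.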

We found cur_term earlier, by stepping instructions and examining registers. What is it?

#if 0 && !0
extern NCURSES_EXPORT_VAR(TERMINAL *) cur_term;
#elif 0
NCURSES_WRAPPED_VAR(TERMINAL *, cur_term);
#define cur_term   NCURSES_PUBLIC_VAR(cur_term())
#else
extern NCURSES_EXPORT_VAR(TERMINAL *) cur_term;
#endif

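cscope read /usr/include/term.h for this. So, more macros. The line I think is taking effect there is the last extern declaration, in the #else branch, since both "#if 0 && !0" and "#elif 0" evaluate to false. Why is there an "if 0 && !0 ... elif 0"? I don't know (I'd need to read more source). Sometimes programmers use "#if 0" around debug code they want to disable in production; however, this looks auto-generated.

Searching for NCURSES_EXPORT_VAR finds: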

#  define NCURSES_EXPORT_VAR(type) NCURSES_IMPEXP type

... and NCURSES_IMPEXP:

/* Take care of non-cygwin platforms */
#if !defined(NCURSES_IMPEXP)          
#  define NCURSES_IMPEXP /* nothing */
#endif                                
#if !defined(NCURSES_API)             
#  define NCURSES_API /* nothing */   
#endif                                
#if !defined(NCURSES_EXPORT)          
#  define NCURSES_EXPORT(type) NCURSES_IMPEXP type NCURSES_API
#endif                                
#if !defined(NCURSES_EXPORT_VAR)      
#  define NCURSES_EXPORT_VAR(type) NCURSES_IMPEXP type
#endif  

... and TERMINAL was:

typedef struct term {       /* describe an actual terminal */
    TERMTYPE    type;       /* terminal type description */
    short   Filedes;    /* file description being written to */
    TTY     Ottyb,      /* original state of the terminal */
        Nttyb;      /* current state of the terminal */
    int     _baudrate;  /* used to compute padding */
    char *      _termname;      /* used for termname() */
} TERMINAL;

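Gah! Now TERMINAL is capitalized. Along with the macros, this code is not that easy to follow...

Ok, who actually sets cur_term? Remember our problem is that it's zero, maybe because it's uninitialized or because it was explicitly set that way. Browsing the code paths that set it might provide more clues, to help answer why it isn't being set, or why it is set to zero. Using the first option in cscope: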

Find this C symbol: cur_term
Find this global definition:
Find functions called by this function:
Find functions calling this function:
[...]

Browsing the entries quickly finds:

NCURSES_EXPORT(TERMINAL *)
NCURSES_SP_NAME(set_curterm) (NCURSES_SP_DCLx TERMINAL * termp)
{
    TERMINAL *oldterm;

    T((T_CALLED("set_curterm(%p)"), (void *) termp));

    _nc_lock_global(curses);
    oldterm = cur_term;
    if (SP_PARM)
        SP_PARM->_term = termp;
#if USE_REENTRANT
    CurTerm = termp;
#else
    cur_term = termp;
#endif

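The line I want to point out is "cur_term = termp;". Even the function name is wrapped in a macro. But we've at least found how cur_term is set: via set_curterm(). Maybe that isn't being called?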

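15. External: perf-tools/ftrace/uprobes

I could use gdb for this, but I can't help trying the uprobe tool from my perf-tools collection, which uses Linux ftrace and uprobes. One advantage of using tracers is that they don't pause the target process, as gdb does (although that doesn't matter for this cachetop.py example). Another advantage is that I can trace a few events or a few thousand just as easily.

I should be able to trace calls to set_curterm() in libncursesw, and even print the first argument: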

# /apps/perf-tools/bin/uprobe 'p:/lib/x86_64-linux-gnu/libncursesw.so.5:set_curterm %di'
ERROR: missing symbol "set_curterm" in /lib/x86_64-linux-gnu/libncursesw.so.5

Well, that didn't work. Where is set_curterm()? There are lots of ways to find it, like gdb or objdump:

(gdb) info symbol set_curterm
set_curterm in section .text of /lib/x86_64-linux-gnu/libtinfo.so.5

# objdump -tT /lib/x86_64-linux-gnu/libncursesw.so.5 | grep cur_term
0000000000000000      DO *UND*  0000000000000000  NCURSES_TINFO_5.0.19991023 cur_term
# objdump -tT /lib/x86_64-linux-gnu/libtinfo.so.5 | grep cur_term
0000000000228948 g    DO .bss   0000000000000008  NCURSES_TINFO_5.0.19991023 cur_term

gdb works better. Plus, if I'd looked closely at the source, I would have noticed it was being built for libtinfo.

Trying to trace set_curterm() in libtinfo:

# /apps/perf-tools/bin/uprobe 'p:/lib/x86_64-linux-gnu/libtinfo.so.5:set_curterm %di'
Tracing uprobe set_curterm (p:set_curterm /lib/x86_64-linux-gnu/libtinfo.so.5:0xfa80 %di). Ctrl-C to end.
          python-31617 [007] d... 24236402.719959: set_curterm: (0x7f116fcc2a80) arg1=0x1345d70
          python-31617 [007] d... 24236402.720033: set_curterm: (0x7f116fcc2a80) arg1=0x13a22e0
          python-31617 [007] d... 24236402.723804: set_curterm: (0x7f116fcc2a80) arg1=0x14cdfa0
          python-31617 [007] d... 24236402.723838: set_curterm: (0x7f116fcc2a80) arg1=0x0
^C

That works. So set_curterm() is called, and it was called four times. The last time it was passed zero, which sounds like it could be the problem.
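
With four events it's easy to spot the zero by eye. With thousands, a little post-processing helps. Here is a small sketch that scans output shaped like the lines above and reports calls where arg1 is zero. The regular expression is written against this sample only and is an assumption about the layout, not a general ftrace parser.

```python
import re

# Sample lines copied from the uprobe output above.
SAMPLE = """\
          python-31617 [007] d... 24236402.719959: set_curterm: (0x7f116fcc2a80) arg1=0x1345d70
          python-31617 [007] d... 24236402.720033: set_curterm: (0x7f116fcc2a80) arg1=0x13a22e0
          python-31617 [007] d... 24236402.723804: set_curterm: (0x7f116fcc2a80) arg1=0x14cdfa0
          python-31617 [007] d... 24236402.723838: set_curterm: (0x7f116fcc2a80) arg1=0x0
"""

LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)-(?P<pid>\d+)\s+\[\d+\]\s+\S+\s+"
    r"(?P<ts>[\d.]+):\s+(?P<probe>\w+):.*\barg1=(?P<arg1>0x[0-9a-f]+)"
)


def null_calls(text):
    """Return (pid, timestamp) for every traced call whose arg1 is zero."""
    hits = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and int(match.group("arg1"), 16) == 0:
            hits.append((int(match.group("pid")), float(match.group("ts"))))
    return hits


for pid, timestamp in null_calls(SAMPLE):
    print("pid %d called set_curterm(NULL) at %.6f" % (pid, timestamp))
```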

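If you're wondering how I knew the %di register was the first argument, it comes from the AMD64/x86_64 ABI (and the assumption that this compiled library is ABI compliant). Here's a reminder: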

# man syscall
[...]
       arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
       ──────────────────────────────────────────────────────────────────
       arm/OABI      a1    a2    a3    a4    v1    v2    v3
       arm/EABI      r0    r1    r2    r3    r4    r5    r6
       arm64         x0    x1    x2    x3    x4    x5    -
       blackfin      R0    R1    R2    R3    R4    R5    -
       i386          ebx   ecx   edx   esi   edi   ebp   -
       ia64          out0  out1  out2  out3  out4  out5  -
       mips/o32      a0    a1    a2    a3    -     -     -     See below
       mips/n32,64   a0    a1    a2    a3    a4    a5    -
       parisc        r26   r25   r24   r23   r22   r21   -
       s390          r2    r3    r4    r5    r6    r7    -
       s390x         r2    r3    r4    r5    r6    r7    -
       sparc/32      o0    o1    o2    o3    o4    o5    -
       sparc/64      o0    o1    o2    o3    o4    o5    -
       x86_64        rdi   rsi   rdx   r10   r8    r9    -
[...]
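
One caveat worth spelling out: that man page table describes system calls. For ordinary function calls, such as a call into a library, the System V AMD64 ABI uses the same registers except for the fourth argument, which is passed in rcx rather than r10. The first argument is rdi either way, which is why %di works here. A small lookup sketch (the names are made up for this illustration):

```python
# Integer/pointer argument registers, in order, for ordinary function calls
# under the System V AMD64 ABI (what a library function like set_curterm() uses).
FUNCTION_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

# Linux x86_64 system call arguments, matching the x86_64 row of the table above.
SYSCALL_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def arg_register(position, syscall=False):
    """Return the register holding the 1-based integer argument `position`."""
    regs = SYSCALL_ARG_REGS if syscall else FUNCTION_ARG_REGS
    if not 1 <= position <= len(regs):
        raise ValueError("only the first %d arguments are passed in registers" % len(regs))
    return regs[position - 1]


print(arg_register(1))                # first function argument
print(arg_register(4))                # fourth function argument
print(arg_register(4, syscall=True))  # fourth system call argument
```

Floating point arguments and arguments beyond the sixth follow different rules and are not covered by this sketch.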

I'd also like to see a stack trace for the arg1=0x0 invocation, but this ftrace tool doesn't support stack traces yet.


16. External: bcc/BPF

Since we're debugging a bcc tool, cachetop.py, it's worth noting that bcc's trace.py has capabilities similar to my older uprobe tool:

# ./trace.py 'p:tinfo:set_curterm "%d", arg1'
TIME     PID    COMM         FUNC             -
01:00:20 31698  python       set_curterm      38018416
01:00:20 31698  python       set_curterm      38396640
01:00:20 31698  python       set_curterm      39624608
01:00:20 31698  python       set_curterm      0

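Yes, we're debugging bcc using bcc!

If you are new to bcc, it's worth checking out. It provides Python and Lua interfaces for the new BPF tracing features in the Linux 4.x series. In short, it allows lots of performance tools and analysis that were previously impossible or prohibitively expensive to run. I've posted instructions for running it on Ubuntu Xenial.

bcc's trace.py tool should have a switch for printing user stack traces, since the kernel has had BPF stack capabilities since Linux 4.6, although at the time of writing we haven't added this switch yet.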


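17. More Breakpoints

I should really have used gdb breakpoints on set_curterm() to start with, but I hope that was an interesting detour through ftrace and BPF.

Back to live running mode: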

# gdb `which python`
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
[...]
(gdb) b set_curterm
Function "set_curterm" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (set_curterm) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, set_curterm (termp=termp@entry=0xa43150) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xab5870) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xbecb90) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {

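Ok, at this breakpoint we can see that set_curterm() was invoked with a termp=0x0 argument, thanks to debuginfo for that information. If I didn't have debuginfo, I could just print the registers at each breakpoint.

I'll print the stack trace so that we can see who was setting curterm to 0.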

(gdb) bt
#0  set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
#1  0x00007ffff5a44e75 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#2  0x00007ffff45cabb8 in clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, clang::driver::JobAction const&, clang::driver::InputInfo const&, llvm::SmallVector<clang::driver::InputInfo, 4u> const&, llvm::opt::ArgList const&, char const*) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#3  0x00007ffff456ffa5 in clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, char const*, bool, bool, char const*, clang::driver::InputInfo&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#4  0x00007ffff4570501 in clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#5  0x00007ffff457224a in clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#6  0x00007ffff4396cda in ebpf::ClangLoader::parse(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >*, std::unique_ptr<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> >, std::default_delete<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> > > >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#7  0x00007ffff4344314 in ebpf::BPFModule::load_cfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#8  0x00007ffff4349e5e in ebpf::BPFModule::load_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#9  0x00007ffff43430c8 in bpf_module_create_c_from_string () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#10 0x00007ffff690ae40 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#11 0x00007ffff690a8ab in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#12 0x00007ffff6b1a68c in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#13 0x00007ffff6b1ed82 in ?? () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#14 0x00000000004b1153 in PyObject_Call ()
#15 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#16 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#17 0x00000000004def08 in ?? ()
#18 0x00000000004b1153 in PyObject_Call ()
#19 0x00000000004f4c3e in ?? ()
#20 0x00000000004b1153 in PyObject_Call ()
#21 0x00000000004f49b7 in ?? ()
#22 0x00000000004b6e2c in ?? ()
#23 0x00000000004b1153 in PyObject_Call ()
#24 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#25 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#26 0x00000000004def08 in ?? ()
#27 0x00000000004b1153 in PyObject_Call ()
#28 0x00000000004c73ec in PyEval_EvalFrameEx ()
#29 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#30 0x00000000004caf42 in PyEval_EvalFrameEx ()
#31 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#32 0x00000000004c2ba9 in PyEval_EvalCode ()
#33 0x00000000004f20ef in ?? ()
#34 0x00000000004eca72 in PyRun_FileExFlags ()
#35 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#36 0x000000000049e18a in Py_Main ()
#37 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfa8) at ../csu/libc-start.c:291
#38 0x000000000049da19 in _start ()

Ok, more clues... I think. We're in llvm::sys::Process::FileDescriptorHasColors(). The LLVM compiler?
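
Long backtraces like this are easier to digest with a little scripting. As a sketch, here is one way to pull out the caller of a given function from saved bt output. It assumes one frame per line (gdb wraps some long frames across lines, which this does not handle), and the sample is a subset of the frames above.

```python
import re

# A few frames copied from the backtrace above.
SAMPLE_BT = """\
#0  set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
#1  0x00007ffff5a44e75 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#9  0x00007ffff43430c8 in bpf_module_create_c_from_string () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#10 0x00007ffff690ae40 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#14 0x00000000004b1153 in PyObject_Call ()
"""


def parse_frame(line):
    """Return (frame_number, function, library_or_None), or None if not a frame."""
    match = re.match(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(.*)", line)
    if not match:
        return None
    rest = match.group(2)
    library = None
    if " from " in rest:
        rest, library = rest.rsplit(" from ", 1)
    # The function name ends where gdb's argument list " (...)" begins.
    function = rest.split(" (", 1)[0]
    return int(match.group(1)), function, library


def caller_of(function, backtrace_text):
    """Return the parsed frame that called `function`, or None."""
    frames = [f for f in map(parse_frame, backtrace_text.splitlines()) if f]
    for current, parent in zip(frames, frames[1:]):
        if current[1] == function:
            return parent
    return None


print(caller_of("set_curterm", SAMPLE_BT))
```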


18. External: cscope, take 2

More source code browsing using cscope, this time in LLVM. The FileDescriptorHasColors() function has:

static bool terminalHasColors(int fd) {
[...]
  // Now extract the structure allocated by setupterm and free its memory
  // through a really silly dance.
  struct term *termp = set_curterm((struct term *)nullptr);
  (void)del_curterm(termp); // Drop any errors here.

Here's what that code used to be in an earlier version:

static bool terminalHasColors() {
  if (const char *term = std::getenv("TERM")) {
    // Most modern terminals support ANSI escape sequences for colors.
    // We could check terminfo, or have a list of known terms that support
    // colors, but that would be overkill.
    // The user can always ask for no colors by setting TERM to dumb, or
    // using a commandline flag.
    return strcmp(term, "dumb") != 0;
  }
  return false;
}

It became a "silly dance" that involves calling set_curterm() with a null pointer.
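
To make the failure mode concrete, here is a toy model in Python. It is not ncurses or LLVM code: the class and function names only mirror the C ones, and it assumes (as the analysis so far suggests) that a later screen update reads a field through the global cur_term pointer.

```python
class Terminal:
    """Stand-in for ncurses' TERMINAL struct; holds one capability flag."""

    def __init__(self, back_color_erase):
        self.back_color_erase = back_color_erase


cur_term = None  # models the library-wide global pointer


def set_curterm(new_term):
    """Install new_term as the global terminal and return the previous one."""
    global cur_term
    old_term, cur_term = cur_term, new_term
    return old_term


def clr_blank():
    """Models a later screen update that reads a field through cur_term."""
    return cur_term.back_color_erase  # fails if cur_term is None


set_curterm(Terminal(back_color_erase=True))  # curses sets up its terminal
assert clr_blank() is True                    # screen updates work

set_curterm(None)                             # the "silly dance" swaps in NULL
try:
    clr_blank()
except AttributeError as error:               # Python's analogue of the segfault
    print("crash:", error)
```

The point of the model is that the dance is harmless for a program that only asked "does this terminal have colors?", but not for one that keeps using curses afterwards.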


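19. Writing Memory

As an experiment, and to explore a possible workaround, I'll modify the memory of the running process to avoid the set_curterm() of zero.

I'll run gdb, set a breakpoint on set_curterm(), and take it to the zero invocation: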

# gdb `which python`
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1                                  
[...]
(gdb) b set_curterm
Function "set_curterm" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (set_curterm) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, set_curterm (termp=termp@entry=0xa43150) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80      {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xab5870) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80      {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xbecb90) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80      {
(gdb) c
Continuing.                                                                    
                                                                               
Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80      { 

 

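At this point I'll use the set command to overwrite memory, replacing zero with the previous argument of set_curterm(), 0xbecb90, seen above, in the hope that it's still valid.

WARNING: Writing memory is not safe! gdb won't ask "are you sure?". If you get it wrong or make a typo, you will corrupt the application. Best case, your application crashes immediately, and you realize your mistake. Worst case, your application continues with silently corrupted data that is only discovered years later.

In this case, I'm experimenting on a lab machine with no production data, so I'll continue. I'll print the value of the %rdi register as hex (p/x), then set it to the previous address, print it again, then print all registers: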

(gdb) p/x $rdi
$1 = 0x0
(gdb) set $rdi=0xbecb90
(gdb) p/x $rdi
$2 = 0xbecb90
(gdb) i r
rax            0x100    256
rbx            0x1  1
rcx            0xe71    3697
rdx            0x0  0
rsi            0x7ffff5dd45d3   140737318307283
rdi            0xbecb90 12503952
rbp            0x100    0x100
rsp            0x7fffffffa5b8   0x7fffffffa5b8
r8             0xbf0050 12517456
r9             0x1999999999999999   1844674407370955161
r10            0xbf0040 12517440
r11            0x7ffff7bb4b78   140737349634936
r12            0xbecb70 12503920
r13            0xbeaea0 12496544
r14            0x7fffffffa9a0   140737488333216
r15            0x7fffffffa8a0   140737488332960
rip            0x7ffff3c76a80   0x7ffff3c76a80 <set_curterm>
eflags         0x246    [ PF ZF IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0

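(Since at this point I have debuginfo installed, I don't need to refer to registers in this case: I could have used set on "termp", the variable name of the argument to set_curterm(), instead of $rdi.)

%rdi is now populated, so those registers look ok to continue.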

(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {

Ok, we survived a call to set_curterm()! However, we've hit another, also with an argument of zero. Trying our write trick again:

(gdb) set $rdi=0xbecb90
(gdb) c
Continuing.
warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff34ad411 in ClrBlank (win=0xaea060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
1129        if (back_color_erase)

Ahh. That's what I get for writing memory. So this experiment ended in another segfault.


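20. Conditional Breakpoints

In the previous section, I had to use three continues to reach the right invocation of the breakpoint. If that were hundreds of invocations, I'd use a conditional breakpoint. Here's an example.

I'll run the program and break on set_curterm() as usual: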

# gdb `which python`
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1                                  
[...]
(gdb) b set_curterm
Function "set_curterm" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (set_curterm) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, set_curterm (termp=termp@entry=0xa43150) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {

Now I'll turn breakpoint 1 into a conditional breakpoint, so that it only fires when the %rdi register is zero:

(gdb) cond 1 $rdi==0x0
(gdb) i b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff3c76a80 in set_curterm at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
    stop only if $rdi==0x0
    breakpoint already hit 1 time
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
(gdb)

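Neat! cond is short for condition. So why didn't I set it right away, when I first created the "pending" breakpoint? I've found that conditions don't work on pending breakpoints, at least on this gdb version. (Either that or I'm doing it wrong.) I also used i b here (info breakpoints) to list the breakpoints with their details.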
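
The same sequence can be scripted so it doesn't need to be retyped each time. This is a sketch under a few assumptions: it uses gdb's -batch, -ex and --args options and the "set breakpoint pending on" setting, it adds the condition only after the first stop (mirroring the session above), and the helper name gdb_batch_argv is made up. I haven't run this exact command line against the cachetop.py case.

```python
import shlex


def gdb_batch_argv(program, program_args, function, condition):
    """Build an argv list that replays the session above non-interactively."""
    commands = [
        "set breakpoint pending on",  # accept the pending breakpoint without a prompt
        "break " + function,          # breakpoint 1, pending until the library loads
        "run",                        # stops at the first hit
        "condition 1 " + condition,   # add the condition once the breakpoint is resolved
        "continue",                   # runs until the condition is true
        "bt",                         # show who made that call
    ]
    argv = ["gdb", "-batch"]
    for command in commands:
        argv += ["-ex", command]
    return argv + ["--args", program] + list(program_args)


argv = gdb_batch_argv("python", ["cachetop.py"], "set_curterm", "$rdi == 0")
print(" ".join(shlex.quote(part) for part in argv))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(). Passing a full path for the program, as `which python` does above, avoids surprises about which interpreter gets debugged.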


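21. Returns

I'll try another hack like the memory write, but this time changing the instruction path rather than the data.

WARNING: see the previous warning, which also applies here.

I'll take us to the set_curterm() 0x0 breakpoint as before, and then issue a ret (short for return), which returns from the function immediately without executing it. My hope is that by not executing it, it won't set the global curterm to 0x0.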

[...]
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80

(gdb) ret
Make set_curterm return now? (y or n) y
#0  0x00007ffff5a44e75 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
                                                    _nc_free_termtype (ptr=ptr@entry=0x100) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/free_ttype.c:52
52      FreeIfNeeded(ptr->str_table);

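Another crash. Again, that's what I get for messing around in this way.

One more try. After browsing the code a bit more, I want to try doing a ret twice, in case the parent function is also involved. Again, this is just a hack-it-until-it-works experiment: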

[...]
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) ret
Make set_curterm return now? (y or n) y
#0  0x00007ffff5a44e75 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
(gdb) ret
Make selected stack frame return now? (y or n) y
#0  0x00007ffff45cabb8 in clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, clang::driver::JobAction const&, clang::driver::InputInfo const&, llvm::SmallVector<clang::driver::InputInfo, 4u> const&, llvm::opt::ArgList const&, char const*) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
(gdb) c

The screen goes blank and pauses... then redraws:

07:44:22 Buffers MB: 61 / Cached MB: 1246
PID      UID      CMD              HITS     MISSES   DIRTIES  READ_HIT%  WRITE_HIT%
    2742 root     systemd-logind          3       66        2       1.4%      95.7%
   15836 root     kworker/u30:1           7        0        1      85.7%       0.0%
    2736 messageb dbus-daemon             8       66        2       8.1%      89.2%
       1 root     systemd                15        0        0     100.0%       0.0%
    2812 syslog   rs:main Q:Reg          16       66        8       9.8%      80.5%
     435 root     systemd-journal        32       66        8      24.5%      67.3%
    2740 root     accounts-daemon       113       66        2      62.0%      36.9%
   15847 root     bash                  160        0        1      99.4%       0.0%
   15864 root     lesspipe              306        0        2      99.3%       0.0%
   15854 root     bash                  309        0        2      99.4%       0.0%
   15856 root     bash                  309        0        2      99.4%       0.0%
   15866 root     bash                  309        0        2      99.4%       0.0%
   15867 root     bash                  309        0        2      99.4%       0.0%
   15860 root     bash                  313        0        2      99.4%       0.0%
   15868 root     bash                  341        0        2      99.4%       0.0%
   15858 root     uname                 452        0        2      99.6%       0.0%
   15858 root     bash                  453        0        2      99.6%       0.0%
   15866 root     dircolors             464        0        2      99.6%       0.0%
   15861 root     basename              465        0        2      99.6%       0.0%
   15864 root     dirname               468        0        2      99.6%       0.0%
   15856 root     ls                    476        0        2      99.6%       0.0%
[...]

Wow! It's working!


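22. A Better Workaround

I'd been posting debugging output to GitHub, especially since the lead BPF engineer, Alexei Starovoitov, is also well versed in LLVM internals, and the root cause seemed to be a bug in LLVM. While I was messing with writes and returns, he suggested adding the LLVM option -fno-color-diagnostics to bcc, to avoid this problem code path. It worked! It was added to bcc as a workaround. (And we should get that LLVM bug fixed.)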
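
The workaround itself went into bcc, so tools don't need to do anything. Purely as an illustration of what passing such a flag looks like from a tool, here is a hypothetical user-side sketch. The helper with_no_color is made up, and I'm assuming bcc's Python BPF() constructor and its cflags parameter. I haven't verified that passing the flag this way avoids the crash on an unpatched bcc.

```python
NO_COLOR_FLAG = "-fno-color-diagnostics"


def with_no_color(cflags=None):
    """Return a copy of cflags that includes -fno-color-diagnostics exactly once."""
    flags = list(cflags or [])
    if NO_COLOR_FLAG not in flags:
        flags.append(NO_COLOR_FLAG)
    return flags


print(with_no_color(["-DDEBUG=1"]))

# Hypothetical use with bcc's Python front end (requires bcc to be installed):
#   from bcc import BPF
#   b = BPF(text=bpf_program_text, cflags=with_no_color())
```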

23. Python Context

At this point we've fixed the problem, but you might be curious to see the stack trace fully fixed.

Adding python-dbg:

# apt-get install -y python-dbg
Reading package lists... Done
[...]
The following additional packages will be installed:
  libpython-dbg libpython2.7-dbg python2.7-dbg
Suggested packages:
  python2.7-gdbm-dbg python2.7-tk-dbg python-gdbm-dbg python-tk-dbg
The following NEW packages will be installed:
  libpython-dbg libpython2.7-dbg python-dbg python2.7-dbg
0 upgraded, 4 newly installed, 0 to remove and 20 not upgraded.
Need to get 11.9 MB of archives.
After this operation, 36.4 MB of additional disk space will be used.
[...]

Now I'll rerun gdb and view the stack trace:

# gdb `which python` /var/cores/core.python.30520
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
[...]
Reading symbols from /usr/bin/python...Reading symbols from /usr/lib/debug/.build-id/4e/a0539215b2a9e32602f81c90240874132c1a54.debug...done.
[...]
(gdb) bt
#0  ClrBlank (win=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
#1  ClrUpdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1147
#2  doupdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1010
#3  0x00007f0a37aa07e6 in wrefresh (win=win@entry=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_refresh.c:65
#4  0x00007f0a37a99499 in recur_wrefresh (win=win@entry=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:384
#5  0x00007f0a37a99616 in _nc_wgetch (win=win@entry=0x1993060, result=result@entry=0x7ffd33d93e24, use_meta=1)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:491
#6  0x00007f0a37a9a325 in wgetch (win=0x1993060) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:672
#7  0x00007f0a37cc6ec3 in PyCursesWindow_GetCh.lto_priv.109 (self=0x7f0a3c57b198, args=()) at /build/python2.7-HpIZBG/python2.7-2.7.11/Modules/_cursesmodule.c:853
#8  0x00000000004c4d5a in call_function (oparg=<optimized out>, pp_stack=0x7ffd33d93f30) at ../Python/ceval.c:4350
#9  PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#10 0x00000000004c2e05 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#11 0x00000000004def08 in function_call.lto_priv () at ../Objects/funcobject.c:523
#12 0x00000000004b1153 in PyObject_Call () at ../Objects/abstract.c:2546
#13 0x00000000004c73ec in ext_do_call (nk=0, na=<optimized out>, flags=<optimized out>, pp_stack=0x7ffd33d941e8, func=<function at remote 0x7f0a37edcc80>)
    at ../Python/ceval.c:4662
#14 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#15 0x00000000004c2e05 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#16 0x00000000004caf42 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffd33d943f0, func=<function at remote 0x7f0a38039140>)
    at ../Python/ceval.c:4445
#17 call_function (oparg=<optimized out>, pp_stack=0x7ffd33d943f0) at ../Python/ceval.c:4370
#18 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#19 0x00000000004c2e05 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#20 0x00000000004c2ba9 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:669
#21 0x00000000004f20ef in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#22 0x00000000004eca72 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#23 0x00000000004eb1f1 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#24 0x000000000049e18a in Py_Main () at ../Modules/main.c:640
#25 0x00007f0a3be10830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7ffd33d94838, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffd33d94828) at ../csu/libc-start.c:291
#26 0x000000000049da19 in _start ()

No more "??"s, but not hugely more helpful, yet.

The Python debug packages have added other capabilities to gdb. Now we can look at the Python backtrace:

(gdb) py-bt
Traceback (most recent call first):
  File "./cachetop.py", line 188, in handle_loop
    s = stdscr.getch()
  File "/usr/lib/python2.7/curses/wrapper.py", line 43, in wrapper
    return func(stdscr, *args, **kwds)
  File "./cachetop.py", line 260, in <module>
    curses.wrapper(handle_loop, args)

... and the Python source listing:

(gdb) py-list
 183        b.attach_kprobe(event="mark_buffer_dirty", fn_name="do_count")
 184    
 185        exiting = 0
 186    
 187        while 1:
>188            s = stdscr.getch()
 189            if s == ord('q'):
 190                exiting = 1
 191            elif s == ord('r'):
 192                sort_reverse = not sort_reverse
 193            elif s == ord('<'):

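It's identifying where in our Python code we were executing when we hit the segfault. That's really nice!

The problem with the initial stack trace is that we were seeing the Python internals that execute the methods, but not the methods themselves. If you're debugging another language, it's up to its compiler/runtime how it ends up executing code. If you do a web search for "language name" and "gdb", you might find it has gdb debugging extensions like Python does. If it doesn't, the bad news is you'll need to write your own. The good news is that this is even possible! Search for documentation on "adding new GDB commands in Python", as they can be written in Python.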
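
To give a feel for what such an extension looks like, here is a minimal sketch of a custom gdb command written in Python. The command name first-arg and the helper describe_first_arg are made up for this example. It assumes x86_64 and uses gdb's Python API (gdb.Command, gdb.parse_and_eval, gdb.write), which only exists inside gdb, so the import is guarded to let the file load elsewhere too.

```python
try:
    import gdb  # only available inside gdb's embedded Python
except ImportError:
    gdb = None


def describe_first_arg(rdi):
    """Format the first-argument register value for display."""
    return "arg1 is NULL" if rdi == 0 else "arg1 = 0x%x" % rdi


if gdb is not None:

    class FirstArg(gdb.Command):
        """Print the first integer argument register (x86_64 only)."""

        def __init__(self):
            super(FirstArg, self).__init__("first-arg", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            # Read %rdi from the selected frame; meaningful at function entry.
            rdi = int(gdb.parse_and_eval("$rdi"))
            gdb.write(describe_first_arg(rdi) + "\n")

    FirstArg()  # instantiating the class registers the command with gdb
```

Saved as, say, first_arg.py, it could be loaded in a session with "source first_arg.py" and then run as "first-arg" when stopped at a breakpoint like the set_curterm() one above.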


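24. And More

While it might look like I've written a comprehensive tour of gdb, I really haven't: there's a lot more to gdb. The help command will list the major sections: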

(gdb) help
List of classes of commands:

aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
data -- Examining data
files -- Specifying and examining files
internals -- Maintenance commands
obscure -- Obscure features
running -- Running the program
stack -- Examining the stack
status -- Status inquiries
support -- Support facilities
tracepoints -- Tracing of program execution without stopping the program
user-defined -- User-defined commands

Type "help" followed by a class name for a list of commands in that class.
Type "help all" for the list of all commands.
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.

You can then run help on each command class. For example, here's the full listing for breakpoints:

(gdb) help breakpoints
Making program stop at certain points.

List of commands:

awatch -- Set a watchpoint for an expression
break -- Set breakpoint at specified location
break-range -- Set a breakpoint for an address range
catch -- Set catchpoints to catch events
catch assert -- Catch failed Ada assertions
catch catch -- Catch an exception
catch exception -- Catch Ada exceptions
catch exec -- Catch calls to exec
catch fork -- Catch calls to fork
catch load -- Catch loads of shared libraries
catch rethrow -- Catch an exception
catch signal -- Catch signals by their names and/or numbers
catch syscall -- Catch system calls by their names and/or numbers
catch throw -- Catch an exception
catch unload -- Catch unloads of shared libraries
catch vfork -- Catch calls to vfork
clear -- Clear breakpoint at specified location
commands -- Set commands to be executed when a breakpoint is hit
condition -- Specify breakpoint number N to break only if COND is true
delete -- Delete some breakpoints or auto-display expressions
delete bookmark -- Delete a bookmark from the bookmark list
delete breakpoints -- Delete some breakpoints or auto-display expressions
delete checkpoint -- Delete a checkpoint (experimental)
delete display -- Cancel some expressions to be displayed when program stops
delete mem -- Delete memory region
delete tracepoints -- Delete specified tracepoints
delete tvariable -- Delete one or more trace state variables
disable -- Disable some breakpoints
disable breakpoints -- Disable some breakpoints
disable display -- Disable some expressions to be displayed when program stops
disable frame-filter -- GDB command to disable the specified frame-filter
disable mem -- Disable memory region
disable pretty-printer -- GDB command to disable the specified pretty-printer
disable probes -- Disable probes
disable tracepoints -- Disable specified tracepoints
disable type-printer -- GDB command to disable the specified type-printer
disable unwinder -- GDB command to disable the specified unwinder
disable xmethod -- GDB command to disable a specified (group of) xmethod(s)
dprintf -- Set a dynamic printf at specified location
enable -- Enable some breakpoints
enable breakpoints -- Enable some breakpoints
enable breakpoints count -- Enable breakpoints for COUNT hits
enable breakpoints delete -- Enable breakpoints and delete when hit
enable breakpoints once -- Enable breakpoints for one hit
enable count -- Enable breakpoints for COUNT hits
enable delete -- Enable breakpoints and delete when hit
enable display -- Enable some expressions to be displayed when program stops
enable frame-filter -- GDB command to disable the specified frame-filter
enable mem -- Enable memory region
enable once -- Enable breakpoints for one hit
enable pretty-printer -- GDB command to enable the specified pretty-printer
enable probes -- Enable probes
enable tracepoints -- Enable specified tracepoints
enable type-printer -- GDB command to enable the specified type printer
enable unwinder -- GDB command to enable unwinders
enable xmethod -- GDB command to enable a specified (group of) xmethod(s)
ftrace -- Set a fast tracepoint at specified location
hbreak -- Set a hardware assisted breakpoint
ignore -- Set ignore-count of breakpoint number N to COUNT
rbreak -- Set a breakpoint for all functions matching REGEXP
rwatch -- Set a read watchpoint for an expression
save -- Save breakpoint definitions as a script
save breakpoints -- Save current breakpoint definitions as a script
save gdb-index -- Save a gdb-index file
save tracepoints -- Save current tracepoint definitions as a script
skip -- Ignore a function while stepping
skip delete -- Delete skip entries
skip disable -- Disable skip entries
skip enable -- Enable skip entries
skip file -- Ignore a file while stepping
skip function -- Ignore a function while stepping
strace -- Set a static tracepoint at location or marker
tbreak -- Set a temporary breakpoint
tcatch -- Set temporary catchpoints to catch events
tcatch assert -- Catch failed Ada assertions
tcatch catch -- Catch an exception
tcatch exception -- Catch Ada exceptions
tcatch exec -- Catch calls to exec
tcatch fork -- Catch calls to fork
tcatch load -- Catch loads of shared libraries
tcatch rethrow -- Catch an exception
tcatch signal -- Catch signals by their names and/or numbers
tcatch syscall -- Catch system calls by their names and/or numbers
tcatch throw -- Catch an exception
tcatch unload -- Catch unloads of shared libraries
tcatch vfork -- Catch calls to vfork
thbreak -- Set a temporary hardware assisted breakpoint
trace -- Set a tracepoint at specified location
watch -- Set a watchpoint for an expression

Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.

This helps to illustrate how many capabilities gdb has, and how few I needed to use in this example.


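25. Final Words

Well, that was kind of a nasty issue: an LLVM bug breaking ncurses and causing a Python program to segfault. But the commands and procedures I used to debug it were mostly routine: viewing stack traces, checking registers, setting breakpoints, stepping, and browsing source.

When I first used gdb (years ago), I really didn't like it. It felt clumsy and limited. gdb has improved a lot since then, as have my gdb skills, and I now see it as a powerful modern debugger. Feature sets vary between debuggers, but gdb may be the most powerful text-based debugger nowadays, with lldb catching up.

I hope anyone searching for gdb examples finds the full output I've shared to be useful, as well as the various caveats I discussed along the way. Maybe I'll post some more gdb sessions when I get a chance, especially for other runtimes like Java.

It's q to quit gdb.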
