- This article was originally written from a 32-bit PowerPC architecture perspective and register information will vary across architectures. 64-bit Power offset information will be added at a later date.
- 1 Compile Your Binary with Debugging Symbols
- 2 Attaching GDB to a running process
- 3 Manual re-construction of a backtrace
- 4 Credits
Compile Your Binary with Debugging Symbols
Runtime debugging with GDB is difficult, though not impossible, if you don't have debugging symbols embedded in your program. (Be aware that there are times when adding -g makes a program work where it was failing before.)
Compile with the -g flag to get debugging symbols embedded in the application binary:
«user@host»:~/dir§ gcc -g test.c
Now GDB, objdump, nm, and all of the other binary investigation tools can gather extended (readable) symbol information from the binary.
A tutorial on how to locate problem points when you can't use the debugging symbols will be covered later (basically compile a version of the library with symbols and note the offsets and compare the offsets in the debug version with the non-symbol version). You have to have a spot-on copy of the source and compile with the same compiler and the same options.
Attaching GDB to a running process
Often when attempting to debug a threading problem you'll hit a case where GDB won't catch a hang if you start the program from within GDB. In such a case you'll have to attach GDB to a running program that has already hung. To do so, find the application's PID using ps -afx, then start gdb and attach to that PID:
«user@host»:~/dir§ gdb
GNU gdb Red Hat Linux (188.8.131.52-0.31rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
(gdb) attach 25912
Manual re-construction of a backtrace
As a series of cascading function calls gets progressively deeper, the stack increases in size. Before branching to another function, the currently running function saves a return address into the "link register", lr. When the function is called it gets its own stack space and it stores the "link register" into its stack-frame in a place called the "LR Save Word". This is the address to which it is supposed to branch when it is ready to return control to its calling function. Each function call does this, and this builds a call-stack as each progressively deeper function is called. A backtrace is the series of return addresses for a stack of function calls.
Sometimes a backtrace gets corrupted in GDB because GDB can't figure out how to construct it properly. This will require a manual backtrace reconstruction. We do this by manually rebuilding the stack, frame by frame. Don't worry, it isn't too bad, just time consuming.
You have to have the ABI handy for your particular architecture to know how the stack-frame is constructed.
- The x86ManualBacktrace tutorial shows how to manually construct a backtrace on x86 based upon the i386 32-bit ABI
- This tutorial is based upon the ppc 32-bit ELF ABI.
Corrupted gdb Backtrace
Take the following corrupted backtrace as an example:
(gdb) bt
#0  0x10002b2c in __pthread_sigsuspend (set=0x4) at pt-sigsuspend.c:54
#1  0x10001c2c in __pthread_wait_for_restart_signal (self=0x4) at pthread.c:1216
#2  0x100008c8 in pthread_join (thread_id=16386, thread_return=0xffffe070) at restart.h:34
#3  0x10000428 in finish () at test.c:35
#4  0x100001ec in __do_global_dtors_aux ()
#5  0x10056c78 in _fini ()
#6  0x10007238 in __libc_csu_fini () at elf-init.c:81
#7  0x10007238 in __libc_csu_fini () at elf-init.c:81
#8  0x10007238 in __libc_csu_fini () at elf-init.c:81
#9  0x10007238 in __libc_csu_fini () at elf-init.c:81
Previous frame inner to this frame (corrupt stack?)
Notice how frames #6, #7, #8 and #9 have the same Program Counter (0x10007238)? This indicates that GDB got confused at that point. This may or may not indicate real stack corruption. To find out, we'll have to investigate the stack. Sometimes the corruption can be quite extensive.
- The following how-to will show how to manually reconstruct the back-trace.
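The repeated-PC symptom described above can also be spotted mechanically when a backtrace is long. The following is only a rough sketch: the helper name `repeated_pcs` and the regular expression are illustrative assumptions, not part of GDB, and the input is frames 5 to 9 of the backtrace above. Keep in mind that genuinely recursive code also repeats a PC, so a hit is a hint to look closer, not proof of corruption.

```python
import re

# Frames 5-9 of the corrupted backtrace shown above, one frame per line.
BT = """\
#5  0x10056c78 in _fini ()
#6  0x10007238 in __libc_csu_fini () at elf-init.c:81
#7  0x10007238 in __libc_csu_fini () at elf-init.c:81
#8  0x10007238 in __libc_csu_fini () at elf-init.c:81
#9  0x10007238 in __libc_csu_fini () at elf-init.c:81
"""

# Matches "#<frame number>  0x<pc> in <function>" at the start of a line.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in (\S+)", re.MULTILINE)


def repeated_pcs(bt_text):
    """Return {pc: [frame numbers]} for every PC seen in more than one frame."""
    seen = {}
    for num, pc, _func in FRAME_RE.findall(bt_text):
        seen.setdefault(int(pc, 16), []).append(int(num))
    return {pc: nums for pc, nums in seen.items() if len(nums) > 1}


suspects = repeated_pcs(BT)
for pc, nums in suspects.items():
    print(f"pc {pc:#010x} repeats in frames {nums}")
```

For the sample above this reports that 0x10007238 repeats in frames 6 through 9.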
Register relevance to backtraces
A good place to start investigation is with the current state of the registers:
(gdb) info reg
r0             0xb2             178
r1             0xffffdeb0       4294958768
r2             0x100c5400       269243392
r3             0x4              4
r4             0x8              8
r5             0xffffdef0       4294958832
r6             0x8              8
r7             0x10080000       268959744
r8             0xffffffc0       4294967232
r9             0x0              0
r10            0x0              0
r11            0x1f             31
r12            0x44000428       1140851752
r13            0x10082814       268970004
r14            0x0              0
r15            0x0              0
r16            0x0              0
r17            0x0              0
r18            0x0              0
r19            0x0              0
r20            0x10080000       268959744
r21            0x1000032c       268436268
r22            0x10007254       268464724
r23            0x100071d4       268464596
r24            0x0              0
r25            0x1              1
r26            0xffffe334       4294959924
r27            0xffffe070       4294959216
r28            0x0              0
r29            0x4002           16386
r30            0x10080000       268959744
r31            0xffffdef0       4294958832
pc             0x10002b2c       268446508
cr             0x34000428       872416296
lr             0x10001c2c       268442668
ctr            0x100071d4       268464596
xer            0x20000000       536870912
On Power hardware there are three really important registers that'll help us rebuild the stack and capture our back-trace. These are "general purpose register 1", i.e. r1, the "program counter register", i.e. pc, and the "link register", i.e. lr. The "count register", i.e. ctr, can be useful as well, because it is often used for branching to function pointers.
- The "program counter" register pc is denoted by the ppc32 elf abi as the register that holds the pointer to the current instruction (or instruction that a hang or crash waits on).
- The ppc32 elf abi denotes the "Link Register", lr, as a register that is volatile across each function call. A function will update the lr with the address that a yet-to-be-called function should return to when it is done. There is no stricture on when a function, preparing to branch, should fill in the lr. The lr isn't the most reliable indicator because the program could have crashed after it set the lr but before it branched to the next function, or it could have crashed before it set the lr, prior to a branch. The abi indicates that a called function must save the lr into the stack frame's LR save location immediately after it is invoked so that it knows which function it is supposed to branch back to when it returns.
- Per the ppc32 elf abi, "general purpose register 1", 'r1', always holds the current stack-frame pointer and it is always valid. The contents of the stack-frame pointer (first word) is always the BC (Back Chain) pointer to the previously allocated Stack-Frame. In ppc32 the second word at the stack-frame pointer address (stack-frame pointer + 0x4) is always the LR Save area. It is where the address found in the Link Register when the function is entered is required to be stored.
We know that the address of the current instruction is held in the pc, 0x10002b2c, so we can keep that in mind as the first entry when we rebuild our backtrace.
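To make the frame layout described above concrete, here is a minimal sketch that decodes the first two words of a ppc32 stack frame from raw big-endian bytes. The function name `frame_header` is a hypothetical helper, and the eight bytes are the two words found at r1 (0xffffdeb0) in the memory dump shown later in this article.

```python
import struct

# The first 8 bytes at r1 (0xffffdeb0) as they sit in big-endian ppc32 memory.
# Values are copied from the memory dump shown later in this article.
raw = bytes.fromhex("ffffdf50" "10001c1c")


def frame_header(raw8):
    """Decode (back_chain, lr_save) from the first two words of a ppc32 frame.

    Offset 0x0 holds the back chain pointer and offset 0x4 holds the
    LR Save Word; both are 32-bit big-endian values.
    """
    back_chain, lr_save = struct.unpack(">II", raw8[:8])
    return back_chain, lr_save


back_chain, lr_save = frame_header(raw)
print(f"back chain   = {back_chain:#010x}")
print(f"LR save word = {lr_save:#010x}")
```

The 64-bit Power ABIs use different offsets, so the `>II` layout here applies to ppc32 only.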
Examining the Stack-Frame in memory
Next, we'll begin to reconstruct the backtrace by locating all of the stack-frame pointers. Let's take a look at the memory comprising the stack by investigating the first stack-frame pointer, pointed to by r1. I cheat and exclude parts of the stack that are irrelevant.
(gdb) x /200w 0xffffdeb0
0xffffdeb0: 0xffffdf50 0x10001c1c 0x00000000 0x00000000
0xffffdec0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffded0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdee0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdef0: 0xffffdf00 0x00000000 0x100e0000 0x00000000
0xffffdf00: 0xffffdf20 0x00000000 0x00000000 0x00000000
0xffffdf10: 0x00000000 0x00000000 0x000003e0 0x1007d29c
0xffffdf20: 0xffffdf40 0x00000000 0x00000000 0x00000000
0xffffdf30: 0xffffdf50 0x00000000 0x00000000 0xffffdf50
0xffffdf40: 0xffffdf50 0x00004002 0x100c1ba0 0x1007dc64
0xffffdf50: 0xffffe030 0x100008c8 0x100be000 0xffffdf70
0xffffdf60: 0xffffe000 0x10001c1c 0x00000000 0x00000000
0xffffdf70: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdf80: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdf90: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdfa0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffdfb0: 0x00000003 0x00000004 0x100bec40 0x100bec28
0xffffdfc0: 0xffffdfd0 0x00000001 0x1007db44 0x100be000
0xffffdfd0: 0xffffe000 0x10004dd0 0x00003362 0x44000422
0xffffdfe0: 0x00000000 0x00003362 0x00000000 0x80000000
0xffffdff0: 0x100be000 0x00000000 0x100be000 0x00000094
0xffffe000: 0x1007dc64 0x1000045c 0x00000000 0x100071d4
0xffffe010: 0x100be000 0x00000002 0x00000000 0x00000000
0xffffe020: 0x00000000 0x00000000 0x10080000 0xffffe030
0xffffe030: 0xffffe060 0x10000428 0x00000000 0x1007d29c
0xffffe040: 0xffffe070 0x10012e14 0x00000000 0x00000000
0xffffe050: 0x00000023 0x00000000 0x00000000 0x10080000
0xffffe060: 0xffffe080 0x100001ec 0xffffe304 0x00000000
0xffffe070: 0x00000000 0xffffe418 0x00000000 0xffffffff
0xffffe080: 0xffffe0a0 0x10056c78 0x00000000 0x1007d094
0xffffe090: 0xffffe0a0 0x100066d8 0xffffe304 0x00000000
0xffffe0a0: 0xffffe0c0 0x10007238 0x00000000 0xffffe0b0
0xffffe0b0: 0x00000000 0x00000000 0x00000000 0x10080000
0xffffe0c0: 0xffffe0e0 0x100080c8 0x00000000 0xffffe1b2
0xffffe0d0: 0xffffe0e0 0xffffe418 0x00000000 0x10080000
0xffffe0e0: 0xffffe2f0 0x10006d80 0x00000000 0x00000000
0xffffe0f0: 0x00000000 0x00000000 0x00000000 0x00000000
...
...
0xffffe250: 0x00000000 0x00000000 0x00000000 0x00000000
- In the output above, 0xffffdeb0, taken from r1, is the current (as of the hang or crash) stack-frame pointer, which coincides with the instruction in the pc register. The value stored at the stack-frame pointer is the back-chain pointer, i.e. the address of the previous stack frame. So 0xffffdf50 is the address of the previous stack frame.
- As mentioned earlier, the second word of a stack frame is the LR Save Word. So the word at address 0xffffdeb4 (0xffffdeb0 + 0x4), which holds 0x10001c1c, is the LR Save Word for the current stack frame. It is a saved instruction address to which a function branches with blr (branch to link register) when it returns to its calling function. We use this address to determine which function the current function was called from.
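When the dump is long, it may be easier to load the x /w output into a lookup table than to read it by eye. The sketch below assumes the one-row-per-line layout that GDB prints (an address, a colon, then 32-bit words); `parse_words` is a hypothetical helper name and the three rows are copied from the dump above.

```python
# Three rows copied from the x /200w output above.
DUMP = """\
0xffffdeb0: 0xffffdf50 0x10001c1c 0x00000000 0x00000000
0xffffdf50: 0xffffe030 0x100008c8 0x100be000 0xffffdf70
0xffffe030: 0xffffe060 0x10000428 0x00000000 0x1007d29c
"""


def parse_words(dump_text):
    """Map each 32-bit word address to its value from x /Nw style output."""
    mem = {}
    for line in dump_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue  # no colon: a prompt line or a "..." elision
        try:
            base = int(head.split()[0], 16)  # tolerates "0xaddr <symbol+off>"
            words = [int(w, 16) for w in rest.split()]
        except (ValueError, IndexError):
            continue  # anything else that is not a data row
        for i, word in enumerate(words):
            mem[base + 4 * i] = word
    return mem


mem = parse_words(DUMP)
sp = 0xffffdeb0
print(f"back chain   at {sp:#010x} = {mem[sp]:#010x}")
print(f"LR save word at {sp + 4:#010x} = {mem[sp + 4]:#010x}")
```

Keying the table by word address means the LR Save Word is simply the entry at the frame pointer plus 4.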
Gathering the saved Program Counters
We can compose the following stack-frame table by following the backchain pointer and recording the LR Save Word for each stack-frame, e.g.
- NOTE: When the backchain pointer for a stack-frame is 0x00000000 we know we've reached the start of the program.
stack frame ptr   backchain ptr   LR save word
0xffffdeb0:       0xffffdf50      0x10001c1c
0xffffdf50:       0xffffe030      0x100008c8
0xffffe030:       0xffffe060      0x10000428
0xffffe060:       0xffffe080      0x100001ec
0xffffe080:       0xffffe0a0      0x10056c78
0xffffe0a0:       0xffffe0c0      0x10007238
0xffffe0c0:       0xffffe0e0      0x100080c8
0xffffe0e0:       0xffffe2f0      0x10006d80
0xffffe250:       0x00000000      0x00000000
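Following the back chain by hand is mechanical, so it can be sketched in a few lines. This is an illustration under stated assumptions, not a replacement for GDB's unwinder: `walk_back_chain` is a hypothetical helper, and `MEM` holds only the back chain and LR Save Word of each frame, copied from the memory dump above. The walk stops at a zero back chain, at an address missing from the dump (here the elided region), or when a back chain fails to point to a higher address, which is the same condition behind GDB's "Previous frame inner to this frame" message.

```python
# address -> word, copied from the memory dump above (frame headers only).
MEM = {
    0xffffdeb0: 0xffffdf50, 0xffffdeb4: 0x10001c1c,
    0xffffdf50: 0xffffe030, 0xffffdf54: 0x100008c8,
    0xffffe030: 0xffffe060, 0xffffe034: 0x10000428,
    0xffffe060: 0xffffe080, 0xffffe064: 0x100001ec,
    0xffffe080: 0xffffe0a0, 0xffffe084: 0x10056c78,
    0xffffe0a0: 0xffffe0c0, 0xffffe0a4: 0x10007238,
    0xffffe0c0: 0xffffe0e0, 0xffffe0c4: 0x100080c8,
    0xffffe0e0: 0xffffe2f0, 0xffffe0e4: 0x10006d80,
}


def walk_back_chain(mem, sp, max_frames=64):
    """Return [(frame_ptr, back_chain, lr_save), ...] starting at sp."""
    frames = []
    while sp in mem and sp + 4 in mem and len(frames) < max_frames:
        back_chain = mem[sp]
        frames.append((sp, back_chain, mem[sp + 4]))
        if back_chain != 0 and back_chain <= sp:
            break  # the stack grows downward; an older frame must sit higher
        sp = back_chain  # a zero back chain ends the loop on the next test
    return frames


frames = walk_back_chain(MEM, 0xffffdeb0)
for frame_ptr, back_chain, lr_save in frames:
    print(f"{frame_ptr:#010x}: {back_chain:#010x} {lr_save:#010x}")
```

With the data above the walk yields eight frames, ending at 0xffffe0e0 because its back chain points into the part of the dump that was elided.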
Use objdump to attach symbols to addresses
The next thing to do is to use another terminal to objdump the disassembly and symbol information from the binary so that we can see which functions the instruction pointers stored in the LR Save Words reside in.
«user@host»:~/dir§ objdump -tD a.out > a.dis
For the backtrace we're really only interested in the program counters (the values in LR save word) so we construct a backtrace table using the value in the pc as the first address:
#0 0x10002b2c
#1 0x10001c1c
#2 0x100008c8
#3 0x10000428
#4 0x100001ec
#5 0x10056c78
#6 0x10007238
#7 0x100080c8
#8 0x10006d80
Investigate objdump disassembly for program counters
Now, start looking up the addresses in the disassembly file. Remember, if you don't see symbol names you either didn't build with the -g option or you didn't ask objdump for the symbol information.
So looking for address 0x10002b2c gives the following, such that we know that 0x10002b2c resides in function __pthread_sigsuspend:
10002b20 <__pthread_sigsuspend>:
10002b20:  38 00 00 b2  li r0,178
10002b24:  38 80 00 08  li r4,8
10002b28:  44 00 00 02  sc
10002b2c:  7c 00 00 26  mfcr r0
10002b30:  4e 80 00 20  blr
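Searching the disassembly file by hand amounts to finding the closest function start at or below each address. A minimal sketch of that lookup follows, assuming you have a list of function start addresses. The four entries here are the only ones that appear in this article's objdump listings (two function labels and two bl targets), and `symbolize` is a hypothetical helper name.

```python
import bisect

# (start address, name) pairs taken from the objdump listings in this article.
SYMBOLS = sorted([
    (0x10001be4, "__pthread_wait_for_restart_signal"),
    (0x10002b20, "__pthread_sigsuspend"),
    (0x10007cf0, "__sigprocmask"),
    (0x10007f24, "sigdelset"),
])
STARTS = [start for start, _name in SYMBOLS]


def symbolize(addr):
    """Return 'function+0xoffset' for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # addr lies below every known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:#x}"


for pc in (0x10002b2c, 0x10001c1c):
    print(f"{pc:#010x} {symbolize(pc)}")
```

Because the table carries no symbol sizes, an address that lies past the end of a function is still attributed to it, so treat large offsets with suspicion and confirm against the disassembly.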
We'll do one more example since the first instruction address is usually a bit different than the rest because it is usually the instruction that caused the crash, and not a function return address like the remainder of the address pointers will be. Look at the next instruction in the list, 0x10001c1c.
10001be4 <__pthread_wait_for_restart_signal>:
10001be4:  94 21 ff 60  stwu r1,-160(r1)
10001be8:  7c 08 02 a6  mflr r0
10001bec:  93 e1 00 9c  stw r31,156(r1)
10001bf0:  3b e1 00 10  addi r31,r1,16
10001bf4:  93 c1 00 98  stw r30,152(r1)
10001bf8:  38 80 00 00  li r4,0
10001bfc:  7f e5 fb 78  mr r5,r31
10001c00:  38 60 00 02  li r3,2
10001c04:  3f c0 10 08  lis r30,4104
10001c08:  90 01 00 a4  stw r0,164(r1)
10001c0c:  48 00 60 e5  bl 10007cf0 <__sigprocmask>
10001c10:  80 9e a8 24  lwz r4,-22492(r30)
10001c14:  7f e3 fb 78  mr r3,r31
10001c18:  48 00 63 0d  bl 10007f24 <sigdelset>
10001c1c:  38 00 00 00  li r0,0
10001c20:  90 02 8c 24  stw r0,-29660(r2)
10001c24:  7f e3 fb 78  mr r3,r31
10001c28:  48 00 0e f9  bl 10002b20 <__pthread_sigsuspend>
10001c2c:  81 3e a8 24  lwz r9,-22492(r30)
10001c30:  80 02 8c 24  lwz r0,-29660(r2)
10001c34:  7f 80 48 00  cmpw cr7,r0,r9
10001c38:  40 9e ff ec  bne+ cr7,10001c24 <__pthread_wait_for_restart_signal+0x40>
10001c3c:  7c 00 04 ac  sync
10001c40:  80 01 00 a4  lwz r0,164(r1)
10001c44:  83 c1 00 98  lwz r30,152(r1)
10001c48:  83 e1 00 9c  lwz r31,156(r1)
10001c4c:  7c 08 03 a6  mtlr r0
10001c50:  38 21 00 a0  addi r1,r1,160
10001c54:  4e 80 00 20  blr
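This one is interesting because the address does fall inside __pthread_wait_for_restart_signal, but it is the instruction after the call to sigdelset (at 0x10001c18), not the one after the call to __pthread_sigsuspend (at 0x10001c28), which is the call we just made. The most likely reason is that __pthread_sigsuspend is a leaf function: its disassembly contains no stwu or mflr, so it never allocates a stack frame or stores the lr. The LR Save Word at 0xffffdeb4 therefore still holds the value written by the last callee that did save it, which was sigdelset. The live lr register, 0x10001c2c, holds the true return address, and it matches frame #1 in GDB's own backtrace. Either way the address resolves to the same function, so the symbolic backtrace is unaffected. Continue to trace each instruction address in our backtrace and rebuild it until you get the following: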
Apply symbols to rebuilt backtrace
#0 0x10002b2c in __pthread_sigsuspend
#1 0x10001c1c in __pthread_wait_for_restart_signal
#2 0x100008c8 in pthread_join
#3 0x10000428 in finish
#4 0x100001ec in __do_global_dtors_aux
#5 0x10056c78 in _fini
#6 0x10007238 in __libc_csu_fini
#7 0x100080c8 in exit
#8 0x10006d80 in __libc_start_main
Congratulations, you've successfully reconstructed a backtrace.
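As an optional sanity check on any rebuilt entry, a genuine return address should be preceded by a call instruction. The sketch below decodes the PowerPC bl (branch and link) encoding, primary opcode 18 with the LK bit set, and reports which function the instruction before a candidate return address calls. The names `decode_bl` and `call_target_before` are hypothetical, the instruction words are copied from the objdump listings above, and indirect calls (bctrl, blrl) are deliberately not handled.

```python
# address -> 32-bit instruction word, copied from the objdump listings above.
WORDS = {
    0x10001c18: 0x4800630d,  # bl 10007f24 <sigdelset>
    0x10001c28: 0x48000ef9,  # bl 10002b20 <__pthread_sigsuspend>
    0x10002b28: 0x44000002,  # sc (not a call)
}


def decode_bl(addr, word):
    """Return the branch target if word at addr is a bl/bla, else None."""
    opcode = word >> 26
    absolute = (word >> 1) & 1   # AA bit
    link = word & 1              # LK bit
    if opcode != 18 or not link:
        return None
    offset = word & 0x03FFFFFC   # LI field, already shifted left by 2
    if offset & 0x02000000:      # sign-extend the 26-bit displacement
        offset -= 0x04000000
    target = offset if absolute else addr + offset
    return target & 0xFFFFFFFF


def call_target_before(ret_addr, words):
    """Return the target of the bl located just before ret_addr, if any."""
    prev = ret_addr - 4
    if prev not in words:
        return None
    return decode_bl(prev, words[prev])


for ret in (0x10001c1c, 0x10001c2c):
    target = call_target_before(ret, WORDS)
    print(f"{ret:#010x} follows a call to {target:#010x}")
```

Run on this article's data, the LR Save Word value 0x10001c1c follows the call to sigdelset at 0x10007f24, while the value in the lr register, 0x10001c2c, follows the call to __pthread_sigsuspend at 0x10002b20.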
- The original content of this tutorial was provided by Ryan S. Arnold, aka RandomTask, from his engineering journal.