Debugging PowerPC ELF Binaries

From Devpit

Authors

This page was originally created by Ryan S. Arnold aka RandomTask.

ELF Sections

The .text

The .text section on powerpc32

The .glink section on powerpc32

The .glink is an implementation detail of the secure-PLT (Procedure Linkage Table) ABI.

  • The .glink section is the executable part of the PLT and must reside in the code segment (LOAD ... r-x).
  • The .plt section is the companion non-executable part of the PLT and must reside in the data segment (LOAD ... rw-).

The .glink section has two purposes:

  1. It serves as an executable trampoline for branching to a symbol whose absolute address is dynamically resolved and stored into a non-executable .plt entry.
  2. It detects whether a dynamically resolved symbol has yet to be resolved and, if so, invokes the loader's _dl_runtime_resolve function to resolve it.

Therefore the .glink holds two kinds of code:

  1. PLT call code stubs: calls to dynamically resolved functions in shared objects branch to these call stubs. The stubs are generated by the linker, which knows the offset (from the .got) to the .plt entry that will eventually hold the absolute address of the dynamically resolved symbol. Each call code stub branches to the address held in the .plt entry for the associated function.
  2. The PLT symbol resolver stub: .plt entries for functions which have not yet been dynamically resolved by the loader are set up by default to fall into this stub (via a series of nop instructions). This stub then calls the loader's _dl_runtime_resolve function, which populates the unresolved .plt entry with the absolute address of the dynamically resolved symbol. Future calls through the PLT call code stub then branch to the resolved absolute symbol address held by the associated .plt entry for the function.

Description of process given an application that references a dynamically resolved external function, namely function2:

  • Prior to executable generation via linking, function branches in a .S (GNU Assembler) file will look like the following:
bl function2@plt
  • The linker knows when it generates the executable that function2 is undefined and that this should actually be a branch to the companion .glink stub for function2, not a direct branch to the .plt entry for that function.
  • The linker generates the PLT call code stub and places it at the tail end of the .text section. Since the .glink section has no symbol of its own, references to function2 appear in objdump output as branches to a .glink stub labeled call___do_global_ctors_aux+offset; objdump has simply picked the closest preceding symbol as a label.
10001960 <call___do_global_ctors_aux+0x30>
  • The PLT call code stub for function2 at address 0x10001960 looks like the following:

10001960:       3d 60 10 01     lis     r11,4097
10001964:       81 6b 1b 78     lwz     r11,7032(r11)
10001968:       7d 69 03 a6     mtctr   r11
1000196c:       4e 80 04 20     bctr
  • The associated .plt entry at 0x10011b78 has been initialized by the linker as the following:

10011b78 <function2@plt>:
10011b78:       10 00 19 84

The PLT call code stub for function2 loads the contents of memory at 0x10011b78 (the .plt entry for function2) into gpr11; initially that value is 0x10001984. The referenced .plt entry for function2 will ultimately hold the absolute address of function2 after _dl_runtime_resolve has resolved the symbol in the loaded library.

By default, the linker sets the address in the .plt entry for function2 as the prologue address for the PLT runtime resolver. This address 0x10001984 is where the PLT call code stub for function2 branches to, e.g.

10001980:       60 00 00 00     nop
10001984:       60 00 00 00     nop
10001988:       60 00 00 00     nop
1000198c:       60 00 00 00     nop
10001990:       3d 80 10 01     lis     r12,4097
10001994:       3d 6b f0 00     addis   r11,r11,-4096
10001998:       80 0c 1b 6c     lwz     r0,7020(r12)
1000199c:       39 6b e6 80     addi    r11,r11,-6528
100019a0:       7c 09 03 a6     mtctr   r0
100019a4:       7c 0b 5a 14     add     r0,r11,r11
100019a8:       81 8c 1b 70     lwz     r12,7024(r12)
100019ac:       7d 60 5a 14     add     r11,r0,r11
100019b0:       4e 80 04 20     bctr

Here's an example of an entire .glink section made up of PLT call code stubs and the PLT symbol resolver.

  • The .glink section starts at 0x10001950, immediately after the zero padding word at 0x1000194c.
  • The PLT call code stubs occupy 0x10001950 through 0x1000197c, one four-instruction stub per function.
  • The PLT symbol resolver stub, including the preceding nop fall-through code, occupies 0x10001980 through 0x100019b0.

10001930 <call___do_global_ctors_aux>:
10001930:       94 21 ff f0     stwu    r1,-16(r1)
10001934:       7c 08 02 a6     mflr    r0
10001938:       90 01 00 14     stw     r0,20(r1)
1000193c:       80 01 00 14     lwz     r0,20(r1)
10001940:       38 21 00 10     addi    r1,r1,16
10001944:       7c 08 03 a6     mtlr    r0
10001948:       4e 80 00 20     blr
1000194c:       00 00 00 00     .long 0x0
10001950:       3d 60 10 01     lis     r11,4097
10001954:       81 6b 1b 74     lwz     r11,7028(r11)
10001958:       7d 69 03 a6     mtctr   r11
1000195c:       4e 80 04 20     bctr
10001960:       3d 60 10 01     lis     r11,4097
10001964:       81 6b 1b 78     lwz     r11,7032(r11)
10001968:       7d 69 03 a6     mtctr   r11
1000196c:       4e 80 04 20     bctr
10001970:       3d 60 10 01     lis     r11,4097
10001974:       81 6b 1b 7c     lwz     r11,7036(r11)
10001978:       7d 69 03 a6     mtctr   r11
1000197c:       4e 80 04 20     bctr
10001980:       60 00 00 00     nop
10001984:       60 00 00 00     nop
10001988:       60 00 00 00     nop
1000198c:       60 00 00 00     nop
10001990:       3d 80 10 01     lis     r12,4097
10001994:       3d 6b f0 00     addis   r11,r11,-4096
10001998:       80 0c 1b 6c     lwz     r0,7020(r12)
1000199c:       39 6b e6 80     addi    r11,r11,-6528
100019a0:       7c 09 03 a6     mtctr   r0
100019a4:       7c 0b 5a 14     add     r0,r11,r11
100019a8:       81 8c 1b 70     lwz     r12,7024(r12)
100019ac:       7d 60 5a 14     add     r11,r0,r11
100019b0:       4e 80 04 20     bctr
100019b4:       60 00 00 00     nop
100019b8:       60 00 00 00     nop
100019bc:       60 00 00 00     nop
100019c0:       60 00 00 00     nop
100019c4:       60 00 00 00     nop
100019c8:       60 00 00 00     nop
100019cc:       60 00 00 00     nop

The .rodata

The .rodata section on powerpc32

.section          .rodata.cst16,"aM",@progbits,32
.LC1:   /* 9223372036854775808.0DL */
        .long   0x2207c000
        .long   0x00000003
        .long   0xa4cfa07a
        .long   0x2c7f600a
.LC2:   /* 18446744073709551616.0DL */
        .long   0x2207c000
        .long   0x0000000c
        .long   0xa99e40ed
        .long   0xc5ba58e0
The section flags ("aM") and the entsize (the 32 that follows @progbits) identify this section as allocatable (a), read-only (no w), non-string (no S), and mergeable (M). The entsize is given as 32 here, covering the two 16-byte constants. The section name (.rodata.suffix) can be anything, and the linker (ld -r) uses it to merge like-named sections. Sections with different flags or entsize should have different section names.

The .rodata section on powerpc64

The .rodata section exists on powerpc64, but a performance boost can be attained by storing static constant data in the .toc section instead.
Space in the .toc section is limited, so be selective about the data placed there. Something like an array of constant data should be held in the .rodata section instead.
.section        ".toc","aw"
.LC1:   /* 9223372036854775808.0DD */
        .tc  FT_2207c000_3_a4cfa07a_2c7f600a[TC],0x2207c00000000003,0xa4cfa07a2c7f600a
.LC2:   /* 18446744073709551616.0DD */
        .tc FT_2207c000_c_a99e40ed_c5ba58e0[TC],0x2207c0000000000c,0xa99e40edc5ba58e0

The .got, the .toc, and the .plt

Disclaimer: References to the .got are powerpc32 centric. On powerpc64 the symbol would be referenced directly from the .toc. Position Independent Code (PIC) and secure-plt usage are assumed.

The .got section on powerpc32

The powerpc32 .got is a Global Offset Table of absolute addresses of symbols. This offset table is required because position independent code (PIC) cannot contain absolute addresses. The .got is used for two things: it holds the absolute addresses of static or global variables, and it holds the absolute addresses of functions accessed via function pointers, as in the following example:

int (*ptrfunc)(int) = 0;
ptrfunc = &function1;
ret = ptrfunc(10);

This means that the absolute address of function1 is resolved by the dynamic linker-loader ld.so at program load-time. You pay the up-front cost of the dynamic symbol resolution at program load-time.

Getting the absolute address of the _GLOBAL_OFFSET_TABLE_ symbol on powerpc32

Let's examine the GNU Assembler sequence that you'll see in every 32-bit PowerPC function that branches to a dynamically resolved symbol. A dynamically resolved symbol's address is accessed via an offset into the .plt from the _GLOBAL_OFFSET_TABLE_ symbol address. In order to access PLT or GOT referenced data, the address of the _GLOBAL_OFFSET_TABLE_ symbol needs to be computed at least once per function since (per the ABI) it is not preserved in a gpr across function calls.

Note:
  • The program code is in the .text section which is in the read-execute code segment.
  • The .got is in the read-write data segment.
  • The _GLOBAL_OFFSET_TABLE_ is defined by the ABI as being offset 0x8000 from the start of the .got section such that positive and negative addressing off of the _GLOBAL_OFFSET_TABLE_ symbol can cover the entire 64K address space in which the GOT can reside.

Example code before static-linking resolves relocations:

       bcl 20,31,.LCF1
.LCF1:
       mflr 30
       addis 30,30,_GLOBAL_OFFSET_TABLE_-.LCF1@ha
       addi 30,30,_GLOBAL_OFFSET_TABLE_-.LCF1@l
  • Due to the position-independent code addressing model, the static-linker cannot assign an absolute address to the .LCF1 symbol at link time, so it must be calculated at runtime using the bcl and mflr instructions (as described below).
  • At link time the static-linker does know the fixed distance between the .LCF1 symbol and the _GLOBAL_OFFSET_TABLE_ symbol, and it computes that distance by subtracting the address of .LCF1 from the address of _GLOBAL_OFFSET_TABLE_, i.e. _GLOBAL_OFFSET_TABLE_-.LCF1.
  • This fixed offset will be used at runtime, in combination with the runtime resolved address of the .LCF1 symbol to attain the absolute address of the _GLOBAL_OFFSET_TABLE_ symbol.

Here's the generated object-code after the static-linker has run:

1000159c:       42 9f 00 05     bcl-    20,4*cr7+so,100015a0 <main+0x1c>
100015a0:       7f c8 02 a6     mflr    r30
100015a4:       3f de 00 01     addis   r30,r30,1
100015a8:       3b de 05 c8     addi    r30,r30,1480
  • bcl is used to obtain the address of the next instruction. This is stored into the link register per the 'l' on the 'bc' mnemonic.
  • mflr is used to load the value in the link register (The address 0x100015a0) into gpr30.
  • The fixed offset between .LCF1 and _GLOBAL_OFFSET_TABLE_ was calculated by the linker to be 0x000105c8 (the addis immediate 1 supplies 0x00010000 and the addi immediate 1480 is 0x05c8).
  • The addis and addi combo will effectively add the computed offset 0x000105c8 to the .LCF1 symbol address 0x100015a0.
  • addis is used to add the high half-word of the computed offset to the value in gpr30.
  • addi is used to add the low half-word of the computed offset to the address in gpr30.
  • The absolute address of the _GLOBAL_OFFSET_TABLE_ in gpr30 is 0x10011B68.

The .toc section on powerpc64

The .toc section only exists on powerpc64. It stands for Table of Contents. It holds both data and addresses. The powerpc64 ELF ABI defines general purpose register 2 to always hold a pointer to the .toc section.

The .plt section

On both powerpc32 and powerpc64 the .plt section of an executable is the Procedure Linkage Table. It is used to store the absolute addresses of late-bound functions invoked by symbol name. For instance, the following invocation of function1() would require late binding and would be invoked through a symbol offset in the .plt section:
ret = function1(10);

Addressability

Addressability on PowerPC32

The PowerPC 32-bit architecture cannot load an entire 32 bit address in one instruction. As a result you'll generally see two methods for getting an address into a register, lis/lwz or addis/addi. Both of these methods will load the high 16 bits of the address first and then the low 16 bits.

The lis/lwz instruction pair loads the contents pointed to by the address 0x10011b44 into gpr11.

 10001920:       3d 60 10 01     lis     r11,4097
 10001924:       81 6b 1b 44     lwz     r11,6980(r11)
  • The lis instruction stands for load immediate shifted and it stores 0x10010000 in gpr11, i.e. 0x1001 shifted to the high bits of gpr11.
  • The lwz instruction stands for load word and zero. It takes the address in gpr11 (0x10010000), adds the offset 0x1b44 to it, and then loads the contents at the resultant address (0x10011b44) into gpr11.

The addis/addi instruction pair adds 0x000105c8 to the address already in gpr30 using the following method:

 100015a4:       3f de 00 01     addis   r30,r30,1
 100015a8:       3b de 05 c8     addi    r30,r30,1480
  • The addis instruction stands for add immediate shifted; it adds 0x0001 to the high-order 16 bits of the address already held in gpr30 (that is, it adds 0x00010000).
  • The addi instruction stands for add immediate; it adds the sign-extended 16-bit value 0x05c8 to the address held in gpr30.

Branching

  • On PowerPC, unconditional branches are done in one of the following four ways:
  1. A direct branch to an address, e.g. b 10011b44 <symbol> (used for gotos)
  2. A branch to an address, setting up the link register, e.g. bl 10011b44 <symbol>.
  3. A branch to the address in the link register, e.g. blr.
  4. A branch to an address held in the count register, e.g. bctr (used for indirection or loops)
  • You will see the compiler make use of all of the enumerated unconditional branches for its own internal use. Additionally each of these branches can be generated as the result of a particular symbol invocation by a user level program:
  1. A direct branch results from a simple goto, e.g. goto mylabel;.
  2. A branch to an address is a result of invoking a statically or dynamically resolved function, e.g. function2();.
  3. A branch to the address in the link register is the result of a function return, e.g. return somevariable;.
  4. A branch to an address in the count register is generally the result of invoking a function via a function pointer. The loader resolves these symbols at load time so you pay the resolution price up-front (i.e. _dl_runtime_resolve() is not invoked).

Branching on PowerPC32

On PowerPC32 the effect of calling a function via a function pointer is that the symbol address is resolved and loaded into the .got at application load time. It does not have a .plt reference unless it is invoked directly e.g. function2() and dynamically resolved.

User Code Branching Example

Given the following example files we'll demonstrate three different function invocation methods and the resultant bindings:

  1. Dynamically resolved, load-time bound shared-object function pointer invocation.
  2. Dynamically resolved, late bound (_dl_runtime_resolve) shared-object function invocation.
  3. Statically resolved, link-time bound local function invocation.
  • func.h:
extern int function1(int);
extern int function2(int);
  • func.c:
#include "func.h"

int function1(int val) {
        return ++val;
}

int function2(int val) {
        return --val;
}
  • test.c
#include "func.h"
int function3(int val) {
        return ++val;
}

int main()
{
        int (*ptrfunc)(int) = 0;
        int ptrret;
        int ret;  

        ptrfunc = &function1;

        ptrret = ptrfunc(10); /* Function pointer invocation of function1().  */
        ret = function2(ptrret); /* Dynamic symbol resolution.  */
        return function3(ret); /* Static symbol resolution local to test.o.  */
}

Powerpc32 example of functions invoked through the .got and .plt sections

Create the .o file which holds function1() and function2().

/opt/biarch/20060123/bin/gcc -g -m32 -msecure-plt -fpic -c func.c -o func.o

Create the shared object and symlinks.

/opt/biarch/20060123/bin/gcc -shared -Wl,-export-dynamic,-soname,libfunc.so.1 -o libfunc.so.1.0.1 func.o
ln -s libfunc.so.1.0.1 libfunc.so.1
ln -s libfunc.so.1.0.1 libfunc.so

Intermediary powerpc32 assembler code

We can ask GCC to create an intermediary assembler file for investigation which will reveal the pre-linkage assembler for our test application.

/opt/biarch/20060123/bin/gcc -g -m32 -msecure-plt -fpic -L. -lfunc test.c -S

Investigation of the .s file at this stage will reveal the three function call methods described earlier as well as some auxiliary information:

  • The address of the .got is loaded into gpr30 in the necessary roundabout manner (the bcl/mflr/addis/addi sequence at .LCF1).
  • The blr at the end of the main routine returns to the calling function at the address held in the link register.
  1. The Global Offset Table entry function1@got contains the absolute address of the symbol function1. This absolute address was resolved by the dynamic link/loader (ld.so) at program load-time. The address is loaded into gpr0 (lwz 0,function1@got(30)) and eventually moved into the ctr register. The function is finally branched to with the bctrl call. This is the method used to invoke a function via a function pointer.
  2. A bl to function2@plt is requested. Since the absolute address is not known until run time, the .s file simply uses a symbol reference to the .plt section to indicate the late binding. Later investigation of the disassembled executable will reveal that, since function2() exists in a shared-object file, the bl to function2@plt resolves to an executable stub which loads the absolute function2 address from the .plt entry that is populated by the dynamic loader via the _dl_runtime_resolve() function.
  3. A bl to function3@plt is requested. Even though the .s file contains function3@plt, at link time the linker notices that function3 exists in the same C file as main and turns the call into a bl directly to the absolute function address.

main:
.LFB3:
        .loc 1 8 0
        stwu 1,-32(1)
.LCFI3:
        mflr 0
.LCFI4:
        stw 30,24(1)
.LCFI5:
        stw 31,28(1)
.LCFI6:
        stw 0,36(1)
.LCFI7:
        mr 31,1
.LCFI8:
        bcl 20,31,.LCF1
.LCF1:
        mflr 30
        addis 30,30,_GLOBAL_OFFSET_TABLE_-.LCF1@ha
        addi 30,30,_GLOBAL_OFFSET_TABLE_-.LCF1@l
        .loc 1 9 0
        li 0,0
        stw 0,16(31)
        .loc 1 12 0
        lwz 0,function1@got(30)
        stw 0,16(31)
        .loc 1 13 0
        lwz 0,16(31)
        mtctr 0
        li 3,10
        bctrl
        mr 0,3
        stw 0,12(31)
        .loc 1 14 0
        lwz 3,12(31)
        bl function2@plt
        mr 0,3
        stw 0,8(31)
        .loc 1 15 0
        lwz 3,8(31)
        bl function3@plt
        mr 0,3
        .loc 1 16 0
        mr 3,0
        lwz 11,0(1)
        lwz 0,4(11)
        mtlr 0
        lwz 30,-8(11)
        lwz 31,-4(11)
        mr 1,11
        blr

Build and link the executable

Build and link the executable to the shared object file.

/opt/biarch/20060123/bin/gcc -g -m32 -msecure-plt -fpic -L. -lfunc test.c -o test

Objdump the full ELF information into a disassembly file to examine the .plt, .got, and plt stubs.

/opt/biarch/20060123/bin/objdump -stDx test > test.dis

Specify the LD_LIBRARY_PATH environment variable so that the dynamic linker can find libfunc.so.1 when you execute the application:

export LD_LIBRARY_PATH=$PWD

Examine the disassembled binary

We can determine what the linker has done during early binding by examining the disassembled binary.

If GCC can see the code for a function it will generally include it in the executable. If you were to directly #include "func.c" rather than "func.h" (which contains only the extern function prototypes), GCC would simply compile the function code into the executable. During the link stage the linker would then determine that it could resolve the function's @plt reference directly to an absolute address, meaning the function would not be loaded from a shared library and the call would become a bl directly to the function address.

10001580 <main>:
10001580:       94 21 ff e0     stwu    r1,-32(r1)
10001584:       7c 08 02 a6     mflr    r0
10001588:       93 c1 00 18     stw     r30,24(r1)
1000158c:       93 e1 00 1c     stw     r31,28(r1)
10001590:       90 01 00 24     stw     r0,36(r1)
10001594:       7c 3f 0b 78     mr      r31,r1
10001598:       42 9f 00 05     bcl-    20,4*cr7+so,1000159c <main+0x1c>
1000159c:       7f c8 02 a6     mflr    r30
100015a0:       3f de 00 01     addis   r30,r30,1
100015a4:       3b de 05 cc     addi    r30,r30,1484
100015a8:       38 00 00 00     li      r0,0
100015ac:       90 1f 00 10     stw     r0,16(r31)
100015b0:       80 1e ff fc     lwz     r0,-4(r30)
100015b4:       90 1f 00 10     stw     r0,16(r31)
100015b8:       80 1f 00 10     lwz     r0,16(r31)
100015bc:       7c 09 03 a6     mtctr   r0
100015c0:       38 60 00 0a     li      r3,10
100015c4:       4e 80 04 21     bctrl
100015c8:       7c 60 1b 78     mr      r0,r3
100015cc:       90 1f 00 0c     stw     r0,12(r31)
100015d0:       80 7f 00 0c     lwz     r3,12(r31)
100015d4:       48 00 03 8d     bl      10001960 <call___do_global_ctors_aux+0x30>
100015d8:       7c 60 1b 78     mr      r0,r3
100015dc:       90 1f 00 08     stw     r0,8(r31)
100015e0:       80 7f 00 08     lwz     r3,8(r31)
100015e4:       4b ff ff 69     bl      1000154c <function3>
100015e8:       7c 60 1b 78     mr      r0,r3
100015ec:       7c 03 03 78     mr      r3,r0
100015f0:       81 61 00 00     lwz     r11,0(r1)
100015f4:       80 0b 00 04     lwz     r0,4(r11)
100015f8:       7c 08 03 a6     mtlr    r0
100015fc:       83 cb ff f8     lwz     r30,-8(r11)
10001600:       83 eb ff fc     lwz     r31,-4(r11)
10001604:       7d 61 5b 78     mr      r1,r11
10001608:       4e 80 00 20     blr
  • Addresses 0x10001598 through 0x100015a4 show the post-linkage assembly code used to fetch the address of the .got into gpr30 (as discussed above).
  • Addresses 0x100015b0 through 0x100015c4 show the post-linkage assembly code that invokes function call 1. The linker simply computed the offset of function1's entry in the .got (-4 from gpr30) and emitted a load of that entry into gpr0. The .got entry for function1 was populated at load-time with the absolute address of function1.
  • The bl at 0x100015d4 shows how the linker used late binding to branch to a code stub representing function2@plt. The symbol label <call___do_global_ctors_aux+0x30> in the above asm is misleading; it is due to objdump assigning the nearest preceding symbol as a label for the address. This address is actually the .glink PLT call code stub for function2:
10001960:       3d 60 10 01     lis     r11,4097
10001964:       81 6b 1b 78     lwz     r11,7032(r11)
10001968:       7d 69 03 a6     mtctr   r11
1000196c:       4e 80 04 20     bctr

This .glink PLT call code stub for function2 loads the contents of the function2 .plt entry (at address 0x10011b78) into the ctr and branches to it.

This is the .plt section:
Contents of section .plt:
 10011b74 10001980 10001984 10001988           ............
The contents of the actual .plt entry for function2 (objdump -D disassembles this data word as if it were an instruction, so the vslw mnemonic shown is meaningless):
10011b78 <function2@plt>:
10011b78:       10 00 19 84     vslw    v0,v0,v3

The .plt entry for function2 will eventually hold the absolute address of function2 after it is loaded by the dynamic linker.

After the linker has run the .plt entry for function2 holds 0x10001984 which is the address of the .glink PLT resolver:

10001980:       60 00 00 00     nop
10001984:       60 00 00 00     nop
10001988:       60 00 00 00     nop
1000198c:       60 00 00 00     nop
10001990:       3d 80 10 01     lis     r12,4097
10001994:       3d 6b f0 00     addis   r11,r11,-4096
10001998:       80 0c 1b 6c     lwz     r0,7020(r12)
1000199c:       39 6b e6 80     addi    r11,r11,-6528
100019a0:       7c 09 03 a6     mtctr   r0
100019a4:       7c 0b 5a 14     add     r0,r11,r11
100019a8:       81 8c 1b 70     lwz     r12,7024(r12)
100019ac:       7d 60 5a 14     add     r11,r0,r11
100019b0:       4e 80 04 20     bctr

Before the dynamic link/loader has loaded the shared object which implements function2 the .plt entry for function2 actually contains a fall-through nop address for the PLT resolver which will invoke _dl_runtime_resolve().

After _dl_runtime_resolve() has run, the .plt entry for function2 will hold the absolute address of function2. All future .glink PLT call code stub invocations for function2 will load the absolute address of function2 into the count register, and the bctr will invoke function2 proper. Therefore the expense of the dynamic resolution is paid the first time the function is used. All other invocations simply incur the cost of the .glink access of the .plt entry for function2.

  • The bl at 0x100015e4 shows how the linker recognized that the function body of function3 exists in the executable and provided a direct bl to the function body rather than branching to a .glink stub and fetching the function address from the .plt. The asm for function3 is listed here for reference.
1000154c <function3>:
1000154c:       94 21 ff e0     stwu    r1,-32(r1)
10001550:       93 e1 00 18     stw     r31,24(r1)
10001554:       7c 3f 0b 78     mr      r31,r1
10001558:       90 7f 00 08     stw     r3,8(r31)
1000155c:       81 3f 00 08     lwz     r9,8(r31)
10001560:       38 09 00 01     addi    r0,r9,1
10001564:       90 1f 00 08     stw     r0,8(r31)
10001568:       80 1f 00 08     lwz     r0,8(r31)
1000156c:       7c 03 03 78     mr      r3,r0
10001570:       81 61 00 00     lwz     r11,0(r1)
10001574:       83 eb ff f8     lwz     r31,-8(r11)
10001578:       7d 61 5b 78     mr      r1,r11
1000157c:       4e 80 00 20     blr
Note: Per the warning at the beginning of this section, if you directly include a .c file rather than a .h which contains a function prototype the GCC compiler can get to the function code and will insert it into the executable, negating the benefit of linking against a shared object.
  • The blr at 0x10001608 shows how the application returns control to the calling function by branching to the address held in the link register.

How do I fix symbols showing up as check-localplt make check failures?

Reference this email exchange on the libc-alpha mailing list.

Say you're seeing the following:

--- /home/ryanarn/glibc/glibc-2.7/scripts/data/localplt-powerpc-linux-gnu.data
+++ /home/ryanarn/glibc/build/glibc32/check-localplt.out.new
@@ -4,4 +4,9 @@
 libc.so: malloc
 libc.so: memalign
 libc.so: realloc
+libm.so: cosl
+libm.so: finitel
+libm.so: logl
 libm.so: matherr
+libm.so: sinl
+libm.so: sqrtl

This means that some function is using the symbols cosl, finitel, logl, sinl, and sqrtl directly from within libm.so when they SHOULD be using the internal versions __cosl, __finitel, __logl, __sinl, and __sqrtl instead.

Using the external interface from within the library is bad for two reasons:

1. Someone could change the implementation out from underneath you and then libm may end up using an erroneous user created function where it is not appropriate.
2. The libm.so shared object ends up having to do plt branching when it could simply branch directly to the internal symbol.

To find out which function is using the symbols follow the procedure outlined below.

NOTE: The trick is that PLT slots and call-stubs are numbered and have to correspond.

Dump the relocations used by the library:

> readelf -r math/libm.so | grep R_PPC_JMP_SLOT
000b7000  00007815 R_PPC_JMP_SLOT    00000000   __assert_fail + 0
000b7004  00008f15 R_PPC_JMP_SLOT    00047ea0   cosl + 0
000b7008  00009315 R_PPC_JMP_SLOT    00000000   __errno_location + 0
000b700c  0000df15 R_PPC_JMP_SLOT    0004d570   sqrtl + 0
000b7010  0000f015 R_PPC_JMP_SLOT    00000000   fputs + 0
000b7014  0000f715 R_PPC_JMP_SLOT    00000000   strlen + 0
000b7018  00012415 R_PPC_JMP_SLOT    00000000   sprintf + 0
000b701c  00013115 R_PPC_JMP_SLOT    00010180   matherr + 0
000b7020  00015b15 R_PPC_JMP_SLOT    00000000   __cxa_finalize + 0
000b7024  00017e15 R_PPC_JMP_SLOT    00000000   strtold + 0
000b7028  00018515 R_PPC_JMP_SLOT    00000000   memset + 0
000b702c  00018615 R_PPC_JMP_SLOT    00000000   strtof + 0
000b7030  00018a15 R_PPC_JMP_SLOT    00000000   strtod + 0
000b7034  00018d15 R_PPC_JMP_SLOT    0004b0a0   sinl + 0
000b7038  00019815 R_PPC_JMP_SLOT    0004ccd0   logl + 0
000b703c  0001a615 R_PPC_JMP_SLOT    00000000   fwrite + 0
000b7040  0001ab15 R_PPC_JMP_SLOT    00054910   finitel + 0
000b7044  0001c915 R_PPC_JMP_SLOT    00000000   __gmon_start__ + 0

Next you need to find out where the call-stubs are in the libm.so .text section. There is no symbol defined to tell you where they start. Generally the linker seems to put the first call-stub at the end of the <call___do_global_ctors_aux> text for 32-bit and before the <call_gmon_start> text for 64-bit:

0005eb00 <call___do_global_ctors_aux>:
   5eb00:       94 21 ff f0     stwu    r1,-16(r1)
   5eb04:       7c 08 02 a6     mflr    r0
   5eb08:       90 01 00 14     stw     r0,20(r1)
   5eb0c:       80 01 00 14     lwz     r0,20(r1)
   5eb10:       38 21 00 10     addi    r1,r1,16
   5eb14:       7c 08 03 a6     mtlr    r0
   5eb18:       4e 80 00 20     blr
   5eb1c:       00 00 00 00     .long 0x0
   5eb20:       81 7e 00 0c     lwz     r11,12(r30)
   5eb24:       7d 69 03 a6     mtctr   r11
   5eb28:       4e 80 04 20     bctr
   5eb2c:       60 00 00 00     nop
   5eb30:       81 7e 00 10     lwz     r11,16(r30)
   5eb34:       7d 69 03 a6     mtctr   r11
   5eb38:       4e 80 04 20     bctr
   5eb3c:       60 00 00 00     nop
   5eb40:       81 7e 00 14     lwz     r11,20(r30)
   5eb44:       7d 69 03 a6     mtctr   r11
   5eb48:       4e 80 04 20     bctr
   5eb4c:       60 00 00 00     nop

The second call-stub, at 0x5eb30, corresponds with the second relocation slot, which is the one for cosl:

   5eb30:       81 7e 00 10     lwz     r11,16(r30)
   5eb34:       7d 69 03 a6     mtctr   r11
   5eb38:       4e 80 04 20     bctr
   5eb3c:       60 00 00 00     nop

So simply search the libm disassembly for branches to the address of the call-stub. One of them happens to be in the __ieee754_j0l function:

0003e370 <__ieee754_j0l>:
   3e370:       94 21 ff 40     stwu    r1,-192(r1)
   3e374:       7c 08 02 a6     mflr    r0
   3e378:       42 9f 00 05     bcl-    20,4*cr7+so,3e37c <__ieee754_j0l+0xc>
   ...
   3e8f8:       48 02 02 39     bl      5eb30 <call___do_global_ctors_aux+0x30>

This function is implemented in the following file:

glibc-2.7/sysdeps/ieee754/ldbl-128/e_j0l.c

In this file change all references to cosl to __cosl. Continue to search libm.so for all instances of 0x5eb30 and then move on to the next relocation that needs to be removed.

How to create and link to a shared object

.so naming convention and the required symlinks

Code and Data segment layout in executable file and memory

Talk about brk() and sbrk() and how the loader lays out the segments.

Differences between -fpic and -fPIC

-fpic and -fPIC explained

References

ppc64 ELF ABI version 1.9 Secure-PLT discussion