Compiler Options for Creating Odd Binaries

From Devpit

Producing unusual binary formats with GCC and LD is an annoyingly underdocumented task. Here's a list of useful options for building special binaries. Some of these can be found in the man pages, and some seem to be nearly impossible to find.

Options for GCC

  • -nostdinc: "No standard includes". Do not search the standard system include directories (usually "/usr/include"). You'll need this together with the -I option (which adds directories to the include path) if you're compiling against your own set of standard include files.
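
As a rough illustration, a module built with -nostdinc cannot pull in any system header, so even basic types and helpers have to come from your own include directory. The sketch below inlines what such a header might provide. The names k_size_t and k_strlen, the file name kstring.c, and the include/ directory are all hypothetical.

```c
/* Sketch of a module built without the standard include path, e.g.:
 *     gcc -nostdinc -Iinclude -c kstring.c
 * (include/ and kstring.c are hypothetical names).
 * In a real tree the typedef below would live in one of your own
 * headers under include/ and be pulled in with #include. */

/* Our own stand-in for size_t, since <stddef.h> is not on the search path. */
typedef unsigned long k_size_t;

/* Length of a NUL-terminated string, replacing the libc strlen(). */
k_size_t k_strlen(const char *s)
{
    k_size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```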

Options for LD

  • -nostdlib: "No standard libraries". Do not link against the standard libraries. Passed to ld, it restricts the library search to directories named explicitly on the command line; passed to the gcc driver, it also leaves out the standard startup files. Similarly, you'll need this and the -L option (to add directories to the library path) if you're linking against your own set of standard libraries.
  • -Ttext 0x100000: Use load address 0x100000. This places the text section at 0x100000. The other sections are placed immediately after the text section unless their locations are also given explicitly (for example with -Tdata and -Tbss). If linking to an ELF file, the headers give the entry point and the load address of each segment. The loader should use those, as there could be space left between the segments. If linking to a raw binary, there are no headers: the load address is still 0x100000, and all the sections are packed into one flat image that the loader must copy there as a whole.
  • --oformat binary: Output a raw binary. Input modules are still in the host system's native format (usually ELF). Intermediate object files cannot be raw binaries, because the linker needs their symbol and relocation information. (-b, also spelled --format, specifies the input format, but that is seldom useful.)
  • -static: If you use a set of standard libraries, they must be statically linked unless your loader supports dynamic linking.
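
To make the ELF case above more concrete, here is a minimal sketch of what "the loader should use the headers" might look like: it walks the program headers of an in-memory ELF32 image and copies each loadable segment to its own load address. The function name elf32_load and the idea of passing a buffer that stands in for physical memory are assumptions for illustration. The sketch also assumes the host has the same byte order as the ELF file and uses p_paddr as the load address.

```c
#include <stdint.h>
#include <string.h>

/* ELF32 file header and program header, field order per the ELF format. */
struct elf32_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};
struct elf32_phdr {
    uint32_t p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align;
};
#define ELF_PT_LOAD 1u   /* program header type of a loadable segment */

/* Copy every loadable segment of `image` into `mem`, a buffer standing in
 * for memory that starts at address `mem_base`. Stores the entry point in
 * *entry. Returns 0 on success, -1 on a malformed or out-of-range image. */
int elf32_load(const uint8_t *image, size_t image_len,
               uint8_t *mem, uint32_t mem_base, size_t mem_len,
               uint32_t *entry)
{
    struct elf32_ehdr eh;

    if (image_len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0 || eh.e_ident[4] != 1)
        return -1;                       /* not ELF, or not 32-bit class */
    if (eh.e_phentsize < sizeof(struct elf32_phdr))
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf32_phdr ph;
        size_t off = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;

        if (off > image_len || image_len - off < sizeof ph)
            return -1;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != ELF_PT_LOAD)
            continue;

        /* Check the segment against both the file and the target memory. */
        if (ph.p_filesz > ph.p_memsz || ph.p_offset > image_len ||
            image_len - ph.p_offset < ph.p_filesz || ph.p_paddr < mem_base)
            return -1;
        size_t dst = ph.p_paddr - mem_base;
        if (dst > mem_len || mem_len - dst < ph.p_memsz)
            return -1;

        /* File-backed part first, then zero the rest (typically .bss). */
        memcpy(mem + dst, image + ph.p_offset, ph.p_filesz);
        memset(mem + dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    *entry = eh.e_entry;
    return 0;
}
```

With a raw binary none of this is needed (or possible): the loader just copies the whole file to the address given to -Ttext and jumps to it.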

Code directives

The assembler can produce 16-bit code to be run in a 16-bit segment, 32-bit code to be run in a 32-bit segment, or 32-bit code to be run in a 16-bit segment (by adding override prefixes to each instruction). As far as we know, GCC always generates 32-bit assembly code for x86. The following directives can be placed in the assembly output by using an asm() statement in the C code.

  • .code16: This specifies that the code being assembled is 16-bit code to be run in a 16-bit segment.
  • .code32: This specifies that the code being assembled is 32-bit code to be run in a 32-bit segment. This is the default.
  • .code16gcc: This specifies that the code being assembled was generated by GCC and therefore is 32-bit assembly code that will run in a 16-bit segment. GAS will add the necessary operand-size and address-size override prefixes to each instruction to indicate 32-bit addresses, registers, data, etc. This is the most useful of the three, since it lets us write C code to be run in a 16-bit segment (either real mode or protected mode). In practice, this means adding asm(".code16gcc\n"); at the top of each C module. Note that you can mix functions to be run in 16-bit segments and functions to be run in 32-bit segments in the same module, but if you do, be careful about switching segments and make sure the load addresses are valid for every part of the code.
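
A minimal sketch of a module that uses this directive might look like the following. The BUILD_16BIT macro and the checksum function are hypothetical; the guard is only there so that the same file still builds as ordinary code on a development machine.

```c
/* The directive has to reach the assembler before any of the compiler's
 * output for this module, so it goes at the very top of the file.
 * BUILD_16BIT is a hypothetical macro defined on the command line for
 * the 16-bit build (e.g. gcc -m32 -DBUILD_16BIT -c checksum.c); without
 * it the file builds as ordinary host code, which is handy for testing. */
#if defined(__i386__) && defined(BUILD_16BIT)
__asm__(".code16gcc\n");
#endif

/* Ordinary C: nothing in the function body needs to know about the mode. */
unsigned checksum(const unsigned char *buf, unsigned len)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < len; i++)
        sum += buf[i];
    return sum & 0xFFu;   /* keep only the low byte */
}
```

GCC does not strictly promise to keep top-level asm statements ahead of the functions that follow them in the source; its documentation lists -fno-toplevel-reorder as the option that preserves source order, so it may be worth adding if the directive ever ends up in the wrong place.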

Compiling DOS COM Files and Boot Loaders

The DOS COM file format is just a raw binary loaded at offset 0x100 of a 16-bit real-mode segment. To build a COM file with GCC, you need to add asm(".code16gcc\n"); at the top of each C module (including every module of the standard libraries you link in) and link with the options "-static -Ttext 0x100 --oformat binary". You will probably need a fair amount of inline assembly for system calls, but it will likely still be easier than writing the whole program in assembly.

To make system calls easier, we wrote a macro that expands to an asm() that takes arguments for the values of eax, ebx, ecx, edx, esi, and edi. We wrote more macros for getting the return values out of the registers and for testing the values of the bits in the flags register after returning from a system call. This makes it trivial to write some wrapper functions for most file and terminal I/O in DOS.
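
The original macros are not reproduced here, but a rough sketch of the same idea could look like this. It uses a small inline function in place of the macro, and every name in it (dos_regs, dos_int21, DOS_FLAG_SET, BUILD_16BIT, and so on) is hypothetical. The part that actually issues INT 21h is only compiled in a 16-bit x86 build; the accessors work anywhere.

```c
#include <stdint.h>

/* Register image for one system call (hypothetical layout and names). */
struct dos_regs {
    uint32_t eax, ebx, ecx, edx, esi, edi;
    uint32_t eflags;               /* filled in after the call returns */
};

/* Bit positions in the x86 flags register. */
#define FLAG_CF (1u << 0)          /* carry: DOS sets it to report an error */
#define FLAG_ZF (1u << 6)          /* zero */

/* Helpers for reading results back out of the register image. */
#define DOS_FLAG_SET(r, f) (((r)->eflags & (f)) != 0)
#define DOS_AX(r)          ((uint16_t)((r)->eax & 0xFFFFu))
#define DOS_AL(r)          ((uint8_t)((r)->eax & 0xFFu))

#if defined(__i386__) && defined(BUILD_16BIT)
/* Load the six registers from *r, issue INT 21h, then store the registers
 * and the flags back into *r. Only meaningful in a 16-bit DOS build
 * (BUILD_16BIT is a hypothetical macro set on the command line). */
static inline void dos_int21(struct dos_regs *r)
{
    uint32_t a = r->eax, b = r->ebx, c = r->ecx,
             d = r->edx, S = r->esi, D = r->edi, fl;

    __asm__ volatile ("int $0x21\n\t"
                      "pushfl\n\t"      /* copy the flags to memory */
                      "popl %6"
                      : "+a"(a), "+b"(b), "+c"(c), "+d"(d),
                        "+S"(S), "+D"(D), "=m"(fl)
                      :
                      : "memory", "cc");

    r->eax = a; r->ebx = b; r->ecx = c;
    r->edx = d; r->esi = S; r->edi = D;
    r->eflags = fl;
}
#endif
```

A wrapper for, say, opening a file would fill in the registers, call dos_int21(), and then check DOS_FLAG_SET(&r, FLAG_CF): if carry is set, DOS_AX(&r) holds the error code, otherwise it holds the file handle.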

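Compiling a boot loader would work the same way, except that the load address is 0x7C00 instead of 0x100 and the size of your raw binary is tightly limited: 446 bytes of code for a master boot record (the partition table and signature take up the rest of the sector) and 510 bytes for an ordinary boot sector (512 bytes minus the two-byte 0x55 0xAA signature at the end). Good luck fitting a useful C program into 446 bytes, though.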
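
Since the linker will not enforce these limits for you, a small host-side helper can check the size and finish the sector. The following is a minimal sketch under a few assumptions: the function name make_boot_sector and the constant names are hypothetical, and the sector is built from scratch, so for a master boot record a real tool would have to preserve or fill in the partition table at bytes 446 to 509 instead of zeroing it. The sector ends with the boot signature bytes 0x55 0xAA at offsets 510 and 511.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512u
#define MBR_CODE_MAX  446u   /* code area before the partition table */
#define BOOT_CODE_MAX 510u   /* everything before the 2-byte signature */

/* Copy `len` bytes of raw code into a zeroed 512-byte sector and append
 * the 0x55 0xAA boot signature. `limit` is MBR_CODE_MAX or BOOT_CODE_MAX.
 * Returns 0 on success, -1 if the code does not fit. */
int make_boot_sector(const uint8_t *code, size_t len, size_t limit,
                     uint8_t sector[SECTOR_SIZE])
{
    if (limit > BOOT_CODE_MAX || len > limit)
        return -1;

    memset(sector, 0, SECTOR_SIZE);   /* note: also clears bytes 446..509 */
    memcpy(sector, code, len);
    sector[510] = 0x55;
    sector[511] = 0xAA;
    return 0;
}
```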