Ones and Zeros, Part 2:
Making Executable Files
(15 February 2016)

Introduction

I first started programming in QBasic. My father had brought home a used IBM PS/2 Model 50 from work, and I spent many hours writing programs that printed text in different colours whilst beeping the internal speaker.

Those programs could only run in the QBasic environment, which made them harder to share with friends. I wanted to turn my programs into executable files, like real programs, but couldn't figure out how. Since then, I have learned many things, including using a compiler to build executables, but I was always curious: how do executable files really work?

Programming with Ones And Zeros showed how computer programs are translated into ones and zeros, and how to run them in memory. This post explores how to put those ones and zeros into an executable, a runnable file. Like the previous post, it assumes basic knowledge of C, x86 assembly language and hexadecimal numbers. There are examples for Linux, Mac, and Windows.

Update 2024-05-09: Added a paragraph about why DOS executables can start with either MZ or ZM.

Example Program
Linking and Loading
ROT13 in Machine Code
Making a Linux Executable
Making a Mac OS X Executable
- Writing the Mach-O File
Making a Windows Executable
- Writing the PE File
Exercises
Further Reading

Example Program

The program below (rot13.c) implements the ROT13 cipher, a simple substitution cipher in which the ith letter of the alphabet is replaced by the (i+13)th, wrapping around as necessary. For example, a becomes n, b becomes o, and so on, until z which becomes m. Non-alphabetical characters do not change.

Our program performs the substitution using a lookup table (generated by make_rot13_table.c) for efficiency and pedagogical reasons.

#include <stdio.h>

#define BUFFER_SIZE 4096

static const unsigned char rot13_table[] = {   0,   1,   2,   3,   4,   5,   6,
  7,   8,   9,  10,  11,  12,  13,  14,  15,  16,  17,  18,  19,  20,  21,  22,
 23,  24,  25,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
 39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,
 55,  56,  57,  58,  59,  60,  61,  62,  63,  64, 'N', 'O', 'P', 'Q', 'R', 'S',
'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I',
'J', 'K', 'L', 'M',  91,  92,  93,  94,  95,  96, 'n', 'o', 'p', 'q', 'r', 's',
't', 'u', 'v', 'w', 'x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
'j', 'k', 'l', 'm', 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150,
151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166,
167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182,
183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198,
199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214,
215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,
247, 248, 249, 250, 251, 252, 253, 254, 255 };

static unsigned char buffer[BUFFER_SIZE];

int main()
{
        size_t n, i;

        while ((n = fread(buffer, 1, BUFFER_SIZE, stdin))) {
                for (i = 0; i < n; i++) {
                        buffer[i] = rot13_table[buffer[i]];
                }

                fwrite(buffer, 1, n, stdout);
        }

        return 0;
}

The program can be tested like this on Linux and Mac:

$ gcc rot13.c -o rot13
$ echo 'Hello, world!' | ./rot13
Uryyb, jbeyq!

Since ROT13 is its own inverse, the same program is used for decryption:

$ echo 'Uryyb, jbeyq!' | ./rot13
Hello, world!
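The lookup table in rot13.c does not have to be typed in by hand. The following is a minimal sketch of how such a table could be generated; make_rot13_table.c itself is not shown in this post, so the function name and structure here are my own assumptions rather than that program's actual contents.

```c
#include <assert.h>
#include <stddef.h>

/* Fill table[c] with the ROT13 substitution for every byte value c.
   Hypothetical helper, not taken from make_rot13_table.c. */
void make_rot13_table(unsigned char table[256])
{
        static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
        static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        int i;

        assert(table != NULL);

        /* Start with the identity mapping, so non-letters do not change. */
        for (i = 0; i < 256; i++) {
                table[i] = (unsigned char)i;
        }

        /* Replace the ith letter by the (i+13)th, wrapping around at 26. */
        for (i = 0; i < 26; i++) {
                table[(unsigned char)lower[i]] =
                        (unsigned char)lower[(i + 13) % 26];
                table[(unsigned char)upper[i]] =
                        (unsigned char)upper[(i + 13) % 26];
        }
}
```

Because 13 + 13 = 26, looking a byte up twice gives back the original byte, which is why the same program both encrypts and decrypts.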

The rest of this article will explain how to create a rot13 executable for different operating systems by hand.

Linking and Loading

Programs are typically written in multiple source files, which are compiled separately into object files, and combined by a linker into an executable. Object files contain the code and data of the source file, translated by the compiler into machine code. Functions and data items in an object are called symbols. Functions are not really different from other data at this level, just ones and zeros, but different symbols can still have different properties. In particular, some symbols might be read-only whereas others are writable or executable. Symbols with different properties are stored in different sections of the object file.

The example code above defines three symbols which will be put into three different sections: main is executable data (code), rot13_table is read-only data, and buffer is writable zero-initialized data (recall that file-scope objects without an explicit initializer are zero-initialized in C). The sections used for these types of data are usually called .text, .rodata, and .bss, respectively.

Our program also references symbols defined in other files. The fread and fwrite functions, and stdin and stdout variables, are declared in the included stdio.h header, but their actual definitions are somewhere else. The object file keeps track of both defined and undefined symbols in its symbol table.

On Linux, we can look at the symbol table for our object file with nm(1) (nm works on Mac OS X too; on Windows, use dumpbin /symbols):

$ gcc -c rot13.c
$ nm rot13.o
00000000 b buffer
         U fread
         U fwrite
00000000 T main
00000000 r rot13_table
         U stdin
         U stdout

The fread, fwrite, stdin, and stdout symbols are marked U for undefined. The others are marked b, T, and r, for zero-initialized data (bss), executable code (text), and read-only data, respectively. A lowercase letter signifies that the symbol is local to this object file, that is, it cannot be referenced from other object files. The numbers on the left show each symbol's offset in its section. Since our example only has one symbol in each section, the offsets are all zero.

Once each source file has been compiled into an object file, it is the job of a linker to combine them into an executable. The name refers to the process of linking symbol definitions with symbol references. Finding the symbol definition for a reference is called symbol resolution. In our example, the linker would need to resolve the references to fread, etc.

When the linker has resolved all symbols, it writes them into an executable. Different symbols are again organized based on their properties, this time into what are called segments (though the terminology varies between operating systems): one for read-only data, one for executable data, and so on. In our example, there is a one-to-one correspondence between the sections in the object file and the segments in the executable, but that is not always the case. For example, there could be different sections with executable code that would end up in a single executable segment.

Loading is the process performed by the operating system when reading an executable into memory in order to run it. The OS memory maps each segment of the executable with appropriate read, write, and execution permissions. (It will also load and resolve symbols against any shared libraries, but that is beyond the scope of this article.) Once the executable is loaded, the OS transfers control to the program's entry point, the address where the program begins its execution.

ROT13 in Machine Code

To put the rot13 program into an executable, it must first be translated into machine code. The main loop in the program looks simple enough, but how do we deal with the standard library functions fread and fwrite? To avoid having to link against the standard library, we will implement their functionality ourselves. Under the hood, those functions call on the operating system to perform the reading and writing. Such a system call on x86 Linux is traditionally performed by raising a software interrupt with an int $0x80 instruction, with the system call number in eax, and the arguments in ebx, ecx, and edx.

(This program will be adjusted for Mac and Windows later.)

For fread, we perform the read system call, which has system call number 3 (the number can be found in syscall_32.tbl in the Linux source):

static const uint8_t rot13_program[] = {
                               /* read:                          */
0xb8,0x03,0x00,0x00,0x00,      /*   movl   $3,%eax               */
0x31,0xdb,                     /*   xorl   %ebx,%ebx             */
0xb9,0x00,0xa0,0x04,0x08,      /*   movl   $buffer,%ecx          */
0xba,0x00,0x10,0x00,0x00,      /*   movl   $4096,%edx            */
0xcd,0x80,                     /*   int    $0x80                 */

The first instruction sets the system call number in eax, the next sets the file descriptor number to zero (for stdin) in ebx; ecx holds the address to read to, and edx the length. (See the definition of BSS_ADDR further down for the address of buffer used in the third instruction.)

After the system call, eax holds the return value of the system call: in this case the number of bytes that were read. If this value is zero or negative, the code jumps to the end of the program:

0x85,0xc0,                     /*   testl  %eax,%eax             */
0x7e,0x21,                     /*   jle    end                   */

Next comes the fun part, the loop that performs the character substitutions:

0x89,0xc2,                     /*   movl   %eax,%edx             */
                               /* rot13:                         */
0x0f,0xb6,0x19,                /*   movzbl (%ecx),%ebx           */
0x8a,0x9b,0x00,0x90,0x04,0x08, /*   movb   rot13_table(%ebx),%bl */
0x88,0x19,                     /*   movb   %bl,(%ecx)            */
0x41,                          /*   incl   %ecx                  */
0x48,                          /*   decl   %eax                  */
0x75,0xf1,                     /*   jnz    rot13                 */

Before entering the loop, we save the number of read bytes into edx, as it will be needed later. When the loop begins, ecx points to the beginning of buffer and eax holds the number of bytes left to process. The loop then proceeds to read one byte from the buffer into ebx, read the ebxth byte from rot13_table, and write that back to the buffer. Next, the pointer into the buffer is incremented and the count in eax is decremented, and we loop if there are any characters left (that is, if eax is non-zero).
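For readers who find C easier to follow than assembly, here is a rough C rendition of the loop. It is only a sketch: the function name rot13_loop and its parameters are made up for illustration, with p standing in for ecx and remaining for eax.

```c
#include <assert.h>
#include <stddef.h>

/* C rendition of the substitution loop (hypothetical helper).
   Like the machine code, it requires n > 0; the jle before the loop
   takes care of the n <= 0 case. Returns the final pointer. */
unsigned char *rot13_loop(unsigned char *buffer, size_t n,
                          const unsigned char table[256])
{
        unsigned char *p = buffer; /* ecx */
        size_t remaining = n;      /* eax */

        assert(n > 0);

        do {
                *p = table[*p];    /* movzbl, movb, movb */
                p++;               /* incl %ecx          */
                remaining--;       /* decl %eax          */
        } while (remaining != 0);  /* jnz rot13          */

        return p;
}
```

Note that the loop tests its condition at the bottom, just like the machine code does, and that the pointer ends up at buffer + n when the loop is done.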

After the loop is finished, we perform a write system call. The call number is 4, the file descriptor is 1 for stdout, and the number of bytes is in edx from before. To make ecx point to buffer, we use a trick: after the loop, the value of ecx is buffer+n, where n is the number of bytes that were read. Since that number is readily available in edx, we simply subtract that from ecx to get back to buffer:

0xb8,0x04,0x00,0x00,0x00,      /*   movl   $4,%eax               */
0xbb,0x01,0x00,0x00,0x00,      /*   movl   $1,%ebx               */
0x29,0xd1,                     /*   subl   %edx,%ecx             */
0xcd,0x80,                     /*   int    $0x80                 */

Once those bytes have been written, we start over and try to read some more:

0xeb,0xc8,                     /*   jmp    read                  */

To exit the program, we perform one final system call. exit is system call number 1, and the argument, ebx, is the status code we want our program to return (zero):

                               /* end:                           */
0xb8,0x01,0x00,0x00,0x00,      /*   movl   $1,%eax               */
0x31,0xdb,                     /*   xorl   %ebx,%ebx             */
0xcd,0x80                      /*   int    $0x80                 */
};

The numbers on the left show the machine code in hexadecimal.

(One could use the information in Part 1 to translate the assembly code into machine code by hand, but I cheated and put rot13_linux.s through an assembler instead.)
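The second byte of each jump instruction in the listing (0x21, 0xf1, and 0xc8) is a signed displacement counted from the end of the jump instruction. As a sketch of how those bytes come about, the hypothetical helper below computes the displacement from byte offsets into rot13_program. The offsets used in the usage note were obtained by counting bytes in the listing: read is at offset 0, rot13 at 25, and end at 56.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement byte for a two-byte short jump (opcode followed by rel8).
   jump_offset is the offset of the jump's opcode byte and target_offset
   the offset of the destination, both from the start of the code.
   Hypothetical helper for illustration. */
uint8_t short_jump_rel8(int jump_offset, int target_offset)
{
        int next_insn = jump_offset + 2; /* Instruction after the jump. */
        int rel = target_offset - next_insn;

        assert(rel >= -128 && rel <= 127); /* Must fit in a signed byte. */
        return (uint8_t)rel;
}
```

With the jle at offset 21, the jnz at 38, and the jmp at 54, short_jump_rel8(21, 56), short_jump_rel8(38, 25), and short_jump_rel8(54, 0) give 0x21, 0xf1, and 0xc8, matching the bytes in the listing.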

Making a Linux Executable

The file format used for executables on Linux is called ELF, for Executable and Linkable Format. It is specified in Chapters 4 and 5 of the System V ABI.

When creating the executable, it will be convenient to have some functions for writing little-endian numbers and seeking in the file:

/* Write an 8-bit value to file. */
static void write8(FILE *file, uint8_t value);

/* Write a 16-bit value in little-endian to file. */
static void write16(FILE *file, uint16_t value);

/* Write a 32-bit value in little-endian to file. */
static void write32(FILE *file, uint32_t value);

/* Seek to offset in file. */
static void seek(FILE *file, off_t offset);
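The definitions of these helpers are not shown here. A minimal sketch of how they might look follows; the error handling (print a message and exit) is my assumption and may differ from the full program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Write an 8-bit value to file. */
static void write8(FILE *file, uint8_t value)
{
        assert(file != NULL);

        if (fputc(value, file) == EOF) {
                perror("fputc");
                exit(1);
        }
}

/* Write a 16-bit value in little-endian: low byte first. */
static void write16(FILE *file, uint16_t value)
{
        write8(file, value & 0xff);
        write8(file, (value >> 8) & 0xff);
}

/* Write a 32-bit value in little-endian: low half first. */
static void write32(FILE *file, uint32_t value)
{
        write16(file, value & 0xffff);
        write16(file, (value >> 16) & 0xffff);
}

/* Seek to offset, counted from the start of the file. */
static void seek(FILE *file, off_t offset)
{
        if (fseek(file, (long)offset, SEEK_SET) != 0) {
                perror("fseek");
                exit(1);
        }
}
```

Writing the bytes one at a time, rather than dumping the integer's in-memory representation, is what makes the output independent of the endianness of the machine running the program.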

Our program begins by opening the file for writing:

int main(int argc, char **argv)
{
        FILE *f;
        size_t i;

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
                exit(1);
        }

        f = fopen(argv[1], "wb");
        if (f == NULL) {
                perror("fopen");
                exit(1);
        }

The figure on the right illustrates the layout of the file we will produce.

An ELF file starts with an ELF header. The first four bytes are a magic number: 0x7f, 'E', 'L', 'F'. They identify the file as being ELF. Subsequent bytes specify whether the file is 32-bit or 64-bit, little- or big-endian, etc.:

        /* ELF header. */
        write8(f, 0x7f);       /* EI_MAG0                     */
        write8(f, 'E');        /* EI_MAG1                     */
        write8(f, 'L');        /* EI_MAG2                     */
        write8(f, 'F');        /* EI_MAG3                     */
        write8(f, 1);          /* EI_CLASS:     32-bit        */
        write8(f, 1);          /* EI_DATA:      Little-endian */
        write8(f, 1);          /* EI_VERSION:   Current       */
        write8(f, 0);          /* EI_OSABI:     No extensions */
        write8(f, 0);          /* EI_ABIVERSION               */
        write8(f, 0);          /* EI_PAD ... (7 bytes)        */
        write8(f, 0);
        write8(f, 0);
        write8(f, 0);
        write8(f, 0);
        write8(f, 0);
        write8(f, 0);

(A more pragmatic approach would be to #include <elf.h>, fill in the structs defined there, and dump them to a file. That would be more convenient if our program is only used on x86 Linux, but it would prevent it from being used on systems that don't provide elf.h, and we would have to be wary of endianness and struct layout issues.)
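Going the other way, the identification bytes are easy to check when reading a file. The sketch below tests the first sixteen bytes against the values written above; the function name is made up for this example, and it only checks the fields our program cares about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ident looks like the start of a 32-bit little-endian
   ELF file of the current version, and 0 otherwise.
   Hypothetical helper for illustration. */
int is_elf32_le_ident(const uint8_t ident[16])
{
        assert(ident != NULL);

        return ident[0] == 0x7f && ident[1] == 'E' &&
               ident[2] == 'L'  && ident[3] == 'F' &&
               ident[4] == 1 && /* EI_CLASS:   32-bit        */
               ident[5] == 1 && /* EI_DATA:    Little-endian */
               ident[6] == 1;   /* EI_VERSION: Current       */
}
```

A 64-bit ELF file has 2 in the EI_CLASS byte, so it would be rejected by this check.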

The second half of the header contains the e_entry field, which specifies the entry point. In our case, we want execution to start at the beginning of our code, which will be loaded at TEXT_ADDR (defined further below). This part of the header also specifies the location of various parts of the file. For example, e_phoff specifies the file offset for the program header table (explained below). That table is usually located right after the ELF header.

        /* ELF header (continued). */
        write16(f, 2);         /* e_type:     ET_EXEC         */
        write16(f, 3);         /* e_machine:  EM_386          */
        write32(f, 1);         /* e_version:  EV_CURRENT      */
        write32(f, TEXT_ADDR); /* e_entry                     */
        write32(f, EHSIZE);    /* e_phoff                     */
        write32(f, 0);         /* e_shoff                     */
        write32(f, 0);         /* e_flags                     */
        write16(f, EHSIZE);    /* e_ehsize                    */
        write16(f, PHSIZE);    /* e_phentsize                 */
        write16(f, 3);         /* e_phnum                     */
        write16(f, 40);        /* e_shentsize                 */
        write16(f, 0);         /* e_shnum                     */
        write16(f, 0);         /* e_shstrndx                  */
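The header size of 52 bytes can be double-checked by adding up the widths of the fields written so far. The functions below are a small sketch of that arithmetic, with names invented for this example; the program header size assumes eight 32-bit fields per entry, as in the 32-bit ELF format.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the 32-bit ELF header, summed from its field widths. */
uint32_t elf32_header_size(void)
{
        uint32_t size = 0;

        size += 16;    /* e_ident                                  */
        size += 2 * 2; /* e_type, e_machine                        */
        size += 5 * 4; /* e_version, e_entry, e_phoff, e_shoff,
                          e_flags                                  */
        size += 6 * 2; /* e_ehsize, e_phentsize, e_phnum,
                          e_shentsize, e_shnum, e_shstrndx         */
        return size;
}

/* Size of one 32-bit program header: eight 32-bit fields. */
uint32_t elf32_phdr_size(void)
{
        return 8 * 4;
}

/* File offset just past a program header table with phnum entries
   that is placed right after the ELF header. */
uint32_t elf32_phdr_table_end(uint32_t phnum)
{
        return elf32_header_size() + phnum * elf32_phdr_size();
}
```

With three program headers, the table ends at offset 52 + 3 * 32 = 148, so everything after that point in the file is free for the segment contents.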

A program header is used to describe a segment to be loaded into memory: its size and offset in the file, and the destination address and size in memory. If the size in memory is greater than the size in the file, excess bytes are filled with zeros, something which is useful for zero-initialized data. The program headers are written back-to-back in order of increasing memory address, forming the program header table.
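To make the zero-filling rule concrete, here is a toy model of what loading one segment amounts to. It is only a sketch under simplifying assumptions: a real loader maps pages from the file rather than copying bytes, and the function name and parameters are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of loading a segment: copy filesz bytes starting at offset
   in the file image, then zero-fill the rest of the memsz bytes. */
void load_segment(unsigned char *mem, const unsigned char *file_image,
                  size_t offset, size_t filesz, size_t memsz)
{
        assert(filesz <= memsz);

        memcpy(mem, file_image + offset, filesz);
        memset(mem + filesz, 0, memsz - filesz);
}
```

A segment with a file size of zero and a non-zero memory size therefore ends up as nothing but zeros, without taking up any space in the file.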

For our example, we need to load three pieces of data into memory: the code, the rot13_table, and buffer. Each should be loaded as a separate segment with the appropriate permissions (executable, read-only, and read-write zero-initialized), and we will refer to them as the TEXT, RDATA, and BSS segments, respectively.

On x86 Linux, it is customary to map the code of a program at address 0x08048000. We will load the other segments right after, aligned to the size of a memory page (4096 bytes on x86 Linux; see Section 5-1 "Program Loading" in the psABI):

#define TEXT_ADDR  0x08048000
#define RDATA_ADDR 0x08049000
#define BSS_ADDR   0x0804a000

The file offset from where to load data and the target virtual address must be congruent modulo the segment alignment. We use the following constants and utility function:

#define TEXT_SIZE  sizeof(rot13_program)
#define RDATA_SIZE sizeof(rot13_table)
#define BSS_SIZE   4096

#define EHSIZE 52       /* ELF header size. */
#define PHSIZE 32       /* Program header size. */
#define PAGE_SIZE 4096

/* Round value up to alignment. */
static uint32_t align_to(uint32_t value, uint32_t alignment)
{
        uint32_t remainder = value % alignment;

        if (remainder == 0) {
                return value;
        }

        return value + (alignment - remainder);
}

and compute the file offsets like this:

        assert(RDATA_ADDR - TEXT_ADDR >= TEXT_SIZE);
        assert(BSS_ADDR - RDATA_ADDR >= RDATA_SIZE);

        uint32_t text_offset = align_to(EHSIZE + 3 * PHSIZE, PAGE_SIZE);
        uint32_t rdata_offset = align_to(text_offset + TEXT_SIZE, PAGE_SIZE);

Now that we know the file offsets and virtual addresses of our segments, we can write the program header table:

        /* Program header 1 (TEXT). */
        write32(f, 1);           /* p_type: PT_LOAD      */
        write32(f, text_offset); /* p_offset             */
        write32(f, TEXT_ADDR);   /* p_vaddr              */
        write32(f, 0);           /* p_paddr              */
        write32(f, TEXT_SIZE);   /* p_filesz             */
        write32(f, TEXT_SIZE);   /* p_memsz              */
        write32(f, 0x4 | 0x1);   /* p_flags: PF_R | PF_X */
        write32(f, PAGE_SIZE);   /* p_align              */

        /* Program header 2 (RDATA). */
        write32(f, 1);            /* p_type: PT_LOAD */
        write32(f, rdata_offset); /* p_offset        */
        write32(f, RDATA_ADDR);   /* p_vaddr         */
        write32(f, 0);            /* p_paddr         */
        write32(f, RDATA_SIZE);   /* p_filesz        */
        write32(f, RDATA_SIZE);   /* p_memsz         */
        write32(f, 0x4);          /* p_flags: PF_R   */
        write32(f, PAGE_SIZE);    /* p_align         */

        /* Program header 3 (BSS). */
        write32(f, 1);         /* p_type: PT_LOAD      */
        write32(f, 0);         /* p_offset             */
        write32(f, BSS_ADDR);  /* p_vaddr              */
        write32(f, 0);         /* p_paddr              */
        write32(f, 0);         /* p_filesz             */
        write32(f, BSS_SIZE);  /* p_memsz              */
        write32(f, 0x4 | 0x2); /* p_flags: PF_R | PF_W */
        write32(f, PAGE_SIZE); /* p_align              */
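The p_flags values correspond closely to the protection flags used when mapping memory. As a rough illustration of what a loader might do with them, the sketch below translates p_flags into the PROT_ constants from sys/mman.h. It assumes a POSIX system, the function name is invented, and it is not meant to describe how any particular kernel implements this.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Translate ELF segment flags into mmap-style protection flags.
   Hypothetical helper for illustration. */
int prot_from_pflags(uint32_t p_flags)
{
        int prot = PROT_NONE;

        if (p_flags & 0x4) { /* PF_R */
                prot |= PROT_READ;
        }
        if (p_flags & 0x2) { /* PF_W */
                prot |= PROT_WRITE;
        }
        if (p_flags & 0x1) { /* PF_X */
                prot |= PROT_EXEC;
        }

        return prot;
}
```

For our three segments this gives read and execute for TEXT, read-only for RDATA, and read and write for BSS.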

Finally, we write the data at the offsets that we calculated, and then we're done:

        /* Write the TEXT segment. */
        seek(f, text_offset);
        for (i = 0; i < sizeof(rot13_program); i++) {
                write8(f, rot13_program[i]);
        }

        /* Write the RDATA segment. */
        seek(f, rdata_offset);
        for (i = 0; i < sizeof(rot13_table); i++) {
                write8(f, rot13_table[i]);
        }

        if (fclose(f) == EOF) {
                perror("fclose");
                exit(1);
        }

        return 0;
}

Note that the BSS segment doesn't have any data in the file. Since its p_filesz is 0 in the program header table, the loader will not read any bytes from the file, but instead fill the p_memsz bytes of the segment with zeros.

The code of the full program can be found in make_rot13_linux.c, and tested like this:

$ gcc make_rot13_linux.c -o make_rot13_linux
$ ./make_rot13_linux rot13

We can use file(1) to see if what we produced really looks like an ELF file:

$ file rot13
rot13: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

To inspect the ELF header and program header table, we use readelf(1):

$ readelf -hl rot13
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048000
  Start of program headers:          52 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         0
  Section header string table index: 0

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x08048000 0x00000000 0x00041 0x00041 R E 0x1000
  LOAD           0x002000 0x08049000 0x00000000 0x00100 0x00100 R   0x1000
  LOAD           0x000000 0x0804a000 0x00000000 0x00000 0x01000 RW  0x1000

That all looks as expected. Finally, let's try running the program:

$ chmod +x rot13
$ echo 'Hello, world!' | ./rot13
Uryyb, jbeyq!

We have successfully written a program that creates a functional executable file by hand!

Making a Mac OS X Executable

To make the ROT13 machine code work on Mac OS X (Darwin), we have to make some small changes. Macs all use x86 CPUs these days, so the code won't change much, but we will need to change the parts that interface with the operating system: the read, write, and exit system calls.

The kernel in Darwin is called XNU. It is a combination of a micro-kernel called Mach and parts of the BSD operating system. In particular, it incorporates the BSD system call interface. Therefore, system calls on Mac OS X use the same ABI as on BSD, which is explained for x86 in the FreeBSD Developers' Handbook.

Just as on Linux, the system call numbers for read, write, and exit are 3, 4, and 1, respectively (see syscalls.master). The system call number goes in the eax register, the call is made with int $0x80, and the return value is in eax. However, the system call arguments are passed as when calling a regular C function: on the stack, pushed right-to-left, followed by a return address (which the syscall doesn't use). After the syscall we must restore the stack.

Note that making raw system calls on Mac OS X is generally not a good idea, as the interface is not guaranteed to be stable. The author of the Mac OS X Internals book says it's a no-no, as does Apple. However, it's instructive to see how it works, and probably stable enough that our example should work on all Mac OS X versions released so far.

The machine code for reading in the ROT13 program is changed to:

0xb9,0x00,0x20,0x00,0x00,      /*   movl   $buffer,%ecx          */
0xb8,0x03,0x00,0x00,0x00,      /*   movl   $3,%eax               */
0x68,0x00,0x10,0x00,0x00,      /*   pushl  $4096                 */
0x51,                          /*   pushl  %ecx                  */
0x6a,0x00,                     /*   pushl  $0                    */
0x6a,0x00,                     /*   pushl  $0                    */
0xcd,0x80,                     /*   int    $0x80                 */
0x83,0xc4,0x10,                /*   addl   $16,%esp              */

The first instruction moves the address of buffer into ecx since later code relies on that. The system call number is moved into eax, and then read's arguments (file descriptor, buffer address, byte count) are pushed in reverse order, followed by a dummy "return address". After the syscall, addl restores the stack by removing the four 4-byte values that were pushed.
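The listing uses two different encodings for pushl: opcode 0x6a followed by one sign-extended byte for small values, and opcode 0x68 followed by four bytes for larger ones such as 4096. The sketch below shows how an assembler might pick between them; the function name and interface are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "pushl $value" into out, preferring the short form.
   Returns the number of bytes written (2 or 5).
   Hypothetical helper for illustration. */
size_t encode_push_imm(int32_t value, uint8_t out[5])
{
        uint32_t bits = (uint32_t)value;

        assert(out != NULL);

        if (value >= -128 && value <= 127) {
                out[0] = 0x6a; /* push imm8, sign-extended */
                out[1] = (uint8_t)bits;
                return 2;
        }

        out[0] = 0x68;         /* push imm32, little-endian */
        out[1] = (uint8_t)bits;
        out[2] = (uint8_t)(bits >> 8);
        out[3] = (uint8_t)(bits >> 16);
        out[4] = (uint8_t)(bits >> 24);
        return 5;
}
```

Either way, each push puts a full 32-bit value on the stack, so the four pushes before the system call account for the 16 bytes that are removed afterwards.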

The code for calling write is changed similarly:

0xb8,0x04,0x00,0x00,0x00,      /*   movl   $4,%eax               */
0x52,                          /*   pushl  %edx                  */
0x51,                          /*   pushl  %ecx                  */
0x6a,0x01,                     /*   pushl  $1                    */
0x6a,0x00,                     /*   pushl  $0                    */
0xcd,0x80,                     /*   int    $0x80                 */
0x83,0xc4,0x10,                /*   addl   $16,%esp              */

And finally, the code to exit the program becomes:

0xb8,0x01,0x00,0x00,0x00,      /*   movl   $1,%eax               */
0x6a,0x00,                     /*   pushl  $0                    */
0x6a,0x00,                     /*   pushl  $0                    */
0xcd,0x80,                     /*   int    $0x80                 */

Since the system call causes the program to exit, there is no need to restore the stack afterwards.

Writing the Mach-O File

The file format used for executables and object files on Darwin is called Mach-O. It is documented in the OS X ABI Mach-O File Format Reference. In addition to that, constants and some structures are defined in the kernel's header files. For example, MH_ constants are defined in loader.h, CPU_ constants in machine.h, and VM_PROT_ constants in vm_prot.h.

We define some constants of our own to help in producing the executable:

#define MACH_HEADER_SZ     (7*4)      /* Size of Mach-O header.      */
#define MACH_SEGMENT_SZ    (10*4+16)  /* Size of segment struct.     */
#define MACH_SECTION_SZ    (2*16+9*4) /* Size of section struct.     */
#define MACH_UNIXTHREAD_SZ (4*4+16*4) /* Size of UNIXTHREAD command. */

#define TEXT_SZ   sizeof(rot13_program) /* Size of the code. */
#define CONST_SZ  sizeof(rot13_table)   /* Size of rot13_table. */
#define COMMON_SZ 4096                  /* Size of buffer. */

#define PAGE_SIZE 4096

Like ELF, a Mach-O file starts with a header that identifies the file type with a magic number, and then declares what machine it's for, etc. It also contains the number and size of the load commands which will follow the header (see the figure on the right for an overview of our file):

        uint32_t num_commands = 4; /* Three LC_SEGMENT and
                                      one LC_UNIXTHREAD. */

        uint32_t commands_size = 3 * MACH_SEGMENT_SZ +
                                 3 * MACH_SECTION_SZ +
                                 MACH_UNIXTHREAD_SZ;

        /* Mach-O header. */
        write32(f, 0xfeedface);    /* MH_MAGIC             */
        write32(f, 7);             /* CPU_TYPE_I386        */
        write32(f, 3);             /* CPU_SUBTYPE_I386_ALL */
        write32(f, 2);             /* MH_EXECUTE           */
        write32(f, num_commands);  /* ncmds                */
        write32(f, commands_size); /* sizeofcmds           */
        write32(f, 1);             /* MH_NOUNDEFS          */
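The sizes involved are easy to get wrong, so it can be worth working them out once. The sketch below redoes the arithmetic with locally named constants; the comments describing what each term counts are my reading of the MACH_ constants defined earlier, not something taken from the full program.

```c
#include <assert.h>
#include <stdint.h>

enum {
        HEADER_SZ     = 7 * 4,          /* Seven 32-bit header fields.  */
        SEGMENT_SZ    = 10 * 4 + 16,    /* Ten 32-bit fields and a
                                           16-byte segment name.        */
        SECTION_SZ    = 2 * 16 + 9 * 4, /* Two 16-byte names and nine
                                           32-bit fields.               */
        UNIXTHREAD_SZ = 4 * 4 + 16 * 4  /* Four 32-bit fields and
                                           sixteen 32-bit registers.    */
};

/* Total size of three segment commands, three sections, and one
   thread command. */
uint32_t mach_commands_size(void)
{
        return 3 * SEGMENT_SZ + 3 * SECTION_SZ + UNIXTHREAD_SZ;
}

/* File offset of the first byte after the header and load commands. */
uint32_t mach_first_content_offset(void)
{
        return HEADER_SZ + mach_commands_size();
}
```

This gives 452 bytes of load commands, so together with the 28-byte header the first 480 bytes of the file are taken up by metadata.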

Load commands specify how to load the program into memory. In our case, there will be four load commands: three segments and one for initializing the thread state.

It is customary (mandatory for 64-bit executables since OS X 10.10, it seems) to load a page without read or write permissions at address zero, called the __PAGEZERO segment. This makes it impossible for the process to map that page, so that null pointer dereferences always fault, preventing a common source of security problems. (Linux has mmap_min_addr for the same reason.)

        /* The __PAGEZERO segment. */
        write32(f, 1);               /* LC_SEGMENT   */
        write32(f, MACH_SEGMENT_SZ); /* cmdsize      */
        writestr16(f, "__PAGEZERO"); /* segname      */
        write32(f, 0x0);             /* vmaddr       */
        write32(f, 0x1000);          /* vmsize       */
        write32(f, 0);               /* fileoff      */
        write32(f, 0);               /* filesize     */
        write32(f, 0);               /* VM_PROT_NONE */
        write32(f, 0);               /* VM_PROT_NONE */
        write32(f, 0);               /* nsects       */
        write32(f, 0);               /* flags        */
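The writestr16 helper used for the segment name is not defined in the text above. Segment and section names in Mach-O occupy fixed 16-byte fields, so a plausible sketch, under the assumption that the helper pads the string with zero bytes, would be:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write str to file as a 16-byte field, padded with zero bytes.
   Sketch only; the real helper may differ. */
static void writestr16(FILE *file, const char *str)
{
        char field[16] = { 0 };
        size_t len = strlen(str);

        assert(len <= sizeof(field)); /* The name must fit the field. */
        memcpy(field, str, len);

        if (fwrite(field, 1, sizeof(field), file) != sizeof(field)) {
                perror("fwrite");
                exit(1);
        }
}
```

Note that a name of exactly 16 characters fills the whole field and is then not zero-terminated in the file.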

Next comes the __TEXT segment. It will be loaded at address 0x1000, that is, right after the __PAGEZERO segment, and it contains our code and read-only data (the rot13_table) in two sections called __text and __const, respectively. Since segments are supposed to be mapped at page-aligned offsets and addresses, we map it from offset 0 in the file. Its size is also rounded up to a whole number of pages:

        uint32_t text_cmd_size = MACH_SEGMENT_SZ + 2 * MACH_SECTION_SZ;
        uint32_t text_seg_addr = 0x1000;
        uint32_t text_seg_size = align_to(MACH_HEADER_SZ + commands_size +
                                          TEXT_SZ + CONST_SZ, PAGE_SIZE);

        /* The __TEXT segment. */
        write32(f, 1);             /* LC_SEGMENT                     */
        write32(f, text_cmd_size); /* cmdsize                        */
        writestr16(f, "__TEXT");   /* segname                        */
        write32(f, text_seg_addr); /* vmaddr                         */
        write32(f, text_seg_size); /* vmsize                         */
        write32(f, 0);             /* fileoff                        */
        write32(f, text_seg_size); /* filesize                       */
        write32(f, 0x7);           /* VM_PROT_ALL                    */
        write32(f, 0x1 | 0x4);     /* VM_PROT_READ | VM_PROT_EXECUTE */
        write32(f, 2);             /* nsects                         */
        write32(f, 0);             /* flags                          */

Since the segment is mapped at offset 0, the first bytes contain the Mach-O header and load commands from the start of the file, and the actual code doesn't start until further in:

        uint32_t text_sec_addr = text_seg_addr + MACH_HEADER_SZ + commands_size;
        uint32_t text_sec_size = TEXT_SZ;
        uint32_t text_sec_offset = MACH_HEADER_SZ + commands_size;

        /* The __TEXT,__text section. */
        writestr16(f, "__text");     /* sectname                          */
        writestr16(f, "__TEXT");     /* segname                           */
        write32(f, text_sec_addr);   /* addr                              */
        write32(f, text_sec_size);   /* size                              */
        write32(f, text_sec_offset); /* offset                            */
        write32(f, 0);               /* alignment                         */
        write32(f, 0);               /* reloff                            */
        write32(f, 0);               /* nreloc                            */
        write32(f, 0x80000400);      /* S_ATTR_{SOME | PURE}_INSTRUCTIONS */
        write32(f, 0);               /* reserved1                         */
        write32(f, 0);               /* reserved2                         */

The code section is followed by the __const section for our read-only data:

        uint32_t const_sec_addr = text_sec_addr + text_sec_size;
        uint32_t const_sec_size = CONST_SZ;
        uint32_t const_sec_offset = text_sec_offset + text_sec_size;

        /* The __TEXT,__const section. */
        writestr16(f, "__const");     /* sectname  */
        writestr16(f, "__TEXT");      /* segname   */
        write32(f, const_sec_addr);   /* addr      */
        write32(f, const_sec_size);   /* size      */
        write32(f, const_sec_offset); /* offset    */
        write32(f, 0);                /* alignment */
        write32(f, 0);                /* reloff    */
        write32(f, 0);                /* nreloc    */
        write32(f, 0);                /* flags     */
        write32(f, 0);                /* reserved1 */
        write32(f, 0);                /* reserved2 */

Next comes the __DATA segment, which will be mapped with read-write permissions and hold our buffer. It is zero-initialized, so no data is loaded from the file.

        uint32_t data_cmd_size = MACH_SEGMENT_SZ + MACH_SECTION_SZ;
        uint32_t data_seg_addr = text_seg_addr + text_seg_size;
        assert(data_seg_addr == align_to(data_seg_addr, PAGE_SIZE));
        uint32_t data_seg_size = COMMON_SZ;

        /* The __DATA segment. */
        write32(f, 1);             /* LC_SEGMENT                   */
        write32(f, data_cmd_size); /* cmdsize                      */
        writestr16(f, "__DATA");   /* segname                      */
        write32(f, data_seg_addr); /* vmaddr                       */
        write32(f, data_seg_size); /* vmsize                       */
        write32(f, 0);             /* fileoff                      */
        write32(f, 0);             /* filesize                     */
        write32(f, 0x7);           /* VM_PROT_ALL                  */
        write32(f, 0x1 | 0x2);     /* VM_PROT_READ | VM_PROT_WRITE */
        write32(f, 1);             /* nsects                       */
        write32(f, 0);             /* flags                        */

        /* The __DATA,__common section. */
        writestr16(f, "__common"); /* sectname   */
        writestr16(f, "__DATA");   /* segname    */
        write32(f, data_seg_addr); /* addr       */
        write32(f, data_seg_size); /* size       */
        write32(f, 0);             /* offset     */
        write32(f, 0);             /* alignment  */
        write32(f, 0);             /* reloff     */
        write32(f, 0);             /* nreloc     */
        write32(f, 0x1);           /* S_ZEROFILL */
        write32(f, 0);             /* reserved1  */
        write32(f, 0);             /* reserved2  */

The final command (see thread_status.h and _structs.h) specifies the initial thread state for running our program. Careful readers might have noted that unlike the ELF header, the Mach-O header does not specify an entry point. Instead, we specify the starting value of eip (x86's program counter) in this command:

        /* The LC_UNIXTHREAD command. */
        write32(f, 0x5);                /* LC_UNIXTHREAD                   */
        write32(f, MACH_UNIXTHREAD_SZ); /* cmdsize                         */
        write32(f, 1);                  /* flavor: x86_THREAD_STATE32      */
        write32(f, 16);                 /* count: x86_THREAD_STATE32_COUNT */
        write32(f, 0x0);                /* eax                             */
        write32(f, 0x0);                /* ebx                             */
        write32(f, 0x0);                /* ecx                             */
        write32(f, 0x0);                /* edx                             */
        write32(f, 0x0);                /* edi                             */
        write32(f, 0x0);                /* esi                             */
        write32(f, 0x0);                /* ebp                             */
        write32(f, 0x0);                /* esp                             */
        write32(f, 0x0);                /* ss                              */
        write32(f, 0x0);                /* eflags                          */
        write32(f, text_sec_addr);      /* eip                             */
        write32(f, 0x0);                /* cs                              */
        write32(f, 0x0);                /* ds                              */
        write32(f, 0x0);                /* es                              */
        write32(f, 0x0);                /* fs                              */
        write32(f, 0x0);                /* gs                              */

The esp value (stack pointer) is ignored. Instead, the loader will point it to a stack that it sets up for us. (If it were really set to 0, our program would crash at the first push instruction.)

After that last command, we write out the code and data of our program:

        /* Write the __TEXT,__text section. */
        for (i = 0; i < sizeof(rot13_program); i++) {
                write8(f, rot13_program[i]);
        }

        /* Write the __TEXT,__const section. */
        for (i = 0; i < sizeof(rot13_table); i++) {
                write8(f, rot13_table[i]);
        }

Finally, since we rounded up the size of the __TEXT segment to 4096 bytes, we pad the file to match that, and then we're done.

        /* Pad the file to match the __TEXT segment size. */
        assert(const_sec_offset + const_sec_size < text_seg_size);
        seek(f, text_seg_size - 1);
        write8(f, 0);

        if (fclose(f) == EOF) {
                perror("fclose");
                exit(1);
        }

        return 0;
}
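The padding trick used above (seek to the last byte and write a single zero) can be tried in isolation. This is a minimal sketch that assumes the seek helper is a thin wrapper around fseek; pad_file_to and file_size are hypothetical names, not functions from make_rot13_darwin.c.

```c
#include <stdio.h>

/* Extend an open binary file to `size` bytes by seeking to the last byte
   and writing one zero. Returns 0 on success, -1 on error. If the file is
   already longer, it is not truncated; the byte at size - 1 is simply
   overwritten. */
static int pad_file_to(FILE *f, long size)
{
        if (size <= 0 || fseek(f, size - 1, SEEK_SET) != 0) {
                return -1;
        }
        if (fputc(0, f) == EOF) {
                return -1;
        }
        return fflush(f) == 0 ? 0 : -1;
}

/* Report the current size of an open binary file, or -1 on error. */
static long file_size(FILE *f)
{
        if (fseek(f, 0, SEEK_END) != 0) {
                return -1;
        }
        return ftell(f);
}
```

On the systems used in this post, the gap between the old end of the file and the new last byte reads back as zeros, so one write is enough to grow the file to the full segment size.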

The code of this program is available in make_rot13_darwin.c. We can build and run it, and then check that the output looks like a Mach-O file:

$ clang make_rot13_darwin.c -o make_rot13_darwin
$ ./make_rot13_darwin rot13
$ file rot13
rot13: Mach-O executable i386

So far, so good. We use otool(1) to inspect the header and the load commands in the file:

$ otool -hl rot13
rot13:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedface       7          3  0x00           2     4        452 0x00000001
Load command 0
      cmd LC_SEGMENT
  cmdsize 56
  segname __PAGEZERO
   vmaddr 0x00000000
   vmsize 0x00001000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
Load command 1
      cmd LC_SEGMENT
  cmdsize 192
  segname __TEXT
   vmaddr 0x00001000
   vmsize 0x00001000
  fileoff 0
 filesize 4096
  maxprot 0x00000007
 initprot 0x00000005
   nsects 2
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x000011e0
      size 0x0000004d
    offset 480
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
[..]

That all looks as expected. Finally, let us try running the generated program:

$ echo 'Hello, world!' | ./rot13
Uryyb, jbeyq!

Looks like it works!

Making a Windows Executable

As on Darwin, the ROT13 machine code needs to be updated to interact with the operating system. In the Win32 API, the functions corresponding to read, write, and exit are ReadFile, WriteFile, and ExitProcess. Additionally, GetStdHandle is used to get file handles corresponding to stdin and stdout.

It would be possible to perform system calls by interacting directly with the Windows kernel, as we did on Linux and Darwin. However, system call numbers, and even the mechanism for calling the kernel, vary between Windows versions, so our program might not always work. Instead, the portable way to interact with the system is through functions exported by DLL files (dynamically linked libraries). In our case, the functions we need are provided by kernel32.dll.

We will see later how our executable specifies that it needs to access functions from a DLL, but for now it is enough to know that the loader will write the addresses of such dllimported functions into a table called the Import Address Table (IAT).

The first order of business is getting the handles for stdin and stdout by calling GetStdHandle:

static const uint8_t rot13_program[] = {
0x6a,0xf6,                     /*   pushl  $-10                  */
0xff,0x15,0x00,0x30,0x40,0x00, /*   call   *iat_slot0            */
0x89,0xc6,                     /*   movl   %eax,%esi             */
0x6a,0xf5,                     /*   pushl  $-11                  */
0xff,0x15,0x00,0x30,0x40,0x00, /*   call   *iat_slot0            */
0x89,0xc7,                     /*   movl   %eax,%edi             */

The address of GetStdHandle is found in the first slot of the IAT. Note that the call instruction is not calling iat_slot0, it is loading the address at iat_slot0, and calling that. In our case, the IAT will start at address 0x00403000.
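To make the byte sequence concrete, here is a small sketch of how such an indirect call is encoded. The helper name encode_iat_call is made up for illustration; the IAT base address is the one stated above, and each slot is assumed to be four bytes wide.

```c
#include <stdint.h>

#define IAT_BASE 0x00403000u /* where the IAT ends up in our image */

/* Encode `call *iat_slotN`: opcode FF with ModRM byte 15 (an indirect call
   through an absolute 32-bit address), followed by the address of the slot
   in little-endian byte order. `out` must have room for six bytes. */
static void encode_iat_call(unsigned slot, uint8_t out[6])
{
        uint32_t addr = IAT_BASE + 4u * slot; /* one 32-bit entry per import */

        out[0] = 0xff;
        out[1] = 0x15;
        out[2] = (uint8_t)(addr >> 0);
        out[3] = (uint8_t)(addr >> 8);
        out[4] = (uint8_t)(addr >> 16);
        out[5] = (uint8_t)(addr >> 24);
}
```

For slot 0 this yields ff 15 00 30 40 00, the bytes used for both GetStdHandle calls; the remaining slots differ only in the first address byte (04, 08, 0c).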

The calling convention for Win32 functions is called stdcall. As when calling a regular C function, arguments are pushed on the stack right-to-left and values are returned in eax, but the stack is restored by the callee instead of the caller.

Hence, to get the handle for stdin, the program pushes STD_INPUT_HANDLE (-10) on the stack and calls the function. It then saves the return value in esi for later use. Similarly, we pass STD_OUTPUT_HANDLE (-11) to get stdout, and store that resulting handle in edi.
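The bytes 0xf6 and 0xf5 in the two push instructions are simply -10 and -11 as signed 8-bit values. A minimal sketch of that encoding follows; encode_push_imm8 is a hypothetical helper, and the two constants are defined locally rather than taken from the Windows headers.

```c
#include <stdint.h>

#define STD_INPUT_HANDLE  (-10) /* values as passed to GetStdHandle above */
#define STD_OUTPUT_HANDLE (-11)

/* Encode `pushl $value` in its two-byte form: opcode 6A followed by a
   signed 8-bit immediate that the CPU sign-extends to 32 bits.
   Returns 0 on success, -1 if the value does not fit in eight bits. */
static int encode_push_imm8(int value, uint8_t out[2])
{
        if (value < -128 || value > 127) {
                return -1;
        }
        out[0] = 0x6a;
        out[1] = (uint8_t)value; /* two's complement: -10 becomes 0xf6 */
        return 0;
}
```

Larger values, such as the 4096 pushed before the ReadFile call, do not fit and need the five-byte form with opcode 68 and a 32-bit immediate.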

The call to ReadFile looks like this:

                               /* read:                          */
0x6a,0x00,                     /*   pushl  $0                    */
0x89,0xe0,                     /*   movl   %esp,%eax             */
0x6a,0x00,                     /*   pushl  $0                    */
0x50,                          /*   pushl  %eax                  */
0x68,0x00,0x10,0x00,0x00,      /*   pushl  $4096                 */
0x68,0x00,0x40,0x40,0x00,      /*   pushl  $buffer               */
0x56,                          /*   pushl  %esi                  */
0xff,0x15,0x04,0x30,0x40,0x00, /*   call   *iat_slot1            */
0x58,                          /*   popl   %eax                  */
0x85,0xc0,                     /*   testl  %eax,%eax             */
0x7e,0x26,                     /*   jle    end                   */
0xb9,0x00,0x40,0x40,0x00,      /*   movl   $buffer,%ecx          */
0x89,0xc2,                     /*   movl   %eax,%edx             */

Before setting up the call, the code allocates four bytes of stack (by pushing a dummy value), and saves that stack address in eax. It then pushes the arguments for ReadFile from right to left: lpOverlapped (0), lpNumberOfBytesRead (eax), nNumberOfBytesToRead (4096), lpBuffer ($buffer), hFile (esi). Since the number of bytes read was written to the memory we allocated on the top of the stack, we can just pop it into eax afterwards.

As in the other versions, the code then checks whether any bytes were read and, if not, jumps to the end. Otherwise, it saves the buffer address in ecx and the number of bytes read in edx, for use later in the program.

The WriteFile call looks like this:

0x6a,0x00,                     /*   pushl  $0                    */
0x54,                          /*   pushl  %esp                  */
0x52,                          /*   pushl  %edx                  */
0x51,                          /*   pushl  %ecx                  */
0x57,                          /*   pushl  %edi                  */
0xff,0x15,0x08,0x30,0x40,0x00, /*   call   *iat_slot2            */

The arguments pushed are: lpOverlapped (0), lpNumberOfBytesWritten (esp - we don't care, so writing to the address of the lpOverlapped argument is fine), nNumberOfBytesToWrite (edx), lpBuffer (ecx), hFile (edi).

Finally, the call to ExitProcess:

0x6a,0x00,                     /*   pushl  $0                    */
0xff,0x15,0x0c,0x30,0x40,0x00, /*   call   *iat_slot3            */

Writing the PE File

The file formats used for executable and object files on Windows are called PE (Portable Executable) and COFF (Common Object File Format), respectively. (Since parts of the COFF format show up in PE files, they are sometimes referred to as PE/COFF.) They are documented in the Microsoft PE and COFF Specification. Additionally, WinNT.h (part of the Windows SDK) defines all the relevant structures and values. The lld linker is also a useful source of information; in particular COFF/Writer.cpp.

Our program will write a file laid out as shown by the figure on the right.

Contrary to what one might expect, a PE file does not start with a magic number identifying it as such. Instead, it starts out as a DOS MZ executable (see for example Bob Eager's notes on this format).

This backwards-compatibility trick means the program can be made to print a message such as "This program cannot be run in DOS" when a user does exactly that. To distinguish the PE from an old DOS executable, the value at offset 0x3C holds the offset of a "new exe header", which is where the PE part begins (on an 8-byte aligned offset).
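Seen from the other side, a tool that wants to know whether a file is a PE file follows that pointer. The sketch below shows one way to do it on a file already read into memory; find_pe_header and read_le32 are illustrative names, not part of any Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
               (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the file offset of the PE signature, or -1 if the buffer does
   not look like a PE file. */
static long find_pe_header(const uint8_t *buf, size_t len)
{
        uint32_t off;

        if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
                return -1; /* not even an MZ executable */
        }
        off = read_le32(buf + 0x3c);
        if (off > len || len - off < 4) {
                return -1; /* the offset points outside the file */
        }
        if (buf[off] != 'P' || buf[off + 1] != 'E' ||
            buf[off + 2] != 0 || buf[off + 3] != 0) {
                return -1; /* a plain DOS executable */
        }
        return (long)off;
}
```

The bounds checks matter because an old DOS executable may hold arbitrary data at offset 0x3C.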

We shall take this opportunity to put a small DOS program into our Windows executable:

static const uint8_t dos_program[] = {
0x8c, 0xc8,                    /* mov    %cs,%ax        */
0x8e, 0xd8,                    /* mov    %ax,%ds        */
0xba, 0x10, 0x00,              /* mov    $16,%dx        */
0xb4, 0x09,                    /* mov    $0x9,%ah       */
0xcd, 0x21,                    /* int    $0x21          */
0xb8, 0x01, 0x4c,              /* mov    $0x4c01,%ax    */
0xcd, 0x21,                    /* int    $0x21          */
'H', 'e', 'l', 'l', 'o', ',',  /* ($-terminated string) */
' ', 'D', 'O', 'S', ' ', 'w',
'o', 'r', 'l', 'd', '!', '$' };

The first two instructions copy the contents of the code segment register (cs) into the data segment register (ds). Two instructions are needed since there is no x86 instruction for moving directly between two segment registers. The next instruction moves the address of our string into dx. The address is relative to the data segment, which we have just made equivalent to the code segment, because the string comes right after the code.

The next two instructions invoke a DOS system call, Int 21/AH=09h, which prints the string pointed to by ds:dx to stdout. After that, the program invokes Int 21/AH=4Ch which exits the program with the return code in al.
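The value 16 loaded into dx is not arbitrary: it is the combined length of the seven instructions, so it is also the offset of the string. The following sketch repeats the stub so that it stands alone and checks that relationship; dx_operand and code_length are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* The DOS stub from above, repeated so that this sketch stands alone. */
static const uint8_t dos_program[] = {
        0x8c, 0xc8,                   /* mov    %cs,%ax        */
        0x8e, 0xd8,                   /* mov    %ax,%ds        */
        0xba, 0x10, 0x00,             /* mov    $16,%dx        */
        0xb4, 0x09,                   /* mov    $0x9,%ah       */
        0xcd, 0x21,                   /* int    $0x21          */
        0xb8, 0x01, 0x4c,             /* mov    $0x4c01,%ax    */
        0xcd, 0x21,                   /* int    $0x21          */
        'H', 'e', 'l', 'l', 'o', ',', /* ($-terminated string) */
        ' ', 'D', 'O', 'S', ' ', 'w',
        'o', 'r', 'l', 'd', '!', '$' };

/* The 16-bit immediate loaded into dx: the two bytes after opcode 0xba. */
static unsigned dx_operand(void)
{
        return dos_program[5] | (unsigned)dos_program[6] << 8;
}

/* Number of bytes taken up by the seven instructions before the string. */
static size_t code_length(void)
{
        return 2 + 2 + 3 + 2 + 2 + 3 + 2;
}
```

If an instruction were added or removed, the immediate at bytes 5 and 6 would have to change with it, since nothing relocates it for us.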

The MZ executable header looks like this:

        uint32_t dos_stub_sz = DOS_HDR_SZ + sizeof(dos_program);
        uint32_t pe_offset = align_to(dos_stub_sz, 8);

        write8(f, 'M');                /* Magic number (2 bytes) */
        write8(f, 'Z');
        write16(f, dos_stub_sz % 512); /* Last page size */
        write16(f, align_to(dos_stub_sz, 512) / 512); /* Pages in file */
        write16(f, 0);                 /* Relocations */
        write16(f, DOS_HDR_SZ / 16);   /* Size of header in paragraphs */
        write16(f, 0);                 /* Minimum extra paragraphs needed */
        write16(f, 1);                 /* Maximum extra paragraphs needed */
        write16(f, 0);                 /* Initial (relative) SS value */
        write16(f, 0);                 /* Initial SP value */
        write16(f, 0);                 /* Checksum */
        write16(f, 0);                 /* Initial IP value */
        write16(f, 0);                 /* Initial (relative) CS value */
        write16(f, DOS_HDR_SZ);        /* File address of relocation table */
        write16(f, 0);                 /* Overlay number */
        write16(f, 0);                 /* Reserved0 */
        write16(f, 0);                 /* Reserved1 */
        write16(f, 0);                 /* Reserved2 */
        write16(f, 0);                 /* Reserved3 */
        write16(f, 0);                 /* OEM id */
        write16(f, 0);                 /* OEM info */
        for (i = 0; i < 10; i++)
                write16(f, 0);         /* Reserved (10 words) */
        write32(f, pe_offset);         /* File offset of PE header. */

MZ are the initials of Mark Zbikowski, who designed the file format. For historical reasons, DOS also accepts executables starting with "ZM". According to Mark, IBM had a DOS linker for their mainframes which would output the signature bytes in reverse order, perhaps because the mainframes are big-endian computers. Since early versions of DOS only checked the file extension, not the magic bytes, those files were accepted. Thus, later versions of DOS, which do check the signature, had to accept "ZM" as an "old" signature. An example of such "ZM executables" can be found in IBM Pascal Compiler 1.00's pas1.exe. Windows does not accept files starting with "ZM" as PE files, but it will still run such DOS files!

For the purposes of an MZ header, a page is 512 bytes and a paragraph is 16 bytes. The file contents after the header (the size of which is calculated based on the "size of header", "last page size", and "pages in file" fields above) is known as the load module. The loader will load that into memory, and set up cs:ip to point to it, so that's where we write our code:

        /* DOS Program. */
        for (i = 0; i < sizeof(dos_program); i++) {
                write8(f, dos_program[i]);
        }
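The page and paragraph arithmetic can be sketched as a small function. It assumes the conventional meaning of the fields (a last page size of zero means the last page is full) and that DOS_HDR_SZ is 64, which is the sum of the fields written above; load_module_size is a hypothetical name.

```c
#include <stdint.h>

#define DOS_PAGE_SZ      512u /* an MZ "page"      */
#define DOS_PARAGRAPH_SZ 16u  /* an MZ "paragraph" */

/* Size of the load module in bytes, derived from three MZ header fields.
   The header is assumed to be smaller than the image it describes. */
static uint32_t load_module_size(uint16_t last_page_size,
                                 uint16_t pages_in_file,
                                 uint16_t header_paragraphs)
{
        uint32_t image_sz = (uint32_t)pages_in_file * DOS_PAGE_SZ;

        if (last_page_size != 0) {
                /* The last page is only partially used. */
                image_sz -= DOS_PAGE_SZ - last_page_size;
        }
        return image_sz - (uint32_t)header_paragraphs * DOS_PARAGRAPH_SZ;
}
```

For our stub, the header plus the 34-byte program come to 98 bytes, so the fields are 98, 1 and 4, and the load module works out to exactly the 34 bytes of dos_program.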

The PE part of the executable starts with the PE signature, followed by a COFF header:

        /* PE signature. */
        seek(f, pe_offset);
        write8(f, 'P');
        write8(f, 'E');
        write8(f, 0);
        write8(f, 0);

        uint32_t num_sections = 4;

        /* COFF header. */
        write16(f, 0x14c); /* Machine: IMAGE_FILE_MACHINE_I386 */
        write16(f, num_sections); /* NumberOfSections */
        write32(f, 0); /* TimeDateStamp */
        write32(f, 0); /* PointerToSymbolTable */
        write32(f, 0); /* NumberOfSymbols */
        write16(f, OPT_HDR_SZ); /* SizeOfOptionalHeader */
        write16(f, 0x103); /* Characteristics: no relocations, exec, 32-bit */

Next comes the "optional" (it's not optional in PE files) COFF header, consisting of three parts. To fill out this header, we need the size and location of all the different pieces of the executable, which we compute below. The virtual addresses of loaded data are usually specified relative to the program's image base. Such an address is called an RVA for relative virtual address.

        uint32_t headers_sz = pe_offset + PE_HDR_SZ + num_sections * SEC_HDR_SZ;

        uint32_t text_rva = align_to(headers_sz, SEC_ALIGN);
        uint32_t text_offset = align_to(headers_sz, FILE_ALIGN);
        uint32_t text_sz = sizeof(rot13_program);

        uint32_t rdata_rva = align_to(text_rva + text_sz, SEC_ALIGN);
        uint32_t rdata_offset = align_to(text_offset + text_sz, FILE_ALIGN);
        uint32_t rdata_sz = sizeof(rot13_table);

        uint32_t idata_rva = align_to(rdata_rva + rdata_sz, SEC_ALIGN);
        uint32_t idata_offset = align_to(rdata_offset + rdata_sz, FILE_ALIGN);

        uint32_t num_imports = 4;
        uint32_t iat_rva = idata_rva;
        uint32_t iat_sz = (num_imports + 1) * IAT_ENTRY_SZ;
        uint32_t import_dir_table_rva = iat_rva + iat_sz;
        uint32_t import_dir_table_sz = 2 * IMPORT_DIR_ENTRY_SZ;
        uint32_t import_lookup_table_rva = import_dir_table_rva +
                                           import_dir_table_sz;
        uint32_t name_table_rva = import_lookup_table_rva +
                        (num_imports + 1) * IMPORT_LOOKUP_TBL_ENTRY_SZ;
        uint32_t dll_name_rva = name_table_rva +
                                num_imports * NAME_TABLE_ENTRY_SZ;
        uint32_t name_table_sz = num_imports * NAME_TABLE_ENTRY_SZ + 16;
        uint32_t idata_sz = name_table_rva + name_table_sz - idata_rva;

        uint32_t bss_rva = align_to(idata_rva + idata_sz, SEC_ALIGN);
        uint32_t bss_sz = 4096;
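The key point in these calculations is that each section is aligned twice: once in memory and once in the file. Here is a minimal sketch of that idea, assuming a SectionAlignment of 0x1000 and a FileAlignment of 0x200 (values consistent with the dumpbin output later in this post); the align_to shown is one possible implementation, and struct placement and place_after are made up for the example.

```c
#include <stdint.h>

#define SEC_ALIGN  0x1000u /* assumed SectionAlignment */
#define FILE_ALIGN 0x200u  /* assumed FileAlignment    */

/* Round x up to the next multiple of a (a must be non-zero). */
static uint32_t align_to(uint32_t x, uint32_t a)
{
        return (x + a - 1) / a * a;
}

struct placement {
        uint32_t rva;    /* address in memory, relative to the image base */
        uint32_t offset; /* position in the file                          */
};

/* Place a section after a previous item that ends at prev_end_rva in
   memory and at prev_end_offset in the file. The two coordinates are
   aligned independently, so they drift apart quickly. */
static struct placement place_after(uint32_t prev_end_rva,
                                    uint32_t prev_end_offset)
{
        struct placement p;

        p.rva = align_to(prev_end_rva, SEC_ALIGN);
        p.offset = align_to(prev_end_offset, FILE_ALIGN);
        return p;
}
```

With 512 bytes of headers, .text lands at RVA 0x1000 and file offset 0x200. If the code is shorter than 512 bytes, .rdata follows at 0x2000 and 0x400, and the 256-byte table puts .idata at 0x3000 and 0x600.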

Once all that has been figured out, the optional header can be written:

        /* Optional header, part 1: standard fields */
        write16(f, 0x10b); /* Magic: PE32 */
        write8(f, 0); /* MajorLinkerVersion */
        write8(f, 0); /* MinorLinkerVersion */
        write32(f, text_sz); /* SizeOfCode */
        write32(f, rdata_sz + idata_sz); /* SizeOfInitializedData */
        write32(f, bss_sz); /* SizeOfUninitializedData */
        write32(f, text_rva); /* AddressOfEntryPoint */
        write32(f, text_rva); /* BaseOfCode */
        write32(f, rdata_rva); /* BaseOfData */

        /* Optional header, part 2: Windows-specific fields */
        write32(f, IMAGE_BASE); /* ImageBase */
        write32(f, SEC_ALIGN); /* SectionAlignment */
        write32(f, FILE_ALIGN); /* FileAlignment */
        write16(f, 3); /* MajorOperatingSystemVersion */
        write16(f, 10); /* MinorOperatingSystemVersion */
        write16(f, 0); /* MajorImageVersion */
        write16(f, 0); /* MinorImageVersion */
        write16(f, 3); /* MajorSubsystemVersion */
        write16(f, 10); /* MinorSubsystemVersion */
        write32(f, 0); /* Win32VersionValue */
        write32(f, align_to(bss_rva + bss_sz, SEC_ALIGN)); /* SizeOfImage */
        write32(f, align_to(headers_sz, FILE_ALIGN)); /* SizeOfHeaders */
        write32(f, 0); /* Checksum */
        write16(f, 3); /* Subsystem: Console */
        write16(f, 0); /* DllCharacteristics */
        write32(f, 0x100000); /* SizeOfStackReserve */
        write32(f, 0x1000); /* SizeOfStackCommit */
        write32(f, 0x100000); /* SizeOfHeapReserve */
        write32(f, 0x1000); /* SizeOfHeapCommit */
        write32(f, 0); /* LoaderFlags */
        write32(f, 16); /* NumberOfRvaAndSizes */

        /* Optional header, part 3: data directories. */
        write32(f, 0); write32(f, 0); /* Export Table. */
        write32(f, import_dir_table_rva); /* Import Table. */
        write32(f, import_dir_table_sz);
        write32(f, 0); write32(f, 0); /* Resource Table. */
        write32(f, 0); write32(f, 0); /* Exception Table. */
        write32(f, 0); write32(f, 0); /* Certificate Table. */
        write32(f, 0); write32(f, 0); /* Base Relocation Table. */
        write32(f, 0); write32(f, 0); /* Debug. */
        write32(f, 0); write32(f, 0); /* Architecture. */
        write32(f, 0); write32(f, 0); /* Global Ptr. */
        write32(f, 0); write32(f, 0); /* TLS Table. */
        write32(f, 0); write32(f, 0); /* Load Config Table. */
        write32(f, 0); write32(f, 0); /* Bound Import. */
        write32(f, iat_rva); /* Import Address Table. */
        write32(f, iat_sz);
        write32(f, 0); write32(f, 0); /* Delay Import Descriptor. */
        write32(f, 0); write32(f, 0); /* CLR Runtime Header. */
        write32(f, 0); write32(f, 0); /* (Reserved). */

The operating system and subsystem version fields are interesting. This MSDN post says the subsystem version needs to be no higher than 5.01 for Windows XP support, so it seems the numbers effectively set the minimum required operating system version. Our program only uses functions that have been in the Win32 API since it was introduced in Windows NT 3.1, whose version number is 3.10, so that's what we use.

Next come the section headers. Section 45 in the specification defines the names and characteristics that are normally used:

        /* Section header #1: .text */
        writestr8(f, ".text"); /* Name */
        write32(f, text_sz); /* VirtualSize */
        write32(f, text_rva); /* VirtualAddress */
        write32(f, align_to(text_sz, FILE_ALIGN)); /* SizeOfRawData */
        write32(f, text_offset); /* PointerToRawData */
        write32(f, 0); /* PointerToRelocations */
        write32(f, 0); /* PointerToLinenumbers */
        write16(f, 0); /* NumberOfRelocations */
        write16(f, 0); /* NumberOfLinenumbers */
        write32(f, 0x60000020); /* Characteristics: code, read, execute */

        /* Section header #2: .rdata */
        writestr8(f, ".rdata"); /* Name */
        write32(f, rdata_sz); /* VirtualSize */
        write32(f, rdata_rva); /* VirtualAddress */
        write32(f, align_to(rdata_sz, FILE_ALIGN)); /* SizeOfRawData */
        write32(f, rdata_offset); /* PointerToRawData */
        write32(f, 0); /* PointerToRelocations */
        write32(f, 0); /* PointerToLinenumbers */
        write16(f, 0); /* NumberOfRelocations */
        write16(f, 0); /* NumberOfLinenumbers */
        write32(f, 0x40000040); /* Characteristics: data, read */

        /* Section header #3: .idata */
        writestr8(f, ".idata"); /* Name */
        write32(f, idata_sz); /* VirtualSize */
        write32(f, idata_rva); /* VirtualAddress */
        write32(f, align_to(idata_sz, FILE_ALIGN)); /* SizeOfRawData */
        write32(f, idata_offset); /* PointerToRawData */
        write32(f, 0); /* PointerToRelocations */
        write32(f, 0); /* PointerToLinenumbers */
        write16(f, 0); /* NumberOfRelocations */
        write16(f, 0); /* NumberOfLinenumbers */
        write32(f, 0xc0000040); /* Characteristics: data, read, write */

        /* Section header #4: .bss */
        writestr8(f, ".bss"); /* Name */
        write32(f, bss_sz); /* VirtualSize */
        write32(f, bss_rva); /* VirtualAddress */
        write32(f, 0); /* SizeOfRawData */
        write32(f, 0); /* PointerToRawData */
        write32(f, 0); /* PointerToRelocations */
        write32(f, 0); /* PointerToLinenumbers */
        write16(f, 0); /* NumberOfRelocations */
        write16(f, 0); /* NumberOfLinenumbers */
        write32(f, 0xc0000080); /* Characteristics: uninit. data, read, write */

The code and read-only data are written at their calculated offsets:

        /* Write .text section. */
        seek(f, text_offset);
        for (i = 0; i < sizeof(rot13_program); i++) {
                write8(f, rot13_program[i]);
        }

        /* Write .rdata section. */
        seek(f, rdata_offset);
        for (i = 0; i < sizeof(rot13_table); i++) {
                write8(f, rot13_table[i]);
        }

Finally, we write the contents of the .idata section, which specifies what DLLs our program needs to access, and what functions to import from those DLLs. See Sections 61-65 in the specification. The figure on the right shows an overview of how we will write this section.

First, we write the Import Address Table (IAT). In the file, it has the same contents as the Import Lookup Table (see below), but when the operating system loads our executable, the IAT will get filled in with the actual addresses of the functions imported from DLLs.

        /* Write .idata section. */
        seek(f, idata_offset);

        /* Import Address Table (IAT):
           (Same as the Import Lookup Table) */
        write32(f, name_table_rva + 0 * NAME_TABLE_ENTRY_SZ);
        write32(f, name_table_rva + 1 * NAME_TABLE_ENTRY_SZ);
        write32(f, name_table_rva + 2 * NAME_TABLE_ENTRY_SZ);
        write32(f, name_table_rva + 3 * NAME_TABLE_ENTRY_SZ);
        write32(f, 0);

The Import Directory Table specifies what DLLs need to be loaded together with the program (just kernel32.dll in our case):

        /* Import Directory Table */
        /* kernel32.dll */
        write32(f, import_lookup_table_rva); /* ILT RVA */
        write32(f, 0); /* Time/Date Stamp */
        write32(f, 0); /* Forwarder Chain */
        write32(f, dll_name_rva); /* Name RVA */
        write32(f, iat_rva); /* Import Address Table RVA */
        /* Null terminator */
        write32(f, 0);
        write32(f, 0);
        write32(f, 0);
        write32(f, 0);
        write32(f, 0);

The Import Lookup Table specifies what functions to import from a specific DLL, either by name or by ordinal (for example, "import function number five from the DLL"). The table consists of one 32-bit word per function. If the most significant bit is set, the other bits are interpreted as an ordinal. Otherwise, they are the RVA of an entry in the Hint/Name Table (see below).

        /* Import Lookup Table */
        write32(f, name_table_rva + 0 * NAME_TABLE_ENTRY_SZ); /* GetStdHandle */
        write32(f, name_table_rva + 1 * NAME_TABLE_ENTRY_SZ); /* ReadFile */
        write32(f, name_table_rva + 2 * NAME_TABLE_ENTRY_SZ); /* WriteFile */
        write32(f, name_table_rva + 3 * NAME_TABLE_ENTRY_SZ); /* ExitProcess */
        write32(f, 0); /* Null terminator */
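To illustrate the two kinds of entries, here is a sketch of how a 32-bit lookup table entry could be decoded. The struct and function names are made up for this example; the one assumption beyond the text above is that only the low 16 bits of an ordinal entry carry the ordinal number.

```c
#include <stdint.h>

#define ILT_ORDINAL_FLAG 0x80000000u /* most significant bit of an entry */

struct ilt_entry {
        int by_ordinal;         /* non-zero: import by ordinal         */
        uint16_t ordinal;       /* valid when by_ordinal is set        */
        uint32_t hint_name_rva; /* valid when by_ordinal is not set    */
};

/* Split a raw Import Lookup Table entry into its parts. */
static struct ilt_entry decode_ilt_entry(uint32_t raw)
{
        struct ilt_entry e = { 0, 0, 0 };

        if (raw & ILT_ORDINAL_FLAG) {
                e.by_ordinal = 1;
                e.ordinal = (uint16_t)(raw & 0xffffu);
        } else {
                e.hint_name_rva = raw & 0x7fffffffu;
        }
        return e;
}
```

All four of our entries are of the second kind, since we import every function by name. An entry of 0x80000005 would instead mean "import function number five".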

The Hint/Name Table contains strings with the names of functions to import. Additionally, it provides a "hint" for looking up a name in a DLL faster. Functions exported from a DLL are sorted alphabetically to allow functions to be looked up by name using binary search, and the hint can be used as a starting point for the search.

        /* Hint/Name Table */
        write16(f, 0); /* Hint */
        writestr16(f, "GetStdHandle"); /* Name */
        write16(f, 0); /* Hint */
        writestr16(f, "ReadFile"); /* Name */
        write16(f, 0); /* Hint */
        writestr16(f, "WriteFile"); /* Name */
        write16(f, 0); /* Hint */
        writestr16(f, "ExitProcess"); /* Name */
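We set all hints to zero, which is always safe because a hint is only a first guess. The sketch below shows how a lookup could use one: try the hinted index, then fall back to binary search. It is an illustration of the idea only, not the actual Windows loader, and find_export is a hypothetical function.

```c
#include <stddef.h>
#include <string.h>

/* Look up `name` in an alphabetically sorted array of exported names,
   trying the hint first. Returns the index of the name, or -1 if the
   name is not exported. */
static long find_export(const char *const *names, size_t count,
                        const char *name, size_t hint)
{
        size_t lo = 0, hi = count;

        if (hint < count && strcmp(names[hint], name) == 0) {
                return (long)hint; /* the hint was right: no search needed */
        }
        while (lo < hi) { /* binary search over the range [lo, hi) */
                size_t mid = lo + (hi - lo) / 2;
                int cmp = strcmp(names[mid], name);

                if (cmp == 0) {
                        return (long)mid;
                } else if (cmp < 0) {
                        lo = mid + 1;
                } else {
                        hi = mid;
                }
        }
        return -1;
}
```

A wrong hint costs one extra string comparison, while a correct one skips the search entirely.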

Finally, the name of the DLL (referenced by the Import Directory Table above) needs to be written somewhere. We put it right after the Hint/Name Table:

        /* Put the dll name here too; we've got to write it somewhere. */
        writestr16(f, "kernel32.dll");

And that marks the end of our program.

The code of the whole program is available in make_rot13_win.c. On Linux or Mac, we can run and inspect the output like this:

$ gcc make_rot13_win.c -o make_rot13_win
$ ./make_rot13_win rot13.exe
$ file rot13.exe
rot13.exe: PE32 executable (console) Intel 80386, for MS Windows

We can try the DOS stub by running it in DOSBox:

$ dosbox rot13.exe

[..]
C:\>ROT13.EXE
Hello, DOS world!

On Windows, we can build and inspect the output like this in a Visual Studio Developer Command Prompt (I used the free Visual Studio Express):

c:\tmp>cl /nologo make_rot13_win.c
make_rot13_win.c

c:\tmp>make_rot13_win rot13.exe

c:\tmp>dumpbin /nologo /all rot13.exe


Dump of file rot13.exe

PE signature found

File Type: EXECUTABLE IMAGE

FILE HEADER VALUES
             14C machine (x86)
               4 number of sections
               0 time date stamp
               0 file pointer to symbol table
               0 number of symbols
              E0 size of optional header
             103 characteristics
                   Relocations stripped
                   Executable
                   32 bit word machine

[..]

In particular, we can see if we got the .idata section right:

[..]

SECTION HEADER #3
  .idata name
      A8 virtual size
    3000 virtual address (00403000 to 004030A7)
     200 size of raw data
     600 file pointer to raw data (00000600 to 000007FF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
C0000040 flags
         Initialized Data
         Read Write

[..]

  Section contains the following imports:

    kernel32.dll
                403000 Import Address Table
                40303C Import Name Table
                     0 time date stamp
                     0 Index of first forwarder reference

                    0 GetStdHandle
                    0 ReadFile
                    0 WriteFile
                    0 ExitProcess
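The addresses in this output can be cross-checked against the layout calculations from earlier. The sketch below redoes them in one function; the entry sizes are assumptions (4 bytes per table slot, 20 bytes per directory entry, and 18 bytes per name entry made of a 2-byte hint and a 16-byte padded name), and struct idata_layout and layout_idata are names invented for this example.

```c
#include <stdint.h>

/* Entry sizes in bytes (assumed values, see the text above). */
enum {
        IAT_ENTRY_SZ               = 4,     /* one 32-bit slot per import */
        IMPORT_DIR_ENTRY_SZ        = 20,    /* five 32-bit fields         */
        IMPORT_LOOKUP_TBL_ENTRY_SZ = 4,     /* one 32-bit word per import */
        NAME_TABLE_ENTRY_SZ        = 2 + 16 /* hint plus padded name      */
};

struct idata_layout {
        uint32_t iat_rva;
        uint32_t import_dir_table_rva;
        uint32_t import_lookup_table_rva;
        uint32_t name_table_rva;
        uint32_t dll_name_rva;
        uint32_t size;
};

/* Lay out the pieces of .idata in the order in which they are written. */
static struct idata_layout layout_idata(uint32_t idata_rva,
                                        uint32_t num_imports)
{
        struct idata_layout l;

        l.iat_rva = idata_rva;
        l.import_dir_table_rva = l.iat_rva +
                                 (num_imports + 1) * IAT_ENTRY_SZ;
        l.import_lookup_table_rva = l.import_dir_table_rva +
                                    2 * IMPORT_DIR_ENTRY_SZ;
        l.name_table_rva = l.import_lookup_table_rva +
                           (num_imports + 1) * IMPORT_LOOKUP_TBL_ENTRY_SZ;
        l.dll_name_rva = l.name_table_rva +
                         num_imports * NAME_TABLE_ENTRY_SZ;
        l.size = l.dll_name_rva + 16 - idata_rva; /* 16-byte DLL name */
        return l;
}
```

For an .idata section at RVA 0x3000 with four imports, this puts the Import Lookup Table at 0x303C and gives a section size of 0xA8. With the image base of 0x400000 added, those are the values dumpbin reports.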

Finally, let's try our generated program:

C:\tmp>echo Uryyb, Jvaqbjf jbeyq! | rot13
Hello, Windows world!

It is a special feeling watching the computer breathe life into the numbers we just wrote into that file.

Exercises

Write a program that takes as input a parameter x, and generates an executable that performs ROTx encryption.
Refactor one of the make_rot13 programs by extracting a function (for example, write_elf) that writes an executable with code and data specified by the function's parameters.
By mapping the __TEXT segment from file offset zero, the Darwin ROT13 executable avoids the need for padding between the file headers and text segment. Apply the same technique to reduce the size of the ELF executable.
FreeBSD uses the ELF file format but the same system call ABI as in our Darwin example. Write a program that puts the Darwin version of the ROT13 machine code into an ELF file for FreeBSD. (You will need to change the EI_OSABI field in the ELF header and the addresses of buffer and rot13_table in the machine code.)
In the Darwin program, use the LC_UNIXTHREAD command to set more useful initial register values, and use that to reduce the size of the machine code.
A program that produces a copy of its own source code as output is called a quine. In a similar spirit, create an executable that writes itself to stdout.
