I’ve occasionally wondered how it is that a computer gets from its power-on state to running an operating system. Today I decided to take a look at a small piece of that puzzle by examining a bootloader for an embedded x86 system I was hoping to reverse engineer at some point in the future. This particular machine runs the pSOS real-time operating system on a 486. Cursory examination and googling did not tell me where the firmware is actually loaded into memory, so I opted to break out IDA Pro and take a look at what goes on.

I don’t have access to the BIOS, unfortunately, so I was not able to examine that piece of the puzzle. I’ll save that for a later date when I’m not trying to finish my dissertation. Fortunately, the BIOS functionality seems to be pretty standard. What follows is a lightly edited version of the notes I took while reverse engineering the bootloader.

  1. The BIOS loads the master boot record (MBR), which is sector 1 of the internal CF flash, to 0000:7c00 and then jumps to the bootloader there.
  2. The bootloader loads 16 sectors starting with sector 26 to 0a00:0000 using the interrupt handler for int 13h set up by the BIOS. This is the root directory structure for the FAT file system.
  3. It looks up the first cluster number for the PSOSBOOT.SYS file (cluster 2).
  4. The first sector of this cluster is loaded to 0a00:0000 which overwrites the root directory.
  5. Next, es is set to the first word of 0a00:0000 which is 0x07e0.
  6. This value is shifted left by 4 and 4 is added to it (to get 0x7e04) which is then written to 0000:7d14. The 4 bits that were shifted off the end of the word are written into the least significant 4 bits of 0000:7d16. The 4 bits don’t change anything since they’re just 0 and 0000:7d16 was already 0. This is self-modifying code! The result is that the last instruction executed by the bootloader is a jmp large far ptr 0010:7e04 (according to IDA).
  7. Next, it needs to load the OS (or a tertiary bootloader) into RAM. It does this at 07e0:0000 (linear address 0x7e00), using the segment value read in step 5. It already looked up the first cluster number for PSOSBOOT.SYS, so it can immediately load it. It loops through the clusters corresponding to this file, loading them sequentially into memory.
  8. A global descriptor table is constructed at address 0000:5000. The table is 4 entries long. The zeroth is always ignored by the processor. The first and second are set to have a base address of 0x00000000 and a limit of 0xffffffff. The first is a data segment that is read/write whereas the second is a code segment that is execute/read. The third entry is all zeros.
  9. (Maskable) interrupts are disabled and the code enables protected mode by setting the PE bit (bit 0) of the cr0 register. At this point, the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide says that “[r]andom failures can occur” if a far jump or far call is not issued immediately after setting this bit in cr0. This code does not do that. Instead, it performs a short jump to the following instruction and will not actually set the cs register until step 12.
  10. The ds, es, fs, gs, and ss segment registers are loaded with the selector for the first (data) segment.
  11. Next comes a sequence of 4 instructions whose purpose eludes me:

    mov	eax, esp
    xor	bx, bx
    mov	sp, bx
    mov	esp, eax
    

    As far as I can tell, the only lasting effects are to clear bx (the low 16 bits of ebx) and to leave a copy of esp in eax. I tested it with a simple program.

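    #include <stdio.h>

    /* Build as 32-bit x86 code (for example with gcc -m32): in 64-bit
       mode, writing esp would truncate the real stack pointer, rsp. */
    int main(void)
    {
            int eax;
            short bx;
            short sp;
            int esp;

            /* Run the four instructions from the bootloader, then copy
               the registers into C variables so they can be printed. */
            asm volatile("movl %%esp, %%eax\n\t"
                "xorw %%bx, %%bx\n\t"
                "movw %%bx, %%sp\n\t"
                "movl %%eax, %%esp\n\t"
                "movl %%eax, %0\n\t"
                "movw %%bx, %1\n\t"
                "movw %%sp, %2\n\t"
                "movl %%esp, %3"
                : "=r"(eax), "=r"(bx), "=r"(sp), "=r"(esp)
                :
                /* esp is restored before the asm ends, so it is not
                   listed as a clobber. */
                : "eax", "bx");
            printf("eax = %08x\n"
                   "bx  =     %04hx\n"
                   "sp  =     %04hx\n"
                   "esp = %08x\n", eax, bx, sp, esp);
            return 0;
    }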
    

    The result is exactly what I expected: eax and esp have the same value, bx is zero, and sp is the bottom half of esp. Of course, I was not running this in the strange state between entering protected mode and loading the cs register. My test ran with virtual memory and a whole host of other environmental differences, such as running in ring 3. If anyone has any idea why these instructions are here, I’d love to know.
  12. Finally, it executes a far jmp to 10:7e04 (this is the instruction that was modified in step 6 above), which sets the cs register to the selector for the second segment and jumps to address 0x7e04.

Now I need to look at what is loaded at 0x7e00…but that can wait for another day.