Running Haskell on bare iron

Project Summary

In this document, we'll explain how to turn GHC-compiled executables into binaries that are fit to be loaded by the Grub boot loader. The process described here could, of course, be applied to binaries built by other compilers for other languages.

Tools and environment

We will be working in Linux running on an x86 host machine.

Background

Let's begin by talking about what a GHC-compiled binary looks like.

Consider the following Haskell program:

main :: IO ()
main = do putStrLn "Hello!"
          putStrLn "Bye!"

We can compile it at the command line like so:

$ ghc hello.hs

which then leads to several new files:

$ ls
a.out  hello.hi  hello.hs  hello.o

Of course, our program produces the following predictable output:

$ ./a.out
Hello!
Bye!

Now, we can easily see that this program has some external library dependencies:

$ nm -u a.out
         w _Jv_RegisterClasses
         U __ctype_b_loc@@GLIBC_2.3
         U __errno_location@@GLIBC_2.0
         w __gmon_start__
         U __gmp_set_memory_functions
         U __gmpn_cmp
         ...[Several lines omitted]...
         U __libc_start_main@@GLIBC_2.0
         U __strtod_internal@@GLIBC_2.0
         U exit@@GLIBC_2.0
         ...[Several lines omitted]...
         U fork@@GLIBC_2.0
         U fprintf@@GLIBC_2.0
         U fputc@@GLIBC_2.0
         U free@@GLIBC_2.0
         U fwrite@@GLIBC_2.0
         ...[Several lines omitted]...
         U malloc@@GLIBC_2.0
         U memcpy@@GLIBC_2.0
         ...[Several lines omitted]...
         U setitimer@@GLIBC_2.0
         U sigaction@@GLIBC_2.0
         ...[Several lines omitted]...
         U strlen@@GLIBC_2.0
         U strncpy@@GLIBC_2.0
         ...[Several lines omitted]...

The __gmp symbols come from the GNU MP Bignum library, which the Haskell runtime uses for its arbitrary-precision integer arithmetic. Several other symbols (such as __libc_start_main) are referenced as part of the Linux program start-up contract. Finally, we have lots of functions that rely on operating-system services (such as malloc and free, which obtain their memory from the kernel, and sigaction), as well as several self-contained libc calls (such as strlen and strncpy).

Now let's talk about what Grub expects from the binaries it loads.

Grub, a fairly sophisticated (or bloated, depending on your perspective) boot loader, is able to load kernels whose binaries are ELF-formatted. An ELF-formatted file comes with several headers that tell the loader (in this case, Grub) where each part of the binary belongs in memory and where its entry point is. It can also carry symbol tables that record the locations of various named symbols, as well as information about unresolved symbols (such as the symbols we listed using nm -u).

When Grub loads an ELF-formatted kernel, it looks for the "Grub header" (called the Multiboot header in the Multiboot specification), which it expects to be located near the start of the binary. To be precise, the header must lie entirely within the first 8192 bytes of the kernel image and be aligned on a 4-byte boundary. For example, if the kernel's ELF headers designated the start symbol to be the kernel's entry point, then a valid implementation of the Grub header can be found here.
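
To give a rough idea of what such a header contains, here is a minimal sketch in C, assuming the Multiboot (version 1) layout: a magic number, a flags word, and a checksum chosen so that the three fields sum to zero. The struct name, the variable name mb_header, and the section name .multiboot are assumptions made for this example, and it relies on GCC attributes. A linker script (not shown) would still be needed to place that section early enough in the image, and the entry code itself is usually written in assembly because it has to set up a stack first.

```c
#include <stdint.h>

/* Values defined by the Multiboot (version 1) specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
/* Bit 0: align boot modules on page boundaries; bit 1: pass memory info. */
#define MULTIBOOT_HEADER_FLAGS 0x00000003u

/* The three mandatory header fields (the struct name is hypothetical). */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;  /* magic + flags + checksum must be 0 (mod 2^32) */
};

/* Placed in its own section (an assumed name) so that a linker script
 * can put it near the start of the image; "used" stops the compiler
 * from discarding an object that nothing references. */
__attribute__((section(".multiboot"), aligned(4), used))
const struct multiboot_header mb_header = {
    MULTIBOOT_HEADER_MAGIC,
    MULTIBOOT_HEADER_FLAGS,
    (uint32_t)(0u - (MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS))
};
```

The optional address fields of the header are left out here because Grub can take load addresses from the ELF headers.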

What to do

So, if we want our Haskell binary to be Grub-loadable, here's what we have to do:

  1. Write our own implementations of the unresolved functions that we found using nm -u;
  2. Write a Grub header;
  3. Build our Haskell program into object files;
  4. Link the Haskell object files against our Grub header, our library implementations, and the GHC Runtime (HSbase_cbits, HSbase, and HSrts), being sure to tell the linker to set our Grub header as the entry point.

Note that GHC has several different runtimes you can choose from (threaded and profiling variants, for example); however, the more advanced runtimes will have more library dependencies.
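
Step 1 is where most of the work lies. As a sketch of what such replacements might look like, here are stand-ins for strlen, memcpy, malloc, and free. The k_ prefix is an assumption made so that the code can be compiled and tried on a hosted Linux system without clashing with libc; in the kernel build the functions would carry the names from the nm -u listing. The bump allocator, its heap size, and its 16-byte alignment are illustrative choices only, and whether so simple an allocator is enough for the GHC runtime is not something this sketch establishes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for strlen: count bytes up to the terminating NUL. */
size_t k_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* Stand-in for memcpy: byte-wise copy of non-overlapping regions. */
void *k_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}

#define K_HEAP_SIZE (64u * 1024u)  /* arbitrary size for illustration */
#define K_ALIGN     16u            /* assumed allocation alignment */

/* A fixed pool of memory that lives inside the kernel image. */
static unsigned char k_heap[K_HEAP_SIZE] __attribute__((aligned(16)));
static size_t k_heap_used = 0;

/* Stand-in for malloc: a bump allocator that hands out successive
 * aligned chunks of the pool and returns NULL once it is exhausted. */
void *k_malloc(size_t n) {
    size_t rounded = (n + (K_ALIGN - 1u)) & ~(size_t)(K_ALIGN - 1u);
    if (rounded < n || rounded > K_HEAP_SIZE - k_heap_used) {
        return NULL;  /* size overflow, or not enough room left */
    }
    void *p = &k_heap[k_heap_used];
    k_heap_used += rounded;
    return p;
}

/* Stand-in for free: a bump allocator never reclaims memory. */
void k_free(void *p) {
    (void)p;
}
```

Functions such as sigaction, fork, and setitimer have no such self-contained equivalent; what they should do on bare hardware is a design decision for the kernel.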

Also note that the code in our Grub header will need to call the Haskell code at some point (say, right away). The signature for this call is precisely what you'd expect, given that GHC thinks it is trying to build a Linux binary:

  int main(int argc, char *argv[]);
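
A sketch of that call, under some assumptions: the helper call_entry, the placeholder program name "kernel", and the stand-in demo_entry are all hypothetical names introduced for this example. The entry point is passed as a function pointer so that the sketch can be compiled and tried on an ordinary hosted system, where main is already taken.

```c
#include <stddef.h>

/* The signature from the text above, as a function-pointer type. */
typedef int (*entry_fn)(int argc, char *argv[]);

/* There is no command line on bare hardware, so we make one up:
 * a placeholder program name followed by the terminating NULL. */
static char boot_name[] = "kernel";
static char *boot_argv[] = { boot_name, NULL };

/* Hypothetical helper: call the entry point the way a hosted C
 * start-up routine would, and hand back its exit status. */
int call_entry(entry_fn entry) {
    return entry(1, boot_argv);
}

/* Stand-in for the GHC-generated main, used only to exercise the
 * sketch: it returns 0 if the arguments look well-formed. */
int demo_entry(int argc, char *argv[]) {
    return (argc == 1 && argv[0] != NULL && argv[argc] == NULL) ? 0 : 1;
}
```

In the kernel itself, the start-up code would call call_entry(main) once a stack is in place and then halt, since there is nothing to return to.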

Shortcuts

Here is one useful shortcut that I use frequently in the Kinetic operating system:

There are many library calls that are small, isolated functions. I'm talking about things like strncpy -- functions that don't actually depend on the underlying operating system or external libraries.

Libc already has some wonderful implementations of these functions, which can easily be extracted in a ready-to-link-against form. For example,

$ ar x /usr/lib/libc.a strncmp.o

If you make a habit of doing this, please be sure to consult the licenses of the software you're pilfering from.