Scripts for bootstrapping various programming languages

Deprecation

Just use live-bootstrap; they have a lot more figured out.

This is only of historical interest to me now.

GCC Bootstrap

This is a collection of notes and utilities related to bootstrapping the GNU Compiler Collection, as well as necessary GNU utilities, from as little as possible, with the goal of bootstrapping any GNU/Linux distribution from these, such as Linux From Scratch or Gentoo.

Background

C compilers, being as old, complex and utterly foundational as they are, don't have nearly as clear or well-documented a bootstrap path as newer languages. How exactly they came to be has mostly been lost to history, probably because of how many small iterative steps were taken before a consensus was reached on what the language would be.

Nowadays, you'd think you could bootstrap C from just an assembler, and while you can, most assemblers are written in C for the sake of architecture independence. You also need an environment to run this assembler in, to create and manage files, and that is often written in C as well. To fully bootstrap, you would have to write a kernel, assembler and compiler from scratch, all of which (especially the kernel) would be tied to a specific set of hardware, which few other people would have. That's rather inconvenient.

However, there's a more convenient way to go about this that satisfies me, mostly: reducing the size of the binaries required to build a "full" C compiler as much as possible. This makes it possible to run them on existing kernels and systems, while keeping the binaries as inspectable, simple and hand-writeable as possible, so that one day there could be a computer that bootstraps itself from hand-written machine code.

Enter GNU Mes, a very small Scheme implementation (mes) written in C, a C library (mes-libc), and a C compiler (mescc) written in said Scheme. The mes binary comes in at around 108 kB and is fully self-hosting, requiring only a Linux(-compatible) kernel and an x86 processor to run.

This compiler has already been used to successfully build a full GNU/Linux distribution, as described in a Guix recipe. However, that source file is rather useless for anyone not using Guix. The bootstrap process, complex as it is, doesn't explain what is being done, does a lot of things specific to how Guix works, and generally isn't very readable to anyone not familiar with functional package managers and/or Scheme.

I seek to "fix" this by documenting the exact manual steps involved in this bootstrap, providing a regular shell script that can be used by anyone, and hopefully helping make "Linux From Scratch" just that tiny bit more "From Scratch" than it already is.

Goals

For sanity, this bootstrap's only goal is to provide just enough to build a modern GCC without any intermediate compiler/tool version hops.

A note about differences from Guix

While I've tried to keep most of the bootstrap process intact compared to Guix, a lot of steps have been changed to simplify the instructions, to remove things that are unnecessary since this isn't Guix, and to reduce the functionality required from the bootstrap utilities. It's hard to describe the exact changes, because there are a lot of them. The main things that remain unchanged are (most of) the software versions and the general bootstrap path.

Every step in the build_bootstrap.sh script has a comment with the name of the equivalent Guix package definition. The approximate revision these scripts are based on is this.
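To give a rough idea of what "a step with a comment naming the Guix package" can look like, here is a minimal sketch. It is not copied from build_bootstrap.sh: the helper `run_step`, the function `build_example`, the step name `example-1.0` and the package name `example-boot0` are all placeholders made up for illustration.

```shell
#!/bin/sh
# Hypothetical layout of a bootstrap step; NOT taken from build_bootstrap.sh.
set -e

# run_step NAME COMMAND...: announce a step, then run the given command.
run_step() {
    step_name=$1
    shift
    echo ">>> building $step_name"
    "$@"
}

# Guix equivalent: example-boot0   (placeholder package name)
build_example() {
    # A real step would unpack, patch, configure, build and install here.
    echo "example step body"
}

run_step example-1.0 build_example
```

Keeping the Guix name in a comment directly above each step makes it easy to look up the matching package definition when a step misbehaves.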

Future goals

While I consider this bootstrap complete as is, this isn't the end of the road for smaller bootstrap binaries. As part of the Bootstrappable project, people are working on bootstrapping Mes with M2-Planet, which in turn can be built with a sub-kB hex assembler, but it's not yet clear when or how this will be finished, as it's all still a work in progress.

Additionally, while reimplementing the Guix bootstrap process I took care to simplify some things, including the functionality required of some tools, but there's still room for improvement. Guix itself might also rewrite or simplify its process in the future, and those changes might be passed down to here. I don't know.

In any case, anything that can be changed to reduce the number of steps in the bootstrap, reduce the number of patches/hacks necessary or reduce the binary footprint is welcome.

For more tangible goals, here's a small todo:

  • Figure out if the coreutils precursors (fileutils, textutils and sh-utils) can be built any earlier to further reduce reliance on busybox.
  • Revisit glibc-2.16.0, as the build breaks(?) when rebuilt with gcc-4.9.4, and the patches are mildly ugly.

Note on busybox

Busybox is used to provide the initial toolset because of its small size as a statically-linked binary, but it isn't rebuilt during the bootstrap. Rebuilding it would greatly simplify the instructions by providing a ton of auxiliary tools at once, but I haven't been able to find a single version that builds with gcc-2.95.2/glibc-2.2.5, so the GNU tools are used during the build instead.
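For context on how a single busybox binary can stand in for a whole toolset: busybox picks the applet to run based on the name it was invoked as, so one static binary plus a directory of symlinks is enough. The sketch below shows one way to set that up; the function name `link_applets` and the paths in the usage comment are placeholders, not what the bootstrap script actually does.

```shell
#!/bin/sh
# Sketch only: expose a single busybox binary as several tools via symlinks.
set -e

# link_applets BUSYBOX BINDIR APPLET...
# Creates BINDIR/APPLET symlinks that all point at the one BUSYBOX binary.
link_applets() {
    busybox_bin=$1
    bin_dir=$2
    shift 2
    mkdir -p "$bin_dir"
    for applet in "$@"; do
        ln -sf "$busybox_bin" "$bin_dir/$applet"
    done
}

# Example usage (placeholder paths):
#   link_applets /bootstrap/busybox /bootstrap/bin sh ls cp mkdir
#   PATH=/bootstrap/bin
```

Listing the applets explicitly, rather than linking everything busybox offers, keeps it obvious which tools the early steps actually depend on.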