From 2485b5c6f3209e58b6a2d1aa6477dee4ff19a649 Mon Sep 17 00:00:00 2001 From: mid-kid Date: Tue, 18 Oct 2022 21:14:28 +0200 Subject: [PATCH] Archive gcc bootstrap This commit was never properly finalized, might not even work --- gcc/README.md | 61 +++++++++++++++++++++++++++++++ gcc/build.sh | 12 +++--- gcc/build_binaries.sh | 4 +- gcc/build_bootstrap.sh | 27 +++++++++++++- gcc/build_cross.sh | 9 +++-- gcc/notes/gentoo/gentoo_notes.txt | 2 - 6 files changed, 102 insertions(+), 13 deletions(-) create mode 100644 gcc/README.md diff --git a/gcc/README.md b/gcc/README.md new file mode 100644 index 0000000..8fbc0be --- /dev/null +++ b/gcc/README.md @@ -0,0 +1,61 @@ +Deprecation +=========== + +Just use [live-bootstrap](https://github.com/fosslinux/live-bootstrap), they got a lot more figured out. + +This is just historical interest for me now. + + +GCC Bootstrap +============= + +This is a collection of notes and utilities related to bootstrapping the GNU Compiler Collection, as well as necessary GNU utilities, from as little as possible, with the goal of bootstrapping any GNU/Linux distribution from these, such as Linux From Scratch or Gentoo. + +Background +---------- + +C compilers, being as old, complex and utterly foundational as they are, don't have nearly as clear and historically documented of a bootstrap path as newer languages. How exactly they came to be has been mostly lost to history, and probably because of how many small iterative steps were taken before we reached a consensus on what the language would be. + +Nowadays, you'd think you can bootstrap C from simply an assembler, and while you _can_, most assemblers are written in C for the sake of architecture-independency, and you also need an environment to run this assembler in, to create and manage files, which is often also written in C. 
To fully bootstrap, you would have to write a kernel, assembler and compiler from scratch, all of which (especially the kernel) would be tied to a specific set of hardware, which few other people would have. That's rather inconvenient. + +However, there's a more convenient way to go about this, that satisfies me, mostly. That is, by simply reducing the size of the binaries required to build a "full" C compiler as much as possible. This makes it possible to run these on existing kernels and systems, while making the binaries as inspectable, simple and hand-writeable as possible, so that one day there _could_ be a computer that bootstraps itself from hand-written machine code. + +Enter [GNU Mes](https://www.gnu.org/software/mes/), a very small Scheme implementation (mes) written in C, a C library (mes-libc), and a C compiler (mescc) written in said Scheme. The mes binary comes in at around 108kB, and is fully self-hosting, requiring only a Linux(-compatible) kernel to run, and an x86 processor. + +This compiler has already been used to successfully build a full GNU/Linux distribution, as described in [a Guix recipe](https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/commencement.scm), but this source file is rather useless for anyone not using Guix, and the bootstrap process, being as complex as it is, doesn't provide any explanations for what is being done, does a lot of things specific to how Guix works, and generally isn't very readable to anyone not familiar with functional package managers and/or Scheme. + +I seek to "fix" this, by documenting the exact manual steps involved in this bootstrap, providing a regular shell script that can be used by _anyone_, and hopefully helping make "Linux From Scratch" just that tiny bit more "From Scratch" than it already is. + + +Goals +----- + +For sanity, this bootstrap's only goal is to provide just enough to build a modern GCC without any intermediate compiler/tool version hops.
+ + +A note about differences from Guix +---------------------------------- + +While I've tried to keep most of the bootstrap process intact compared to Guix, a lot of steps have been changed to either simplify the instructions, remove things that are unnecessary since this isn't Guix, and simplify the functionality required from the bootstrap utilities. It's hard to describe the exact changes, because there's a lot. The main thing that remains unchanged is (most of) the software versions, and the general bootstrap path. + +Every step in the `build_bootstrap.sh` script has a comment with the name of the equivalent Guix package definition. The approximate revision these scripts are based on is [this](https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/commencement.scm?id=b85863f7ce99d05205e57358b36ff50656cca08b). + + +Future goals +------------ + +While I consider this bootstrap complete as is, this isn't the end of the road for smaller bootstrap binaries. As part of the [Bootstrappable](http://bootstrappable.org/) project, people are working on [bootstrapping Mes with M2-Planet](https://github.com/oriansj/mes-m2), which in turn can be built with [a sub-kB hex assembler](https://github.com/oriansj/mescc-tools-seed), but it's not yet clear when or how this will be finished, as it's all still work-in-progress. + +Additionally, while in the process of reimplementing the Guix bootstrap process I took care to simplify some things, and the requirements upon the functionality of some tools, there's still room for improvement, and Guix itself might also rewrite or simplify their process in the future, which might be passed down to here. I don't know. + +In any case, anything that can be changed to reduce the number of steps in the bootstrap, reduce the number of patches/hacks necessary or reduce the binary footprint is welcome.
+ +For more tangible goals, here's a small todo: +- Figure out if the coreutils precursors, fileutils, textutils and sh-utils, can be built any earlier to further reduce reliance on busybox. +- Revisit glibc-2.16.0, as the build breaks(?) when rebuilt with gcc-4.9.4, and the patches are mildly ugly. + + +Note on busybox +--------------- + +While busybox is used to provide the initial toolset, because of its small size as a statically-linked binary, it isn't rebuilt during the bootstrap. While rebuilding it would greatly simplify the instructions by building a ton of auxiliary tools at once, I haven't been able to find a single version that builds with gcc-2.95.2/glibc-2.2.5, so the GNU tools are used during the build, instead. diff --git a/gcc/build.sh b/gcc/build.sh index e6028c8..76711c6 100755 --- a/gcc/build.sh +++ b/gcc/build.sh @@ -5,12 +5,14 @@ set -e #qemu=qemu-i386 qemu= +NPROC="${NPROC:-$(nproc)}" + rm -rf system mkdir -p builds # Stage 1: Build matching mes and busybox binaries -./build_binaries.sh -mv system/binaries.tar.gz builds/ # Will be regenerated later, to match.
+NPROC="$NPROC" ./build_binaries.sh +mv system/binaries.tar.gz builds/ # Stage 2: Bootstrap system from these mkdir -p system/sources/ @@ -19,13 +21,13 @@ cp build_bootstrap.sh system/sources/ mkdir system/dev system/tmp mknod system/dev/null c 1 3 mknod system/dev/tty c 5 0 -$qemu system/bin/busybox chroot system /bin/busybox env -i NPROC="$(nproc)" /bin/busybox sh /sources/build_bootstrap.sh +$qemu system/bin/busybox chroot system /bin/busybox env -i NPROC="$NPROC" /bin/busybox sh /sources/build_bootstrap.sh mv system/bootstrap.tar.gz builds/ # Stage 2.5 (optional): Rebuild bootstrap binaries to have a "clean" archive if [ "$1" = double_bootstrap ]; then cp build_binaries.sh binaries.sha1 busybox-config system/ -$qemu system/bootstrap/bin/chroot system /bootstrap/bin/env -i NPROC="$(nproc)" PATH=/bootstrap/bin /bootstrap/bin/sh /build_binaries.sh +$qemu system/bootstrap/bin/chroot system /bootstrap/bin/env -i NPROC="$NPROC" PATH=/bootstrap/bin /bootstrap/bin/sh /build_binaries.sh mv system/system/binaries.tar.gz builds/ rm system/build_binaries.sh system/binaries.sha1 system/busybox-config rm -rf system/system/ @@ -33,5 +35,5 @@ fi # Stage 3: Cross-compile system for x86_64 cp build_cross.sh system/sources -$qemu system/bootstrap/bin/chroot system /bootstrap/bin/env -i NPROC="$(nproc)" /bootstrap/bin/sh /sources/build_cross.sh +$qemu system/bootstrap/bin/chroot system /bootstrap/bin/env -i NPROC="$NPROC" /bootstrap/bin/sh /sources/build_cross.sh mv system/system/cross.tar.gz builds/ diff --git a/gcc/build_binaries.sh b/gcc/build_binaries.sh index 5c09950..d33e842 100755 --- a/gcc/build_binaries.sh +++ b/gcc/build_binaries.sh @@ -37,7 +37,7 @@ tar xf "$dir_sources/mes-$version_mes.tar.gz" # First, we build a native mes, "mes-gcc". # This allows us to cross-build everything on systems that mes doesn't support. 
- gcc -O2 -std=gnu99 -w -lrt -o mes-gcc -DMES_VERSION='""' -DSYSTEM_LIBC=1 -Iinclude \ + gcc -w -O2 -std=gnu99 -fcommon -lrt -o mes-gcc -DMES_VERSION='""' -DSYSTEM_LIBC=1 -Iinclude \ lib/mes/eputs.c \ lib/mes/fdgetc.c \ lib/mes/fdputc.c \ @@ -117,7 +117,7 @@ tar xf "$dir_sources/busybox-$version_busybox.tar.bz2" # GCC-4.6.4's results aren't reproducible across machines for some reason, # so this bootstrap uses GCC-4.9.4. This increases the build time a little. ( cd binutils - CFLAGS='-O2 -w' ./configure \ + CFLAGS='-O2 -w -fcommon' ./configure \ --target=i686-bootstrap-linux-gnu \ --prefix="$prefix" \ --with-sysroot="$prefix" \ diff --git a/gcc/build_bootstrap.sh b/gcc/build_bootstrap.sh index 202fbb5..0237cd7 100755 --- a/gcc/build_bootstrap.sh +++ b/gcc/build_bootstrap.sh @@ -411,6 +411,9 @@ bzcat binutils-2.14.tar.bz2 | tar x bfd/configure > /tmp/sed; mv /tmp/sed bfd/configure chmod +x configure bfd/configure + # Force deterministic AR output + patch -p1 -i ../binutils-2.14-force-deterministic.patch + CC='tcc -D__GLIBC_MINOR__=6' AR='tcc -ar' ./configure \ --host=i686-pc-linux-gnu \ --prefix=/gcc2 \ @@ -508,6 +511,9 @@ bzcat binutils-2.14.tar.bz2 | tar x bfd/configure > /tmp/sed; mv /tmp/sed bfd/configure chmod +x configure bfd/configure + # Force deterministic AR output + patch -p1 -i ../binutils-2.14-force-deterministic.patch + ./configure \ --host=i686-pc-linux-gnu \ --prefix=/gcc2 \ @@ -710,6 +716,9 @@ rm -rf /bootstrap rm -rf binutils-2.20.1 tar jxf binutils-2.20.1a.tar.bz2 ( cd binutils-2.20.1 + # Force deterministic AR output + patch -p1 -i ../binutils-2.20.1-force-deterministic.patch + ./configure \ --build=i686-pc-linux-gnu \ --prefix=/bootstrap \ @@ -732,10 +741,16 @@ tar jxf glibc-2.16.0.tar.bz2 ( cd glibc-2.16.0 patch -p1 -i ../glibc-boot-2.16.0.patch + # Fix hardcode of /bin/pwd + sed -i -e 's@/bin/pwd@pwd@g' configure + + # Make build deterministic + sed -i -e 's/__DATE__//g' -e 's/__TIME__//g' nscd/nscd_stat.c + # This can't be rebuilt with the 
final gcc and glibc, for some reason # Possibly the configure flags aren't suitable? mkdir build && cd build - CC='/gcc46/bin/gcc -DBOOTSTRAP_GLIBC=1 -L/gcc2/lib' ../configure \ + CC='/gcc46/bin/gcc -L/gcc2/lib -DBOOTSTRAP_GLIBC=1' ../configure \ --build=i686-pc-linux-gnu \ --prefix=/bootstrap \ --with-headers=/bootstrap/include \ @@ -769,6 +784,13 @@ tar jxf gcc-4.9.4.tar.bz2 tar zxf ../mpc-1.0.3.tar.gz mv mpc-1.0.3 mpc + # Build deterministic archives + #sed -i -e 's/$AR cru/$AR crD/' mpc/configure + #sed -i -e '/^ARFLAGS =/s/cru/crD/' \ + # zlib/Makefile.in \ + # libcpp/Makefile.in \ + # libdecnumber/Makefile.in + # The previous setup to set the library/include path with --with-sysroot # doesn't work when you throw dynamic linking into the mix and you're not # purely cross-compiling (we want to run resulting binaries as-is). @@ -829,6 +851,9 @@ export PATH=/bootstrap/bin:/bin rm -rf findutils-4.6.0 tar zxf findutils-4.6.0.tar.gz ( cd findutils-4.6.0 + # Build deterministic archives + #sed -i -e 's/$AR cru/$AR crD/' configure + ./configure \ --build=i686-pc-linux-gnu \ --disable-nls diff --git a/gcc/build_cross.sh b/gcc/build_cross.sh index 2168701..51d7f25 100755 --- a/gcc/build_cross.sh +++ b/gcc/build_cross.sh @@ -10,7 +10,7 @@ set -e # To homogenize build instructions across both multilib and non-multilib # installs, and because some applications require heavy patches to install # in alternate libdirs (cough cough python), the lib directory will contain -# native libraries, while lib32 whill contain 32-bit libraries. +# native libraries, while lib32 will contain 32-bit libraries. # GCC will be patched slightly, and configured to achieve this, as by default # it uses lib64 and lib. 
@@ -47,6 +47,9 @@ rm -rf linux-4.14 rm -rf binutils-2.20.1 tar jxf binutils-2.20.1a.tar.bz2 ( cd binutils-2.20.1 + # Force deterministic AR output + patch -p1 -i ../binutils-2.20.1-force-deterministic.patch + ./configure \ --build=i686-pc-linux-gnu \ --target=x86_64-pc-linux-gnu \ @@ -144,8 +147,8 @@ tar jxf glibc-2.16.0.tar.bz2 # Do the whole gcc/glibc song and dance... # DESTDIR set to a different dir since glibc makefile breaks otherwise... -make -C glibc-2.16.0/build DESTDIR=/system csu/subdir_install -make -C glibc-2.16.0/build32 DESTDIR=/system csu/subdir_install +make -C glibc-2.16.0/build DESTDIR=/system csu/subdir_install # TODO: Try install-lib? +make -C glibc-2.16.0/build32 DESTDIR=/system csu/subdir_install # TODO: Try install-lib? make -C glibc-2.16.0/build DESTDIR=/system install-bootstrap-headers=yes install-headers touch /system/bootstrap/include/gnu/stubs.h x86_64-pc-linux-gnu-gcc -nostdlib -nostartfiles -shared -x c /dev/null -o /system/bootstrap/lib/libc.so diff --git a/gcc/notes/gentoo/gentoo_notes.txt b/gcc/notes/gentoo/gentoo_notes.txt index 63cd51a..4f5ba62 100644 --- a/gcc/notes/gentoo/gentoo_notes.txt +++ b/gcc/notes/gentoo/gentoo_notes.txt @@ -165,8 +165,6 @@ emerge -be @system To install everything into a clean root: USE=build emerge --root /final sys-apps/baselayout emerge --root /final -K -j$(nproc) @system -#emerge --root /final --sysroot /final -K --with-bdeps=y --root-deps -j$(nproc) @system -TODO: How to forcefully install _everything_ Now you're essentially done. You can move /final (as well as /var/db/repos, /var/cache/distfiles and /var/cache/binpkgs) to a proper disk and start using