Tuesday, 28 February 2017

How to build Grub 0.97 on Debian 8 / gcc (Debian 4.9.2-10) 4.9.2

Grub 0.97 is rather dated for recent Linux distributions. For example, on my Debian 8 box, ./configure failed with:
configure: error: GRUB requires a working absolute objcopy; upgrade your binutils
Figure 1

The binutils on this box is recent, so it must be Grub 0.97 that is out of sync. A bit of googling shows that the problem is quickly resolved by passing an extra option to objcopy, -R .note.gnu.build-id, which drops the .note.gnu.build-id section from its output. So in the "configure" script, change this line
if { ac_try='${OBJCOPY-objcopy} -O binary conftest.exec conftest'
to
 if { ac_try='${OBJCOPY-objcopy} -R .note.gnu.build-id -O binary conftest.exec conftest'
After ./configure runs successfully, however, you still need to edit every Makefile in the Grub 0.97 directory and all its subdirectories, changing this line
OBJCOPY = @OBJCOPY@
to
OBJCOPY = @OBJCOPY@ --strip-unneeded -R .note -R .comment -R .note.gnu.build-id -R .reginfo -R .rel.dyn -R .note.gnu.gold-version
Otherwise the final images will be enormous, as in Figure 2: a 134MB stage1 instead of the normal 512 bytes:

Figure 2

So far so good. The real problem started here. After writing Grub to a floppy or ISO image, the virtual machine displayed "GRUB Loading stage2 ..." and died:

Figure 3

After several rounds of tweaking the objcopy parameters, I decided to take a close look at Grub's startup code. Since the last message was "Loading stage2", stage1 must have been fine. This string is in start.S and is printed at the beginning of _start(), so I looked at the end of that function:
bootit:
        /* print a newline */
        MSG(notification_done)
        popw    %dx     /* this makes sure %dl is our "boot" drive */
#ifdef STAGE1_5
        ljmp    $0, $0x2200
#else /* ! STAGE1_5 */
        ljmp    $0, $0x8200
#endif /* ! STAGE1_5 */
...
notification_step:      .string "."
notification_done:      .string "\r\n"

Well, it looks like _start() worked pretty well: it loaded the second stage and long-jumped to its entry point. So where was the damage? I went back to the make and link process and found this piece of the log:
 gcc -Os -fno-stack-protector -fno-builtin -nostdinc  -DSUPPORT_SERIAL=1 -DSUPPORT_HERCULES=1 -DHAVE_CONFIG_H -I. -I. -I.. -I../stage1 -Wall -Wmissing-prototypes -Wunused -Wshadow -Wpointer-arith -falign-jumps=1 -falign-loops=1 -falign-functions=1 -Wundef -g -c -o start_exec-start.o `test -f 'start.S' || echo './'`start.S
gcc  -g   -o start.exec -nostdlib -Wl,-N -Wl,-Ttext -Wl,8000 start_exec-start.o
...
gcc  -g   -o pre_stage2.exec -nostdlib -Wl,-N -Wl,-Ttext -Wl,8200 pre_stage2_exec-asm.o pre_stage2_exec-bios.o pre_stage2_exec-boot.o pre_stage2_exec-builtins.o pre_stage2_exec-char_io.o pre_stage2_exec-cmdline.o pre_stage2_exec-common.o pre_stage2_exec-console.o pre_stage2_exec-disk_io.o pre_stage2_exec-fsys_ext2fs.o pre_stage2_exec-fsys_fat.o pre_stage2_exec-fsys_ffs.o pre_stage2_exec-fsys_iso9660.o pre_stage2_exec-fsys_jfs.o pre_stage2_exec-fsys_minix.o pre_stage2_exec-fsys_reiserfs.o pre_stage2_exec-fsys_ufs2.o pre_stage2_exec-fsys_vstafs.o pre_stage2_exec-fsys_xfs.o pre_stage2_exec-gunzip.o pre_stage2_exec-hercules.o pre_stage2_exec-md5.o pre_stage2_exec-serial.o pre_stage2_exec-smp-imps.o pre_stage2_exec-stage2.o pre_stage2_exec-terminfo.o pre_stage2_exec-tparm.o
objcopy --strip-unneeded -R .note -R .comment -R .note.gnu.build-id -R .reginfo -R .rel.dyn -R .note.gnu.gold-version -O binary pre_stage2.exec pre_stage2
...
objcopy --strip-unneeded -R .note -R .comment -R .note.gnu.build-id -R .reginfo -R .rel.dyn -R .note.gnu.gold-version -O binary start.exec start
...
cat start pre_stage2 > stage2

In short, the stage2 binary is the concatenation of two binaries, 'start' and 'pre_stage2'. 'start' is a 0x200-byte boot binary linked at 0x8000. 'pre_stage2' sits right behind 'start', so it begins at 0x8200, matching the -Ttext 8200 on the link command line. Have a look at asm.S:
start:
_start:

ENTRY(main)
        /*
         *  Guarantee that "main" is loaded at 0x0:0x8200 in stage2 and
         *  at 0x0:0x2200 in stage1.5.
         */
        ljmp $0, $ABS(codestart)

        . = EXT_C(main) + 0x6
        .byte   COMPAT_VERSION_MAJOR, COMPAT_VERSION_MINOR

        /*
         *  This is a special data area 8 bytes from the beginning.
         */

        . = EXT_C(main) + 0x8

VARIABLE(install_partition)
        .long   0xFFFFFF
/* This variable is here only because of a historical reason.  */
VARIABLE(saved_entryno)
        .long   0
VARIABLE(stage2_id)
        .byte   STAGE2_ID
But in the problematic binary:

Figure 4


It doesn't look like a jump instruction, and neither the version bytes nor the 0xFFFFFF install_partition value is there. Instead, I found the expected bytes at 0x83f8:

Figure 5

The mystery was partly solved by disassembling 'pre_stage2.exec'. It looked like this:
00008200 <lba_to_chs.2277>:
8200:       55                      push   %ebp
8201:       89 e5                   mov    %esp,%ebp
8203:       57                      push   %edi
8204:       56                      push   %esi
8205:       53                      push   %ebx
8206:       53                      push   %ebx
...

0000826a <journal_init>:
826a:       55                      push   %ebp
826b:       ba 0c 00 00 00          mov    $0xc,%edx
8270:       89 e5                   mov    %esp,%ebp
8272:       57                      push   %edi
8273:       56                      push   %esi
8274:       53                      push   %ebx
8275:       8d 8d dc df ff ff       lea    -0x2024(%ebp),%ecx
827b:       81 ec 2c 20 00 00       sub    $0x202c,%esp
...

000083f8 <_start>:
83f8:       ea 70 82 00 00 00 03    ljmp   $0x300,$0x8270
83ff:       02 ff                   add    %bh,%bh
asm.o was supposed to be linked at the start address. For some reason, the two functions 'lba_to_chs' and 'journal_init' were placed ahead of '_start'. My guess is that different versions or builds of the linker use different optimization strategies, and mine was simply over-optimizing.

However deeply I googled, and even after searching the full ld manpage, I could not find an option to turn off this reordering. It seems it cannot be handled by a simple command-line option, only by a custom linker script, which is the last thing I wanted to write. After two frustrating days, I decided to work around it in the source code by changing the scope of those functions. Only fsys_reiserfs.c and builtins.c are involved. In fsys_reiserfs.c, change
static int
journal_init (void)
to
int
journal_init (void)
and in builtins.c, relocate the whole implementation of
void lba_to_chs (int lba, int *cl, int *ch, int *dh)
out of partnew_func(), where it is defined as a GCC nested function, to file scope. After rebuilding, the disassembly of both pre_stage2.exec and reiserfs_stage1_5 is correct:
00008200 <_start>:
    8200:       ea 70 82 00 00 00 03    ljmp   $0x300,$0x8270
    8207:       02 ff                   add    %bh,%bh

00008208 <install_partition>:
    8208:       ff                      (bad)
    8209:       ff                      (bad)
    820a:       ff 00                   incl   (%eax)
Enjoy Grub Legacy!
Figure 6
 


A patch for Grub 0.97 in Debian Jessie can be found here.

