Hellorld: ppc64 assembly on Alpine Linux


Preparation

Unless you happen to already have a real IBM pSeries system or some other ppc64 (PowerPC64) machine handy, you’ll need to emulate one. I made a notes file, Alpine Linux (ppc64le) in QEMU, which explains how I did this, and how to overcome some obstacles along the way.

To assemble and link programs, you need only the binutils package. Install as usual with apk:

# apk add binutils

If you want to compile the C example, or use Makefiles, it might be convenient to install the build-base package, too:

# apk add build-base

Static assembly and linking

Before this project, I don’t think I’d ever seen PPC64 assembly, let alone written any. So I set out to find a “Hello, World” example, and came across the page Assembly Primer Part 4 – Hello World – PPC, which in turn led me to the IBM page: PowerPC Assembly [Internet Archive].

As an aside, it’s interesting to see the evolution of the IBM website over twenty years through this page.

Here is the “Hello, World!” file from that site, but modified to say the much more au courant ‘Hellorld!’:

hellorld-ppc64.s
.data                       # section declaration - variables only

msg:
	.string "Hellorld!\n"
	len = . - msg       # length of our dear string (11 bytes: .string appends a NUL)

.text                       # section declaration - begin code

        .global _start
        .section        ".opd","aw"
        .align 3
_start:
        .quad   ._start,.TOC.@tocbase,0
        .previous

        .global  ._start
._start:

# write our string to stdout

	li      0,4         # syscall number (sys_write)
	li      3,1         # first argument: file descriptor (stdout)
	                    # second argument: pointer to message to write

	# load the address of 'msg':

	                    # load high word into the low word of r4:
	lis 4,msg@highest   # load msg bits 48-63 into r4 bits 16-31
	ori 4,4,msg@higher  # load msg bits 32-47 into r4 bits  0-15

	rldicr  4,4,32,31   # rotate r4's low word into its high word, clearing the low word (= sldi 4,4,32)

	                    # load low word into the low word of r4:
	oris    4,4,msg@h   # load msg bits 16-31 into r4 bits 16-31
	ori     4,4,msg@l   # load msg bits  0-15 into r4 bits  0-15

	# done loading the address of 'msg'

	li      5,len       # third argument: message length
	sc                  # call kernel

# and exit

	li      0,1         # syscall number (sys_exit)
	li      3,0         # first argument: exit code
	sc                  # call kernel

(Five instructions to load a single address, yikes…)

Drop this code into a file, then assemble, link, and execute:

$ as hellorld-ppc64.s -o hellorld-ppc64.o
$ ld hellorld-ppc64.o -o hellorld-ppc64
$ ./hellorld-ppc64
Hellorld!
$ 

It works! Let’s see what kind of file it is:

$ file hellorld-ppc64
hellorld-ppc64: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, Power ELF V1 ABI, version 1 (SYSV), statically linked, not stripped
$ 

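You could also disassemble the object file with objdump (the immediates that will hold the address of msg are still zero, because the linker hasn’t filled them in yet):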

$ objdump -d hellorld-ppc64.o

hellorld-ppc64.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <._start>:
   0:   04 00 00 38     li      r0,4
   4:   01 00 60 38     li      r3,1
   8:   00 00 80 3c     lis     r4,0
   c:   00 00 84 60     ori     r4,r4,0
  10:   c6 07 84 78     sldi    r4,r4,32
  14:   00 00 84 64     oris    r4,r4,0
  18:   00 00 84 60     ori     r4,r4,0
  1c:   0b 00 a0 38     li      r5,11
  20:   02 00 00 44     sc
  24:   01 00 00 38     li      r0,1
  28:   00 00 60 38     li      r3,0
  2c:   02 00 00 44     sc

Dynamic executable

Something bothered me slightly about the previous example: it was a static binary. What about a dynamic one? (Ignore for the moment the fact that it’s pointless to make a dynamically linked binary that doesn’t call or export any functions…)

Failure #1: Missing interpreter

I thought – foolishly, in retrospect – that I could simply relink the executable with -pie. If you try that:

$ ld -pie hellorld-ppc64.o -o hellorld-ppc64
$ ./hellorld-ppc64
-ash: ./hellorld-ppc64: not found
$ 

“not found”? What does that mean? This was very confusing, because the output file really was there.

This took me a lot longer to figure out than I’d care to admit (although, in my defense, neither strace nor gdb helped). But eventually I stumbled upon the reason by looking at other binaries on the system:

$ file hellorld-ppc64
hellorld-ppc64: ELF 64-bit LSB pie executable, 64-bit PowerPC or cisco 7500, Power ELF V1 ABI, version 1 (SYSV), dynamically linked, interpreter /usr/lib/ld.so.1, not stripped
$ file /bin/busybox
/bin/busybox: ELF 64-bit LSB pie executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-powerpc64le.so.1, BuildID[sha1]=cd4c08a0db926276d1770d6e7d88efcd2efaa700, stripped
$ ls -l /usr/lib/ld.so.1
ls: /usr/lib/ld.so.1: No such file or directory
$ ls -l /lib/ld-musl-powerpc64le.so.1
-rwxr-xr-x    1 root     root        789104 Nov  6 11:49 /lib/ld-musl-powerpc64le.so.1
$ 

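It seems that the dynamic interpreter (i.e., the dynamic linker) that ld uses by default, /usr/lib/ld.so.1, doesn’t actually exist here; Alpine’s is /lib/ld-musl-powerpc64le.so.1. When the interpreter named in an executable is missing, execve(2) fails with ENOENT, which the shell reports as “not found”, even though the executable itself is right there.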

Failure #2: Segmentation fault

There is an option to override the dynamic linker:

$ ld -pie hellorld-ppc64.o -o hellorld-ppc64 --dynamic-linker=/lib/ld-musl-powerpc64le.so.1
$ ./hellorld-ppc64
Segmentation fault
$ 

Golly! That seems worse than before. Initially I thought I had made a mistake and backed out this change. But I realized that specifying --dynamic-linker is necessary, at least on Alpine Linux.

Unfortunately, once again strace and gdb couldn’t tell me anything new about this problem. In the end, it was ldd and objdump that provided the explanation:

$ ldd hellorld-ppc64
        /lib/ld-musl-powerpc64le.so.1 (0x7fffa9580000)
Error relocating hellorld-ppc64: unsupported relocation type 41
Error relocating hellorld-ppc64: unsupported relocation type 39
Error relocating hellorld-ppc64: unsupported relocation type 5
Error relocating hellorld-ppc64: unsupported relocation type 4
$ objdump -r hellorld-ppc64.o

hellorld-ppc64.o:     file format elf64-powerpcle

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000008 R_PPC64_ADDR16_HIGHEST  .data
000000000000000c R_PPC64_ADDR16_HIGHER  .data
0000000000000014 R_PPC64_ADDR16_HI  .data
0000000000000018 R_PPC64_ADDR16_LO  .data


RELOCATION RECORDS FOR [.opd]:
OFFSET           TYPE              VALUE
0000000000000000 R_PPC64_ADDR64    ._start
0000000000000008 R_PPC64_TOC       *ABS*


$ 

That’s when it hit me: I want to make a dynamic binary, but I’m using absolute (i.e., non-relative) symbol addresses! These are not (under normal circumstances) relocatable, so this is clearly never going to work. But I don’t yet know enough PPC64 asm to use relative addressing…

Detour: How does gcc generate relative addresses?

gcc can generate dynamic libraries (etc.), so surely its assembly language output can show how to do it. First, a suitable bit of C is needed:

hello.c
#include <unistd.h>

int
main()
{
        write(STDOUT_FILENO, "Hellorld!\n", 10);
        return 0;
}

You may well wonder why I am not using the ‘canonical’ printf(3) version. I’m using write(2) to get a bit closer to what the assembly version is doing. (If I wanted to be really accurate, I’d have to use syscall(2), but it gets slightly complicated.)

$ gcc -S -c hello.c
$ cat hello.s
        .file   "hello.c"
        .machine power8
        .abiversion 2
        .section        ".text"
        .section        .rodata
        .align 3
.LC0:
        .string "Hellorld!\n"
        .section        ".text"
        .align 2
        .globl main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     main,.-main
        mflr 0
        std 0,16(1)
        std 31,-8(1)
        stdu 1,-48(1)
        .cfi_def_cfa_offset 48
        .cfi_offset 65, 16
        .cfi_offset 31, -8
        mr 31,1
        .cfi_def_cfa_register 31
        li 5,10
        addis 4,2,.LC0@toc@ha
        addi 4,4,.LC0@toc@l
        li 3,1
        bl write
        nop
        li 9,0
        extsw 9,9
        mr 3,9
        addi 1,31,48
        .cfi_def_cfa 1, 0
        ld 0,16(1)
        mtlr 0
        ld 31,-8(1)
        blr
        .long 0
        .byte 0,0,0,1,128,1,0,1
        .cfi_endproc
.LFE0:
        .size   main,.-main
        .ident  "GCC: (Alpine 13.2.1_git20231014) 13.2.1 20231014"
        .section        .note.GNU-stack,"",@progbits
$ 

Note that on Alpine Linux, gcc is configured with --enable-default-pie, so no ‘extra’ options are needed to get relative addresses. On other systems, you could use -fpie to achieve the same effect.

The .abiversion 2 directive jogged my memory about something I saw earlier: file(1) reported the system executables (e.g., busybox) as OpenPOWER ELF V2 ABI, but the hellorld-ppc64 executable as Power ELF V1 ABI. So it seems the example code from IBM (ca. 2002) was using a previous ABI version; its .opd section, which holds a function descriptor for _start, is an ELF V1 construct that V2 dropped. Since I don’t know if it’s possible to mix-and-match object files from different ABI versions, I decided to standardize on version 2.

The other key to the puzzle was determining how registers 2 and 4 got set. The very first instructions in the main function are:

.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l

So this seems to be adding the link-time distance from main (.LCF0) to .TOC. (the Table of Contents) to register 12, and storing the result in register 2. But what is register 12 doing here?

After some searching, I came across this note in the Power Architecture 64-Bit ELF V2 ABI Specification: OpenPOWER ABI for Linux Supplement document, p. 42:

When a function is entered through its global entry point, register r12 contains the entry-point address.

And §2.3.2 has some very helpful information on this process, including code samples.

So it seems that register 12 will be set before this code is executed, and thus the address in r2 will be adjusted according to where this code was loaded in memory.

Similarly, register 4 (the address of the string) is set relative to the TOC pointer in r2, rather than absolutely:

.LC0:
        .string "Hellorld!\n"
        # [...]
        addis 4,2,.LC0@toc@ha
        addi 4,4,.LC0@toc@l

Success: Making a relative-address version

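With those two points in mind, it’s possible to make a relative-address version. The important changes are the .machine and .abiversion directives at the top (replacing the .opd function descriptor) and the four addis/addi instructions that load r2 and r4: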

hellorld-ppc64-rel.s
	.machine power8
	.abiversion 2

.data                       # section declaration - variables only

msg:
	.string "Hellorld!\n"
	len = . - msg       # length of our dear string (11 bytes: .string appends a NUL)

.text                       # section declaration - begin code

        .global _start
_start:

# write our string to stdout

	li      0,4         # syscall number (sys_write)
	li      3,1         # first argument: file descriptor (stdout)
	                    # second argument: pointer to message to write

	# load the address of 'msg':

	# "When a function is entered through its global entry point,
	#  register r12 contains the entry-point address."
	# [Power Architecture 64-Bit ELF V2 ABI Specification,
	#  OpenPOWER ABI for Linux Supplement]

	addis   2,12,.TOC.-_start@ha    # r2 = r12 + (.TOC. - _start),
	addi    2,2,.TOC.-_start@l      # i.e. the run-time address of .TOC.

	addis   4,2,msg@toc@ha          # r4 = r2 + (msg - .TOC.),
	addi    4,4,msg@toc@l           # i.e. the run-time address of msg

	# done loading the address of 'msg'

	li      5,len       # third argument: message length
	sc                  # call kernel

# and exit

	li      0,1         # syscall number (sys_exit)
	li      3,0         # first argument: exit code
	sc                  # call kernel

Assemble, link, execute, and inspect as before:

$ as hellorld-ppc64-rel.s -o hellorld-ppc64-rel.o
$ ld -pie hellorld-ppc64-rel.o -o hellorld-ppc64-rel --dynamic-linker=/lib/ld-musl-powerpc64le.so.1
$ ./hellorld-ppc64-rel
Hellorld!
$ file hellorld-ppc64-rel
hellorld-ppc64-rel: ELF 64-bit LSB pie executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-powerpc64le.so.1, not stripped
$ ldd hellorld-ppc64-rel
        /lib/ld-musl-powerpc64le.so.1 (0x7fff8b260000)
$ objdump -r hellorld-ppc64-rel.o

hellorld-ppc64-rel.o:     file format elf64-powerpcle

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000008 R_PPC64_REL16_HA  .TOC.+0x0000000000000008
000000000000000c R_PPC64_REL16_LO  .TOC.+0x000000000000000c
0000000000000010 R_PPC64_TOC16_HA  .data
0000000000000014 R_PPC64_TOC16_LO  .data


$ 

Static PIE (position-independent executable)

Now that there’s a relative version of the program, is it possible to create a static PIE? Yes, and it’s easy:

$ ld -pie --no-dynamic-linker hellorld-ppc64-rel.o -o hellorld-ppc64-rel
$ ./hellorld-ppc64-rel
Hellorld!
$ file hellorld-ppc64-rel
hellorld-ppc64-rel: ELF 64-bit LSB pie executable, 64-bit PowerPC or cisco 7500, OpenPOWER ELF V2 ABI, version 1 (SYSV), static-pie linked, not stripped
$ 

Notice that the same object file, hellorld-ppc64-rel.o, can be used to produce either a dynamic executable, or a static PIE.



Feel free to contact me with any questions, comments, or feedback.