A Strange Hello World Program in C
Compile the C code below with either GCC or Clang on an x86-64 Linux platform:
__attribute__((section(".text"))) const char main[] = {
0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10, 0x48, 0xb8,
0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x77, 0x48, 0x89,
0x45, 0xf1, 0x48, 0xb8, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x21,
0x0a, 0x00, 0x48, 0x89, 0x45, 0xf8, 0x48, 0x8d, 0x45, 0xf1,
0x48, 0x89, 0xc6, 0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,
0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00, 0x48, 0xc7, 0xc2,
0x0f, 0x00, 0x00, 0x00, 0x0f, 0x05, 0xb8, 0x00, 0x00, 0x00,
0x00, 0xc9, 0xc3
};
And you will get:
$ gcc main.c
/tmp/ccJRPyPq.s: Assembler messages:
/tmp/ccJRPyPq.s:4: Warning: ignoring changed section attributes for .text
$ clang main.c
main.c:1:1: warning: variable named 'main' with external linkage has undefined behavior [-Wmain]
__attribute__((section(".text"))) const char main[] = {
^
1 warning generated.
$ ./a.out
hello, world!
Both compilers emit warnings, but if we ignore them and run the executable anyway it prints "hello, world!". This is surprising: we are seemingly assigning arbitrary bytes to a variable called main, and the fact that it compiles at all is astonishing.
The trick is to recall that computers make no distinction between data and code. A CPU fetches and then executes 0s and 1s (called machine code) from memory and doesn't care where those 0s and 1s come from. It will happily try to run an image or a Word document and interpret that as machine code, although it will most likely reach an invalid state and segfault.
Each CPU has a different type of machine code, according to the ISA (Instruction Set Architecture) it implements. For example, Intel CPUs implement the x86/x86-64 ISA and ARM CPUs implement the eponymous ARM ISA. Machine code is a sequence of individual instructions, each of which performs a very simple task such as moving an immediate (a constant value) into a register (a small storage location inside the CPU) or adding two numbers. In each instruction an opcode identifies the type of instruction and operands specify what the instruction acts on; all of this is encoded in a few bytes.
Machine code is just 0s and 1s, which makes it time-consuming and difficult to write directly, so programmers invented assembly language. It is essentially the same as machine code, except that each instruction is identified by a mnemonic instead of an opcode: for example mov for move, add for addition, and lea for load effective address. Below is x86-64 machine code followed by the equivalent assembly (in Intel syntax, destination operand first):
b8 0a 00 00 00 = mov eax, 10
bb 0d 00 00 00 = mov ebx, 13
01 d8 = add eax, ebx
So the byte sequence 0x01, 0xd8 tells the CPU to add whatever is in the ebx register to the eax register.
If we compile the following C code, which prints "hello, world!" in a very straightforward manner
#include <stdio.h>
int main(void) {
printf("hello, world!\n");
}
Then disassemble it
...
0000000000001140 <main>:
1140: 55 push %rbp
1141: 48 89 e5 mov %rsp,%rbp
1144: 48 8d 3d b9 0e 00 00 lea 0xeb9(%rip),%rdi # 2004 <_IO_stdin_used+0x4>
114b: b0 00 mov $0x0,%al
114d: e8 de fe ff ff call 1030 <printf@plt>
1152: 31 c0 xor %eax,%eax
1154: 5d pop %rbp
1155: c3 ret
...
We can see the assembly that the compiler generates from our C source code. The middle column of the listing shows the actual machine code, the bits and bytes, stored in the executable. Instead of having the compiler generate the assembly and the assembler then assemble it, why not skip the middlemen and directly assign the machine code to our main function/array? If we try to compile that
const char main[] = {
0x55, 0x48, 0x89, 0xe5, 0x48, 0x8d, 0x3d, 0xb9, 0x0e, 0x00,
0x00, 0xb0, 0x00, 0xe8, 0xde, 0xfe, 0xff, 0xff, 0x31, 0xc0,
0x5d, 0xc3,
};
Then running the resulting executable
$ ./a.out
Segmentation fault (core dumped)
We get a segfault. This is because of a couple of problems:
- If we objdump -d a.out, we don't even see the main function. Try objdump -D a.out and we see that our main function is hidden away in the .rodata section, which is marked non-executable.
- Take note of the disassembly of our normal hello world function: call 1030 <printf@plt> is calling an external symbol printf that is resolved at runtime by the dynamic linker. The linker can't do its job properly if GCC doesn't even know we're using the symbol printf.
The solution to the first problem is to explicitly force the compiler to put the main variable into the .text section. This is achieved with __attribute__((section(".text"))).
The solution to the second problem is to use Linux syscalls, which need no help from the dynamic linker. Instead of using a constant string, we'll put the string onto the stack so it's nearby and easily addressed. Compile the following code with -fno-stack-protector, which simplifies the assembly by getting rid of the extraneous stack protector code
#include <stdio.h>
int main(void) {
char s[] = "hello, world!\n";
printf(s);
}
Disassembling the resulting executable yields
...
0000000000001139 <main>:
1139: 55 push %rbp
113a: 48 89 e5 mov %rsp,%rbp
113d: 48 83 ec 10 sub $0x10,%rsp
1141: 48 b8 68 65 6c 6c 6f movabs $0x77202c6f6c6c6568,%rax
1148: 2c 20 77
114b: 48 89 45 f1 mov %rax,-0xf(%rbp)
114f: 48 b8 77 6f 72 6c 64 movabs $0xa21646c726f77,%rax
1156: 21 0a 00
1159: 48 89 45 f8 mov %rax,-0x8(%rbp)
115d: 48 8d 45 f1 lea -0xf(%rbp),%rax
1161: 48 89 c7 mov %rax,%rdi
1164: b8 00 00 00 00 mov $0x0,%eax
1169: e8 c2 fe ff ff call 1030 <printf@plt>
116e: b8 00 00 00 00 mov $0x0,%eax
1173: c9 leave
1174: c3 ret
...
We can copy the assembly into a temporary source file and switch out the call to printf with our syscall
.text
.globl main
main:
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
movabs $0x77202c6f6c6c6568,%rax
mov %rax,-0xf(%rbp)
movabs $0xa21646c726f77,%rax
mov %rax,-0x8(%rbp)
lea -0xf(%rbp),%rax
#mov %rax,%rdi
mov %rax,%rsi # move address of string into rsi instead of rdi
#mov $0x0,%eax
mov $0x1,%rax # syscall number (0x1 is write)
mov $0x1,%rdi # file descriptor (0x1 is stdout)
mov $0xf,%rdx # length of string (14 + 1 for null terminator)
syscall
#call 1030 <printf@plt>
mov $0x0,%eax
leave
ret
Then we assemble and link the assembly and disassemble again
$ gcc main.s
$ objdump -d a.out
...
0000000000001119 <main>:
1119: 55 push %rbp
111a: 48 89 e5 mov %rsp,%rbp
111d: 48 83 ec 10 sub $0x10,%rsp
1121: 48 b8 68 65 6c 6c 6f movabs $0x77202c6f6c6c6568,%rax
1128: 2c 20 77
112b: 48 89 45 f1 mov %rax,-0xf(%rbp)
112f: 48 b8 77 6f 72 6c 64 movabs $0xa21646c726f77,%rax
1136: 21 0a 00
1139: 48 89 45 f8 mov %rax,-0x8(%rbp)
113d: 48 8d 45 f1 lea -0xf(%rbp),%rax
1141: 48 89 c6 mov %rax,%rsi
1144: 48 c7 c0 01 00 00 00 mov $0x1,%rax
114b: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
1152: 48 c7 c2 0f 00 00 00 mov $0xf,%rdx
1159: 0f 05 syscall
115b: b8 00 00 00 00 mov $0x0,%eax
1160: c9 leave
1161: c3 ret
...
Now we can finally copy that machine code and combine it with our technique of explicitly telling the compiler to put the main variable into the .text section
__attribute__((section(".text"))) const char main[] = {
0x55, 0x48, 0x89, 0xe5, 0x48, 0x83, 0xec, 0x10, 0x48, 0xb8,
0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x77, 0x48, 0x89,
0x45, 0xf1, 0x48, 0xb8, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x21,
0x0a, 0x00, 0x48, 0x89, 0x45, 0xf8, 0x48, 0x8d, 0x45, 0xf1,
0x48, 0x89, 0xc6, 0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,
0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00, 0x48, 0xc7, 0xc2,
0x0f, 0x00, 0x00, 0x00, 0x0f, 0x05, 0xb8, 0x00, 0x00, 0x00,
0x00, 0xc9, 0xc3
};
This is what was shown at the beginning. Note that the machine code is specific to x86-64 processors and that it makes a Linux syscall, which would be done differently on other operating systems. The code is a fun party trick, but for obvious reasons it is not portable and would be horrible in practice.