What Is ELF (Binary Exploitation) and How to Dissect and Exploit ELF Files - 3
This article will focus on explaining the ELF file format.
While this may seem like a really boring and very theory-heavy research topic I actually had a lot of fun during my time digging through the available literature and trying things out on my own.
This topic offers a huge amount of information, and I am by no means an expert since the following chapters are all knowledge obtained through self-studies.
I’ll try to keep it short and precise and convey my message in a fun way with practical examples.
Required skills
- basic understanding of C, assembly, Unix systems
General Information
So first of all why would you and I want to bother learning a specified file format that was adopted as a system default in UNIX systems almost 20 years ago?
You might wonder what I am talking about and why I would include the following quote here. Hopefully you will come to realize what I realized while digging deeper into this topic, and hence follow my reasoning.
“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.” - Sun Tzu
So if you want to dive into reverse engineering/binary exploitation in UNIX flavored systems you will have to study the internals of such a system.
And one essential part of these are ELF files, since they are used for executables, shared libraries, object files, core-dump files, and even the kernel boot image!
So is this article only for people interested in reverse engineering?
No! If you’re a curious mind or want to learn more about UNIX flavored systems in general you’re at the right place.
With that being said let’s directly jump into the beefy part of this article.
The ELF file format dissected
Note: I will not and cannot present every detail of the ELF file format in this article.
The topic is a true rabbit hole and I really suggest doing your own research and reading.
I will add several sources at the end of this that helped me greatly! I recommend continuing from there if you are interested.
Let’s start with a general layout of how a typical ELF file is structured:
Linking View Execution View
+-----------------+ +-----------------+
| ELF header | | ELF header |
+-----------------+ +-----------------+
| Program header | | Program header |
| table (opt.) | | table |
+-----------------+ +-----------------+
| Section 1 | | |
+-----------------+ | Segment 1 |
| ... | | |
+-----------------+ +-----------------+
| Section n | | |
+-----------------+ | Segment 2 |
| ... | | |
+-----------------+ +-----------------+
| ... | | ... |
+-----------------+ +-----------------+
| Section header | | Section header |
| table | | table (opt.) |
+-----------------+ +-----------------+
- As you can see, an ELF file always has at least two headers:
* the ELF header (Elf32_Ehdr/Elf64_Ehdr) and the program header (struct Elf32_Phdr/struct Elf64_Phdr), or
* the ELF header (Elf32_Ehdr/Elf64_Ehdr) and the section header (struct Elf32_Shdr/struct Elf64_Shdr)
-> The Elf32 and Elf64 prefixes denote the file class, i.e. whether the binary targets a 32-bit (e.g. x86) or a 64-bit (e.g. x64) architecture.
- The linking view is divided into sections and is used when the linking of a library and program takes place. The sections contain information about object files like data, instructions, debugging information, symbols, or relocation information.
- The execution view is divided into segments and is as the name suggests used during program execution. The segments go hand in hand with the program header table as shown later.
Let’s take a closer look at the core elements. I’ll be focusing on x64 from now on, but for x86 this can be done analogously. Mostly the used and allocated space for structures differs between these two architectures.
The header structure is not difficult to grasp. It always holds all the information as some kind of road map for the binary within the very first bytes of the file.
ELF Header
First of all, we have the ELF header, which is 64 bytes long in 64-bit ELF files and defined as follows:
[...]
/* 64-bit ELF base types. */
typedef __u64 Elf64_Addr; /* 8 byte (unsigned) */
typedef __u16 Elf64_Half; /* 2 byte (unsigned) */
typedef __s16 Elf64_SHalf; /* 2 byte (signed) */
typedef __u64 Elf64_Off; /* 8 byte (unsigned) */
typedef __s32 Elf64_Sword; /* 4 byte (signed) */
typedef __u32 Elf64_Word; /* 4 byte (unsigned) */
typedef __u64 Elf64_Xword; /* 8 byte (unsigned) */
typedef __s64 Elf64_Sxword; /* 8 byte (signed) */
[...]
#define EI_NIDENT 16
typedef struct elf64_hdr {
unsigned char e_ident[EI_NIDENT]; /* ELF "magic number" */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
[...]
Just from this information, we can already conclude a lot of information about the binary.
Let’s quickly go through the non-address/size ones:
- e_ident: initial magic bytes that provide an answer for the OS on how to interpret and decode the contents of the file.
- e_type: identifies the file type. e.g: an executable or shared object file.
- e_machine: specifies the required architecture for a file. e.g: x86-64, ARM, MIPS.
- e_version: usually set to 1
The next values all specify certain offset, size, address values for the section header and program header values which we will discuss next. I’ll come back to these ELF header fields in due time.
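To make the role of e_ident a bit more tangible, here is a minimal Python sketch that decodes the first 16 bytes of a file. The helper name describe_ident and the lookup tables are my own illustration and not part of any library; only the byte positions and values come from the ELF specification.

```python
# Hypothetical helper (not from any library): decode the 16 e_ident bytes.
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "little-endian", 2: "big-endian"}


def describe_ident(e_ident: bytes) -> dict:
    """Return a small, human-readable view of e_ident."""
    if len(e_ident) != 16 or e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": EI_CLASS.get(e_ident[4], "unknown"),  # byte 4: EI_CLASS
        "data": EI_DATA.get(e_ident[5], "unknown"),    # byte 5: EI_DATA
        "version": e_ident[6],                         # byte 6: EI_VERSION
        "osabi": e_ident[7],                           # byte 7: EI_OSABI
    }


# Sample e_ident: 64-bit, little-endian, System V ABI, zero padding
ident = bytes.fromhex("7f454c46020101000000000000000000")
info = describe_ident(ident)
print(info)
```

For a real file, the same function could be fed with the first 16 bytes read via open(path, "rb").read(16).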
Program Headers
- A program header describes segments within a binary and is necessary for program loading.
- These segments (1 or more) are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory.
- Since it helps in creating a process image, a program header table is mandatory for executable files (and in practice also present in shared objects, which get mapped into memory as well) but optional for relocatable object files (Linking View vs Execution View).
- Program headers do not exist in relocatable objects because these *.o files are meant to be linked into an executable but not meant to be loaded directly into memory.
The program header structure is specified as follows:
[...]
typedef struct elf64_phdr {
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment, file & memory */
} Elf64_Phdr;
[...]
- p_type: identifies the type of the segment. e.g.: loadable segment, dynamic linking tables.
- p_flags: specifies the attributes/permissions of the current segment as a bit mask (execute = 0x1, write = 0x2, read = 0x4). e.g.: 0x6 for R+W permissions
- p_offset: contains the offset of the segment from the beginning of the file
- p_vaddr: contains the virtual address of the segment in memory
- p_paddr: reserved for systems with physical addressing
- p_filesz: contains the size of the file image of the segment
- p_memsz: contains the size of the memory image of the segment
- p_align: contains the alignment of the segment in the file and in memory, which is a power of 2
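The p_flags bit mask can be decoded with a few lines of Python. This is only a sketch: the function name decode_p_flags is made up for illustration, while the three bit values are the PF_X, PF_W and PF_R constants from the ELF specification.

```python
# Segment permission bits as defined by the ELF specification
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def decode_p_flags(p_flags: int) -> str:
    """Translate a p_flags bit mask into a readable permission list."""
    names = []
    if p_flags & PF_R:
        names.append("read")
    if p_flags & PF_W:
        names.append("write")
    if p_flags & PF_X:
        names.append("execute")
    return ", ".join(names) or "none"


print(decode_p_flags(0x5))  # read, execute
print(decode_p_flags(0x6))  # read, write
```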
Since a program can have multiple program segments there are n program headers within a binary.
The ELF header gives us all the information about where those are and how many of them exist:
- e_phoff: Points to the start of the program header table.
- e_phentsize: Contains the size of one program header table entry.
- e_phnum: Contains the number of entries in the program header table.
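These three fields are all that is needed to locate every program header in the file. A small sketch, assuming the values of the /bin/ls binary used in this article (e_phoff = 0x40, e_phentsize = 0x38, e_phnum = 9); the function name is hypothetical:

```python
def program_header_offsets(e_phoff: int, e_phentsize: int, e_phnum: int) -> list:
    """File offset of every entry in the program header table."""
    return [e_phoff + i * e_phentsize for i in range(e_phnum)]


# Values taken from the /bin/ls example in this article
offsets = program_header_offsets(0x40, 0x38, 9)
table_end = offsets[-1] + 0x38

print([hex(o) for o in offsets])
print(hex(table_end))
```

With these numbers the table ends at offset 0x238, which is exactly where the interpreter string of this particular binary starts.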
For example, in my local /bin/ls the contents of the PT_INTERP segment (the path of the program interpreter) can be found at offset 0x238 and have a size of 0x1c bytes:
$ hd -n 28 -s 568 /bin/ls
00000238 2f 6c 69 62 36 34 2f 6c 64 2d 6c 69 6e 75 78 2d |/lib64/ld-linux-|
00000248 78 38 36 2d 36 34 2e 73 6f 2e 32 00 |x86-64.so.2.|
00000254
Section Headers
As shown earlier, a program header describes a segment that is necessary for program execution.
Each of these segments holds either code or data that is divided into sections. So in short a section header table references the location and size of these sections and is mainly used for linking/debugging purposes.
Section headers are not needed for correct program execution, whereas program headers are. That’s the case because section headers don’t describe the memory layout of the binary. So you can happily strip the section header table from a binary and it will still execute just fine, but debugging/reversing will be more difficult.
When taking a closer look at sections it appears each can hold code or data, for example, program data, such as global variables, or dynamic linking information that is necessary for the linker. This makes clear why not having them in a binary makes the debugging process more difficult.
A section header is defined as follows:
[...]
typedef struct elf64_shdr {
Elf64_Word sh_name; /* Section name, index in string table */
Elf64_Word sh_type; /* Type of section */
Elf64_Xword sh_flags; /* Miscellaneous section attributes */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Size of section in bytes */
Elf64_Word sh_link; /* Index of another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
[...]
The structure is very similar to the program header one, so I will only point out these fields that are different.
- sh_name: An offset to a string in the .shstrtab section that represents the name of this section
- sh_link: Points to another section
- sh_info: Contains extra information about the section. Interpretation depends on section type
- sh_entsize: Contains the size of each entry, for sections that contain fixed-size entries. Otherwise, this field contains zero.
.text
- This section is a code section that contains the program's code instructions.
.rodata
- This one contains read-only data for example strings from code like this printf("Wassup CyberDevil?\n");.
- Moreover, since it contains read-only data it must reside in a read-only segment of the binary.
.plt
- The procedure linkage table (PLT) contains information for the dynamic linker to be able to call functions from used shared libraries.
.data
- This section resides in the data segment and contains data such as initialized global variables.
.bss
- This section is similar to .data, except that it contains uninitialized global data.
.got.plt
- The global offset table (GOT) works together with the PLT to dynamically resolve and access imported shared library functions.
- This one is often attacked in GOT-overwrite exploits.
.dynsym
- This section contains information about dynamic symbols imported from shared libraries e.g. printf from libc.
- These are dynamically loaded at runtime.
.dynstr
- .dynstr contains the string table with the null-terminated names of the dynamic symbols.
.rel.*
- Any relocation section has information about certain parts of a binary that need to be adjusted/modified by the linker or at runtime.
.hash
- This one contains a hash table with the purpose of being able to look up symbols.
.symtab
- The .symtab section contains all symbols from .dynsym as well as local symbols for the executable such as global vars, or local functions with type ElfN_Sym.
- .symtab is not loaded into memory because it is not necessary for runtime. It’s mainly for debugging and linking purposes.
.strtab
- .strtab contains the symbol string table that is referenced by an entry within ElfN_Sym structs.
.shstrtab
- .shstrtab contains the section header string table that is used to resolve names for each section.
- More precisely, it holds the string values for the sh_name field from the section header struct.
- They can be accessed via an index/offset added to the sh_offset of this section.
.ctors / .dtors
- The .ctors (constructors) and .dtors (destructors) sections contain function pointers to the initialization and finalization code that is to be executed before and after the actual main() body of the program code.
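The string table lookup used for sh_name (and likewise for symbol names) boils down to reading a null-terminated string at a given offset. Here is a minimal sketch with a hand-made string table; the table contents and the helper name read_cstring are assumptions for illustration, a real .shstrtab has to be read from the file at its sh_offset.

```python
def read_cstring(table: bytes, offset: int) -> str:
    """Read the null-terminated string that starts at `offset`."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")


# Illustrative string table: entries are separated by null bytes,
# and offset 0 is the empty string by convention.
shstrtab = b"\x00.shstrtab\x00.interp\x00.text\x00"

print(read_cstring(shstrtab, 11))  # .interp
print(read_cstring(shstrtab, 19))  # .text
```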
Interim Conclusion 1
This was a lot of information already, but I hope you’re still following along. If I were to summarize each of these 3 constructs I’d say the following:
- An ELF header resides at the beginning and holds a road map describing the organization of the file.
- A program header within the program header table is necessary for an executable file to map the binary correctly into memory.
- A section header within the section header table describes a section that holds object file information for the linking view like instructions, data, symbol table, relocation information, and more, but is optional for executable files.
Practical Example
So what can we do with all this information about how an ELF file is structured and which bytes are stored at which place?
The first project that came to my mind was a tool that takes apart any x86/x64 ELF binary and parses the magic out of it. Then I remembered we already have exactly this in the form of readelf on basically any UNIX flavored system. It looks at all the necessary byte positions of a binary and gives a more human-readable output for them. That did not stop me from trying to do the exact same thing, so I fired up my editor and hacked away at some super beautiful Python code step by step.
Let’s go through the theory from the previous section with the aid of an example to clear things up.
To make things easy we will take a look at /bin/ls as our test binary.
In my case it’s looking like this:
$ file /bin/ls
/bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9567f9a28e66f4d7ec4baf31cfbf68d0410f0ae6, stripped
As we’re dealing with a 64-bit binary we need to keep in mind the appropriate data types and with that the overall size of each structure. So the ELF header, the initial data structure to be found in an ELF binary is 64 bytes long.
If we take a look at the corresponding hex dump of the first 64 bytes we can see the following:
$ hd -n 64 /bin/ls
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 03 00 3e 00 01 00 00 00 50 58 00 00 00 00 00 00 |..>.....PX......|
00000020 40 00 00 00 00 00 00 00 a0 03 02 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 09 00 40 00 1c 00 1b 00 |....@.8...@.....|
If you recall the Elf64_Ehdr struct from earlier you can easily translate the bytes to the appropriate fields. In this header, it roughly will look like this:
e_ident[EI_NIDENT]:
e_ident[EI_MAG0..EI_MAG3] byte 0-3 -> 7f 45 4c 46
e_ident[EI_CLASS] byte 4 -> 02
e_ident[EI_DATA] byte 5 -> 01
e_ident[EI_VERSION] byte 6 -> 01
e_ident[EI_OSABI] byte 7 -> 00
e_ident[EI_ABIVERSION] byte 8 -> 00
e_ident[EI_PAD] byte 9-15 -> 00 00 00 00 00 00 00
e_type: byte 16-17 -> 03 00
e_machine: byte 18-19 -> 3e 00
e_version: byte 20-23 -> 01 00 00 00
e_entry: byte 24-31 -> 50 58 00 00 00 00 00 00
e_phoff: byte 32-39 -> 40 00 00 00 00 00 00 00
e_shoff: byte 40-47 -> a0 03 02 00 00 00 00 00
e_flags: byte 48-51 -> 00 00 00 00
e_ehsize: byte 52-53 -> 40 00
e_phentsize: byte 54-55 -> 38 00
e_phnum: byte 56-57 -> 09 00
e_shentsize: byte 58-59 -> 40 00
e_shnum: byte 60-61 -> 1c 00
e_shstrndx: byte 62-63 -> 1b 00
This obviously is not quite human-readable.
Luckily certain bytes have a specific meaning and hence can be replaced by a fitting ASCII string representation:
$ python3 parser.py -e /bin/ls
ELF HEADER
------------------ -------------------------------------
e_ident_EI_MAG 7f 45 4c 46 (valid ELF magic)
e_ident_EI_CLASS 64-bit
e_ident_EI_DATA little-endian
e_ident_EI_VERSION 1 (current version)
e_ident_EI_OSABI System V
e_ident_EI_PAD 0x0
e_type ET_DYN (Shared object file)
e_machine x86-64
e_version 0x1
e_entry 0x5850
e_phoff 0x40 (64 bytes into this file)
e_shoff 0x203a0 (132000 bytes into this file)
e_flags 0x0
e_ehsize 0x40 (64 bytes)
e_phentsize 0x38 (56 bytes)
e_phnum 0x9 (9)
e_shentsize 0x40 (64 bytes)
e_shnum 0x1c (28)
e_shstridx 0x1b (27)
-------------------- -------------------------------------
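The translation from raw bytes to header fields can be sketched with Python's struct module. This is not the parser.py used above, just a minimal illustration; the format string and the namedtuple are my own choice, and the input is the 64-byte hex dump of /bin/ls shown earlier.

```python
import struct
from collections import namedtuple

# "<" = little-endian without padding; field widths follow Elf64_Ehdr
EHDR_FMT = "<16sHHIQQQIHHHHHH"
Elf64Ehdr = namedtuple(
    "Elf64Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

# First 64 bytes of /bin/ls, copied from the hex dump above
raw = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e00010000005058000000000000"
    "4000000000000000a003020000000000"
    "0000000040003800090040001c001b00"
)

ehdr = Elf64Ehdr._make(struct.unpack(EHDR_FMT, raw))
print(hex(ehdr.e_entry), hex(ehdr.e_phoff), hex(ehdr.e_shoff))
print(ehdr.e_phnum, ehdr.e_shnum, ehdr.e_shstrndx)
```

Reading open("/bin/ls", "rb").read(64) instead of the hard-coded bytes would parse the header of whatever binary is installed locally, which may of course differ from mine.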
The same approach can be taken for the program headers and section headers within the binary.
The only difference is that we usually have multiple of each.
Let’s walk through an example for both header types.
Starting with one program header in /bin/ls
In the ELF header, we found out that the program header has a size of 56 bytes and the first one starts 64 bytes into the file.
$ hd -n 56 -s 64 /bin/ls
00000040 06 00 00 00 05 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000050 40 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |@.......@.......|
00000060 f8 01 00 00 00 00 00 00 f8 01 00 00 00 00 00 00 |................|
00000070 08 00 00 00 00 00 00 00 |........|
00000078
So let’s identify the values again!
p_type: bytes 0-3 -> 06 00 00 00
p_flags: bytes 4-7 -> 05 00 00 00
p_offset: bytes 8-15 -> 40 00 00 00 00 00 00 00
p_vaddr: bytes 16-23 -> 40 00 00 00 00 00 00 00
p_paddr: bytes 24-31 -> 40 00 00 00 00 00 00 00
p_filesz: bytes 32-39 -> f8 01 00 00 00 00 00 00
p_memsz: bytes 40-47 -> f8 01 00 00 00 00 00 00
p_align: bytes 48-55 -> 08 00 00 00 00 00 00 00
The hex dump values nicely translate to the specified program header fields. When translating these values into a human-readable form again we can format them like this:
$ python3 parser.py -p /bin/ls
FOUND PROGRAM HEADER
-------- ------------------------------
p_type PT_PHDR
p_offset 0x40 (64 bytes into this file)
p_vaddr 0x40
p_paddr 0x40
p_filesz 0x1f8 (504 bytes)
p_memsz 0x1f8 (504 bytes)
p_flags read, execute
p_align 0x8
-------------------------- ----------------------
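The same struct-based approach works for a program header. Again this is only a sketch and not the author's parser; the format string mirrors Elf64_Phdr, and the p_type table lists just a few of the standard segment types.

```python
import struct

PHDR_FMT = "<IIQQQQQQ"  # little-endian layout of Elf64_Phdr (56 bytes)
PT_NAMES = {0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC",
            3: "PT_INTERP", 4: "PT_NOTE", 6: "PT_PHDR"}

# The 56 bytes at file offset 0x40 of /bin/ls, from the hex dump above
raw = bytes.fromhex(
    "0600000005000000" "4000000000000000"
    "4000000000000000" "4000000000000000"
    "f801000000000000" "f801000000000000"
    "0800000000000000"
)

(p_type, p_flags, p_offset, p_vaddr,
 p_paddr, p_filesz, p_memsz, p_align) = struct.unpack(PHDR_FMT, raw)

print(PT_NAMES.get(p_type, hex(p_type)), hex(p_flags))
print(hex(p_offset), hex(p_filesz), hex(p_align))
```

Note that p_filesz is 0x1f8 = 504 = 9 * 56, so this PT_PHDR entry describes the program header table itself.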
Last but not least, let’s do the last walk-through for a section header. It will be the same approach, but for the sake of completeness let’s do it. One of them is located at the following file offset range:
$ hd -s 132064 -n 64 /bin/ls
000203e0 0b 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 |................|
000203f0 38 02 00 00 00 00 00 00 38 02 00 00 00 00 00 00 |8.......8.......|
00020400 1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00020410 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00020420
- The offset is e_shoff + i * e_shentsize, where the index i (0 <= i < e_shnum) is the variable part used to access each section header. Here i = 1: 132000 + 1 * 64 = 132064.
sh_name: bytes 0-3 -> 0b 00 00 00
sh_type: bytes 4-7 -> 01 00 00 00
sh_flags: bytes 8-15 -> 02 00 00 00 00 00 00 00
sh_addr: bytes 16-23 -> 38 02 00 00 00 00 00 00
sh_offset: bytes 24-31 -> 38 02 00 00 00 00 00 00
sh_size: bytes 32-39 -> 1c 00 00 00 00 00 00 00
sh_link: bytes 40-43 -> 00 00 00 00
sh_info: bytes 44-47 -> 00 00 00 00
sh_addralign: bytes 48-55 -> 01 00 00 00 00 00 00 00
sh_entsize: bytes 56-63 -> 00 00 00 00 00 00 00 00
It’s time to convert our found bytes into a human-readable form for one last time:
$ python3 parser.py -s /bin/ls
FOUND SECTION HEADER
------------ --------------------------------------------
sh_name .interp
sh_type SHT_PROGBITS (Program data)
sh_flags SHF_ALLOC (Occupies memory during execution)
sh_addr 0x238
sh_offset 0x238 (568 bytes into this file)
sh_size 0x1c (28 bytes)
sh_link 0x0
sh_info 0x0
sh_addralign 0x1
sh_entsize 0x0
------------ --------------------------------------------
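For completeness, here is the section header lookup as a short sketch: first the offset calculation, then the unpacking of the 64 bytes from the hex dump above. The format string mirrors Elf64_Shdr; the variable names are mine.

```python
import struct

SHDR_FMT = "<IIQQQQIIQQ"  # little-endian layout of Elf64_Shdr (64 bytes)

# Offset of section header i: e_shoff + i * e_shentsize
e_shoff, e_shentsize, index = 0x203a0, 0x40, 1
offset = e_shoff + index * e_shentsize

# The 64 bytes found at that offset in /bin/ls, from the hex dump above
raw = bytes.fromhex(
    "0b00000001000000" "0200000000000000"
    "3802000000000000" "3802000000000000"
    "1c00000000000000" "0000000000000000"
    "0100000000000000" "0000000000000000"
)

(sh_name, sh_type, sh_flags, sh_addr, sh_offset,
 sh_size, sh_link, sh_info, sh_addralign, sh_entsize) = struct.unpack(SHDR_FMT, raw)

print(offset, hex(sh_name), sh_type, hex(sh_flags))
print(hex(sh_addr), hex(sh_offset), hex(sh_size))
```

The sh_name value 0xb is not the name itself but an offset into .shstrtab, where the string .interp is stored for this binary.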
This is it. I hope this little system binary walkthrough highlighted the most important bits and pieces from the prior theory-heavy part.
Note: One thing I did not mention before is that we have to keep the endianness in mind to interpret multi-byte values correctly. In the little-endian hex dumps above, the least significant byte of each value comes first.
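A quick sketch of why this matters, using the eight e_shoff bytes from the ELF header dump of /bin/ls:

```python
# The e_shoff bytes exactly as they appear in the hex dump
raw = bytes.fromhex("a003020000000000")

little = int.from_bytes(raw, "little")  # correct for this little-endian file
big = int.from_bytes(raw, "big")        # what a wrong byte order would yield

print(hex(little))  # 0x203a0
print(hex(big))     # 0xa003020000000000
```

Only the little-endian interpretation gives the 132000 bytes offset that readelf and the parser report.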
Interim Conclusion 2
What I just walked through step by step with you is in the end exactly what readelf does. So why did I do this? In my opinion, solely relying on tools might work most of the time, but understanding the core concepts of something ultimately deepens your knowledge and helps you gain an advantage. Furthermore, if you actively work on little projects like these you might find further points of interest within the same project to take a deeper look at, as I will next.
So after all that ELF file theory and digging around the ABI and internals in the previous post, I wondered what the smallest valid ELF file would be that still offers at least the most basic functionality. This obviously would be something like a print statement so we can see some output on the screen. So let’s test how small we can go with a simple hello world program written in C.
My initial hello_world.c file was something like this for the sake of having some starting reference:
#include <stdio.h>
int main(void) {
printf("5");
return 0;
}
Let’s compile this with the usual flags since it’s good enough for this experiment:
gcc -Wall HW_printf.c -o HW_printf
Our binary shrank by around 25%. We’re actually quite limited in what we can do when staying with C, since we just have a one-liner left in our program. So let’s go even lower: time to create a rudimentary ASM hello_world program that does what we want.
; HW_1.asm
BITS 64                 ; BITS directive to specify processor mode
GLOBAL main             ; GLOBAL directive exports 'main' so it's accessible throughout our code
SECTION .text           ; SECTION directive changes which section of the output file the code will be assembled into
main:
    mov rax, 5          ; return value of main, becomes the exit status
    ret
We can then build and test it.
$ nasm -f elf64 HW_1.asm
$ gcc -Wall -s HW_1.o -o HW_1_asm
$ ./HW_1_asm; echo $?
5
So this assembly code does what we want.
$ wc -c HW_1_asm
6048 HW_1_asm
That’s… rather not fun. Our super low-level wizardry didn’t do anything at all. This brings the following questions to the table:
- Why did replacing the C code with assembly not remove any overhead?
- What other options do we have to reduce the file size of an ordinary 64-bit ELF file?
By using main() as our entry point, the linker adds some things like an OS-specific startup interface that calls main() in the end.
This induces some code overhead, which we neither want nor need. To get rid of that we need to avoid using a main()-like function construct. Let’s modify our assembly code as follows:
; HW_3.asm
BITS 64
GLOBAL _start           ; _start is the default entry point for the linker
SECTION .text
_start:                 ; new code
    xor rax, rax        ; rax = 0
    inc al              ; rax = 1, the exit system call number for int 0x80
    mov bl, 5           ; exit status 5
    int 0x80            ; invoke the system call
This _start function is basically just a symbol for the linker to locate the entry point of our program. So let’s compile and see if our assembly is working and especially if our modification actually resulted in a smaller ELF file.
$ nasm -f elf64 HW_3.asm
$ gcc -Wall -s -nostartfiles HW_3.o -o HW_3_asm
$ ./HW_3_asm; echo $?
5
Let’s quickly inspect the program within GDB:
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────
*RAX 0x1c
RBX 0x5
*RCX 0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
*RDX 0x7ffff7de59a0 ◂— push rbp
*RDI 0x7ffff7ffe170 —▸ 0x555555554000 ◂— jg 0x555555554047
*RSI 0x7ffff7ffe700 ◂— 0
R8 0x0
R9 0x0
*R10 0x7ffff7ffe170 —▸ 0x555555554000 ◂— jg 0x555555554047
*R11 0x206
*R12 0x555555554250 ◂— xor rax, rax
*R13 0x7fffffffdec0 ◂— 0x1
R14 0x0
R15 0x0
RBP 0x0
RSP 0x7fffffffdec0 ◂— 0x1
*RIP 0x555555554250 ◂— xor rax, rax
─────────────────────────────────────────────────────────────────[ DISASM ]─────────────────────────────────────────────────────────────────
► 0x555555554250 xor rax, rax
0x555555554253 mov dl, 0
0x555555554255 int 0x80
0x555555554257 add byte ptr [rax], al
0x555555554259 add byte ptr [rax], al
0x55555555425b add byte ptr [rax], al
0x55555555425d add byte ptr [rax], al
0x55555555425f add byte ptr [rax], al
0x555555554261 add byte ptr [rax], al
0x555555554263 add byte ptr [rax], al
0x555555554265 add byte ptr [rax], al
─────────────────────────────────────────────────────────────────[ STACK ]──────────────────────────────────────────────────────────────────
00:0000│ r13 rsp 0x7fffffffdec0 ◂— 0x1
01:0008│ 0x7fffffffdec8 —▸ 0x7fffffffe249 ◂— 0x756e2f656d6f682f ('/home/la')
02:0010│ 0x7fffffffded0 ◂— 0x0
03:0018│ rcx 0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
04:0020│ 0x7fffffffdee0 —▸ 0x7fffffffe2ac ◂— 0x464e4f435f474458 ('XDG_CONF')
05:0028│ 0x7fffffffdee8 —▸ 0x7fffffffe2e3 ◂— 0x50454c45545f434c ('LC_TELEP')
06:0030│ 0x7fffffffdef0 —▸ 0x7fffffffe2fc ◂— 0x5f6e653d474e414c ('LANG=en_')
07:0038│ 0x7fffffffdef8 —▸ 0x7fffffffe30d ◂— 0x313d4c564c4853 /* 'SHLVL=1' */
───────────────────────────────────────────────────────────────[ BACKTRACE ]────────────────────────────────────────────────────────────────
► f 0 555555554250
f 1 1
f 2 7fffffffe249
f 3 0
Breakpoint *0x555555554250
pwndbg>
We can clearly see our code in the [ DISASM ] area of pwndbg but there is no real return address!
Instead, the program ends with int 0x80. On Linux systems, this instruction raises a software interrupt that triggers a system call. Depending on the values in certain registers, the system call handles different needs (eax = 1 selects exit, ebx holds the status code). In our case, we would like to exit the program with status code 5.
- We reached a point where we almost halved the size of our initial C program: 8296 bytes compared to only 4832 bytes now.
- Now we could only try to optimize our assembly by using shorter/different instructions, which in my opinion is close to impossible by now.
All we really need is the ELF header and the ELF program header because the section header is optional for executables as we learned earlier and hence unwanted overhead.
So our new program only should consist of the following code in the end:
/* 64-bit ELF base types. */
typedef __u64 Elf64_Addr;
typedef __u16 Elf64_Half;
typedef __s16 Elf64_SHalf;
typedef __u64 Elf64_Off;
typedef __s32 Elf64_Sword;
typedef __u32 Elf64_Word;
typedef __u64 Elf64_Xword;
typedef __s64 Elf64_Sxword;
[...]
typedef struct elf64_hdr {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
[...]
typedef struct elf64_phdr {
Elf64_Word p_type;
Elf64_Word p_flags;
Elf64_Off p_offset;
Elf64_Addr p_vaddr;
Elf64_Addr p_paddr;
Elf64_Xword p_filesz;
Elf64_Xword p_memsz;
Elf64_Xword p_align;
} Elf64_Phdr;
These two C structs can be directly translated to some assembly:
; HW_4.asm
BITS 64
ehdr:                               ; Elf64_Ehdr
    db 0x7F, "ELF", 2, 1, 1, 0      ; e_ident
    times 8 db 0                    ; EI_ABIVERSION + EI_PAD
    dw 3                            ; e_type
    dw 0x3e                         ; e_machine
    dd 1                            ; e_version
    dq _start                       ; e_entry
    dq phdr - $$                    ; e_phoff
    dq 0                            ; e_shoff
    dd 0                            ; e_flags
    dw ehdrsize                     ; e_ehsize
    dw phdrsize                     ; e_phentsize
    dw 1                            ; e_phnum
    dw 0                            ; e_shentsize
    dw 0                            ; e_shnum
    dw 0                            ; e_shstrndx
ehdrsize equ $ - ehdr
phdr:                               ; Elf64_Phdr
    dd 1                            ; p_type
    dd 5                            ; p_flags
    dq 0                            ; p_offset
    dq $$                           ; p_vaddr
    dq $$                           ; p_paddr
    dq filesize                     ; p_filesz
    dq filesize                     ; p_memsz
    dq 0x1000                       ; p_align
phdrsize equ $ - phdr
_start:
    xor al, al
    inc al
    mov bl, 5
    int 0x80
filesize equ $ - $$
It seems to be recognized as a valid ELF file with only a corrupted section header size!
That sounds totally logical since we did not include any section headers.
Let’s check the behavior in GDB:
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────
RAX 0x1
RBX 0x5
RCX 0x0
RDX 0x0
RDI 0x0
RSI 0x0
R8 0x0
R9 0x0
R10 0x0
R11 0x0
R12 0x0
R13 0x0
R14 0x0
R15 0x0
RBP 0x0
RSP 0x7fffffffdec0 ◂— 0x1
*RIP 0x7ffff7ffe07f ◂— int 0x80
─────────────────────────────────────────────────────────────────[ DISASM ]─────────────────────────────────────────────────────────────────
0x7ffff7ffe078 xor rax, rax
0x7ffff7ffe07b inc al
0x7ffff7ffe07d mov bl, 0
► 0x7ffff7ffe07f int 0x80 <SYS_write>
fd: 0x0
buf: 0x0
n: 0x0
0x7ffff7ffe081 add byte ptr [rax], al
0x7ffff7ffe083 add byte ptr [rax], al
0x7ffff7ffe085 add byte ptr [rax], al
0x7ffff7ffe087 add byte ptr [rax], al
0x7ffff7ffe089 add byte ptr [rax], al
0x7ffff7ffe08b add byte ptr [rax], al
0x7ffff7ffe08d add byte ptr [rax], al
─────────────────────────────────────────────────────────────────[ STACK ]──────────────────────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffdec0 ◂— 0x1
01:0008│ 0x7fffffffdec8 —▸ 0x7fffffffe249 ◂— 0x756e2f656d6f682f ('/home/la')
02:0010│ 0x7fffffffded0 ◂— 0x0
03:0018│ 0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
04:0020│ 0x7fffffffdee0 —▸ 0x7fffffffe2ac ◂— 0x464e4f435f474458 ('XDG_CONF')
05:0028│ 0x7fffffffdee8 —▸ 0x7fffffffe2e3 ◂— 0x50454c45545f434c ('LC_TELEP')
06:0030│ 0x7fffffffdef0 —▸ 0x7fffffffe2fc ◂— 0x5f6e653d474e414c ('LANG=en_')
07:0038│ 0x7fffffffdef8 —▸ 0x7fffffffe30d ◂— 0x313d4c564c4853 /* 'SHLVL=1' */
───────────────────────────────────────────────────────────────[ BACKTRACE ]────────────────────────────────────────────────────────────────
► f 0 7ffff7ffe07f
f 1 1
f 2 7fffffffe249
f 3 0
pwndbg>
[Inferior 1 (process 17169) exited with code 05]
Warning: not running or target is remote
pwndbg>
It does look pretty similar to before. All our code is there, the registers are set accordingly, and the message exited with code 05 indicates that it indeed did what it should have done!
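The hand-written headers of HW_4.asm can also be reproduced with a few lines of Python, which makes the resulting file size easy to verify. This is a sketch under assumptions: it presumes the file is assembled as a flat binary without an org directive (so $$ is 0), and the machine code bytes are my own hand-assembled encodings of the four instructions in the listing. The script only builds the image in memory and neither writes nor executes anything.

```python
import struct

# Assumed encodings of the four instructions from the HW_4.asm listing
CODE = bytes([
    0x30, 0xC0,  # xor al, al
    0xFE, 0xC0,  # inc al
    0xB3, 0x05,  # mov bl, 5
    0xCD, 0x80,  # int 0x80
])

EHDR_SIZE, PHDR_SIZE = 64, 56
entry = EHDR_SIZE + PHDR_SIZE           # _start directly follows both headers
file_size = entry + len(CODE)

ehdr = struct.pack(
    "<16sHHIQQQIHHHHHH",
    b"\x7fELF\x02\x01\x01\x00" + bytes(8),  # e_ident
    3, 0x3E, 1,                              # e_type, e_machine, e_version
    entry, EHDR_SIZE, 0,                     # e_entry, e_phoff, e_shoff
    0, EHDR_SIZE, PHDR_SIZE, 1,              # e_flags, e_ehsize, e_phentsize, e_phnum
    0, 0, 0,                                 # e_shentsize, e_shnum, e_shstrndx
)
phdr = struct.pack(
    "<IIQQQQQQ",
    1, 5, 0, 0, 0,                           # p_type, p_flags, p_offset, p_vaddr, p_paddr
    file_size, file_size, 0x1000,            # p_filesz, p_memsz, p_align
)

image = ehdr + phdr + CODE
print(len(image), hex(entry))
```

Under these assumptions the whole file is just 128 bytes: 64 bytes of ELF header, 56 bytes for one program header and 8 bytes of code.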
Interim Conclusion 3
Space is not a limitation on current systems, at least in the server/desktop segment. And even when it comes to embedded devices and IoT, where system resources are often heavily limited, hand-crafting binaries like this is not a valid choice anymore: the process of writing a fully functional application this way is tedious and takes way longer to test and debug too.
Make the environment you work in your own and see how things work internally to find vectors for optimization or even a way for exploitation. I hope you enjoyed this somewhat theory-heavy dive into the ELF file format and can now understand the inner workings a bit better.
Misc (aka ‘Super secure ELF encryption’)
As shown in the first section of this ELF file introduction, the ELF standard specifies a 16-byte e_ident field right at the start of the ELF header. Its last 8 bytes are usually zero: e_ident[EI_ABIVERSION] (byte 8) and the 7-byte padding e_ident[EI_PAD] (bytes 9-15).
$ file testbins/*
testbins/f1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f3: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f4: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f5: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f6: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
Their functionality is nothing fancy. You just have to believe me that the behavior when executing them is the same for all 6 binaries.
$ readelf -h f1
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 55 72 6c 20 6c 62 68 21
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 85
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x540
Start of program headers: 64 (bytes into file)
Start of section headers: 6448 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
That looks pretty normal to me… except wait the magic bytes above show some strange behavior.
$ python3 elf_crypter.py -d /home/lab/GIT/ELF_magic/elf_enc/testbins
[!] decrypting padding bytes of f1: Hey you!
[!] decrypting padding bytes of f2: Listen..
[!] decrypting padding bytes of f3: It is I!
[!] decrypting padding bytes of f4: encry...
[!] decrypting padding bytes of f5: ...ption
[!] decrypting padding bytes of f6: RIIIIICK
So this is most likely the smartest way to store your master key on your system: split between valid binaries in 8-byte chunks.
Even if people find the chunks, they still need to put them together in the correct order.
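The scheme used by elf_crypter.py is not shown here, so the following is only an observation on the data above: the trailing 8 magic bytes that readelf printed for f1 happen to decode to the reported message with plain ROT13. A minimal sketch:

```python
import codecs

# The 16 magic bytes readelf printed for f1
magic = bytes.fromhex("7f454c460201010055726c206c626821")

# Bytes 8-15: EI_ABIVERSION plus EI_PAD, normally all zero
hidden = magic[8:16]

# Assumption: ROT13 matches this sample; the real tool may work differently
decoded = codecs.decode(hidden.decode("ascii"), "rot_13")
print(hidden, "->", decoded)
```

This also explains why readelf reports ABI Version 85: that is simply 0x55, the first hidden byte.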
Final Conclusion
This is it. If you reached this point, you’ve read through it all. Thanks for taking the time. Lastly, I want to mention that all the code examples and tools written for this article can be found on my Github. Do not expect any pretty code. All of it has the ‘PoC’-stamp and just works (at the moment). That marks the end of this article and I hope you enjoyed reading through it. As always, I appreciate feedback of any kind.