This project demonstrates a functional Proof-of-Concept for Reflective DLL Injection (RDI) on Windows ARM64. Key to this is leveraging the x18
register to access the Thread Environment Block (TEB) and subsequently the Process Environment Block (PEB), a method confirmed via WinDbg and Microsoft's ARM64 ABI documentation. The PoC, developed and tested on a Surface Pro 11 (ARM64), includes a basic injector and a reflective DLL, adapting Stephen Fewer's original RDI principles to the ARM64 architecture. This work aims to fill a gap in publicly available research for this specific technique on ARM64.
Reflective DLL Injection (RDI) is a well-documented technique for loading Dynamic Link Libraries (DLLs) into a process's memory from a memory buffer, bypassing conventional disk-based loading mechanisms. While extensively analyzed and utilized on x86 and x64 Windows architectures, its application and detailed public documentation for the Windows on ARM64 platform have been notably limited, with very little readily accessible research specifically addressing RDI on this architecture. This article details the process of adapting and implementing RDI for ARM64, focusing on the architectural nuances required for self-location and API resolution by the reflective loader. We present key findings from Windows Debugger (WinDbg) analysis regarding ARM64's Thread Environment Block (TEB) and Process Environment Block (PEB) access, and demonstrate a functional proof-of-concept. The implications for offensive security practitioners and defensive strategies on the ARM64 platform are also discussed.
The Windows on ARM64 platform represents a significant and expanding segment of the computing ecosystem. As this architecture gains traction, particularly with devices like my own Surface Pro 11, the need for security researchers and practitioners to understand its low-level behavior and adapt existing tradecraft becomes increasingly critical. Reflective DLL Injection, originally popularized by Stephen Fewer, offers a potent method for stealthy code execution by manually mapping a DLL within a target process's address space. This technique avoids standard Windows loader events associated with disk-based module loading, thereby evading many traditional detection methods.
Despite its utility, publicly available research and tooling for RDI specifically tailored to ARM64 Windows systems are non-existent (as far as I know). While individual components or concepts may have been touched upon, a cohesive, end-to-end public demonstration and explanation of RDI on ARM64 has been elusive. This article aims to contribute to bridging this gap by detailing the necessary architectural adaptations and providing a clear methodology for achieving RDI on this platform. My approach involves leveraging WinDbg for system introspection and constructing an ARM64-specific reflective loader by adapting established RDI principles.
The core of RDI is the ReflectiveLoader
function, a position-independent code segment exported by the DLL intended for injection. Once the DLL's raw image is written into the target process's memory and execution is transferred to the ReflectiveLoader
(e.g., via CreateRemoteThread
), it performs the following critical operations:
- Self-Location: Determines its own current base address in the target process's memory.
- API Resolution: Dynamically locates the addresses of essential Windows API functions (e.g.,
LoadLibraryA
,GetProcAddress
,VirtualAlloc
,NtFlushInstructionCache
) without relying on its own, yet unprocessed, import table. This typically involves navigating the Process Environment Block (PEB) to find loaded system modules likekernel32.dll
andntdll.dll
. - Memory Allocation: Allocates a new, suitably sized memory region within the target process with execute, read, and write (RWX) permissions.
- PE Image Mapping: Copies its own PE headers and sections from its initial location to the newly allocated memory region.
- Relocation Processing: Applies any necessary base relocations to correct absolute addresses within its code and data sections, as the new memory region is unlikely to match the DLL's preferred image base.
- Import Address Table (IAT) Resolution: Parses its own import directory, loads required dependent DLLs using the resolved
LoadLibraryA
, and populates its IAT with the addresses of imported functions obtained via the resolvedGetProcAddress
. - Execution Transfer: Calls the DLL's actual entry point (typically
DllMain
) with theDLL_PROCESS_ATTACH
reason.
The primary challenge in porting RDI to a new architecture like ARM64 lies in the API resolution phase, specifically how the TEB and PEB are accessed to enumerate loaded modules.
On x86 and x64 architectures, the TEB is typically accessed via the FS
and GS
segment registers, respectively. The PEB pointer is then found at a fixed offset within the TEB. ARM64 does not utilize segment registers in this manner, necessitating a different approach for locating these critical structures.
This investigation began by consulting official documentation. Microsoft's Overview of ARM64 ABI conventions provides crucial details regarding the architecture's register usage. Within this documentation, under the "Integer registers" section, a key piece of information is provided for the x18
register:
x18 N/A Reserved platform register: in kernel mode, points to KPCR for the current processor; In user mode, points to TEB
This statement directly identifies the x18
register as the user-mode pointer to the Thread Environment Block (TEB) on ARM64. This was the architectural equivalent we were seeking to replace the FS/GS segment register mechanism.
To verify this and determine the subsequent offset to the Process Environment Block (PEB), we utilized the Windows Debugger (WinDbg) attached to an ARM64 user-mode process (Notepad.exe).
The !teb
WinDbg command confirmed the presence of the TEB and its PEB Address
field in ARM64 processes:
0:026> !teb
TEB at 000000347f391000
...
PEB Address: 000000347f35a000
...
With the TEB's base address known (obtainable programmatically via x18
), further inspection of the TEB structure using dt ntdll!_TEB <TEB_Address>
revealed the offset of the ProcessEnvironmentBlock
(PEB pointer) member:
+0x060 ProcessEnvironmentBlock : 0x00000034`7f35a000 _PEB
This debugging output, combined with the ABI documentation, confirmed that on ARM64, the PEB pointer can be retrieved by reading the value of the x18
register (which gives the TEB base) and then dereferencing the memory location TEB_BASE + 0x60
. ARM64 C/C++ compilers provide ARM64 intrinsics such as __readx18qword()
to facilitate access to such registers, enabling the following C code pattern within the reflective loader:
ULONG_PTR uiPeb;
uiPeb = __readx18qword(0x60); // Reads PEB pointer from TEB_BASE (X18) + 0x60
With the PEB access mechanism for ARM64 identified as __readx18qword(0x60)
, the construction of the ReflectiveLoader
could proceed, adapting established RDI principles to the ARM64 architecture. The loader is designed as a position-independent function, typically exported by the DLL intended for injection.
Key implementation details include:
-
4.1. Self-Location and Initial API Pointer Acquisition: The loader first determines its own base address in memory. A common technique involves calling a non-inlined function that returns the address of its caller via an intrinsic like
_ReturnAddress()
. Once its own base is known, the loader can parse its own PE headers. The critical__readx18qword(0x60)
intrinsic is then used to retrieve the PEB address. From the PEB, the loader navigates toPEB->Ldr->InMemoryOrderModuleList
to begin enumerating loaded system DLLs, primarily seekingkernel32.dll
andntdll.dll
. -
4.2. Hashing Algorithms for API Resolution: To dynamically resolve API functions without relying on its own (yet unprocessed) import table, hashing is employed. Pre-calculated hash values for essential functions (
LoadLibraryA
,GetProcAddress
,VirtualAlloc
,NtFlushInstructionCache
) and their respective DLLs (kernel32.dll
,ntdll.dll
) are stored within the loader.-
Module Name Hashing: It was found that the hashing algorithm for module names (derived from
UNICODE_STRING.BaseDllName
in the PEB's LDR data) needed to precisely replicate the byte-wise processing and ASCII-centric uppercasing used in many original RDI implementations to match the pre-calculatedKERNEL32DLL_HASH
andNTDLLDLL_HASH
. TheUNICODE_STRING.Length
field (in bytes) is used as the counter.// Simplified excerpt of module name hashing within the ReflectiveLoader // pEntry is a PLDR_DATA_TABLE_ENTRY_LDR // dwModuleHash is the accumulator, ror_dword_loader performs bitwise rotation if (pEntry->BaseDllName.Length > 0 && pEntry->BaseDllName.Buffer != NULL) { USHORT usCounter = pEntry->BaseDllName.Length; // Length in bytes BYTE *pNameByte = (BYTE*)pEntry->BaseDllName.Buffer; do { dwModuleHash = ror_dword_loader(dwModuleHash); if (*pNameByte >= 'a' && *pNameByte <= 'z') { // Byte-level ASCII check dwModuleHash += (*pNameByte - 0x20); // Uppercase if lowercase ASCII } else { dwModuleHash += *pNameByte; // Add byte as is } pNameByte++; } while (--usCounter); // Compare dwModuleHash with KERNEL32DLL_HASH or NTDLLDLL_HASH }
-
Function Name Hashing: Similarly, function names exported by
kernel32.dll
andntdll.dll
are iterated, hashed, and compared against target hashes. The effective algorithm for function names often involves a rotation and sum of direct character ASCII values.```c // Simplified excerpt of function name hashing within the ReflectiveLoader // c points to a char* function name, h is the accumulator do { h = ror_dword_loader(h); // _rotr based rotation h += *c; // Sum of ASCII character values } while (*++c); // Compare h with LOADLIBRARYA_HASH, GETPROCADDRESS_HASH, etc. ```
Once the base addresses of
kernel32.dll
andntdll.dll
are found, their export address tables (EATs) are parsed. TheAddressOfNames
,AddressOfNameOrdinals
, andAddressOfFunctions
arrays are used in conjunction with the hashing mechanism to locate the Virtual Addresses (VAs) of the required API functions.
-
-
4.3. Memory Allocation and PE Image Mapping: Using the dynamically resolved pointer to
VirtualAlloc
, the loader allocates a new memory region within the target process. This region is typically marked withPAGE_EXECUTE_READWRITE
permissions and sized according toSizeOfImage
from its own PE header. The loader then meticulously copies its own PE headers (OptionalHeader.SizeOfHeaders
) and each section (IMAGE_SECTION_HEADER
) from its initial, temporary location in memory to their respective virtual addresses within this newly allocated image base. -
4.4. Relocation Processing: Since the DLL is unlikely to be loaded at its preferred
OptionalHeader.ImageBase
, base relocations must be processed. The loader calculates the delta between the new actual image base and the preferred image base. It then iterates through the relocation blocks found via theIMAGE_DIRECTORY_ENTRY_BASERELOC
data directory. For ARM64,IMAGE_REL_BASED_DIR64
is the predominant relocation type, requiring the 64-bit delta to be added to the value at the specified offset within the image.// Simplified excerpt of IMAGE_REL_BASED_DIR64 relocation handling // uiNewImageBase is the actual base, uiDelta is (uiNewImageBase - pOldNtHeaders->OptionalHeader.ImageBase) // pRelocBlock points to the current IMAGE_BASE_RELOCATION block // pRelocEntry points to the current IMAGE_RELOC_LDR entry if (pRelocEntry[k].type == IMAGE_REL_BASED_DIR64) { *(ULONG_PTR*)(uiNewImageBase + pRelocBlock->VirtualAddress + pRelocEntry[k].offset) += uiDelta; }
-
4.5. Import Address Table (IAT) Resolution: The loader parses its own
IMAGE_DIRECTORY_ENTRY_IMPORT
data directory. For eachIMAGE_IMPORT_DESCRIPTOR
, it uses the resolvedLoadLibraryA
to load the required dependent DLL into the target process's address space. Then, for each imported function (iterating through theOriginalFirstThunk
orFirstThunk
), it uses the resolvedGetProcAddress
(using either the function name or ordinal) to find the function's address in the now-loaded dependent module. This address is then written into the corresponding entry in theFirstThunk
array, effectively populating the IAT. -
4.6. Execution Transfer and Cleanup: Before transferring execution to the DLL's actual entry point (e.g.,
PayloadDllMain
),NtFlushInstructionCache
is called. This is crucial on architectures like ARM where instruction caching might lead to stale instructions being executed after relocations or IAT patching. The call is typicallyfnNtFlushInstructionCache((HANDLE)-1, NULL, 0)
to flush the cache for the current process. Finally, the DLL's entry point is called withDLL_PROCESS_ATTACH
and any parameters passed to theReflectiveLoader
.
The remaining steps of the loader, such as detailed PE structure definitions for internal parsing (e.g., _PEB_LDR_DATA_LDR
, _LDR_DATA_TABLE_ENTRY_LDR
), are essential for correct navigation and interpretation of process memory but follow established patterns adapted for 64-bit types.
A proof-of-concept was developed consisting of an ARM64 injector executable and an ARM64 reflective DLL. The DLL's PayloadDllMain
was programmed to display a MessageBoxA
upon successful loading. The injector, enhanced with a user-friendly banner and the ability to target processes by name or PID, performed the following sequence:
- Read the reflective DLL from disk into a memory buffer.
- Parsed the DLL's PE structure to locate the file offset of the exported
ReflectiveLoader
function. - Obtained a handle to the target ARM64 process.
- Allocated RWX memory in the target process using
VirtualAllocEx
. - Wrote the DLL's buffered image into the allocated remote memory using
WriteProcessMemory
. - Calculated the absolute address of
ReflectiveLoader
within the target process's address space. - Initiated execution of
ReflectiveLoader
viaCreateRemoteThread
.
Successful execution was confirmed by the appearance of the MessageBoxA from the injected DLL. The injector output below demonstrates a successful injection into chrome.exe
on a Windows 11 ARM64 system (Build 26200):
C:\Users\ah\Documents\GitHub\ReflectiveDLLInjection_ARM64>arm64_injector.exe chrome.exe arm64_rdi.dll
=====================================================================
| Reflective DLL Injection on Windows ARM64 |
| By @xaitax |
=====================================================================
Host Architecture: ARM64
Host OS: Windows 11 (Build 26200)
Targeting process name: chrome.exe
Found PID 20352 for process name 'chrome.exe'
DLL Path: arm64_rdi.dll
DLL file size: 6144 bytes
DLL read into local buffer at: 0x0000024699117260
ReflectiveLoader file offset: 0x538
Target process 20352 opened. Handle: 0x00000000000000AC
Memory allocated in target at: 0x000002BE69740000 (Size: 6144 bytes)
DLL written to target memory at: 0x000002BE69740000
Calculated remote ReflectiveLoader: 0x000002BE69740538
Remote thread created (Handle: 0x00000000000000B0). Waiting...
Remote thread completed.
Remote thread exit code: 0x69750000
Injection process finished.
The remote thread exit code (0x69750000
in this instance, corresponding to 0x000002BE69740000
) represents the base address where the reflective loader successfully mapped the DLL in the target process.
The proof-of-concept code demonstrating the ARM64 Reflective DLL Injection technique discussed in this article, including the injector and the reflective DLL (comprising the loader and payload), is available in this GitHub repository.
To compile this code, an ARM64 C/C++ development environment is required. For Windows, this typically involves using Microsoft Visual Studio with the ARM64 build tools installed. Compilation should be performed from an "ARM64 Native Tools Command Prompt" or a development environment correctly configured for ARM64 cross-compilation.
Example Compilation Commands (using MSVC cl.exe
):
-
Compile the Reflective DLL (e.g.,
arm64_rdi.dll
): The DLL must export theReflectiveLoader
function and link necessary libraries for its payload (e.g.,User32.lib
ifMessageBoxA
is used). The DLL's entry point for the reflective loader will be the payload'sDllMain
(e.g.,PayloadDllMain
).cl /LD /Fe:arm64_rdi.dll arm64_dll.c arm64_reflective_loader.c User32.lib /link /ENTRY:PayloadDllMain /DLL
/LD
: Create a DLL./Fe:filename
: Specify the output DLL name.User32.lib
: Example library forMessageBoxA
./link /ENTRY:functionName
: Sets the DLL's entry point./DLL
: Specifies that a DLL is to be built.
-
Compile the Injector Executable (e.g.,
arm64_injector.exe
):cl arm64_injector.c /Fe:arm64_injector.exe Kernel32.lib
Kernel32.lib
: Typically linked by default but can be specified for clarity for functions likeCreateFileA
,OpenProcess
, etc.
Users should adapt these commands based on their specific source file names and project structure. Ensure that the target architecture in the compiler and linker settings is explicitly set to ARM64.
This research demonstrates that Reflective DLL Injection is a viable and effective technique on the Windows on ARM64 platform. By identifying the ARM64-specific mechanism for TEB/PEB access (x18
register and +0x60
offset) and carefully adapting hashing and PE processing logic, a functional RDI implementation was achieved.
As ARM64 Windows systems become more integrated into the computing landscape, both offensive and defensive security practitioners must adapt their tools and methodologies accordingly. While this work provides a foundational proof-of-concept, further development could incorporate more advanced features such as SEH setup and support for a wider range of PE features to create more robust and versatile ARM64 RDI solutions.
- Microsoft Corporation. Overview of ARM64 ABI conventions
- Microsoft Corporation. ARM64 intrinsics
- Fewer, Stephen. Reflective DLL Injection