| en:docs:win16:thunking [2026/06/03 06:43] – prokushev | en:docs:win16:thunking [2026/06/03 08:41] (current) – [The Old New Thing (Raymond Chen)] prokushev |
|---|
| ===== Introduction: The mixed-bitness problem ===== | ===== Introduction: The mixed-bitness problem ===== |
| |
| 16-bit Windows (3.x) used the Intel 80286 segmented memory model: address = 16-bit selector + 16-bit offset (far pointer). ''int'' size = 16 bits, stack managed by SS:SP. 32-bit systems (Windows NT, 95) introduced the flat model: 32-bit linear address, ''int'' size = 32 bits, stack SS:ESP. | 16-bit Windows versions 1.x and 2.x ran on Intel 8086/8088 in **real mode**, where a far pointer was a 16-bit segment value (shifted left by 4) plus a 16-bit offset, resulting in a 20-bit physical address. **Windows/386** (1987) used the protected mode and virtual 8086 mode of the 386 to run multiple MS-DOS applications concurrently in extended memory, beyond the 640 KB barrier, but Windows applications themselves still used real-mode segment addressing. With the Standard and 386 Enhanced modes of Windows 3.0, Windows applications moved to 16-bit **protected mode**, and a far pointer became a **selector:offset** pair, where the selector is an index into a descriptor table. |
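The real-mode address calculation described above can be sketched in a few lines of C. This is only an illustration: the function name ''real_mode_physical'' is hypothetical, and the 20-bit wrap models an 8086 with 20 address lines.

```c
#include <stdint.h>

/* Real-mode 8086 address translation: physical = segment * 16 + offset.
   The 8086 has only 20 address lines, so the sum wraps at 1 MB.
   (Function name is hypothetical, chosen for this sketch.) */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;   /* keep the low 20 bits */
}
```

Note that many different segment:offset pairs name the same physical byte, which is one reason real-mode far pointers cannot be compared by value alone.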
| |
| Direct calling between 16-bit and 32-bit code is impossible because of: | 32-bit systems (Windows NT, 95) introduced the **flat model**: a 32-bit linear address (0:32), with ''int'' size = 32 bits, stack SS:ESP, and support for up to 4GB of virtual address space. |
| - different pointer formats, | |
| - different sizes of basic types, | |
| - different stack conventions (SP vs ESP), | |
| - need to switch processor decoding mode (D-bit in segment descriptor). | |
| |
| A **thunk** is a piece of code (or a set of functions) that transparently transforms a call between bitnesses. | Direct calling between 16-bit code (real or protected mode) and 32-bit flat-model code is impossible because of: |
| | * different pointer formats (16:16 far pointer vs 0:32 linear), |
| | * different sizes of basic types (16-bit int vs 32-bit int), |
| | * different stack conventions (SS:SP vs SS:ESP), |
| | * the need to switch processor decoding mode (D-bit in segment descriptors for protected mode, or switching between real and protected modes). |
| | |
| | A **thunk** is a piece of code (or a set of functions) that transparently transforms a call between these incompatible execution environments. |
| | |
| | However, the concept of thunking is much older and broader than the 16‑to‑32‑bit transition. Already in the real‑mode era, Windows faced two fundamental problems that thunks were invented to solve: |
| | |
| | - **Segment swapping (overlays)** – In real mode, code segments could be discarded (and later reloaded from disk) to save memory. When a far call was made to a function in a potentially absent segment, a mechanism was needed to check presence, reload the segment from disk if necessary, and then transfer control. This was performed by a **reload thunk**. A second kind of thunk, the **instance thunk**, was created dynamically by the Windows API function ''MakeProcInstance'': it bound a far pointer to a specific instance's data segment by loading ''hInstance'' into ''AX'' before jumping to the system reload thunk. |
| | - **Return from a discarded segment** – After a function returned, the return address (CS:IP) on the stack might point to a code segment that had been swapped out. To handle this gracefully, Windows used a **return thunk**. The undocumented ''CallProcInstance'' function acted as such a thunk, checking the presence of the target segment, reloading it if needed, and then jumping to the original return address. This mechanism allowed code segments to be discarded without an MMU, relying on software‑managed segment validity. |
| | |
| | Thus, even in pure 16‑bit Windows, thunks were essential for memory management and for correctly routing callbacks (window procedures, timer procs, etc.) to the appropriate instance's data segment. Later, as 32‑bit flat model emerged, the same core idea — a small piece of glue code that translates between different calling conventions, pointer formats, and memory models — was extended to bridge 16‑bit and 32‑bit code, giving birth to Generic Thunk (NT), Universal Thunk (Win32s), Flat Thunk (Windows 95), and the low‑level Thunklet building blocks. |
| | |
| | This document describes all these thunk variants in technical detail, from the early real‑mode thunks to the last 16‑32 bridges used in Windows 9x and NT. |
| |
| ===== 1. Instance Thunk (Win16) ===== | ===== 1. Instance Thunk (Win16) ===== |
| |
| ==== 1.1 Purpose and architecture ==== | ==== 1.1 Architectural root of the problem: shared code, private data ==== |
| |
| In 16-bit Windows, multiple instances of the same application share code but each has its own **data segment (DS)**. When Windows calls a callback function (window procedure, ''EnumFonts'', ''SetTimer'', etc.), the system does not know which DS to set. ''MakeProcInstance'' creates a small thunk that, before calling the real function, loads the instance handle (''hInstance'') into register ''AX''. The receiving function must start with a prolog that loads DS from ''AX''. | In 16‑bit Windows, multiple instances of the same application share code but each instance has its own **data segment (DS)**. When Windows calls a callback function (window procedure, ''EnumFonts'', ''SetTimer'', etc.), the system does not know which DS to use for that particular instance. The 8086 processor had no MMU; all memory was physical, with no indirection layer. This meant that if the memory manager moved a data segment, it had to know about every reference to it in order to update pointers. This situation gave rise to a special mechanism – the **Instance Thunk**. |
| |
| **Thunk placement:** | ==== 1.2 Role of the loader and the NE format ==== |
| The thunk is created in the **data segment** of the calling module (usually DGROUP). Windows can move data segments in memory, so the kernel maintains an internal table of all created thunks to update them when segments are moved. | |
| |
| ==== 1.2 Thunk memory format (disassembled code) ==== | The foundation of the mechanism is laid when the executable file is loaded. Windows 3.x uses the **NE (New Executable)** format, in which each far (''FAR'') function has a 6‑byte record in the Entry Table. This table is placed by the loader in a fixed overhead segment that is shared by all instances of the program. |
| |
| A typical thunk occupies **8–16 bytes** (depending on alignment). Real code from Windows 3.1: | During loading, Windows **expands each 6‑byte entry table record into an 8‑byte fragment of machine code** called a **reload thunk**. These thunks become the official, single entry point for calling any corresponding far function. |
| |
| <code asm> | ==== 1.3 Reload thunk: detailed structure and swapping mechanism ==== |
| ; Thunk created by MakeProcInstance | |
| ; Address of thunk is returned as FARPROC | **Full reload thunk structure (8 bytes):** |
| mov ax, 1234h ; B8 34 12 – hInstance will be placed here | |
| jmp far ptr 5678:0000h ; EA 00 00 78 56 – target function address (seg:off) | <code> |
| | SAR BYTE PTR CS:[xxxx], 1 ; 3 bytes: shift access counter right |
| | INT 3Fh ; 2 bytes: software interrupt call |
| | db entry_segment ; 1 byte : index into the module's segment table |
| | dw entry_offset ; 2 bytes: function offset inside the segment |
| </code> | </code> |
| |
| Sometimes the thunk is padded with a ''0x66'' prefix (not used in pure Win16). After the thunk is called, control passes to a system reload thunk that loads the code segment if needed. | Each field has a precise purpose: |
| |
| ==== 1.3 API and ordinals ==== | * **''SAR BYTE PTR CS:[xxxx], 1''** – software “accessed bit” for the **LRU (Least Recently Used)** discarding algorithm. The access counter for the segment is initialised to 1, and every call through the thunk shifts it right, turning it to 0. Every 250 ms Windows scans the counters and builds an LRU list: a counter that is still 1 means the segment has not been called since the previous scan, so it is a candidate for discarding. This emulates in software the "accessed" bit that an MMU would otherwise provide. |
| | * **''INT 3Fh''** – system interrupt that invokes the segment loader. The interrupt handler reads the operands that follow the ''INT 3Fh'' instruction. |
| | * **''db entry_segment''** – number of the code segment that contains the function, used as an index into the module's segment table. It corresponds directly to the segment-number field of the entry record in the NE file. |
| | * **''dw entry_offset''** – offset of the function within that 16‑bit code segment. Once the segment is loaded, the loader pairs this offset with the segment's address to form the far (segment:offset) address of the function. |
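The access-counter idea behind the ''SAR'' instruction can be modelled as follows. This is a minimal sketch, not the kernel's code: the type ''SegmentUsage'' and both function names are hypothetical, and re-arming the counter to 1 during the scan is an assumption (the text only says the counter starts at 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one per-segment access counter. */
typedef struct {
    uint8_t counter;   /* 1 = not called since the last scan, 0 = called */
} SegmentUsage;

/* What the SAR at the head of a reload thunk does on every call. */
void thunk_touch(SegmentUsage *seg)
{
    seg->counter >>= 1;          /* 1 becomes 0, and 0 stays 0 */
}

/* Periodic scan (every 250 ms according to the text). Stores the indices
   of segments that were not called since the previous scan into out[]
   and returns how many were found. */
size_t lru_scan(SegmentUsage *segs, size_t n, size_t *out)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].counter == 1)
            out[found++] = i;    /* untouched: candidate for discarding */
        segs[i].counter = 1;     /* assumption: the scan re-arms the counter */
    }
    return found;
}
```

The appeal of the design is that the hot path costs a single instruction per call, while all bookkeeping happens in the infrequent scan.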
| |
| ^ Function ^ Export ^ Ordinal (KERNEL.EXE 3.10) ^ | **Two states of a reload thunk:** |
| | ''MakeProcInstance'' | by name | 51 (unofficial) | | |
| | ''FreeProcInstance'' | by name | **52** (confirmed) | | |
| | ''CallProcInstance'' | by name | missing (undocumented) | | |
| |
| **FreeProcInstance** (ordinal 52) removes the thunk from internal kernel tables and frees memory. | **State 1 – code segment not in memory:** |
| | The thunk executes ''SAR'', then ''INT 3Fh'' with its operands. The ''INT 3Fh'' handler reads ''entry_segment'' and ''entry_offset'', locates the segment in the module's segment table, loads the segment from disk, and updates the table. |
| |
| ==== 1.4 Full creation and call cycle ==== | **State 2 – code segment is in memory:** |
| | The kernel modifies the reload thunk: |
| | <code> |
| | SAR BYTE PTR CS:[xxxx], 1 ; 3 bytes – always executed |
| | JMP ssss:oooo ; 5 bytes – direct jump to the resident function |
| | </code> |
| | The ''JMP'' instruction occupies the 5 bytes that previously held ''INT 3Fh'' and its operands. The address ''ssss:oooo'' is the 16:16 far address (segment:offset) of the function in the loaded segment. |
| |
| <code c> | **Life cycle:** |
| // 1. Create thunk | |
| FARPROC lpfnThunk = MakeProcInstance((FARPROC)MyCallback, hInst); | |
| |
| // 2. Pass thunk to Windows API | 1. **Initialisation** – the loader builds the reload thunk from the NE file's Entry Table data. |
| SetTimer(hWnd, ID_TIMER, 1000, (TIMERPROC)lpfnThunk); | 2. **First call** – ''SAR'' executes, then ''INT 3Fh''. The handler loads the code segment from disk. |
| | 3. **Patching** – after the segment is loaded, the loader overwrites ''INT 3Fh'' and the next 3 bytes with ''JMP ssss:oooo''. |
| | 4. **Subsequent calls** – ''SAR'' executes, then ''JMP'' directly to the function (no loading). |
| | 5. **Discarding** – when the system needs memory, the kernel may discard the code segment. Upon discarding, the reload thunk is restored to its original state (''INT 3Fh'' with operands) using saved data in the segment table. |
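The patching and restoring steps of the life cycle can be sketched by modelling only the 5 bytes that follow the ''SAR'' instruction. The opcodes ''CD 3F'' (''INT 3Fh'') and ''EA'' (far ''JMP'', offset word first, then segment word) are standard x86 encodings; the function names and the ''TAIL_LEN'' constant are hypothetical.

```c
#include <stdint.h>

enum { TAIL_LEN = 5 };   /* bytes after the SAR instruction */

/* "Segment absent" form: INT 3Fh, segment number, offset within segment. */
void tail_set_absent(uint8_t tail[TAIL_LEN], uint8_t seg_number, uint16_t offset)
{
    tail[0] = 0xCD;                        /* INT imm8 */
    tail[1] = 0x3F;
    tail[2] = seg_number;
    tail[3] = (uint8_t)(offset & 0xFF);    /* little-endian word */
    tail[4] = (uint8_t)(offset >> 8);
}

/* "Segment present" form: JMP FAR segment:offset. */
void tail_set_present(uint8_t tail[TAIL_LEN], uint16_t segment, uint16_t offset)
{
    tail[0] = 0xEA;                        /* JMP ptr16:16 */
    tail[1] = (uint8_t)(offset & 0xFF);
    tail[2] = (uint8_t)(offset >> 8);
    tail[3] = (uint8_t)(segment & 0xFF);
    tail[4] = (uint8_t)(segment >> 8);
}

/* The first byte is enough to tell the two states apart. */
int tail_is_present(const uint8_t tail[TAIL_LEN])
{
    return tail[0] == 0xEA;
}
```

Because both forms are exactly 5 bytes, the thunk can be flipped between states in place, and callers holding the thunk address never need to be updated.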
| |
| // 3. Inside MyCallback (must have special prolog) | ==== 1.4 Three prolog types for exported functions ==== |
| void FAR PASCAL MyCallback(HWND hWnd, UINT msg, UINT id, DWORD time) { | |
| // Compiler with -GA -GEa generates: | The loader not only creates reload thunks, but also **modifies the prolog of exported functions**, replacing the first 2‑3 bytes with ''NOP'' instructions. There are three prolog types: |
| // push bp | |
| // mov bp, sp | **Type 1 – Load DS from AX (classic, for EXE callbacks)** |
| // push ds | - Works with ''MakeProcInstance''. The loader replaces the first 3 bytes with ''NOP''s, making the prolog dependent on the value in ''AX''. |
| // mov ds, ax ; AX = hInstance from thunk | <code> |
| // ... function body ... | nop |
| } | nop |
| | nop |
| | push bp |
| | mov bp, sp |
| | push ds |
| | mov ds, ax ; load DS from AX |
| | </code> |
| | |
| | **Type 2 – Load DS from SS (alternative for EXE)** |
| | - Assumes ''SS'' already contains the correct ''DS''. ''MakeProcInstance'' is not needed. |
| | <code> |
| | mov ax, ss |
| | push bp |
| | mov bp, sp |
| | push ds |
| | mov ds, ax |
| | </code> |
| | |
| | **Type 3 – Load DS from a hard‑coded value (for DLLs)** |
| | - A DLL has only one data instance, so its data selector is known at load time. The loader replaces the placeholder ''????'' with the actual value. ''MakeProcInstance'' is not needed. |
| | <code> |
| | mov ax, ???? ; actual DLL data selector |
| | push bp |
| | mov bp, sp |
| | push ds |
| | mov ds, ax |
| | </code> |
| | |
| | ==== 1.5 MakeProcInstance: dynamic binding to an instance ==== |
| | |
| | ''MakeProcInstance'' creates a thunk that binds a function call to a specific instance's data. It dynamically generates an 8‑byte code fragment in a fixed memory area: |
| |
| // 4. Free | <code> |
| FreeProcInstance(lpfnThunk); | mov ax, hInstance ; B8 xx xx – load the instance's data selector into AX |
| | jmp far ptr reload_thunk ; EA xx xx xx xx – jump to the system reload thunk |
| </code> | </code> |
| |
| ==== 1.5 Compiler options for callback functions ==== | The operand of the ''JMP'' instruction is the far address of the target function's **reload thunk**. The address of this generated **instance thunk** is returned by ''MakeProcInstance'' as a ''FARPROC''. |
| |
| Microsoft C 6.0/7.0 and Visual C++ 1.x require: | Every such dynamically created thunk must eventually be freed by ''FreeProcInstance''. If the instance's data segment is moved in memory, the kernel updates all corresponding thunks by replacing the immediate operand of the ''MOV AX, hInstance'' instruction (real mode had no indirection; all addresses were physical). |
| * **''-GA''** – load DS from AX on function entry (for callbacks). | |
| * **''-GEa''** – generate prolog for all exported functions (manages data segment). | |
| |
| Without these flags, the function will use the wrong DS, leading to a general protection fault. | **The call chain:** |
| |
| ==== 1.6 Role of ''CallProcInstance'' ==== | 1. **Instance thunk** loads ''hInstance'' into ''AX''. |
| | 2. It jumps to the system **reload thunk**. |
| | 3. **Reload thunk** updates the ''SAR'' counter; if necessary it loads the segment via ''INT 3Fh'' (using ''entry_segment'' and ''entry_offset''). |
| | 4. **Reload thunk** jumps to the modified prolog of the exported function. |
| | 5. The prolog (with the ''NOP''s) saves the old ''DS'' and loads the new ''DS'' from ''AX'', which was set by the instance thunk. |
| | 6. The function body executes. |
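The 8-byte instance thunk at the start of this chain can be assembled byte by byte. The sketch below uses the standard encodings ''B8 iw'' (''MOV AX, imm16'') and ''EA'' (far ''JMP''); the function names are hypothetical, and ''retarget_instance_thunk'' only illustrates the kind of fix-up the text attributes to the kernel when a data segment moves.

```c
#include <stdint.h>

enum { INSTANCE_THUNK_LEN = 8 };

/* Emit: MOV AX, data_selector ; JMP FAR target_seg:target_off */
void build_instance_thunk(uint8_t thunk[INSTANCE_THUNK_LEN],
                          uint16_t data_selector,
                          uint16_t target_seg, uint16_t target_off)
{
    thunk[0] = 0xB8;                              /* MOV AX, imm16 */
    thunk[1] = (uint8_t)(data_selector & 0xFF);
    thunk[2] = (uint8_t)(data_selector >> 8);
    thunk[3] = 0xEA;                              /* JMP ptr16:16 */
    thunk[4] = (uint8_t)(target_off & 0xFF);      /* offset word first */
    thunk[5] = (uint8_t)(target_off >> 8);
    thunk[6] = (uint8_t)(target_seg & 0xFF);      /* then segment word */
    thunk[7] = (uint8_t)(target_seg >> 8);
}

/* Rewrite only the immediate of MOV AX when the data segment has moved. */
void retarget_instance_thunk(uint8_t thunk[INSTANCE_THUNK_LEN],
                             uint16_t new_data_selector)
{
    thunk[1] = (uint8_t)(new_data_selector & 0xFF);
    thunk[2] = (uint8_t)(new_data_selector >> 8);
}
```

With a data selector of ''1234h'' and a target of ''5678:0000h'', the result is the byte sequence ''B8 34 12 EA 00 00 78 56''.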
| |
| ''CallProcInstance'' is an **undocumented kernel function** that invoked a thunk passed via registers ES:BX. Prototype: | ==== 1.6 CallProcInstance and the return thunk ==== |
| |
| <code c> | ''CallProcInstance'' is an **undocumented kernel function** that, together with a special **return thunk** mechanism, solves the problem of returning into a discarded code segment. Prototype: |
| | |
| | <code> |
| LONG FAR PASCAL CallProcInstance(HWND hWnd, WORD wMsg, WORD wParam, LONG lParam); | LONG FAR PASCAL CallProcInstance(HWND hWnd, WORD wMsg, WORD wParam, LONG lParam); |
| </code> | </code> |
| |
| It was used internally by ''CallWindowProc'' and other dispatchers. It does not exist in modern 32-bit systems. | It was used internally by dispatchers such as ''CallWindowProc'' to invoke a thunk whose address was passed via the **ES:BX** register pair. However, its main role is in the return thunk. |
| | |
| | **Return thunk mechanism:** |
| | |
| | - For every discardable code segment, the system creates **one shared return thunk**, pre‑placed in the segment's overhead data. |
| | - When the kernel discards a code segment, a **stack patcher** walks through the stack of every task and **replaces** the original return address (''CS:IP'') with the address of that segment's return thunk. |
| | - The original offset (''IP'') is saved in the stack location that previously held the caller's ''DS''. |
| | - The return thunk is **idempotent**: it can be safely called even if the target segment is already in memory. If the code is already present, it simply restores the original state and jumps to it; if not, it loads the segment. |
| | - This idempotency was provided by ''CallProcInstance''. |
| | |
| | ==== 1.7 Evolution and redundancy of MakeProcInstance ==== |
| | |
| | Later it was discovered that ''MakeProcInstance'' was often unnecessary. |
| | |
| | * **''__loadds'' for DLLs** – because a DLL has only one data instance, the compiler can hard‑code the fixed ''hInstance'' value directly into the function prolog (type 3). This made ''MakeProcInstance'' redundant. |
| | * **''__export'' for EXEs** – the instance handle can be obtained directly from the stack selector (''SS'') (type 2), also eliminating the need for ''MakeProcInstance'' in most applications. |
| | |
| | **Michael Geary** discovered that the work done by ''MakeProcInstance'' was superfluous. His ''FixDS'' technique already worked in Windows 1.0, so the long‑standing practice of using ''EXPORTS'' and ''MakeProcInstance'' had been unnecessary all along. |
| | |
| | In modern 32‑bit and 64‑bit Windows, ''MakeProcInstance'' is a stub macro that simply returns the passed pointer, and ''FreeProcInstance'' does nothing. |
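The pass-through behaviour can be demonstrated without the Windows headers. In the sketch below the two macros have the same shape as the Win32 definitions, while ''CALLBACK_FN'', ''INSTANCE'' and ''double_it'' are hypothetical stand-ins so that the example compiles on any platform.

```c
#include <stddef.h>

/* Hypothetical stand-ins for FARPROC and HINSTANCE. */
typedef int (*CALLBACK_FN)(int);
typedef void *INSTANCE;

/* Pass-through macros: no thunk is allocated, nothing needs freeing. */
#ifndef MakeProcInstance
#define MakeProcInstance(lpProc, hInstance) (lpProc)
#endif
#ifndef FreeProcInstance
#define FreeProcInstance(lpProc) (lpProc)
#endif

/* Example callback used only by this sketch. */
int double_it(int x)
{
    return 2 * x;
}
```

Old source code that still brackets its callbacks with these two calls therefore compiles unchanged and simply uses the original function pointer.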
| | |
| | ==== 1.8 Complete call flow summary ==== |
| | |
| | 1. **Loading** – the loader reconstructs the Entry Table into reload thunks (''SAR'', ''INT 3Fh'', ''entry_segment'', ''entry_offset'') and modifies the prologs of exported functions (replacing the beginning with ''NOP''s). |
| | 2. **Instance thunk creation** – the application calls ''MakeProcInstance'' with a function pointer and ''hInstance''. The kernel generates the thunk ''MOV AX, hInstance; JMP reload_thunk''. |
| | 3. **Call** – Windows calls the address returned by ''MakeProcInstance''. |
| | 4. **Instance thunk** – loads ''hInstance'' into ''AX'' and jumps to the reload thunk. |
| | 5. **Reload thunk** – executes ''SAR'' (updating the LRU counter). If the code segment is absent, ''INT 3Fh'' uses ''entry_segment'' and ''entry_offset'' to load the segment from disk. After loading, ''INT 3Fh'' is replaced by ''JMP''. |
| | 6. **Transfer to function** – the reload thunk jumps to the modified prolog of the exported function. |
| | 7. **Prolog** – saves the old ''DS'' and loads the new ''DS'' from ''AX'' (set by the instance thunk). |
| | 8. **Function execution**. |
| | 9. **Return** – if the code segment was discarded while the function was running, the return thunk mechanism (part of ''CallProcInstance'') reloads the segment and transfers control to the saved return address. |
| | |
| | ==== 1.9 API and ordinals ==== |
| | |
| | ^ Function ^ Export ^ Ordinal (KERNEL.EXE 3.10) ^ |
| | | ''MakeProcInstance'' | by name | 51 (unofficial) | |
| | | ''FreeProcInstance'' | by name | **52** (confirmed) | |
| | | ''CallProcInstance'' | by name | missing (undocumented) | |
| | |
| | **FreeProcInstance** (ordinal 52) removes the instance thunk from internal kernel tables and frees its memory. |
| |
| ===== 2. Generic Thunk (Windows NT) ===== | ===== 2. Generic Thunk (Windows NT) ===== |
| | **517** | ''CallProc32W'' | Calls a 32-bit function with parameter conversion (Pascal calling convention). | | | **517** | ''CallProc32W'' | Calls a 32-bit function with parameter conversion (Pascal calling convention). | |
| |
| Later ''CallProcEx32W'' appeared (no fixed ordinal), supporting ''__cdecl'' and the flag ''CPEX_DEST_CDECL''. | Later ''CallProcEx32W'' appeared (no fixed ordinal), supporting ''__cdecl'' and the flag ''CPEX_DEST_CDECL''. |
| |
| ==== 2.3 API functions: prototypes and parameters ==== | ==== 2.3 API functions: prototypes and parameters ==== |
| ===== References ===== | ===== References ===== |
| |
| * Finnegan, J. "Test Drive Win32 from 16-bit Code Using the Windows NT WOW Layer and Generic Thunk". Microsoft Systems Journal, June 1994. (PDF: ''24.pdf'') | * Finnegan, J. "Test Drive Win32 from 16-bit Code Using the Windows NT WOW Layer and Generic Thunk". Microsoft Systems Journal, June 1994. (PDF: ''24.pdf'') |
| * Oney, W. "Mix 16-bit and 32-bit Code in Your Applications with the Win32s Universal Thunk". Microsoft Systems Journal, November 1993. (PDF: ''mix16.pdf'') | * Oney, W. "Mix 16-bit and 32-bit Code in Your Applications with the Win32s Universal Thunk". Microsoft Systems Journal, November 1993. (PDF: ''mix16.pdf'') |
| * Pietrek, M. "Windows 95 System Programming Secrets". IDG Books, 1996. (Chapter on ''QT_Thunk'') | * Pietrek, M. "Windows 95 System Programming Secrets". IDG Books, 1996. (Chapter on ''QT_Thunk'') |
| * Petzold, C. "Programming Windows 3.1". Microsoft Press, 1992. (Chapters on ''MakeProcInstance'') | * Petzold, C. "Programming Windows 3.1". Microsoft Press, 1992. (Chapters on ''MakeProcInstance'') |
| * Wine source code: ''dlls/wow32/thunk.c'', ''include/wine/thunk.h'' | * Wine source code: ''dlls/wow32/thunk.c'', ''include/wine/thunk.h'' |
| * Microsoft Win32 SDK for Windows NT 3.5, files: ''WOWNT16.H'', ''WOWNT32.H'' | * Microsoft Win32 SDK for Windows NT 3.5, files: ''WOWNT16.H'', ''WOWNT32.H'' |
| * Microsoft Knowledge Base article Q104009: "Generic Thunk Interface in Windows NT" | * Microsoft Knowledge Base article Q104009: "Generic Thunk Interface in Windows NT" |
| | |
| | |
| | ==== Official Microsoft documentation (MSDN / KB) ==== |
| | |
| | * [[https://library.thedatadungeon.com/msdn-1992-09/progwin/html/prog5ip4.content.htm|When Windows Runs the Program (MSDN 1992)]] – describes reload thunk structure, ''SAR'', ''INT 3Fh'', and the two states. |
| | * [[https://library.thedatadungeon.com/msdn-1992-09/progwin/html/prog5iqw.content.htm|What MakeProcInstance Does (MSDN 1992)]] – official description of instance thunk creation. |
| | * [[https://jeffpar.github.io/kbarchive/kb/105/Q105137/|Q105137: Explanation of Exporting Functions in Windows]] – details the three prolog types. |
| | * [[https://betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/102871|Microsoft KB Archive/102871]] – compiler switches ''-GA -GEa'' required for callbacks. |
| | * [[https://betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/81496|Microsoft KB Archive/81496]] – clarifies ''HINSTANCE'' vs ''HMODULE''. |
| | |
| | ==== The Old New Thing (Raymond Chen) ==== |
| | |
| | * [[https://devblogs.microsoft.com/oldnewthing/20080207-00/?p=23533|What did MakeProcInstance do? (February 7, 2008)]] – explanation of the mechanism, its necessity due to lack of MMU, and why it became redundant. |
| | * [[https://devblogs.microsoft.com/oldnewthing/20180423-00/?p=98575|The early history of redundant function pointer casts: MakeProcInstance (April 23, 2018)]] – concludes that ''MakeProcInstance'' is now a stub. |
| | * [[https://devblogs.microsoft.com/oldnewthing/20080208-00/?p=23513|Why couldn't you have more than one instance of a 16-bit multi-DS program? (February 8, 2008)]] – explains how the DS segment is used. |
| | |
| | ==== Historical analysis and discoveries ==== |
| | |
| | * [[http://www.geary.com/fixds.html|FixDS – a bit of Windows history (Michael Geary)]] – first-hand account of how ''MakeProcInstance'' was eliminated for EXEs using ''__export'' and for DLLs using ''__loadds''. |
| | |
| | ==== Reverse engineering (Wine and WineVDM) ==== |
| | |
| | * [[https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/wow32/thunk.c|Wine thunk.c]] – source code showing ''TASK_AllocThunk'' and how instance thunks are allocated. |
| | * [[https://deepwiki.com/otya128/winevdm/Task_Management|WineVDM Task Management]] – explains per‑task thunk management. |
| | |
| | ==== General reference and third‑party ==== |
| | |
| | * [[https://www.gladir.com/CODER/CWINDOWS3/callprocinstance.htm|CallProcInstance (gladir.com)]] – documented syntax of the undocumented ''CallProcInstance''. |
| | * [[https://wiki.osdev.org/NE|NE (New Executable) Format (OSDev Wiki)]] – technical specification of the NE format and the Entry Table. |
| | * [[https://en.wikipedia.org/wiki/Thunk|Thunk – Wikipedia]] – general definition and historical mention of reload thunk. |
| | * [[https://hackernoon.com/win3mu-part-5-windows-3-executable-files-2b072fd7716b|Win3mu Part 5 – Windows 3 Executable Files (HackerNoon)]] – detailed analysis of NE files and relocation. |
| |
| |
| |