en:docs:win16:thunking

Differences

This shows you the differences between two versions of the page.

en:docs:win16:thunking [2026/06/03 06:58] – Update introduction prokushev
en:docs:win16:thunking [2026/06/03 08:41] (current) – [The Old New Thing (Raymond Chen)] prokushev
Line 43: Line 43:
 ===== 1. Instance Thunk (Win16) =====
  
-==== 1.1 Purpose and architecture ====
+==== 1.1 Architectural root of the problem: shared code, private data ====
  
-In 16-bit Windows, multiple instances of the same application share code but each has its own **data segment (DS)**. When Windows calls a callback function (window procedure, ''EnumFonts'', ''SetTimer'', etc.), the system does not know which DS to set. ''MakeProcInstance'' creates a small thunk that, before calling the real function, loads the instance handle (''hInstance'') into register ''AX''. The receiving function must start with a prolog that loads DS from ''AX''.
+In 16-bit Windows, multiple instances of the same application share code, but each instance has its own **data segment (DS)**. When Windows calls a callback function (window procedure, ''EnumFonts'', ''SetTimer'', etc.), the system does not know which DS to use for that particular instance. The 8086 processor had no MMU; all memory was physical, with no indirection layer. This meant that if the memory manager moved a data segment, it had to know about every reference to it in order to update pointers. This situation gave rise to a special mechanism – the **Instance Thunk**.
  
-**Thunk placement:**
-The thunk is created in the **data segment** of the calling module (usually DGROUP). Windows can move data segments in memory, so the kernel maintains an internal table of all created thunks to update them when segments are moved.
+==== 1.2 Role of the loader and the NE format ====
  
-==== 1.2 Thunk memory format (disassembled code) ====
+The foundation of the mechanism is laid when the executable file is loaded. Windows 3.x uses the **NE (New Executable)** format, in which each far (''FAR'') function has a 6‑byte record in the Entry Table. This table is placed by the loader in a fixed overhead segment that is shared by all instances of the program.
  
-A typical thunk occupies **8–16 bytes** (depending on alignment). Real code from Windows 3.1:
+During loading, Windows **expands each 6‑byte entry table record into an 8‑byte fragment of machine code** called a **reload thunk**. These thunks become the single official entry point for calling the corresponding far functions.
  
-<code asm>
-; Thunk created by MakeProcInstance
-; Address of thunk is returned as FARPROC
-    mov     ax, 1234h       ; B8 34 12  – hInstance will be placed here
-    jmp     far ptr 5678:0000h ; EA 00 00 78 56  – target function address (seg:off)
+==== 1.3 Reload thunk: detailed structure and swapping mechanism ====
+
+**Full reload thunk structure (8 bytes):**
+
+<code>
 +SAR BYTE PTR CS:[xxxx], 1  ; 3 bytes: shift access counter right
 +INT 3Fh                    ; 2 bytes: software interrupt call
 +db entry_segment           ; 1 byte: index into the module's segment table
 +dw entry_offset            ; 2 bytes: function offset inside the segment
 </code> </code>
  
-Sometimes the thunk is padded with a ''0x66'' prefix (not used in pure Win16). After the thunk is called, control passes to a system reload thunk that loads the code segment if needed.
+Each field has a precise purpose:
  
-==== 1.3 API and ordinals ====
+* **''SAR BYTE PTR CS:[xxxx], 1''** – software “accessed bit” for the **LRU (Least Recently Used)** discarding algorithm. The access counter for the segment is initialised to 1. Every call shifts it right, turning it to 0. Every 250 ms Windows scans the counters and builds an LRU list. If a counter is still 1 (no calls since the previous scan), the segment is a candidate for discarding. This gave Windows a software substitute for the accessed bit that an MMU would provide.
 +* **''INT 3Fh''** – software interrupt that invokes the segment loader. The interrupt handler reads the operands that follow the ''INT 3Fh'' instruction.
 +* **''db entry_segment''** – index of the function's code segment in the module's segment table. It corresponds directly to the ''entry_segment'' field of the entry record in the NE format.
 +* **''dw entry_offset''** – offset of the function inside its 16‑bit code segment, relative to the start of that segment. Once the segment has been loaded, the loader combines it with the segment address to form the function's far address (''segment:offset'').
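
The access-counter scheme described for ''SAR'' can be modelled in a few lines of C. This is a minimal sketch, not kernel code: the ''SegmentUsage'' type and the ''touch_segment'' and ''scan_segments'' names are hypothetical, and the assumption that the periodic scan re-arms every counter to 1 is inferred from the description above rather than taken from Windows sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one access byte per discardable code segment. */
typedef struct {
    uint8_t access; /* 1 = not called since the last scan, 0 = called */
} SegmentUsage;

/* Effect of "SAR BYTE PTR CS:[xxxx], 1" on the access byte: 1 becomes 0. */
void touch_segment(SegmentUsage *seg)
{
    assert(seg != NULL);
    seg->access = (uint8_t)(seg->access >> 1);
}

/*
 * Periodic scan (every 250 ms in the description above).
 * Writes the indices of untouched segments to candidates[] and re-arms
 * every counter to 1 (assumption). Returns the number of candidates.
 */
size_t scan_segments(SegmentUsage *segs, size_t count, size_t *candidates)
{
    size_t found = 0;
    assert(segs != NULL && candidates != NULL);
    for (size_t i = 0; i < count; i++) {
        if (segs[i].access == 1)
            candidates[found++] = i;   /* nobody called into this segment */
        segs[i].access = 1;            /* re-arm for the next interval */
    }
    return found;
}
```

A segment that was called at least once between two scans has its byte at 0 and is skipped; all others are reported as discard candidates.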
  
-^ Function ^ Export ^ Ordinal (KERNEL.EXE 3.10) ^
-| ''MakeProcInstance'' | by name | 51 (unofficial) |
-| ''FreeProcInstance'' | by name | **52** (confirmed) |
-| ''CallProcInstance'' | by name | missing (undocumented) |
+**Two states of a reload thunk:**
  
-**FreeProcInstance** (ordinal 52) removes the thunk from internal kernel tables and frees memory.
+**State 1 – code segment not in memory:**
 +The thunk executes ''SAR'', then ''INT 3Fh'' with its operands. The ''INT 3Fh'' handler reads ''entry_segment'' and ''entry_offset'', locates the segment in the module's segment table, loads the segment from disk, and updates the table.
  
-==== 1.4 Full creation and call cycle ====
+**State 2 – code segment is in memory:**
 +The kernel modifies the reload thunk: 
 +<code> 
 +SAR BYTE PTR CS:[xxxx], 1  ; 3 bytes – always executed
 +JMP ssss:oooo              ; 5 bytes – direct jump to the resident function 
 +</code> 
 +The ''JMP'' instruction occupies the 5 bytes that previously held ''INT 3Fh'' and its operands. The address ''ssss:oooo'' is the function's real far address: a 16‑bit segment and a 16‑bit offset, 32 bits in total, not a linear address.
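
The switch between the two states amounts to rewriting the last 5 bytes of the thunk. The sketch below models only that tail and leaves the ''SAR'' prefix out. The opcodes ''CD 3F'' (''INT 3Fh'') and ''EA'' (far ''JMP'') are the standard x86 encodings; the function names and the buffer layout are hypothetical and are not taken from the Windows kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { THUNK_TAIL_SIZE = 5 };

/* State 1: INT 3Fh (CD 3F), segment index byte, 16-bit offset (little-endian). */
void write_unloaded_tail(uint8_t *tail, uint8_t entry_segment, uint16_t entry_offset)
{
    assert(tail != NULL);
    tail[0] = 0xCD;
    tail[1] = 0x3F;
    tail[2] = entry_segment;
    tail[3] = (uint8_t)(entry_offset & 0xFF);
    tail[4] = (uint8_t)(entry_offset >> 8);
}

/* State 2: JMP FAR ssss:oooo, encoded as EA, offset (2 bytes), segment (2 bytes). */
void write_loaded_tail(uint8_t *tail, uint16_t segment, uint16_t offset)
{
    assert(tail != NULL);
    tail[0] = 0xEA;
    tail[1] = (uint8_t)(offset & 0xFF);
    tail[2] = (uint8_t)(offset >> 8);
    tail[3] = (uint8_t)(segment & 0xFF);
    tail[4] = (uint8_t)(segment >> 8);
}

/* The first byte of the tail is enough to tell the two states apart. */
int tail_is_loaded(const uint8_t *tail)
{
    assert(tail != NULL);
    return tail[0] == 0xEA;
}
```

Both forms occupy exactly ''THUNK_TAIL_SIZE'' bytes, which is why the kernel can patch the thunk in place in either direction.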
  
-<code c>
-// 1. Create thunk
-FARPROC lpfnThunk = MakeProcInstance((FARPROC)MyCallback, hInst);
+**Life cycle:**
  
-// 2. Pass thunk to Windows API
-SetTimer(hWnd, ID_TIMER, 1000, (TIMERPROC)lpfnThunk);
+1. **Initialisation** – the loader builds the reload thunk from the NE file's Entry Table data.
+2. **First call** – ''SAR'' executes, then ''INT 3Fh''. The handler loads the code segment from disk.
 +3. **Patching** – after the segment is loaded, the loader overwrites ''INT 3Fh'' and the next 3 bytes with ''JMP ssss:oooo''.
 +4. **Subsequent calls** – ''SAR'' executes, then ''JMP'' goes directly to the function (no loading).
 +5. **Discarding** – when the system needs memory, the kernel may discard the code segment. Upon discarding, the reload thunk is restored to its original state (''INT 3Fh'' with operands) using saved data in the segment table.
  
-// 3. Inside MyCallback (must have special prolog)
-void FAR PASCAL MyCallback(HWND hWnd, UINT msg, UINT id, DWORD time) {
-    // Compiler with -GA -GEa generates:
-    //    push bp
-    //    mov bp, sp
-    //    push ds
-    //    mov ds, ax      ; AX = hInstance from thunk
-    //    ... function body ...
-}
+==== 1.4 Three prolog types for exported functions ====
  
-// 4. Free
-FreeProcInstance(lpfnThunk);
+The loader not only creates reload thunks, but also **modifies the prolog of exported functions**, replacing the first 2‑3 bytes with ''NOP'' instructions. There are three prolog types:
 +**Type 1 – Load DS from AX (classic, for EXE callbacks)** 
 +- Works with ''MakeProcInstance''. The loader replaces the first 3 bytes with ''NOP''s, making the prolog dependent on the value in ''AX''.
 +  <code> 
 +  nop 
 +  nop 
 +  nop 
 +  push bp 
 +  mov bp, sp 
 +  push ds 
 +  mov ds, ax           ; load DS from AX 
 +  </code> 
 + 
 +**Type 2 – Load DS from SS (alternative for EXE)** 
 +- Assumes ''SS'' already contains the correct ''DS''. ''MakeProcInstance'' is not needed.
 +  <code> 
 +  mov ax, ss 
 +  push bp 
 +  mov bp, sp 
 +  push ds 
 +  mov ds, ax 
 +  </code> 
 + 
 +**Type 3 – Load DS from a hard‑coded value (for DLLs)** 
 +- A DLL has only one data instance, so its data selector is known at load time. The loader replaces the placeholder ''????'' with the actual value. ''MakeProcInstance'' is not needed.
 +  <code> 
 +  mov ax, ????          ; actual DLL data selector 
 +  push bp 
 +  mov bp, sp 
 +  push ds 
 +  mov ds, ax 
 +  </code> 
 + 
 +==== 1.5 MakeProcInstance: dynamic binding to an instance ==== 
 + 
 +''MakeProcInstance'' creates a thunk that binds a function call to a specific instance's data. It dynamically generates an 8‑byte code fragment in a fixed memory area: 
 + 
 +<code> 
 +mov     ax, hInstance       ; B8 xx xx  – load the instance's data selector into AX 
 +jmp     far ptr reload_thunk ; EA xx xx xx xx – jump to the system reload thunk
 </code> </code>
  
-==== 1.5 Compiler options for callback functions ====
+The operand of the ''JMP'' instruction is the address of the target function's **reload thunk**. The address of this generated **instance thunk** is returned by ''MakeProcInstance'' as a ''FARPROC''.
  
-Microsoft C 6.0/7.0 and Visual C++ 1.x require:
-* **''-GA''** – load DS from AX on function entry (for callbacks).
-* **''-GEa''** – generate prolog for all exported functions (manages data segment).
+Every such dynamically created thunk must eventually be freed by ''FreeProcInstance''. If the instance's data segment is moved in memory, the kernel updates all corresponding thunks by replacing the immediate operand of the ''MOV AX, hInstance'' instruction (real mode had no indirection; all addresses were physical).
  
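
The 8‑byte instance thunk and the fix-up applied when a data segment moves can be sketched as plain byte manipulation. The opcodes ''B8'' (''MOV AX, imm16'') and ''EA'' (far ''JMP'') are standard x86 encodings; ''build_instance_thunk'' and ''update_instance_thunk'' are hypothetical names used only for illustration and do not correspond to kernel routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { INSTANCE_THUNK_SIZE = 8 };

/* Emits: MOV AX, instance_ds ; JMP FAR target_segment:target_offset */
void build_instance_thunk(uint8_t *thunk, uint16_t instance_ds,
                          uint16_t target_segment, uint16_t target_offset)
{
    assert(thunk != NULL);
    thunk[0] = 0xB8;                              /* MOV AX, imm16 */
    thunk[1] = (uint8_t)(instance_ds & 0xFF);
    thunk[2] = (uint8_t)(instance_ds >> 8);
    thunk[3] = 0xEA;                              /* JMP FAR ptr16:16 */
    thunk[4] = (uint8_t)(target_offset & 0xFF);
    thunk[5] = (uint8_t)(target_offset >> 8);
    thunk[6] = (uint8_t)(target_segment & 0xFF);
    thunk[7] = (uint8_t)(target_segment >> 8);
}

/* When the data segment moves, only the immediate of MOV AX changes. */
void update_instance_thunk(uint8_t *thunk, uint16_t new_instance_ds)
{
    assert(thunk != NULL && thunk[0] == 0xB8);
    thunk[1] = (uint8_t)(new_instance_ds & 0xFF);
    thunk[2] = (uint8_t)(new_instance_ds >> 8);
}
```

With an instance value of 1234h and a target of 5678:0000h, the buffer holds ''B8 34 12 EA 00 00 78 56''. Because the jump target is untouched by ''update_instance_thunk'', the same thunk keeps pointing at the same reload thunk for its whole lifetime.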
-Without these flags, the function will use the wrong DS, leading to a general protection fault.
+**The call chain:**
  
-==== 1.6 Role of ''CallProcInstance'' ====
+1. **Instance thunk** loads ''hInstance'' into ''AX''.
 +2. It jumps to the system **reload thunk**. 
 +3. **Reload thunk** updates the ''SAR'' counter; if necessary it loads the segment via ''INT 3Fh'' (using ''entry_segment'' and ''entry_offset''). 
 +4. **Reload thunk** jumps to the modified prolog of the exported function. 
 +5. The prolog (with the ''NOP''s) saves the old ''DS'' and loads the new ''DS'' from ''AX'', which was set by the instance thunk. 
 +6. The function body executes.
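
The call chain can be traced with a toy register model. This is a simplified sketch under stated assumptions: the ''Regs'' and ''CodeSegment'' types and all function names are hypothetical, segment loading is reduced to setting a flag, and the epilog that restores ''DS'' is the conventional counterpart of ''push ds'' rather than something described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t ax, ds; } Regs;                    /* hypothetical CPU state */
typedef struct { bool resident; unsigned loads; } CodeSegment;

/* Steps 1-2: MOV AX, hInstance (the jump is implicit in the C call order). */
void instance_thunk(Regs *r, uint16_t instance_ds) { r->ax = instance_ds; }

/* Step 3: load the code segment only if it is absent (INT 3Fh path). */
void reload_thunk(CodeSegment *seg)
{
    if (!seg->resident) {
        seg->resident = true;
        seg->loads++;
    }
}

/* Step 5: push ds / mov ds, ax. Returns the saved DS. */
uint16_t prolog(Regs *r)
{
    uint16_t saved = r->ds;
    r->ds = r->ax;
    return saved;
}

/* Assumed epilog: pop ds. */
void epilog(Regs *r, uint16_t saved) { r->ds = saved; }

/* Runs the whole chain and returns the DS seen by the function body. */
uint16_t call_via_thunks(Regs *r, CodeSegment *seg, uint16_t instance_ds)
{
    assert(r != NULL && seg != NULL);
    instance_thunk(r, instance_ds);
    reload_thunk(seg);
    uint16_t saved = prolog(r);
    uint16_t seen_by_body = r->ds;      /* step 6: function body */
    epilog(r, saved);
    return seen_by_body;
}
```

Two calls with different instance values see different ''DS'' values in the body, while the code segment is loaded only once and the caller's ''DS'' is intact after each call.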
  
-''CallProcInstance'' is an **undocumented kernel function** that invoked a thunk passed via registers ES:BX. Prototype:
+==== 1.6 CallProcInstance and the return thunk ====
  
-<code c>
+''CallProcInstance'' is an **undocumented kernel function** that, together with a special **return thunk** mechanism, solves the problem of returning into a discarded code segment. Prototype:
 + 
 +<code>
 LONG FAR PASCAL CallProcInstance(HWND hWnd, WORD wMsg, WORD wParam, LONG lParam);
 </code> </code>
  
-It was used internally by ''CallWindowProc'' and other dispatchers. It does not exist in modern 32-bit systems.
+It was used internally by dispatchers such as ''CallWindowProc'' to invoke a thunk whose address was passed via the **ES:BX** register pair. However, its main role is in the return thunk.
 + 
 +**Return thunk mechanism:** 
 + 
 +- For every discardable code segment, the system creates **one shared return thunk**, pre‑placed in the segment's overhead data.
 +- When the kernel discards a code segment, a **stack patcher** walks the stack of every task and **replaces** each return address (''CS:IP'') that points into the discarded segment with the address of that segment's return thunk.
 +- The original offset (''IP'') is saved in the stack location that previously held the caller's ''DS''.
 +- The return thunk is **idempotent**: it can be safely called even if the target segment is already in memory. If the code is already present, it simply restores the original state and jumps to it; if not, it loads the segment first.
 +- This idempotency was provided by ''CallProcInstance''.
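
The stack-patching step can be illustrated with a simplified frame model. Everything here is a hypothetical sketch: a real 16‑bit stack is not an array of tidy records, the ''FarFrame'' layout and ''patch_frames'' name are invented for the example, and only the patching step is modelled, not the work the return thunk does later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified far-call frame: return address plus saved DS slot. */
typedef struct {
    uint16_t ret_ip;
    uint16_t ret_cs;
    uint16_t saved_ds;
} FarFrame;

/*
 * For every frame that returns into discarded_cs, stash the original IP in
 * the saved-DS slot and redirect the return address to the return thunk.
 * Returns the number of frames patched.
 */
size_t patch_frames(FarFrame *frames, size_t count, uint16_t discarded_cs,
                    uint16_t thunk_cs, uint16_t thunk_ip)
{
    size_t patched = 0;
    assert(frames != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (frames[i].ret_cs != discarded_cs)
            continue;                          /* returns elsewhere: untouched */
        frames[i].saved_ds = frames[i].ret_ip; /* original IP kept in DS slot */
        frames[i].ret_cs = thunk_cs;
        frames[i].ret_ip = thunk_ip;
        patched++;
    }
    return patched;
}
```

Frames that return into other segments are left alone, so discarding one segment never disturbs return paths through code that is still resident.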
 + 
 +==== 1.7 Evolution and redundancy of MakeProcInstance ==== 
 + 
 +Later it was discovered that ''MakeProcInstance'' was often unnecessary. 
 + 
 +* **''__loadds'' for DLLs** – because a DLL has only one data instance, the compiler can hard‑code the fixed ''hInstance'' value directly into the function prolog (type 3). This made ''MakeProcInstance'' redundant. 
 +* **''__export'' for EXEs** – the instance handle can be obtained directly from the stack selector (''SS'') (type 2), also eliminating the need for ''MakeProcInstance'' in most applications. 
 + 
 +The discovery that the entire work of ''MakeProcInstance'' was superfluous was made by **Michael Geary**. His ''FixDS'' technique worked perfectly already in Windows 1.0, and the long‑standing practice of using ''EXPORTS'' and ''MakeProcInstance'' turned out to be an unnecessary adventure. 
 + 
 +In modern 32‑bit and 64‑bit Windows, ''MakeProcInstance'' is a stub macro that simply returns the passed pointer, and ''FreeProcInstance'' does nothing.
 + 
 +==== 1.8 Complete call flow summary ==== 
 + 
 +1. **Loading** – the loader reconstructs the Entry Table into reload thunks (''SAR'', ''INT 3Fh'', ''entry_segment'', ''entry_offset'') and modifies the prologs of exported functions (replacing the beginning with ''NOP''s). 
 +2. **Instance thunk creation** – the application calls ''MakeProcInstance'' with a function pointer and ''hInstance''. The kernel generates the thunk ''MOV AX, hInstance; JMP reload_thunk''.
 +3. **Call** – Windows calls the address returned by ''MakeProcInstance''.
 +4. **Instance thunk** – loads ''hInstance'' into ''AX'' and jumps to the reload thunk.
 +5. **Reload thunk** – executes ''SAR'' (updating the LRU counter). If the code segment is absent, ''INT 3Fh'' uses ''entry_segment'' and ''entry_offset'' to load the segment from disk. After loading, ''INT 3Fh'' is replaced by ''JMP''.
 +6. **Transfer to function** – the reload thunk jumps to the modified prolog of the exported function. 
 +7. **Prolog** – saves the old ''DS'' and loads the new ''DS'' from ''AX'' (set by the instance thunk). 
 +8. **Function execution**. 
 +9. **Return** – if the code segment was discarded while the function was running, the return thunk mechanism (part of ''CallProcInstance'') reloads the segment and transfers control to the saved return address. 
 + 
 +==== 1.9 API and ordinals ==== 
 + 
 +^ Function ^ Export ^ Ordinal (KERNEL.EXE 3.10) ^ 
 +| ''MakeProcInstance'' | by name | 51 (unofficial) | 
 +| ''FreeProcInstance'' | by name | **52** (confirmed) | 
 +| ''CallProcInstance'' | by name | missing (undocumented) | 
 + 
 +**FreeProcInstance** (ordinal 52) removes the instance thunk from internal kernel tables and frees its memory. 
  
 ===== 2. Generic Thunk (Windows NT) =====
Line 130: Line 218:
 | **517** | ''CallProc32W'' | Calls a 32-bit function with parameter conversion (Pascal calling convention). |
  
-Later ''CallProcEx32W'' appeared (no fixed ordinal), supporting ''__cdecl'' and the flag ''CPEX_DEST_CDECL''.
+Later CallProcEx32W appeared (no fixed ordinal), supporting cdecl and the flag ''CPEX_DEST_CDECL''.
  
 ==== 2.3 API functions: prototypes and parameters ====
Line 618: Line 706:
 ===== References =====
  
-* Finnegan, J. "Test Drive Win32 from 16-bit Code Using the Windows NT WOW Layer and Generic Thunk". Microsoft Systems Journal, June 1994. (PDF: ''24.pdf'')
-* Oney, W. "Mix 16-bit and 32-bit Code in Your Applications with the Win32s Universal Thunk". Microsoft Systems Journal, November 1993. (PDF: ''mix16.pdf'')
-* Pietrek, M. "Windows 95 System Programming Secrets". IDG Books, 1996. (Chapter on ''QT_Thunk'')
-* Petzold, C. "Programming Windows 3.1". Microsoft Press, 1992. (Chapters on ''MakeProcInstance'')
-* Wine source code: ''dlls/wow32/thunk.c'', ''include/wine/thunk.h''
-* Microsoft Win32 SDK for Windows NT 3.5, files: ''WOWNT16.H'', ''WOWNT32.H''
-* Microsoft Knowledge Base article Q104009: "Generic Thunk Interface in Windows NT"
+  * Finnegan, J. "Test Drive Win32 from 16-bit Code Using the Windows NT WOW Layer and Generic Thunk". Microsoft Systems Journal, June 1994. (PDF: ''24.pdf'')
+  * Oney, W. "Mix 16-bit and 32-bit Code in Your Applications with the Win32s Universal Thunk". Microsoft Systems Journal, November 1993. (PDF: ''mix16.pdf'')
+  * Pietrek, M. "Windows 95 System Programming Secrets". IDG Books, 1996. (Chapter on ''QT_Thunk'')
+  * Petzold, C. "Programming Windows 3.1". Microsoft Press, 1992. (Chapters on ''MakeProcInstance'')
+  * Wine source code: ''dlls/wow32/thunk.c'', ''include/wine/thunk.h''
+  * Microsoft Win32 SDK for Windows NT 3.5, files: ''WOWNT16.H'', ''WOWNT32.H''
+  * Microsoft Knowledge Base article Q104009: "Generic Thunk Interface in Windows NT"
 + 
 + 
 +===== References ===== 
 + 
 +==== Official Microsoft documentation (MSDN / KB) ==== 
 + 
 +  * [[https://library.thedatadungeon.com/msdn-1992-09/progwin/html/prog5ip4.content.htm|When Windows Runs the Program (MSDN 1992)]] – describes reload thunk structure, ''SAR'', ''INT 3Fh'', and the two states. 
 +  * [[https://library.thedatadungeon.com/msdn-1992-09/progwin/html/prog5iqw.content.htm|What MakeProcInstance Does (MSDN 1992)]] – official description of instance thunk creation. 
 +  * [[https://jeffpar.github.io/kbarchive/kb/105/Q105137/|Q105137: Explanation of Exporting Functions in Windows]] – details the three prolog types. 
 +  * [[https://betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/102871|Microsoft KB Archive/102871]] – compiler switches ''-GA -GEa'' required for callbacks. 
 +  * [[https://betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/81496|Microsoft KB Archive/81496]] – clarifies ''HINSTANCE'' vs ''HMODULE''.
 + 
 +==== The Old New Thing (Raymond Chen) ==== 
 + 
 +  * [[https://devblogs.microsoft.com/oldnewthing/20080207-00/?p=23533|What did MakeProcInstance do? (February 7, 2008)]] – explanation of the mechanism, its necessity due to lack of MMU, and why it became redundant. 
 +  * [[https://devblogs.microsoft.com/oldnewthing/20180423-00/?p=98575|The early history of redundant function pointer casts: MakeProcInstance (April 23, 2018)]] – concludes that ''MakeProcInstance'' is now a stub. 
 +  * [[https://devblogs.microsoft.com/oldnewthing/20080208-00/?p=23513|Why couldn't you have more than one instance of a 16-bit multi-DS program? (February 8, 2008)]] – explains how the data segment (DS) is used.
 + 
 +==== Historical analysis and discoveries ==== 
 + 
 +  * [[http://www.geary.com/fixds.html|FixDS – a bit of Windows history (Michael Geary)]] – first-hand account of how ''MakeProcInstance'' was eliminated for EXEs using ''__export'' and for DLLs using ''__loadds''.
 + 
 +==== Reverse engineering (Wine and WineVDM) ==== 
 + 
 +  * [[https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/wow32/thunk.c|Wine thunk.c]] – source code showing ''TASK_AllocThunk'' and how instance thunks are allocated. 
 +  * [[https://deepwiki.com/otya128/winevdm/Task_Management|WineVDM Task Management]] – explains per‑task thunk management. 
 + 
 +==== General reference and third‑party ==== 
 + 
 +  * [[https://www.gladir.com/CODER/CWINDOWS3/callprocinstance.htm|CallProcInstance (gladir.com)]] – documented syntax of the undocumented ''CallProcInstance''.
 +  * [[https://wiki.osdev.org/NE|NE (New Executable) Format (OSDev Wiki)]] – technical specification of the NE format and the Entry Table. 
 +  * [[https://en.wikipedia.org/wiki/Thunk|Thunk – Wikipedia]] – general definition and historical mention of reload thunk. 
 +  * [[https://hackernoon.com/win3mu-part-5-windows-3-executable-files-2b072fd7716b|Win3mu Part 5 – Windows 3 Executable Files (HackerNoon)]] – detailed analysis of NE files and relocation.