Tuesday, December 27, 2011

Troubleshooting 0x7F UNEXPECTED_KERNEL_MODE_TRAP

The Debugging Tools for Windows are required to analyze crash dump files. If you do not have the Debugging Tools for Windows installed or dump files are not being generated on system crash, see this post for installation/configuration instructions:

http://mikemstech.blogspot.com/2011/11/windows-crash-dump-analysis.html

0x0000007F UNEXPECTED_KERNEL_MODE_TRAP (also identified as 0x1000007F UNEXPECTED_KERNEL_MODE_TRAP_M) is a common blue screen of death on the Windows platform (Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, Windows Server 2008 R2, and Windows 8). The bug check is raised when the CPU (described in MSDN as an Intel CPU, which in practice covers the x86/x64 family) generates a trap that the kernel does not catch. This is typically due to a bound trap (one that the kernel is not permitted to catch) or a double fault (an error that occurs inside error handling code). The trap numbers that can appear in parameter 1 are listed below, quoted or adapted from MSDN:

0x00000000 Divide by zero error
0x00000001 "A system-debugger call"
0x00000003 A debugger breakpoint. Leaving one in production code is sloppy practice; see this post for a similar example.
0x00000004 "Overflow, occurs when the processor executes a call to an interrupt handler when the overflow (OF) flag is set." This indicates that an integer operation overflowed and an error handling routine was called. This is likely seen when the processor is configured to automatically generate an exception when an overflow occurs.
0x00000005 "Bounds Check Fault, indicates that the processor, while executing a BOUND instruction, finds that the operand exceeds the specified limits. A BOUND instruction ensures that a signed array index is within a certain range."
0x00000006 "Invalid Opcode, indicates that the processor tries to execute an invalid instruction. This error typically occurs when the instruction pointer has become corrupted and is pointing to the wrong location. The most common cause of this error is hardware memory corruption." Investigate as a potential hardware issue.
0x00000007 "A hardware coprocessor instruction with no coprocessor present."
0x00000008 Double fault. This is the most common exception subcode. It occurs either when a driver recurses too deeply and overflows the kernel stack, or when memory has been corrupted. For a suspected stack overflow, test different driver versions to see if one eliminates the fault; if none does, enable driver verifier to see if something is corrupting memory. For suspected memory corruption, start with a memory test and enable driver verifier.
0x0000000A "A corrupted Task State Segment"
0x0000000B "An access to a memory segment that was not present."
0x0000000C "An access to memory beyond the limits of a stack"
0x0000000D "An exception not covered by some other exception; a protection fault that pertains to access violations for applications"
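The list above can be turned into a small lookup helper for triaging many dumps at once. This is only a sketch: the names TRAP_CODES and describe_trap are made up for illustration, the descriptions are shortened from the list above, and the helper assumes parameter 1 is supplied either as an integer or as the hex string the debugger prints (for example 0000000000000008).

```python
# Trap numbers for bug check 0x7F parameter 1, shortened from the list above.
# TRAP_CODES and describe_trap are illustrative names, not part of any tool.
TRAP_CODES = {
    0x00: "Divide by zero error",
    0x01: "System-debugger call",
    0x03: "Debugger breakpoint",
    0x04: "Overflow",
    0x05: "Bounds check fault",
    0x06: "Invalid opcode",
    0x07: "Coprocessor instruction with no coprocessor present",
    0x08: "Double fault",
    0x0A: "Corrupted Task State Segment",
    0x0B: "Access to a memory segment that was not present",
    0x0C: "Access to memory beyond the limits of a stack",
    0x0D: "Exception not covered by some other exception (protection fault)",
}


def describe_trap(arg1):
    """Return a short description for parameter 1 of a 0x7F bug check.

    arg1 may be an int or a hex string as printed by the debugger.
    """
    code = int(arg1, 16) if isinstance(arg1, str) else arg1
    return TRAP_CODES.get(code, "Not in the list above (0x%X)" % code)


if __name__ == "__main__":
    print(describe_trap("0000000000000008"))
```

Values that are not in the list fall through to a generic message rather than raising an error, since the list is only a portion of the possible trap numbers.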

Generic troubleshooting involves testing the memory and changing the version of the driver identified in the minidump. Also ensure that other drivers and the system BIOS are up to date. Further information might be gained from driver verifier, from analyzing a kernel memory dump or full memory dump, or from a live debugging session (analyzing the stack based on the trap frame, task gate, or task state segment present; this is not useful with a minidump). Here is an example of a double fault blamed on an Intel graphics driver (this is also common with AMD/ATI and NVIDIA graphics drivers, as well as with drivers from antivirus/firewall vendors):


0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050033
Arg3: 00000000000406f8
Arg4: fffff88005be31b4

Debugging Details:
------------------


BUGCHECK_STR:  0x7f_8

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  7

LAST_CONTROL_TRANSFER:  from fffff80002ce52a9 to fffff80002ce5d00

STACK_TEXT:  
... : nt!KeBugCheckEx
... : nt!KiBugCheckDispatch+0x69
... : nt!KiDoubleFaultAbort+0xb2
... : igdpmd64+0x1911b4
... : 0x7109d704`2315c235
... : 0x899e2543`a1daf8d0
... : 0xfffffa80`07333010


STACK_COMMAND:  kb

FOLLOWUP_IP: 
igdpmd64+1911b4
fffff880`05be31b4 e8e7ffffff      call    igdpmd64+0x1911a0 (fffff880`05be31a0)

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  igdpmd64+1911b4

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: igdpmd64

IMAGE_NAME:  igdpmd64.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4df25f60

FAILURE_BUCKET_ID:  X64_0x7f_8_igdpmd64+1911b4

BUCKET_ID:  X64_0x7f_8_igdpmd64+1911b4

Followup: MachineOwner
---------

0: kd> lmvm igdpmd64
start             end                 module name
fffff880`05a52000 fffff880`065fc100   igdpmd64 T (no symbols)           
    Loaded symbol image file: igdpmd64.sys
    Image path: \SystemRoot\system32\DRIVERS\igdpmd64.sys
    Image name: igdpmd64.sys
    Timestamp:        Fri Jun 10 12:16:00 2011 (4DF25F60)
    CheckSum:         00BB2563
    ImageSize:        00BAA100
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
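The numbers in the two debugger outputs above can be cross-checked with a few lines of arithmetic: the faulting instruction address (FOLLOWUP_IP) should fall inside the igdpmd64 address range reported by lmvm, the difference between the end and start addresses should match ImageSize, and the timestamp is a count of seconds since 1970. The following is a minimal sketch using only the values shown in this post; parse_kd_address is a hypothetical helper name, and the timestamp is converted to UTC, which may differ from the time zone the debugger displays.

```python
from datetime import datetime, timezone


def parse_kd_address(text):
    """Convert a debugger-style address such as fffff880`05be31b4 to an int."""
    return int(text.replace("`", ""), 16)


# Values copied from the !analyze -v and lmvm output above.
module_start = parse_kd_address("fffff880`05a52000")
module_end = parse_kd_address("fffff880`065fc100")
fault_ip = parse_kd_address("fffff880`05be31b4")

# Offset of the faulting instruction from the start of the module.
offset = fault_ip - module_start

# True when the faulting address lies inside the module's address range.
inside = module_start <= fault_ip < module_end

# Should agree with the ImageSize field reported by lmvm.
image_size = module_end - module_start

# The Timestamp field is seconds since 1970-01-01, shown here in UTC.
built = datetime.fromtimestamp(0x4DF25F60, tz=timezone.utc)

if __name__ == "__main__":
    print("offset into igdpmd64: +0x%x" % offset)
    print("address inside module:", inside)
    print("image size: 0x%08X" % image_size)
    print("driver build date (UTC):", built.strftime("%Y-%m-%d"))
```

A driver build date like this is useful when deciding whether a newer or older driver version is worth testing.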
 
 
See Also,
Windows Crash Dump Analysis
Troubleshooting Memory Errors
How to Perform an Offline System Integrity Verification
Enable Driver Verifier to Help Identify Blue Screen Causes






 
