Wednesday, March 21, 2012

Troubleshooting 0x7E SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

The Debugging Tools for Windows are required to analyze crash dump files. If you do not have the Debugging Tools for Windows installed or dump files are not being generated on system crash, see this post for installation/configuration instructions:
http://mikemstech.blogspot.com/2011/11/windows-crash-dump-analysis.html

0x0000007E SYSTEM_THREAD_EXCEPTION_NOT_HANDLED is a common bug check (blue screen of death) on the Windows platform (Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, Windows Server 2008 R2, and Windows 8). The same bug check is also reported with a bug check code of 0x1000007E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M). This error indicates that a system thread generated a kernel mode exception that the error handler did not catch. Parameter 1 identifies the exception code, which provides more insight into the cause of the issue. A few of the common ones are given below:

Exception Code Description
0x80000002: STATUS_DATATYPE_MISALIGNMENT This exception code indicates that a load or store instruction accessed data at an address that is not properly aligned for its data type. This often occurs if a programmer incorrectly calculates the address of an object in an array or other data structure. The error code lookup tool shows the following for this error:

{EXCEPTION}
Alignment Fault
A datatype misalignment was detected in a load or store instruction.
0x80000003: STATUS_BREAKPOINT This error, if encountered outside of development, indicates a really sloppy software release management process by the driver's developers. Software developers use breakpoints to examine the state of an application at a specific point of execution, often to inspect the contents of a program's variables at that point. This exception indicates that a programmer was working on an issue in the code but left a breakpoint (which generates an exception to stop execution and pass control to the debugger) in code that was later reached on a system with no debugger attached. The error code lookup tool shows the following for this error:

{EXCEPTION}
Breakpoint
A breakpoint has been reached.
0xC0000005: STATUS_ACCESS_VIOLATION This indicates that code tried to access memory at an invalid address, which in kernel mode usually means that memory corruption occurred at some level. This is typically due to a driver corrupting the memory/system state and another driver or the system kernel identifying the issue at a later time. The FAULTING_MODULE in WinDbg is not reliable for this exception code. The error code lookup tool shows the following for this error:

The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".
0xC0000006: STATUS_IN_PAGE_ERROR This error indicates an I/O error that possibly points to a hardware issue. The error code lookup tool shows the following for this error:

The instruction at "0x%08lx" referenced memory at "0x%08lx". The required data was not placed into memory because of an I/O error status of "0x%08lx".
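The table above can be kept handy as a small lookup. The following is a minimal Python sketch, not part of the Debugging Tools; the names `EXCEPTION_CODES`, `VARIANT_FLAG`, `normalize_bugcheck`, and `describe_exception` are made up for illustration, and the assumption that 0x1000007E is simply 0x7E with the 0x10000000 bit set is based on the two codes mentioned in this post.

```python
# Exception codes and one-line hints, taken from the table above.
EXCEPTION_CODES = {
    0x80000002: ("STATUS_DATATYPE_MISALIGNMENT", "misaligned load or store"),
    0x80000003: ("STATUS_BREAKPOINT", "breakpoint left in released code"),
    0xC0000005: ("STATUS_ACCESS_VIOLATION", "invalid memory reference, often corruption"),
    0xC0000006: ("STATUS_IN_PAGE_ERROR", "I/O error while paging data in"),
}

# Assumption: 0x1000007E is 0x7E with this high bit set.
VARIANT_FLAG = 0x10000000


def normalize_bugcheck(code: int) -> int:
    """Strip the variant flag so 0x1000007E and 0x7E compare equal."""
    return code & ~VARIANT_FLAG


def describe_exception(arg1: int) -> str:
    """Describe bug check parameter 1 (the unhandled exception code)."""
    name, hint = EXCEPTION_CODES.get(arg1, ("UNKNOWN", "not covered in this post"))
    return f"0x{arg1:08X}: {name} ({hint})"


if __name__ == "__main__":
    print(hex(normalize_bugcheck(0x1000007E)))
    print(describe_exception(0xC0000006))
```

Codes that are not in the table fall through to "UNKNOWN" rather than raising an error, since parameter 1 can be any NTSTATUS value.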

Troubleshooting SYSTEM_THREAD_EXCEPTION_NOT_HANDLED is fairly straightforward. For exception codes other than 0xC0000005 (STATUS_ACCESS_VIOLATION), the faulting module reported by kd/WinDbg identifies the driver (or possibly a related driver in the case of generic drivers like netio.sys and ndis.sys) that needs to be upgraded/downgraded/changed.
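The rule of thumb in the previous paragraph can be written down as a short Python sketch. The function names are hypothetical and the returned advice strings are only a paraphrase of this post, not debugger output.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005


def faulting_module_is_reliable(exception_code: int) -> bool:
    """Per the rule of thumb above, only 0xC0000005 makes the module suspect."""
    return exception_code != STATUS_ACCESS_VIOLATION


def suggest_next_step(exception_code: int, faulting_module: str) -> str:
    """Return a first troubleshooting step for a 0x7E bug check."""
    if faulting_module_is_reliable(exception_code):
        return f"Upgrade, downgrade, or replace {faulting_module} (or a related driver)."
    return f"Do not trust {faulting_module}; rule out hardware and enable Driver Verifier."


if __name__ == "__main__":
    print(suggest_next_step(0x80000003, "example.sys"))  # example.sys is a placeholder name
    print(suggest_next_step(0xC0000005, "nt"))
```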

The following examples will give troubleshooting ideas for 0xC0000005 and 0xC0000006.

0xC0000006 STATUS_IN_PAGE_ERROR


STATUS_IN_PAGE_ERROR indicates that one or more memory pages could not be read from disk into memory because of an I/O error. There are various causes for I/O errors, but the memory and hard drive should be examined for issues first. Below is an example analysis of a dump involving this substatus:


0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000006, The exception code that was not handled
Arg2: 8c3c1532, The address that the exception occurred at
Arg3: 9b2a5398, Exception Record Address
Arg4: 9b2a4f70, Context Record Address

Debugging Details:
------------------


OVERLAPPED_MODULE: Address regions for 'ZTEusbmdm6k' and 'USBSTOR.SYS' overlap

EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory 
                                        at 0x%p. The required data was not placed 
                                        into memory because of an I/O error status 
                                        of 0x%x.

FAULTING_IP: 
nvlddmkm+3a5532
8c3c1532 8b1f            mov     ebx,dword ptr [edi]

EXCEPTION_RECORD:  9b2a5398 -- (.exr 0xffffffff9b2a5398)
ExceptionAddress: 8c3c1532 (nvlddmkm+0x003a5532)
   ExceptionCode: c0000006 (In-page I/O error)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 85e40000
   Parameter[2]: c0000010
Inpage operation failed at 85e40000, due to I/O error c0000010

CONTEXT:  9b2a4f70 -- (.cxr 0xffffffff9b2a4f70)
eax=00000002 ebx=00000000 ecx=92544000 edx=002b0c70 esi=9b2a54bc edi=85e40000
eip=8c3c1532 esp=9b2a5460 ebp=9b2a546c iopl=0         nv up ei ng nz na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010286
nvlddmkm+0x3a5532:
8c3c1532 8b1f            mov     ebx,dword ptr [edi]  ds:0023:85e40000=????????
Resetting default scope

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory 
                                    at 0x%p. The required data was not placed 
                                    into memory because of an I/O error status 
                                    of 0x%x.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  85e40000

EXCEPTION_PARAMETER3:  c0000010

IO_ERROR: (NTSTATUS) 0xc0000010 - The specified request is not a valid operation 
                                  for the target device.

BUGCHECK_STR:  0x7E

EXCEPTION_STR:  0xc0000006_c0000010

FOLLOWUP_IP: 
+3a5532
85e40000 ??              ???

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: hardware_disk

IMAGE_NAME:  hardware_disk

DEBUG_FLR_IMAGE_TIMESTAMP:  0

STACK_COMMAND:  kb

FAILURE_BUCKET_ID:  0x7E_IMAGE_hardware_disk

BUCKET_ID:  0x7E_IMAGE_hardware_disk

Followup: MachineOwner
---------
 
 
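In this particular error, the I/O operation failed with 0xC0000010 (STATUS_INVALID_DEVICE_REQUEST: The specified request is not a valid operation for the target device). This status is reported as the third exception parameter (EXCEPTION_PARAMETER3), and the second parameter (85e40000) is the address whose data could not be paged in. Note that the debugger assigns the failure to hardware_disk rather than to nvlddmkm, the driver whose instruction triggered the page read.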
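When several dumps need to be compared, it can help to pull the key fields out of saved `!analyze -v` text. The following Python sketch is one way to do that, under the assumption that the output has been saved to a string; `parse_analyze` and `summarize` are hypothetical helper names, and `SAMPLE` is a few lines copied from the analysis above.

```python
import re

# A few lines copied from the !analyze -v output above.
SAMPLE = """\
Arg1: c0000006, The exception code that was not handled
Arg2: 8c3c1532, The address that the exception occurred at
EXCEPTION_PARAMETER2:  85e40000
EXCEPTION_PARAMETER3:  c0000010
MODULE_NAME: hardware_disk
IMAGE_NAME:  hardware_disk
"""

# Matches "Arg1: value" and "FIELD_NAME:  value" lines; multi-line fields are skipped.
FIELD_RE = re.compile(r"^(Arg\d|[A-Z_]+\d?):\s+([0-9A-Za-z_.!+]+)")


def parse_analyze(text: str) -> dict:
    """Collect single-token 'NAME: value' fields from saved analysis text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def summarize(fields: dict) -> dict:
    """Pick out the values this post looks at for a 0x7E bug check."""
    code = int(fields["Arg1"], 16)
    summary = {"exception_code": code, "module": fields.get("MODULE_NAME")}
    if code == 0xC0000006:  # STATUS_IN_PAGE_ERROR carries the I/O status
        summary["inpage_address"] = int(fields["EXCEPTION_PARAMETER2"], 16)
        summary["io_status"] = int(fields["EXCEPTION_PARAMETER3"], 16)
    return summary


if __name__ == "__main__":
    for key, value in summarize(parse_analyze(SAMPLE)).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

The regular expression only captures fields whose value is a single token on the same line, so fields such as FAULTING_IP and STACK_TEXT are ignored on purpose.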

0xC0000005 STATUS_ACCESS_VIOLATION


Regardless of the bug check code, an exception code of 0xC0000005 usually indicates that the memory and system state have been corrupted. In the majority of crashes with a substatus of 0xC0000005, one driver corrupts memory, but the system does not crash until another driver or the system memory manager detects the corruption. This typically results in another driver (or the kernel itself) getting blamed for the issue (as shown below). Empirically, there is a high probability of a video driver (ATI or Nvidia) being blamed, though for the reason just given, the blamed driver may or may not be the real culprit. Below is a typical analysis of a dump showing substatus 0xC0000005 where the error is blamed on the kernel (nt):


0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (1000007e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 828d1fb0, The address that the exception occurred at
Arg3: 8a743b48, Exception Record Address
Arg4: 8a743720, Context Record Address

Debugging Details:
------------------


EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced 
                                 memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
nt!IopGetFileObjectExtension+f
828d1fb0 8b448104        mov     eax,dword ptr [ecx+eax*4+4]

EXCEPTION_RECORD:  8a743b48 -- (.exr 0xffffffff8a743b48)
ExceptionAddress: 828d1fb0 (nt!IopGetFileObjectExtension+0x0000000f)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 5a035b09
Attempt to read from address 5a035b09

CONTEXT:  8a743720 -- (.cxr 0xffffffff8a743720)
eax=00000001 ebx=84ffff80 ecx=5a035b01 edx=00000000 esi=00000800 edi=85676020
eip=828d1fb0 esp=8a743c10 ebp=8a743c10 iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202
nt!IopGetFileObjectExtension+0xf:
828d1fb0 8b448104        mov     eax,dword ptr [ecx+eax*4+4] ds:0023:5a035b09=????????
Resetting default scope

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced 
                       memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  5a035b09

READ_ADDRESS: GetPointerFromAddress: unable to read from 829a8848
Unable to read MiSystemVaType memory at 82987e40
 5a035b09 

FOLLOWUP_IP: 
nt!IopGetFileObjectExtension+f
828d1fb0 8b448104        mov     eax,dword ptr [ecx+eax*4+4]

BUGCHECK_STR:  0x7E

LAST_CONTROL_TRANSFER:  from 828ccc12 to 828d1fb0

STACK_TEXT:  
8a743c10 828ccc12 00000001 00000000 00000800 nt!IopGetFileObjectExtension+0xf
8a743c24 82a7092d 84ffff80 848ac3c0 84ffff68 nt!IoGetRelatedDeviceObject+0x50
8a743c6c 82a61601 84ffff80 84ffff80 84ffff68 nt!IopDeleteFile+0x32
8a743c84 828b7d40 00000000 000c0000 00000000 nt!ObpRemoveObjectRoutine+0x59
8a743c98 828b7cb0 84ffff80 82a66fe1 85b64b18 nt!ObfDereferenceObjectWithTag+0x88
8a743ca0 82a66fe1 85b64b18 85b64b40 829aa980 nt!ObfDereferenceObject+0xd
8a743ccc 828a0f04 85b64b18 00000000 00000000 nt!MiSegmentDelete+0x191
8a743d28 828a1225 848b4020 00000000 00000000 nt!MiProcessDereferenceList+0xdb
8a743d50 82a47fda 00000000 abe63408 00000000 nt!MiDereferenceSegmentThread+0xc5
8a743d90 828f01d9 828a115e 00000000 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19


SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  nt!IopGetFileObjectExtension+f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  4e02a389

STACK_COMMAND:  .cxr 0xffffffff8a743720 ; kb

FAILURE_BUCKET_ID:  0x7E_nt!IopGetFileObjectExtension+f

BUCKET_ID:  0x7E_nt!IopGetFileObjectExtension+f

Followup: MachineOwner
--------- 
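As a sanity check, the address in "Attempt to read from address 5a035b09" can be reproduced from the register values in the CONTEXT record and the faulting instruction `mov eax,dword ptr [ecx+eax*4+4]`. The Python sketch below does the arithmetic; `effective_address` is a hypothetical helper, and the register values are copied from the dump above.

```python
def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Compute a 32-bit x86 effective address: base + index*scale + disp."""
    return (base + index * scale + disp) & 0xFFFFFFFF


# Values copied from the CONTEXT record and exception record above.
ecx = 0x5A035B01
eax = 0x00000001
read_address = 0x5A035B09  # Parameter[1] of the exception record

addr = effective_address(ecx, eax, 4, 4)
print(f"{addr:08x}")  # 5a035b09
assert addr == read_address
```

The match shows that the invalid address comes directly from the value in ecx, which is consistent with the kernel following a pointer that was already bad by the time this function ran.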
 

For issues involving STATUS_ACCESS_VIOLATION, troubleshooting usually starts with these steps:
  • Rule out a hardware issue with the memory or hard drive
  • Enable Driver Verifier and analyze the dumps after the system crashes again
  • Examine the loaded modules (run the "lm nt" debugger command) and BIOS (!sysinfo machineid) and look for older versions that need to be upgraded
  • Finally, if the system is under warranty/support, contact the manufacturer as it might be a known issue with a resolution provided by the manufacturer

See Also
Windows Crash Dump Analysis



1 comment:

  1. A link for the second troubleshooting bullet point, "enable driver verifier", from one of your other posts: http://mikemstech.blogspot.com/2011/12/enable-driver-verifier-to-help-identify.html
