Monday, November 14, 2011

How to Troubleshoot Blue Screen 0xA IRQL_NOT_LESS_OR_EQUAL

The Debugging Tools for Windows are required to analyze crash dump files. If you do not have the Debugging Tools for Windows installed or dump files are not being generated on system crash, see this post for installation/configuration instructions:
http://mikemstech.blogspot.com/2011/11/windows-crash-dump-analysis.html

Before discussing why this exception occurs, it helps to understand a little about how Windows interacts with devices. When a device requires the processor's attention, it generates an interrupt that causes the processor to stop what it is doing and handle the device's request. The Windows hardware abstraction layer (HAL) maps hardware interrupt numbers to software interrupt request levels (IRQLs). IRQLs let the system prioritize interrupts: interrupts at higher IRQLs are serviced first and preempt processing at all lower IRQLs. After the interrupt is handled, the processor returns to the previous (lower) IRQL.
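The preemption rule described above can be illustrated with a toy model. This is only a sketch in Python, not how the kernel is implemented: the Processor class, its methods, and the choice of IRQL 10 for the device are all made up for illustration.

```python
class Processor:
    """Toy model of one processor's IRQL (hypothetical class, not a Windows API)."""

    def __init__(self):
        self.irql = 0        # PASSIVE_LEVEL
        self.pending = []    # (level, name) pairs masked by the current IRQL
        self.log = []        # ordered record of what happened

    def interrupt(self, name, level, arriving=()):
        """Deliver an interrupt; `arriving` lists interrupts raised while it is handled."""
        if level <= self.irql:
            # Masked: it waits until the IRQL drops below its level.
            self.pending.append((level, name))
            self.log.append(("pend", name))
            return
        previous = self.irql
        self.irql = level                    # raise to the interrupt's IRQL
        self.log.append(("start", name))
        for other_name, other_level in arriving:
            self.interrupt(other_name, other_level)
        self.irql = previous                 # return to the previous (lower) IRQL
        self.log.append(("end", name))
        self._deliver_pending()

    def _deliver_pending(self):
        """Run pended interrupts that are no longer masked, highest level first."""
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                return
            level, name = max(ready)
            self.pending.remove((level, name))
            self.interrupt(name, level)


cpu = Processor()
# A device interrupt (assumed IRQL 10) is being handled when a clock interrupt
# (28 on x86) and a request for dispatch-level work (2) arrive.
cpu.interrupt("device", 10, arriving=[("clock", 28), ("dispatch", 2)])
for event in cpu.log:
    print(event)
```

In this model the clock interrupt preempts the device handler immediately, while the dispatch-level work stays pending until the device handler finishes and the IRQL drops back to 0.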

The IRQLs are defined in the wdm.h file in the Windows Driver Development Kit (for me this is \WinDDK\7600.16385.1\inc\ddk\wdm.h).

#if defined(_X86_) 
//
// Interrupt Request Level definitions
//

#define PASSIVE_LEVEL 0    // Passive release level
#define LOW_LEVEL 0        // Lowest interrupt level
#define APC_LEVEL 1        // APC interrupt level
#define DISPATCH_LEVEL 2   // Dispatcher level
#define CMCI_LEVEL 5       // CMCI handler level

#define PROFILE_LEVEL 27   // timer used for profiling.
#define CLOCK1_LEVEL 28    // Interval clock 1 level - Not used on x86
#define CLOCK2_LEVEL 28    // Interval clock 2 level
#define IPI_LEVEL 29       // Interprocessor interrupt level
#define POWER_LEVEL 30     // Power failure level
#define HIGH_LEVEL 31      // Highest interrupt level

#define CLOCK_LEVEL                 (CLOCK2_LEVEL)

#endif 
#if defined(_AMD64_) 
//
// Interrupt Request Level definitions
//

#define PASSIVE_LEVEL 0    // Passive release level
#define LOW_LEVEL 0        // Lowest interrupt level
#define APC_LEVEL 1        // APC interrupt level
#define DISPATCH_LEVEL 2   // Dispatcher level
#define CMCI_LEVEL 5       // CMCI handler level

#define CLOCK_LEVEL 13     // Interval clock level
#define IPI_LEVEL 14       // Interprocessor interrupt level
#define DRS_LEVEL 14       // Deferred Recovery Service level
#define POWER_LEVEL 14     // Power failure level
#define PROFILE_LEVEL 15   // timer used for profiling.
#define HIGH_LEVEL 15      // Highest interrupt level

#endif  
 
There are 3 sets of IRQLs defined (x86, x64, and ia64). I focus on x86 and x64 because these platforms comprise the vast majority of systems. Maintaining an IRQL of 0 (PASSIVE_LEVEL) is one of the main goals of the device drivers and the system because all user mode code is executed at the passive level. The thread scheduler for the system operates at IRQL 2 (DISPATCH_LEVEL) and generates interrupts to change the currently executing thread. Device interrupts occur at level 3 and above (and thus prevent the scheduler from switching threads). A direct implication of this interrupt behavior is that device drivers operating at or above DISPATCH_LEVEL cannot access paged memory (due to the context switch required for the file system driver to pull the memory page from disk) and can only use memory from the non-paged pool.   

Bugcheck code 0x0000000A (10 in decimal) occurs when a driver attempts to perform a task that can only be performed at a lower IRQL, such as reading paged memory or making a call that the thread scheduler can preempt. Since the system is at or above dispatch (DPC) level, the thread scheduler cannot force the required context switch, and the system crashes through a call to KeBugCheckEx. This call raises the processor to HIGH_LEVEL (31 on x86, 15 on x64), which blocks any other device interrupts while crash information is saved to the hard drive and the system is brought down safely. The call to KeBugCheckEx results in a blue screen of death (BSOD).
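The rule behind this bugcheck can be written down as a short decision function. The following Python sketch is a simplified model only: touch_memory and BugCheck are hypothetical names, and it covers just the case of a valid address in pageable memory.

```python
DISPATCH_LEVEL = 2  # same value on x86 and x64 in wdm.h


class BugCheck(Exception):
    """Stands in for KeBugCheckEx in this toy model."""

    def __init__(self, code):
        super().__init__(f"bug check 0x{code:X}")
        self.code = code


def touch_memory(current_irql, page_resident):
    """Model a kernel-mode access to a valid address in pageable memory."""
    if page_resident:
        return "access succeeds"
    if current_irql >= DISPATCH_LEVEL:
        # Servicing the fault would require waiting for disk I/O, which
        # needs a context switch that cannot happen at this IRQL.
        raise BugCheck(0xA)
    return "page fault serviced, access retried"


print(touch_memory(0, page_resident=False))
print(touch_memory(2, page_resident=True))
try:
    touch_memory(2, page_resident=False)
except BugCheck as crash:
    print(crash)
```

Note the middle case: a pageable access at DISPATCH_LEVEL succeeds whenever the page happens to be resident, which is one reason this kind of driver bug tends to show up intermittently.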

IRQL_NOT_LESS_OR_EQUAL can often be resolved by updating (or, in some cases, downgrading) the driver that caused the crash. (Note that this error is very similar to 0xD1 DRIVER_IRQL_NOT_LESS_OR_EQUAL.) In some cases the BIOS may also need to be updated. The following is an example process for debugging this issue.

Note that this is only an example; the driver causing your error will likely be different.

First, open the crash dump with WinDbg. Click here for instructions on opening a crash dump.

Next, execute the !analyze -v debugger command. The output includes the bug check code, a stack trace, and the suspected driver. Each part of the output is discussed below.

The !analyze -v output starts with a description of the parameters passed to KeBugCheckEx. In this case, the crash was caused by the driver attempting to read invalid (or paged) memory at DISPATCH_LEVEL:
2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 00000004, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, bitfield :
 bit 0 : value 0 = read operation, 1 = write operation
 bit 3 : value 0 = not an execute operation, 
                      1 = execute operation (only on chips which support this level of status)
Arg4: 83227b06, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS: GetPointerFromAddress: unable to read from 82f7b718
Unable to read MiSystemVaType memory at 82f5b160
 00000004 

CURRENT_IRQL:  2

FAULTING_IP: 
hal!HalPutScatterGatherList+a
83227b06 8b4104          mov     eax,dword ptr [ecx+4]

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0xA

PROCESS_NAME:  System
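The four arguments can also be decoded by hand. The sketch below follows the bit layout printed in the output above; the function name, the returned dictionary, and the 0x1000 threshold used to flag a probable NULL-pointer access are assumptions made for illustration.

```python
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}


def decode_bugcheck_a(arg1, arg2, arg3, arg4):
    """Turn the four 0xA bugcheck arguments into readable fields."""
    if arg3 & 0x8:        # bit 3: execute operation
        operation = "execute"
    elif arg3 & 0x1:      # bit 0: write operation
        operation = "write"
    else:
        operation = "read"
    return {
        "memory_referenced": arg1,
        "irql": arg2,
        "irql_name": IRQL_NAMES.get(arg2, "above DISPATCH_LEVEL"),
        "operation": operation,
        "faulting_instruction": arg4,
        # Assumption: an address inside the first page usually means a field
        # was accessed through a NULL pointer.
        "probably_null_pointer": arg1 < 0x1000,
    }


# Values taken from the example dump.
decoded = decode_bugcheck_a(0x00000004, 0x00000002, 0x00000000, 0x83227B06)
for field, value in decoded.items():
    print(field, value)
```

For this dump the result is a read of address 4 at DISPATCH_LEVEL, which fits the faulting instruction (mov eax,dword ptr [ecx+4]) being executed with a zero pointer in ecx.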
Next we see the address of the trap frame, the registers, and the stack trace at the time of the crash. For this error, these are less relevant because the driver is clearly identified. In some cases it may be necessary to dig further using Driver Verifier or by following the trap frames in a full or kernel memory dump to fully rebuild the call stack.
TRAP_FRAME:  b8732af0 -- (.trap 0xffffffffb8732af0)
ErrCode = 00000000
eax=88b81740 ebx=88caf280 ecx=00000000 edx=00000000 esi=89cbc420 edi=88a4b5f8
eip=83227b06 esp=b8732b64 ebp=b8732b6c iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
hal!HalPutScatterGatherList+0xa:
83227b06 8b4104          mov     eax,dword ptr [ecx+4] ds:0023:00000004=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from 83227b06 to 82e5982b

STACK_TEXT:  
b8732af0 83227b06 badb0d00 00000000 8a3fc504 nt!KiTrap0E+0x2cf
b8732b6c 8c80e653 88b81740 00000000 00000000 hal!HalPutScatterGatherList+0xa
b8732b88 92530159 88a4b5f8 00000000 89cbc420 ndis!NdisMFreeNetBufferSGList+0x27
WARNING: Stack unwind information not available. Following frames may be wrong.
b8732be8 9252ca0e 88ca6000 88caf280 0000000a e1k6232+0x16159
b8732c58 9252b093 88ca6000 00000000 b8732ca0 e1k6232+0x12a0e
b8732c74 8c860309 88ca6000 00000000 b8732ca0 e1k6232+0x11093
b8732cb0 8c8416b2 88a4b67c 00a4b668 00000000 ndis!ndisMiniportDpc+0xe2
b8732d10 8c828976 88a4b7d4 00000000 8b43f0e8 ndis!ndisQueuedMiniportDpcWorkItem+0xd0
b8732d50 830216d3 00000002 9f4c557c 00000000 ndis!ndisReceiveWorkerThread+0xeb
b8732d90 82ed30f9 8c82888b 00000002 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19


STACK_COMMAND:  kb
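When reading a stack like this by hand, a useful first step is to find the topmost frame that does not belong to a core operating system module. The Python sketch below does that for the first few frames of the example; the function names and the set of modules treated as part of the operating system (nt, hal, ndis) are assumptions for this example, not a debugger feature.

```python
STACK_TEXT = """\
b8732af0 83227b06 badb0d00 00000000 8a3fc504 nt!KiTrap0E+0x2cf
b8732b6c 8c80e653 88b81740 00000000 00000000 hal!HalPutScatterGatherList+0xa
b8732b88 92530159 88a4b5f8 00000000 89cbc420 ndis!NdisMFreeNetBufferSGList+0x27
WARNING: Stack unwind information not available. Following frames may be wrong.
b8732be8 9252ca0e 88ca6000 88caf280 0000000a e1k6232+0x16159
b8732c58 9252b093 88ca6000 00000000 b8732ca0 e1k6232+0x12a0e
b8732c74 8c860309 88ca6000 00000000 b8732ca0 ndis!ndisMiniportDpc+0xe2
"""


def parse_frames(stack_text):
    """Return (module, symbol) pairs for each frame line, top of stack first."""
    frames = []
    for line in stack_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skips blank lines and debugger warnings
        try:
            for column in parts[:5]:
                int(column, 16)  # the first five columns are hex values
        except ValueError:
            continue
        symbol = parts[5]
        module = symbol.split("!")[0].split("+")[0]
        frames.append((module, symbol))
    return frames


def first_unlisted_module(frames, os_modules):
    """Return (index, module) of the first frame outside os_modules, or None."""
    for index, (module, _symbol) in enumerate(frames):
        if module not in os_modules:
            return index, module
    return None


frames = parse_frames(STACK_TEXT)
print(first_unlisted_module(frames, os_modules={"nt", "hal", "ndis"}))
```

For this stack the function returns frame 3 in module e1k6232, the first driver frame below the NDIS and HAL calls.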
 
Finally, we get some information on the symbols and the module that the debugger suspects is at fault. In this case, it is the driver for the Intel Gigabit Ethernet adapter in this laptop (e1k6232.sys).
FOLLOWUP_IP: 
e1k6232+16159
92530159 ??              ???

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  e1k6232+16159

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: e1k6232

IMAGE_NAME:  e1k6232.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4bbae470

FAILURE_BUCKET_ID:  0xA_e1k6232+16159

BUCKET_ID:  0xA_e1k6232+16159

Followup: MachineOwner
---------

 
In some cases, it may be desirable to check the BIOS version. Use the !sysinfo machineid debugger command to get information about the BIOS and the make and model of the machine that generated the dump. This dump came from a Dell Latitude E6410 running a BIOS from 2010. In this case, the BIOS is out of date, and an update may help resolve the issue that caused this crash.
2: kd> !sysinfo machineid
Machine ID Information [From Smbios 2.6, DMIVersion 38, Size=3634]
BiosMajorRelease = 4
BiosMinorRelease = 6
BiosVendor = Dell Inc.
BiosVersion = A05
BiosReleaseDate = 08/10/2010
SystemManufacturer = Dell Inc.
SystemProductName = Latitude E6410
SystemVersion = 0001
SystemSKU =  
BaseBoardManufacturer = Dell Inc.
BaseBoardProduct = 0667CC
BaseBoardVersion = A01
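If you collect this output from several machines, the fields are easy to pull apart with a script. The following Python sketch parses a few of the lines above and computes how old the BIOS was on a given date; the function names are made up, and the date of this post is used only as a stand-in for the crash date, which is not shown in the output.

```python
from datetime import date, datetime

SYSINFO = """\
BiosVendor = Dell Inc.
BiosVersion = A05
BiosReleaseDate = 08/10/2010
SystemManufacturer = Dell Inc.
SystemProductName = Latitude E6410
"""


def parse_machineid(text):
    """Split 'Key = value' lines from !sysinfo machineid into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" = ")
        if separator:
            info[key.strip()] = value.strip()
    return info


def bios_age_days(info, as_of):
    """Days between the BIOS release date (month/day/year) and as_of."""
    released = datetime.strptime(info["BiosReleaseDate"], "%m/%d/%Y").date()
    return (as_of - released).days


info = parse_machineid(SYSINFO)
# Assumption: the date of this post stands in for the date of the crash.
print(info["SystemProductName"], info["BiosVersion"],
      bios_age_days(info, as_of=date(2011, 11, 14)), "days old")
```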

The dates of all of the drivers loaded at the time of the crash can be listed using the lm n t debugger command. More information about a specific driver is available from the lm vm drivername command (also written lmvm). This can help identify whether an old antivirus product or an outdated driver might be contributing to the crash.
2: kd> lm n t
start    end        module name
80bac000 80bb4000   kdcom    kdcom.dll    Mon Jul 13 19:08:58 2009 (4A5BDAAA)
82e13000 83223000   nt       ntkrpamp.exe Fri Jun 18 21:55:24 2010 (4C1C3FAC)
83223000 8325a000   hal      halmacpi.dll Mon Jul 13 17:11:03 2009 (4A5BBF07)
...
924e1000 9251a000   dxgmms1  dxgmms1.sys  Mon Nov 01 20:37:04 2010 (4CCF7950)
9251a000 92553000   e1k6232  e1k6232.sys  Tue Apr 06 01:36:16 2010 (4BBAE470)
92553000 9259e000   USBPORT  USBPORT.SYS  Mon Jul 13 17:51:13 2009 (4A5BC871)
...
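The hexadecimal value in parentheses at the end of each line is the driver's timestamp expressed as seconds since January 1, 1970 (UTC). As a quick sketch, it can be converted with a few lines of Python; link_time is a made-up helper name.

```python
from datetime import datetime, timezone


def link_time(hex_stamp):
    """Convert a module timestamp such as '4BBAE470' to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


for module, stamp in [("e1k6232.sys", "4BBAE470"), ("USBPORT.SYS", "4A5BC871")]:
    print(module, link_time(stamp).isoformat())
```

The results are in UTC, so they differ from the times shown in the listing by the time zone offset of the machine running the debugger (six hours in this example).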

2: kd> lmvm usbport
start    end        module name
92553000 9259e000   USBPORT    (deferred)             
    Mapped memory image file: c:\symbols\USBPORT.SYS\4A5BC8714b000\USBPORT.SYS
    Image path: \SystemRoot\system32\DRIVERS\USBPORT.SYS
    Image name: USBPORT.SYS
    Timestamp:        Mon Jul 13 17:51:13 2009 (4A5BC871)
    CheckSum:         0004BC3B
    ImageSize:        0004B000
    File version:     6.1.7600.16385
    Product version:  6.1.7600.16385
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     usbport.sys
    OriginalFilename: usbport.sys
    ProductVersion:   6.1.7600.16385
    FileVersion:      6.1.7600.16385 (win7_rtm.090713-1255)
    FileDescription:  USB 1.1 & 2.0 Port Driver
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

 
Getting further help

If the debugger output references the NT kernel (ntoskrnl.exe, ntkrnlpa.exe, ntkrnlmp.exe, or ntkrpamp.exe), Driver Verifier may be necessary to further pinpoint the problem.

If you have not been able to solve your issue after analyzing the dump, you can seek help from the hardware vendor, the forums, or Microsoft directly. The hardware vendor is the preferred option of the three. If the vendor determines that there is a bug in the driver, they may ask for a kernel or full memory dump to help them analyze the problem.

If you seek help in the forums, be sure to upload the dumps from your system to an accessible location and post a link in the thread that you create. See this post for more details. Users in the forums can rarely tell you more than is in this post.

Microsoft may not be helpful unless the problem is related to a Microsoft device driver or a kernel bug; in most cases they will tell you that it is not a Microsoft bug. Microsoft support is also relatively expensive.

Best of luck!

Have an idea for something that you'd like to see explored? Leave a comment or send an e-mail to razorbackx_at_gmail<dot>com

References:
Mark Russinovich, David Solomon, and Alex Ionescu. Windows Internals: Covering Windows Server 2008 and Windows Vista. 5th edition. Microsoft Press

Bug Check 0xA: IRQL_NOT_LESS_OR_EQUAL

Microsoft Windows Driver Development Kit

3 comments:

  1. Can you please make also a version for ppl with less IQ xD I dont really get it, please just write it like:
    1. Download bla bla bla
    2. Run it
    3. Open this file

    Etc. with pics please

    Ive got the bluescreen problem to, but mine closes when i try to update the game called
    "aion".

    Hope for more help please :)

  2. I agree with Ranger.
    I just got this error yesterday, and now it won't stop. Every time my computer starts, in safe or normal mode, I get this screen. I tried several different ways to try to fix it, and then finally turned to the internet when none of those worked.

    I'm sure this page would be helpful if I knew more about computers, but I'm not getting much out of it. It seems to be over my head.

  3. Received a BSOD and found your website. It's awesome! I agree with the other posters that you go deep into detail. Thank you for that! If we really want to solve a BSOD on our own you make it much more feasible.
