Windows Server guest gets BSOD/Bugcheck on VMware ESXi 5.5 and 5.5u1 [update]

At a network site that we migrated from VMware ESXi 5.1 to ESXi 5.5 a few weeks ago, one server crashed daily with a BSOD/bugcheck. We couldn’t find a direct cause, but it appears to be a memory leak in the VMware vShield Endpoint TDI Manager driver, which ships with the VMCI driver in VMware Tools for ESXi 5.5. The problem is still present after updating to ESXi 5.5 Update 1. The problematic driver is only installed when you install VMware Tools with the ‘Complete’ option instead of ‘Typical’.

A quick fix is to uninstall the vShield Endpoint TDI Manager part of the VMCI driver. You can do this from the Control Panel of Windows Server:

1. Click ‘Uninstall a program’.
2. Select VMware Tools and click Change.
3. Choose Modify and click Next.
4. Find the VMCI Driver and set the vShield Drivers to ‘Entire feature will be unavailable’.
5. Click Next, confirm the changes and let the installer finish.

After the changes are made, the guest OS has to be rebooted.

In the Event Viewer you will find a bugcheck entry similar to this:

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a (0x0000000000000350, 0x0000000000000002, 0x0000000000000001, 0xfffff80001676f03). A dump was saved in: C:\Windows\MEMORY.DMP.
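The four values in parentheses are the bugcheck parameters. For stop code 0xA (IRQL_NOT_LESS_OR_EQUAL) they are the referenced address, the IRQL, an access-type bitfield and the address of the faulting instruction, the same values WinDbg lists as Arg1 to Arg4. If you want to pick these out of the event text without opening the dump, a small Python sketch like the one below can do it. It assumes the message text matches the sample above exactly; the function names are my own and purely illustrative.

```python
import re

# Matches the stop code and four parameters in the System log message
# "The computer has rebooted from a bugcheck. The bugcheck was: 0x... (0x..., 0x..., 0x..., 0x...)."
HEX = r"(0x[0-9a-fA-F]+)"
BUGCHECK_RE = re.compile(
    r"The bugcheck was: " + HEX + r" \(" + HEX + r", " + HEX + r", " + HEX + r", " + HEX + r"\)"
)


def parse_bugcheck(message: str) -> dict:
    """Split a bugcheck event message into its stop code and four parameters."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        raise ValueError("no bugcheck code found in message")
    code, *params = (int(group, 16) for group in match.groups())
    return {"code": code, "params": params}


def describe_irql_not_less_or_equal(params: list) -> dict:
    """Interpret the parameters of stop code 0xA (IRQL_NOT_LESS_OR_EQUAL)."""
    address, irql, flags, instruction = params
    return {
        "address": hex(address),          # Arg1: memory referenced
        "irql": irql,                     # Arg2: IRQL at the time of the access
        "write": bool(flags & 0x1),       # Arg3 bit 0: 0 = read, 1 = write
        "execute": bool(flags & 0x8),     # Arg3 bit 3: 1 = execute operation
        "instruction": hex(instruction),  # Arg4: address that referenced memory
    }


if __name__ == "__main__":
    msg = ("The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a "
           "(0x0000000000000350, 0x0000000000000002, 0x0000000000000001, "
           "0xfffff80001676f03). A dump was saved in: C:\\Windows\\MEMORY.DMP.")
    info = parse_bugcheck(msg)
    if info["code"] == 0xA:
        print(describe_irql_not_less_or_equal(info["params"]))
        # {'address': '0x350', 'irql': 2, 'write': True, 'execute': False, 'instruction': '0xfffff80001676f03'}
```

A write to an address as low as 0x350 at IRQL 2 typically means a field was accessed through a NULL structure pointer, which fits a driver completing a request with stale data.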

If you analyze the dump with WinDbg, the report looks roughly like this:

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
 
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000350, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, bitfield :
     bit 0 : value 0 = read operation, 1 = write operation
     bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80001676f03, address which referenced memory
 
Debugging Details:
------------------
 
 
WRITE_ADDRESS:  0000000000000350
 
CURRENT_IRQL:  2
 
FAULTING_IP:
nt!KeAcquireSpinLockAtDpcLevel+43
fffff800`01676f03 f0480fba2900    lock bts qword ptr [rcx],0
 
DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT
 
BUGCHECK_STR:  0xA
 
PROCESS_NAME:  System
 
ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre
 
TRAP_FRAME:  fffff88001ff58e0 -- (.trap 0xfffff88001ff58e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000350
rdx=fffffa800c387cf0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001676f03 rsp=fffff88001ff5a70 rbp=0000000000000001
r8=fffffa8007a15980  r9=0000000000000000 r10=fffff880009cdb80
r11=fffff88001ff5c08 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
nt!KeAcquireSpinLockAtDpcLevel+0x43:
fffff800`01676f03 f0480fba2900    lock bts qword ptr [rcx],0 ds:00000000`00000350=????????????????
Resetting default scope
 
LAST_CONTROL_TRANSFER:  from fffff80001681169 to fffff80001681bc0
 
STACK_TEXT: 
fffff880`01ff5798 fffff800`01681169 : 00000000`0000000a 00000000`00000350 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
fffff880`01ff57a0 fffff800`0167fde0 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff880`009cd180 : nt!KiBugCheckDispatch+0x69
fffff880`01ff58e0 fffff800`01676f03 : 00000000`c0000239 fffff880`02f13bdd fffffa80`0c4d9cc0 00000000`00000070 : nt!KiPageFault+0x260
fffff880`01ff5a70 fffff880`02f78a07 : fffffa80`07a15980 00000000`00000001 00000000`00000000 fffffa80`07a15980 : nt!KeAcquireSpinLockAtDpcLevel+0x43
fffff880`01ff5ac0 fffff800`016855d1 : fffffa80`0c387e53 00000000`00000005 00000000`00000000 fffffa80`0c387cf0 : netbt!AcceptCompletionRoutine+0x47
fffff880`01ff5b20 fffff880`02f4060c : fffffa80`0c5f55e0 fffffa80`0c3c3800 fffffa80`0c387cf0 00000000`00000000 : nt!IopfCompleteRequest+0x341
fffff880`01ff5c10 fffff800`019781d3 : fffffa80`07179ab0 fffff800`018272d8 fffffa80`06d10040 fffffa80`06d10040 : vnetflt+0x160c
fffff880`01ff5c80 fffff800`0168b261 : fffff800`01827200 fffff800`01978101 fffffa80`06d10000 fffff800`018272d8 : nt!IopProcessWorkItem+0x23
fffff880`01ff5cb0 fffff800`0191e2ea : e00df00f`001f0116 fffffa80`06d10040 00000000`00000080 fffffa80`06d099e0 : nt!ExpWorkerThread+0x111
fffff880`01ff5d40 fffff800`016728e6 : fffff880`01edf180 fffffa80`06d10040 fffff880`01ee9fc0 000b7419`000a1901 : nt!PspSystemThreadStartup+0x5a
fffff880`01ff5d80 00000000`00000000 : fffff880`01ff6000 fffff880`01ff0000 fffff880`01ff59e0 00000000`00000000 : nt!KxStartSystemThread+0x16
 
STACK_COMMAND:  kb
 
FOLLOWUP_IP:
netbt!AcceptCompletionRoutine+47
fffff880`02f78a07 488d4b60        lea     rcx,[rbx+60h]
 
SYMBOL_STACK_INDEX:  4
 
SYMBOL_NAME:  netbt!AcceptCompletionRoutine+47
 
FOLLOWUP_NAME:  MachineOwner
 
MODULE_NAME: netbt
 
IMAGE_NAME:  netbt.sys
 
DEBUG_FLR_IMAGE_TIMESTAMP:  4ce79386
 
FAILURE_BUCKET_ID:  X64_0xA_netbt!AcceptCompletionRoutine+47
 
BUCKET_ID:  X64_0xA_netbt!AcceptCompletionRoutine+47
 
ANALYSIS_SOURCE:  KM
 
FAILURE_ID_HASH_STRING:  km:x64_0xa_netbt!acceptcompletionroutine+47
 
FAILURE_ID_HASH:  {3a74f055-ea53-e758-90a1-3e612f992e18}
 
Followup: MachineOwner
---------
 
1: kd> lmvm netbt
start             end                 module name
fffff880`02f4f000 fffff880`02f94000   netbt      (pdb symbols)          C:\ProgramData\dbg\sym\netbt.pdb\3D581F5A08614A7CB02D71638469228D2\netbt.pdb
    Loaded symbol image file: netbt.sys
    Image path: \SystemRoot\System32\DRIVERS\netbt.sys
    Image name: netbt.sys
    Timestamp:        Sat Nov 20 10:23:18 2010 (4CE79386)
    CheckSum:         00041134
    ImageSize:        00045000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
1: kd> .trap 0xfffff88001ff58e0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000350
rdx=fffffa800c387cf0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001676f03 rsp=fffff88001ff5a70 rbp=0000000000000001
r8=fffffa8007a15980  r9=0000000000000000 r10=fffff880009cdb80
r11=fffff88001ff5c08 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
nt!KeAcquireSpinLockAtDpcLevel+0x43:
fffff800`01676f03 f0480fba2900    lock bts qword ptr [rcx],0 ds:00000000`00000350=????????????????
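Although WinDbg buckets the crash under netbt.sys, the stack shows the request being completed from vnetflt+0x160c, the only frame without a resolved function name. vnetflt is the same TDI filter driver that AFD warns about in the event log. A quick way to spot such frames in a longer STACK_TEXT is sketched below. This is a heuristic of my own, not a WinDbg feature: a frame shown as module+offset just means no symbols were found, which is typical for third-party drivers but can also happen for Microsoft modules if the symbol path is broken. The function name is hypothetical.

```python
def unsymbolized_modules(stack_text: str) -> list:
    """Return modules that appear in WinDbg STACK_TEXT only as module+offset.

    A frame such as "nt!KeBugCheckEx" has a resolved function name; a frame
    such as "vnetflt+0x160c" does not, which usually means no symbols were
    found for that module.
    """
    modules = []
    for line in stack_text.splitlines():
        # Frame lines look like "<child-sp> <ret-addr> : <args> : <symbol>".
        if " : " not in line:
            continue
        symbol = line.rsplit(" : ", 1)[1].strip()
        if not symbol or "!" in symbol:
            continue  # resolved frames contain module!function
        module = symbol.split("+", 1)[0]
        if module not in modules:
            modules.append(module)  # keep order of first appearance
    return modules


if __name__ == "__main__":
    stack = """STACK_TEXT:
fffff880`01ff5a70 fffff880`02f78a07 : fffffa80`07a15980 00000000`00000001 00000000`00000000 fffffa80`07a15980 : nt!KeAcquireSpinLockAtDpcLevel+0x43
fffff880`01ff5ac0 fffff800`016855d1 : fffffa80`0c387e53 00000000`00000005 00000000`00000000 fffffa80`0c387cf0 : netbt!AcceptCompletionRoutine+0x47
fffff880`01ff5b20 fffff880`02f4060c : fffffa80`0c5f55e0 fffffa80`0c3c3800 fffffa80`0c387cf0 00000000`00000000 : nt!IopfCompleteRequest+0x341
fffff880`01ff5c10 fffff800`019781d3 : fffffa80`07179ab0 fffff800`018272d8 fffffa80`06d10040 fffffa80`06d10040 : vnetflt+0x160c"""
    print(unsymbolized_modules(stack))  # ['vnetflt']
```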

Sources I’ve used: Windows Bugcheck Analysis, MCSEboard.de and the VMware Community.

UPDATE: You can also check whether your server logs this warning in the System event log:

Log Name:      System
Source:        AFD
Date:          DD-MM-YYYY HH:MM:SS
Event ID:      16001
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      hostname.domain.local
Description:
A TDI filter (\Driver\vnetflt) was detected. This filter has not been certified by Microsoft and may cause system instability.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="AFD" />
    <EventID Qualifiers="32768">16001</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="YYYY-MM-DDTHH:MM:SS.000000000Z" />
    <EventRecordID>41300</EventRecordID>
    <Channel>System</Channel>
    <Computer>hostname.domain.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Afd</Data>
    <Data>\Driver\vnetflt</Data>
    <Binary>000000000200300000000000813E0080000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
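If you want to check several servers for this warning, you can copy the event from Event Viewer (Details tab, XML View) and scan it. The Python sketch below, built only on the standard library, assumes the XML looks like the sample above: it returns the driver object named in an AFD event 16001, or None for any other event. The function name is hypothetical, and it picks the driver by its \Driver\ prefix rather than relying on the order of the Data elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (as in the sample above).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def tdi_filter_from_event(event_xml: str):
    """Return the driver named in an AFD event 16001, or None for other events."""
    root = ET.fromstring(event_xml)
    provider = root.find("e:System/e:Provider", NS)
    event_id = root.find("e:System/e:EventID", NS)
    if provider is None or event_id is None:
        return None
    if provider.get("Name") != "AFD" or (event_id.text or "").strip() != "16001":
        return None
    # The <Data> values hold the AFD device and the filter driver object;
    # return the one that names a driver object (\Driver\...).
    for data in root.findall("e:EventData/e:Data", NS):
        value = (data.text or "").strip()
        if value.lower().startswith("\\driver\\"):
            return value
    return None


if __name__ == "__main__":
    sample = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="AFD" />
    <EventID Qualifiers="32768">16001</EventID>
  </System>
  <EventData>
    <Data>\Device\Afd</Data>
    <Data>\Driver\vnetflt</Data>
  </EventData>
</Event>"""
    print(tdi_filter_from_event(sample))  # \Driver\vnetflt
```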

UPDATE 2: Thanks to Peter Chang’s comment: an official update is now available, see VMware KB2077302 for more information.

Branko Vucinec

About Branko Vucinec

Hi! I'm Branko, a Systems Engineer focused on Microsoft technologies from the Netherlands. I enjoy helping organizations with the business and people opportunities and challenges surrounding tech.
