[BSOD] UNEXPECTED_KERNEL_MODE_TRAP - tdx.sys stopcode 0x7f daily on VMware fix


Today I had the chance to do another debug, this time in an environment with BSOD/bugcheck errors on a daily basis, sometimes twice a day. The crash was probably caused by tdx.sys and the stop code was 0x7f. The error occurred on a Windows Server 2008 R2 virtual machine hosted on a VMware ESXi 5.5 server, with a ‘complete’ VMware Tools installation. The problem looks similar to this post, and the steps towards a solution are the same: remove the VMCI driver for the VMware vShield Endpoint TDI manager. After removing the VMCI driver, the server has not had a BSOD in days.

This can simply be done in the Control Panel of Windows Server.


Click on ‘Uninstall a program’


Select VMware Tools and click Change.


Choose Modify and click Next.


Find the VMCI Driver and select ‘Entire feature will be unavailable’ for the vShield Drivers. Click Next, confirm the changes and watch the magic happen. After the changes are made, the guest OS needs a reboot.

For reference, the bugcheck shows up in the Event Viewer something like this:

The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000007f (0x0000000000000008, 0x0000000080050031, 0x00000000000006f8, 0xfffff880031cc943). A dump was saved in: C:\Windows\MEMORY.DMP.
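If you have more than one server to check, it can help to pull the stop code and its four parameters out of that event text automatically. The following is a minimal sketch in Python, assuming the message has exactly the layout shown above; the function name `parse_bugcheck` and the regular expression are my own and not part of any Microsoft tool.

```python
import re

# Sample text copied from the event shown above.
MESSAGE = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: "
    "0x0000007f (0x0000000000000008, 0x0000000080050031, "
    "0x00000000000006f8, 0xfffff880031cc943). "
    r"A dump was saved in: C:\Windows\MEMORY.DMP."
)

# Assumed layout: stop code, parameters in parentheses, then the dump path.
PATTERN = re.compile(
    r"The bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)\.\s+"
    r"A dump was saved in: (.+)$"
)


def parse_bugcheck(message):
    """Return (stop_code, [parameters], dump_path) from the event text."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a bugcheck message in the expected layout")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    dump_path = match.group(3).rstrip(".")  # drop the sentence-ending dot
    return code, params, dump_path


code, params, dump_path = parse_bugcheck(MESSAGE)
print(f"stop code {code:#x}, first parameter {params[0]:#x}, dump at {dump_path}")
```

Here the first parameter is 8, which the analysis output labels as EXCEPTION_DOUBLE_FAULT.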

The bugcheck analysis:

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050031
Arg3: 00000000000006f8
Arg4: fffff880031cc943

Debugging Details:
------------------


DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  7601.19045.amd64fre.win7sp1_gdr.151019-1254

SYSTEM_MANUFACTURER:  VMware, Inc.

VIRTUAL_MACHINE:  VMware

SYSTEM_PRODUCT_NAME:  VMware Virtual Platform

SYSTEM_VERSION:  None

BIOS_VENDOR:  Phoenix Technologies LTD

BIOS_VERSION:  6.00

BIOS_DATE:  04/14/2014

BASEBOARD_MANUFACTURER:  Intel Corporation

BASEBOARD_PRODUCT:  440BX Desktop Reference Platform

BASEBOARD_VERSION:  None

DUMP_TYPE:  1

BUGCHECK_P1: 8

BUGCHECK_P2: 80050031

BUGCHECK_P3: 6f8

BUGCHECK_P4: fffff880031cc943

BUGCHECK_STR:  0x7f_8

TRAP_FRAME:  fffff80001660e70 -- (.trap 0xfffff80001660e70)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffffa800d2fdcf0 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000118 rsi=0000000000000000 rdi=0000000000000000
rip=fffff880031cc943 rsp=fffff880025f5fd0 rbp=fffffa800d2fde40
 r8=0000000071544c4b  r9=fffffa800dda45fc r10=fffffa800fbac06c
r11=fffff880025f6148 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
kltdi+0x2943:
fffff880`031cc943 ff15b7780000    call    qword ptr [kltdi+0xa200 (fffff880`031d4200)] ds:fffff880`031d4200={nt!ExAllocatePoolWithTag (fffff800`019be0f0)}
Resetting default scope

CPU_COUNT: 4

CPU_MHZ: 95f

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 1a

CPU_STEPPING: 5

CPU_MICROCODE: 6,1a,5,0 (F,M,S,R)  SIG: 19'00000000 (cache) 19'00000000 (init)

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  0

ANALYSIS_SESSION_HOST:  LTSUP1014

ANALYSIS_SESSION_TIME:  07-13-2016 09:50:11.0801

ANALYSIS_VERSION: 10.0.10586.567 amd64fre

STACK_OVERFLOW: Stack Limit: fffff880025f6000. Use (kF) and (!stackusage) to investigate stack usage.

STACKUSAGE_FUNCTION: The function at address 0xFFFFF8800318DFFC was blamed for the stack overflow. It is using 4864 bytes of stack total in 16 instances (likely recursion).

FOLLOWUP_IP: 
tdx!TdxQueryAddressComplete+21c
fffff880`0318dffc 488b9c2438010000 mov     rbx,qword ptr [rsp+138h]

STACK_COMMAND:  .trap 0xfffff80001660e70 ; kb

THREAD_SHA1_HASH_MOD_FUNC:  bd0dce97fcb42a293c97267799693792f5008d62

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  06c8306092854b675d9de68bd00212adb000c686

THREAD_SHA1_HASH_MOD:  9f0ff9e4e1597b124eb1c512193e221c2d9dc29e

FAULT_INSTR_CODE:  249c8b48

SYMBOL_STACK_INDEX:  5

SYMBOL_NAME:  tdx!TdxQueryAddressComplete+21c

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: tdx

IMAGE_NAME:  tdx.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  561d3401

FAILURE_BUCKET_ID:  X64_0x7f_8_STACK_USAGE_RECURSION_tdx!TdxQueryAddressComplete+21c

BUCKET_ID:  X64_0x7f_8_STACK_USAGE_RECURSION_tdx!TdxQueryAddressComplete+21c

PRIMARY_PROBLEM_CLASS:  X64_0x7f_8_STACK_USAGE_RECURSION_tdx!TdxQueryAddressComplete+21c

TARGET_TIME:  2016-07-06T07:55:49.000Z

OSBUILD:  7601

OSSERVICEPACK:  1000

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  272

PRODUCT_TYPE:  3

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 7

OSEDITION:  Windows 7 Server (Service Pack 1) TerminalServer SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  2015-10-20 01:48:44

BUILDDATESTAMP_STR:  151019-1254

BUILDLAB_STR:  win7sp1_gdr

BUILDOSVER_STR:  6.1.7601.19045.amd64fre.win7sp1_gdr.151019-1254

ANALYSIS_SESSION_ELAPSED_TIME: 15f3

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:x64_0x7f_8_stack_usage_recursion_tdx!tdxqueryaddresscomplete+21c

FAILURE_ID_HASH:  {e2c4910b-b31c-fc5a-b7c1-9b9e24e133a6}

Followup:     MachineOwner
---------
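The STACK_OVERFLOW and STACKUSAGE_FUNCTION lines are the interesting part: kernel stacks grow downwards, so a stack pointer below the stack limit means the thread ran out of stack. As a rough check of the numbers, here is a small Python sketch that uses only values copied from the output above; the helper name `below_limit` is hypothetical.

```python
# Values copied from the !analyze -v output above.
STACK_LIMIT = 0xFFFFF880025F6000  # "Stack Limit" on the STACK_OVERFLOW line
TRAP_RSP = 0xFFFFF880025F5FD0     # rsp in the trap frame
TOTAL_BYTES = 4864                # stack used by the blamed function in total
INSTANCES = 16                    # number of times it appears on the stack


def below_limit(rsp, limit):
    """Bytes by which rsp has dropped below the stack limit (0 if it has not)."""
    return max(0, limit - rsp)


overrun = below_limit(TRAP_RSP, STACK_LIMIT)
per_frame = TOTAL_BYTES // INSTANCES

print(f"rsp is {overrun:#x} bytes below the stack limit")
print(f"each tdx!TdxQueryAddressComplete frame uses about {per_frame} bytes")
```

The stack pointer is already past the limit at the moment kltdi calls nt!ExAllocatePoolWithTag, and the 16 repeated tdx!TdxQueryAddressComplete frames account for a large share of the stack, which fits the recursion bucket the debugger reports.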

0: kd> lmvm tdx
start             end                 module name
fffff880`03188000 fffff880`031aa000   tdx        (pdb symbols)          c:\symbols\tdx.pdb\1FD8AABF70A24211AF4EF60863019ECE2\tdx.pdb
    Loaded symbol image file: tdx.sys
    Image path: \SystemRoot\system32\DRIVERS\tdx.sys
    Image name: tdx.sys
    Timestamp:        Tue Oct 13 18:40:33 2015 (561D3401)
    CheckSum:         00021D8A
    ImageSize:        00022000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
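The hexadecimal value in the Timestamp line (also shown as DEBUG_FLR_IMAGE_TIMESTAMP) is the driver's link time in seconds since 1970, which is handy when you want to compare driver builds between servers. A minimal Python sketch to decode it, with a helper name of my own choosing:

```python
from datetime import datetime, timezone


def pe_timestamp_to_utc(hex_value):
    """Convert a hexadecimal image timestamp to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


built = pe_timestamp_to_utc("561D3401")  # value from the lmvm output above
print(built.isoformat())
```

This gives 16:40:33 UTC on 13 October 2015. The debugger shows 18:40:33, presumably because it prints the time in the local time zone of the machine the analysis was run on.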

 

Branko Vucinec

