"If a VMM emulates a guest instruction that would encounter a debug trap (single step or data or I/O breakpoint), it should cause that trap to be delivered."
After reading the subsequent paragraph going on about injecting single-step debug exceptions, I created a new project in Visual Studio, confirming the suspicions in the back of my head:
The issue here is that, by not injecting a debug exception after the emulated CPUID instruction, the trap is essentially delayed until the NOP can recognize it.
This brings us to the main point, which is that the phrase "none of the events normally associated take place," albeit used to describe exiting for startup-IPIs, describes every fault-like VM exit.
Single-Step Debug ExceptionsIt's not very hard to handle single-step debug exceptions. I don't know how to make flowcharts, so here's a simple step-by-step explanation:
1. After incrementing the guest instruction pointer (meaning that this can only be done if the instruction has been emulated), check if the trap flag (bit 8) of the guest's RFLAGS is set, and that bit 1 of the guest's IA32_DEBUGCTL isn't set (more on that later).
2. Inject a single-step debug exception using the guest pending debug exception VMCS field (rather than using traditional event injection, due to the way that the processor handles pending debug exceptions) by setting the caused by single step bit to 1.
The format of the pending debug exceptions VMCS field. As said in the manual, even though the structure's format matches that of DR6, the RTM bit has an inverse effect (where if the bit in DR6 would be set, it would not be set in the pending debug exceptions field).
But Wait, There's MoreThis isn't perfect, as shown here:
In this GIF, the effects of the MOV SS blocking of debug exceptions are extended to the NOP instruction, when they should have ended after the CPUID instruction.
The bit format of the guest interruptibility state VMCS field. The blocked by nmi and enclave interruption fields aren't touched as those are generally set and cleared by the processor and are only relevant for specific VM exits.
Similar to the checks done on the trap flag, these bits should only be cleared if the instruction has been emulated, leading to the correct behaviour.
Everything now works as it should.
Closing NotesAlong with the fields mentioned above, I've found that the following should be taken into consideration:
Preserving cr2; depending on the design of the hypervisor, a page fault could be possible.
Preserving SSE register state, including every xmm* register and the mxcsr, as well as setting up the mxcsr so that #XM faults cannot happen, by setting it to a value of 0x1F80. Many compilers, such as MSVC, will optimize certain portions of code to use the SSE instruction set due to its speed.
If the hypervisor interacts with guest memory or VM exits on any I/O port interactions, debug exceptions (as well as alignment checks) should be injected for both, using the aforementioned pending debug exceptions VMCS field. Do note that a debug exception being encountered will not stop the instruction from executing, and I/O port debug exceptions are checked using a sign-extended I/O port address (the last point may be wrong - make sure to consult the manual).
VM exits encountered while the processor is injecting an event into the guest will cause it to fill out the IDT vectoring information field, which contains the event that was being injected. Ignoring this field could cause the event to be lost in the infinite void. Note that this can be filled out even if the hypervisor did not inject the event. More information can be found in sections 24.9.2, 27.2, and 126.96.36.199 of volume 3.
Of course, not everything is covered here, such as the handling of the trap flag if a write to IA32_DEBUGCTL causes a VM exit (which is covered in 32.2.1). Always consult the Intel manual before listening to a stranger on the internet.