Monday 22 October 2018

Control Register Access Exiting and Crashing VMware

(Updated on November 4 2018 to correct a minor error)

Coinciding with my previous two posts, here's how you can crash or at least detect VMware and many other hypervisors:
https://gist.github.com/drew-gpf/d31840bebbbb1ff1d112a6f46e162c05

Backstory:
While writing a simple SEH emulator for my hypervisor (following the documentation on MSDN as well as this excellent blogpost), I was testing under VMware.
When I tried to run it, VMware would instantly close without any message. After being stumped for a while, I tried it on my PC, only to find that my SEH emulator did, in fact, work.

When I talked about it with my friend daax (whose blog can be found over at https://revers.engineering/), he recalled experiencing the exact same issue (albeit with different motivations): when he tried to clear CR0.PE to cause a #GP(0), VMware would just close on him. This eventually drove me to the conclusion that VMware was improperly handling the CR access VM exit for CR0. More specifically, when the guest clears CR0.PE, they don't check whether CR0.PG is still set, a combination which would normally cause a #GP(0). Since they write the invalid value into the guest CR0 VMCS field, the processor objects upon VM entry in the form of a VM entry failure, which VMware responds to by just closing itself.
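To make the missing check concrete, here is a minimal sketch in C of the validation a CR access exit handler could run before touching the guest CR0 VMCS field. The helper name cr0_write_raises_gp and the macros are hypothetical (they come from neither VMware nor the gist), and the list of cases is not exhaustive: it covers the PE/PG combination described above plus the NW-without-CD combination, which the Intel SDM also documents as raising #GP(0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions (Intel SDM Vol. 3A). */
#define CR0_PE (1ULL << 0)   /* protection enable */
#define CR0_NW (1ULL << 29)  /* not write-through */
#define CR0_CD (1ULL << 30)  /* cache disable */
#define CR0_PG (1ULL << 31)  /* paging */

/*
 * Hypothetical helper: returns true if a guest MOV to CR0 with this value
 * should be answered with #GP(0) instead of being written to the VMCS.
 * Only two invalid combinations are modelled here.
 */
static bool cr0_write_raises_gp(uint64_t new_cr0)
{
    /* Paging enabled while protection is disabled. */
    if ((new_cr0 & CR0_PG) && !(new_cr0 & CR0_PE))
        return true;

    /* Not-write-through set while caching is still enabled. */
    if ((new_cr0 & CR0_NW) && !(new_cr0 & CR0_CD))
        return true;

    return false;
}
```

A handler built on this would inject #GP(0) and leave every guest field untouched whenever the helper returns true, rather than letting the processor reject the value at VM entry.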

The additional checks in the gist linked above rely on hypervisors not emulating CPU behaviour properly. The mistakes they look for include:
  • Injecting an exception when the guest updates CR0 bits that the CPU requires to hold a fixed value for VMX operation, or just not updating them at all
  • Not injecting a fault when the guest attempts to set a reserved bit of CR4, which can result in either a VM entry failure or a triple fault due to repeated #GP(0)s
  • Not forcing reserved bits of CR0 to 0 (I previously stated otherwise, but I forgot that writing reserved bits of CR0 won't actually cause a #GP(0); they are forced to 0 instead)
  • Updating the state of CR0 even though the write caused a #GP(0)

Fixes for these include:
  • Always checking that a change is valid before changing any CPU state
  • For documented control register bits, when the guest changes one, checking both that the processor supports the bit and whether the processor would raise an exception for the change (e.g. in the VMware case, they should check whether CR0.PG is set when CR0.PE is being cleared, and if it is, declare the change invalid)
  • Making control register bits that are forced to a fixed value host-owned, so that their values only change in the read shadow
  • Never allowing control register bits which didn't exist when the hypervisor was written to change. This also means that the hypervisor *must* control CPUID responses to remove reserved bits from them, as well as reserved leaves
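The host-owned bits idea can be sketched roughly as follows. This is a simplified model in C, not the code from my first post: the struct and function names are hypothetical, and fixed0 and fixed1 stand for the values read from the IA32_VMX_CR0_FIXED0 and IA32_VMX_CR0_FIXED1 MSRs (a bit set in fixed0 must be 1, a bit clear in fixed1 must be 0). It assumes the write has already passed validation and ignores unrestricted guest mode.

```c
#include <stdint.h>

/* Hypothetical result of handling an already-validated guest CR0 write. */
struct cr_update {
    uint64_t guest_cr;     /* value for the guest CR0 VMCS field */
    uint64_t read_shadow;  /* value for the CR0 read shadow */
};

/*
 * Bits the host must own: every bit the CPU forces to a fixed value.
 * This is what would go into the CR0 guest/host mask.
 */
static uint64_t cr_host_owned_mask(uint64_t fixed0, uint64_t fixed1)
{
    return fixed0 | ~fixed1;
}

/*
 * The guest sees the value it asked for through the read shadow, while
 * the real register keeps the values VMX operation requires.
 */
static struct cr_update cr_apply_fixed_bits(uint64_t requested,
                                            uint64_t fixed0,
                                            uint64_t fixed1)
{
    struct cr_update out;

    out.read_shadow = requested;
    out.guest_cr = (requested | fixed0) & fixed1;
    return out;
}
```

For example, with the illustrative values fixed0 = 0x80000021 and fixed1 = 0xFFFFFFFF, a guest write of 0x80000001 (PE and PG, with NE clear) leaves 0x80000001 in the read shadow while the guest CR0 field holds 0x80000021.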


Now to shamelessly plug my first post. An easy mechanism to implement what I described is found right here!
Note that, at least on my CPU, the architecture marks reserved bits of CR4 as bits which must be 0. On its own, my system would just let those bits change in the read shadow instead of faulting, so they must be set in the "no set" bitmask.
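As a rough illustration of how such a "no set" bitmask could be used for CR4, here is a minimal sketch in C. It is my reading of the idea rather than the code from the first post: the struct, the function names, and the hidden_bits parameter are all hypothetical, and cr4_fixed0 and cr4_fixed1 stand for the values of the IA32_VMX_CR4_FIXED0 and IA32_VMX_CR4_FIXED1 MSRs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU view of CR4; field names are illustrative. */
struct vcpu_cr4 {
    uint64_t guest_cr4;    /* value for the guest CR4 VMCS field */
    uint64_t read_shadow;  /* value for the CR4 read shadow */
};

/*
 * Bits the guest may never set: bits the CPU reports as must-be-0,
 * plus any feature bits the hypervisor chooses to hide from the guest.
 */
static uint64_t cr4_no_set_mask(uint64_t cr4_fixed1, uint64_t hidden_bits)
{
    return ~cr4_fixed1 | hidden_bits;
}

/*
 * Returns true if the write was accepted. A false return means the
 * caller should inject #GP(0); no state is modified in that case.
 */
static bool cr4_guest_write(struct vcpu_cr4 *cr4, uint64_t requested,
                            uint64_t cr4_fixed0, uint64_t no_set_mask)
{
    if (requested & no_set_mask)
        return false;

    cr4->read_shadow = requested;
    cr4->guest_cr4 = requested | cr4_fixed0;
    return true;
}
```

Checking the mask before touching either field keeps a faulting write from leaking into the guest-visible state, which is the last mistake in the list of checks above.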

Comments and suggestions are appreciated.
