NVIDIA holds a dominant market share in the vGPU space. From artificial intelligence, deep learning, and data science to cloud gaming, vGPU technology is becoming increasingly visible to the general public. At the same time, it brings a new attack surface to cloud infrastructure.
Compared to well-analyzed hypervisors, there is still little research on the security of vGPU and its components. Researchers face not only a lack of public information but also the fact that all NVIDIA vGPU components are closed-source, stripped of symbols, and built with obfuscated function names.
Regardless of the hypervisor in use, vGPU deployments include a component called nvidia-vgpu-mgr that runs independently on the host. Through a detailed study of this component, we have figured out how the guest communicates with the vGPU manager. The kernel driver inside the guest virtual machine talks to the host through a mechanism called a "vRPC message". Each message is first processed and reorganized by nvidia-vgpu-mgr, then forwarded via ioctl to the host kernel driver for further processing.
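The guest-to-host path described above can be sketched as a length-prefixed message that the manager validates before forwarding. Note that the real vRPC wire format is undocumented; the header fields, sizes, and function names below are illustrative assumptions, not NVIDIA's actual layout.

```python
import struct

# Hypothetical header: function_id, result, total_length, payload_length.
# The actual vRPC format is closed-source; this is an assumed layout.
VRPC_HEADER = struct.Struct("<IIII")

def build_vrpc_message(function_id: int, payload: bytes) -> bytes:
    """Guest side (sketch): wrap a payload in a hypothetical vRPC header."""
    header = VRPC_HEADER.pack(
        function_id, 0, VRPC_HEADER.size + len(payload), len(payload)
    )
    return header + payload

def parse_vrpc_message(msg: bytes):
    """Manager side (sketch): split off the header and sanity-check the
    length fields before the request would be reorganized and forwarded
    to the host kernel driver via ioctl."""
    function_id, _result, total_len, payload_len = VRPC_HEADER.unpack_from(msg)
    if total_len != len(msg) or payload_len != len(msg) - VRPC_HEADER.size:
        raise ValueError("inconsistent length fields")
    return function_id, msg[VRPC_HEADER.size:]
```

Because the guest fully controls the header and payload bytes, every field parsed here is attacker-controlled input to the handler on the host side.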
We have developed several fuzzing methods to test whether the vRPC handler is secure, and we have also fuzzed the host kernel drivers.
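One simple way to approach such a closed-source handler is mutation-based fuzzing: take a valid message captured from a real guest and randomly corrupt it so that length fields, IDs, and payloads disagree. This is a minimal sketch of that idea, not the actual tooling we used:

```python
import random

def mutate(message: bytes, rng: random.Random, max_flips: int = 8) -> bytes:
    """Corrupt a valid message: XOR random bytes, and occasionally
    truncate the buffer so header length fields no longer match."""
    buf = bytearray(message)
    for _ in range(rng.randint(1, max_flips)):
        buf[rng.randrange(len(buf))] ^= rng.randrange(1, 256)
    if rng.random() < 0.3:
        del buf[rng.randrange(len(buf)):]  # truncate at a random offset
    return bytes(buf)
```

In practice each mutated message would be replayed to the target handler while monitoring the nvidia-vgpu-mgr process for crashes.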
So far, we have found multiple vulnerabilities in the vRPC handler: three out-of-bounds (OOB) writes, one OOB read/write, and one information leak. We have also found a kernel vulnerability that can be triggered directly from the guest machine, resulting in an arbitrary kernel address write.
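To illustrate the OOB-write bug class in general terms (this is a toy model, not NVIDIA's actual code): a handler that trusts a guest-supplied length when copying into a fixed-size buffer writes past the end of that buffer. Python lists stand in for a C buffer here; where C would silently corrupt adjacent memory, Python raises an IndexError.

```python
BUF_SIZE = 64  # stand-in for a fixed-size buffer in the handler

def vulnerable_copy(dst: list, payload: bytes, claimed_len: int) -> None:
    # Bug pattern: the loop bound is the attacker-controlled claimed_len,
    # never checked against the real destination size.
    for i in range(claimed_len):
        dst[i] = payload[i]  # OOB write once claimed_len > len(dst)

def patched_copy(dst: list, payload: bytes, claimed_len: int) -> None:
    # Fix pattern: clamp the copy length to the actual buffer sizes.
    n = min(claimed_len, len(dst), len(payload))
    for i in range(n):
        dst[i] = payload[i]
```

In a C handler the same mistake corrupts heap or stack memory adjacent to the buffer, which is what makes these primitives exploitable.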
By chaining these vulnerabilities, an attacker can, regardless of the hypervisor, exploit nvidia-vgpu-mgr from the guest machine and gain root access on the host.