KVM Irqfd Introduction


1 About this page

This page talks solely about irqfd in KVM: what it is and how it is implemented. The code references come from Linux 4.14.

2 What does irqfd do?

Irqfd in KVM is built on top of Linux's eventfd. Eventfd is a simple and fast notification mechanism in the Linux kernel that delivers a single, simple message, much like an interrupt.
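To make the eventfd semantics concrete, here is a small userspace sketch (Linux only, using glibc's <sys/eventfd.h>; the function name eventfd_coalesce_demo() is made up for illustration). An eventfd is essentially a 64-bit counter: writers add to it and a reader collects and clears it, so several signals sent before anyone reads collapse into one notification.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Signal a fresh eventfd `times` times, then read it once.
 * Each 8-byte write adds its value to the kernel-side 64-bit counter. In
 * the default (non-EFD_SEMAPHORE) mode a single read returns the whole sum
 * and resets the counter to 0. EFD_NONBLOCK makes the read fail with
 * EAGAIN instead of blocking when nothing was signalled.
 * Returns the value read, 0 if the counter was empty, -1 on other errors. */
long long eventfd_coalesce_demo(int times)
{
    uint64_t one = 1, count = 0;
    long long result = -1;
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    for (int i = 0; i < times; i++)
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
            goto out;

    if (read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count))
        result = (long long)count;   /* sum of all writes; counter now 0 */
    else if (errno == EAGAIN)
        result = 0;                  /* nothing was signalled */
out:
    close(efd);
    return result;
}
```

For example, eventfd_coalesce_demo(3) reads back 3 in one go rather than three separate events.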

As its name suggests, irqfd is basically an fd bound to an interrupt in the virtual machine, and that fd must be an eventfd. The delivery path is unidirectional: interrupts are delivered from the outside world into the guest. A related mechanism called ioeventfd works in the reverse direction: it signals an eventfd when the guest writes to a registered PIO or MMIO address, i.e. a notification from the guest to the host. It is out of the scope of this article.

With irqfd, to trigger an interrupt we have set up, all we need to do is write to the corresponding eventfd. In userspace, a plain write() syscall of an 8-byte integer suffices (glibc also provides eventfd_write(), but it is merely a wrapper around write()). In the kernel, we can use eventfd_signal() instead.
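A minimal userspace sketch of the trigger path, assuming the eventfd has already been registered as an irqfd elsewhere (registration is not shown, and the helper names irqfd_trigger_write(), irqfd_trigger_libc() and irqfd_trigger_demo() are hypothetical). Both helpers bump the same counter; the demo also shows that a write shorter than 8 bytes is rejected.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper: "fire" the interrupt bound to an irqfd by adding 1
 * to the eventfd counter with a raw write(). The buffer must be an
 * 8-byte integer; shorter writes fail with EINVAL. */
int irqfd_trigger_write(int efd)
{
    uint64_t val = 1;
    return write(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) ? 0 : -1;
}

/* Same thing through glibc's eventfd_write() wrapper (returns 0 or -1). */
int irqfd_trigger_libc(int efd)
{
    return eventfd_write(efd, 1);
}

/* Uses both helpers on one eventfd and reads the counter back.
 * Stores the errno of a deliberately too-short (4-byte) write in
 * *short_write_errno. Returns the counter value, or -1 on error. */
long long irqfd_trigger_demo(int *short_write_errno)
{
    uint32_t too_small = 1;
    eventfd_t total = 0;
    long long ret = -1;
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return -1;

    *short_write_errno = 0;
    if (write(efd, &too_small, sizeof(too_small)) < 0)
        *short_write_errno = errno;   /* expected: EINVAL, counter untouched */

    if (irqfd_trigger_write(efd) == 0 && irqfd_trigger_libc(efd) == 0 &&
        eventfd_read(efd, &total) == 0)
        ret = (long long)total;
    close(efd);
    return ret;
}
```

Since eventfd_write() is only a thin wrapper, which helper to use is a matter of taste; the kernel sees the same 8-byte write either way.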

3 How is irqfd implemented?

The main function to look at is kvm_irqfd_assign(). First, each irqfd is represented in KVM by a struct kvm_kernel_irqfd object, whose gsi field is the index into the guest's interrupt routing table that this irqfd corresponds to. Two work items are initialized, one for interrupt injection and one for tearing down the irqfd:

INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
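The INIT_WORK() pairing can be pictured with a toy single-threaded userspace model. Everything below (toy_work, toy_schedule_work(), toy_irqfd, and so on) is a hypothetical stand-in, not the kernel's <linux/workqueue.h> API; it only illustrates how a work item binds a handler that later recovers its enclosing object through container_of(), which is how irqfd_inject() and irqfd_shutdown() find their struct kvm_kernel_irqfd.

```c
#include <stddef.h>

/* Hypothetical userspace copy of the kernel's container_of() idea. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-in for struct work_struct. */
struct toy_work {
    void (*func)(struct toy_work *work);
    int pending;              /* set when queued, cleared when run */
};

/* Toy stand-in for struct kvm_kernel_irqfd (most fields omitted). */
struct toy_irqfd {
    unsigned int gsi;
    int injected;             /* how many times the inject handler ran */
    int shut_down;
    struct toy_work inject;
    struct toy_work shutdown;
};

/* ~ INIT_WORK(): remember which handler this work item runs. */
void toy_init_work(struct toy_work *w, void (*func)(struct toy_work *))
{
    w->func = func;
    w->pending = 0;
}

/* ~ schedule_work(): queuing an already-pending item is a no-op.
 * Returns 1 if newly queued, 0 if it was already pending. */
int toy_schedule_work(struct toy_work *w)
{
    if (w->pending)
        return 0;
    w->pending = 1;
    return 1;
}

/* Stand-in for a kworker thread: run the item if it is pending. */
void toy_run_work(struct toy_work *w)
{
    if (!w->pending)
        return;
    w->pending = 0;
    w->func(w);
}

void toy_irqfd_inject(struct toy_work *work)
{
    struct toy_irqfd *irqfd = toy_container_of(work, struct toy_irqfd, inject);
    irqfd->injected++;        /* real code would raise GSI irqfd->gsi here */
}

void toy_irqfd_shutdown(struct toy_work *work)
{
    struct toy_irqfd *irqfd = toy_container_of(work, struct toy_irqfd, shutdown);
    irqfd->shut_down = 1;     /* real code would unhook and free the irqfd */
}

void toy_irqfd_init(struct toy_irqfd *irqfd, unsigned int gsi)
{
    irqfd->gsi = gsi;
    irqfd->injected = 0;
    irqfd->shut_down = 0;
    toy_init_work(&irqfd->inject, toy_irqfd_inject);
    toy_init_work(&irqfd->shutdown, toy_irqfd_shutdown);
}
```

Deferring work this way lets the code that notices an event stay short, while the heavier handler runs later in a context where it may sleep.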

Irqfd only accepts eventfds, so we check for this and cache the eventfd context:

eventfd = eventfd_ctx_fileget(f.file);
if (IS_ERR(eventfd)) {
        ret = PTR_ERR(eventfd);
        goto fail;
}
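KVM does this check inside the kernel: eventfd_ctx_fileget() fails unless the file really is an eventfd. As a rough userspace analogue, not something KVM itself does, a program can tell whether an fd is an eventfd from its /proc/self/fd symlink, which Linux reports as anon_inode:[eventfd]. The helper name fd_is_eventfd() is hypothetical, and the sketch assumes /proc is mounted.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper: returns 1 if `fd` refers to an eventfd, 0 if it is
 * some other kind of file, -1 if the fd cannot be inspected (e.g. closed).
 * Relies on the /proc/self/fd/<n> symlink target that Linux exposes. */
int fd_is_eventfd(int fd)
{
    char path[64], target[256];
    ssize_t n;

    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    n = readlink(path, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    target[n] = '\0';         /* readlink() does not NUL-terminate */
    return strcmp(target, "anon_inode:[eventfd]") == 0;
}
```

For instance, an fd returned by eventfd(0, 0) yields 1, while either end of a pipe() yields 0 because its link reads pipe:[...].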

The most important job of irqfd is to connect the eventfd signal to a GSI interrupt in the guest. For that we provide our own wait queue handler, irqfd_wakeup(), which defines what should happen when we are woken up, namely delivering the interrupt:

init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
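How a wait queue dispatches to per-entry callbacks can be shown with a toy model. The types and functions below are hypothetical simplifications of struct wait_queue_entry, init_waitqueue_func_entry() and wake_up(): an ordinary entry's callback would just wake a sleeping task (default_wake_function() in the kernel), while an irqfd-style entry runs its own handler, here only counting POLLIN events and noting POLLHUP, roughly the two cases irqfd_wakeup() reacts to.

```c
#include <stddef.h>

#define TOY_POLLIN  0x001u    /* hypothetical event bits for the model */
#define TOY_POLLHUP 0x010u

struct toy_wait_entry;
/* ~ wait_queue_func_t: called by the waker with the event mask as key */
typedef int (*toy_wait_func_t)(struct toy_wait_entry *entry, unsigned int key);

struct toy_wait_entry {
    toy_wait_func_t func;
    struct toy_wait_entry *next;
};

struct toy_wait_head {
    struct toy_wait_entry *first;
};

/* ~ init_waitqueue_func_entry(): attach a custom wake callback */
void toy_init_func_entry(struct toy_wait_entry *entry, toy_wait_func_t func)
{
    entry->func = func;
    entry->next = NULL;
}

/* ~ add_wait_queue() (entry order does not matter in this model) */
void toy_add_wait_queue(struct toy_wait_head *head, struct toy_wait_entry *entry)
{
    entry->next = head->first;
    head->first = entry;
}

/* ~ wake_up_poll(): invoke every entry's own callback with the mask */
void toy_wake_up(struct toy_wait_head *head, unsigned int key)
{
    for (struct toy_wait_entry *e = head->first; e != NULL; e = e->next)
        e->func(e, key);
}

/* A "sleeping task" waiter: its callback only marks it runnable. */
struct toy_task_waiter {
    int woken;
    struct toy_wait_entry wait;
};

int toy_default_wake(struct toy_wait_entry *entry, unsigned int key)
{
    struct toy_task_waiter *w = (struct toy_task_waiter *)
        ((char *)entry - offsetof(struct toy_task_waiter, wait));
    (void)key;
    w->woken = 1;
    return 1;
}

/* An irqfd-style waiter: the callback itself does the delivery work. */
struct toy_irqfd {
    unsigned int gsi;
    int irqs_raised;
    int hung_up;
    struct toy_wait_entry wait;
};

int toy_irqfd_wakeup(struct toy_wait_entry *entry, unsigned int key)
{
    struct toy_irqfd *irqfd = (struct toy_irqfd *)
        ((char *)entry - offsetof(struct toy_irqfd, wait));
    if (key & TOY_POLLIN)
        irqfd->irqs_raised++; /* real code injects the GSI or defers it */
    if (key & TOY_POLLHUP)
        irqfd->hung_up = 1;   /* real code starts tearing the irqfd down */
    return 0;
}
```

The point of the custom callback is that no thread has to sleep on the eventfd: the signaller's wakeup itself runs the irqfd handler.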

Setting up the wait queue entry is not enough; we also need an explicit poll() to start polling on that fd, so that we are notified when someone writes to or signals it:

init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
...
events = f.file->f_op->poll(f.file, &irqfd->pt);

init_poll_funcptr() is not that important; it just tells the poll() code how to set up the wait. In our case, the queue proc irqfd_ptable_queue_proc() simply adds the wait entry onto the eventfd context's wait queue head. The subsequent f.file->f_op->poll() call lands in eventfd_poll(), which calls the common poll_wait(); poll_wait() in turn invokes irqfd_ptable_queue_proc() to insert the wait queue entry onto the context's wait queue.
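The whole registration chain, from init_poll_funcptr() through eventfd_poll() and poll_wait() to the queue proc, followed by a later eventfd_signal() wakeup, can be traced in a single-threaded userspace model. All toy_* names are hypothetical, and locking, reference counting and the actual GSI injection are left out. The final step mirrors a check that kvm_irqfd_assign() in Linux 4.14 performs after the poll: if the eventfd was already signalled before registration, an injection is scheduled right away.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_POLLIN 0x1u       /* hypothetical event bit for the model */

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_wait_entry {
    void (*func)(struct toy_wait_entry *e, unsigned int key);
    struct toy_wait_entry *next;
};
struct toy_wait_head { struct toy_wait_entry *first; };

struct toy_poll_table;
typedef void (*toy_queue_proc)(struct toy_wait_head *wqh, struct toy_poll_table *pt);
struct toy_poll_table { toy_queue_proc qproc; };

/* ~ init_poll_funcptr(): record how poll_wait() should queue us. */
void toy_init_poll_funcptr(struct toy_poll_table *pt, toy_queue_proc qproc)
{
    pt->qproc = qproc;
}

/* ~ poll_wait(): hand the file's wait queue head to the caller's qproc. */
void toy_poll_wait(struct toy_wait_head *wqh, struct toy_poll_table *pt)
{
    if (pt && pt->qproc && wqh)
        pt->qproc(wqh, pt);
}

struct toy_eventfd_ctx { uint64_t count; struct toy_wait_head wqh; };

/* ~ eventfd_poll(): register via poll_wait(), report current readiness. */
unsigned int toy_eventfd_poll(struct toy_eventfd_ctx *ctx, struct toy_poll_table *pt)
{
    toy_poll_wait(&ctx->wqh, pt);
    return ctx->count > 0 ? TOY_POLLIN : 0;
}

/* ~ eventfd_signal(): bump the counter and wake every waiter with POLLIN. */
void toy_eventfd_signal(struct toy_eventfd_ctx *ctx, uint64_t n)
{
    ctx->count += n;
    for (struct toy_wait_entry *e = ctx->wqh.first; e != NULL; e = e->next)
        e->func(e, TOY_POLLIN);
}

struct toy_irqfd {
    int injected;             /* stands in for "GSI was raised" */
    struct toy_wait_entry wait;
    struct toy_poll_table pt;
};

/* ~ irqfd_wakeup() */
void toy_irqfd_wakeup(struct toy_wait_entry *e, unsigned int key)
{
    struct toy_irqfd *irqfd = toy_container_of(e, struct toy_irqfd, wait);
    if (key & TOY_POLLIN)
        irqfd->injected++;
}

/* ~ irqfd_ptable_queue_proc(): add our wait entry to the eventfd's queue. */
void toy_irqfd_queue_proc(struct toy_wait_head *wqh, struct toy_poll_table *pt)
{
    struct toy_irqfd *irqfd = toy_container_of(pt, struct toy_irqfd, pt);
    irqfd->wait.next = wqh->first;
    wqh->first = &irqfd->wait;
}

/* Follows the order of kvm_irqfd_assign(); returns the initial poll mask. */
unsigned int toy_irqfd_assign(struct toy_irqfd *irqfd, struct toy_eventfd_ctx *ctx)
{
    unsigned int events;

    irqfd->injected = 0;
    irqfd->wait.func = toy_irqfd_wakeup;   /* ~ init_waitqueue_func_entry() */
    irqfd->wait.next = NULL;
    toy_init_poll_funcptr(&irqfd->pt, toy_irqfd_queue_proc);
    events = toy_eventfd_poll(ctx, &irqfd->pt);
    if (events & TOY_POLLIN)
        irqfd->injected++;    /* real code: schedule_work(&irqfd->inject) */
    return events;
}
```

Because the queue proc is supplied by the caller, the same eventfd_poll() serves both ordinary poll()/select() users and irqfd; only the qproc decides which wait entry ends up on the queue.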

Date: 2017-12-07 15:46:05 CST

Author: peterx
