tag:blogger.com,1999:blog-83885142024-03-07T07:51:57.087-08:00Tales of a Code MonkeyRants about Open Source virtualization, and whatever else comes to mind. The following is my opinion and does not represent the opinions of IBM in any way.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comBlogger75125tag:blogger.com,1999:blog-8388514.post-68472876172192201072008-06-18T14:21:00.000-07:002008-06-18T14:31:55.017-07:00Red Hat announces "next-generation" virtualization based on KVM<p>Today, at the <a href="http://www.redhat.com/promo/summit/2008/virtualization/?intcmp=70160000000HTJJ">Red Hat Summit</a>, Red Hat announced three virtualization initiatives including <a href="http://www.ovirt.org">oVirt</a>. The press release is <a href="http://www.redhat.com/about/news/prarchive/2008/virtualization.html">here</a>.</p>
<p>Some choice quotage:</p>
<blockquote>
KVM technology has rapidly emerged as the next-generation virtualization technology, following on from the highly successful Xen implementation.
</blockquote>
<p>Another good one:</p>
<blockquote>
We continue to see huge improvements in functionality, performance and time to market because of our close relationship with our open source partners. For example, Intel and IBM have worked with us for many years covering virtualization technologies that span from Red Hat Enterprise Linux 5 to today's KVM-based announcements.
</blockquote>
<p>And of course:</p>
<blockquote>
"IBM works closely with Red Hat and the open source community to drive innovation within the Linux kernel," said Daniel Frye, vice president, open systems development at IBM. "IBM has a heterogenous approach toward virtualization, with KVM one of several options. KVM leverages the core features of the Linux kernel, including paravirtualization interfaces contributed by IBM engineers. By combining Linux virtualization infrastructure with open management interfaces such as CIM and libvirt, we gain a solution that eliminates lock-in and open source community innovations, we are able to offer our customers a solution with outstanding performance, scalability and agility."
</blockquote>
<p>If you want to see what all the fuss is about, check out <a href="http://kvm.qumranet.com/kvmwiki">KVM</a>.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-3969051209388148272008-06-09T12:20:00.000-07:002008-06-09T12:26:28.885-07:00KVM and Green Computing<p>I ran across <a href="http://www.networkworld.com/research/2008/060908-green-virtual-machines.html">this article</a> today from Tom Henderson that draws attention to the fact that most existing hypervisors (ESX, Xen, Hyper-V) do not support frequency scaling and therefore are not very eco-friendly.</p>
<p>This is partly true. There has been some recent work in Xen to add deep sleep state support and I believe even some work on frequency scaling. It is certainly not true though that virtualization and power-consciousness are at odds with each other. KVM is able to leverage all of the work that's been invested into Linux to manage power wisely. Good power management does not cause any sort of performance drop. Reducing the performance of your workload is only going to make the machine run longer and consume more power.</p>
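<p>To make the "race to idle" argument concrete, here's a toy energy calculation (a Python sketch; all of the wattages and runtimes below are invented for illustration, not measurements from any real machine):</p>

```python
# Toy "race to idle" arithmetic: energy (joules) = power (watts) x time
# (seconds). A CPU that runs fast finishes sooner and then idles at low
# power, which can beat running slowly for the whole interval.

def total_energy(active_watts, active_seconds, idle_watts, idle_seconds):
    """Sum the energy used in the active and idle states."""
    return active_watts * active_seconds + idle_watts * idle_seconds

WINDOW = 100  # seconds of wall-clock time to account for

# Full speed: 30 W for 40 s of work, then 2 W of idle for the remainder.
fast = total_energy(30, 40, 2, WINDOW - 40)

# Half speed: 18 W for 80 s (scaling down doesn't halve power), then idle.
slow = total_energy(18, 80, 2, WINDOW - 80)

print(fast, slow)  # 1320 1480: the faster run used *less* total energy
```

<p>Real processors are obviously messier than this, but the shape of the argument is the same one the Linux cpufreq and idle-state code exploits.</p>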
<p>The reason most hypervisors don't support power management is that it's very hard. When inventing a new Operating System, there's a lot of things you have to focus on before you can even start looking at power management. Again, we see the benefits of using an existing Operating System for virtualization.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-7422256347899302072008-05-09T15:16:00.001-07:002008-05-09T16:01:19.705-07:00The truth about KVM and Xen<p>When I saw this <a href="http://virtualization.com/news/2008/05/09/kvm-vs-xen-who-will-win-the-fight/">article</a> in my inbox, I knew I shouldn't bother reading it.
I really couldn't help myself though. I'm weak for gossip and my flight was
delayed so boredom got the best of me.</p>
<p>I can't blame the tech media for the wild reporting though. The situation
surrounding KVM, Xen, and Linux virtualization is pretty confused right now.
I'll do my best to clear things up. I'll add the extra disclaimer
that these are purely my own opinions and do not represent any official
position of my employer.</p>
<p>I think we can finally admit that we, the Linux community, made a very big
mistake with Xen. Xen should never have been included in a Linux distribution.
There, I've said it. We've all been thinking it, have whispered it in closed
rooms, and have done our best to avoid it.</p>
<p>I say this, not because Xen isn't useful technology and certainly not because
people shouldn't use it. Xen is a very useful project and can really make a
huge impact in an enterprise environment. Quite simply, Xen is not, and will
never be, a part of Linux. Therefore, including it in a Linux distribution has only led to massive user confusion about the relationship between Linux and Xen.</p>
<p>Xen is a hypervisor that is based on the Nemesis microkernel. Linux
distributions ship Xen today and by default install a Linux guest (known as
domain-0) and do their best to hide the fact that Xen is not a part of Linux.
They've done a good job, most users won't even notice that they are running an
entirely different Operating System. The whole situation is somewhat absurd
though. It's like if the distributions shipped a NetBSD kernel automatically
and switched to using it when you wanted to run a LAMP stack. We don't ship a
plethora of purpose-built kernels in a distribution. We ship one kernel and
make sure that it works well for all users. That's what makes a Linux
distribution Linux. When you take away the Linux kernel, it's not Linux any
more.</p>
<p>There is no shortage of purpose-built kernels out there. NetBSD is a
purpose-built kernel for networking workloads. QNX is a purpose-built kernel
for embedded environments. VxWorks is a purpose-built kernel for real-time
environments. Being purpose-built doesn't imply superiority and Linux
currently is very competitive in all of these areas.</p>
<p>When the distros first shipped Xen, it was done mostly out of desperation.
Virtualization was, and still is, the "hot" thing. Linux did not provide any
native hypervisor capability. Most Linux developers didn't even really know
that much about virtualization. Xen was an easy-to-use, purpose-built
kernel with a good community. So we made the hasty decision to
ship Xen instead of investing in making Linux a proper hypervisor.</p>
<p>This decision has come back to haunt us now in the form of massive confusion.
When people talk about Xen not being merged into Linux, I don't think they
realize that Xen will *never* be merged into Linux. Xen will always be a
separate, purpose-built kernel. There are patches to Linux that enable it to
run well as a guest under Xen. These patches are likely to be merged in the
future, but Xen will never be a part of the Linux kernel.</p>
<p>As a Linux developer, it's hard for me to be that interested in Xen--for the
same reasons I have no interest in NetBSD, QNX, or VxWorks. The same is true
for the vast majority of Linux developers. When you think about it, it is
really quite silly. We advocate Linux for everything from embedded systems,
to systems requiring real-time performance, to high-end mainframes. I trust
Linux to run on my dvd player, my laptop, and to run on the servers that manage
my 401k. Is virtualization so much harder than every other problem in the
industry that Linux is somehow incapable of doing it well on its own? Of
course not. Virtualization is actually quite simple compared to things like
real-time.</p>
<p>This does not mean that Xen is dead or that we should have never encouraged
people to use it in the first place. At the time, it was the best solution
available. At this moment in time, it's still unclear whether Linux as
hypervisor is better than Xen in every scenario. I won't say that all users
should switch en masse from Xen to Linux for their virtualization needs. All
of the projects I've referenced here are viable projects that have large user
bases.</p>
<p>I'm a Linux developer though, and just like other Linux hackers who are trying
to make Linux run well in everything from mainframes to dvd players, I will
continue to work to make Linux work well as a hypervisor. The Linux community
will work toward making Linux the best hypervisor out there. The Linux distros
will stop shipping a purpose-built kernel for virtualization and instead rely
on Linux for it.</p>
<p>Looking at the rest of the industry, I'm surprised that other kernels haven't
gone in the direction of Linux in terms of adding hypervisor support directly
to the kernel.</p>
<p>Why is Windows not good enough to act as a hypervisor such that Microsoft had to
write a new kernel from scratch (Hyper-V)?</p>
<p>Why is Solaris not good enough to act as a hypervisor requiring Sun to ship
Xen in xVM? Solaris is good enough to run enterprise workloads but not good
enough to run a Windows VM? Really? Maybe :-)</p>
<p>Forget about all of the "true hypervisor" FUD you may read. The real question
to ask yourself is what is so wrong with these other kernels that they aren't
capable of running virtual machines well and instead have to rely on a
relatively young and untested microkernel to do their heavy lifting?</p>
<p><b>Update:</b> modified some of the text for clarity. Flight delayed more so another round of editing :-)</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-52576294803395862662008-04-07T21:22:00.000-07:002008-04-07T21:27:24.543-07:00KVM Forum 2008 Call For Presentations<blockquote><p>This is the Call for Presentations for the second annual KVM Developer's Forum, to be held on June 10-13, 2008, in Napa, California, USA [1]. We are looking for presentations on KVM development, quality assurance, management, security, interoperability, architecture support, and interesting use cases. Presentations are 50 minutes in length; there are also 25-minute mini-presentation slots available.</p>
<p>KVM Forum presentations are an excellent way to inform the KVM development community about your work, and to gather valuable feedback about your approach.</p>
<p>Please send your presentation proposal to the KVM Forum 2008 Content Committee at kf2008-cfp@qumranet.com by April 20th.</p>
<p>KVM Forum 2008 Content Committee:</p>
<ul><li>Dor Laor</li>
<li>Anthony Liguori</li>
<li>Avi Kivity</li></ul>
[1] http://kforum.qumranet.com/KVMForum/about_kvmforum.php
</blockquote>
<p>On a personal note, I found KVM Forum 2007 to be one of the best run conferences I've attended. The facilities were great and each talk was interesting. There was a great deal of discussion during each talk. Definitely worth the trip.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-80764070841307075312008-04-06T16:31:00.000-07:002008-04-06T16:50:48.659-07:00KVM for the Mainframe<p>kvm-65 was <a href="http://marc.info/?l=kvm-devel&m=120751136412110&w=2">released</a> today. The most interesting feature in this release is support for the s390 architecture, more specifically, the <a href="http://www-03.ibm.com/systems/z/hardware/z9ec/index.html">System z9</a> line of mainframes.</p>
<p>The s390 is the grand-daddy of virtualization. Everything started there. In so many ways, everything we're doing with x86 virtualization is just playing catch-up. The new exciting features like hardware virtualization support and hardware paging support have been in s390 forever.</p>
<p>s390 clearly has a very mature hypervisor. What many people may not know though is that it's normal to run two hypervisors at any given time on s390. At the bottom level, there's PR/SM which divides the machine into rather coarse partitions. Within a PR/SM partition, you can run z/OS or Linux. You can also run z/VM within a PR/SM partition. z/VM is another hypervisor that allows for much more sophisticated features like memory overcommit and processor overcommit. The user has the ability to decide how much hypervisor they need to maximize the efficiency of their workloads.</p>
<p>Within a z/VM partition, you can run z/OS or Linux. The beauty of s390 is that this configuration has been supported in the hardware for many years and is very fast.</p>
<p>When Linux adopted native support for virtualization, it became obvious that this could be easily supported on the s390. The hardware has long supported this sort of nested virtualization and the implementation turned out to be very straightforward. It helps that the x86 virtualization extensions were inspired by a paper written about the S/370 almost 30 years ago :-)</p>
<p>What do you get from a platform that has supported virtualization for longer than I've been alive? In this very first release of KVM for s390, it already supports 64-way guests. After two years of development, we've just gotten to supporting 16-way guests on x86.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-47101587319839816992008-03-19T08:40:00.001-07:002008-03-19T08:54:47.930-07:00Exploiting live migration<p>Apparently at this year's BlackHat, someone presented a paper about attacking live migration traffic. The <a href="http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf">paper</a> describes a tool called <a href="http://www.thetechherald.com/article.php/200812/437/Xensploit-–-all-the-FUD-from-a-research-demo-you-will-ever-need">Xensploit</a> which uses a man-in-the-middle attack on live migration traffic to do all sorts of bad things. The core problem is that Xen live migration is not encrypted. Neither is VMotion traffic so the exploits are equally applicable.</p>
<p>While there's already been a lot of commentary suggesting that live migration shouldn't happen over insecure networks, that's not good enough for me. If you are sending the memory of a VM over the network unencrypted, you might as well not have any passwords on any of your machines since you are exposing all of the VM's sensitive data to anyone on the network.</p>
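<p>If it isn't obvious how little work the attacker has to do, here is a toy illustration (pure Python; the "guest page" and the credential planted in it are of course made up):</p>

```python
# A migrated guest page sent in the clear is just bytes on the wire.
# Plant a fake credential in a pretend page and "sniff" it back out.

guest_page = bytearray(4096)            # one page of guest memory
secret = b"root:hunter2"                # a credential sitting in guest RAM
guest_page[1024:1024 + len(secret)] = secret

# The eavesdropper's entire toolkit: a substring search on captured traffic.
offset = bytes(guest_page).find(b"root:")
print(offset)  # 1024 -- no decryption step required, because there was none
```
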
<p>For <a href="http://www-03.ibm.com/systems/management/director/extensions/vm.html">IBM Director Virtualization Manager</a>, we go to great lengths to ensure that Xen live migration traffic is always encrypted. As far as I know, no other Xen management tool is capable of encrypting live migration traffic. If you are using Virtualization Manager, you are protected from Xensploit-style attacks.</p>
<p>For KVM, we were careful not to make the same mistakes that had been made for Xen. KVM supports live migration over SSH by default and provides a mechanism for third parties to encrypt migration traffic in any way they please.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-52515811091007452722008-01-12T10:16:00.000-08:002008-01-12T10:32:29.183-08:00A preview of gtk-vnc v0.3.3<p>Since <a href="http://berrange.com/personal/diary/2007/12/new-gtk-vnc-release-032">Dan</a> beat me to blogging about the <a href="http://gtk-vnc.sf.net">gtk-vnc</a> 0.3.2 release, I decided to co-opt him for 0.3.3 and post a full two weeks before the release actually happens :-)</p>
<p>The 0.3.3 release will add support for the Tight encoding which is perhaps the most widely supported compressed encoding out there. This was really the last piece in making gtk-vnc a first class VNC client supporting all the protocol options that one would expect a good client to support. Much to my surprise, 0.3.3 will also contain a Firefox plugin that allows a VNC widget to be embedded within your web browser thanks to <a href="http://annexia.org/">Rich Jones</a>.</p>
<p>At first, a VNC web-browser plugin may sound like a silly idea. Of course, both RealVNC and TightVNC ship a Java applet VNC client. Clearly, there is demand for embedding a VNC session within a web browser. Besides the obvious concerns about performance, Java applets are severely limited in what they can do. You cannot grab the mouse and you cannot grab arbitrary key events. You really can't build a first class VNC client as a Java applet.</p>
<p>With a gtk-vnc based plugin, you can have a first class VNC client in your web browser. An exciting application of such a technology would be a rich web-based management application for virtualization. Things that were not possible in Java, like full-screening a VNC session, supporting copy/paste and drag-n-drop, are all within the realm of possibility using a gtk-vnc plugin.</p>
<p>There's still a fair bit of work to do to harden the plugin and gtk-vnc, such that it could be trusted to be invoked by any web page, but I'm looking forward to seeing what this leads to.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-85858604572188678662007-12-05T14:23:00.000-08:002007-12-05T14:38:07.544-08:00First release of extboot<p>Today I released the first set of patches for <a href="http://lists.gnu.org/archive/html/qemu-devel/2007-12/msg00128.html">extboot</a>. extboot is an option ROM that allows booting a guest from virtually any type of block device.</p>
<p>Historically, the PC BIOS has only been capable of booting from IDE devices. The PC BIOS doesn't need a special driver for every type of IDE controller simply because every IDE controller supports a compatibility mode that dates back to the earliest IBM PCs. The PC BIOS uses this compatibility mode to access the disk, thus avoiding having to support dozens of different IDE controllers. When SCSI was introduced, option ROM support was added to the PC BIOS to allow these devices to be used for boot. Every PCI device can provide a piece of ROM memory that the BIOS runs before booting. These ROMs can do horrible things like overwriting portions of the BIOS and tricking the BIOS and bootloaders into thinking they are booting from an IDE device when they are really booting from a SCSI device.</p>
<p>Most virtualization solutions offer support for IDE and SCSI devices and include SCSI option ROMs to enable booting from SCSI. Some products, like Xen, also provide paravirtual disk drivers. Up until now, these devices were not bootable. This required guests to have a bootable IDE partition and then another PV disk partition. It's a real pain from an administration perspective. Beyond performance, there are other reasons to prefer PV disk drivers over SCSI: PV disk drivers allow unlimited support for adding new features, whereas with SCSI you are limited to whatever the hardware interface supports.</p>
<p>extboot is an option ROM that can trick the PC BIOS into thinking that any block device is actually an IDE drive. It can be used not only for booting from SCSI devices but also from true PV disk drivers. This is something that, to the best of my knowledge, has never been possible in any x86 virtualization solution.</p>
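<p>The mechanism is easier to see with a toy model of the interrupt vector table. This is just a sketch in Python (real option ROMs do this in 16-bit assembly, and the handler names below are invented), but the hook-and-chain pattern is the same:</p>

```python
# Toy model of an option ROM hooking the BIOS disk service (INT 13h):
# save the old vector, install a new handler that claims the boot drive,
# and chain every other request to the original BIOS handler.

ivt = {}  # stand-in for the real-mode interrupt vector table

def bios_disk_handler(drive):
    # BIOS convention: drive numbers below 0x80 are floppies.
    return "floppy" if drive < 0x80 else "ide-disk"

ivt[0x13] = bios_disk_handler

def install_extboot_rom():
    old_handler = ivt[0x13]              # remember the previous vector
    def extboot_handler(drive):
        if drive == 0x80:                # claim the primary boot drive...
            return "paravirtual-disk"
        return old_handler(drive)        # ...and chain everything else
    ivt[0x13] = extboot_handler          # install the hook

install_extboot_rom()
print(ivt[0x13](0x80))  # paravirtual-disk: the BIOS thinks it booted from IDE
print(ivt[0x13](0x81))  # ide-disk: untouched drives still reach the BIOS
```
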
<p>extboot support should be available in QEMU, KVM, and Xen in the near future so keep an eye out for it :-)</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-87694881573015081712007-11-05T08:39:00.001-08:002007-11-06T07:48:37.243-08:00CIM support for KVM and XenAs announced on the <a href="http://www.libvirt.org">libvirt</a> list today:
<blockquote>This is the announcement of a new open-source project called libvirt-cim based on libvirt and aiming at offering the complete functionality of libvirt via a CIM provider implementing the DMTF SVPC virtualization model and released under an LGPL licence.
A CIM [1] provider is an implementation of a set of standardized interfaces (Common Information Model) whose goal are to provide well defined entry points allowing easier and interoperable management tools to be built. In the case of libvirt-cim, the goal is of course to export the SVPC virtualization model, which then can be used to manage storage, hosts and domains remotely.</blockquote>
Since this new CIM provider is based on libvirt, it supports QEMU, KVM, Xen, and potentially much more. The provider is already quite functional and was developed at IBM by Dan Smith, Jay Gagnon, and Heidi Eckhart. See the <a href="https://www.redhat.com/archives/libvir-list/2007-November/msg00023.html">announcement</a> for more information. <b>Update:</b> For clarification, the CIM provider only supports Xen today but it should be very easy to add support for the other VMMs supported by libvirt.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-87739387245729094212007-10-29T15:55:00.000-07:002007-10-29T16:17:55.637-07:00TPR patching<p>I'm heading off to Japan tomorrow morning for the <a href="http://www.linux-foundation.jp/modules/tinyd5/">Linux Foundation Japan Symposium</a> but instead of packing like I should, I figured I'd post about an exciting new feature in KVM.</p>
<p>First, a little background. Even when doing hardware accelerated virtualization (using VT or SVM), there is a lot of emulation that is required for IO devices. While there are probably at least 15-20 different devices that must be emulated for a virtual machine, there are only a few that are performance sensitive. The two most notable are the network card and disk controller. Since all Operating Systems support a wide variety of these devices, we can create a fake network card driver that we can emulate in a high performance way and everything works out nicely (these are commonly called paravirtual device drivers).</p>
<p>There are some devices in the modern PC that you cannot write drivers for because there simply aren't that many of them. For instance, there are really only a couple kinds of interrupt controllers so most Operating Systems don't provide a mechanism for loading interrupt controller device drivers. Instead, these devices are baked in deeply within the Operating System's core.</p>
<p>For the most part, none of these devices affect performance significantly. The notable exception is the local APIC. The local APIC is a per-processor interrupt controller whose interface is memory-mapped. This means that an OS communicates with the local APIC by writing to a special memory location. In particular, the local APIC has a feature called the TPR (task priority register). Certain OSes (namely Windows) access the TPR extremely frequently. If you've used Windows under KVM, you may be familiar with the <a href="http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround">ACPI work-around</a> which effectively tricks Windows into thinking there isn't a local APIC. The result is a significant increase in performance since we no longer have to emulate thousands of TPR accesses per second. Unfortunately, ACPI is a useful thing. You can't have SMP without it. Disabling it is not really a great solution to the problem.</p>
<p>At this past <a href="http://kvm.qumranet.com/kvmwiki/KvmForum2007">KVM Forum</a>, Ben Serebrin, from AMD, shared an interesting observation: Windows guests only access the TPR with instructions that are at least 5 bytes long. The significance of 5 bytes is that it happens to be the size of a near call instruction on the x86. This means that you can replace any of the TPR access instructions with a call without the need to do fancy dynamic translation. If you're very clever about hiding routines within the BIOS (it turns out, Windows always has a valid virtual mapping to the BIOS), you can actually rewrite TPR access instructions to instead be calls to functions that you provide, which access the TPR in a more efficient way.</p>
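<p>The patching itself is almost embarrassingly simple. Here's a sketch of the idea in Python (the offsets, instruction length, and helper location are invented; the real thing operates on guest memory, not a local buffer):</p>

```python
# Any TPR-accessing instruction of 5+ bytes can be overwritten in place
# with a 5-byte x86 near call (opcode 0xE8 + signed 32-bit displacement)
# into a helper routine, padding leftover bytes with NOPs so following
# instructions keep their offsets.

import struct

CALL, NOP = 0xE8, 0x90

def patch_tpr_access(code, insn_offset, insn_len, helper_offset):
    assert insn_len >= 5, "only instructions of 5+ bytes can be patched"
    # The displacement is relative to the end of the 5-byte call itself.
    rel32 = helper_offset - (insn_offset + 5)
    code[insn_offset] = CALL
    code[insn_offset + 1:insn_offset + 5] = struct.pack("<i", rel32)
    for i in range(insn_offset + 5, insn_offset + insn_len):
        code[i] = NOP

code = bytearray(64)
# Pretend a 7-byte TPR write sits at offset 16 and the helper is at 48.
patch_tpr_access(code, insn_offset=16, insn_len=7, helper_offset=48)

print(code[16] == CALL)                            # True
print(struct.unpack("<i", bytes(code[17:21]))[0])  # 27 = 48 - 21
print(code[21:23] == b"\x90\x90")                  # True: NOP padding
```
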
<p>Avi Kivity posted an implementation of this to KVM recently. The results are quite dramatic. Windows XP installs are at least twice as fast--perhaps even faster. The very latest Intel processors have a hardware feature that ends up with the same result but the nice thing about a purely software approach is that it will work with older processors.</p>
<p>This code hasn't made its way into a KVM release yet as it needs a bit more testing and clean-up. I suspect we won't see it in a release for a couple more weeks but once it's there, you can re-enable ACPI in your Windows guests and enjoy good performance :-)</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-38499051372485457172007-10-08T13:42:00.001-07:002007-10-08T14:15:19.712-07:00The Myth of Type I and Type II Hypervisors<p>This is something that has bothered me for a while but that I have never gotten a chance to articulate. In the virtualization community, the terms "type-1" and "type-2" hypervisors get thrown around a lot--often carrying different meanings. Lately, "type-2" is being used as a derogatory term suggesting that a virtualization solution is "lesser" than a true "type-1" hypervisor.</p>
<p>The most common definition of "type-1" and "type-2" seems to be that "type-1" hypervisors do not require a host Operating System. In actuality, all hypervisors require an Operating System of some sort. Usually, "type-1" is used for hypervisors that have a micro-kernel based Operating System (like Xen and VMware ESX). In this case, a macro-kernel Operating System is still required for the control partition (Linux for both Xen and ESX).</p>
<p>The whole argument of micro-kernel vs macro-kernel hosts is a different blog post (just as a spoiler, I think one can make a better argument for macro-kernel hypervisors). I want to focus, instead, on why we have these terms and what they really mean.</p>
<p>Virtualization theory really started with a paper from Gerald Popek and Robert Goldberg called <i>Formal Requirements for Virtualizable Third Generation Architectures</i>. The paper is a mathematical proof of the architectural requirements to allow virtualization. It is very terse and I don't expect most people have read it. The paper focuses on implementing full virtualization on native hardware and focuses on things like whether privileged instructions are trappable. It was written in 1974 and Operating Systems were not actually all that common back then. Many people think the terms "type-1" and "type-2" originated from this paper but that is simply not the case. The paper does mention the concept of recursive virtualization and briefly discusses the requirements to allow one virtual machine to run within another virtual machine.</p>
<p>As best as I can tell, the terms "type-1" and "type-2" originate from a paper by John Robin called <i>Analyzing the Intel Pentium's Capability to Support a Secure Virtual Machine Monitor</i>. This paper was Robin's master's thesis at the Naval Postgraduate School. There are two versions of the paper available: the actual <a href="http://handle.dtic.mil/100.2/ADA370812">master's thesis</a> and a <a href="http://portal.acm.org/citation.cfm?id=1251316&dl=ACM&coll=portal&CFID=15151515&CFTOKEN=6184618">condensed version</a> for USENIX 2000.</p>
<p>This paper is really an application of the Popek/Goldberg proof to the Pentium architecture. A few points were missed, but it does a rather good analysis of why the Pentium architecture did not satisfy the Popek/Goldberg requirements for virtualization. Now, some folks at VMware have made a rather compelling case that this is in fact incorrect because the Popek/Goldberg proof does not eliminate the possibility of using dynamic translation. At any rate, Robin makes a distinction between "type-1" and "type-2" VMMs. The reason for the distinction is simple. When discussing "type-1" VMMs that access hardware directly, the set of requirements to enable Secure Virtualization entirely depends on the hardware. When discussing "type-2" VMMs, however, you do not have direct access to hardware so the requirements to enable virtualization are actually at the Operating System interface. A true "type-2" VMM is just a process in an Operating System and is not capable of accessing hardware directly.</p>
<p>The important point to take away here is that all modern virtualization solutions (except for unaccelerated QEMU maybe) are technically "type-1" VMMs according to Robin. The things commonly cited as "type-2" VMMs like VMware Workstation, Parallels, VirtualPC, and KVM all rely on kernel modules which means they do have direct access to hardware. This makes all of these solutions "type-1" VMMs. What's more important though is that the distinction of "type-1" and "type-2" has absolutely no bearing on performance, robustness, or any other qualitative factor. It is merely a distinction made when attempting to formulate a proof about whether virtualization is possible or not. It starts to lose meaning too when an Operating System is capable of supporting a true "type-2" VMM (which, arguably, the KVM interface in Linux enables). Does that mean that Linux is a "type-1" VMM and QEMU using the KVM interface is a "type-2" VMM? How can the same solution be both though? IMHO, the introduction of the term "type-2" was really a mistake on Robin's part, perhaps as a misunderstanding of the section of the Popek paper regarding recursive virtualization. That's just speculating though. The distinction really just doesn't make much sense in my mind.</p>
<p>So if you've made it this far, I hope you agree that these terms really have no practical meaning and will join me in refraining from using them in the future :-)</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-71132451656134439732007-02-28T23:20:00.000-08:002007-02-28T23:32:29.578-08:00Coherence for QEMU<p>As I have <a href="http://blog.codemonkey.ws/2006/12/coherence-for-rest-of-us.html">previously</a> discussed, I have been fascinated by the idea of <a href="http://www.parallels.com/products/coherence">Coherence</a> that Parallels is now officially supporting. After some digging, I think I have a pretty good idea of how it works.</p>
<p>Very similar technologies exist. <a href="http://www.cendio.com/seamlessrdp/">SeamlessRDP</a> is a special program you can run in a Terminal Services session to expose only a single application over RDP. It works by replacing the normal Shell program (explorer.exe) with a process that uses <a href="http://msdn2.microsoft.com/en-us/library/ms644990.aspx">SetWindowsHookEx</a> to keep track of window creation, destruction, resizing, and movement events. For SeamlessRDP, this information is sent over a special RDP channel.</p>
<p>The RDP session is always full screen and this window position information is used to only show the portion of the RDP session that the window occupies. Since the RDP session is full screen, and the window positions are mapped at the same location in the host as in the RDP session, things like z-order and window dragging Just Work.</p>
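<p>A toy version of the clipping logic makes it clear why no coordinate translation is needed (a Python sketch; the window rectangles are invented, and real implementations use region operations rather than per-pixel sets):</p>

```python
# The guest session is full screen and window rectangles arrive from the
# shell hook in guest coordinates, which are identical to host coordinates.
# The host therefore just shows the union of the reported rectangles.

def visible_pixels(windows, screen_w, screen_h):
    """Return the set of (x, y) session pixels the host should display."""
    shown = set()
    for x, y, w, h in windows:
        for px in range(max(0, x), min(screen_w, x + w)):
            for py in range(max(0, y), min(screen_h, y + h)):
                shown.add((px, py))
    return shown

# Two overlapping guest windows reported by the shell-replacement hook.
windows = [(0, 0, 4, 4), (2, 2, 4, 4)]
shown = visible_pixels(windows, screen_w=8, screen_h=8)

print(len(shown))       # 28: 16 + 16 minus the 4-pixel overlap
print((7, 7) in shown)  # False: desktop pixels outside any window stay hidden
```
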
<p>To get the taskbar, you just have to launch explorer and track its children, being careful not to display the desktop window. <a href="http://test.codemonkey.ws/qemu-coherence.png">Here</a> is a screenshot demonstrating this with QEMU, KQEMU, SeamlessRDP, and a slightly modified rdesktop.</p>
<p>I'd like to integrate this all a little more into QEMU. The first thing I'd like to do is eliminate the need for RDP. We can use a paravirtual channel to communicate the windowing information and then just use VNC extensions to communicate that data to the client. A tricky problem is that the session has to be full screen for this to work and QEMU does not provide VGA emulation that supports some weird resolutions (1400x1050--which my laptop uses!). I think this can be solved with software scaling though.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-64226488154431795772007-02-06T07:30:00.000-08:002007-02-06T07:34:37.732-08:00KQEMU is now free software!<p>As part of the 0.9.0 release, Fabrice Bellard released <a href="http://www.qemu.org/kqemu-changelog.html">KQEMU</a> under the GPL. KQEMU is an accelerator for QEMU that works on older hardware (without hardware virtualization). As part of this release, Fabrice also published detailed <a href="http://www.qemu.org/kqemu-tech.html">technical notes</a>.</p>
<p>You can get the GPLv2 version of KQEMU <a href="http://www.qemu.org/download.html">here</a>. I want to thank Fabrice for doing this. There are a lot of people in the QEMU community who are very happy about this.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-55248021464721064062007-02-06T07:28:00.000-08:002007-02-06T07:30:39.329-08:00QEMU 0.9.0 is now available<p>This release has been in the works for quite a while. A whole bunch of changes went in. The official changelog is:</p>
<pre><code>
version 0.9.0:
- Support for relative paths in backing files for disk images
- Async file I/O API
- New qcow2 disk image format
- Support of multiple VM snapshots
- Linux: specific host CDROM and floppy support
- SMM support
- Moved PCI init, MP table init and ACPI table init to Bochs BIOS
- Support for MIPS32 Release 2 instruction set (Thiemo Seufer)
- MIPS Malta system emulation (Aurelien Jarno, Stefan Weil)
- Darwin userspace emulation (Pierre d'Herbemont)
- m68k user support (Paul Brook)
- several x86 and x86_64 emulation fixes
- Mouse relative offset VNC extension (Anthony Liguori)
- PXE boot support (Anthony Liguori)
- '-daemonize' option (Anthony Liguori)
</code></pre>
<p>But this is just scratching the surface. You can obtain it from the usual <a href="http://www.qemu.org/download.html">place</a>.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-19269271902589733172007-02-06T07:15:00.000-08:002007-02-06T07:25:41.265-08:00KVM, Xen, and the Linux kernel<p>I stumbled upon an <a href=" http://www.devxnews.com/article.php/3658001">article</a> from DevX where Ian Pratt is quoted on a number of topics including KVM and upstream merge. I thought what he said about KVM was a little odd, but what disturbed me was that I think the interviewer misinterpreted what Ian said re: upstream merge. Ian said:</p>
<blockquote>Putting Xen into Linux doesn't make sense: hypervisors are different beasts from operating systems, so they share little code.</blockquote>
<p>He's referring to putting the actual hypervisor into the kernel. Unfortunately, the interviewer took this to mean:</p>
<blockquote>Pratt also explained that Xen is no longer actively seeking inclusion in the mainline Linux kernel either.</blockquote>
<p>Which is totally missing the point. We've never wanted the hypervisor to be included in mainline Linux. It's not part of Linux, so I don't even see how we would do it without major rewrites. What we've been trying to get into the kernel is the <i>Linux changes</i> for guests that run on top of the hypervisor.</p>
<p>We are still very interested in getting the <i>Linux changes</i> upstream. In fact, this is a major priority.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-72718831426688303592007-01-17T11:54:00.000-08:002007-01-17T12:03:43.818-08:00Migration in QEMU<p>For a long time, I've thought about QEMU migration. For KVM, Qumranet added a static form of migration to QEMU. I've been working with Xen migration for a while now (mostly in the scope of IBM products) and I certainly have learned a lot from it. Honestly, I don't really like how KVM is doing migration so after spending a weekend heads down on V2E, I decided to take some time and implement migration for QEMU.</p>
<p>The biggest problem I have with KVM and Xen's migration is that they use open TCP ports. This is just such a bad idea: it's a security nightmare to transfer the contents of memory over an unencrypted channel. For QEMU, I decided to let the user spawn an external program to set up the channel to send the migration traffic over. This lets a user just use SSH or RSH if they want something that works. It also lets management tools implement their own mechanism, whether that uses OpenSSL, CIM, or any other mechanism out there. It even provides a way to implement some interesting things like lightweight checkpointing (although that's another topic).</p>
<p>This does make things a bit more complicated though. Instead of just saying 'migrate hostname' you now have to construct a rather long command like 'migrate "ssh hostname qemu -loadvm -"'. A nice side effect though is that you can completely change the command line arguments in case you have NFS mounts at different locations.</p>
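<p>As a rough sketch of the idea (this is illustrative Python, not the actual QEMU patch; the command string is whatever transport the user chooses):</p>

```python
import subprocess

def migrate(command, state_stream):
    """Spawn a user-supplied transport command and stream the saved VM
    state into its stdin.  The transport -- SSH, RSH, or a management
    tool's own channel -- is entirely up to the caller."""
    proc = subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)
    proc.stdin.write(state_stream)
    proc.stdin.close()
    return proc.wait()

# Stand-in for 'ssh hostname qemu -loadvm -': just write the state to a file.
rc = migrate("cat > /tmp/vm-state.bin", b"\x00" * 16)
```

<p>In the stand-in above, 'cat' plays the role of 'ssh hostname qemu -loadvm -'; any program that reads the state from stdin works, which is the whole point.</p>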
<p>At the moment, I have a static migration patch. I'd like to implement live migration real soon, and I think it will be pretty easy. It's just a matter of adding a new set of callbacks so that devices that may take a long time to save/restore can instead provide a "live" save/restore callback. We'll run through the live callbacks first and, when they've signaled that they're done, go ahead and activate the non-live callbacks. This will probably only touch the RAM save/restore code at first, which seems more than okay to me.</p>
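<p>The two-phase scheme could look roughly like this (a toy model, not QEMU's savevm code; the RAM device just batches "dirty pages" to show the convergence loop):</p>

```python
class RAMDevice:
    """Toy stand-in for the RAM save/restore code: transfers 'dirty
    pages' in small batches while the guest keeps running."""
    def __init__(self, pages):
        self.dirty = list(pages)

    def save_live(self, out):
        out.extend(self.dirty[:2])      # send one small batch per pass
        del self.dirty[:2]
        return not self.dirty           # True once nothing is left

    def save(self, out):
        out.append("ram-final")         # quick, final (non-live) state

def save_vm(devices):
    out = []
    done = False
    while not done:                     # phase 1: live callbacks until converged
        done = True
        for dev in devices:
            if not dev.save_live(out):
                done = False
    for dev in devices:                 # phase 2: ordinary save callbacks
        dev.save(out)
    return out

state = save_vm([RAMDevice(["p1", "p2", "p3"])])
```

<p>Devices without a live callback would simply be saved in phase 2, so the change stays contained to whichever devices opt in.</p>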
<p>If you're interested in taking a peek, you can just take a look in my QEMU <a href="http://hg.codemonkey.ws/qemu-pq">patch queue</a>.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-44583083020502886262006-12-13T19:45:00.000-08:002006-12-13T20:24:02.295-08:00Binary kernel modules are dead in 2008--now what<p>It was decided today on <a href="http://www.kroah.com/log/2006/12/13/#2006_12_13">LKML</a> that starting in January 2008, binary modules will no longer be loadable in the Linux kernel. This has some rather major consequences for a number of virtualization technologies.</p>
<p>Parallels, Win4Lin, kqemu, and VMware rely on binary modules for their Linux products. I suspect all of these products will have a hard time moving their code out of kernel space, seeing that it's performance sensitive. So what are they all going to do? I see three possible options: 1) drop Linux support (Win4Lin and kqemu disappear completely); 2) build a minimal kernel interface to privileged state and try to develop fast userspace interfaces (though I can't see how one could do a fast userspace shadow paging implementation); or 3) open source the kernel bits.</p>
<p>Everyone's focused on management now, right? Can you imagine if the VMware binary translator were GPL'd? Kudos to the kernel developers for finally doing the right thing here.</p>
<p><b>Update:</b> Linus is <a href="http://article.gmane.org/gmane.linux.kernel/475824">insisting</a> that the distros merge this patch first before he'll take it.</p>Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-60008668326949112402006-12-06T20:20:00.000-08:002006-12-06T20:30:27.750-08:00Coherence for the rest of us?There's quite a buzz about <a href="http://www.parallels.com">Parallels</a>' new <a href="http://www.flickr.com/photos/jflint/313613949/">coherence</a> technology. In principle, there's nothing that exciting going on here from a technology perspective, but what we have is the result of competition. Basically, Parallels is serving a very useful market that VMware has forgotten -- the minority home user.
Coherence allows you to run a virtual machine and have individual applications display their windows in the host OS. Essentially, a small program running in the guest OS exposes an app's windows to the host. This is similar to what <a href="http://metavnc.sf.net">Meta-VNC</a> already does with VNC. The nice thing is that they've packaged it all up in an easy-to-use form.
So, running Windows apps on Mac OS X is nice and all but what about us Linux users? Well, I've done a little bit of research here and it looks like there are a number of tricks you can do in Windows to get an image of an app. The best way seems to be WM_PRINT/WM_PRINTCLIENT except that it requires support from the application. I don't know how many apps support this since WM_PRINTCLIENT is not handled by the default message handler.
I'm thinking of starting though with the opposite case. Let's expose a single application from a Linux guest via VNC. The obvious way to do this is with a special X server. You simply launch your VNC X server on a new display, and then launch your app with the proper DISPLAY environment variable set. We'll need a custom X server that knows how to render the individual windows, of course. Plus, popups are going to require some VNC extensions. It should end up being pretty neat though. Definitely a fun little project.
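The launching half of that is the easy part; a sketch (the display number is a placeholder, the VNC X server is assumed to already be running on it, and a shell one-liner stands in for the real app):

```python
import os
import subprocess

def launch_on_display(display, argv):
    """Start an application with DISPLAY pointed at a dedicated VNC X
    server, so only that app's windows go over the wire."""
    env = dict(os.environ, DISPLAY=display)
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# Stand-in app that just reports which display it was given.
result = launch_on_display(":5", ["sh", "-c", "echo $DISPLAY"])
```

All the real work, rendering individual windows and reporting their geometry, lives in the custom X server itself, which this sketch deliberately leaves out.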
Is it useful to run Linux apps under Windows? I don't know. It's a fun project though :-)Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1162527847565557682006-11-02T20:22:00.000-08:002006-11-02T20:24:07.576-08:00Novell Sells Out (In the name of virtualization)?<a href="http://www.groklaw.net/article.php?story=20061102175508403">Groklaw</a> says it best. Maybe the GPLv3 isn't so bad after all...Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1160275728584302322006-10-07T19:35:00.000-07:002006-10-07T19:48:48.613-07:00Common Neutral Hypervisor? Trademarks and the GPLI love Free Software. Like many free software developers, I'm rather concerned about the recent debates regarding the role of trademarks and free software. Just five years ago, the idea that trademarks would be a problem with free software was almost laughable. With high profile projects like Firefox developing <a href="http://en.wikipedia.org/wiki/Iceweasel">questionable</a> trademarking policies, the question of how trademarks affect free software is becoming very important.
As I write this, there's a heated <a href="http://lists.xensource.com/archives/html/xen-devel/2006-10/msg00289.html">debate</a> within the Xen community over XenSource's new <a href="http://www.xensource.com/xen-tm-faq.html">trademark terms</a>, specifically regarding the Xen trademark. Suffice it to say, the terms concern Red Hat enough that they've announced they're considering renaming Xen to CNH, or Common Neutral Hypervisor. They appear to be concerned that they cannot live up to the trademark terms.
Personally, I'm not in a position to comment about the new trademark policy. I try to keep my nose clean of this sort of thing. However, it's times like this that I realize how important the GPL is in defining what free software is and how lost we would be without it (or at least, how much arguing there would be). I wonder if there's enough room in the GPLv3 process to introduce trademark terms...Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1159927763071735512006-10-03T19:01:00.000-07:002006-10-03T19:09:23.080-07:00Optimizing VNC for localhostI've got some free time now and have been thinking recently about revisiting QEMU GUI support. Previously, I had a set of patches that implemented a shared memory transport for QEMU's graphic interface. The first change I wanted to make to my old patches, was to use a TCP transport instead of QEMU's char device interface. I quickly realized though that there would be a lot of shared code between this new transport and the VNC transport.
At this point, I started thinking about what it would take to add a shared memory transport to VNC. Conceptually, all this would require is a new encoding type that can send back a shared memory ID. The client would have to send a little more than just a SetPixelFormat, though, since the bytes-per-line is also needed. What this would allow is for the server to allocate a shared memory segment, hand that info over to the client, and let the client hand it over to the X server. This would give fantastic performance in the localhost case.
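To make the idea concrete, here's a sketch of the server side of such a pseudo-encoding. This is pure invention for illustration: the encoding number, the message layout, and the use of Python's multiprocessing shared memory in place of the SysV segment an X client would actually need are all assumptions.

```python
import struct
from multiprocessing import shared_memory

SHM_PSEUDO_ENCODING = -512  # placeholder; a real value would need to be registered

def create_shared_framebuffer(width, height, bytes_per_pixel=4):
    """Server side: allocate a shared-memory framebuffer and build a toy
    message handing its identifier and stride (bytes-per-line) to the client."""
    stride = width * bytes_per_pixel
    shm = shared_memory.SharedMemory(create=True, size=stride * height)
    name = shm.name.encode()
    # Toy message: encoding id, stride, name length, then the segment name.
    msg = struct.pack(">iiH", SHM_PSEUDO_ENCODING, stride, len(name)) + name
    return shm, msg

shm, msg = create_shared_framebuffer(640, 480)
```

From there, the client would map the segment and pass it to the X server, and ordinary framebuffer updates become plain memory writes on localhost.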
Reusing the VNC protocol means a much simpler client. I sent off a note to the VNC folks asking to reserve a pseudo-encoding range. Once I get a response, I should be able to hack something up fairly soon.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1158272906180068772006-09-14T15:26:00.000-07:002006-09-14T15:28:26.190-07:00Finally back to normalWhile it took a bit longer than it should have, I've finally gotten my IBM accounts straightened out again. I've decided to ditch one of them but chances are if you've ever sent me mail at my IBM address, you were using that one. If you gotten a vacation message from my IBM email address or a bounce, you shouldn't have a problem anymore.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1157983745752349972006-09-11T07:06:00.000-07:002006-09-11T07:09:05.763-07:00Back in AustinI'm now back in Austin after quite a lot of travel. I'm officially back at IBM now and should resume blogger as before about my work on Open Source virtualization. My email should now be back to normal. I will be going through the unanswered mail in my INBOX over the next few days. I've had a lot of email trouble lately (which has all now been resolved) so if you don't get a response to something you think I should have responded to, I may have not seen it (or I'm just being lazy :-)).Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1156249537189449022006-08-22T05:22:00.000-07:002006-08-22T05:25:48.740-07:00More email nonsenseSomeone gave me a heads up that aliguori at us ibm com is bouncing right now. I took some time off before switching to being a full timer at IBM (to spend some time w/family and do a little traveling). I go back to IBM on September 6th. 
I was expecting that my ibm accounts would stick around until then but it doesn't seem that way...
Either way, my codemonkey.ws or utexas accounts can be used instead.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.comtag:blogger.com,1999:blog-8388514.post-1156205914969311612006-08-21T17:14:00.000-07:002006-08-21T17:18:34.980-07:00Email GlitchMy cable modem gave out late last night and since I'm out of town I won't be able to restart the stupid thing until September. Fortunately, I was able to bring up a quick relay so that @codemonkey.ws email is now going somewhere. I don't think it was down long enough for retries to timeout but if you sent me an email between midnight-8pm CST today, you may have to resend. If mail servers are caching the MX record aggressively, it'll definitely bounce.
On a positive note, I discovered that I can use my utexas account as an authenticated SMTP relay for @codemonkey.ws. Lately, I've been getting a lot of outgoing mail rejected b/c my cable modem's network is on a few blacklists since they're dynamic IPs. This should provide a solution until I get back to Austin and colocate a real server somewhere.Anthony Liguorihttp://www.blogger.com/profile/14951928049865210496noreply@blogger.com