Virt-v2v converts guests from VMware to KVM, installing any necessary virtio drivers to make the guest work optimally on KVM.

A new feature in virt-v2v 1.37.10 is the ability to import VMware guests directly from disk storage. You do this by pointing virt-v2v directly to the guest’s .vmx metadata file (guest disks are found from references within the VMX file):

$ virt-v2v -i vmx /folder/Fedora_20/Fedora_20.vmx -o local -os /var/tmp
[   0.0] Opening the source -i vmx /folder/Fedora_20/Fedora_20.vmx
[   0.0] Creating an overlay to protect the source from being modified
[   0.1] Initializing the target -o local -os /var/tmp
[   0.1] Opening the overlay
[   6.5] Inspecting the overlay
[  14.0] Checking for sufficient free disk space in the guest
[  14.0] Estimating space required on target for each disk
[  70.8] Creating output metadata
[  70.8] Finishing off

The problem was how to deal with the case where VMware is storing guests on a NAS (Network Attached Storage). Previously we had to go through VMware to read the guests, either over https or using the proprietary ovftool, but both methods are really slow. If we can directly mount the NAS on the conversion server and read the storage, VMware is no longer involved and things go (much) faster. The result is we’re liberating people from proprietary software much more efficiently.

You’re supposed to use VMware’s proprietary libraries to read and write the VMX file format, which is of course out of the question for virt-v2v, so this was an interesting voyage into VMware’s undocumented and unspecified VMX file format. Superficially it seems like a simple list of key/value pairs in a text file:

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "10"
nvram = "Fedora_20.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
scsi0.virtualDev = "pvscsi"
scsi0.present = "TRUE"
sata0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "Fedora_20.vmdk"
sched.cpu.min = "0"
sched.cpu.shares = "normal"
sched.mem.min = "0"
sched.mem.minSize = "0"
sched.mem.shares = "normal"

But there are some things to catch you out:

  • All values are quoted, even booleans, integers and lists. This means you cannot distinguish in the parser between strings, booleans and other types. The code which interprets the value must know the type and do the conversion and deal with failures.
  • It’s case insensitive, except when it’s case sensitive. In the example above, all of the keys, and the values "TRUE", "normal" and "scsi-hardDisk" are case insensitive. But the filenames ("Fedora_20.vmdk") are case sensitive. The virt-v2v parser tries to be as case insensitive as possible.
  • Keys are arranged into a tree. Adding key.present = "FALSE" causes VMware to ignore all keys at that level and below in the tree.
  • Quoting of values is plain strange. I’ve never seen the | (pipe) character used as a quoting symbol before.

Luckily libvirt has a large selection of VMX files from the wild we could use to test against.