
Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

How to move a bhyve VM or a Jail container from one host to another?

Posted on 2025-02-16 10:21:00 from Vincent in FreeBSD

After having played with bhyve VMs and Jail containers in my previous posts, I'm investigating here how to migrate them from one host to another.

Unfortunately, FreeBSD does not have "live migration" yet. But here you will see that the downtime can be around 1 second. For many applications such downtime is not noticeable.


Introduction

Being able to move a VM from one host to another is a feature that many sysadmins are looking for. It can be to better spread the workload, or to facilitate an upgrade, a maintenance activity, ...

And when this can be done without impacting the end users, this is much better for everyone.

I will explain how I did it for a bhyve VM and for a Jail container. Indeed, as already explained, with a Jail container we have access to all its files directly from the host. As you can guess, we could use tools like rsync to facilitate the task.

Bhyve VM migration

Thanks to a tool called vm-bhyve, we can easily perform such a migration for bhyve VMs.

Preparations

It's advised that both "source" and "destination" hosts have the same OS and the same version.
In my case both are running FreeBSD. The source machine ghostbsd is a laptop running GhostBSD 24.10, so FreeBSD 14.1. The destination machine hostdst is running FreeBSD 14.2.

On both source and destination, I've installed vm-bhyve.
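
For completeness, here is roughly how vm-bhyve can be set up on a host. This is only a hedged sketch: the zroot/vm dataset matches my setup, but the em0 interface is an assumption, so adapt both to your hosts.

pkg install vm-bhyve
sysrc vm_enable="YES"
sysrc vm_dir="zfs:zroot/vm"    # keep the VMs on a ZFS dataset
vm init
vm switch create public
vm switch add public em0       # attach the virtual switch to your physical NIC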

On the source host, I've created a VM running Alpine Linux with the sshd daemon permanently up and running.

I will not go back over the details of creating a VM with vm-bhyve, but here is its config:

loader="uefi"
cpu="2"
memory="4G"
network0_type="virtio-net"
network0_switch="public"
disk0_type="virtio-blk"
disk0_name="disk0.img"
grub_install0="linux /boot/vmlinuz-lts initrd=/boot/initramfs-lts alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage,sr-mod"
grub_install1="initrd /boot/initramfs-lts"
grub_run0="linux /boot/vmlinuz-lts root=/dev/vda3 modules=ext4"
grub_run1="initrd /boot/initramfs-lts"
uuid="80c49b96-e648-11ef-ae5d-503eaaxx"
network0_mac="58:9c:fc:xx:xx:xx"

Before we migrate it, we will start it:

root@ghostbsd:~ # vm start alpinevm
Starting alpinevm
  * found guest in /vm/alpinevm
  * booting...
root@srchost:~ # vm list
NAME      DATASTORE  LOADER     CPU  MEMORY  VNC  AUTO  STATE
alpinevm  default    uefi       2    4G      -    No    Running (27011)

To measure the downtime, I've built a small loop doing pings:

#!/bin/sh
# Ping the VM in a loop and log whether it answered.
trap "echo 'Exiting...'; exit 0" SIGINT

while true; do
    # -c 1: send a single probe, -W 500: wait at most 500 ms for the reply
    if ping -c 1 -W 500 alpinevm > /dev/null; then
        echo "$(date) ping ok"
        sleep 1
    else
        echo "$(date) ping NOT ok"
    fi
done

The information from this script will allow us to see more precisely how long the VM is down.

Migrate

Thanks to vm-bhyve, the migration of a VM is really simple:

vm migrate alpinevm hostdst

It's hard to get simpler than that ;).
In short, you ask "vm" to migrate the VM called "alpinevm" to the new host called hostdst.

Here are the results of the command:

root@ghostbsd:~ # vm migrate alpinevm hostdst
Attempting to send alpinevm to hostdst
  * remote dataset rpool/vm/alpinevm
  * source guest is powered on (#27011)
  * stage 1: taking snapshot 20250215095904-s1
  * stage 1: sending zroot/vm/alpinevm@20250215095904-s1
  * stage 1: snapshot sent
  * stage 2: attempting to stop guest.
  * stage 2: guest powered off
  * stage 2: taking snapshot 20250215095947-s2
  * stage 2: sending zroot/vm/alpinevm@20250215095947-s2 (incremental source 20250215095904-s1)
  * stage 2: snapshot sent
  * attempting to start alpinevm on hostdst
Starting alpinevm
  * found guest in /vm/alpinevm
  * booting...
  * removing snapshots
  * done

So, in short, the migration consists of taking a snapshot called "-s1" and sending it to hostdst.
Since this is the first snapshot sent to hostdst, it takes quite a long time. In my case this was a 10GB disk on which Alpine Linux uses about 255MB of space, so there should be a lot of "zero" bytes.
Then the VM is stopped and an incremental snapshot is taken and sent, which is normally very quick. Finally the VM is restarted on the new host.

By doing the sync in 2 stages, we reduce the downtime of the VM to the strict minimum.
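
Under the hood, the output above boils down to a handful of ZFS and vm commands. The following is only a simplified sketch of the idea (the snapshot names are illustrative, and this is not the exact vm-bhyve implementation):

# stage 1: snapshot and full send while the guest keeps running
zfs snapshot zroot/vm/alpinevm@s1
zfs send zroot/vm/alpinevm@s1 | ssh hostdst zfs recv rpool/vm/alpinevm
# stage 2: stop the guest, send only what changed since s1, restart it remotely
vm stop alpinevm               # and wait until the guest is powered off
zfs snapshot zroot/vm/alpinevm@s2
zfs send -i @s1 zroot/vm/alpinevm@s2 | ssh hostdst zfs recv rpool/vm/alpinevm
ssh hostdst vm start alpinevm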

To better see the amount of downtime, here are the results of my check script:

Sat Feb 15 09:59:42 CET 2025 ping ok
Sat Feb 15 09:59:43 CET 2025 ping ok
Sat Feb 15 09:59:46 CET 2025 ping NOT ok
Sat Feb 15 09:59:47 CET 2025 ping NOT ok
Sat Feb 15 09:59:49 CET 2025 ping NOT ok
Sat Feb 15 09:59:50 CET 2025 ping NOT ok
Sat Feb 15 09:59:52 CET 2025 ping NOT ok
Sat Feb 15 09:59:53 CET 2025 ping NOT ok
Sat Feb 15 09:59:55 CET 2025 ping NOT ok
Sat Feb 15 09:59:55 CET 2025 ping ok
Sat Feb 15 09:59:56 CET 2025 ping ok

So, to be clear, for the 2 first pings the VM runs on the ghostbsd host, and for the last 2 pings it runs on hostdst. Amazing ;).

So, 6 seconds of downtime, this is not bad ;).
I do not have references for other hypervisors, but this sounds very good to me.

To be able to run a 2nd migration, I first delete everything this first migration has created on hostdst.
This can be done with a "zfs destroy" of the 2 snapshots and a "zfs destroy" of the dataset.
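
On hostdst, that cleanup looks roughly like this (the snapshot names are taken from the log above; stop the migrated guest first if it is still running):

vm stop alpinevm
zfs destroy rpool/vm/alpinevm@20250215095904-s1
zfs destroy rpool/vm/alpinevm@20250215095947-s2
zfs destroy rpool/vm/alpinevm     # or "zfs destroy -r" to remove dataset and snapshots in one go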

Thanks to "vm migrate", we can do it a bit more clever and have a better control on when the migration can occurs.
Indeed, we can dissociate the 2 stages in 2 different commands.

root@ghostbsd:~ # time vm migrate -1 alpinevm hostdst
Attempting to send alpinevm to hostdst
  * remote dataset rpool/vm/alpinevm
  * source guest is powered on (#28088)
  * stage 1: taking snapshot 20250215100301-s1
  * stage 1: sending zroot/vm/alpinevm@20250215100301-s1
  * stage 1: snapshot sent
  * done

________________________________________________________
Executed in   39.25 secs    fish           external
   usr time    7.36 secs  463.00 micros    7.36 secs
   sys time    5.57 secs  252.00 micros    5.57 secs

I then make some changes in a few files inside the alpinevm machine.

root@ghostbsd:~ # time vm migrate -2 -i 20250215100301-s1 alpinevm hostdst
Attempting to send alpinevm to hostdst
  * remote dataset rpool/vm/alpinevm
  * source guest is powered on (#28088)
  * stage 2: attempting to stop guest.
  * stage 2: guest powered off
  * stage 2: taking snapshot 20250215101544-s2
  * stage 2: sending zroot/vm/alpinevm@20250215101544-s2 (incremental source 20250215100301-s1)
  * stage 2: snapshot sent
  * attempting to start alpinevm on hostdst
Starting alpinevm
  * found guest in /vm/alpinevm
  * booting...
  * removing snapshots
  * done

________________________________________________________
Executed in    5.18 secs      fish           external
   usr time  117.18 millis  396.00 micros  116.78 millis
   sys time   69.82 millis  212.00 micros   69.61 millis

By using the time command, we can better see the effective duration of both stages.

On the new host, alpinevm restarts, and I find my last changes ;)

By looking at the "pings", we can see that the downtime is the same as in the 1st migration. This is normal since we use the same concept of 2 stages.
The big advantage of this method with 2 separate stages is that we can control exactly when the interruption will occur.

Sat Feb 15 10:15:38 CET 2025 ping ok
Sat Feb 15 10:15:39 CET 2025 ping ok
Sat Feb 15 10:15:42 CET 2025 ping NOT ok
Sat Feb 15 10:15:43 CET 2025 ping NOT ok
Sat Feb 15 10:15:45 CET 2025 ping NOT ok
Sat Feb 15 10:15:47 CET 2025 ping NOT ok
Sat Feb 15 10:15:48 CET 2025 ping NOT ok
Sat Feb 15 10:15:50 CET 2025 ping NOT ok
Sat Feb 15 10:15:51 CET 2025 ping NOT ok
Sat Feb 15 10:15:53 CET 2025 ping NOT ok
Sat Feb 15 10:15:53 CET 2025 ping ok
Sat Feb 15 10:15:54 CET 2025 ping ok

Let's now look at how we can migrate a Jail container.

Contrary to my previous post, I did not build the Jail container with iocage. Indeed, the only possibility I've found there is to perform an "iocage export", move a big zip file manually to the other host and perform an "iocage import". This works smoothly, but it takes long minutes to build the archive.
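
For reference, the iocage route looks roughly like this (the jail name is illustrative; the export produces an archive in iocage's images directory, which you have to copy to the other host yourself):

iocage export myjail    # on the source host
# copy the generated archive to the destination host (scp, rsync, ...), then:
iocage import myjail    # on the destination host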

To apply the same concept of "stages", I've decided to stick with the standard jail command that we can find in any FreeBSD install.

Create a Jail

Let's go back to the creation phase. To better understand it, I strongly recommend watching the video from Lucas.

There are a few simple steps to get a Jail container. It's not that complex, as you will see:

1 Make sure you have a root dataset with a mountpoint. In my case this is "/vm", but you can choose what you want.

root@ghostbsd:~ # zfs get mountpoint  zroot/vm
NAME      PROPERTY     VALUE    SOURCE
zroot/vm  mountpoint   /vm      local
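
If you do not have such a dataset yet, it can be created like this (the dataset name and the mountpoint are my choices, pick your own):

zfs create -o mountpoint=/vm zroot/vm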

2 Create child datasets for your Jail container:

root@ghostbsd:~ # zfs create -p zroot/vm/jls/jlstest

So, in my case the name of the Jail will be jlstest.

3 Download the base.txz corresponding to the FreeBSD version of your host. For my FreeBSD 14.1, I found it here.
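
For example, for 14.1-RELEASE on amd64 it can be fetched like this (adapt the release and architecture to your host; older releases may have moved to the ftp-archive mirror):

fetch https://download.freebsd.org/releases/amd64/14.1-RELEASE/base.txz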

4 Extract base.txz to: /vm/jls/jlstest

root@ghostbsd:~ # tar -xf base.txz -C /vm/jls/jlstest

5 Adapt some files of your Jail container, create a root password, set the timezone, ...

cp /etc/resolv.conf /vm/jls/jlstest/etc
chroot /vm/jls/jlstest passwd root
tzsetup -sC /vm/jls/jlstest Europe/Brussels
echo "devfs  /dev  devfs  rw  0 0" >> /vm/jls/jlstest/etc/fstab
touch /vm/jls/jlstest/etc/rc.conf

6 Create a config file in /etc/jail.conf.d. It must have the name of your Jail container:

root@ghostbsd:~ # cat /etc/jail.conf.d/jlstest.conf

jlstest {
  # STARTUP/LOGGING
  exec.start = "/bin/sh /etc/rc";
  exec.stop = "/bin/sh /etc/rc.shutdown";
  exec.consolelog = "/var/log/jail_console_${name}.log";

  # PERMISSIONS
  allow.raw_sockets;
  exec.clean;
  mount.devfs;

  # HOSTNAME/PATH
  host.hostname = "${name}";
  path = "/vm/jls/${name}";

  # NETWORK
  ip4.addr = 192.168.3.226;
  interface = ue0;
}

Please adapt ip4.addr and interface to your specific context.

Now, we should be able to start it:

root@ghostbsd:~ # jail -vc jlstest
jlstest: run command: /sbin/ifconfig ue0 inet 192.168.3.226 netmask 255.255.255.255 alias
jlstest: run command: /sbin/mount -t devfs -oruleset=4 . /vm/jls/jlstest/dev
jlstest: jail_set(JAIL_CREATE) persist name=jlstest allow.raw_sockets host.hostname=jlstest path=/vm/jls/jlstest ip4.addr=192.168.3.226
jlstest: created
jlstest: run command in jail: /bin/sh /etc/rc
jlstest: jail_set(JAIL_UPDATE) jid=4 nopersist
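
From there you can check the jail from the host, or get a shell inside it, for example:

jls -j jlstest            # show the running jail
jexec jlstest /bin/sh     # open a shell inside the jail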

How to migrate a Jail?

We will apply the same concept of stages as vm-bhyve does.

So, in stage 1 we take a snapshot and "zfs send" it to the destination host.
In stage 2, we stop the jail, take an incremental snapshot, send it, copy the jlstest.conf file, and finally start the jail on the new host.

Because I'm very lazy, I do not want to perform those steps by hand each time I want to move the Jail.
So, here you can find stage1 and here stage2.
Those scripts will not work as-is in your context, so adapt them beforehand.
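
To give an idea of their content, here is a minimal, hedged sketch of what such stage scripts could look like. It is not the exact content of my scripts: the jail name, the dataset names and the jail -r / jail -c calls match the output below, but the variable names, the -F flag on zfs recv and the way the stage-1 snapshot is passed to stage 2 are assumptions you should adapt.

#!/bin/sh
# jls-move_s1.sh -- sketch of "stage 1": bulk copy while the jail keeps running
JAIL="jlstest"
SRC="zroot/vm/jls/${JAIL}"           # local dataset
DST="rpool/vm/jls/${JAIL}"           # dataset on the destination host
REMOTE="hostdst"

SNAP1="$(date +%Y%m%d-%H%M%S)_s1"
# make sure the parent dataset exists on the destination (no error if it already does)
ssh "${REMOTE}" "zfs create -p rpool/vm/jls"
zfs snapshot "${SRC}@${SNAP1}"
# full send; -F on the receiving side overwrites a previous attempt if any
zfs send -v "${SRC}@${SNAP1}" | ssh "${REMOTE}" "zfs recv -F ${DST}"
echo "stage 1 snapshot ${SNAP1} sent, jail still running"

#!/bin/sh
# jls-move_s2.sh -- sketch of "stage 2": stop the jail, send the delta, restart it remotely
JAIL="jlstest"
SRC="zroot/vm/jls/${JAIL}"
DST="rpool/vm/jls/${JAIL}"
REMOTE="hostdst"
SNAP1="$1"                           # name of the stage-1 snapshot, e.g. 20250215-215119_s1

echo "We are stopping ${JAIL} at $(date)"
jail -r "${JAIL}"                    # stop the jail; the downtime starts here
SNAP2="$(date +%Y%m%d-%H%M%S)_final"
zfs snapshot "${SRC}@${SNAP2}"
# incremental send: only the blocks changed since the stage-1 snapshot
zfs send -v -i "${SRC}@${SNAP1}" "${SRC}@${SNAP2}" | ssh "${REMOTE}" "zfs recv -F ${DST}"
# the jail definition is not part of the dataset, so copy it too
# (I actually keep a copy adapted to hostdst, see the note below)
scp "/etc/jail.conf.d/${JAIL}.conf" "${REMOTE}:/etc/jail.conf.d/"
ssh "${REMOTE}" "jail -c ${JAIL}"    # start the jail on the destination; downtime ends here
echo "Jail restarted on ${REMOTE} at $(date)"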

root@ghostbsd:~ # sh jls-move_s1.sh
dataset rpool/vm/jls/jlstest not found on hostdst
dataset rpool/vm/jls/jlstest created on hostdst
mount point: /vm/jls/jlstest ***
create snapshot: zroot/vm/jls/jlstest@20250215-215119_s1
send the snapshot zroot/vm/jls/jlstest@20250215-215119_s1 to hostdst in rpool/vm/jls/jlstest
full send of zroot/vm/jls/jlstest@20250215-215119_s1 estimated size is 962M
total estimated size is 962M
TIME        SENT   SNAPSHOT zroot/vm/jls/jlstest@20250215-215119_s1
receiving full stream of zroot/vm/jls/jlstest@20250215-215119_s1 into rpool/vm/jls/jlstest@20250215-215119_s1
21:51:20    136M   zroot/vm/jls/jlstest@20250215-215119_s1
21:51:21    308M   zroot/vm/jls/jlstest@20250215-215119_s1
21:51:22    479M   zroot/vm/jls/jlstest@20250215-215119_s1
21:51:23    660M   zroot/vm/jls/jlstest@20250215-215119_s1
21:51:24    825M   zroot/vm/jls/jlstest@20250215-215119_s1
received 978M stream in 5.87 seconds (167M/sec)
root@ghostbsd:~ #
root@ghostbsd:~ #
root@ghostbsd:~ # jls
   JID  IP Address      Hostname                      Path
    19  192.168.3.226   jlstest                       /vm/jls/jlstest
root@ghostbsd:~ #
root@ghostbsd:~ #
root@ghostbsd:~ # sh jls-move_s2.sh
dataset rpool/vm/jls/jlstest found on hostdst
mount point: /vm/jls/jlstest ***
test ok
We are stopping jlstest 2025-02-15 21:51:43.809097569
jlstest: removed
jail jlstest stopped
create snapshot: zroot/vm/jls/jlstest@20250215-215143_final
send incremental snapshot  with zroot/vm/jls/jlstest@20250215-215143_final to hostdst in rpool/vm/jls/jlstest
send from zroot/vm/jls/jlstest@20250215-215119_s1 to zroot/vm/jls/jlstest@20250215-215143_final estimated size is 143K
total estimated size is 143K
TIME        SENT   SNAPSHOT zroot/vm/jls/jlstest@20250215-215143_final
receiving incremental stream of zroot/vm/jls/jlstest@20250215-215143_final into rpool/vm/jls/jlstest@20250215-215143_final
received 388K stream in 0.03 seconds (14.4M/sec)
copy config file /etc/jail.conf.d/jlstest.conf to hostdst:/etc/jail.conf.d
starting Jail jlstest on hostdst
jlstest: created
Jail restarted on hostdst at 2025-02-15 21:51:44.967719085
root@ghostbsd:~ #

I did not use the same "ping test" as for the "vm migrate" case because here it's much faster.
Instead, I've added timestamps in the script.

We can see that between the stop at 21:51:43.809097569 and the restart at 21:51:44.967719085 we have about 1.159 seconds of downtime. Just a bit more than 1 second!!!

Good observers will have noticed that I keep, on the source host, the config file for the destination host.
Indeed, I have jlstest.conf_hostdst in /etc/jail.conf.d. It is the same file as jlstest.conf, but with the interface adapted to hostdst.

Conclusion

Building Jail containers and bhyve VMs is quite easy.
I must admit that the documentation is not ideal, and I had to look at different sites to find up-to-date information. Jail containers and bhyve VMs have existed for more than 10 years, but configs from that period are not 100% compatible with the current version.

Thanks to ZFS, migration of a VM is really simple and fast, even for a big .img file of 10GB.

Jail containers are amazing in terms of performance, capabilities, ...

So, for my own needs, choices are clear:

  • a Jail container always
  • a bhyve VM when a Jail container is not possible.

I've not covered the concepts of Thin Jails and Service Jails. In my tests those were always Thick Jails ;).
Maybe in the future I will revisit this, but for now, Thick Jails are perfect.


