Podman Checkpoint

Posted by Ashutosh Sudhakar Bhakare on February 13, 2023

Podman is a tool that runs, manages, and deploys containers under the OCI standard. It can also run containers rootless and create pods (a pod is a group of containers). Note, however, that checkpointing only works for containers run by root users because it requires root privileges. This article explains how to use checkpointing in Podman to save the state of a running container for later use.

Checkpointing Containers

Checkpoint / Restore In User-space, or CRIU, is Linux software available in the Fedora Linux repository as the “criu” package. It can freeze a running container (or an individual application) and checkpoint its state to disk (reference: https://criu.org/Main_Page). The saved data can be used to restore the container and run it exactly as it was at the time of the freeze. This makes live migration, snapshots, and remote debugging of applications or containers possible. Podman's checkpoint support requires CRIU 3.11 or later to be installed on the system.
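
The version requirement can be checked before attempting a checkpoint. The following is a minimal sketch, not part of Podman or CRIU: the function names version_ge, criu_version, and check_criu are hypothetical helpers, and the sketch assumes that "criu --version" prints a line of the form "Version: 3.17.1" and that GNU "sort -V" is available.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed CRIU meets the 3.11 minimum.
# All function names here are hypothetical helpers.

# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# criu_version: print the version number reported by "criu --version".
# Assumes the output contains a line such as "Version: 3.17.1".
criu_version() {
    criu --version 2>/dev/null | awk '/^Version:/ {print $2; exit}'
}

# check_criu: report whether the installed CRIU is new enough.
check_criu() {
    local found
    found="$(criu_version)"
    if [ -z "$found" ]; then
        echo "could not determine the CRIU version" >&2
        return 1
    fi
    if version_ge "$found" "3.11"; then
        echo "CRIU $found is new enough"
    else
        echo "CRIU $found is older than 3.11" >&2
        return 1
    fi
}
```

After sourcing the file, calling check_criu prints the detected version or returns a non-zero status when CRIU is missing or too old.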

Podman Checkpoint

# podman container checkpoint <containername>

This command creates a checkpoint of the container and freezes its state. Checkpointing also stops the running container, so podman ps will no longer list <containername> afterwards.

You can also export the checkpoint to a file at a specific location and copy that file to a different server:

# podman container checkpoint <containername> -e /tmp/mycheckpoint.tar.gz

Podman Restore

# podman container restore --keep <containername>

The --keep option keeps the temporary log and statistics files that CRIU creates during the restore.

To import the container checkpoint you can use:

# podman container restore -i /tmp/mycheckpoint.tar.gz

Live Migration using Podman Checkpoint

This section describes how to migrate a container from client1 to client2 using the podman checkpoint feature. The example uses the https://lab.redhat.com/tracks/rhel-system-roles playground provided by Red Hat because it offers multiple hosts with SSH keys already configured.

The example will run a container with some process on client1, create a checkpoint, and migrate it to client2. First run a container on the client1 machine with the commands below:

podman run --name=demo1 -d docker.io/httpd
podman exec -it demo1 bash
sleep 600 &    # run a background process for verification
exit

The snippet above starts a container named demo1 running httpd, then starts a sleep process inside it that runs for 600 seconds (10 minutes) in the background. You can verify this with:

# podman top demo1

USER        PID         PPID        %CPU        ELAPSED          TTY         TIME        COMMAND
root        1           0           0.000       5m40.61208846s   ?           0s          httpd -DFOREGROUND 
www-data    3           1           0.000       5m40.613179941s  ?           0s          httpd -DFOREGROUND 
www-data    4           1           0.000       5m40.613258012s  ?           0s          httpd -DFOREGROUND 
www-data    5           1           0.000       5m40.613312515s  ?           0s          httpd -DFOREGROUND 
root        88          1           0.000       16.613370018s    ?           0s          sleep 600
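
The same verification can be scripted, which is handy for running an identical check before and after a migration. This is only a sketch: container_has_process is a hypothetical helper name, and it simply searches the text that "podman top" prints rather than relying on any particular column layout.

```shell
#!/usr/bin/env bash
# Sketch: check whether a process shows up in "podman top" output.
# container_has_process is a hypothetical helper, not a Podman command.

# container_has_process NAME TEXT
# Succeeds when any process line printed by "podman top NAME" contains TEXT.
container_has_process() {
    # tail -n +2 skips the header row; grep -F treats TEXT as a fixed string.
    podman top "$1" | tail -n +2 | grep -qF -- "$2"
}
```

For the container above, a call such as container_has_process demo1 "sleep 600" && echo "sleep is running" would print the message while the sleep process is still alive.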

Now create a container checkpoint and export it to a specific file:

# podman container checkpoint demo1 -e /tmp/mycheckpoint.tar.gz
# scp /tmp/mycheckpoint.tar.gz client2:/tmp/

Then on client2:

# cd /tmp
# podman container restore -i mycheckpoint.tar.gz
# podman top demo1

You should see output similar to the following:

USER        PID         PPID        %CPU        ELAPSED          TTY         TIME        COMMAND
root        1           0           0.000       5m40.61208846s   ?           0s          httpd -DFOREGROUND 
www-data    3           1           0.000       5m40.613179941s  ?           0s          httpd -DFOREGROUND 
www-data    4           1           0.000       5m40.613258012s  ?           0s          httpd -DFOREGROUND 
www-data    5           1           0.000       5m40.613312515s  ?           0s          httpd -DFOREGROUND 
root        88          1           0.000       16.613370018s    ?           0s          sleep 600

In this way you can achieve a live migration using the podman checkpoint feature.
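
The checkpoint, copy, and restore steps from this section can be collected into one function. The sketch below makes several assumptions: it runs as root on the source host, root can reach the destination host over SSH without a password (as in the playground), and the function names run and migrate_container as well as the DRY_RUN variable are hypothetical conveniences, not Podman features.

```shell
#!/usr/bin/env bash
# Sketch: checkpoint a container, copy the archive, and restore it remotely.
# Set DRY_RUN=1 to print the commands instead of running them.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# migrate_container NAME HOST [ARCHIVE]
# ARCHIVE defaults to the path used in this article.
migrate_container() {
    local name="$1" host="$2" archive="${3:-/tmp/mycheckpoint.tar.gz}"
    if [ -z "$name" ] || [ -z "$host" ]; then
        echo "usage: migrate_container NAME HOST [ARCHIVE]" >&2
        return 2
    fi
    # Checkpoint on the source host and export the state to a file.
    run podman container checkpoint "$name" -e "$archive" || return 1
    # Copy the archive to the same path on the destination host.
    run scp "$archive" "$host:$archive" || return 1
    # Restore on the destination host from the imported archive.
    run ssh "$host" podman container restore -i "$archive" || return 1
    # List the restored container's processes for verification.
    run ssh "$host" podman top "$name"
}
```

Running DRY_RUN=1 migrate_container demo1 client2 prints the four commands without executing them, which is a low-risk way to review the sequence before trying it on real hosts.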

16 Comments

Dan DeRemer

I think your definition of Live Migration needs to be revised. There’s no transfer of state or any other indication that the container is handed off live to the destination host.

February 13, 2023
- Dilip
  
  This is cold migration and not live migration.
  
  Aliter method:
  podman-commit followed by any safe copy method and podman run can be used as well.
  
  February 14, 2023
- Ashutosh Bhakare
  
  https://podman.io/getting-started/checkpoint – as per the official docs this seems ok
  
  February 19, 2023
- Marcus
  
  Well, the process state is preserved, including the operating system timer that is supposed to wake up the sleeping thread.
  
  The wording “live migration” stems from podman’s own documentation – so it’s not Ashutosh’s definition that needs to be revised. It’s podman’s docs, which happen to be a bit sloppy here.
  
  I agree – this is not a live migration. This is a freeze, store, restore, unfreeze – it’s a necessary part of a migration, but calling it a live migration without demonstrating the transfer of IP address ownership, sockets and thus TCP connections preservation, file system persistence and shared memory is maaaaybe a bit proud of the podman documentation. There’s significant parts of the puzzle missing, like setting up a shared storage, the software-defined IP networking and so on. It’s a bit like “here, we draw a circle (show that sleep works). Now, just draw the rest of the hooting owl!”.
  
  February 20, 2023
Stephen

600 seconds is 10 minutes, not 6, which would be 360 seconds.

February 13, 2023
- Richard England
  
  Yep. Good catch
  
  February 13, 2023
- Ashutosh Bhakare
  
  Yes, Apologies
  
  February 19, 2023
Ian

Great Article! Had no idea this was even possible. I would expect the ‘ELAPSED’ values would be different when running top on the 2nd client. Thanks 🙂

February 13, 2023
Scott Dowdle

Thanks for the article. It brings me back to the days of OpenVZ when checkpoint, live migration and restore were new. That was 2006. Good times.

February 14, 2023
- Ashutosh Bhakare
  
  Thanks for your kind appreciation !!
  
  February 19, 2023
Alexander Haas

It should be mentioned that this only works for containers run by root users as checkpointing requires root privileges.
So, rootless containers can not be checkpointed.

February 19, 2023
- Ashutosh Bhakare
  
  Yes Alexander, Agree +1, Not sure if I can edit this article now . Thanks
  
  February 19, 2023
  - Gregory Bartholomew
    
    @Ashutosh: You won’t be able to edit it directly now that the article has gone live, but the Fedora Magazine editors can. If you want to provide some text and instructions for where to place it, I can get it updated.
    
    February 19, 2023
    - Ashutosh
      
      @Gregory : If you can add a small note as suggested by Alexander, Podman only works for containers run by root users as checkpointing requires root privileges.
      
      February 20, 2023
      - Gregory Bartholomew
        
        I’ve added the following sentence to the first paragraph right after rootless containers are mentioned.
        
        Note, however, that checkpointing only works for containers run by root users because it requires root privileges.
        
        February 20, 2023