How to Build a Netboot Server, Part 1

Posted by Gregory Bartholomew on November 23, 2018

Some computer networks need to maintain identical software installations and configurations on several physical machines. One such environment would be a school computer lab. A netboot server can be set up to serve an entire operating system over a network so that the client computers can be configured from one central location. This tutorial will show one method of building a netboot server.

Part 1 of this tutorial will cover creating a netboot server and image. Part 2 will show how to add Kerberos-authenticated home directories to the netboot configuration.

Initial Configuration

Start by downloading one of Fedora Server’s netinst images, burning it to a CD, and using it to boot the server that will be reformatted. We just need a typical “Minimal Install” of Fedora Server as our starting point; we will use the command line to add any additional packages after the installation is finished.

NOTE: For this tutorial we will be using Fedora 28. Other versions may include a slightly different set of packages in their “Minimal Install”. If you start with a different version of Fedora, then you may need to do some troubleshooting if an expected file or command is not available.

Once you have your minimal installation of Fedora Server up and running, log in and then become root using this command:

$ sudo -i

Set the hostname:

# MY_HOSTNAME=server-01.example.edu
# hostnamectl set-hostname $MY_HOSTNAME

NOTE: Red Hat recommends that both static and transient names match the fully-qualified domain name (FQDN) used for the machine in DNS, such as host.example.com (Understanding Host Names).

NOTE: This guide is meant to be copy-and-paste friendly. Any value that you might need to customize will be stated as a MY_* variable that you can tweak before running the remaining commands. Beware that if you log out, the variable assignments will be cleared.
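If you need to log out partway through, one way to keep your MY_* values is to write them to a file first. The sketch below assumes bash (it relies on the “compgen” and “declare -p” builtins); the function name save_my_vars and the file name netboot-vars.sh are hypothetical, not part of the original procedure:

```shell
# Hypothetical helper: save every shell variable whose name starts with MY_
# to a file, so the values can be restored after logging back in.
save_my_vars() {
  local file=${1:-$HOME/netboot-vars.sh}
  local names
  names=$(compgen -v MY_)
  # With no names, "declare -p" would dump every variable, so stop here.
  if [ -z "$names" ]; then
    echo "no MY_* variables are set" >&2
    return 1
  fi
  # "declare -p" prints each variable as a re-runnable "declare -- NAME=..." line.
  declare -p $names > "$file"
}
```

After logging back in as the same user (for example, root after “sudo -i”), restore the values by running “source $HOME/netboot-vars.sh” directly at the prompt rather than inside a function, so that the variables are created as globals.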

NOTE: Fedora 28 Server tends to dump a lot of logging output to the console by default. You may want to disable the console logging temporarily by running: sysctl -w kernel.printk=0

Next, we need a static network address on our server. The following sequence of commands should find and reconfigure your default network connection appropriately:

# MY_DNS1=192.0.2.91
# MY_DNS2=192.0.2.92
# MY_IP=192.0.2.158
# MY_PREFIX=24
# MY_GATEWAY=192.0.2.254
# DEFAULT_DEV=$(ip route show default | awk '{print $5}')
# DEFAULT_CON=$(nmcli d show $DEFAULT_DEV | sed -n '/^GENERAL.CONNECTION:/s!.*:\s*!! p')
# nohup bash << END
nmcli con mod "$DEFAULT_CON" connection.id "$DEFAULT_DEV"
nmcli con mod "$DEFAULT_DEV" connection.interface-name "$DEFAULT_DEV"
nmcli con mod "$DEFAULT_DEV" ipv4.method disabled
nmcli con up "$DEFAULT_DEV"
nmcli con add con-name br0 ifname br0 type bridge
nmcli con mod br0 bridge.stp no
nmcli con mod br0 ipv4.dns $MY_DNS1,$MY_DNS2
nmcli con mod br0 ipv4.addresses $MY_IP/$MY_PREFIX
nmcli con mod br0 ipv4.gateway $MY_GATEWAY
nmcli con mod br0 ipv4.method manual
nmcli con up br0
nmcli con add con-name br0-slave0 ifname "$DEFAULT_DEV" type bridge-slave master br0
nmcli con up br0-slave0
END

NOTE: The last set of commands above is wrapped in a “nohup” script because it will disable networking temporarily. The nohup command should allow the nmcli commands to finish running even while your ssh connection is down. Beware that it may take 10 or so seconds for the connection to come back up and that you will have to start a new ssh connection if you changed the server’s IP address.

NOTE: The above network configuration creates a network bridge on top of the default connection so that we can run a virtual machine instance directly on the server for testing later. If you do not want to test the netboot image directly on the server, you can skip creating the bridge and set the static IP address directly on your default network connection.
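The “awk '{print $5}'” expression used above assumes that the default route is printed as “default via GATEWAY dev DEVICE …”. A default route without a gateway (for example “default dev tun0 scope link”) puts the device name in a different field. A slightly more defensive sketch looks for the word that follows “dev” instead; the function name default_dev is hypothetical:

```shell
# Hypothetical helper: print the interface name that follows the "dev" keyword
# in "ip route show default" output, whatever field it happens to be in.
# Only the first matching route is used (awk exits after printing it).
default_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on the server:
# DEFAULT_DEV=$(ip route show default | default_dev)
```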

Install and Configure NFS4

Start by installing the nfs-utils package:

# dnf install -y nfs-utils

Create a top-level pseudo filesystem for the NFS exports and share it out to your network:

# MY_SUBNET=192.0.2.0
# mkdir /export
# echo "/export -fsid=0,ro,sec=sys,root_squash $MY_SUBNET/$MY_PREFIX" > /etc/exports
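MY_SUBNET must be the network address that corresponds to MY_IP and MY_PREFIX, or the export will not match your clients. If you would rather compute it than type it in, here is a pure-bash sketch that masks off the host bits (the function name network_address is hypothetical; the ipcalc utility can report the same value):

```shell
# Hypothetical helper: print the IPv4 network address for an address and a
# prefix length, e.g. "network_address 192.0.2.158 24" prints 192.0.2.0.
network_address() {
  local ip=$1 prefix=$2 a b c d
  IFS=. read -r a b c d <<< "$ip"
  # Pack the four octets into one 32-bit integer.
  local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Netmask: "prefix" one-bits followed by zero-bits (bash integers are 64-bit).
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net=$(( addr & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# Usage:
# MY_SUBNET=$(network_address $MY_IP $MY_PREFIX)
```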

SELinux will interfere with the netboot server’s operation. Configuring exceptions for it is beyond the scope of this tutorial, so we will disable it:

# sed -i '/GRUB_CMDLINE_LINUX/s/"$/ audit=0 selinux=0"/' /etc/default/grub
# grub2-mkconfig -o /boot/grub2/grub.cfg
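# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# setenforce 0

NOTE: On Fedora, /etc/sysconfig/selinux is a symbolic link to /etc/selinux/config, and “sed -i” replaces a symbolic link with a regular file rather than editing its target; that is why the command above edits /etc/selinux/config directly. Editing the grub command line should not be strictly necessary, but editing /etc/sysconfig/selinux alone proved ineffective across reboots of Fedora Server 28 during testing, so the “selinux=0” flag has been set here as well to be doubly sure.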

Now, add an exception for the NFS service to the local firewall and start the NFS service:

# firewall-cmd --add-service nfs
# firewall-cmd --runtime-to-permanent
# systemctl enable nfs-server.service
# systemctl start nfs-server.service

Create the Netboot Image

Now that our NFS server is up and running, we need to supply it with an operating system image to serve to the client computers. We will start with a very minimal image and add to it after everything is working.

First, create a new directory where our image will be stored:

# mkdir /fc28

Use the “dnf” command to build the image under the new directory with only a few base packages:

# dnf -y --releasever=28 --installroot=/fc28 install fedora-release systemd passwd rootfiles sudo dracut dracut-network nfs-utils vim-minimal dnf

Note that the “kernel” package was deliberately omitted from the above command. Before it is installed, we need to tweak the set of drivers that will be included in the “initramfs” image that is built automatically when the kernel is first installed. In particular, we need to disable “hostonly” mode so that the initramfs image will work on a wider set of hardware platforms, and we need to add support for networking and NFS:

# echo 'hostonly=no' > /fc28/etc/dracut.conf.d/hostonly.conf
# echo 'add_dracutmodules+=" network nfs "' > /fc28/etc/dracut.conf.d/netboot.conf

Now, install the kernel:

# dnf -y --installroot=/fc28 install kernel
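Because the dracut settings only take effect when the initramfs is generated, it may be worth confirming that the image built during the kernel installation really contains NFS support. The lsinitrd tool from the dracut package lists the dracut modules in an initramfs with its “-m” option. A minimal sketch, assuming a single kernel in the image; the function name initramfs_has_modules is hypothetical, and module names can differ between dracut versions:

```shell
# Hypothetical helper: succeed only if every named dracut module appears in
# the "lsinitrd -m" listing for the given initramfs image.
initramfs_has_modules() {
  local image=$1 mod list
  shift
  list=$(lsinitrd -m "$image") || return 1
  for mod in "$@"; do
    if ! grep -qx "$mod" <<< "$list"; then
      echo "dracut module '$mod' not found in $image" >&2
      return 1
    fi
  done
}

# Usage on the server:
# initramfs_has_modules /fc28/boot/initramfs-*.img network nfs
```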

Set a rule to prevent the kernel from being updated:

# echo 'exclude=kernel-*' >> /fc28/etc/dnf/dnf.conf

Set the locale:

# echo 'LANG="en_US.UTF-8"' > /fc28/etc/locale.conf

NOTE: Some programs (e.g. GNOME Terminal) will not function if the locale is not properly configured.

Set the client’s hostname:

# MY_CLIENT_HOSTNAME=client-01.example.edu
# echo $MY_CLIENT_HOSTNAME > /fc28/etc/hostname

Disable logging to the console:

# echo 'kernel.printk = 0 4 1 7' > /fc28/etc/sysctl.d/00-printk.conf

Define a local “liveuser” in the netboot image:

# echo 'liveuser:x:1000:1000::/home/liveuser:/bin/bash' >> /fc28/etc/passwd
# echo 'liveuser::::::::' >> /fc28/etc/shadow
# echo 'liveuser:x:1000:' >> /fc28/etc/group
# echo 'liveuser:::' >> /fc28/etc/gshadow
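The “>>” appends above (and the later ones for /fc28/etc/dnf/dnf.conf and /etc/fstab) add a duplicate line every time they are re-run, which is easy to do with a copy-and-paste guide. If you expect to repeat steps, a small helper that appends a line only when it is not already present may help; the name append_once is hypothetical:

```shell
# Hypothetical helper: append a line to a file only if an identical line is
# not already there (fixed-string, whole-line match), so steps can be repeated.
append_once() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage, equivalent to the first command above:
# append_once 'liveuser:x:1000:1000::/home/liveuser:/bin/bash' /fc28/etc/passwd
```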

Allow “liveuser” to sudo:

# echo 'liveuser ALL=(ALL) NOPASSWD: ALL' > /fc28/etc/sudoers.d/liveuser

Enable automatic home directory creation:

# dnf install -y --installroot=/fc28 authselect oddjob-mkhomedir
# echo 'dirs /home' > /fc28/etc/rwtab.d/home
# chroot /fc28 authselect select sssd with-mkhomedir --force
# chroot /fc28 systemctl enable oddjobd.service

Since multiple clients will be mounting our image concurrently, we need to configure the image so that it will operate in read-only mode:

# sed -i 's/^READONLY=no$/READONLY=yes/' /fc28/etc/sysconfig/readonly-root

Configure logging to go to RAM rather than permanent storage:

# sed -i 's/^#Storage=auto$/Storage=volatile/' /fc28/etc/systemd/journald.conf
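“sed” exits successfully even when its pattern matches nothing, so if one of these files differs from what this guide expects, the two edits above will silently have no effect. A quick check after editing can catch that; the function name require_line is hypothetical:

```shell
# Hypothetical helper: fail loudly unless the file contains exactly this line.
require_line() {
  local line=$1 file=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "expected '$line' in $file" >&2
    return 1
  fi
}

# Usage, checking the two edits above:
# require_line 'READONLY=yes' /fc28/etc/sysconfig/readonly-root
# require_line 'Storage=volatile' /fc28/etc/systemd/journald.conf
```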

Configure DNS:

# MY_DNS1=192.0.2.91
# MY_DNS2=192.0.2.92
# cat << END > /fc28/etc/resolv.conf
nameserver $MY_DNS1
nameserver $MY_DNS2
END

Work around a few bugs that exist for read-only root mounts at the time of this writing (BZ1542567):

# echo 'dirs /var/lib/gssproxy' > /fc28/etc/rwtab.d/gssproxy
# cat << END > /fc28/etc/rwtab.d/systemd
dirs /var/lib/systemd/catalog
dirs /var/lib/systemd/coredump
END

Finally, we can create the NFS filesystem for our image and share it out to our subnet:

# mkdir /export/fc28
# echo '/fc28 /export/fc28 none bind 0 0' >> /etc/fstab
# mount /export/fc28
# echo "/export/fc28 -ro,sec=sys,no_root_squash $MY_SUBNET/$MY_PREFIX" > /etc/exports.d/fc28.exports
# exportfs -vr

Create the Boot Loader

Now that we have an operating system available to netboot, we need a boot loader to kickstart it on the client systems. For this setup, we will be using iPXE. Note that you should be logged in to your regular user account for this section, not root.

NOTE: This section and the following section — Testing with QEMU — can be done on a separate computer; they do not have to be run on the netboot server.

Install git and use it to download iPXE:

$ sudo dnf install -y git
$ git clone http://git.ipxe.org/ipxe.git $HOME/ipxe

Now we need to create a special startup script for our bootloader:

$ cat << 'END' > $HOME/ipxe/init.ipxe
#!ipxe

prompt --key 0x02 --timeout 2000 Press Ctrl-B for the iPXE command line... && shell ||

dhcp || exit
set prefix file:///linux
chain ${prefix}/boot.cfg || exit
END

Enable the “file” download protocol:

$ echo '#define DOWNLOAD_PROTO_FILE' > $HOME/ipxe/src/config/local/general.h

Install the C compiler and related tools and libraries:

$ sudo dnf groupinstall -y "C Development Tools and Libraries"

Build the boot loader:

$ cd $HOME/ipxe/src
$ make clean
$ make bin-x86_64-efi/ipxe.efi EMBED=../init.ipxe

Make a note of where the newly compiled boot loader is located; we will need it in the next section:

$ IPXE_FILE="$HOME/ipxe/src/bin-x86_64-efi/ipxe.efi"

Testing with QEMU

This section is optional, but you will need to duplicate the file layout of the EFI system partition that is shown below on your physical machines to configure them for netbooting.

NOTE: You could also copy the files to a TFTP server and reference that server from DHCP if you wanted a fully diskless system.

In order to test our boot loader with QEMU, we are going to create a small disk image containing only an EFI system partition and our startup files.

Start by creating the required directory layout for the EFI system partition and copying the boot loader that we created in the previous section to it:

$ mkdir -p $HOME/esp/efi/boot
$ mkdir $HOME/esp/linux
$ cp $IPXE_FILE $HOME/esp/efi/boot/bootx64.efi

The following command should identify the kernel version that our netboot image is using and store it in a variable for use in the remaining configuration directives:

$ DEFAULT_VER=$(ls -c /fc28/lib/modules | head -n 1)
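“ls -c” sorts by status-change time, so the command above picks whichever kernel modules directory changed most recently. That works while the image contains exactly one kernel (the exclude rule added earlier keeps it that way), but if more than one is ever present, sorting by version number may be more predictable. A sketch using GNU “sort -V”; the function name latest_kver is hypothetical:

```shell
# Hypothetical helper: print the highest kernel version found under a
# lib/modules directory, comparing the names as version numbers.
latest_kver() {
  local dir=${1:-/fc28/lib/modules}
  ls -1 "$dir" | sort -V | tail -n 1
}

# Usage:
# DEFAULT_VER=$(latest_kver /fc28/lib/modules)
```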

Define the boot configuration that our client computers will be using:

$ MY_DNS1=192.0.2.91
$ MY_DNS2=192.0.2.92
$ MY_NFS4=server-01.example.edu
$ cat << END > $HOME/esp/linux/boot.cfg
#!ipxe

kernel --name kernel.efi \${prefix}/vmlinuz-$DEFAULT_VER initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=$MY_DNS1 nameserver=$MY_DNS2 root=nfs4:$MY_NFS4:/fc28 console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img \${prefix}/initramfs-$DEFAULT_VER.img
boot || exit
END

NOTE: The above boot script shows a minimal example of how to get iPXE to netboot Linux. Much more complex configurations are possible. Most notably, iPXE has support for interactive boot menus which can be configured with a default selection and a timeout. A more advanced iPXE script could, for example, default to booting an operating system from the local disk and only proceed with netbooting if a user pressed a key before a countdown timer reached zero.
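As a sketch of that idea, the startup script below could take the place of the init.ipxe file from the “Create the Boot Loader” section. It uses iPXE’s menu, item, and choose commands to show a menu that selects the local disk after five seconds unless the user picks the netboot entry. It assumes that running “exit” from a UEFI build of iPXE hands control back to the firmware, which then tries the next entry in its boot order; that behavior varies between firmware implementations, so treat this as an untested starting point. The function name write_menu_ipxe is hypothetical:

```shell
# Hypothetical alternative to init.ipxe: a boot menu that defaults to the local
# disk after 5 seconds (5000 ms) and netboots only if the user chooses it.
write_menu_ipxe() {
  cat << 'END' > "${1:-$HOME/ipxe/init.ipxe}"
#!ipxe

menu Select an operating system
item --key l local Boot from the local disk
item --key n netboot Netboot Fedora 28
choose --default local --timeout 5000 target || exit
goto ${target}

:local
exit

:netboot
dhcp || exit
set prefix file:///linux
chain ${prefix}/boot.cfg || exit
END
}
```

After writing the file, rebuild the boot loader with the same “make bin-x86_64-efi/ipxe.efi EMBED=../init.ipxe” command used earlier so that the new script is embedded.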

Copy the Linux kernel and its associated initramfs to the EFI system partition:

$ cp $(find /fc28/lib/modules -maxdepth 2 -name 'vmlinuz' | grep -m 1 $DEFAULT_VER) $HOME/esp/linux/vmlinuz-$DEFAULT_VER
$ cp $(find /fc28/boot -name 'init*' | grep -m 1 $DEFAULT_VER) $HOME/esp/linux/initramfs-$DEFAULT_VER.img

Our resulting directory layout should look like this:

esp
├── efi
│   └── boot
│       └── bootx64.efi
└── linux
    ├── boot.cfg
    ├── initramfs-4.18.18-200.fc28.x86_64.img
    └── vmlinuz-4.18.18-200.fc28.x86_64
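Before building the disk image, it may be worth confirming that this layout is complete, since a missing or misnamed kernel or initramfs only shows up later as a failed boot. A sketch; the function name check_esp is hypothetical and the file names simply mirror the commands above:

```shell
# Hypothetical helper: report any file missing from the EFI system partition
# layout built above. Arguments: ESP directory and kernel version.
check_esp() {
  local esp=$1 ver=$2 f missing=0
  for f in efi/boot/bootx64.efi \
           linux/boot.cfg \
           "linux/vmlinuz-$ver" \
           "linux/initramfs-$ver.img"; do
    if [ ! -s "$esp/$f" ]; then
      echo "missing or empty: $esp/$f" >&2
      missing=1
    fi
  done
  # boot.cfg should reference the same kernel version that was copied.
  if [ -f "$esp/linux/boot.cfg" ] && ! grep -qF "vmlinuz-$ver" "$esp/linux/boot.cfg"; then
    echo "boot.cfg does not reference vmlinuz-$ver" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
# check_esp $HOME/esp $DEFAULT_VER
```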

To use our EFI system partition with QEMU, we need to create a small “uefi.img” disk image containing it and then connect that to QEMU as the primary boot drive.

Begin by installing the necessary tools:

$ sudo dnf install -y parted dosfstools

Now create the “uefi.img” file and copy the files from the “esp” directory into it:

$ ESP_SIZE=$(du -ks $HOME/esp | cut -f 1)
$ dd if=/dev/zero of=$HOME/uefi.img count=$((${ESP_SIZE}+5000)) bs=1KiB
$ UEFI_DEV=$(sudo losetup --show -f $HOME/uefi.img)
$ sudo parted ${UEFI_DEV} -s mklabel gpt mkpart EFI FAT16 1MiB 100% toggle 1 boot
$ sudo mkfs -t msdos ${UEFI_DEV}p1
$ mkdir -p $HOME/mnt
$ sudo mount ${UEFI_DEV}p1 $HOME/mnt
$ sudo cp -r $HOME/esp/* $HOME/mnt
$ sudo umount $HOME/mnt
$ sudo losetup -d ${UEFI_DEV}

NOTE: On a physical computer, you need only copy the files from the “esp” directory to the computer’s existing EFI system partition. You do not need the “uefi.img” file to boot a physical computer.

NOTE: On a physical computer you can rename the “bootx64.efi” file if a file by that name already exists, but if you do so, you will probably have to edit the computer’s BIOS settings and add the renamed efi file to the boot list.

Next we need to install the qemu package:

$ sudo dnf install -y qemu-system-x86

Allow QEMU to access the bridge that we created in the “Initial Configuration” section of this tutorial:

$ sudo su -
# echo 'allow br0' > /etc/qemu/bridge.conf
# exit

Create a copy of the “OVMF_VARS.fd” image to store our virtual machine’s persistent BIOS settings:

$ cp /usr/share/edk2/ovmf/OVMF_VARS.fd $HOME

Now, start the virtual machine:

$ qemu-system-x86_64 -machine accel=kvm -nographic -m 1024 -drive if=pflash,format=raw,unit=0,file=/usr/share/edk2/ovmf/OVMF_CODE.fd,readonly=on -drive if=pflash,format=raw,unit=1,file=$HOME/OVMF_VARS.fd -drive if=ide,format=raw,file=$HOME/uefi.img -net bridge,br=br0 -net nic,model=virtio

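If all goes well, the virtual machine should load the kernel and initramfs from the disk image, mount its root filesystem from the NFS server, and present a login prompt on the serial console.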

You can use the “shutdown” command to get out of the virtual machine and back to the server:

$ sudo shutdown -h now

NOTE: If something goes wrong and the virtual machine hangs, you may need to start a new ssh session to the server and use the “kill” command to terminate the “qemu-system-x86_64” process.

Adding to the Image

Adding to the image should be a simple matter of chroot’ing into the image on the server and running “dnf install <package_name>”.

There is no limit to what can be installed on the netboot image. A full graphical installation should function perfectly.

Here is an example of how to bring our minimal netboot image up to a complete graphical installation:

# for i in dev dev/pts dev/shm proc sys run; do mount -o bind /$i /fc28/$i; done
# chroot /fc28 /usr/bin/bash --login
# dnf -y groupinstall "Fedora Workstation"
# dnf -y remove gnome-initial-setup
# systemctl disable sshd.service
# systemctl enable gdm.service
# systemctl set-default graphical.target
# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# logout
# for i in run sys proc dev/shm dev/pts dev; do umount /fc28/$i; done
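The bind mounts and unmounts above have to happen in matching order, and an interrupted session can leave /fc28/dev and the other mounts in place. A hypothetical helper, image_run, that wraps the mount, chroot, and unmount steps might look like this; it is only a sketch and, like the original commands, must be run as root:

```shell
# Hypothetical helper: bind-mount the host's pseudo-filesystems into an image,
# run one command inside it with chroot, then unmount in reverse order.
image_run() {
  local root=$1 d status
  local mounted=()
  shift
  for d in dev dev/pts dev/shm proc sys run; do
    if mount -o bind "/$d" "$root/$d"; then
      mounted=("$d" "${mounted[@]}")   # prepend, so unmounting runs in reverse
    else
      status=1
      break
    fi
  done
  # Only run the command if every bind mount succeeded.
  if [ -z "$status" ]; then
    chroot "$root" "$@"
    status=$?
  fi
  # Undo whatever was mounted, even if a mount or the command failed.
  for d in "${mounted[@]}"; do
    umount "$root/$d"
  done
  return "$status"
}
```

Running “image_run /fc28 /usr/bin/bash --login” gives the same interactive session as the commands above, while “image_run /fc28 dnf -y install <package_name>” runs a single command and cleans up afterwards.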

Optionally, you may want to enable automatic login for the “liveuser” account:

# sed -i '/daemon/a AutomaticLoginEnable=true' /fc28/etc/gdm/custom.conf
# sed -i '/daemon/a AutomaticLogin=liveuser' /fc28/etc/gdm/custom.conf

For System Administrators

Gregory Bartholomew

Systems Administrator for the department of Computer Science at Southern Illinois University Edwardsville

20 Comments

Milos

Oh no. Another reason to spend money on a small server cluster 🙁

November 23, 2018
- Gregory Bartholomew
  
  Yeah, actually, I probably should have made some mention of the server requirements in the article. I’m using a 16-core server with an 8Gbps xor-bonded network connection (https://en.wikipedia.org/wiki/Link_aggregation). This setup will netboot 50 clients in parallel in about 3 minutes. You can netboot your workstations with a much smaller server, but it might take a bit longer to boot the clients. Also, I mentioned that you could put the kernel and initramfs on a TFTP server, but I wouldn’t recommend it unless you have a fairly big server because the files are so big. Having the kernel and initramfs on the workstations’ local disk significantly reduces the load on the netboot server and speeds up the boot time for the clients.
  
  November 23, 2018
  - Milos
    
    In regards to link aggregation: in 2011 I worked as a sysadmin where we got brand-new HP servers, 3 racks full of them, to manage.
    
    Each rack with redundant power and switches. Each server had 2 NICs with 4 gbit interfaces per NIC. Our idea was to bond the ETH1_x to ETH2_x, but we couldn’t use any aggregating protocol with the switches/firewalls we had. We had to settle for a bonding protocol that does failover rather than get the full speed of combined interfaces.
    
    What I’m planning in the next few years is to have a small cluster most likely running OpenStack or some such. I don’t expect to have very powerful machines in that cluster. Probably something along the lines of Odroid machines + a decent storage unit.
    
    I also definitely won’t have 50 clients running in parallel, but it’s nice to hear how this works for you!
    
    Looking forward to part 2.
    
    November 23, 2018
    - Gregory Bartholomew
      
      The xor load balancing protocol should work with any brand of switch and no special configuration on the switch should be required. If you have a router between your server and your clients though, then that could create a bottleneck.
      
      November 25, 2018
Thijs

I work at a high school where everyone hates the very slow Windows 10 computers. It’s not the hardware; the computers are not very fast, but not too slow either. It’s the system. I will not be able to convince our system administrator to use Linux. But if I could set this up, maybe I could convince some other colleagues to use it. If it works flawlessly, we still won’t switch to Linux, but I could maybe create some Linux mind share 🙂

November 23, 2018
- Gregory Bartholomew
  
  I have a similar problem: the general purpose (shared) labs at our university must run Windows, and the central IT department for our university was unwilling to re-partition all the computers for “dual boot” operation or to allocate space on them for a Linux virtual machine. That is why I looked into how to set this up.
  
  I might write a “Part 3” to this series at some point to show how to configure iPXE to present the user with a menu for choosing which OS to load (local or netboot).
  
  BTW: If you use SCCM to manage your Windows systems, it can also be used to copy the netboot startup files out to the EFI partition on all your client systems; just use “mountvol s: /s” to mount the EFI partition before running the copy command and then “mountvol s: /d” to unmount it after the copy is complete.
  
  November 25, 2018
  - fmiz
    
    My original question was something like “Why did you prefer this route instead of using a configuration manager like ansible?”, then I read your answer. So, if you could have space allocated for dual boot, would you still have gone with netbooting?
    
    November 25, 2018
    - Gregory Bartholomew
      
      There are pros and cons to every option. I guess the main trade-off between net-boot versus dual-boot would be the performance of the local storage versus the manageability and consistency of the centralized storage. So, as long as the net-boot performance is adequate for our needs, I’d say the centralized storage is a benefit. If you are trying to maintain multiple copies of the OS, there is always a chance that some copies are not identical to others which, in our case, might leave some student scratching his head during a course assignment walk-through and wondering what he is doing wrong when in actuality, it is just a problem with the workstation that he happened to sit down at. I kind of think consistency is better in that situation — even if they are consistently wrong — for the sake of fairness. There are scalability problems though — a server can only churn out so much IO.
      
      November 26, 2018
Cesar

Will this work for Fedora 29?

November 23, 2018
- Gregory Bartholomew
  
  I did start to set up Fedora 29 to test that, but I quickly discovered that the “read-only root” config file was missing. If you want it to work on Fedora 29, you’ll either have to figure out which packages are missing or (what I would probably do if I really needed Fedora 29 at this point) set up everything with Fedora 28 and then run “dnf distro-sync --releasever=29” to upgrade it.
  
  November 25, 2018
  - Cesar
    
    Great suggestion Gregory!
    Many thanks.
    
    November 25, 2018
Cristian Gherman

Network/hostname settings can be done much more easily in the Calamares installer.

November 23, 2018
Michael Charissis

iPXE should support HTTP and FTP delivery of the kernel and initramfs. Removing all local storage may be a requirement in certain environments. Thank you for taking the time to share this article.

November 25, 2018
Juan Antonio

What about the Linux Terminal Server Project?
What about solutions like little VNC clients and an on-demand virtual machine server?

We currently have 200+ computers in a student lab
with net booting and an NBD (Network Block Device) as the root filesystem, configured as fat clients with LTSP and LDAP as the auth method.

No problems at all, booting fast, and we really enjoy it… but we need to upgrade (we are using Ubuntu 16.04) and we are studying alternatives like a virtual machine cloud server and so on.

Frankly, this tutorial is very interesting, but I’m not sure how easily it would scale up to 200+ simultaneous clients. What about the K12Linux project? Is it still alive?

November 26, 2018
- Gregory Bartholomew
  
  200 simultaneous clients does sound like quite a challenge! I’m not sure that a terminal server solution would be any better though. With netbooting, the RAM and CPU of the clients are being utilized and only the storage is centralized. With a terminal server, the server would have to handle all three — RAM, processing, and disk IO. I guess it all depends on exactly what you have your clients running though.
  
  NBD or iSCSI or AoE are all fine alternatives to NFS. As I stated in the first paragraph of the article — this is one method of building a netboot server. There are many alternatives though; each with their trade-offs.
  
  Sorry, I don’t know anything about K12Linux.
  
  November 26, 2018
- Oscar
  
  IsardVDI (https://github.com/isard-vdi/isard) could be an option (it’s done by mates from Barcelona). Another one much more “famous” is Guacamole (https://guacamole.apache.org)
  
  December 12, 2018
Drew

Great instructions! It is somewhat over my head, but how else am I going to learn?

Is the entire image served read-only? If so, can you make the client image mount a location to store local configurations, like /home?

I’m sorry if you already mentioned this.

November 28, 2018
Gregory Bartholomew

Yes, the entire image is served read-only. I’m currently working on a “Part 2” that will show how to add writable home directories, but I want to show how to do it with kerberos authentication and that is making the article very long, so it looks like it will be broken up into more posts — one on how to configure the home directories on the server and then another on how to reconfigure the netboot client to mount them.

BTW: It is possible to make a writable version of the image, but you would have to set up a system that “cloned” a common template on the server for each user and the user would have to authenticate through iPXE before it attempts to mount the image.

November 28, 2018
Justin

We have a setup similar to this at my work for thin clients, which also run Fedora 28. The clients boot over DHCP and PXE. An issue we’ve encountered is running out of RAM, since some of the programs do a lot of logging and eventually the thin client crashes. We experimented with using the overlay kernel parameter with a directory on a local disk that all the clients have, but couldn’t get it to work. Are there any guides for this sort of setup?

December 2, 2018
- Gregory Bartholomew
  
  I had a quick look at the overlayfs documentation (https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt). It looks like it has some caveats about using it with NFS. Personally, if I wanted a client-side overlay for a read-only netboot target like that, I’d probably switch to using iSCSI for serving the image and write a custom dracut hook (http://man7.org/linux/man-pages/man7/dracut.modules.7.html) to setup the cow overlay at the block level with dmsetup (https://linuxgazette.net/114/kapil.html).
  
  Or, alternatively, you could set up multiple COW overlays server-side on a per-user or per-computer basis before exporting them.
  
  Or, if you can use NBD, it has a “copyonwrite” option that can create temporary COWs (see “man 5 nbd-server” if you have the “nbd” package installed).
  
  Like the Perl folks say: there’s more than one way to do it.
  
  December 3, 2018