Mkosi-initrd

Warning: This is a PoC. The initrd is a vital part of your system, so please be careful: keep a working copy or be ready to roll back to a previous snapshot.

Introduction

mkosi is a tool developed by the systemd project to build OS images (like kiwi), but it also builds the initrds used by its bootable images. This functionality started as a separate project in May 2021 and has been part of the main project since Dec 2023. Since commit 99ca14c, mkosi provides a separate mkosi-initrd script to simplify this process.

Pros

  • It builds the initrd from RPM packages.
  • Clear ownership of bugs: no third party (e.g., dracut) decides what is installed from each package, or modifies packages by adding custom udev rules or systemd services. E.g., if LVM fails at early boot, the lvm2 package is primarily responsible for it.
  • Any change to a package configured to be included automatically applies to the initrd (if the package triggers an initrd rebuild). E.g., if a new systemd version requires a new binary, the initrd generator does not need to be updated.
  • It shifts the maintenance effort from the initrd generator to the responsible package. E.g., when systemd decided to make /usr read-only in the initrd, the initrd generator would not have been affected.

Cons

  • Initrd generation is slower (zypper ref + in). Times improve after enabling RPM file caching (zypper mr -k --all), but it's still slow compared to dracut.
  • Larger initrd size. This will require careful packaging in the long term, but in the meantime it can be mitigated with RemoveFiles= in the configuration (currently, this reduces the size from ~100MB to ~50MB on Tumbleweed).
  • More frequent initrd rebuilds (triggered by updates to the configured packages and their dependencies).
  • Limited arch support: x86_64 (UEFI/BIOS), aarch64.
  • No "official" support (yet) for all startup setups: NVMe-TCP, NBD, FCoE.


Features

Architectures

   x86_64 (UEFI/BIOS)  aarch64  riscv64  s390x  ppc64le
   yes                 yes      no       no     no

Local setups

   IDE/SATA/SCSI  virtiofs  LUKS  LVM  RAID  Multipath
   yes            yes       yes*  yes  yes*  yes*

Network setups

   NFS   CIFS  iSCSI  NVMe-TCP  NBD  FCoE
   yes*  no    yes*   no        no   no

* Tested, but requires some manual steps.


Initial enablement for openSUSE Tumbleweed

The mkosi-initrd package has been available in Factory since snapshot 20240801 (v24.3). The current version is v25.3 (available since 20250204).

   localhost:~ # zypper mr -k --all
   localhost:~ # zypper ref
   localhost:~ # zypper in mkosi-initrd

Set INITRD_GENERATOR=mkosi-initrd in /etc/sysconfig/bootloader to use mkosi-initrd when a package calls %{?regenerate_initrd_post} or %{?regenerate_initrd_posttrans} in an RPM scriptlet:

   localhost:~ # echo "INITRD_GENERATOR=mkosi-initrd" >> /etc/sysconfig/bootloader
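
Since >> appends, repeating the command above adds duplicate lines to /etc/sysconfig/bootloader. A minimal sketch of an idempotent alternative, assuming the usual KEY=value format of sysconfig files; set_sysconfig_var is a hypothetical helper, not part of any package:

```shell
#!/bin/bash
# Hypothetical helper (not shipped by any package): set KEY=VALUE in a
# sysconfig-style file, replacing an existing assignment instead of
# appending a duplicate line.
# Assumes KEY and VALUE contain no characters special to sed ('|', '&', '\').
# Usage: set_sysconfig_var FILE KEY VALUE
set_sysconfig_var() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file" 2> /dev/null; then
        # Rewrite every existing assignment of KEY in place
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # KEY not set yet (or FILE missing): append it
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root):
# set_sysconfig_var /etc/sysconfig/bootloader INITRD_GENERATOR mkosi-initrd
```

Running it twice leaves a single INITRD_GENERATOR line, and running it with another value (e.g., dracut) switches the generator back.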

To try experimental features, or to use the current development version from the upstream repository directly, clone the GitHub repository and create symlinks to the provided shims:

   localhost:~ # git clone http://github.com/systemd/mkosi
   localhost:~ # ln -s $PWD/mkosi/bin/mkosi /usr/local/bin/mkosi
   localhost:~ # ln -s $PWD/mkosi/bin/mkosi-initrd /usr/local/bin/mkosi-initrd

To make zypper (i.e., the RPM scriptlets that regenerate the initrd) use the git version, create the symlinks in /usr/bin instead.

Transactional systems using GRUB2

Open a transactional shell, install mkosi-initrd, then add the PoC repository and install transactional-update from it:

   localhost:~ # transactional-update shell
   transactional update # zypper mr -k --all
   transactional update # zypper ref
   transactional update # zypper in mkosi-initrd
   transactional update # zypper ar http://download.opensuse.org/repositories/home:/afeijoo:/poc/openSUSE_Tumbleweed/?ssl_verify=no mkosi-initrd-poc
   transactional update # zypper ref mkosi-initrd-poc
   transactional update # zypper in --from mkosi-initrd-poc transactional-update

After that, set INITRD_GENERATOR=mkosi-initrd in /etc/sysconfig/bootloader, as described above for regular systems:

   transactional update # echo "INITRD_GENERATOR=mkosi-initrd" >> /etc/sysconfig/bootloader

Then, exit the transactional shell and reboot the system to boot from the new snapshot:

   transactional update # exit
   localhost:~ # reboot

After rebooting, mkosi-initrd can be called from a transactional shell, or directly with transactional-update initrd (remember to reboot after that):

   localhost:~ # transactional-update initrd
   localhost:~ # reboot

Transactional systems using systemd-boot

Not supported yet.

Usage

See man mkosi-initrd(1) for details.


Configuration

  • The distribution configuration is installed in /usr/lib/mkosi-initrd/mkosi.conf{,.d/*.conf}.
  • Custom configuration can be added in /etc/mkosi-initrd/mkosi.conf{,.d/*.conf}.

See man mkosi(1) for details.


Enable new zypper media backend and parallel download support

If you don't want to enable RPM file caching for your host repositories, you can try this new zypper feature with the following configuration:

   # cat /etc/mkosi-initrd/mkosi.conf 
   [Build]
   Environment=
           ZYPP_PCK_PRELOAD=1
           ZYPP_CURL2=1

For more details, see http://lists.opensuse.org/archives/list/factory@lists.opensuse.org/thread/LOCZIG43MFJSTUIQ3VH2CRSYRCBNR4O7/


Known limitations

  • At this early stage, you will need to do many things manually, e.g., add packages, configuration files or even kernel modules.


Internals

This section describes how to perform some actions that users take for granted, because traditional initrd generators (such as dracut) pick up configuration from the running system and override or insert custom services, binaries and scripts.

Everything described in this section adds functionality to the initrd using configuration snippets and files in mkosi.extra under /etc/mkosi-initrd, but most of it (not all) could be packaged and installed from an external repository, adding the package name via Packages= in mkosi.conf.


Access the emergency shell

If the emergency shell is reached because the real root failed to mount, or simply because the user requested it (e.g., by appending rd.emergency to the kernel command line), the following error may appear:

   Cannot open access to console, the root account is locked.
   See sulogin(8) man page for more details.
   
   Press Enter to continue.

But, why? Taking a look at the real emergency.service (the one provided by systemd, not the one injected by dracut), we can see that it executes systemd-sulogin-shell.

   # systemctl cat emergency.service | grep Exec
   ExecStartPre=-plymouth --wait quit
   ExecStart=-/usr/lib/systemd/systemd-sulogin-shell emergency

By default, this wrapper just calls sulogin without any options. Since mkosi-initrd builds the initrd from distribution packages, the host's root password is not included in it, and without the --force option sulogin refuses to open a shell when the root account is locked.

Here are two different ways to work around this issue:

  • Append SYSTEMD_SULOGIN_FORCE=1 to the kernel command line, so that systemd-sulogin-shell calls sulogin with --force and the emergency shell can be opened without a password.
  • Include the host's root password hash in the initrd (via /etc/shadow) using the mkosi-initrd configuration:
   # mkdir -p /etc/mkosi-initrd/mkosi.extra/etc
   # touch /etc/mkosi-initrd/mkosi.extra/etc/shadow
   # chmod 640 /etc/mkosi-initrd/mkosi.extra/etc/shadow
   # chown root:shadow /etc/mkosi-initrd/mkosi.extra/etc/shadow
   # grep '^root:' /etc/shadow >> /etc/mkosi-initrd/mkosi.extra/etc/shadow
   # tree /etc/mkosi-initrd/
   /etc/mkosi-initrd/
   └── mkosi.extra
       └── etc
           └── shadow
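
The commands above append with >>, so running them twice leaves two root entries in the file. A minimal sketch of the same steps as a rerunnable function; copy_root_shadow is a hypothetical name, and the chown step is only attempted when running as root on a system with a shadow group:

```shell
#!/bin/bash
# Hypothetical helper: copy only the root entry of a shadow file into the
# mkosi.extra tree, overwriting any previous copy so it is safe to rerun.
# Usage: copy_root_shadow SRC_SHADOW DEST_SHADOW
copy_root_shadow() {
    local src="$1" dest="$2"
    mkdir -p "$(dirname "$dest")" || return 1
    # Create the file with restrictive permissions before writing the hash
    (umask 077 && : > "$dest") || return 1
    chmod 640 "$dest" || return 1
    if ! grep '^root:' "$src" > "$dest"; then
        echo "copy_root_shadow: no root entry read from $src" >&2
        return 1
    fi
    # Same ownership as in the commands above, when possible
    if [ "$(id -u)" -eq 0 ] && getent group shadow > /dev/null 2>&1; then
        chown root:shadow "$dest"
    fi
}

# Example (as root):
# copy_root_shadow /etc/shadow /etc/mkosi-initrd/mkosi.extra/etc/shadow
```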


Add breakpoints in the boot process

This section is kept for reference only, since systemd v258 will provide this functionality via rd.systemd.break (see man systemd-debug-generator(8)).

Stopping the boot process at a specific point to open a shell is another feature provided by classic initrd generators (e.g., dracut's rd.break= option). However, mkosi-initrd is not a classic initrd generator and will not "inject" custom systemd services into the boot process, so until systemd provides this functionality, it's up to users to add their own breakpoints (see man bootup(7) for a clear view of where to order them).

Here is an example adding the following rd.break= options:

  • rd.break=udev: at the very beginning, before systemd-udevd.service starts.
  • rd.break=mount: before the real root is mounted.
  • rd.break=cleanup: at the end, before cleanup and switching root.
   # tree /etc/mkosi-initrd/
   /etc/mkosi-initrd/
   └── mkosi.extra
       └── usr
           └── lib
               └── systemd
                   ├── system
                   │   ├── mkosi-initrd-break-cleanup.service
                   │   ├── mkosi-initrd-break-mount.service
                   │   └── mkosi-initrd-break-udev.service
                   └── system-preset
                       └── 10-mkosi-initrd.preset
   
   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system-preset/10-mkosi-initrd.preset
   enable mkosi-initrd-break-udev.service
   enable mkosi-initrd-break-mount.service
   enable mkosi-initrd-break-cleanup.service
   
   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/mkosi-initrd-break-udev.service
   [Unit]
   Description=mkosi-initrd breakpoint udev
   AssertPathExists=/etc/initrd-release
   DefaultDependencies=no
   Conflicts=shutdown.target emergency.target
   Wants=systemd-journald.socket
   After=systemd-journald.socket
   Before=systemd-udevd.service
   ConditionKernelCommandLine=rd.break=udev
   
   [Service]
   Environment=HOME=/root
   Environment=PS1="udev:$PWD# "
   WorkingDirectory=/root
   Type=oneshot
   ExecStartPre=-plymouth --wait quit
   ExecStart=-/sbin/sulogin --force
   StandardInput=tty-force
   StandardOutput=inherit
   StandardError=inherit
   TasksMax=infinity
   IgnoreSIGPIPE=no
   KillMode=process
   KillSignal=SIGHUP
   
   [Install]
   WantedBy=initrd.target
   
   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/mkosi-initrd-break-mount.service
   [Unit]
   Description=mkosi-initrd breakpoint mount
   AssertPathExists=/etc/initrd-release
   DefaultDependencies=no
   Conflicts=shutdown.target emergency.target
   After=mkosi-initrd-break-udev.service sysinit.target
   Before=initrd-root-fs.target sysroot.mount systemd-fsck-root.service
   ConditionKernelCommandLine=rd.break=mount
   
   [Service]
   Environment=HOME=/root
   Environment=PS1="mount:$PWD# "
   WorkingDirectory=/root
   Type=oneshot
   ExecStartPre=-plymouth --wait quit
   ExecStart=-/sbin/sulogin --force
   StandardInput=tty-force
   StandardOutput=inherit
   StandardError=inherit
   TasksMax=infinity
   IgnoreSIGPIPE=no
   KillMode=process
   KillSignal=SIGHUP
   
   [Install]
   WantedBy=initrd.target
   
   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/mkosi-initrd-break-cleanup.service
   [Unit]
   Description=mkosi-initrd breakpoint cleanup
   AssertPathExists=/etc/initrd-release
   DefaultDependencies=no
   Conflicts=shutdown.target emergency.target
   Wants=remote-fs.target
   After=initrd.target initrd-parse-etc.service sysroot.mount remote-fs.target mkosi-initrd-break-mount.service
   Before=initrd-cleanup.service
   ConditionKernelCommandLine=rd.break=cleanup
   
   [Service]
   Environment=HOME=/root
   Environment=PS1="cleanup:$PWD# "
   WorkingDirectory=/root
   Type=oneshot
   ExecStartPre=-plymouth --wait quit
   ExecStart=-/sbin/sulogin --force
   StandardInput=tty-force
   StandardOutput=inherit
   StandardError=inherit
   TasksMax=infinity
   IgnoreSIGPIPE=no
   KillMode=process
   KillSignal=SIGHUP
   
   [Install]
   WantedBy=initrd.target
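
The three units above differ only in the breakpoint name and in their ordering dependencies. A minimal sketch that generates them from a single template; write_break_unit is a hypothetical helper, and the ordering lines are passed in verbatim so they can be copied from the units above:

```shell
#!/bin/bash
# Hypothetical helper: generate mkosi-initrd-break-<NAME>.service in DIR.
# Usage: write_break_unit DIR NAME ORDERING_LINES
# ORDERING_LINES are the [Unit] dependency lines (Wants=/After=/Before=)
# that place the breakpoint in the boot process, separated by newlines.
write_break_unit() {
    local dir="$1" name="$2" ordering="$3"
    mkdir -p "$dir" || return 1
    # \$PWD is escaped so it is written literally; the interactive shell
    # expands it when it displays the prompt
    cat > "$dir/mkosi-initrd-break-$name.service" << EOF
[Unit]
Description=mkosi-initrd breakpoint $name
AssertPathExists=/etc/initrd-release
DefaultDependencies=no
Conflicts=shutdown.target emergency.target
$ordering
ConditionKernelCommandLine=rd.break=$name

[Service]
Environment=HOME=/root
Environment=PS1="$name:\$PWD# "
WorkingDirectory=/root
Type=oneshot
ExecStartPre=-plymouth --wait quit
ExecStart=-/sbin/sulogin --force
StandardInput=tty-force
StandardOutput=inherit
StandardError=inherit
TasksMax=infinity
IgnoreSIGPIPE=no
KillMode=process
KillSignal=SIGHUP

[Install]
WantedBy=initrd.target
EOF
}

# Example: the mount breakpoint shown above
# write_break_unit /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system mount \
#     'After=mkosi-initrd-break-udev.service sysinit.target
# Before=initrd-root-fs.target sysroot.mount systemd-fsck-root.service'
```

The preset file still has to list each generated unit, as in 10-mkosi-initrd.preset.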


Decrypt devices

Since v25, encrypted devices marked with the x-initrd.attach option in /etc/crypttab are decrypted automatically and the crypto kernel modules are always included, so no further configuration should be needed.

With mkosi-initrd v24 or lower, add /etc/crypttab and necessary crypto kernel modules:

   # cat /etc/mkosi-initrd/mkosi.conf 
   [Content]
   ExtraTrees=/etc/crypttab:/etc/crypttab
   KernelModulesInclude=crypto/
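
To check which volumes v25 or later should pick up automatically, a minimal sketch that lists the /etc/crypttab entries whose options (the fourth field, see crypttab(5)) include x-initrd.attach; initrd_crypttab_names is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: print the names of crypttab entries marked with the
# x-initrd.attach option.
# Usage: initrd_crypttab_names [CRYPTTAB]   (defaults to /etc/crypttab)
initrd_crypttab_names() {
    awk '
        # Skip comment and blank lines
        /^[ \t]*#/ || NF == 0 { next }
        {
            # Field 4 holds the comma-separated options
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "x-initrd.attach") { print $1; break }
        }
    ' "${1:-/etc/crypttab}"
}
```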


Plymouth

Add the following configuration file with the necessary packages:

   # cat /etc/mkosi-initrd/mkosi.conf.d/10-plymouth.conf
   [Content]
   Packages=
           cantarell-fonts
           distribution-logos-openSUSE-Tumbleweed
           plymouth
           plymouth-branding-openSUSE


Assemble RAID arrays

Add the mdadm package, /etc/mdadm.conf and md kernel modules:

   # cat /etc/mkosi-initrd/mkosi.conf 
   [Content]
   ExtraTrees=/etc/mdadm.conf:/etc/mdadm.conf
   KernelModulesInclude=md/
   Packages=mdadm


Set up network

Before being able to boot from a root device located over the network (NFS, iSCSI, etc.), the network must be configured and running in the initrd. Let's dive deeper into this feature to show some issues that arise from how mkosi-initrd is designed, and how to resolve them.

The following example will set up NetworkManager using dracut's module 35network-manager as a reference (although it should also work similarly with other network handlers, like systemd-networkd).

First, create a configuration file with the necessary packages and kernel modules. E.g.:

   # cat /etc/mkosi-initrd/mkosi.conf.d/20-network.conf 
   [Content]
   Packages=
           NetworkManager
   
           # For libnss_dns
           glibc
   
           # For libnss_mdns4
           nss-mdns
   
           # Nice to have for debugging
           iproute2
           iputils
   
   # Add modules not loaded when the initrd is built
   # KernelModulesInclude=
   #       net/...

The dracut module shows that it also needs a specific configuration snippet:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/NetworkManager/conf.d/nm-initrd.conf 
   [.config]
   enable=env:initrd
   
   [main]
   no-auto-default=*

Since version 1.54, NetworkManager provides its own systemd services to enable networking in the initrd: NetworkManager-config-initrd.service, NetworkManager-initrd.service and NetworkManager-wait-online-initrd.service. The services implemented in the rest of this section are kept for reference.

Now to the boot process. First, the kernel command line must be parsed to handle the network-specific options. With dracut, this is done by a cmdline hook using dracut's internal libraries (mkosi-initrd does not provide anything like that, and it will not do so). Luckily, NetworkManager ships a helper binary that already implements this (see man nm-initrd-generator(8) for details), so a systemd service ordered early at boot is enough:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nm-config-initrd.service 
   [Unit]
   Description=Network Manager Configuration (initrd)
   DefaultDependencies=no
   Wants=systemd-journald.socket
   After=systemd-journald.socket
   Before=systemd-udevd.service
   ConditionPathExists=/etc/initrd-release
   
   [Service]
   Type=oneshot
   ExecStart=/bin/bash -c "/usr/libexec/nm-initrd-generator -- $(< /proc/cmdline)"
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

At this point, the NetworkManager configuration should be in /run/NetworkManager/system-connections, so let's rework dracut's nm-initrd.service a bit (by the way, D-Bus should not be used in the initrd, but that is out of scope here):

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nm-initrd.service 
   [Unit]
   Description=Network Manager (initrd)
   DefaultDependencies=no
   Wants=systemd-udev-trigger.service network.target
   After=systemd-udev-trigger.service dbus.service nm-config-initrd.service
   Before=network.target
   ConditionPathExists=/etc/initrd-release
   ConditionPathExistsGlob=|/usr/lib/NetworkManager/system-connections/*
   ConditionPathExistsGlob=|/run/NetworkManager/system-connections/*
   ConditionPathExistsGlob=|/etc/NetworkManager/system-connections/*
   
   [Service]
   Type=dbus
   BusName=org.freedesktop.NetworkManager
   ExecReload=/usr/bin/busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 0
   ExecStart=/usr/sbin/NetworkManager
   KillMode=process
   Environment=NM_CONFIG_ENABLE_TAG=initrd
   Restart=on-failure
   ProtectSystem=true
   ProtectHome=read-only
   
   [Install]
   WantedBy=initrd.target
   Also=nm-config-initrd.service nm-wait-online-initrd.service

Finally, nm-wait-online-initrd.service:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nm-wait-online-initrd.service 
   [Unit]
   Description=Network Manager Wait Online (initrd)
   DefaultDependencies=no
   Requires=nm-initrd.service
   After=nm-initrd.service
   Before=network-online.target
   ConditionPathExists=/etc/initrd-release
   ConditionPathExistsGlob=|/usr/lib/NetworkManager/system-connections/*
   ConditionPathExistsGlob=|/run/NetworkManager/system-connections/*
   ConditionPathExistsGlob=|/etc/NetworkManager/system-connections/*
   
   [Service]
   Type=oneshot
   ExecStart=/usr/bin/nm-online -s -q -t 3600
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target network-online.target

Preset file to enable these services:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system-preset/20-network.preset 
   enable nm-config-initrd.service
   enable nm-initrd.service
   enable nm-wait-online-initrd.service

The specific configuration should end up looking like this:

   # tree /etc/mkosi-initrd/
   /etc/mkosi-initrd/
   ├── mkosi.conf.d
   │   └── 20-network.conf
   └── mkosi.extra
       └── usr
           └── lib
               ├── NetworkManager
               │   └── conf.d
               │       └── nm-initrd.conf
               └── systemd
                   ├── system
                   │   ├── nm-config-initrd.service
                   │   ├── nm-initrd.service
                   │   └── nm-wait-online-initrd.service
                   └── system-preset
                       └── 20-network.preset

To show that this works, pass ip=dhcp on the kernel command line and check the state from an emergency shell before switching root:

   localhost:~ # systemctl status nm-config-initrd.service
   ● nm-config-initrd.service - Network Manager Configuration (initrd)
        Loaded: loaded (/usr/lib/systemd/system/nm-config-initrd.service; enabled; preset: enabled)
        Active: active (exited) since Mon 2024-11-11 14:55:02 UTC; 3min 10s ago
    Invocation: 35dfacb4b84a4873927c5641acabc4eb
       Process: 217 ExecStart=/bin/bash -c /usr/libexec/nm-initrd-generator -- $(< /proc/cmdline) (code=exited, status=0/SUCCESS)
      Main PID: 217 (code=exited, status=0/SUCCESS)
           CPU: 6ms
   
   localhost:~ # ls -l /run/NetworkManager/system-connections/
   total 8
   -rw------- 1 root root 360 Nov 11 14:55 default_connection.nmconnection
   -rw------- 1 root root 314 Nov 11 14:55 lo.nmconnection
   
   localhost:~ # systemctl status nm-initrd.service | cat
   ○ nm-initrd.service - Network Manager (initrd)
        Loaded: loaded (/usr/lib/systemd/system/nm-initrd.service; enabled; preset: enabled)
        Active: inactive (dead) since Mon 2024-11-11 14:55:02 UTC; 3min 43s ago
    Invocation: f906961ee6c34244ad05cb2c5f56a85a
       Process: 271 ExecStart=/usr/sbin/NetworkManager (code=exited, status=0/SUCCESS)
      Main PID: 271 (code=exited, status=0/SUCCESS)
         Tasks: 4 (limit: 4458)
           CPU: 74ms
        CGroup: /system.slice/nm-initrd.service
                └─296 /usr/sbin/NetworkManager
   
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8828] device (enp1s0): state change: ip-config -> ip-check (reason 'none', managed-type: 'full')
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8861] device (enp1s0): state change: ip-check -> secondaries (reason 'none', managed-type: 'full')
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8863] device (enp1s0): state change: secondaries -> activated (reason 'none', managed-type: 'full')
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8868] manager: NetworkManager state is now CONNECTED_SITE
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8872] device (enp1s0): Activation: successful, device activated.
   Nov 11 14:55:02 localhost.localdomain NetworkManager[296]: <info>  [1731336902.8882] manager: startup complete
   Nov 11 14:55:03 localhost.localdomain NetworkManager[296]: <info>  [1731336903.0087] manager: NetworkManager state is now CONNECTED_GLOBAL
   
   localhost:~ # systemctl status nm-wait-online-initrd.service
   ● nm-wait-online-initrd.service - Network Manager Wait Online (initrd)
        Loaded: loaded (/usr/lib/systemd/system/nm-wait-online-initrd.service; enabled; preset: enabled)
       Drop-In: /etc/systemd/system/nm-wait-online-initrd.service.d
                └─mkosi-initrd.conf
        Active: active (exited) since Mon 2024-11-11 14:55:02 UTC; 4min 17s ago
    Invocation: b3a398159a4b4768a0b4182f2368d194
       Process: 301 ExecStart=/usr/bin/nm-online -s -q -t 3600 (code=exited, status=0/SUCCESS)
      Main PID: 301 (code=exited, status=0/SUCCESS)
           CPU: 14ms
   
   Nov 11 14:55:02 localhost systemd[1]: Starting nm-wait-online-initrd.service...
   Nov 11 14:55:02 localhost.localdomain systemd[1]: Finished nm-wait-online-initrd.service.
   
   localhost:~ # ping www.opensuse.org
   PING pRoXy-prg2.opensuse.org (195.135.223.50) 56(84) bytes of data.
   64 bytes from legacy-ip.atlas.opensuse.org (195.135.223.50): icmp_seq=1 ttl=48 time=59.5 ms
   64 bytes from legacy-ip.atlas.opensuse.org (195.135.223.50): icmp_seq=2 ttl=48 time=60.4 ms
   64 bytes from legacy-ip.atlas.opensuse.org (195.135.223.50): icmp_seq=3 ttl=48 time=60.4 ms
   
   --- pRoXy-prg2.opensuse.org ping statistics ---
   3 packets transmitted, 3 received, 0% packet loss, time 2003ms
   rtt min/avg/max/mdev = 59.478/60.094/60.417/0.436 ms


Root file system over NFS

Warning: Some parts of this section are crafted as a PoC, just to show that everything that works with other initrd generators can also work with mkosi-initrd.

After the network is up and running in the initrd, the next step is mounting the root file system remotely. The first attempt is NFS, again using a dracut module (95nfs) as a reference, as in the previous section.

First, create a configuration file with the necessary packages and kernel modules:

   localhost:/etc/mkosi-initrd # cat mkosi.conf.d/30-nfs.conf
   [Content]
   Packages=
           nfs-client
           libnfsidmap1
           rpcbind
           
           # netconfig
           libtirpc-netconfig
           netcfg
           
           # /etc/nsswitch.conf
           # /etc/rpc
           glibc
           
           # /usr/lib/modprobe.d/nfs.conf
           # alias nfs4 nfs
           suse-module-tools
           
           # libnss_*
           glibc
           libnss_usrfiles2
           nss-mdns
           
           # pidof
           procps
           
           # mount, kill
           util-linux
           
           grep
           
   KernelModulesInclude=
           /nfs_acl.ko
           fs/nfs/
           net/ipv6/
           net/sunrpc/

Service to parse the kernel command line and start the RPC daemons:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nfs-start-initrd.service
   [Unit]
   Description=NFS start (initrd)
   DefaultDependencies=no
   Requires=modprobe@sunrpc.service rpc_pipefs.target
   Wants=systemd-journald.socket
   After=systemd-journald.socket modprobe@sunrpc.service rpc_pipefs.target
   Before=systemd-udevd.service
   ConditionPathExists=/etc/initrd-release
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/nfs-start.sh
   RemainAfterExit=yes
   KillMode=process
   KillSignal=SIGHUP
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/nfs-start.sh (like the other two scripts below, it must be executable, since the services run it directly):

   #!/bin/bash
   # Parts of the code extracted from http://github.com/openSUSE/dracut/blob/SUSE/059/modules.d/95nfs/nfs-lib.sh
   
   # root=nfs:[<server-ip>:]<root-dir>[:<nfs-options>]
   # root=nfs4:[<server-ip>:]<root-dir>[:<nfs-options>]
   nfsroot_to_var() {
       # strip nfs[4]:
       local arg="$*:"
       nfs="${arg%%:*}"
       arg="${arg##"$nfs":}"
   
       # check if we have a server
       if [ "${arg##*:/*}" != "$arg" ]; then
           server="${arg%%:/*}"
           arg="/${arg##*:/}"
       fi
   
       path="${arg%%:*}"
   
       # rest are options
       options="${arg##"$path"}"
       # strip leading ":"
       options="${options##:}"
       # strip trailing ":"
       options="${options%%:}"
   
       # Does it really start with '/'?
       [ -n "${path%%/*}" ] && path="error"
   
       # Fix kernel legacy style separating path and options with ','
       if [ "$path" != "${path#*,}" ]; then
           options=${path#*,}
           path=${path%%,*}
       fi
   }
   
   # RFC2224: nfs://<server>[:<port>]/<path>
   rfc2224_nfs_to_var() {
       nfs="nfs"
       server="${1#nfs://}"
       path="/${server#*/}"
       server="${server%%/*}"
       server="${server%%:}" # anaconda compat (nfs://<server>:/<path>)
       local port="${server##*:}"
       [ "$port" != "$server" ] && options="port=$port"
   }
   
   # nfs_to_var NFSROOT
   # use NFSROOT to set $nfs, $server, $path, and $options.
   # NFSROOT is something like: nfs[4]:<server>:/<path>[:<options>|,<options>]
   nfs_to_var() {
       # Unfortunately, there are multiple styles of nfs "URL" in use, so we need
       # extra functions to parse them into $nfs, $server, $path, and $options.
       case "$1" in
           nfs://*) rfc2224_nfs_to_var "$1" ;;
           *) nfsroot_to_var "$1" ;;
       esac
   }
   
   nfs_parse_cmdline() {
       local _i _cmdline=()
       local _root
       local _nfsroot
   
       # Split the kernel command line on whitespace (no glob expansion)
       read -r -a _cmdline < /proc/cmdline
       for _i in "${_cmdline[@]}"; do
           # Strip only up to the first '=': the value itself may contain '='
           # (e.g., root=nfs:<server>:/<path>:vers=3)
           [[ "${_i%%=*}" == "root" ]] && _root="${_i#*=}"
           [[ "${_i%%=*}" == "nfsroot" ]] && _nfsroot="${_i#*=}"
       done
   
       if [[ -n "$_nfsroot" ]] && [[ "$_root" != "/dev/nfs" ]]; then
           echo "nfs-start: kernel command line option nfsroot= only accepted for root=/dev/nfs"
           return 1
       fi
   
       [[ -z "$_nfsroot" ]] && _nfsroot="$_root"
       case "${_nfsroot%%:*}" in
           nfs | nfs4) ;;
           *)
               return 255
               ;;
       esac
   
       # nfs_to_var handles both the nfs[4]:... and nfs://... (RFC2224) syntaxes
       nfs_to_var "$_nfsroot"
       if [[ "$path" == "error" ]]; then
           echo "nfs-start: argument nfsroot must contain a valid path"
           return 1
       fi
       if [[ -z "$server" ]]; then
           echo "nfs-start: required parameter 'server' is missing"
           return 1
       fi
   
       return 0
   }
   
   if nfs_parse_cmdline; then
       # set $nfs, $server, $path, and $options for nfs-mount.sh
       {
           echo "nfs=$nfs"
           echo "server=$server"
           echo "path=$path"
           echo "options=$options"
       } > /tmp/nfs.mount
   
       # Start rpcbind
       if command -v rpcbind > /dev/null && [ -z "$(pidof rpcbind)" ]; then
           # Create default state directory for distros that do not create it via
           # tmpfiles conf file
           if ! [[ -e /usr/lib/tmpfiles.d/rpcbind.conf ]]; then
               mkdir -m 0700 -p /run/rpcbind
               _rpcuser=$(grep -m1 -E '^_rpc:|^nfsnobody:|^rpc:|^rpcuser:' /etc/passwd)
               [[ -n "$_rpcuser" ]] && chown "${_rpcuser%%:*}": /run/rpcbind
           fi
           rpcbind
       fi
   
       # Start rpc.statd as mount won't let us use locks on a NFSv4 filesystem
       # without talking to it.
       command -v rpc.statd > /dev/null && [ -z "$(pidof rpc.statd)" ] && rpc.statd
   
       # Start rpc.idmapd in case nfs4_disable_idmapping = 0
       command -v rpc.idmapd > /dev/null && [ -z "$(pidof rpc.idmapd)" ] && rpc.idmapd
   fi

Service to mount the root file system:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nfs-mount-initrd.service 
   [Unit]
   Description=NFS mount (initrd)
   DefaultDependencies=no
   Requires=network-online.target
   Wants=remote-fs-pre.target systemd-udev-trigger.service
   After=systemd-udev-trigger.service network-online.target
   Before=remote-fs-pre.target
   ConditionPathExists=/etc/initrd-release
   ConditionPathExists=/tmp/nfs.mount
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/nfs-mount.sh
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/nfs-mount.sh:

   #!/bin/bash
   # Parts of the code extracted from http://github.com/openSUSE/dracut/blob/SUSE/059/modules.d/95nfs/nfs-lib.sh
   
   # Look through $options, fix "rw"/"ro", move "lock"/"nolock" to $nfslock
   munge_nfs_options() {
       local f="" flags="" nfsrw="ro" OLDIFS="$IFS"
       IFS=,
       for f in $options; do
           case $f in
               ro | rw) nfsrw=$f ;;
               lock | nolock) nfslock=$f ;;
               *) flags=${flags:+$flags,}$f ;;
           esac
       done
       IFS="$OLDIFS"
   
       # Override rw/ro if set on cmdline
       grep -q -w ro /proc/cmdline && nfsrw=ro
       grep -q -w rw /proc/cmdline && nfsrw=rw
   
       options=$nfsrw${flags:+,$flags}
   }
   
   if [[ ! -e /tmp/nfs.mount ]]; then
       echo "nfs-mount: missing required /tmp/nfs.mount"
       exit 1
   fi
   
   # get $nfs, $server, $path, and $options
   . /tmp/nfs.mount
   
   munge_nfs_options
   if [ "$nfs" = "nfs4" ]; then
       options=$options${nfslock:+,$nfslock}
   else
       # NFSv{2,3} doesn't support using locks as it requires a helper to
       # transfer the rpcbind state to the new root
       [ "$nfslock" = "lock" ] \
           && echo "nfs-mount: locks unsupported on NFSv{2,3}, using nolock"
       options=$options,nolock
   fi
   
   if [[ -z "$nfs" ]] || [[ -z "$server" ]] || [[ -z "$path" ]] || [[ -z "$options" ]]; then
       echo "nfs-mount: missing required input values"
       exit 1
   fi
   
   mkdir -p /sysroot
   mount -t "$nfs" -o"$options" "$server:$path" /sysroot
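
To try the mount option handling without rebooting, a minimal sketch that reproduces the rules of munge_nfs_options and the lock handling above as a self-contained function, taking its inputs as arguments instead of reading /tmp/nfs.mount and /proc/cmdline; nfs_final_options is a hypothetical name, not part of dracut or mkosi:

```shell
#!/bin/bash
# Hypothetical helper for testing outside the initrd: applies the same rules
# as munge_nfs_options and the NFSv4 lock handling in nfs-mount.sh.
# Usage: nfs_final_options NFS OPTIONS CMDLINE
nfs_final_options() {
    local nfs="$1" options="$2" cmdline="$3"
    local f flags="" nfsrw="ro" nfslock="" oldifs="$IFS"

    # Split the options on commas, pulling out ro/rw and lock/nolock
    IFS=,
    for f in $options; do
        case $f in
            ro | rw) nfsrw=$f ;;
            lock | nolock) nfslock=$f ;;
            *) flags=${flags:+$flags,}$f ;;
        esac
    done
    IFS="$oldifs"

    # ro/rw as whole words on the kernel command line override the options
    printf '%s\n' "$cmdline" | grep -q -w ro && nfsrw=ro
    printf '%s\n' "$cmdline" | grep -q -w rw && nfsrw=rw

    options=$nfsrw${flags:+,$flags}
    if [ "$nfs" = "nfs4" ]; then
        options=$options${nfslock:+,$nfslock}
    else
        # NFSv2/3: locking is always disabled in the initrd
        options=$options,nolock
    fi
    printf '%s\n' "$options"
}

# Example:
# nfs_final_options nfs "rw,vers=3,lock" "root=nfs:192.168.122.1:/srv/nfsroot ip=dhcp"
# prints: rw,vers=3,nolock
```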

Service to stop the RPC daemons and move the rpc_pipefs mount into the new root before switching root:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/nfs-cleanup-initrd.service 
   [Unit]
   Description=NFS cleanup (initrd)
   DefaultDependencies=no
   Wants=remote-fs.target
   After=initrd.target initrd-parse-etc.service sysroot.mount remote-fs.target nfs-mount-initrd.service
   Before=initrd-cleanup.service
   ConditionPathExists=/etc/initrd-release
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/nfs-cleanup.sh
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/nfs-cleanup.sh:

   #!/bin/bash
   # Code extracted from http://github.com/openSUSE/dracut/blob/SUSE/059/modules.d/95nfs/nfsroot-cleanup.sh
   
   pid=$(pidof rpc.statd)
   [ -n "$pid" ] && kill "$pid"
   
   pid=$(pidof rpc.idmapd)
   [ -n "$pid" ] && kill "$pid"
   
   pid=$(pidof rpcbind)
   [ -n "$pid" ] && kill "$pid"
   
   rpcpipefspath=$(grep -w rpc_pipefs /proc/mounts | cut -d ' ' -f 2)
   if [[ -n $rpcpipefspath ]]; then
       [ -d "/sysroot${rpcpipefspath}" ] \
           || mkdir -m 0755 -p "/sysroot${rpcpipefspath}" 2> /dev/null
       if [ -d "/sysroot${rpcpipefspath}" ]; then
           mount --bind "$rpcpipefspath" "/sysroot${rpcpipefspath}"
       fi
       umount "$rpcpipefspath" 2> /dev/null
   fi

Preset file to enable these services:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system-preset/30-nfs.preset 
   enable nfs-cleanup-initrd.service
   enable nfs-mount-initrd.service
   enable nfs-start-initrd.service

The specific configuration should end up looking like this:

   # tree /etc/mkosi-initrd/
   /etc/mkosi-initrd/
   ├── mkosi.conf.d
   │   ├── 20-network.conf
   │   └── 30-nfs.conf
   └── mkosi.extra
       └── usr
           ├── lib
           │   ├── NetworkManager
           │   │   └── conf.d
           │   │       └── nm-initrd.conf
           │   └── systemd
           │       ├── system
           │       │   ├── nfs-cleanup-initrd.service
           │       │   ├── nfs-mount-initrd.service
           │       │   ├── nfs-start-initrd.service
           │       │   ├── nm-config-initrd.service
           │       │   ├── nm-initrd.service
           │       │   └── nm-wait-online-initrd.service
           │       └── system-preset
           │           ├── 20-network.preset
           │           └── 30-nfs.preset
           └── sbin
               ├── nfs-cleanup.sh
               ├── nfs-mount.sh
               └── nfs-start.sh

Some logs from the NFS server, showing how everything is ordered properly (after booting the client with root=nfs:MY_SERVER_IP:/srv/nfsroot,rw ip=dhcp):

   localhost:/ # journalctl --root=/srv/nfsroot -b | grep -e "Switching root" -e nm- -e nfs- -e rpc -e NFS -e RPC
   Nov 14 14:46:40 localhost systemd[1]: Starting modprobe@sunrpc.service...
   Nov 14 14:46:40 localhost systemd[1]: Starting nm-config-initrd.service...
   Nov 14 14:46:40 localhost kernel: RPC: Registered named UNIX socket transport module.
   Nov 14 14:46:40 localhost kernel: RPC: Registered udp transport module.
   Nov 14 14:46:40 localhost kernel: RPC: Registered tcp transport module.
   Nov 14 14:46:40 localhost kernel: RPC: Registered tcp-with-tls transport module.
   Nov 14 14:46:40 localhost kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
   Nov 14 14:46:40 localhost systemd[1]: modprobe@sunrpc.service: Deactivated successfully.
   Nov 14 14:46:40 localhost systemd[1]: Finished modprobe@sunrpc.service.
   Nov 14 14:46:40 localhost systemd[1]: Finished nm-config-initrd.service.
   Nov 14 14:46:40 localhost systemd[1]: Starting nfs-start-initrd.service...
   Nov 14 14:46:40 localhost rpc.statd[268]: Version 2.8.1 starting
   Nov 14 14:46:40 localhost rpc.statd[268]: Initializing NSM state
   Nov 14 14:46:40 localhost rpc.idmapd[273]: Setting log level to 0
   Nov 14 14:46:40 localhost systemd[1]: Finished nfs-start-initrd.service.
   Nov 14 14:46:40 localhost systemd[1]: Starting nm-initrd.service...
   Nov 14 14:46:40 localhost NetworkManager[333]: <info>  [1731592000.1451] Read config: /etc/NetworkManager/NetworkManager.conf (lib: conncheck-openSUSE.conf, nm-initrd.conf) (run: 15-carrier-timeout.conf)
   Nov 14 14:46:40 localhost systemd[1]: nm-initrd.service: Deactivated successfully.
   Nov 14 14:46:40 localhost systemd[1]: nm-initrd.service: Unit process 333 (NetworkManager) remains running after unit stopped.
   Nov 14 14:46:40 localhost systemd[1]: Started nm-initrd.service.
   Nov 14 14:46:40 localhost systemd[1]: Starting nm-wait-online-initrd.service...
   Nov 14 14:46:40 localhost.localdomain NetworkManager[333]: <info>  [1731592000.7377] Loaded device plugin: NMWifiFactory (/usr/lib64/NetworkManager/1.50.0/libnm-device-plugin-wifi.so)
   Nov 14 14:46:40 localhost.localdomain systemd[1]: Finished nm-wait-online-initrd.service.
   Nov 14 14:46:40 localhost.localdomain systemd[1]: Starting nfs-mount-initrd.service...
   Nov 14 14:46:41 localhost.localdomain kernel: NFS: Registering the id_resolver key type
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Finished nfs-mount-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Starting nfs-cleanup-initrd.service...
   Nov 14 14:46:41 localhost.localdomain rpc.idmapd[273]: exiting on signal 15
   Nov 14 14:46:41 localhost.localdomain systemd[1]: var-lib-nfs-rpc_pipefs.mount: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Finished nfs-cleanup-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: nfs-cleanup-initrd.service: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Stopped nfs-cleanup-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: nfs-mount-initrd.service: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Stopped nfs-mount-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: nm-wait-online-initrd.service: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Stopped nm-wait-online-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: nfs-start-initrd.service: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Stopped nfs-start-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: nm-config-initrd.service: Deactivated successfully.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Stopped nm-config-initrd.service.
   Nov 14 14:46:41 localhost.localdomain systemd[1]: Switching root.
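To check this ordering without reading the whole journal, a small helper like the following can be used. It is a sketch that assumes the plain journalctl output format shown above. check_unit_order is a hypothetical name, not part of mkosi or systemd. It only looks at "Finished <unit>." messages, so units that report "Started" instead (like nm-initrd.service) should not be passed to it.

```shell
# Hypothetical helper (sketch): verify that "Finished <unit>." messages
# appear in the given order in a saved journal dump.
# Usage: check_unit_order LOGFILE UNIT...
check_unit_order() {
    local log=$1
    shift
    local unit line prev=0
    for unit in "$@"; do
        # Line number of the first literal (-F) "Finished <unit>." message
        line=$(grep -n -F -m 1 "Finished ${unit}." "$log" | cut -d: -f1)
        if [[ -z $line ]]; then
            echo "check_unit_order: missing: $unit" >&2
            return 1
        fi
        if (( line <= prev )); then
            echo "check_unit_order: out of order: $unit" >&2
            return 1
        fi
        prev=$line
    done
    return 0
}
```

For the boot shown above, saving the output of journalctl --root=/srv/nfsroot -b to a file and passing nm-config-initrd.service nfs-start-initrd.service nm-wait-online-initrd.service nfs-mount-initrd.service nfs-cleanup-initrd.service should succeed.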


iSCSI

Warning: Some parts of this section are crafted as a PoC, just to show that everything that works with other initrd generators can also work with mkosi-initrd.

This will be a good litmus test. Using dracut's 95iscsi module as a reference, create a configuration file with the necessary packages and kernel modules:

   # cat /etc/mkosi-initrd/mkosi.conf.d/30-iscsi.conf
   [Content]
   Packages=
           open-iscsi
           iscsiuio
   
   KernelModulesInclude=
           /8021q.ko
           /crc32c.ko
           drivers/scsi/

dracut writes drop-ins for iscsid.service/socket and iscsiuio.service/socket; let's do the same:

   # tree /etc/mkosi-initrd/mkosi.extra/etc/systemd/system
   /etc/mkosi-initrd/mkosi.extra/etc/systemd/system
   ├── iscsid.service.d
   │   └── mkosi-initrd.conf
   ├── iscsid.socket.d
   │   └── mkosi-initrd.conf
   ├── iscsiuio.service.d
   │   └── mkosi-initrd.conf
   └── iscsiuio.socket.d
       └── mkosi-initrd.conf
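As a sketch, assuming the drop-ins mirror what dracut's 95iscsi module writes (DefaultDependencies=no plus explicit ordering against shutdown.target, and also sockets.target for the socket units), the tree above could be generated like this. write_iscsi_dropins is a hypothetical helper, and the exact [Unit] settings are an assumption that should be compared with dracut's module-setup.sh before use.

```shell
# Hypothetical helper (sketch): create the four mkosi-initrd.conf drop-ins
# under a mkosi.extra tree. ASSUMPTION: the settings mirror dracut's 95iscsi.
# Usage: write_iscsi_dropins EXTRA_DIR
write_iscsi_dropins() {
    local extra=$1 unit before dir
    for unit in iscsid.service iscsid.socket iscsiuio.service iscsiuio.socket; do
        before="shutdown.target"
        # DefaultDependencies=no drops the implicit Before=sockets.target
        # of socket units, so add it back explicitly
        if [[ $unit == *.socket ]]; then
            before="$before sockets.target"
        fi
        dir="$extra/etc/systemd/system/$unit.d"
        mkdir -p "$dir" || return 1
        printf '%s\n' \
            '[Unit]' \
            'DefaultDependencies=no' \
            'Conflicts=shutdown.target' \
            "Before=$before" > "$dir/mkosi-initrd.conf"
    done
    return 0
}
```

For this setup it would be called as write_iscsi_dropins /etc/mkosi-initrd/mkosi.extra.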

Now, the tricky part. The iSCSI parameters can be read from firmware or from the kernel command line, so we can split this feature into two different parts.

As usual, here is the preset file to enable the new services (and also disable iscsi.service so that it does not interfere at boot):

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system-preset/30-iscsi.preset
   disable iscsi.service
   enable iscsi-config-initiator-initrd.service
   enable iscsi-firmware-initrd.service
   enable iscsi-start-initrd.service
   enable iscsid.socket
   enable iscsiuio.socket

The specific configuration should end up looking like this:

   # tree /etc/mkosi-initrd/
   /etc/mkosi-initrd/
   ├── mkosi.conf.d
   │   ├── 20-network.conf
   │   └── 30-iscsi.conf
   └── mkosi.extra
       ├── etc
       │   └── systemd
       │       └── system
       │           ├── iscsid.service.d
       │           │   └── mkosi-initrd.conf
       │           ├── iscsid.socket.d
       │           │   └── mkosi-initrd.conf
       │           ├── iscsiuio.service.d
       │           │   └── mkosi-initrd.conf
       │           └── iscsiuio.socket.d
       │               └── mkosi-initrd.conf
       └── usr
           ├── lib
           │   ├── NetworkManager
           │   │   └── conf.d
           │   │       └── nm-initrd.conf
           │   └── systemd
           │       ├── system
           │       │   ├── iscsi-config-initiator-initrd.service
           │       │   ├── iscsi-firmware-initrd.service
           │       │   ├── iscsi-start-initrd.service
           │       │   ├── nm-config-initrd.service
           │       │   ├── nm-initrd.service
           │       │   └── nm-wait-online-initrd.service
           │       └── system-preset
           │           ├── 20-network.preset
           │           └── 30-iscsi.preset
           └── sbin
               ├── iscsi-config-initiator.sh
               ├── iscsi-firmware.sh
               └── iscsi-start.sh


Read iSCSI parameters from firmware

The following systemd service should handle the root=? netroot=iscsi rd.iscsi.firmware case:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/iscsi-firmware-initrd.service
   [Unit]
   Description=iSCSI firmware (initrd)
   DefaultDependencies=no
   JobTimeoutSec=infinity
   JobRunningTimeoutSec=infinity
   Requires=network-online.target
   Requires=modprobe@crc32c.service modprobe@iscsi_tcp.service modprobe@iscsi_boot_sysfs.service modprobe@iscsi_ibft.service
   Requires=modprobe@be2iscsi.service modprobe@bnx2i.service modprobe@cxgb3i.service modprobe@cxgb4i.service modprobe@qla4xxx.service
   Wants=remote-fs-pre.target systemd-udev-trigger.service
   After=systemd-udev-trigger.service network-online.target
   Before=remote-fs-pre.target
   ConditionPathExists=/etc/initrd-release
   ConditionKernelCommandLine=netroot=iscsi
   ConditionKernelCommandLine=rd.iscsi.firmware
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/iscsi-firmware.sh
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/iscsi-firmware.sh:

#!/bin/bash
# Parts of the code extracted from http://github.com/openSUSE/dracut/tree/SUSE/059/modules.d/95iscsi
# This should handle:
#
#     root=? netroot=iscsi rd.iscsi.firmware
#

handle_firmware() {
    local _res

    # Depending on the 'ql4xdisablesysfsboot' qla4xxx
    # will be autostarting sessions without presenting
    # them via the firmware interface.
    # In these cases 'iscsiadm -m fw' will fail, but
    # the iSCSI sessions will still be present.
    if ! iscsiadm -m fw; then
        echo "iscsi-firmware: iscsiadm could not get list of targets from firmware"
    else
        # check to see if we have the new iscsiadm command,
        # that supports the "no-wait" (-W) flag. If so, use it.
        iscsiadm -m fw -l -W 2> /dev/null
        _res=$?
        if [ $_res -eq 7 ]; then
            # ISCSI_ERR_INVALID (7) => "-W" not supported
            echo "iscsi-firmware: iscsiadm does not support no-wait firmware logins"
            iscsiadm -m fw -l
            _res=$?
        fi
        if [ $_res -ne 0 ]; then
            echo "iscsi-firmware: iscsiadm log-in to iscsi target failed"
        fi
    fi
    [ -d /sys/class/iscsi_session ]
}

for i in /sys/firmware/ibft/ethernet*; do
    [[ -e "$i" ]] || continue
    if handle_firmware; then
        echo "iscsi-firmware: login ok"
        break
    fi 
done
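Before relying on rd.iscsi.firmware, it can help to see what the firmware actually exposes (the iscsi_ibft module must be loaded, which iscsi-firmware-initrd.service requires). The following diagnostic sketch prints a few attributes of each iBFT target. The attribute names (target-name, ip-addr, port, lun) are assumptions based on the kernel's iscsi_boot_sysfs interface, and dump_ibft is a hypothetical helper.

```shell
# Hypothetical diagnostic (sketch): print some attributes of each iBFT target.
# ASSUMPTION: attribute names follow the kernel's iscsi_boot_sysfs layout.
# Usage: dump_ibft [IBFT_DIR]   (defaults to /sys/firmware/ibft)
dump_ibft() {
    local ibft=${1:-/sys/firmware/ibft} t attr
    if [[ ! -d $ibft ]]; then
        echo "dump_ibft: no iBFT found at $ibft" >&2
        return 1
    fi
    for t in "$ibft"/target*; do
        [[ -d $t ]] || continue
        for attr in target-name ip-addr port lun; do
            # Attributes that are missing or unreadable are skipped
            if [[ -r $t/$attr ]]; then
                printf '%s %s=%s\n' "${t##*/}" "$attr" "$(< "$t/$attr")"
            fi
        done
    done
    return 0
}
```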

Read iSCSI parameters from kernel command line

First, we need to set InitiatorName= in /etc/iscsi/initiatorname.iscsi. The dracut code for this is messy: it apparently has several duplicated parts and restarts iscsid.service every time it sets the value. Let's do it only once instead, using a dedicated systemd service ordered before iscsid.service, which also pulls in the kernel modules needed for iSCSI to work.

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/iscsi-config-initiator-initrd.service
   [Unit]
   Description=iSCSI configure initiator (initrd)
   DefaultDependencies=no
   Requires=modprobe@crc32c.service modprobe@iscsi_tcp.service
   Requires=modprobe@be2iscsi.service modprobe@bnx2i.service modprobe@cxgb3i.service modprobe@cxgb4i.service modprobe@qla4xxx.service
   Wants=systemd-udev-trigger.service iscsid.service
   After=systemd-udev-trigger.service
   Before=iscsid.service
   ConditionPathExists=/etc/initrd-release
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/iscsi-config-initiator.sh
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/iscsi-config-initiator.sh:

#!/bin/bash
# Parts of the code extracted from http://github.com/openSUSE/dracut/blob/SUSE/059/modules.d/95iscsi

iscsi_set_initiatorname() {
    local _i _cmdline=()
    local iscsi_initiator

    # 1st: from kernel command line
    echo "iscsi-config-initiator: trying kernel command line..."
    # read -a splits on whitespace without glob expansion
    read -r -a _cmdline < /proc/cmdline
    for _i in "${_cmdline[@]}"; do
        [[ "${_i%%=*}" == "rd.iscsi.initiator" ]] && iscsi_initiator="${_i#*=}"
    done

    # 2nd: from sysfs
    if [[ -z "$iscsi_initiator" ]] && [[ -f /sys/firmware/ibft/initiator/initiator-name ]]; then
        echo "iscsi-config-initiator: trying sysfs..."
        iscsi_initiator=$(while read -r line || [[ -n "$line" ]]; do echo "$line"; done < /sys/firmware/ibft/initiator/initiator-name)
    fi

    # 3rd: using iscsi-iname
    if [[ -z "$iscsi_initiator" ]]; then
        echo "iscsi-config-initiator: trying iscsi-iname..."
        iscsi_initiator=$(iscsi-iname)
    fi

    if [[ -n "$iscsi_initiator" ]]; then
        echo "iscsi-config-initiator: set InitiatorName=$iscsi_initiator in /etc/iscsi/initiatorname.iscsi"
        echo "InitiatorName=$iscsi_initiator" > /run/initiatorname.iscsi
        rm -f /etc/iscsi/initiatorname.iscsi
        mkdir -p /etc/iscsi
        ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
        return 0
    fi

    return 1
}

iscsi_set_initiatorname

Now, the main systemd service, which handles the root=? netroot=iscsi:[<servername>]:[<protocol>]:[<port>]:[<LUN>]:<targetname> case and most of the rd.iscsi.* options:

   # cat /etc/mkosi-initrd/mkosi.extra/usr/lib/systemd/system/iscsi-start-initrd.service
   [Unit]
   Description=iSCSI discovery and login (initrd)
   DefaultDependencies=no
   JobTimeoutSec=infinity
   JobRunningTimeoutSec=infinity
   Requires=network-online.target iscsi-config-initiator-initrd.service
   Wants=remote-fs-pre.target systemd-udev-trigger.service
   After=systemd-udev-trigger.service network-online.target iscsi-config-initiator-initrd.service
   Before=remote-fs-pre.target
   ConditionPathExists=/etc/initrd-release
   
   [Service]
   Type=oneshot
   ExecStart=/usr/sbin/iscsi-start.sh
   RemainAfterExit=yes
   
   [Install]
   WantedBy=initrd.target

Content of /etc/mkosi-initrd/mkosi.extra/usr/sbin/iscsi-start.sh:

#!/bin/bash
# Parts of the code extracted from http://github.com/openSUSE/dracut/blob/SUSE/059/modules.d/95iscsi
# This should handle:
#
#     root=? netroot=iscsi:[<servername>]:[<protocol>]:[<port>]:[<LUN>]:<targetname>
#

strglobin() {
    [ -n "$1" -a -z "${1##*$2*}" ]
}

is_ip() {
    echo "$1" | {
        IFS=. read -r a b c d
        test "$a" -ge 0 -a "$a" -le 255 \
            -a "$b" -ge 0 -a "$b" -le 255 \
            -a "$c" -ge 0 -a "$c" -le 255 \
            -a "$d" -ge 0 -a "$d" -le 255 \
            2> /dev/null
    } && return 0
    return 1
}

parse_iscsi_root() {
    local v
    v=${1#iscsi:}

    # extract authentication info
    case "$v" in
        *@*:*:*:*:*)
            authinfo=${v%%@*}
            v=${v#*@}
            # allow empty authinfo to allow having an @ in iscsi_target_name like this:
            # netroot=iscsi:@192.168.1.100::3260::iqn.2009-01.com.example:testdi@sk
            if [ -n "$authinfo" ]; then
                OLDIFS="$IFS"
                IFS=:
                # shellcheck disable=SC2086
                set $authinfo
                IFS="$OLDIFS"
                if [ $# -gt 4 ]; then
                    echo "iscsi-start: wrong authentication info in iscsi: parameter!"
                    return 1
                fi
                iscsi_username=$1
                iscsi_password=$2
                if [ $# -gt 2 ]; then
                    iscsi_in_username=$3
                    iscsi_in_password=$4
                fi
            fi
            ;;
    esac

    # extract target ip
    case "$v" in
        [[]*[]]:*)
            iscsi_target_ip=${v#[[]}
            iscsi_target_ip=${iscsi_target_ip%%[]]*}
            # shellcheck disable=SC1087
            v=${v#[[]"$iscsi_target_ip"[]]:}
            ;;
        *)
            iscsi_target_ip=${v%%[:]*}
            v=${v#"$iscsi_target_ip":}
            ;;
    esac

    unset iscsi_target_name
    # extract target name
    case "$v" in
        *:iqn.*)
            iscsi_target_name=iqn.${v##*:iqn.}
            v=${v%:iqn.*}:
            ;;
        *:eui.*)
            iscsi_target_name=eui.${v##*:eui.}
            v=${v%:eui.*}:
            ;;
        *:naa.*)
            iscsi_target_name=naa.${v##*:naa.}
            v=${v%:naa.*}:
            ;;
    esac

    # parse the rest
    OLDIFS="$IFS"
    IFS=:
    # shellcheck disable=SC2086
    set $v
    IFS="$OLDIFS"

    iscsi_protocol=$1
    shift # ignored
    iscsi_target_port=$1
    shift

    if [ -n "$iscsi_target_name" ]; then
        if [ $# -eq 3 ]; then
            iscsi_iface_name=$1
            shift
        fi
        if [ $# -eq 2 ]; then
            iscsi_netdev_name=$1
            shift
        fi
        iscsi_lun=$1
        shift
        if [ $# -ne 0 ]; then
            echo "iscsi-start: invalid parameter in iscsi: parameter!"
            return 1
        fi
        return 0
    fi

    if [ $# -gt 3 ] && [ -n "$1$2" ]; then
        if [ -z "$3" ] || [ "$3" -ge 0 ] 2> /dev/null; then
            iscsi_iface_name=$1
            shift
            iscsi_netdev_name=$1
            shift
        fi
    fi

    iscsi_lun=$1
    shift

    iscsi_target_name=$(printf "%s:" "$@")
    iscsi_target_name=${iscsi_target_name%:}
}

handle_iscsi_root() {
    local iscsi_initiator
    local iscsi_target_name
    local iscsi_target_ip
    local iscsi_target_port
    local iscsi_target_group="$iscsi_target_group"
    local iscsi_lun
    local iscsi_username="$iscsi_username"
    local iscsi_password="$iscsi_password"
    local iscsi_in_username="$iscsi_in_username"
    local iscsi_in_password="$iscsi_in_password"
    local iscsi_iface_name
    local iscsi_netdev_name
    local iscsi_param="$iscsi_param"
    local param
    local found
    local login_retry_max_seen="$iscsi_login_retry_max_seen"
    local _res
    # Reset per target, otherwise options leak between netroot= entries
    local EXTRA=""

    parse_iscsi_root "$1" || return 1

    # Bail out early, if there is no route to the destination
    if is_ip "$iscsi_target_ip" && [[ "$iscsi_testroute" == 1 ]]; then
        ip route get "$iscsi_target_ip" > /dev/null 2>&1 || return 0
    fi

    # limit iscsistart login retries
    if [[ "$login_retry_max_seen" != "yes" ]] && [[ "$iscsi_login_retry_max" -gt 0 ]]; then
        iscsi_param="${iscsi_param% } node.session.initial_login_retry_max=$iscsi_login_retry_max"
    fi

    # get initiator from config
    [ -f /run/initiatorname.iscsi ] && . /run/initiatorname.iscsi
    [ -f /etc/initiatorname.iscsi ] && . /etc/initiatorname.iscsi
    [ -f /etc/iscsi/initiatorname.iscsi ] && . /etc/iscsi/initiatorname.iscsi
    iscsi_initiator=$InitiatorName
    if [ -z "$iscsi_initiator" ]; then
        echo "iscsi-start: failed to get InitiatorName from configuration"
        return 1
    fi

    if [ -z "$iscsi_target_port" ]; then
        iscsi_target_port=3260
    fi

    if [ -z "$iscsi_target_group" ]; then
        iscsi_target_group=1
    fi

    if [ -z "$iscsi_lun" ]; then
        iscsi_lun=0
    fi

    # $iscsi_protocol not used...

    if strglobin "$iscsi_target_ip" '*:*:*' && ! strglobin "$iscsi_target_ip" '['; then
        iscsi_target_ip="[$iscsi_target_ip]"
    fi
    targets=$(iscsiadm -m discovery -t st -p "$iscsi_target_ip":${iscsi_target_port:+$iscsi_target_port})
    _res=$?
    targets=$(echo -n "$targets" | {
        while read -r _ target _ || [ -n "$target" ]; do
            echo "$target"
        done
    })
    [ -z "$targets" ] && echo "iscsi-start: iscsiadm target discovery to $iscsi_target_ip:${iscsi_target_port:+$iscsi_target_port} failed with status $_res" && return 1

    found=
    for target in $targets; do
        if [ "$target" = "$iscsi_target_name" ]; then
            if [ -n "$iscsi_iface_name" ]; then
                iscsiadm -m iface -I "$iscsi_iface_name" --op=new
                EXTRA=" ${iscsi_netdev_name:+--name=iface.net_ifacename --value=$iscsi_netdev_name} "
                EXTRA="$EXTRA ${iscsi_initiator:+--name=iface.initiatorname --value=$iscsi_initiator} "
            fi
            [ -n "$iscsi_param" ] && for param in $iscsi_param; do EXTRA="$EXTRA --name=${param%=*} --value=${param#*=}"; done

            CMD="iscsiadm -m node -T $target \
                     ${iscsi_iface_name:+-I $iscsi_iface_name} \
                     -p $iscsi_target_ip${iscsi_target_port:+:$iscsi_target_port}"
            __op="--op=update \
                     --name=node.startup --value=onboot \
                     ${iscsi_username:+   --name=node.session.auth.username    --value=$iscsi_username} \
                     ${iscsi_password:+   --name=node.session.auth.password    --value=$iscsi_password} \
                     ${iscsi_in_username:+--name=node.session.auth.username_in --value=$iscsi_in_username} \
                     ${iscsi_in_password:+--name=node.session.auth.password_in --value=$iscsi_in_password} \
                     $EXTRA"
            # shellcheck disable=SC2086
            $CMD $__op
            $CMD --login
            found=yes
            break
        fi
    done

    if [ "$found" != yes ]; then
        echo "iscsi-start: target \"$iscsi_target_name\" not found on portal $iscsi_target_ip:$iscsi_target_port"
        return 1
    fi

    return 0
}

iscsi_parse_cmdline() {
    local _i _cmdline=()
    local _root _netroot
    
    # read -a splits on whitespace without glob expansion
    read -r -a _cmdline < /proc/cmdline
    for _i in "${_cmdline[@]}"; do
        # ${_i#*=} keeps everything after the first '=', since values such as
        # rd.iscsi.param=node.session.initial_login_retry_max=5 contain '='
        [[ "${_i%%=*}" == "root" ]] && _root="${_i#*=}"
        if [[ "${_i%%=*}" == "netroot" ]]; then
            _netroot="${_i#*=}"
            [[ "${_netroot%%:*}" == "iscsi" ]] && iscsi_root+=("${_netroot#iscsi:}")
        fi
        [[ "${_i%%=*}" == "rd.iscsi.target.group" ]] && iscsi_target_group="${_i#*=}"
        [[ "${_i%%=*}" == "rd.iscsi.username" ]] && iscsi_username="${_i#*=}"
        [[ "${_i%%=*}" == "rd.iscsi.password" ]] && iscsi_password="${_i#*=}"
        [[ "${_i%%=*}" == "rd.iscsi.in.username" ]] && iscsi_in_username="${_i#*=}"
        [[ "${_i%%=*}" == "rd.iscsi.in.password" ]] && iscsi_in_password="${_i#*=}"
        if [[ "${_i%%=*}" == "rd.iscsi.param" ]]; then
            local _param="${_i#*=}"
            [[ "${_param%=*}" == "node.session.initial_login_retry_max" ]] \
                && iscsi_login_retry_max_seen=yes
            iscsi_param="$iscsi_param ${_param}"
        fi
        [[ "${_i%%=*}" == "rd.iscsi.login_retry_max" ]] && iscsi_login_retry_max="${_i#*=}"
        [[ "${_i%%=*}" == "rd.iscsi.testroute" ]] && iscsi_testroute="${_i#*=}"
    done

    if [[ "${_root%%:*}" == "iscsi" ]]; then
        echo "iscsi-start: root=iscsi:... not implemented, use netroot="
        return 1
    fi

    ((${#iscsi_root[@]} > 0))
}

iscsi_root=()
iscsi_target_group=
iscsi_username=
iscsi_password=
iscsi_in_username=
iscsi_in_password=
iscsi_param=
iscsi_login_retry_max_seen=
iscsi_login_retry_max=3
iscsi_testroute=1

if iscsi_parse_cmdline; then
    for _i in "${iscsi_root[@]}"; do
        handle_iscsi_root "$_i"
    done
fi
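To illustrate how a netroot= value is split into fields, here is a simplified sketch that covers only the plain iscsi:<server>:<protocol>:<port>:<LUN>:<targetname> form, using the same defaults as handle_iscsi_root (port 3260, LUN 0). parse_netroot_simple is hypothetical; it ignores authentication info, bracketed IPv6 addresses and the iface/netdev fields that parse_iscsi_root supports.

```shell
# Hypothetical, simplified sketch of the field split done by parse_iscsi_root.
# Prints "server|protocol|port|lun|target".
# Usage: parse_netroot_simple VALUE
parse_netroot_simple() {
    local v=${1#iscsi:} server protocol port lun target
    # With IFS=':' empty fields are kept, and read assigns the remainder
    # (including further ':' characters) to the last variable, the target name
    IFS=: read -r server protocol port lun target <<< "$v"
    printf '%s|%s|%s|%s|%s\n' "$server" "$protocol" "${port:-3260}" "${lun:-0}" "$target"
}
```

For the netroot= value used in the client logs on this page, parse_netroot_simple 'iscsi:192.168.122.184:::1:iqn.2023-03.com.example:e4bdc9693921e72185e3' prints 192.168.122.184||3260|1|iqn.2023-03.com.example:e4bdc9693921e72185e3 (the empty field is the unused protocol).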

Some logs from the iSCSI client, showing how InitiatorName= is first saved to /etc/iscsi/initiatorname.iscsi, and how the system then logs in to the target defined in the netroot= option:

   mount:/root# journalctl -b | grep -i -e iscsi -e nm- | cat
   Nov 18 14:25:21 localhost kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.11.3-1-default rd.break=mount SYSTEMD_SULOGIN_FORCE=1 root=UUID=b62d9424-1196-4336-8cc4-251b11cb1324 rd.iscsi.initiator=iqn.2023-03.com.example:01:96382270ebec netroot=iscsi:192.168.122.184:::1:iqn.2023-03.com.example:e4bdc9693921e72185e3 ip=dhcp console=tty0 console=ttyS0,9600 security=apparmor mitigations=auto
   Nov 18 14:25:21 localhost systemd[1]: Listening on iscsid.socket.
   Nov 18 14:25:21 localhost systemd[1]: Listening on iscsiuio.socket.
   Nov 18 14:25:21 localhost systemd[1]: Starting modprobe@be2iscsi.service...
   Nov 18 14:25:21 localhost kernel: Loading iSCSI transport class v2.0-870.
   Nov 18 14:25:21 localhost kernel: QLogic NetXtreme II iSCSI Driver bnx2i v2.7.10.1 (Jul 16, 2014)
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (bnx2i)
   Nov 18 14:25:21 localhost kernel: libcxgbi:libcxgbi_init_module: Chelsio iSCSI driver library libcxgbi v0.9.1-ko (Apr. 2015)
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (be2iscsi)
   Nov 18 14:25:21 localhost kernel: In beiscsi_module_init, tt=00000000cffea62e
   Nov 18 14:25:21 localhost kernel: Chelsio T3 iSCSI Driver cxgb3i v2.0.1-ko (Apr. 2015)
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (cxgb3i)
   Nov 18 14:25:21 localhost systemd[1]: Starting modprobe@iscsi_boot_sysfs.service...
   Nov 18 14:25:21 localhost systemd[1]: Starting modprobe@iscsi_ibft.service...
   Nov 18 14:25:21 localhost systemd[1]: Starting modprobe@iscsi_tcp.service...
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (tcp)
   Nov 18 14:25:21 localhost kernel: Chelsio T4-T6 iSCSI Driver cxgb4i v0.9.5-ko (Apr. 2015)
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (cxgb4i)
   Nov 18 14:25:21 localhost systemd[1]: Starting nm-config-initrd.service...
   Nov 18 14:25:21 localhost kernel: iscsi: registered transport (qla4xxx)
   Nov 18 14:25:21 localhost kernel: QLogic iSCSI HBA Driver
   Nov 18 14:25:21 localhost systemd[1]: modprobe@be2iscsi.service: Deactivated successfully.
   Nov 18 14:25:21 localhost systemd[1]: Finished modprobe@be2iscsi.service.
   Nov 18 14:25:21 localhost systemd[1]: modprobe@iscsi_boot_sysfs.service: Deactivated successfully.
   Nov 18 14:25:21 localhost systemd[1]: Finished modprobe@iscsi_boot_sysfs.service.
   Nov 18 14:25:21 localhost systemd[1]: modprobe@iscsi_ibft.service: Deactivated successfully.
   Nov 18 14:25:21 localhost systemd[1]: Finished modprobe@iscsi_ibft.service.
   Nov 18 14:25:21 localhost systemd[1]: modprobe@iscsi_tcp.service: Deactivated successfully.
   Nov 18 14:25:21 localhost systemd[1]: Finished modprobe@iscsi_tcp.service.
   Nov 18 14:25:21 localhost systemd[1]: Finished nm-config-initrd.service.
   Nov 18 14:25:21 localhost systemd[1]: Starting iscsi-config-initiator-initrd.service...
   Nov 18 14:25:21 localhost iscsi-config-initiator.sh[271]: iscsi-config-initiator: trying kernel command line...
   Nov 18 14:25:21 localhost iscsi-config-initiator.sh[271]: iscsi-config-initiator: set InitiatorName=iqn.2023-03.com.example:01:96382270ebec in /etc/iscsi/initiatorname.iscsi
   Nov 18 14:25:21 localhost systemd[1]: Finished iscsi-config-initiator-initrd.service.
   Nov 18 14:25:21 localhost systemd[1]: Starting nm-initrd.service...
   Nov 18 14:25:21 localhost NetworkManager[349]: <info>  [1731939921.1661] Read config: /etc/NetworkManager/NetworkManager.conf (lib: conncheck-openSUSE.conf, nm-initrd.conf) (run: 15-carrier-timeout.conf)
   Nov 18 14:25:21 localhost systemd[1]: nm-initrd.service: Deactivated successfully.
   Nov 18 14:25:21 localhost systemd[1]: nm-initrd.service: Unit process 349 (NetworkManager) remains running after unit stopped.
   Nov 18 14:25:21 localhost systemd[1]: Started nm-initrd.service.
   Nov 18 14:25:21 localhost systemd[1]: Starting nm-wait-online-initrd.service...
   Nov 18 14:25:21 localhost.localdomain NetworkManager[349]: <info>  [1731939921.6139] Loaded device plugin: NMWifiFactory (/usr/lib64/NetworkManager/1.50.0/libnm-device-plugin-wifi.so)
   Nov 18 14:25:22 localhost.localdomain systemd[1]: Finished nm-wait-online-initrd.service.
   Nov 18 14:25:22 localhost.localdomain systemd[1]: iscsi-firmware-initrd.service was skipped because of an unmet condition check (ConditionKernelCommandLine=rd.iscsi.firmware).
   Nov 18 14:25:22 localhost.localdomain systemd[1]: Starting iscsi-start-initrd.service...
   Nov 18 14:25:22 localhost.localdomain systemd[1]: iscsi-init.service was skipped because of an unmet condition check (ConditionPathExists=!/etc/iscsi/initiatorname.iscsi).
   Nov 18 14:25:22 localhost.localdomain systemd[1]: Starting iscsid.service...
   Nov 18 14:25:22 localhost.localdomain systemd[1]: Started iscsid.service.
   Nov 18 14:25:22 localhost.localdomain iscsid[517]: iscsid: connection-1:0 Invalid timeo.noop_out_interval. Must be greater than zero. Using default 5.
   Nov 18 14:25:22 localhost.localdomain iscsid[517]: iscsid: Connection1:0 to [target: iqn.2023-03.com.example:e4bdc9693921e72185e3, portal: 192.168.122.184,3260] through [iface: default] is operational now
   Nov 18 14:25:22 localhost.localdomain kernel: scsi host6: iSCSI Initiator over TCP/IP
   Nov 18 14:25:22 localhost.localdomain iscsi-start.sh[522]: Logging in to [iface: default, target: iqn.2023-03.com.example:e4bdc9693921e72185e3, portal: 192.168.122.184,3260]
   Nov 18 14:25:22 localhost.localdomain iscsi-start.sh[522]: Login to [iface: default, target: iqn.2023-03.com.example:e4bdc9693921e72185e3, portal: 192.168.122.184,3260] successful.
   Nov 18 14:25:22 localhost.localdomain systemd[1]: Finished iscsi-start-initrd.service.


Open questions

The following points reflect some open questions about required (?) functionality that mkosi-initrd is not going to implement due to the way it's designed.

Parse the kernel command line

  • If a package needs to parse the kernel command line, it must provide a binary helper to do so (e.g., NetworkManager provides nm-initrd-generator), executed from a properly ordered systemd service at boot.
  • With this approach, other initrd generators that already ship custom systemd services performing the same task would need to be patched to avoid conflicts.
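As an illustration of what such a helper has to get right, here is a minimal Bash sketch (cmdline_get is hypothetical, not an existing tool). It strips the key only up to the first '=', so values that themselves contain '=' (for example rd.iscsi.param=node.session.initial_login_retry_max=5) survive intact, and it returns the last occurrence of the key. Quoted values containing spaces are not handled.

```shell
# Hypothetical helper (sketch): print the value of the last KEY=VALUE
# occurrence in a kernel command line string; returns 1 if KEY is absent.
# Usage: cmdline_get KEY "$(< /proc/cmdline)"
cmdline_get() {
    local key=$1 arg value= found=1
    local -a args
    # Split on whitespace without glob expansion
    read -r -a args <<< "$2"
    for arg in "${args[@]}"; do
        if [[ ${arg%%=*} == "$key" && $arg == *=* ]]; then
            value=${arg#*=}   # strip only up to the first '='
            found=0
        fi
    done
    if [[ $found == 0 ]]; then
        printf '%s\n' "$value"
    fi
    return $found
}
```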

Extend the kernel command line at build time

  • In dracut, some modules write specific command line options to /etc/cmdline.d/*.conf files when the initrd is built, and at boot dracut parses these values together with /proc/cmdline, effectively extending the kernel command line.
  • systemd will not implement this either: http://github.com/systemd/systemd/issues/22935#issuecomment-1085991448
  • Is this really necessary? Well, this is an example of what dracut automatically inserts into the initrd of an iSCSI initiator:
   # lsinitrd -f etc/cmdline.d/95iscsi.conf
   ifname=eth0:52:54:00:fd:ad:46 ip=eth0:dhcp rd.iscsi.initiator=iqn.2023-03.com.example:01:96382270ebec
   netroot=iscsi:192.168.122.184:::1:iqn.2023-03.com.example:e4bdc9693921e72185e3
   rd.neednet=1
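To make this dracut behaviour concrete, the following sketch builds the command line that dracut's scripts appear to see: the contents of /etc/cmdline.d/*.conf first, then /proc/cmdline. It assumes dracut-lib.sh's getcmdline works roughly this way (that function also reads /etc/cmdline, which is omitted here). effective_cmdline is a hypothetical helper, and the directory and file are parameters so it can be tried outside an initrd.

```shell
# Hypothetical helper (sketch): concatenate cmdline.d/*.conf (in glob order)
# and the kernel command line into a single line of options.
# Usage: effective_cmdline [CMDLINE_D_DIR] [PROC_CMDLINE]
effective_cmdline() {
    local dir=${1:-/etc/cmdline.d} proc=${2:-/proc/cmdline} f
    local -a words=() line
    for f in "$dir"/*.conf "$proc"; do
        [[ -r $f ]] || continue
        line=()
        # -d '' reads the whole file; whitespace (including newlines) splits
        # words without glob expansion. read returns 1 at EOF, hence "|| true".
        read -r -d '' -a line < "$f" || true
        words+=("${line[@]}")
    done
    printf '%s\n' "${words[*]}"
}
```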

Extend the kernel command line at boot

  • Some dracut modules have hooks that modify /etc/cmdline.d/*.conf during the boot process, so a later hook may behave differently if an earlier hook writes a kernel command line option that affects it. For example, the old network-legacy module parses and injects ip= and vlan= into the kernel command line before other scripts handle them.
  • How can this be achieved using systemd services only? For example, nm-initrd-generator also handles rd.iscsi.ibft, and it does not "extend" the kernel command line.

Hooks

  • Hooks are just Bash scripts that dracut executes at specific points of the boot process. Within each point, the hooks are also ordered.
  • Most of them could be rewritten using systemd services only.
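The ordering semantics that such a rewrite would have to express with After=/Before= can be sketched like this: every *.sh file in a hook directory is sourced in glob (lexical) order, so a numeric prefix decides the order, and since hooks are sourced rather than executed, variables set by one hook are visible to the next. The directory layout and the run_hook_point name are hypothetical; dracut's actual hook directories may differ.

```shell
# Hypothetical sketch of one hook point: source each *.sh in lexical order.
# Usage: run_hook_point DIR
run_hook_point() {
    local hook
    for hook in "$1"/*.sh; do
        [[ -r $hook ]] || continue
        # Sourced, not executed: a hook can set variables seen by later hooks
        . "$hook"
    done
    return 0
}
```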

The initqueue

  • dracut's initqueue allows suspending the boot process until all the configured Bash scripts finish successfully. These scripts are injected by dracut and its modules when the initrd is built, or even dynamically at boot.
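A much simplified sketch of the loop behind the initqueue, assuming one directory of job scripts and one directory of "finished" condition scripts. The directory names, the layout, the timeout and initqueue_loop itself are hypothetical, and dracut's real loop does more (for example, waiting for udev to settle).

```shell
# Hypothetical, simplified initqueue-like loop (sketch).
# Jobs are re-run on every pass until all "finished" conditions succeed;
# returns 1 if that does not happen within TIMEOUT_SECONDS.
# Usage: initqueue_loop QUEUE_DIR FINISHED_DIR TIMEOUT_SECONDS
initqueue_loop() {
    local queue=$1 finished=$2 timeout=$3 job cond all_done
    local start=$SECONDS
    while :; do
        for job in "$queue"/*.sh; do
            [[ -e $job ]] || continue
            bash "$job" || true   # a failing job does not abort the loop
        done
        all_done=1
        for cond in "$finished"/*.sh; do
            [[ -e $cond ]] || continue
            bash "$cond" || { all_done=0; break; }
        done
        (( all_done )) && return 0
        (( SECONDS - start >= timeout )) && return 1
        sleep 0.5
    done
}
```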


Work in progress

References