Wednesday, January 29, 2025

Provisioning CoreOS - Intel and Raspberry Pi 4

Provisioning CoreOS

Installing CoreOS on a new system involves just booting a copy of the installation media with the configuration file (produced earlier) embedded. On most systems, the installation media is a USB stick and the target storage is an internal disk or SSD. The CoreOS image is copied bitwise to the destination and is then tuned according to the configuration file. On systems like the Raspberry Pi, which boot from an integrated SD card slot, the bootable media is also the target device.

It is possible, on systems capable of network boot by DHCP/PXE, to boot in memory over a network and then install to local disk. In that case the configuration file is retrieved over HTTP(S) from a local web server. This demonstration will only use bootable media.

Both of the procedures shown here are derived from the Fedora CoreOS Documentation, specifically the instructions for Bare Metal (Intel) and Raspberry Pi 4: EDK2 Combined Fedora CoreOS + EDK2 Firmware Disk. The bare metal procedure is nearly identical to the documented one. For the Raspberry Pi 4 procedure the individual steps have been scripted into a single command, but that is not strictly necessary.

Bare Metal Installation - Intel

The Bare Metal installation is the most straightforward. It consists of just five steps:

  1. Download CoreOS ISO image

  2. Customize ISO image - Destination Disk and Ignition File

  3. Write ISO to USB Stick

  4. Boot from the USB stick

  5. Boot from the target disk

This example assumes that the Ignition file for the host is coreos-config.ign and the destination disk is /dev/sda. The invocation lets most of the CLI arguments default:

  • architecture: x86_64

  • platform: metal

  • stream: stable

  • format: ISO

The network configuration is allowed to default to DHCP for any NICs that detect link.

# Download ISO if necessary. Write file name to stdout
IMAGE_FILE=$(coreos-installer download --format iso)
# Customize the ISO image for the target host
coreos-installer iso customize --dest-device /dev/sda --dest-ignition coreos-config.ign ${IMAGE_FILE}
# Write the ISO to bootable media (replace /dev/sdX with the USB stick
# device on this machine, NOT the target disk)
sudo dd if=${IMAGE_FILE} of=/dev/sdX bs=1M status=progress
# Reset the local ISO file for next use
coreos-installer iso reset ${IMAGE_FILE}

Two things to note here about coreos-installer download:

  1. It checks if the ISO file exists locally, and if so, checks that it is current before trying to download again.

  2. It writes the filename of the ISO file to stdout on exit. That value can be used in following script lines.

The final command of the script restores the ISO image for its next use. The image file is under 800MB, and so will fit on even very small USB sticks.

At this point booting from the USB will install CoreOS on the target host and disk. Insert the USB stick, boot and use the system boot options to boot from the USB. Observe the installation process from the console. When installation is complete, reboot and remove the USB stick.

Raspberry Pi 4 Installation

Currently only the Raspberry Pi 4 is supported for Fedora CoreOS, and only using a set of U-Boot or EFI files managed by third parties. The Raspberry Pi 5 can run Fedora with a few minor tweaks, but CoreOS is still waiting for updated EFI and firmware files.

The Raspberry Pi boots from an integrated SD card reader. The CoreOS image is written to the SD card along with the Ignition file. The aarch64 CoreOS image needs a bootloader, either U-Boot or EFI, to boot correctly. This process installs a set of EFI binaries and auxiliary files into the boot partition to take the place of the firmware that other systems would have.

When the SD card is inserted and the system boots for the first time, the kernel and initrd, including the Ignition file, are loaded into memory. The configuration is applied to the storage before the disk filesystems are mounted and control is handed to the init process.

The process of writing all the files to the SD card is described in Booting on Raspberry Pi 4 - EDK2: Combined Fedora CoreOS + EDK2 Firmware Disk. First the stock raw CoreOS aarch64 image is written to the SD card. Then the EFI partition is mounted and the EDK2 UEFI firmware is written. At that point the SD card is bootable on a Raspberry Pi 4.

The complete procedure for writing the SD card is provided in the scripts sub-directory: prepare-sd.sh

Connect the SD card to the working system. Make sure that any auto-mounted partitions are unmounted before proceeding. Determine the device path and provide the ignition file.

prepare-sd.sh
bash scripts/prepare-sd.sh <device path> <ignition file>

As with the Intel media, the final step is to install the SD Card in the Raspberry Pi 4 and power it on. Assuming that the Pi is connected to a network with DHCP and internet access, it will boot, complete the Ignition installation, install Ansible and reboot itself.

In both cases, at that point the new system is accessible by SSH using the core user and is ready to be managed by Ansible.

Finally Ready

With the OS installation complete it becomes possible to start addressing the goal of this series: Deploying containerized network services with Ansible. Keep an eye out for the next post where we’ll configure Ansible and demonstrate that we have connectivity and control of our target hosts.

References

  • coreos-installer
    Usage and arguments for the CoreOS installer binary. This can be run from a live ISO or on a second host to write to the boot media.

  • CoreOS on Bare Metal
    How to install CoreOS on Bare Metal. This includes variants for PXE, and Live ISO installations.

  • CoreOS on Raspberry Pi 4
    How to install CoreOS on Raspberry Pi 4 or 5. This includes instructions for installing EFI boot components that are not present in the Pi boot firmware.

  • Ignition
    Ignition is the engine that applies the provided configuration to a new CoreOS instance on first boot.

  • UEFI-Shell
    A UEFI Shell built from EDK2 sources.

  • Raspberry Pi 4 UEFI Firmware Images
    A build of the UEFI-Shell specifically for Raspberry Pi 4.

Thursday, January 16, 2025

CoreOS Configuration - Less is the right amount

Configuring CoreOS

There are already a number of good resources for deploying CoreOS to various systems. See the References below. This document focuses on the particulars of configuring CoreOS as a base for small and medium network infrastructure services.

The Principle of Least Config

In keeping with the minimalist philosophy of CoreOS, the configuration will apply only those settings necessary to boot the system and provide remote access and configuration management. The first two are fairly trivial, but the last involves a bit of system gymnastics.

The CoreOS configuration is applied at first boot and is provided to the installer when writing the boot media to storage.

coreos-infra.bu
---
# 1 - Specify the target and schema version
variant: fcos
version: 1.6.0

# 2 - Provide an ssh public key for the core user
passwd:
  users:
    - name: core
      ssh_authorized_keys_local:
        - infra-ansible-ed25519.pub

storage:
  files:

    # 3 - Define the system hostname
    - path: /etc/hostname
      contents:
        inline: |
          infra-01.example.com

    # 4a - A script to overlay the ansible packages and clean up
    - path: /usr/local/bin/install-overlay-packages
      user:
        name: root
      group:
        name: root
      mode: 0755
      contents:
        inline: |
          #!/bin/bash
          if [ -x /usr/bin/ansible ] ; then
            rm /usr/local/bin/install-overlay-packages
            systemctl disable install-overlay-packages
            rm /etc/systemd/system/install-overlay-packages.service
          else
            rpm-ostree install --assumeyes ansible
            systemctl reboot
          fi

systemd:
  units:

    # 4b - Define a one-time service to run at first boot
    - name: install-overlay-packages.service
      enabled: true
      contents: |
        [Unit]
        Description=Install Overlay Packages
        After=systemd-resolved.service
        Before=zincati.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/install-overlay-packages

        [Install]
        WantedBy=multi-user.target

1 - Butane Preamble

The Butane configuration schema begins with two values that identify the target OS and the schema version itself.

variant: fcos
version: 1.6.0

This indicates that the file targets Fedora CoreOS and the schema version is 1.6.0. This assists the parser in validating the remainder of the configuration against the indicated schema.

2 - Core User - SSH Public Key

CoreOS deploys with two default users, root and core. The root user is not intended for direct login. Neither has a password by default. CoreOS is meant to be accessed by SSH on a network by the core user.

passwd:
  users:
    - name: core
      ssh_authorized_keys_local:
        - infra-ansible-ed25519.pub

The core user already exists so no additional parameters need to be provided. The user definition only specifies a public key file whose contents will be inserted into the authorized_keys file of that user.

The ssh_authorized_keys_local option above consists of a list of filenames on the local machine that will be merged into the ignition file during transformation. The directory containing that file is provided on the butane command line using the --files-dir argument.

3 - Hostname

When you log into a system it’s convenient to see the hostname in the CLI prompts. It’s also good for reviewing logs. The hostname for Fedora is set using the /etc/hostname file.

storage:
  files:

    - path: /etc/hostname
      contents:
        inline: |
          infra-01.example.com

By convention this file contains the fully-qualified domain name of the host, and the hostname is the first element of the FQDN.
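That convention can be checked with plain shell parameter expansion; a small sketch using the example FQDN from the configuration above:

```shell
# The short hostname is the first dot-separated element of the FQDN
fqdn="infra-01.example.com"
shortname="${fqdn%%.*}"   # strip everything from the first dot onward
echo "${shortname}"       # prints: infra-01
```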

4 - Package Overlay - Install Ansible

This is the first place where CoreOS is properly customized. The goal is to automate management of the host and services using Ansible. The Fedora Project is agnostic about the user's choice of configuration management software, so no CM software is installed by default. These two sections create the parts needed to overlay Ansible on first boot and then reboot so that the Ansible package contents are available.

4a - Overlay Script

The first part of this first-boot process is a shell script, placed where it can be executed at boot and then removed after use.

    - path: /usr/local/bin/install-overlay-packages
      user:
        name: root
      group:
        name: root
      mode: 0755
      contents:
        inline: |
          #!/bin/bash
          if [ -x /usr/bin/ansible ] ; then
            rm /usr/local/bin/install-overlay-packages
            systemctl disable install-overlay-packages
            rm /etc/systemd/system/install-overlay-packages.service
          else
            rpm-ostree install --assumeyes ansible
            systemctl reboot
          fi

The first half of this section defines the location, ownership and permissions of the file. The second half, under the contents key, contains the body of the script.

This script checks to see if the ansible binary is present and executable. If so, then the script removes itself and the systemd service unit file that triggers the script on boot. If ansible is not present, then the script overlays the Ansible RPM and then reboots.

This means that the service, and hence the script, is executed twice. On first boot it runs the installation command and reboots. The second time it detects that ansible is present and then disables and removes itself.

4b - One-time First Boot Service

The CoreOS specification allows the user to define and control the operation of systemd services. This final section defines a service that executes the script previously defined.

systemd:
  units:
    - name: install-overlay-packages.service
      enabled: true
      contents: |
        [Unit]
        Description=Install Overlay Packages
        After=systemd-resolved.service
        Before=zincati.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/install-overlay-packages

        [Install]
        WantedBy=multi-user.target

This unit file defines when the service should start and what it should do. The service will run after networking is enabled and the DNS systemd-resolved service is running, but before the zincati update service is started. It runs the script defined above but does not detach as it would for a daemon.

As noted, this unit is deleted by the script when it runs the second time and detects the presence of the ansible binary.

Transforming the Butane System Spec

The next step is to transform the Butane file to Ignition. The CoreOS installer places the Ignition file onto the new filesystem so that it is available on first boot, so the file must be provided when the installer is invoked.

The butane binary can be installed on a Fedora system from an RPM, or it can run as a software container. See Getting Started in the Butane documents to decide what works best for you.

butane --pretty --files-dir ~/.ssh < coreos-infra.bu > coreos-infra.ign

This call only takes two parameters:

  • --pretty
    This just pretty prints the JSON output. It’s entirely cosmetic and unnecessary.

  • --files-dir ~/.ssh
    This tells butane where to find any external files, specifically, in this case, the location of the public key file for the core user.

The result of running the command above is the Ignition file shown here:

coreos-infra.ign
{
  "ignition": {
    "version": "3.5.0"
  },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGl7GOHs9enyGZ7tTSh8E8G5mE+B9gyVVnz41hRyxbbN Infrastructure Ansible Key"
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "path": "/etc/hostname",
        "contents": {
          "compression": "",
          "source": "data:,infra-01.example.com%0A"
        }
      },
      {
        "group": {
          "name": "root"
        },
        "path": "/usr/local/bin/install-overlay-packages",
        "user": {
          "name": "root"
        },
        "contents": {
          "compression": "gzip",
          "source": "data:;base64,H4sIAAAAAAAC/3yPPQ7CMAyF95zCiDnkABwFMTipSyOcpMpLK3p71B8hMcBkyX7fZ/t8cj5m5xmDiT3dyL7ITahblzOiV6E7XakNkg1RTftYS2DdQjGjsaots1TlxY4cnvwQGCIsaJJCU+oieDX9Ca9macHtUHfUn/oLpM4xiBGFrPiYbEGr8llC1jIwJVkEdLzydVQVX0ozfTTvAAAA//9VmB3oBgEAAA=="
        },
        "mode": 493
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nDescription=Install Overlay Packages\nAfter=systemd-resolved.service\nBefore=zincati.service\n\n[Service]\nType=oneshot\nExecStart=/usr/local/bin/install-overlay-packages\n\n[Install]\nWantedBy=multi-user.target",
        "enabled": true,
        "name": "install-overlay-packages.service"
      }
    ]
  }
}

There are a couple of things to note in this transformation and its result. The SSH public key string is merged verbatim from the file. The install-overlay-packages script is compressed and serialized as the base64 of a gzip stream. The systemd unit file is a JSON string with embedded newlines: \n. Together these make a single configuration file that can be copied around, or served over HTTP or another file service, without corruption from encoding.
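The compression step can be reproduced and reversed with standard tools. This sketch (using sample content, not the actual embedded script) shows the gzip + base64 round trip that Butane applies to inline file contents:

```shell
# Encode: gzip the content, then base64-encode it, as Butane does
content='#!/bin/bash
echo hello'
encoded=$(printf '%s\n' "$content" | gzip | base64 -w0)
printf 'data:;base64,%s\n' "$encoded"

# Decode: strip the "data:;base64," prefix from an Ignition source
# value, then base64-decode and gunzip to recover the original text
printf '%s' "$encoded" | base64 -d | gunzip
```

The same decode pipeline can be used to inspect the install-overlay-packages entry in the generated file.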

Keep this file handy as it is used as input for the next step.

References

  • Butane
    The Butane format usage and specifications.

  • Ignition
    The Ignition spec for CoreOS configuration.

  • CoreOS on Bare Metal
    How to install CoreOS on Bare Metal. This includes variants for PXE, and Live ISO installations.

  • CoreOS on Raspberry Pi 4
    How to install CoreOS on Raspberry Pi 4 or 5. This includes instructions for installing EFI boot components that are not present in the Pi boot firmware.

  • systemd one-shot service
    A blog post on the workings of Systemd one-shot service units.

  • coreos-installer
    Usage and arguments for the CoreOS installer binary. This can be run from a live ISO or on a second host to write to the boot media.

Friday, January 10, 2025

The Case for CoreOS - Network Infrastructure on an Immutable OS

The Lifetime of Silent Services

For small and medium sized organizations, a local network requires the creation and management of local network services such as DNS, NTP, DHCP, monitoring and user access controls. These are the ante needed to get in the game, but when they work properly they become invisible. This is good, but it means they can be neglected from the standpoint of management and maintenance. As long as they work it's easy to ignore them until they break. There is a tendency to treat maintenance as a risk rather than a benefit, with the fear of service interruption and downtime leading to neglect and a sense that these services are somehow fragile and precious.

For these silent services, the neglect usually manifests when the admins discover that the OS has gone end-of-life, or a bug is found in the current version of a service, or there are 200 CVEs to apply because the last reboot was 700 days ago. The problem is that the backlog of required updates, combined with unfamiliarity with the services and their maintenance history, makes admins gun-shy of updating at all. Time only makes the fear and the debt worse.

What are you afraid of, Really?

The modern alternative is the cliché "Fail Fast", which, when thrown about without comprehension, is rightly scorned. I prefer to say "Find the scariest thing you have to do, and do it repeatedly until it stops being scary. Then find the next scariest thing."

The real fear and risk is of downtime without a recovery plan. In a corporate environment the tendency of management is to CYA by avoiding any change, and with it any downtime. While this can provide the illusion of stability, it treats the infrastructure as a static monolith. It ignores the fact that failures and updates are inevitable, and it sets the operations teams up for failure by restricting their ability to practice the very update and mitigation processes that would allow them to create a robust, reliable service.

The real solution is to create a system where any change can be rolled back quickly, reliably and completely.  Fedora CoreOS provides that.

Git for Filesystems?

Fedora CoreOS is a distribution of Fedora Linux created specifically to run software containers. Red Hat promotes it for cloud use and only supports it as a base for OpenShift. It is a minimal distribution with no GUI, only a simple installer that writes the initial state to a bootable storage device, and a simple configuration file that is applied on first boot. This by itself is unremarkable. The feature that makes CoreOS significant is that the file and package systems are based on rpm-ostree, an integrated file and package management system. It presents to users as an XFS filesystem, but it is mounted read-only: the filesystem is immutable. To install packages you must use the rpm-ostree command to layer the package into a new image version and then reboot to the new image. Installing application packages is discouraged in favor of running services in containers.

Did you get that? The filesystem is read only. To see updated packages you have to reboot. Wait, there's more.

The Turtle or the Frog?

Most distributions provide updates through online package repositories. Admins must periodically poll the repository, pull down any new packages, and then overlay them into the running system. At that point it becomes extremely difficult to reliably roll back. If anything fails, the only recourse is to recover the system from backups, which is understandably an extreme and time-consuming process.  This leads to a "slow and steady" approach to updates. Updates are applied to a few test systems. If no problems are discovered they are rolled forward to a set of staging systems.  Finally the updates are deployed to production.

This is an expensive, time-consuming process, suited only to large organizations with the resources to implement it. It's also error prone, as it is often difficult to adequately simulate production operating conditions in a small test environment. More commonly, in smaller organizations, updates are shunted to backlog work and neglected in favor of feature requests or helpdesk issues until some outside event brings the problem to the attention of management, when it becomes an emergency.

To compound the problems, it is common to run package updates without rebooting the system. This can result in failures that don't appear until long after the actual change is applied. Altogether this makes IT management very averse to regular updates and reboots, because they see these as introducing problems and risking downtime with long recovery periods.

Until recently (well, ages in Internet Time) this "frog in the pot" approach was really the only option. The fact that it was impossible to reliably roll back changes rightly made management and operations averse to any change to a system that was "working".

Double-Buffered Operating System

CoreOS updates are atomic; updates are published as a unit. The stable stream is updated approximately every two weeks. There are also testing and next streams that update more often but aren't meant for regular use. CoreOS runs a service called Zincati, which polls the release streams for new images and applies them and reboots when needed. Zincati can be tuned to create staged roll-outs, applying updates first to a set of canary systems before moving on to more critical systems. It can also be tuned to restrict reboots to specific days of the week and times of day.
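That reboot tuning lives in a TOML drop-in file. As a sketch following the Zincati documentation (the file name and window values here are illustrative), this restricts update reboots to weekend early mornings:

```toml
# /etc/zincati/config.d/55-updates-strategy.toml
[updates]
strategy = "periodic"

# Only reboot for updates on Saturday and Sunday, starting 02:00 (UTC),
# within a 60-minute window
[[updates.periodic.window]]
days = [ "Sat", "Sun" ]
start_time = "02:00"
length_minutes = 60
```

With no such drop-in, Zincati defaults to applying and rebooting for updates as soon as they appear in the stream.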

By conventional standards, a read-only system that updates automatically and requires a reboot every two weeks provides the opposite of stability and reliability. But the risks this would pose on a conventional Linux distribution are mitigated by rpm-ostree, Zincati and software containers. The benefits of atomic rollback and application decoupling mean that it is possible to keep systems up to date and to respond instantly to any update-induced problems. In essence the operating system is double-buffered and the current system is preserved perfectly across updates. You don't have to worry about losing the working configuration because it's still there.

For The Best Services, Don't Install Any

On CoreOS you're discouraged from installing application or service software on the system. CoreOS is designed to run software containers. The only major service component integrated into the OS is podman; the network services themselves run in containers managed as systemd services.

In 2021, a project called Quadlet was created to allow containers to be managed as first-class services under systemd. Quadlet was later merged into the Podman project, and as of 2024 quadlets are available on any systemd-based Linux with a recent Podman. This means that your system services are no longer tightly coupled to the OS updates. They don't even need to be based on the same OS distribution.

Using Quadlets, deploying a network service is a matter of defining a systemd container spec, providing the service configuration files and enabling and starting the service. No service software needs to be installed or updated ever.  Updating the service software is a matter of updating the container image path and tag and restarting the systemd service.  Reverting is just as simple. It becomes possible to basically ignore the OS when updating system services and vice-versa.  The loose coupling means that changes to one are very unlikely to affect the other and that any change can be trivially and reliably reverted without affecting the other components.
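As a sketch of what such a spec looks like (the image name, port and volume paths here are hypothetical), a Quadlet container unit is just another systemd unit file dropped into /etc/containers/systemd/:

```ini
# /etc/containers/systemd/dns.container -- hypothetical DNS service
[Unit]
Description=DNS Service Container

[Container]
Image=quay.io/example/dns:latest
PublishPort=53:53/udp
Volume=/etc/dns-config:/etc/dns:Z

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```

Quadlet generates a dns.service unit from this file at daemon reload; updating the service is then a matter of changing the Image= line and restarting the unit, and reverting is the same operation with the old tag.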

Do it again! Do it again!

The simplicity and minimalism of using CoreOS with software containers enables one last element for providing stable, reliable network services. CoreOS can be installed with a simple DHCP/PXE boot and, once installed, it can be configured with a small set of Ansible scripts. These aren't remarkable by themselves, but the simplicity and compartmentalization that the immutable OS provides are somewhat novel in the on-premise hardware environment. These are usually thought of as features of cloud-based services, but they are perfectly applicable for small and medium organizations with limited resources.

As a matter of practice I tend not to say I can do something until I can do it 100 times with the push of a single button. With some simple automation the infrastructure can be restored in a matter of minutes on the old hardware or new.  These services tend to be small and light-weight, so they can run on inexpensive redundant hardware.

So You Say, But How?

Well, I plan to show you.  This first post is a long pontification on some thoughts I've had over the last couple of years. I've put it into practice for my home network and at one employer.  It falls under a larger theme of adapting cloud networking practices for on-premise network services.  After all, Red Hat now only supports their CoreOS stream as the base for OpenShift, Red Hat's extended Kubernetes offering. Red Hat recommends the very practices I'm going to detail to maintain the underpinnings of their enterprise distributed application service. I suspect that part of the reason they don't support it for general use is that serious adoption would undercut their revenue stream from RHEL, and I can tell you from personal experience that matters to them a lot.

This isn't a perfect strategy for all purposes either.  Unless your application is extremely simple and has already been designed and implemented for containers it doesn't make sense to shoehorn it in.  Large distributed applications are better supported on a proper Kubernetes or OpenShift deployment, whether on-premise or on a cloud service. Heavy-weight monolithic services (I'm looking at you JBoss/Tomcat apps) aren't well suited to containers, despite the trend to push them in.

In following posts I mean to walk through the deployment of Fedora CoreOS, preparation for automated configuration management and the deployment of service containers. I'm not actually sure where this will end but I mean to see just how far I can push it.  Come along if it seems like your kind of fun.

Resources

  • Fedora Linux - An extremely popular and well managed Linux distribution
  • Fedora CoreOS - A spin of Fedora that is designed to run software containers
  • libostree - A checkpointed filesystem that allows atomic rollback of file changes
  • rpm-ostree - An extension of libostree that integrates RPM package management
  • butane - YAML schema to define OS configurations for CoreOS
  • ignition - JSON schema to define OS configurations for CoreOS
  • zincati - A service to control and tune updates from CoreOS image streams
  • Quadlets - Software containers as systemd services
  • Ansible - System configuration language and toolset
  • OpenShift - Red Hat's enterprise extended version of Kubernetes
  • Kubernetes - A computing cluster system for running applications in software containers