Thursday, January 16, 2025

CoreOS Configuration - Less is the right amount

Configuring CoreOS

There are already a number of good resources for deploying CoreOS to various systems; see the References below. This document focuses on the particulars of configuring CoreOS as a base for small and medium network infrastructure services.

The Principle of Least Config

In keeping with the minimalist philosophy of CoreOS, the configuration will apply only those settings necessary to boot the system and provide remote access and configuration management. The first two are fairly trivial, but the last involves a bit of system gymnastics.

The CoreOS configuration is applied at first boot and is provided to the installer when writing the boot media to storage.

coreos-infra.bu
---
# 1 - Specify the target and schema version
variant: fcos
version: 1.6.0

# 2 - Provide an ssh public key for the core user
passwd:
  users:
    - name: core
      ssh_authorized_keys_local:
        - infra-ansible-ed25519.pub

storage:
  files:

    # 3 - Define the system hostname
    - path: /etc/hostname
      contents:
        inline: |
          infra-01.example.com

    # 4a - A script to overlay the ansible packages and clean up
    - path: /usr/local/bin/install-overlay-packages
      user:
        name: root
      group:
        name: root
      mode: 0755
      contents:
        inline: |
          #!/bin/bash
          if [ -x /usr/bin/ansible ] ; then
            rm /usr/local/bin/install-overlay-packages
            systemctl disable install-overlay-packages
            rm /etc/systemd/system/install-overlay-packages.service
          else
            rpm-ostree install --assumeyes ansible
            systemctl reboot
          fi

systemd:
  units:

    # 4b - Define a one-time service to run at first boot
    - name: install-overlay-packages.service
      enabled: true
      contents: |
        [Unit]
        Description=Install Overlay Packages
        After=systemd-resolved.service
        Before=zincati.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/install-overlay-packages

        [Install]
        WantedBy=multi-user.target

1 - Butane Preamble

The Butane configuration schema begins with two values that identify the target OS and the schema version itself.

variant: fcos
version: 1.6.0

This indicates that the file targets Fedora CoreOS and the schema version is 1.6.0. This assists the parser in validating the remainder of the configuration against the indicated schema.

2 - Core User - SSH Public Key

CoreOS deploys with two default users, root and core. The root user is not intended for direct login. Neither has a password by default. CoreOS is meant to be accessed by SSH on a network by the core user.

passwd:
  users:
    - name: core
      ssh_authorized_keys_local:
        - infra-ansible-ed25519.pub

The core user already exists, so no additional parameters need to be provided. The user definition only specifies a public key file whose contents will be inserted into the authorized_keys file of that user.

The ssh_authorized_keys_local option above consists of a list of filenames on the local machine that will be merged into the ignition file during transformation. The directory containing that file is provided on the butane command line using the --files-dir argument.
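If you don't already have a dedicated key pair, one can be generated with ssh-keygen; the file name and comment here simply match the ones used in this example:

ssh-keygen -t ed25519 -C "Infrastructure Ansible Key" -f ~/.ssh/infra-ansible-ed25519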

3 - Hostname

When you log into a system it’s convenient to see the hostname in the CLI prompts. It’s also good for reviewing logs. The hostname for Fedora is set using the /etc/hostname file.

storage:
  files:

    - path: /etc/hostname
      contents:
        inline: |
          infra-01.example.com

By convention this file contains the fully-qualified domain name of the host, and the hostname is the first element of the FQDN.
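For example, with the file above in place the kernel hostname is the FQDN, and the short name is everything before the first dot:

hostname       # infra-01.example.com
hostname -s    # infra-01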

4 - Package Overlay - Install Ansible

This is the first place where CoreOS is properly customized. The goal is to automate management of the host and its services using Ansible. The Fedora Project is agnostic about the user's choice of configuration management software, so no CM software is installed by default. These two sections create the parts needed to overlay Ansible on first boot and then reboot so that the Ansible package contents are available.

4a - Overlay Script

The first part of this first-boot process is a shell script, placed where it can remove itself after use.

    - path: /usr/local/bin/install-overlay-packages
      user:
        name: root
      group:
        name: root
      mode: 0755
      contents:
        inline: |
          #!/bin/bash
          if [ -x /usr/bin/ansible ] ; then
            rm /usr/local/bin/install-overlay-packages
            systemctl disable install-overlay-packages
            rm /etc/systemd/system/install-overlay-packages.service
          else
            rpm-ostree install --assumeyes ansible
            systemctl reboot
          fi

The first half of this section defines the location, ownership and permissions of the file. The second half, under the contents key, contains the body of the script.

This script checks to see if the ansible binary is present and executable. If so, then the script removes itself and the systemd service unit file that triggers the script on boot. If ansible is not present, then the script overlays the Ansible RPM and then reboots.

This means that the service, and hence the script, is executed twice. On first boot it runs the installation command and reboots. The second time it detects that ansible is present and then disables and removes itself.
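Both passes can be observed after the fact. The unit's journal shows the install run and the cleanup run, and rpm-ostree reports the layered package. A quick check, assuming the host is reachable at the hostname set above:

ssh core@infra-01.example.com sudo journalctl -u install-overlay-packages.service
ssh core@infra-01.example.com rpm-ostree status    # look for LayeredPackages: ansible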

4b - One-time First Boot Service

The CoreOS specification allows the user to define and control the operation of systemd services. This final section defines a service that executes the script previously defined.

systemd:
  units:
    - name: install-overlay-packages.service
      enabled: true
      contents: |
        [Unit]
        Description=Install Overlay Packages
        After=systemd-resolved.service
        Before=zincati.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/install-overlay-packages

        [Install]
        WantedBy=multi-user.target

This unit file defines when the service should start and what it should do. The service will run after networking is enabled and the DNS systemd-resolved service is running, but before the zincati update service is started. Because it is a oneshot unit, systemd runs the script to completion rather than treating it as a detached daemon.

As noted, this unit is deleted by the script when it runs the second time and detects the presence of the ansible binary.

Transforming the Butane System Spec

The next step is to transform the Butane file to Ignition. The CoreOS installer places the Ignition file onto the new filesystem so that it is available on first boot, which means the file must be provided when the installer is invoked.

The butane binary can be installed on a Fedora system from an RPM, or it can run as a software container. See Getting Started in the Butane documents to decide what works best for you.
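If you choose the container route, an invocation like the one below is equivalent to the command that follows; the only wrinkle is that the directory holding the public key must be mounted into the container:

mkdir keys && cp ~/.ssh/infra-ansible-ed25519.pub keys/
podman run --interactive --rm --volume "${PWD}/keys:/keys:z" \
    quay.io/coreos/butane:release --pretty --files-dir /keys \
    < coreos-infra.bu > coreos-infra.ign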

butane --pretty --files-dir ~/.ssh < coreos-infra.bu > coreos-infra.ign

This call takes only two options:

  • --pretty
    This just pretty prints the JSON output. It’s entirely cosmetic and unnecessary.

  • --files-dir ~/.ssh
    This tells butane where to find any external files, specifically, in this case, the location of the public key file for the core user.

The result of running this command is the Ignition file shown below.

coreos-infra.ign
{
  "ignition": {
    "version": "3.5.0"
  },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGl7GOHs9enyGZ7tTSh8E8G5mE+B9gyVVnz41hRyxbbN Infrastructure Ansible Key"
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "path": "/etc/hostname",
        "contents": {
          "compression": "",
          "source": "data:,infra-01.example.com%0A"
        }
      },
      {
        "group": {
          "name": "root"
        },
        "path": "/usr/local/bin/install-overlay-packages",
        "user": {
          "name": "root"
        },
        "contents": {
          "compression": "gzip",
          "source": "data:;base64,H4sIAAAAAAAC/3yPPQ7CMAyF95zCiDnkABwFMTipSyOcpMpLK3p71B8hMcBkyX7fZ/t8cj5m5xmDiT3dyL7ITahblzOiV6E7XakNkg1RTftYS2DdQjGjsaots1TlxY4cnvwQGCIsaJJCU+oieDX9Ca9macHtUHfUn/oLpM4xiBGFrPiYbEGr8llC1jIwJVkEdLzydVQVX0ozfTTvAAAA//9VmB3oBgEAAA=="
        },
        "mode": 493
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nDescription=Install Overlay Packages\nAfter=systemd-resolved.service\nBefore=zincati.service\n\n[Service]\nType=oneshot\nExecStart=/usr/local/bin/install-overlay-packages\n\n[Install]\nWantedBy=multi-user.target",
        "enabled": true,
        "name": "install-overlay-packages.service"
      }
    ]
  }
}

There are a couple of things to note in this transformation and its result. The SSH public key string is merged verbatim from the file. The install-overlay-packages script is gzip-compressed and serialized as base64. The systemd unit file is a JSON string with embedded newlines (\n). Together these make a single configuration file that can be copied around, or served over HTTP or another file service, without corruption from encoding.
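If you want to verify the round trip, the script can be recovered from the Ignition file by stripping the data URL prefix and reversing the encoding; a sketch, assuming jq is installed:

jq -r '.storage.files[]
       | select(.path == "/usr/local/bin/install-overlay-packages")
       | .contents.source' coreos-infra.ign \
  | sed 's/^data:;base64,//' | base64 --decode | gunzip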

Keep this file handy as it is used as input for the next step.
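As a preview of that step, a bare metal install consumes the Ignition file through the coreos-installer --ignition-file option. The target device below is illustrative; check it carefully before writing:

sudo coreos-installer install /dev/sda --ignition-file coreos-infra.ign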

References

  • Butane
    The Butane format usage and specifications.

  • Ignition
    The Ignition spec for CoreOS configuration.

  • CoreOS on Bare Metal
How to install CoreOS on Bare Metal. This includes variants for PXE and live ISO installations.

  • CoreOS on Raspberry Pi 4
    How to install CoreOS on Raspberry Pi 4 or 5. This includes instructions for installing EFI boot components that are not present in the Pi boot firmware.

  • systemd one-shot service
    A blog post on the workings of Systemd one-shot service units.

  • coreos-installer
    Usage and arguments for the CoreOS installer binary. This can be run from a live ISO or on a second host to write to the boot media.

Friday, January 10, 2025

The Case for CoreOS - Network Infrastructure on an Immutable OS

The Lifetime of Silent Services

For small and medium sized organizations, a local network requires the creation and management of local network services such as DNS, NTP, DHCP, monitoring and user access controls.  These are the ante needed to get in the game, but when they work properly they become invisible. This is good, but it means they can be neglected from the standpoint of management and maintenance. As long as they work it's easy to ignore them until they do break. There is a tendency to treat maintenance as a risk rather than a benefit; the fear of service interruption and downtime leads to neglect and a sense that these services are somehow fragile and precious.

For these silent services, the neglect usually manifests when the admins discover that the OS has gone end-of-life, or a bug is discovered in the current version of a service, or there are 200 CVEs to apply because the last reboot was 700 days ago. The problem is that the backlog of required updates, combined with unfamiliarity with the services and their maintenance history, makes admins gun-shy of updates. Time only makes the fear and the debt worse.

What are you afraid of, Really?

The modern alternative is the cliche "Fail Fast", which, when thrown about without comprehension, is correctly scorned.  I prefer to say "Find the scariest thing you have to do, and do it repeatedly until it stops being scary. Then find the next scariest thing."

The real fear and risk is of downtime without a recovery plan.  In a corporate environment the tendency of management is to CYA by avoiding any downtime, which means avoiding any change. While this can provide the illusion of stability, it treats the infrastructure as a static monolith. It ignores the fact that failures and updates are inevitable and sets the operations teams up for failure. It restricts their ability to practice the very update and mitigation processes that would allow them to create a robust, reliable service.

The real solution is to create a system where any change can be rolled back quickly, reliably and completely.  Fedora CoreOS provides that.

Git for Filesystems?

Fedora CoreOS is a distribution of Fedora Linux that is created specifically to run software containers.  Red Hat promotes it for cloud use and only supports it as a base for OpenShift.  It is a minimal distribution with no GUI, only a simple installer that writes the initial state to a bootable storage device and a simple configuration file that is applied on first boot. This by itself is unremarkable. The feature that makes CoreOS significant is that the file and package systems are based on rpm-ostree, an integrated file and package management system. It presents to users as an XFS filesystem, but it is mounted read-only. The filesystem is immutable. To install packages you must use the rpm-ostree command to layer the package into a new image version and then reboot into the new image. Installing application packages is discouraged in favor of running services in containers.
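A sketch of the workflow this implies; the package name is just an example:

sudo rpm-ostree install tmux        # stage a new deployment with the package layered in
sudo systemctl reboot               # boot into the new deployment
sudo rpm-ostree rollback --reboot   # if it misbehaves, boot the previous deployment again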

Did you get that? The filesystem is read only. To see updated packages you have to reboot. Wait, there's more.

The Turtle or the Frog?

Most distributions provide updates through online package repositories. Admins must periodically poll the repository, pull down any new packages, and then overlay them into the running system. At that point it becomes extremely difficult to reliably roll back. If anything fails, the only recourse is to recover the system from backups, which is understandably an extreme and time-consuming process.  This leads to a "slow and steady" approach to updates. Updates are applied to a few test systems. If no problems are discovered they are rolled forward to a set of staging systems.  Finally the updates are deployed to production.

This is an expensive, time-consuming system, suited only to large organizations with the resources to implement it. It's also error prone, as it is often difficult to adequately simulate the production operating conditions in a small test environment.  More commonly, in smaller organizations, updates are shunted to backlog work and neglected in favor of feature requests or helpdesk issues until some outside event brings the problem to the attention of management, when it becomes an emergency.

To compound the problems, it is common to run package updates without rebooting the system. This can result in failures that don't appear until long after the actual change is applied. All together this makes IT management very averse to regular updates and reboots, because they see these as introducing problems and risking downtime with long recovery periods.

Until recently (well, ages in Internet Time) this "frog in the pot" approach was really the only option. The fact that it was impossible to reliably roll back changes rightly made management and operations averse to any change to a system that was "working".

Double-Buffered Operating System

CoreOS updates are atomic. That is, updates are published as a unit.  The stable stream is updated approximately every two weeks. There are also "testing" and "next" streams that update more often but aren't meant for regular use.  CoreOS runs a service called Zincati. This service polls the release streams for new images and will apply them and reboot when needed. Zincati can be tuned to create staged roll-outs, applying updates first to a set of canary systems before moving on to more critical systems. It can also be tuned to restrict reboots to specific days of the week and times of day.
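For example, reboots can be confined to a weekend maintenance window with a small drop-in file. This is a sketch based on Zincati's periodic strategy; the file name and window are assumptions to adapt:

sudo tee /etc/zincati/config.d/55-updates-strategy.toml <<'EOF'
[updates]
strategy = "periodic"

[[updates.periodic.window]]
days = [ "Sat", "Sun" ]
start_time = "22:30"
length_minutes = 60
EOF
sudo systemctl restart zincati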

By conventional standards, a read-only system that updates automatically and requires a reboot every two weeks is the opposite of stable and reliable. But the risks posed when this is implemented on a conventional Linux distribution are mitigated by rpm-ostree, zincati and software containers.  The benefits of atomic rollback and application decoupling mean that it is possible to keep systems up to date and to respond instantly to any update-induced problems. In essence the operating system is double-buffered: the current system is preserved perfectly across updates. You don't have to worry about losing the working configuration because it's still there.

For The Best Services, Don't Install Any

On CoreOS you're discouraged from installing application or service software on the system.  CoreOS is designed to run software containers. The only major service component integrated into the OS is podman; the network services themselves run in containers, managed as systemd services.

In 2021, a project called Quadlet was created to allow containers to be managed as first-class services under systemd. Quadlets were later merged into the podman project, and as of 2024 they are available on any systemd-based Linux with a recent podman. This means that your system services are no longer tightly coupled to the OS updates.  They don't even need to be based on the same OS distribution.

Using Quadlets, deploying a network service is a matter of defining a systemd container spec, providing the service configuration files, and enabling and starting the service. No service software ever needs to be installed on the host.  Updating the service software is a matter of updating the container image path or tag and restarting the systemd service; reverting is just as simple (see the sketch below). It becomes possible to basically ignore the OS when updating system services and vice-versa.  The loose coupling means that changes to one are very unlikely to affect the other, and any change can be trivially and reliably reverted without affecting the other components.
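A sketch of such an update, assuming a hypothetical web service defined in /etc/containers/systemd/www.container:

# move the service to a new image tag
sudo sed -i 's|^Image=.*|Image=docker.io/library/caddy:2.8|' \
    /etc/containers/systemd/www.container
sudo systemctl daemon-reload
sudo systemctl restart www

Reverting is the same edit with the previous tag, followed by the same daemon-reload and restart.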

Do it again! Do it again!

The simplicity and minimalism of using CoreOS with software containers enables one last element for providing stable, reliable network services. CoreOS can be installed with a simple DHCP/PXE boot and, once installed, it can be configured with a small set of Ansible scripts. These aren't remarkable by themselves, but the simplicity and compartmentalization of the immutable OS are somewhat novel in the on-premise hardware environment.  These are usually thought of as features of cloud-based services, but they are perfectly applicable for small and medium organizations with limited resources.

As a matter of practice I tend not to say I can do something until I can do it 100 times with the push of a single button. With some simple automation the infrastructure can be restored in a matter of minutes on the old hardware or new.  These services tend to be small and light-weight, so they can run on inexpensive redundant hardware.

So You Say, But How?

Well, I plan to show you.  This first post is a long pontification on some thoughts I've had over the last couple of years. I've put it into practice for my home network and at one employer.  It falls under a larger theme of adapting cloud networking practices for on-premise network services.  After all, Red Hat now only supports their CoreOS stream as the base for OpenShift, Red Hat's extended Kubernetes offering. Red Hat recommends the very practices I'm going to detail to maintain the underpinnings of their enterprise distributed application service. I suspect that part of the reason they don't support it for general use is that serious adoption would undercut their revenue stream from RHEL, and I can tell you from personal experience that matters to them a lot.

This isn't a perfect strategy for all purposes either.  Unless your application is extremely simple and has already been designed and implemented for containers it doesn't make sense to shoehorn it in.  Large distributed applications are better supported on a proper Kubernetes or OpenShift deployment, whether on-premise or on a cloud service. Heavy-weight monolithic services (I'm looking at you JBoss/Tomcat apps) aren't well suited to containers, despite the trend to push them in.

In following posts I mean to walk through the deployment of Fedora CoreOS, preparation for automated configuration management and the deployment of service containers. I'm not actually sure where this will end but I mean to see just how far I can push it.  Come along if it seems like your kind of fun.

Resources

  • Fedora Linux - An extremely popular and well managed Linux distribution
  • Fedora CoreOS - A spin of Fedora that is designed to run software containers
  • libostree - A checkpointed filesystem that allows atomic rollback of file changes
  • rpm-ostree - An extension of libostree that integrates RPM package management
  • butane - YAML schema to define OS configurations for CoreOS
  • ignition - JSON schema to define OS configurations for CoreOS
  • zincati - A service to control and tune updates from CoreOS image streams
  • Quadlets - Software containers as systemd services
  • Ansible - System configuration language and toolset
  • OpenShift - Red Hat's enterprise extended version of Kubernetes
  • Kubernetes - A computing cluster system for running applications in software containers



Sunday, February 18, 2024

Running a Cloudflare Zero-Trust Tunnel as a Systemd Container Service

I'm running a Cloudflare Zero Trust tunnel for inbound access to a remote office.  For a site served by Zero-Trust networking, there is no exposed inbound listener. Instead, a tunnel is initiated outbound from the site to the Cloudflare network service. 

I recently learned of Quadlets, a way to run this service as a software container on Red Hat derived Linux systems such as CentOS, Fedora and Fedora CoreOS that use systemd as their init process and podman for container management.

The tunnel is created by running a process on a host inside the destination network. This process connects out to the Cloudflare infrastructure which can then route traffic down the tunnel from connected clients.

Cloudflare distributes a single binary for each platform that creates the tunnel uplink and then carries and routes the inbound traffic. That binary, cloudflared, must be executed as a service on one or more hosts on the destination network. On most Linux systems today, system services are managed by systemd. However, while Cloudflare provides the binary, they do not provide a systemd service definition. If you want to run cloudflared as a service you must create the service file yourself.

Cloudflare also offers cloudflared as a software container at docker.io/cloudflare/cloudflared.

To create a tunnel daemon on a suitable host you only need to create two files. The first is the container definition for Podman. The other is a sysconfig file that defines the TUNNEL_TOKEN environment variable used to identify your tunnel configuration.

Quadlet container definitions look very similar to systemd service files (because, of course, they are derived from them). The file below must be placed at /etc/containers/systemd/cloudflare-tunnel.container on the server host.

--- cloudflare-tunnel.container ---
[Unit]
Description=Cloudflare Tunnel Daemon
Wants=network-online.target
After=network-online.target

[Container]
EnvironmentFile=/etc/sysconfig/cloudflare-tunnel
Image=docker.io/cloudflare/cloudflared
Exec=tunnel --no-autoupdate run

[Install]
WantedBy=multi-user.target default.target
---

The other file is placed at /etc/sysconfig/cloudflare-tunnel as indicated by the EnvironmentFile value in the container file.


--- cloudflare-tunnel ---

TUNNEL_TOKEN=<your tunnel token>

---

The tunnel token is defined when you create the tunnel on the Cloudflare Zero-Trust Network dashboard. Replace the marker with your token string.

With those two files defined, all that remains is to load the container spec into systemd and start the service.  Quadlet-generated units cannot be enabled or disabled with systemctl; whether the container starts at boot is controlled by the [Install] section in the container file.

$ sudo systemctl daemon-reload

$ sudo systemctl start cloudflare-tunnel

$ sudo systemctl status cloudflare-tunnel

Then check your Cloudflare tunnel dashboard to confirm that the tunnel is indeed up.

NOTE: When creating network (vs single host) tunnels, it's a really good idea to run at least two copies on different servers within the destination network.  This creates redundancy and allows you to work on either tunnel box without (ok, with LESS) risk of losing connectivity while you work.

Monday, May 19, 2014

Robust and Flexible DHCP and provisioning: An LDAP backed DHCP service.

In the last post I created an empty LDAP database ready to accept content. In this one I mean to add a DHCP service configuration for a single subnet and a test host entry.

This section is a long argument describing the advantages of using a backing database for DHCP. You can skip it if you're already convinced.

Why use a database?


There are significant reasons to use a proper database (yes, LDAP is a database) for DHCP management.

  • Update without restart
  • Avoid ad hoc file parsing or generation
  • Reduce configuration sites

The use of a flat file for configuration and data, the use of an inaccessible in-memory database and the network limitations of the DHCP protocol all pose problems for all but the smallest networks.  Backing the DHCP services with a database can address all three.

Testing: Emit and Collect Test DHCP Queries - dhtest


It turns out that there aren't many tools for testing DHCP responses. I found several, but they were available only as source code. The one I decided to use is called dhtest and it's available from GitHub:
https://github.com/saravana815/dhtest

It builds cleanly on Fedora 19 and 20.
git clone https://github.com/saravana815/dhtest
Cloning into 'dhtest'...
remote: Reusing existing pack: 53, done.
remote: Total 53 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (53/53), done.
cd dhtest
make
gcc    -c -o dhtest.o dhtest.c
gcc    -c -o functions.o functions.c
gcc dhtest.o functions.o -o dhtest

When it runs successfully this is what it looks like

sudo ./dhtest --mac 0a:00:00:00:00:01 \
  --interface p2p1 --server 10.0.2.15 --verbose
DHCP discover sent  - Client MAC : 0a:00:00:00:00:01
DHCP offer received  - Offered IP : 10.0.2.16

DHCP offer details
----------------------------------------------------------
DHCP offered IP from server - 10.0.2.16
Next server IP(Probably TFTP server) - 10.0.2.4
Subnet mask - 255.255.255.0
Router/gateway - 10.0.2.2
DNS server - 10.0.2.3
Lease time - 1 Days 0 Hours 0 Minutes
DHCP server  - 10.0.2.2
----------------------------------------------------------

DHCP request sent  - Client MAC : 0a:00:00:00:00:01
DHCP ack received  - Acquired IP: 10.0.2.16

DHCP ack details
----------------------------------------------------------
DHCP offered IP from server - 10.0.2.16
Next server IP(Probably TFTP server) - 10.0.2.4
Subnet mask - 255.255.255.0
Router/gateway - 10.0.2.2
DNS server - 10.0.2.3
Lease time - 1 Days 0 Hours 0 Minutes
DHCP server  - 10.0.2.2
----------------------------------------------------------

Procedure


Finally I get to the actual process of creating the DHCP service.  First the ingredients and a summary of the process. Then the details.

Ingredients


Before starting there are a set of parameters that should be defined.  The DHCP server will need to gain access to the LDAP service and the DHCP server configuration in the LDAP database must reflect the network on which the DHCP server resides.  I also add one dummy test host that I can use for validation.

LDAP Server

  • LDAP Server Hostname: ldap.example.com
  • Database DN: dc=example,dc=com
  • Admin Username (DN): cn=Manager,dc=example,dc=com
  • Admin Password: changeme

Subnet Specification

  • Base Address: 10.0.2.0
  • Netmask: /24
  • Gateway: 10.0.2.2
  • DNS Servers: 10.0.2.3
  • NTP Servers: 10.0.2.3

Host Entry

  • MAC Address: 0a:00:00:00:00:01
  • IP Address: 10.0.2.16

Recipe


Running DHCP with LDAP (conceptually) requires two different servers. You can run them both on the same host if you want. Adjust your IP addresses and hostnames to your environment.
  1. On the LDAP server
    1. Prepare the LDAP database for DHCP configuration
      1. Convert the DHCP schema file to LDIF
      2. Import the DHCP schema (as LDIF) into the cn=config database
    2. Convert the DHCP config to LDIF and load it into the database
      1. dhcpServer
      2. dhcpService
      3. dhcpSubnet
      4. dhcpHost
  2. On the DHCP server
    1. Prepare logging
    2. Verify LDAP connectivity
    3. Configure DHCP service
    4. Start DHCP service
    5. Test DHCP service

LDAP Server Host

Convert DHCP Schema to LDIF

The DHCP schema for LDAP isn't part of the standard OpenLDAP server packages. On Fedora it's part of the DHCP package. On Debian it's part of a special package which includes the DHCP server with LDAP integration: isc-dhcp-ldap. Because the LDAP schema file is provided as part of the DHCP server packaging, it must be transferred to the LDAP server to be loaded into the database schema set.

Even then, the schema is provided in the older LDAP schema format. I need it in LDIF format so that I can load it like the others. Fortunately it's possible to load the older schema into memory and then write it out as LDIF using slapcat. The trick is to convince it to use a special alternate configuration file which just imports the old-form schema, and then dump the config as LDIF. There are a couple of tweaks to make on the resulting LDIF. The schema object is created with an array index of zero ({0}); that has to be removed. Slapcat also adds a CRC, and some reference and time stamp information that won't apply to the schema definition when it is loaded into a new database.

The section of code below will produce a file named dhcp.ldif. It takes the dhcp.schema file as input. It uses a temporary file for the LDAP configuration which only loads the DHCP schema and a temporary directory to contain the resulting LDIF config tree which slapcat produces as a matter of course.

#!/bin/sh
# Create the required tmp file/directory
mkdir slapd.d
echo 'include /etc/openldap/schema/dhcp.schema' > slapd.conf
# load the schema and then dump it in LDIF format
slapcat -f slapd.conf -F slapd.d -n0 -l dhcp.ldif \
  -H ldap:///cn={0}dhcp,cn=schema,cn=config
# remove the CRC, array index and timestamp/UUID entries
sed -i -e '/CRC32/d ; s/{0}dhcp/dhcp/ ; /structuralObjectClass/,$d' \
  dhcp.ldif
# remove the tmp file/directory
rm -rf slapd.d
rm slapd.conf
sudo cp dhcp.ldif /etc/openldap/schema/dhcp.ldif

(remember, this runs on the LDAP server host)

Import DHCP schema into configuration database


Once I have the DHCP schema in LDIF format I can load it the same way I loaded the stock schema. This will be the last command which must run as root on the LDAP server using local authentication.

sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/dhcp.ldif

From this point on I'll be adding things not to the config database but to the hdb database using the RootDN and RootPW account.

Load the DHCP configuration into the LDAP server


The DHCP service configuration (as expressed in LDIF) requires three objects to describe a minimal working DHCP service:

  1. dhcpServer - The host on which the DHCP service will run
  2. dhcpService - The global settings which control the behavior of the DHCP service
  3. dhcpSubnet - A description of a subnet to which the DHCP server is connected
Making changes to any of these objects will require a restart of the affected DHCP daemon processes.

DHCP Server


The LDAP dhcpServer object is the hook to which the dhcpd process will attach when it starts up. This object contains the DN of the top of the DHCP service configuration.

LDAP object classes are additive. That is, a single entry in the database will commonly have more than one objectClass attribute. The objectClass attributes declare the set of attributes which the object can have, and there is no limit (other than conflict) to the combinations.

I believe that the dhcpServer objectClass can be combined with the NIS host class so that information about particular hosts can be unified under a single object.

#
# Define the DHCP host entry which will be used by the DHCP service on startup
# This is the configuration entry hook
#
dn: cn=dhcp-host,dc=example,dc=com
cn: dhcp-host
objectClass: top
objectClass: dhcpServer
dhcpServiceDN: cn=dhcp-service,dc=example,dc=com


DHCP Service


The dhcpService object is the root of the DHCP daemon configuration information. All of the objects which define a DHCP service configuration will be children of this object. That is, the DN of the dhcpService object will be the suffix for the rest of the objects that define the configuration.

There are two types of attribute which all objects in the DHCP configuration can have. These are the dhcpStatement and dhcpOption attributes. These correspond to normal statement lines and option lines in the traditional dhcpd.conf file.

The dhcpService attributes define the daemon behavior and any global options which would apply to all query responses.

# The root object of the DHCP service
# All elements of the DHCP configuration will use this DN for a suffix.
# 
dn: cn=dhcp-service,dc=example,dc=com
cn: dhcp-service
objectClass: top
objectClass: dhcpService
objectClass: dhcpOptions
dhcpPrimaryDN: cn=dhcp-host, dc=example,dc=com
dhcpStatements: authoritative
dhcpStatements: ddns-update-style none
dhcpStatements: max-lease-time 43200
dhcpStatements: default-lease-time 3600
dhcpStatements: allow booting
dhcpStatements: allow bootp
dhcpOption: domain-name "example.com"
dhcpOption: domain-name-servers 10.0.2.3


DHCP Subnet


The DHCP service needs a subnet definition so that it knows which interface(s) to bind to. A DHCP server listens for discovery requests; there's no point in listening if there are no networks to listen on, so if no declared subnet matches a local interface the daemon will exit.

# DHCP Subnet object
# 
dn: cn=10.0.2.0, cn=dhcp-service,dc=example,dc=com
cn: 10.0.2.0
objectClass: top
objectClass: dhcpSubnet
dhcpNetMask: 24
dhcpOption: routers 10.0.2.2


Test DHCP Lease Reservation



# A Test Host Lease Reservation
# The definition of a host: name, MAC, IP address
# Additional options can control PXE boot and OS installation
#
dn: cn=testhost, cn=dhcp-service,dc=example,dc=com
cn: testhost
objectClass: top
objectClass: dhcpHost
objectClass: dhcpOptions
dhcpHWAddress: ethernet 0a:00:00:00:00:01
dhcpStatements: fixed-address 10.0.2.16
dhcpOption: host-name "testhost"


DHCP Server Host

These operations configure the DHCP server host and the dhcp daemon.

Prepare Logging (Optional)


I like to be able to view the logs for critical services separately from the rest of the system logs. This can make troubleshooting easier. For this I'll add a config file for rsyslog which filters the dhcpd log entries to a file of their own. This doesn't change the behavior at all, it just makes viewing the logs simpler.

First, create an empty log file (rsyslog doesn't like to write to files that don't exist)
sudo touch /var/log/dhcpd.log

Then create the rsyslog config entry in /etc/rsyslog.d

sudo tee /etc/rsyslog.d/dhcpd.conf <<'EOF'
if $programname == "dhcpd" then /var/log/dhcpd.log
EOF

Finally, restart the rsyslog daemon

sudo systemctl restart rsyslog

Verify LDAP access


Before trying to connect the DHCP server to the LDAP service, I need to verify that the DHCP host can make the required connection and retrieve the dhcpServer entry which is the anchor for the configuration data.

ldapsearch -H ldap://ldap.example.com \
    -x -w changeme \
    -D cn=Manager,dc=example,dc=com \
    -b dc=example,dc=com \
    objectClass=dhcpServer

Set the DHCP server configuration - use LDAP server

When dhcpd is configured to use an LDAP database, the configuration file is a lot smaller than is typical.  It merely identifies where to find the configuration.  It can also indicate whether the daemon should read the configuration once and load it into memory, or resolve each query with a check of the database. Finally, it can write a copy of the configuration in the traditional format for verification.

# DHCP Host Location
ldap-server "ldap.example.com" ;
ldap-port 389 ;

# A user with read/write access to the database
ldap-username "cn=Manager,dc=example,dc=com" ;
ldap-password "changeme" ;

# Identify the root object of the config
ldap-base-dn "dc=example,dc=com" ;
ldap-dhcp-server-cn "dhcp-host" ;

# All queries check the database
ldap-method dynamic ;

# Write the DHCP config for validation
#   An empty file must exist before starting the daemon
#   And it must be writable by the dhcpd user
#ldap-debug-file "/var/log/dhcp-ldap-startup.log" ;


Start the DHCP server


sudo systemctl start dhcpd

Verify that the daemon has started and is serving queries for the subnet

May 16 20:06:53 fedora-20-x64 dhcpd: Internet Systems Consortium DHCP Server 4.2.6
May 16 20:06:53 fedora-20-x64 dhcpd: Copyright 2004-2014 Internet Systems Consortium.
May 16 20:06:53 fedora-20-x64 dhcpd: All rights reserved.
May 16 20:06:53 fedora-20-x64 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
May 16 20:06:53 fedora-20-x64 dhcpd: Wrote 0 leases to leases file.
May 16 20:06:53 fedora-20-x64 dhcpd: Listening on LPF/p2p1/08:00:27:35:3b:b0/10.0.2.0/24
May 16 20:06:53 fedora-20-x64 dhcpd: Sending on   LPF/p2p1/08:00:27:35:3b:b0/10.0.2.0/24
May 16 20:06:53 fedora-20-x64 dhcpd: Sending on   Socket/fallback/fallback-net

Verify Operation


sudo dhtest --verbose --mac 0a:00:00:00:00:01 --interface eth0 --server 10.0.2.15
...
May 16 20:11:49 fedora-20-x64 dhcpd: DHCPDISCOVER from 0a:00:00:00:00:01 via eth0
May 16 20:11:49 fedora-20-x64 dhcpd: DHCPOFFER on 10.0.2.16 to 0a:00:00:00:00:01 via eth0
May 16 20:11:49 fedora-20-x64 dhcpd: DHCPREQUEST for 10.0.2.16 (10.0.2.2) from 0a:00:00:00:00:01 via eth0
May 16 20:11:49 fedora-20-x64 dhcpd: DHCPACK on 10.0.2.16 to 0a:00:00:00:00:01 via eth0

Additional Work


This is a very simple example. There is considerable work that is still needed for a production system.
  1. Security - LDAP over SSL
  2. Security - Add LDAP users for access control
  3. Security - SASL or Kerberos authentication
  4. Security - Database access controls (user ACLs)
  5. HA - LDAP database replication

References

  • DHCP LDAP Patch
    https://github.com/dcantrell/ldap-for-dhcp/wiki
  • An Early example:
    https://skalyanasundaram.wordpress.com/dhcp/dhcp-with-ldap-support/
  • dhtest - DHCP emitter/responder
    https://github.com/saravana815/dhtest

Sunday, April 13, 2014

Initializing an OpenLDAP database with the LDIF configuration

Pretty much all host and network services have traditionally been configured using flat files in /etc.  Several also have databases which are stored in flat files, and sometimes even intermingled with the configuration proper.  ISC DNS and DHCP are two significant ones.  This has the advantage of making the configuration and data easy to edit and update manually.  The disadvantage is that it must be edited and updated manually and any change means either restarting the daemon or signalling it to reload the database.

The most common solution to the editing problem is to create templates and scripts to make changes and re-generate the config/database files.  This still requires kicking the daemon for each change.   The data is often stored in a back-end database which the scripts read to generate the new config files.

What many people don't know is that both ISC DNS and DHCP can use an LDAP database directly as the back-end.  Using the LDAP database, changes can be made programmatically, using standard protocols and the standard APIs that implement them.

In the next couple of posts I plan to show how to create an LDAP backed DHCP service, but I need a working LDAP service first.  This post will show how to initialize the LDAP service on a Linux server using OpenLDAP.  I'm going to do most of the work on Fedora 20, but it should all translate simply to either Red Hat Enterprise Linux or to Debian based Linux distributions.  Where I am aware of differences I'll make notes for those.

Ingredients


  • LDAP database top level distinguished name (DN): dc=example,dc=com
    A domain object for DNS domain example.com
  • LDAP admin user: cn=Manager,dc=example,dc=com
  • Initial admin user password: make one up.

LDAP terminology 101


LDAP is actually not nearly as complicated as it has been made to seem.  It does have some rather arcane terminology and it helps to get that out of the way before starting.

LDAP is a hierarchical key/value database.  This means that each value has a unique name (the key) and that each key is composed of two parts.  The first part is the local name and the remaining part is the name of the "parent" object.  At the top is the "root object", which is special in that it has no parent. The root object can have direct values and it can have children: other objects which have their own values.

In some ways you can think of an LDAP database in the same way as you think of a filesystem. There is a root path to the top directory.  Each directory can contain files and subdirectories which in turn can have their own subdirectories.  Unlike a filesystem each object (directory) has one or more "objectClass" definitions which define the set of acceptable values and types of children.

Unlike a filesystem you can't easily browse the directory tree.  You need to know the name of the value you want, though you can make queries using type and value patterns.

Here are the most important terms you need to know to get started with an LDAP database:
  • LDAP service
    The process which answers LDAP queries.  May contain more than one database
  • LDAP database
    A unit of related data contained within an LDAP service.  Each database has a "Base DN"
  • LDAP schema
    The definition of sets of related data objects.  The schema defines both the attributes of the objects and their relationships (if any)
  • LDAP Data Interchange Format (LDIF)
    A serialized text format which describes both the contents of a database and certain operations on the contents (add/modify/delete)
  • Distinguished Name (dn)
A unique name for a data object within the database.  A DN is usually composed by prepending a Common Name onto the object's parent's DN (see the example after this list).
  • Base DN
    The root of the data hierarchy within an LDAP database.
  • Common Name (cn)
    A potentially non-unique name for a data object.
  • Object Class (objectClass)
    An attribute of a data object which defines which other attributes and relationships the object can have.  An object may have multiple object classes.
  • Domain Component (dc)
This indicates one part of a DNS domain name, the parts normally separated by dots (.). This is only called out specially here because DNS domains are commonly used as the conventional Base DN for corporate LDAP databases.
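As an example of how a DN decomposes, take the admin user that appears later in this post:

cn=Manager,dc=example,dc=com

The local name is cn=Manager; the remainder, dc=example,dc=com, is the DN of its parent, which here is also the Base DN: the two Domain Components of the DNS name example.com.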

Required Packages


The first step is to install the OpenLDAP software packages.

I work with two main Linux distribution families. I differentiate them by the packaging mechanism since that's the practical difference that I have to deal with.  It's not nearly the only difference.

Since I work at Red Hat (actually since long before I worked at Red Hat) I use RPM based distributions like Fedora and Red Hat Enterprise Linux (RHEL).  The other major distribution family is the Debian based distributions, which also include Ubuntu and its variants.  Each family tends to contain forks of one of the two "parent" distributions, so the locations and names of packages and the files they contain tend to fall into one of those two groups.

I'm going to refer primarily to the locations of files in the RPM based distributions. I'll call out the variations for Debian distributions when it matters.

If you're installing a new OpenLDAP service then the first thing you need to do is install the required packages.

RPM based systems (Fedora, RHEL)


  • openldap-servers
  • openldap-clients

Debian based systems (Debian, Ubuntu...)


  • slapd
  • ldap-utils

Debian systems, in their misguided (though sometimes effective) attempt to make things easier for sysadmins, configure and start new services when the packages are installed.  When you install the slapd package you will be prompted for the initial admin password for your LDAP service, so have your initial password ready before you begin package installation. When the package finishes installing you will have a running, but not yet properly configured, LDAP service. You will be able to skip several of the steps below; watch for the notes.

Initialize the LDAP server


The code samples below are from a Fedora 20 system.  You'll need to adjust file locations for the schema and configuration files if you're running on a Debian based system.

Once the OpenLDAP packages are installed it's time to begin setting up the contents of the LDAP database.  If you're working on a Debian based system you can skip the next step as it is done for you when it starts the service.

Copy default DB_CONFIG (Fedora)


If you installed on Debian and it set the initial password and started the service for you, you can skip down to the next section.

OpenLDAP typically defaults to using one of two varieties of the Berkeley DB storage format.  The standard Berkeley DB format is indicated by "bdb".  A more recent version tuned for hierarchical databases like LDAP is known as "hdb".  When I looked recently both Debian and Fedora create an initial database with the hdb format.

The BDB derivatives are tunable to a level that most people will not care about.  The tuning is set in a file called DB_CONFIG which resides in the same directory as the database files (/var/lib/ldap). Both Debian and Fedora offer a default tuning file and I generally use it unchanged.

  • /usr/share/openldap-servers/DB_CONFIG.example

cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG

At this point I can start and enable the slapd service on RPM based systems.

sudo systemctl start slapd
sudo systemctl enable slapd

Communicating with OpenLDAP (local)


The default initial configuration of OpenLDAP allows the root user to view and manage the database configuration using the LDAP client tools and commands expressed in the LDIF format (yes, "LDIF format" is redundant, but colloquial).  The database will accept queries and changes from the system root user (UID=0, GID=0).  Since I'm a fan of doing things as a non-root user, you'll see most calls to LDAP client commands via sudo.

There's a special incantation to authenticate this way.  It has three parts and looks like this:

sudo ldapsearch -Q -Y EXTERNAL -H ldapi:///

The three parts are sudo (run the command as UID 0), -Y EXTERNAL (use the SASL EXTERNAL mechanism, which authenticates with the UID/GID of the calling process) and -H ldapi:/// (connect over the local UNIX domain socket rather than the network). The -Q simply suppresses the SASL chatter.

I'll show how this works in the next section.  For ldapsearch commands I'm also going to add -LLL.  This suppresses some formatting and comments that you might want in practice, but which are more verbose than is useful in a blog post.  You can safely leave it out of your queries if you want to see the complete output.

Loading the standard schema


An LDAP service is a database in one traditional sense.  Each of the data objects is defined in a schema which describes the attributes of the object.    The schema must be loaded into the configuration database before the objects they define can be used in the user database.

In the Fedora and Debian software packages, the standard schema are provided as LDIF files which can be loaded using the ldapadd command.  The call is similar to the ldapsearch command above:

sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f <filename>

On Fedora systems, the stock schema files are located in /etc/openldap/schema.  Each one is offered in both the original LDAP schema form and in LDIF.  Most LDAP databases will use three standard schema to start:

  • core
  • cosine
  • inetorgperson

These three define the basic objects and attributes needed to describe a typical organization: people, groups, rooms etc. Loading these three would look like this.

sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/core.ldif
sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/cosine.ldif
sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/openldap/schema/inetorgperson.ldif

Finding the database configuration object


As noted above, the LDAP service can contain multiple databases.  In fact, it must, because one of the databases is the configuration database itself. Like all LDAP databases, the configuration database has a DN which defines the root of the database for queries.  The DN of the configuration database is cn=config. That is: Common Name = "config".

We can query and modify the OpenLDAP configuration using the ldapsearch, ldapadd and ldapmodify commands (or any other client mechanism which can use SASL external authentication). That is: we can configure LDAP using LDAP.

Now, we don't want to store our data in the configuration database.  Each distribution includes a default database configuration object for a user database. Database configuration objects have the objectClass olcDatabase. The user databases are indicated by the data storage back end (bdb|hdb). This means we can query for the list of databases and then find the user database by looking at the DN.

sudo ldapsearch -Q -Y EXTERNAL -H ldapi:/// -LLL -b cn=config olcDatabase=\* dn
dn: olcDatabase={-1}frontend,cn=config

dn: olcDatabase={0}config,cn=config

dn: olcDatabase={1}monitor,cn=config

dn: olcDatabase={2}hdb,cn=config

Each LDAP search query has two parts.  The first is a filter which selects which records to report.  The second (optional) is a selector for which fields to report for each record.

The query above indicates to search within the base DN (-b) cn=config and search for all records with a key named 'olcDatabase' regardless of the value (olcDatabase=\*) and report the dn field.

The result shows that the LDAP service has four databases. The numbers {0} are essentially LDAP array indices.  The part after the index indicates the database back end.  We're only concerned with two of these right now.

We're working with the config database {0}config,cn=config. The database we want to configure is the hdb back end.  The DN for that is olcDatabase={2}hdb,cn=config. We'll base the rest of our search and change queries on that.  Now we can query the current database configuration object.

(In Debian systems you will likely not see the monitor database, and the index of the hdb database will be 1. Adjust accordingly)

sudo ldapsearch -Q -Y EXTERNAL -H ldapi:/// -LLL -b cn=config 'olcDatabase={2}hdb' 
dn: olcDatabase={2}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
olcDatabase: {2}hdb
olcDbDirectory: /var/lib/ldap
olcRootDN: cn=Manager,dc=my-domain,dc=com
olcDbIndex: objectClass eq,pres
olcDbIndex: ou,cn,mail,surname,givenname eq,pres,sub
olcSuffix: dc=my-domain,dc=com


The olc prefix on the class and attribute names indicates that they are part of the OpenLDAP configuration schema.

The interesting values right now are the olcSuffix and olcRootDN attributes (as well as the absence of an olcRootPW). The default database on Fedora, seen here, starts with a suffix of dc=my-domain,dc=com, and the root user (aka RootDN) is cn=Manager,dc=my-domain,dc=com. These are perfectly valid but useless values. For a real database we want to define our own DB suffix and root user.

Set the Database Suffix


By loose convention the LDAP database suffix for corporate LDAP services is based on the DNS domain of the organization.  This also defines the top level object in the database which we will add later.

I'm going to replace one useless default convention with another because, well, using real DNS names might mess people up if they cut-n-pasted stuff from this blog. I'm going to create a database for the mythical Example Company, Inc. Of course their domain name is example.com. Now I have to translate that into an LDAP DN:

dc=example,dc=com

A domain name is composed of a list of Domain Components.  See how that works?  So we want to replace the existing olcSuffix value with our new one. This will be the first change to the default database.

Changes made using ldapadd or ldapmodify are defined using LDIF in the same way that the output of ldapsearch is expressed in LDIF.  We have to craft a change query for the olcSuffix of olcDatabase={2}hdb,cn=config and replace the existing value with our new one. Here's what that looks like:

sudo ldapmodify -Q -Y EXTERNAL -H ldapi:/// <<EOF
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

EOF
modifying entry "olcDatabase={2}hdb,cn=config"

The ldapadd and ldapmodify commands expect a stream of LDIF on stdin unless an input file is indicated with the -f option. I provided the update stream as a shell HERE document indicated by the EOF markers.

If you run the ldapsearch query from the previous section you can verify that the olcSuffix value has been changed.

Set the Root DN


Now that we've set the suffix for our database we need to update the DN of the user who will be able to make changes (who is not the root user on the LDAP server host).

User names in LDAP are Distinguished Names of objects stored within the database, the same as any other record. We might want to keep the (common) name "Manager", but we need to place it within the proper hierarchy for our database. Since our database is now dc=example,dc=com, the manager really must be cn=Manager,dc=example,dc=com. We'll update that in the same way that we did the suffix.

sudo ldapmodify -Q -Y EXTERNAL -H ldapi:/// <<EOF
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

EOF

modifying entry "olcDatabase={2}hdb,cn=config"

Set the root password


The final step of the stock LDAP service set up is to create a database user password which can be used to make queries and changes without requiring the system root user to do it. The attribute for this password is olcRootPW (it goes with the olcRootDN set above).  If the RootPW is unset then the RootDN cannot log in.  When you add this attribute you are opening up access to the database a bit, but also securing the system by allowing the DB admin to work without needing system root access.

The OpenLDAP service can store passwords in clear text (BAD) or using one of several one-way hash algorithms. You can create a new password hash using the slappasswd command. The default hash is currently SSHA (salted SHA-1), which is better than the alternatives offered but could still be improved.

slappasswd
New password: 
Re-enter new password: 
{SSHA}nottherealhashstringthiswontworkuseyourown

At the prompts, enter the password you want and confirm it. The last line is the hashed result. This will be placed as the value of the olcRootPW attribute. See the tricky thing I did to prevent you from cut-n-pasting that last bit and using a bad password?

sudo ldapmodify -Q -Y EXTERNAL -H ldapi:/// <<EOF
dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcRootPW
olcRootPW: {SSHA}nottherealhashstringthiswontworkuseyourown

EOF
modifying entry "olcDatabase={2}hdb,cn=config"

Create the top object in the database


Since LDAP is a hierarchical database, each object must have a parent.  Because it can't be "Turtles All The Way Up", there must be one special object which has no parent, but which is the parent of all of the other objects in the database. That's the object whose DN is the value of the database configuration olcSuffix.

Most organizations use their domain name as the pattern for the top DN and use an LDAP "organization" object for that top object. An organization object is a container. It is meant to have children of arbitrary types. This allows for the creation of any desired structure for the database. Because the suffix is a domain name, The object must also be a Domain Component object. Domain Components are not top level or container objects. They must have a parent. By combining the organization and domain component classes we create a top level object that can have the name we want.

We're going to create a very minimal organization object at the top of the database to contain the DHCP server (machine) objects and the DHCP service (content) objects.

Organization objects have only one required attribute: the o value, a string which is the organization's name. It may also have a description attribute.

ldapadd -x -w secret -D cn=Manager,dc=example,dc=com -H ldapi:/// <<EOF
dn: dc=example,dc=com
objectClass: organization
objectClass: dcObject
o: Example Company, Inc.
dc: example
description: The Example Company of America

EOF

Summary

At this point I have a running LDAP service with a minimal database. The configuration database contains the minimal schema needed for a typical LDAP service. A single user database has been defined. It contains only the top object named with the shortest DN possible in the database: dc=example,dc=com. An administrative user account has been defined and a password set for it.

The database is ready to be populated and used.
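As a final check, the new top object can be read back using the RootDN and the password set earlier (substitute your own password for secret):

ldapsearch -x -w secret -D cn=Manager,dc=example,dc=com \
    -H ldapi:/// -b dc=example,dc=com -s base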

References


  • OpenLDAP - http://www.openldap.org/
  • Configuring slapd: http://www.openldap.org/doc/admin24/slapdconf2.html
  • Another Config Guide: http://www.zytrax.com/books/ldap/ch6/slapd-config.html