Deploy a Kubernetes Cluster (RKE2) with Warewulf
In High Performance Computing (HPC) we frequently encounter compute clusters
with node counts that are impractical to manage manually.
The saving grace is that the variation in installation and configuration
among the nodes of a cluster is small, and the number of parameters
that are individual to each node is low.
Thus, in the 'cattle/pets' model, compute nodes are treated like cattle.
Warewulf, a deployment system for HPC compute nodes, is specifically
designed for this case. It utilizes PXE to boot nodes and
provide their root filesystem. Nodes are ephemeral, i.e. their root
filesystem resides in a RAM disk. In a recent Blog
post,
Christian Goll described how to set up and manage a cluster using Warewulf.
Kubernetes (K8s) deployments potentially face similar challenges:
K8s clusters often consist of a large number of mostly identical agent
nodes with a minimal installation and very little individual configuration.
In this article we explore how to set up a K8s cluster with Rancher's
next-generation Kubernetes distribution RKE2 using Warewulf.
Considerations
K8s Server
In K8s we distinguish between a 'server' and 'agents'. While a 'server' may act as an agent as well, its main role is to organize and control the cluster. In a sense it is comparable to a 'head node' in HPC. It is possible to deploy the server role using Warewulf, and we have done so for our experiments. However, at present Warewulf can only deploy ephemeral systems, while the server role may need to maintain state. Therefore, it may be preferable to set the server up as a permanent installation and use Warewulf for agent deployment only. We will nevertheless describe how to deploy a server using Warewulf.
Container Image Storage
Since our workloads are containerized, the container host requires only a very minimal installation. This installation, together with RKE2, will not use much of a node's memory when running out of a RAM disk. The situation is different for container images, which are pulled from registries and stored locally: if these were stored on the RAM disk, memory would quickly be exhausted. Fortunately, Warewulf is able to set up mass storage devices, optionally every time a node is started. We will show how to set up storage for container images using Warewulf.
Basic Setup
This post will not cover how to perform the basic network setup required for the nodes to PXE-boot from the Warewulf deployment server, or how to make nodes known to Warewulf. These topics are all covered in Christian's Blog already.
Setup
Create Deployment Image
Warewulf utilizes container registries to obtain installation images. We start by importing a base container image from the openSUSE registry.
wwctl container import \
docker://registry.opensuse.org/science/warewulf/leap-15.6/containers/kernel:latest \
leap15.6-RKE2
General Image Preparation
Since this base image is generic, we need to install any missing packages required to install and start the RKE2 service. First, open up a shell inside the node image:
wwctl container shell leap15.6-RKE2
and run:
zypper -n in -y tar iptables awk
cd /root
curl -o rke2.sh -fsSL https://get.rke2.io
tar and awk are required by the RKE2 install script while iptables
is required by K8s to set up the container network.
Image Preparation for Container Image Storage
This step is optional, but it is advisable to set up a storage device to
hold container images. Container image storage is required on every node
that will act as an agent - including the server node.
First we need to prepare the deployment image. To do so, we log into the
image again and create the image directory:
mkdir /var/lib/rancher
Then we install the packages required to perform the setup:
zypper -n in -y --no-recommends ignition gptfdisk
Prepare the Image for the K8s Server
Now, we are done with the common setup and can exit the shell session
in the container. When doing so, we need to make sure the container
image is rebuilt. We should see the message
Rebuilding container.... If this is not the case, we need to rebuild the
image by hand:
wwctl container build leap15.6-RKE2
It is recommended to install the K8s server permanently. If we follow this recommendation, we can skip the remainder of this section.
Otherwise, we clone our image for the server:
wwctl container copy leap15.6-RKE2 leap15.6-RKE2-server
open a shell in the newly created server image:
wwctl container shell leap15.6-RKE2-server
install and enable rke2-server and adjust the environment for RKE2:
# Install the RKE2 tarball and prepare for server start
cd /root
INSTALL_RKE2_SKIP_RELOAD=true INSTALL_RKE2_VERSION="v1.31.1+rke2r1" sh rke2.sh
# Enable the service so it comes up later
systemctl enable rke2-server
# For container deployment we want `helm`
zypper -n in -y helm
# Set up environment so kubectl and crictl are found and will run
cat > /etc/profile.d/rke2.sh << 'EOF'
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
EOF
We are pinning the version to the one this setup has been tested
with. If we omit INSTALL_RKE2_VERSION=..., we will get the
latest version.
Now, we exit the shell in the server container and again make sure the
image is rebuilt.
Prepare the Image for the K8s Agents
We need to finalize the agent image by downloading and installing
RKE2 and enabling the rke2-agent service. For this, we log
into the container
wwctl container shell leap15.6-RKE2
and run:
cd /root
INSTALL_RKE2_SKIP_RELOAD=true INSTALL_RKE2_TYPE="agent" \
INSTALL_RKE2_VERSION="v1.31.1+rke2r1" sh rke2.sh
systemctl enable rke2-agent
Here, we are pinning the RKE2 version to the version this has
been tested with. This has to match the version of the server
node. If the server node has not been deployed using Warewulf,
we need to make sure its version matches the version used here.
If we omit INSTALL_RKE2_VERSION=... we will get the
latest version.
When logging out, we make sure the container image is rebuilt.
Set up a Configuration Template for RKE2
Since the K8s agents and servers need a shared secret - the connection
token - and secondary nodes need information about the primary server
to connect, we set up a warewulf configuration overlay template for
these.
We create a new overlay rke2-config on the Warewulf deployment
server by running:
wwctl overlay create rke2-config
create a configuration template:
cat > /tmp/config.yaml.ww <<EOF
{{ if ne (index .Tags "server") "" -}}
server: https://{{ index .Tags "server" }}:9345
{{ end -}}
{{ if ne (index .Tags "connectiontoken") "" -}}
token: {{ index .Tags "connectiontoken" }}
{{ end -}}
EOF
and import it into the overlay setting its owner and permission:
wwctl overlay import --parents rke2-config /tmp/config.yaml.ww /etc/rancher/rke2/config.yaml.ww
wwctl overlay chown rke2-config /etc/rancher/rke2/config.yaml.ww 0
wwctl overlay chmod rke2-config /etc/rancher/rke2/config.yaml.ww 0600
This template will create a server: entry pointing to the
communication endpoint (address and port) of the primary K8s server,
and a token: entry which will hold the connection token, in case
these tags exist in the configuration of the node or
one of its profiles. (These templates use the Golang text/template
engine. Also, check the
upstream documentation
for the template file syntax.)
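For a node that has both tags set, the rendered /etc/rancher/rke2/config.yaml would look roughly like the following. The address and token below are made-up illustrative values; the real ones come from the node's Warewulf tags:

```shell
# Illustrative rendered result written to a scratch file for inspection
cat > /tmp/rke2-config-example.yaml <<'EOF'
server: https://10.0.0.1:9345
token: K1a2b3c4d5::server:e6f7a8b9c0
EOF
cat /tmp/rke2-config-example.yaml
```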
Set up Profiles
At this point, we create some profiles which we will use for setting up all nodes, i.e. the agents and, if applicable, the server. To simplify things, we assume the hardware of all nodes is identical.
The 'Switch to tmpfs' Profile
Container runtimes require pivot_root() to work, which is not possible
as long as we are still running out of a rootfs. This is not only the case
for K8s but also for podman. Since the default init process in a Warewulf
deployment doesn't perform a switch_root, we need to change this.
To do so, we need to do two things:
- Make sure that rootfs is not a tmpfs. This can be done by adding rootfstype=ramfs to the kernel command line.
- Let init know that we intend to switch to tmpfs. We do this by setting up a profile for container hosts:
wwctl profile add container-host
wwctl profile set --root=tmpfs -A "crashkernel=no net.ifnames=1 rootfstype=ramfs" container-host
(Here, crashkernel=no net.ifnames=1 are the default kernel arguments.)
Set up the Container Storage Profile
As stated above, this step is optional but recommended.
To set up storage on the nodes, the deployment images need to be
prepared as described above in section 'Image Preparation for
Container Image Storage'.
For simplicity, we assume that all nodes will receive the identical
storage configuration. Therefore, we create a profile which we
will add to the nodes later. It, however, would be easy to set up
multiple profiles or override settings per node.
We create the profile container-storage and set up the disk, partition,
file system and mount point:
wwctl profile add container-storage
wwctl profile set --diskname <disk> --diskwipe=false \
--partname container_storage --partnumber 1 --partcreate=true \
--fsname container_storage --fsformat ext4 --fspath /var/lib/rancher \
container-storage
Here, we need to replace <disk> by the physical storage device we want to
use. If the disks are not empty initially, we should set the option
--diskwipe=true. This will cause the disks to be wiped on every
consecutive boot, therefore, we may want to unset this later.
--partcreate makes sure the partition is created
if it doesn't exist. Most other arguments should be self-explanatory.
If we need to set up the machines multiple times and want to make sure
the disks are wiped each time, we should not rely on the --diskwipe
option, which in fact only wipes the partition table: if an identical
partition table is recreated, ignition will not notice and will reuse the
partitions from a previous setup.
Set up the Connection Token Profile
RKE2 allows configuring a connection token on both servers and agents.
If none is provided to the primary server, one is generated internally.
If we set up the server persistently, we need to create a file
/etc/rancher/rke2/config.yaml with the content:
token: <connection_token>
before we start this server for the first time. If the server has already
been started before, we need to obtain the token from the file
/var/lib/rancher/rke2/server/node-token on this machine and use it
for the token variable below.
We now run:
wwctl profile add rke2-config-key
generate the token, add the rke2-config overlay to the profile and
set a tag containing the token that will later be used by the profile:
token="$(printf 'K'; \
for n in {1..20}; do printf %x $RANDOM; done; \
printf "::server:"; \
for n in {1..20}; do printf %x $RANDOM; done)"
wwctl profile set --tagadd="connectiontoken=${token}" \
-O rke2-config rke2-config-key
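Before distributing the tag, a quick sanity check on the token's shape can save debugging time. The check below is an optional helper of our own, not part of Warewulf or RKE2; it regenerates a token the same way and verifies it has the K...::server:... form:

```shell
# Optional sanity check (our own helper, assumes bash for $RANDOM and {1..20})
token="$(printf 'K'; for n in {1..20}; do printf %x $RANDOM; done; \
         printf '::server:'; for n in {1..20}; do printf %x $RANDOM; done)"
case "$token" in
  K*::server:*) echo "token format OK" ;;
  *)            echo "unexpected token format: $token" >&2; exit 1 ;;
esac
```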
Set up the 'First Server' Profile
This profile is used to point the agents (and secondary servers) to the initial server. We set the variable server to the address or hostname under which the first server is reachable, then run:
server=<server_address>
wwctl profile add rke2-config-first-server
wwctl profile set --tagadd="server=${server}" -O rke2-config rke2-config-first-server
Start the Nodes
With these profiles in place, we are now able to set up and boot all machine roles.
Start and Test the first K8s Server
If we use Warewulf to also deploy the K8s server, we need to start it now and make sure it is running correctly before we proceed to start the agent nodes. Otherwise, we assume a server is already running to which we can connect via ssh, and we proceed to the next section.
It is assumed that we have already performed the basic setup of the server
node (like making its MAC and designated IP address known to Warewulf).
First we add the configuration profiles to the server. This includes
the container-host and container-storage as well as the rke2-config-key
profiles. We also set the container image:
wwctl node set -P default,container-host,container-storage,rke2-config-key -C leap15.6-RKE2-server <server_node>
Finally, we build the overlays:
wwctl overlay build <server_node>
Now, we are ready to power on the server and wait until it has booted. Once
this is the case, we log into it via ssh. There we can observe the RKE2 server
service starting:
systemctl status rke2-server
The output will show containerd, kubelet and several instances of runc
(containerd-shim-runc-v2) running. When the initial containers have
completed starting, the output should contain the lines:
Oct 07 16:36:36 dell04 rke2[1299]: time="2024-10-07T16:36:36Z" level=info
msg="Labels and annotations have been set successfully on node: k8s-server"
Oct 07 16:36:42 dell04 rke2[1299]: time="2024-10-07T16:36:42Z" level=info msg="Adding node k8s-server-d034de85 etcd status condition"
Oct 07 16:37:00 dell04 rke2[1299]: time="2024-10-07T16:37:00Z" level=info msg="Tunnel authorizer set Kubelet Port 0.0.0.0:10250"
We can watch the remaining services starting by running:
kubectl get pods -A
Once all services are up and running, the output should look like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cloud-controller-manager-k8s-server 1/1 Running 0 20m
kube-system etcd-k8s-server 1/1 Running 0 19m
kube-system helm-install-rke2-canal-lnvv2 0/1 Completed 0 20m
kube-system helm-install-rke2-coredns-rjd54 0/1 Completed 0 20m
kube-system helm-install-rke2-ingress-nginx-97rh7 0/1 Completed 0 20m
kube-system helm-install-rke2-metrics-server-8z878 0/1 Completed 0 20m
kube-system helm-install-rke2-snapshot-controller-crd-mt2ds 0/1 Completed 0 20m
kube-system helm-install-rke2-snapshot-controller-l5bbp 0/1 Completed 0 20m
kube-system helm-install-rke2-snapshot-validation-webhook-glkgm 0/1 Completed 0 20m
kube-system kube-apiserver-k8s-server 1/1 Running 0 20m
kube-system kube-controller-manager-k8s-server 1/1 Running 0 20m
kube-system kube-proxy-k8s-server 1/1 Running 0 20m
kube-system kube-scheduler-k8s-server 1/1 Running 0 20m
kube-system rke2-canal-xfq6l 2/2 Running 0 20m
kube-system rke2-coredns-rke2-coredns-6bb85f9dd8-fj4r4 1/1 Running 0 20m
kube-system rke2-coredns-rke2-coredns-autoscaler-7b9c797d64-rxkmm 1/1 Running 0 20m
kube-system rke2-ingress-nginx-controller-nmlhg 1/1 Running 0 19m
kube-system rke2-metrics-server-868fc8795f-gz6pz 1/1 Running 0 19m
kube-system rke2-snapshot-controller-7dcf5d5b46-8lp8w 1/1 Running 0 19m
kube-system rke2-snapshot-validation-webhook-bf7bbd6fc-p6mf9 1/1 Running 0 19m
This server is now ready to accept agents (and secondary servers). If we
require additional servers for redundancy, their setup is identical, however,
we will need to add the rke2-config-first-server profile when setting up
the node above.
Start and Verify the Agents
Now, we are ready to bring up the agents. First, we set up the nodes by
adding the profiles container-host, container-storage,
rke2-config-key and rke2-config-first-server to all the agent nodes:
agents=<agent_nodes>
wwctl node set -P default,container-host,container-storage,rke2-config-key,rke2-config-first-server $agents
as well as the container image for the agents:
wwctl node set -C leap15.6-RKE2 $agents
and rebuild the overlays for all agent nodes:
wwctl overlay build $agents
We replace <agent_nodes> by the appropriate node names.
This can be a comma-separated list, but also a range of nodes
specified in square brackets - for example k8s-agent[00-15] would
refer to k8s-agent00 to k8s-agent15 - or lists and ranges combined.
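Outside of wwctl, the same node list can be produced for ad-hoc scripting (e.g. ssh loops) with bash brace expansion. This is just a shell convenience, not a Warewulf feature:

```shell
# Expand the node range k8s-agent[00-15] using bash brace expansion
nodes=(k8s-agent{00..15})
echo "${#nodes[@]} nodes"             # prints: 16 nodes
echo "${nodes[0]} ... ${nodes[15]}"   # prints: k8s-agent00 ... k8s-agent15
```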
At this point, we are able to boot the first agent node. Once the first node
is up, we may log in using ssh and check the status of the rke2-agent
service:
systemctl status rke2-agent
The output should contain lines like:
Oct 07 19:23:59 k8s-agent01 rke2[1301]: time="2024-10-07T19:23:59Z" level=info msg="rke2 agent is up and running"
Oct 07 19:23:59 k8s-agent01 systemd[1]: Started Rancher Kubernetes Engine v2 (agent).
Oct 07 19:24:25 k8s-agent01 rke2[1301]: time="2024-10-07T19:24:25Z" level=info
msg="Tunnel authorizer set Kubelet Port 0.0.0.0:10250"
This should be all we check on the agent. Any further verifications will be done from the server. We log into the server and run:
kubectl get nodes
This should produce output like:
NAME STATUS ROLES AGE VERSION
k8s-server Ready control-plane,etcd,master 168m v1.30.4+rke2r1
k8s-agent01 Ready <none> 69s v1.30.4+rke2r1
We see that the first agent node is available in the cluster. Now, we can spin up more nodes and repeat the last step to verify they appear.
Conclusions
We've shown that it is possible to deploy a functional K8s cluster with
RKE2 using Warewulf. We could, for example, proceed by deploying the NVIDIA GPU
operator with a driver container on this cluster as described in a previous
Blog
and set up a K8s cluster for AI workloads. Most of the steps were
straightforward and could be derived from the Warewulf User
Guide. The only non-obvious
steps were the ones required to set up the rootfs in a way that
ensures the container runtime is able to call pivot_root.
Presenting GRUB2 BLS
GRUB2 with BLS is now in MicroOS and Tumbleweed
Recently the openSUSE project released a new version of the GRUB2 package for MicroOS and Tumbleweed, with a new subpackage grub2-$ARCH-efi-bls. This subpackage delivers a new EFI file, grubbls.efi, that can be used as a replacement for the traditional grub.efi.
The new PE binary is a version of GRUB2 that includes a set of patches from Fedora which make the bootloader follow the Boot Loader Specification (BLS). This makes GRUB2 understand the boot entries from /boot/efi/loader/entries and dynamically generate the boot menu shown at boot time.
This is really important for full disk encryption (FDE) because this means that now we can re-use all the architecture and tools designed for systemd-boot. For example, installing or updating the bootloader can now be done with sdbootutil install, the suse-module-tools scriptlets will create new BLS entries when a new kernel is installed, and the tukit and snapper plugins will take care of doing the right thing when snapshots are created or removed.
Reusing all those tools without modification was a significant win, but even better, many of the quirks that classical GRUB2 had when extending the event log are no longer present. Before this package, sdbootutil needed to take ownership of the grub.cfg file, because GRUB2 measures it line by line: for each line that is read and executed by the GRUB2 parser, a new PCR#8 measurement takes place, and because GRUB2 supports conditionals and other complex constructs, it is very hard to predict the final value of PCR#8 without imposing a very minimal and strict grub.cfg.
However, with the new BLS subpackage, this file, along with the fonts and graphical assets for the theme and the necessary modules (such as bli.mod), is now included in the internal squashfs within the EFI binary. GRUB2 no longer measures those internal files, without compromising any security guarantees, because now it is the firmware that measures the entire EFI binary when the bootloader is executed during the boot process.
As of today, we cannot use YaST2 to install GRUB2 with BLS, but we can do it manually very easily: make a systemd-boot installation, change LOADER_TYPE from systemd-boot to grub2-bls in /etc/sysconfig/bootloader, install the new GRUB2 BLS package, and run sdbootutil install. Another option is to play with one of the available images for MicroOS or Tumbleweed.
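The LOADER_TYPE change is a one-line edit. The sketch below performs it on a scratch copy of the file so it can be run safely anywhere; on a real system the file is /etc/sysconfig/bootloader, and the x86_64 package name is assumed:

```shell
# Simulate the switch on a scratch copy; on a real system edit
# /etc/sysconfig/bootloader, then: zypper in grub2-x86_64-efi-bls && sdbootutil install
mkdir -p /tmp/bls-demo
printf 'LOADER_TYPE="systemd-boot"\n' > /tmp/bls-demo/bootloader
sed -i 's/^LOADER_TYPE=.*/LOADER_TYPE="grub2-bls"/' /tmp/bls-demo/bootloader
cat /tmp/bls-demo/bootloader   # prints: LOADER_TYPE="grub2-bls"
```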
Have a lot of fun!
Budapest Audio Expo 2024
This weekend I visited the first Audio Expo in Budapest. It was the first music event I truly enjoyed in years. Even if corridors and rooms were packed, there was enough fresh air. What sets this event apart from other events is the focus on listening to music on the vendors’ products rather than just the speeds and feeds on why you should buy their products. While, of course, the expected outcome is the same, with the emphasis on listening to live systems, I found the event much more comfortable to walk around.
Key takeaway
Do not judge quickly! Go back to a place multiple times! If you are lucky, there will be fewer people in the room, and you can sit at a better spot. You can also listen to different music, or listen to the same speakers with a different amplifier. Actually, both of these happened to me this weekend, and they brought drastic changes to the experience.
Best of Audio Expo
Everyone is asking me what I liked the most. I am not an engineer when it comes to listening to music. I just listen to my ears and do not care much about the technical details. At home I listen to a pair of Heed Enigma 5 speakers, which are omnidirectional. At the expo the best listening experience was another omnidirectional speaker: the MBL speakers at Core Audio. This was also probably the most expensive setup at the expo.
According to my ears, the best value award should go to the NCS Audio Reference One PREMIUM. I visited all rooms on all floors and listened to many different speakers along the way. Some were close to or matched the sound quality of the NCS Audio speakers, but at a much higher price. Only the MBL speakers sounded better to me, but for the price difference you could buy a luxury car :-)
Exhibitors
I had various programs in the neighborhood, so instead of a long block at the Audio Expo, I spent three times a few hours there. Some places I visited multiple times, just to ensure that my first judgment was not too quick. Let me share here my experiences with some of the exhibitors, in alphabetical order.
Allegro Audio
As usual, the system exhibited at Audio Expo sounded really nice. Allegro Audio not only distributes some quality components, but also has its own amplifier: Flow. I really love listening to their Franco Serblin Accordo monitor speakers, but Ktema was not bad either :-)
Core Audio
Probably the most expensive setup of the expo was exhibited by Core Audio. However, the first time I visited them, they played some (at least to me) terrible music. With that music, the whole setup sounded like a pair of $100 computer speakers, so I started to wonder what all the hype about MBL speakers was… Fortunately, I returned to the show the next day, and with a different selection of music the system really shined and became the best sounding system of the whole Expo. However, the price is prohibitive for most people…
Heed
I listen to various Heed components at home: DAC, amp and speakers. So, I was very happy to see the founder talking about the latest Heed products, and also having the opportunity to listen to them. I love Heed speakers, especially the omnidirectional variants, however for the demo they used GoldenEar speakers with the Heed amplifiers at the Expo. Not bad at all, but different.

Heed
NCS Audio
I already listened to Reference One a few times, and I was amazed. Rock, classical, jazz and others, all sounded perfectly on these speakers, no matter the room size. This time Reference One Premium was on stage, using cables from Bonsai Audio. This pair sounded even better than speakers costing many times more.

NCS Audio
Popori Acoustics
I have been reading about Popori Acoustics for years. Finally I had a chance to listen to these electrostatic speakers, made in Hungary, for the first time. And I must admit that my first listening experience was not that good. Hearing a woman singing was fantastic; however, even though the sound of the bass guitar was very detailed, it still sounded kind of meh. Luckily I went back on the second day of the expo. The amplifier had been replaced, and suddenly not just the human voice but everything sounded perfect.

Popori Acoustics
Closing words
Of course there were many more exhibitors. In some cases I loved the sound I heard, but did not have enough time to go back, ask questions, take photos. Some examples are 72audio and Sabo Audio. And there were many more, where the sound was not bad, but did not impress me too much either.
I really hope that next year we will have a similarly good Audio Expo in Budapest!
Upgrading the Atari VCS with openSUSE Tumbleweed
Development start of Leap 16.0
Hello everyone!
I’d like to announce the start of development and the public availability of what we currently refer to as Leap 16.0 pre-Alpha. Since this is a pre-Alpha version, significant changes may occur, and the final product may look very different in the Alpha, Beta, Release Candidate, or General Availability stages.
Users can get our new Agama install images from get.opensuse.org/leap/16.0. The installer will currently offer you Base, GNOME, and KDE installation.
Leap 16.0 is a traditional distribution and a successor to Leap 15.6 with expected General Availability arriving in the Fall of 2025.
We intend to provide users with sufficient overlap so that 15.6 users can have a smooth migration, just like they’re used to from previous releases.
Further details are available on our roadmap. The roadmap is subject to change since we have to respond to any SUSE Linux Enterprise Server 16 schedule changes.
Users can expect a traditional distribution in a brand new form based on binaries from the latest SLES 16 and community packages from our Factory development codebase.
There is no plan to make a Leap 15.7; however, we still need to deliver previously released community packages from Leap 15 via Package HUB for the upcoming SLES 15 SP7. This is why there is an openSUSE:Backports:SLE-15-SP7 project and there are 15.7 repos in OBS.
Who should get it?
This is a pre-alpha product that is not intended to be installed as your daily driver. I highly recommend starting with the installation in a virtual machine and becoming familiar with the online installer Agama.
The target audience for pre-Alpha are early adopters and contributors who would like to actively be part of this large effort. Adopters should consider booting Agama Media from time to time just to check compatibility with their hardware.
For non-contributor users, I highly recommend waiting until we have a Beta, which is expected in the late Spring of 2025.
How to report bugs?
I’d like to kindly ask you to check our Known bugs wikipage before reporting a new issue. If you find a new issue that is likely to affect users, please feel free to add it to the page.
Specifically for Agama I highly recommend using github.com/agama-project and collaborating with the YaST team on suggestions and incorporating any changes.
For the rest of the components, the workflow isn’t changing; just select version 16.0 for bug submissions.
Feature requests
All changes to packages inherited from SLES 16 need to be requested via a feature request.
Feature requests will be reviewed every Monday at a feature review meeting where we’ll convert code-o-o requests into JIRA requests used by SUSE Engineering where applicable.
The factory-auto bot will reject all code submit requests against SLES packages with a pointer to code-o-o.
You can get a list of all SLFO/SLES packages simply by running osc ls SUSE:SLFO:1.1:Build.
Just for clarification: SLFO, SUSE Linux Framework One, is the source pool for SLES 16 and SL Micro 6.X. SLFO was previously known as the Adaptable Linux Platform (ALP).
I highly recommend using code-o-o to coordinate larger community efforts such as Xfce enablement, where we will likely need to update some of the SLES dependencies. This allows us to share the larger story and better reasoning for related SLES update requests. The list of features is also extremely valuable for the release article.
Where to submit packages, how is it built, and where is it tested?
Leap 16.0 is built in the openSUSE:Leap:16.0 project, where we will happily welcome any community submissions until the Beta code submission deadline in the late Spring of 2025. We intend to keep the previous development model and avoid forking SLES packages unless necessary. We can no longer mirror SLES code submissions from OBS into IBS, so all SLES 16 update requests have to be made via feature requests.
For quality control, we have basic test suites based on Agama installations in Leap 16.0 job group. Later, we plan to rework the existing Leap 16.0 Images job group for testing the remaining appliance images.
The project where we maintain community packages is subject to change, as we have not yet fully finalized how to build Package HUB; we may use a structure similar to Backports as in 15.3+.
Further test suite enablement is one of the areas where we currently need the most help. Related progress.opensuse.org trackers: poo#164141 (Leap 16.0 enablement) and poo#166562 (upgrade from 15.6).
Another area where you can help is new package submissions and related maintainer review of package submissions to Leap 16.0. These reviews make sense as we’d like to check with maintainers whether that software in a given version makes sense for inclusion into Leap 16.0, rather than blindly copying all packages over.
Involvement in branding and marketing efforts
I’m very proud to announce fresh branding efforts and want to thank all the people who helped give Leap and Tumbleweed a new look. We plan to publish an article or a video about the changes, and further plans as we still have a surprise or two in our pocket.
Do you want to help us on this front? Spread the news and feel free to join the openSUSE Marketing Team in our Telegram channel.
Many thanks to all who helped us to reach this point.
Lubos Kocman
on behalf of the openSUSE Release team
SteamDeck Internal Screen Undetected
Fight Flash Fraud | F3 Solid State Media Checker
Tumbleweed – Review of the week 2024/40
Dear Tumbleweed users and hackers,
We released six snapshots during 2024/40 (0926, 0927, 0929, 0930, 1001, and 1002). Based on personal feelings, the week seemed ‘mixed’ – Requests came in, and requests went out. And a few things seem to hang there for longer again.
Let’s first look at what you have received during the last week, starting on the positive side of things:
- Bash 5.2.37
- cURL 8.10.1
- fwupd 1.9.25
- GStreamer 1.24.8
- GTK 4.16.2
- Linux kernel 6.11.0
- openSSH 9.9p1
- systemd 256.6
- TCL 8.6.15
- PostgreSQL 17.0 (final release)
- LibreOffice 24.8.2.1
- PHP 8.3.12
- Audit 4.0
- timezone 2024b
- Virtualbox 7.1.0
- Cups 2.4.11
- AppArmor 4.0.3
- grub2: introduces a new package, grub2-x86_64-efi-bls, which includes a straightforward grubbls.efi file
On the staging projects, we have some larger changes being worked on by multiple people. Some of the more interesting changes to come are:
- Libproxy 0.5.9
- KDE Plasma 6.2
- GNOME 47: webkit2gtk3 breaks python-wxPython on i586; help appreciated
- Busybox 1.37.0
- XWayland 24.1.3
- LLVM 19
- Mesa 24.2.x: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11840
- Change of the default LSM (opted in at installation) to SELinux. AppArmor is still an option, just not the default. This change only impacts new installations.
oath-toolkit: privilege escalation in pam_oath.so (CVE-2024-47191)
Table of Contents
- 1) Introduction
- 2) Vulnerability Details
- 3) Embargo Process and Upstream Communication
- 4) SUSE Bugfix
- 5) Upstream Bugfix
- 6) Timeline
- 7) References
- 8) Change History
1) Introduction
oath-toolkit contains libraries and utilities for managing one-time password (OTP) authentication, e.g. as a second factor to password authentication. Fellow SUSE engineer Fabian Vogt approached our Security Team about the project’s PAM module. A couple of years ago, the module gained a feature which allows placing the OTP state file (called usersfile) in the home directory of the to-be-authenticated user. Fabian noticed that the PAM module performs unsafe file operations in users’ home directories. Since PAM stacks typically run as root, this can easily cause security issues.
The feature in question has been introduced in oath-toolkit version 2.6.7 (via commit 60d9902b5c). The following report is based on the most recent oath-toolkit release tag for version 2.6.11.
2) Vulnerability Details
The PAM module is typically configured using a PAM stack configuration line like this:
auth [user_unknown=ignore success=ok] pam_oath.so usersfile=${HOME}/user.oath window=20
The expansion logic of the path components ${HOME} or ${USER} is part of the
problematic feature that introduced the security issue.
The PAM module invokes a liboath library function called
oath_authenticate_usersfile() found in liboath/usersfile.c, which manages
all accesses to the usersfile. Privileges are not dropped, and the function is
not aware of the special privileged PAM context. All file accesses in the
function are naive and follow symlinks. The relevant file operations that are
carried out on successful OTP entry are as follows:
- opening of the usersfile via fopen() for reading (usersfile.c:470)
- opening of a lockfile parallel to the usersfile, using a filename suffix “.lock”, via fopen() for writing (usersfile.c:332)
- locking of the lockfile using POSIX advisory locks via fcntl() (usersfile.c:350)
- creation of a new usersfile parallel to the old usersfile, using a filename suffix “.new”, via fopen() (usersfile.c:372)
- changing ownership of the new usersfile to the to-be-authenticated user via fchown() (usersfile.c:394)
- renaming of the new usersfile to the old usersfile via rename() (usersfile.c:411)
- unlinking of the previously created lockfile (usersfile.c:423)
If this happens in a PAM stack running as root and the usersfile is located in an unprivileged user’s home directory, then a simple root exploit is possible by placing a symlink like this:
user$ ln -s /etc/shadow $HOME/user.oath.new
This will cause /etc/shadow to be overwritten and its ownership will be
changed to the to-be-authenticated user. The to-be-authenticated user can
obtain full root privileges. No race condition needs to be won and no
pathnames have to be guessed.
3) Embargo Process and Upstream Communication
Fabian Vogt first approached the main upstream author by email. Since we did not
get a reaction for several days, we created a private GitLab
issue in the upstream project, offering coordinated
disclosure. There was no reaction, thus we decided to handle the embargo and
bugfix ourselves, since we needed a fixed pam_oath module for our products. We
developed a comprehensive patch, described in Section 4) below.
We requested a CVE from Mitre for this issue and they assigned CVE-2024-47191.
As we were preparing to go public, the upstream author got pinged via private channels and reacted to our report, preparing an upstream bugfix release addressing the issue, described in Section 5) below.
Due to time constraints, we have decided to apply our SUSE bugfix to our products for the time being, until we can evaluate the upstream solution in more depth.
4) SUSE Bugfix
We developed a patch within SUSE to address the issue (note that there is an improved version of the patch available by now). The situation for the bugfix is more complex than it might look at first, because many things are unclear or broken in the current source code:
- the PAM module cannot know for sure if the target usersfile is supposed to be owned by root, by the to-be-authenticated user, or even by some unrelated user. The presence of a ${HOME} path element makes it likely that the to-be-authenticated user is supposed to own the file. The presence of a ${USER} element is not that clear, however.
- the locking mechanism used in the current source code is broken:
- the usersfile is initially opened for reading and parsed without owning the lock (usersfile.c:470). A parallel task can be about to replace this file with a new version, thus a lost update can occur.
- the lock file is unlinked again after the usersfile has been updated (usersfile.c:423). This breaks when another task is waiting on the now-unlinked lockfile, while a third task arrives, sees no lockfile and creates a new one.
- the lockfile is placed in the user’s home directory, possibly cluttering it. Cases like the home directory being a network file system (NFS, CIFS) would need to be considered. The unprivileged user might also prevent the privileged PAM stack from obtaining the lock, causing a local denial-of-service.
We decided to develop a patch that takes as many use cases as possible into
account, securing all operations while maintaining backwards compatibility.
With the patch, the usersfile path is safely traversed using the *at family
of system calls. Privileges will be dropped to the owner of the usersfile as an
additional security measure. The locking mechanism has been fixed to cover all
accesses to the usersfile. Instead of creating a separate lockfile, the
usersfile itself is used for locking, which avoids cluttering the home
directory. Additional sanity checks are added, e.g. world-writable directory
components are rejected. The patch employs Linux-specific features (e.g. linking
files from /proc/self/fd), thus it no longer works for non-Linux systems. The
patch description and code comments contain more hints about the individual
decisions taken in this patch.
Improved version of the Patch after Discussions in the Community
After detailed discussions on the oss-security mailing list a few shortcomings of the original SUSE patch have been identified:
- the patch lacks logic to drop supplemental group membership, which typically results in the process retaining root group membership. Since the privilege drop in the patch only serves as a hardening this is not critical.
- the patch does not deal with potential hard link attacks. Hard link attacks are difficult to protect against, which is why on SUSE distributions the kernel sysctl “fs.protected_hardlinks” is active by default. This way the kernel prevents dangerous uses of hardlinks.
For completeness we offer an improved patch that addresses these two aspects. The approach taken in the patch - to accept any ownership of the target file - makes it impossible to fully protect against all hardlink attack scenarios, though. This is outlined by Solar Designer in the thread on the oss-security mailing list.
5) Upstream Bugfix
Upstream developed an alternative solution, designed to be more portable and cross-platform. This does not take into account all aspects that we considered in Section 4), but should be sufficient to fix the specific security issue described in this report.
This fix has been released in version 2.6.12 of oath-toolkit. Upstream has also published an associated Security Advisory.
6) Timeline
| 2024-08-08 | Fabian Vogt of SUSE sent an email to the main upstream author, describing the issue. The SUSE Security Team was involved as well. |
| 2024-08-20 | After not receiving any reply by email, we created a private GitLab issue describing the vulnerability and offering coordinated disclosure according to our disclosure policy. |
| 2024-08-28 | SUSE started developing an internal patch for the issue. |
| 2024-09-19 | Our internal patch was getting ready for publication. We added a comment in the private GitLab issue, granting two final weeks of embargo time before publication of the vulnerability and the patch. We also shared the current patch in the issue. |
| 2024-09-19 | We requested a CVE for the issue from Mitre. |
| 2024-09-20 | Mitre assigned CVE-2024-47191. |
| 2024-09-29 | After being pinged via private channels, the main upstream author reacted to our communication and started preparing a bugfix release. |
| 2024-10-04 | Upstream published release 2.6.12 containing the bugfix. |
7) References
- oath-toolkit repository
- oath-toolkit vulnerable usersfile commit
- oath-toolkit upstream private issue
- SUSE improved bugfix
- SUSE initial bugfix (old version)
- oath-toolkit Security Advisory for CVE-2024-47191
8) Change History
| 2024-10-21 | Added a section about shortcomings in the original SUSE patch and a link to the improved patch. |
FreeBSD audit source for syslog-ng
Two weeks ago, I was at EuroBSDcon and received a feature request for syslog-ng. The user wanted to collect FreeBSD audit logs together with other logs using syslog-ng. Writing a native driver in C is time-consuming. However, creating an integration based on the program() source of syslog-ng is not that difficult.
This blog shows you the current state of the FreeBSD audit source, how it works, and its limitations. It is also a request for feedback. Please share your experiences at https://github.com/syslog-ng/syslog-ng/discussions/5150!
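As an illustration of the approach, a program() based audit source might look roughly like this (an assumption on my part, not the actual implementation; the source name, the praudit path and the use of /dev/auditpipe are guesses based on standard FreeBSD auditing):

```
source s_freebsd_audit {
  program(
    "/usr/sbin/praudit -l /dev/auditpipe"  # one audit record per line
    flags(no-parse)                        # keep each record as-is
  );
};

log {
  source(s_freebsd_audit);
  destination { file("/var/log/audit.log"); };
};
```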
Read more at https://www.syslog-ng.com/community/b/blog/posts/freebsd-audit-source-for-syslog-ng
