Wed, Sep 25th, 2024

Huge improvements for syslog-ng in MacPorts

Last week I wrote about a campaign that we started to resolve issues on GitHub. Some of the fixes come from our enthusiastic community. Thanks to this, there is a new syslog-ng-devel port in MacPorts, where you can enable almost all syslog-ng features, even for older macOS versions and PowerPC hardware. Some of the freshly enabled modules include support for Kafka, GeoIP or OpenTelemetry. From this blog entry, you can learn how to install a legacy or an up-to-date syslog-ng version from MacPorts.
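
If you just want to try it, installation follows the usual MacPorts workflow. A minimal sketch - the exact variant names for the optional modules are best checked with port variants:

    sudo port install syslog-ng            # latest stable release
    port variants syslog-ng-devel          # list optional features (Kafka, GeoIP, ...)
    sudo port install syslog-ng-devel      # development port with the new modules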

Read the rest of my blog at https://www.syslog-ng.com/community/b/blog/posts/huge-improvements-for-syslog-ng-in-macports

Improving Labels to Foster Collaboration

Not long ago, we introduced several new features in OBS designed to foster collaboration among users. Today, we’re excited to announce a series of improvements to the newly introduced labels feature, which will help you better work with your projects and packages. These updates are part of the Foster Collaboration and Labels beta programs. You can find more information about the beta program here. Our efforts to foster collaboration started in August 2024, when we...

Tue, Sep 24th, 2024

20 Years of Linux | Blathering

The author reflects on two decades of using Linux, starting with Mandrake Linux in 2003 and evolving through various machines, including laptops from Dell and HP. The journey highlights personal growth, nostalgia, and ongoing challenges in the Linux ecosystem, particularly regarding software support, user accessibility, and community dynamics.

Installing the NVIDIA GPU Operator on Kubernetes on openSUSE Leap

This article shows how to install and deploy Kubernetes (K8s) using RKE2 by SUSE Rancher on openSUSE Leap 15.6 with the NVIDIA GPU Operator. This operator deploys and loads all driver stack components required by CUDA on K8s cluster nodes without touching the container host, and makes sure the correct driver stack is made available to driver containers. We use a driver container specifically built for openSUSE Leap 15.6 and SLE 15 SP6. GPU acceleration with CUDA is used in many AI applications, and AI application workflows are frequently deployed through K8s.

Introduction

NVIDIA's Compute Unified Device Architecture (CUDA) plays a crucial role in AI today. Only with the enormous compute power of state-of-the-art GPUs is it possible to handle training and inference with an acceptable amount of resources and compute time.

Most AI workflows rely on containerized workloads deployed and managed by Kubernetes (K8s). To deploy the entire compute stack - including kernel modules - to a K8s cluster, NVIDIA has designed its GPU Operator, which, together with a set of containers, is able to perform this task without ever touching the container hosts.

Most of the components used by the GPU Operator are 'distribution agnostic'; however, one container needs to be built specifically for the target distribution: the driver container. This is owed to the fact that drivers are loaded into kernel space and therefore need to be built specifically for that kernel.

For a long time, NVIDIA kernel drivers were proprietary and closed source. More recently, NVIDIA has published a kernel driver that is entirely open source. This enables Linux distributions to publish pre-built drivers for their products, which allows for a much quicker installation. Also, pre-built drivers are signed with the key that is used for the distribution kernel. This way, the driver will work seamlessly on systems with secure boot enabled. The container used below makes use of a pre-built driver.
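
If you want to verify up front whether secure boot is active on a host, one quick check is mokutil (assuming the mokutil package is installed):

    # mokutil --sb-state
    SecureBoot enabled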

In the next section we will explore how to deploy K8s on openSUSE Leap 15.6. Once this is done, we will deploy the NVIDIA GPU Operator and run some initial tests in the following section. If you already have K8s running, you may want to skip ahead to the second part.

Install RKE2 on openSUSE Leap 15.6

We have chosen RKE2 from SUSE Rancher for K8s over the K8s packages shipped with openSUSE Leap: RKE2 is a well-curated and maintained Kubernetes distribution which works right out of the box, while openSUSE's K8s packages have been broken pretty much ever since openSUSE Kubic was dropped.

RKE2 does not come as an RPM package. This seems strange at first; however, it is owed to the fact that Rancher wants to ensure maximum portability across various Linux distributions. Instead, it comes as a tarball - which is not unusual for application-layer software.

Most of what's described in this document has been taken from a great article by Alex Arnoldy on how to deploy NVIDIA's GPU Operator on RKE2 and SLE BCI. Unfortunately, it was no longer fully up-to-date and thus has been taken down.

Install the K8s server

Kubernetes consists of at least one server, which serves as a control node for the entire cluster. Additionally, clusters may have any number of agents - i.e. machines across which workloads will be spread. Servers act as agents as well. If your K8s cluster consists of just one machine, you are done once your server is installed, and you may skip the following section. For system requirements you may want to check here. We assume you have a Leap 15.6 system installed already (a minimal installation is sufficient and even preferred).

  1. Make sure you have all components installed which are required either for installation or at runtime:
    # zypper -n install -y curl tar gawk iptables helm
    
    For the installation, a convenient installation script exists. This downloads the required components, performs a checksum verification and installs them. The installation is minimal. When RKE2 is started for the first time, it will install itself to /var/lib/rancher and /etc/rancher. Download the installation script:
    # cd /root
    # curl -o rke2.sh -fsSL https://get.rke2.io
    
  2. and run it:
    # sh rke2.sh
    
  3. To make sure that the binaries provided by RKE2 - most importantly, kubectl - are found and will find their config files, you may want to create a separate shell profile (note the quoted 'EOF', which keeps $PATH from being expanded while the file is written):
    # cat > /etc/profile.d/rke2.sh << 'EOF'
    export PATH=$PATH:/var/lib/rancher/rke2/bin
    export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
    export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
    EOF
    
  4. Now enable and start the rke2-server service:
    # systemctl enable --now rke2-server
    
    With this, the installation is completed.
  5. To check if all pods have come up properly and are either running or have completed successfully, run:
    # kubectl get pods -n kube-system
    

Install Agents

If you are running a single node cluster, you are done now and may skip this chapter. Otherwise, you will need to perform the steps below for every node you want to install as an agent.

  1. As above, make sure all required prerequisites are installed:
    # zypper -n install -y curl tar gawk iptables
    
  2. Download the installation script:
    # cd /root
    # curl -o rke2.sh -fsSL https://get.rke2.io
    
  3. and run it:
    # INSTALL_RKE2_TYPE="agent" sh rke2.sh
    
  4. Obtain the token from the server node - it can be found on the server at /var/lib/rancher/rke2/server/node-token - and add it to the config file for the RKE2 agent service:
    # mkdir -p /etc/rancher/rke2/
    # cat > /etc/rancher/rke2/config.yaml << 'EOF'
    server: https://<server>:9345
    token: <obtained token>
    EOF
    
    (You have to replace <server> with the name or IP of the RKE2 server host and <obtained token> with the agent token mentioned above.)
  5. Now you are able to enable and start the agent:
    # systemctl enable --now rke2-agent
    
  6. After a while you should see that the node has been picked up by the server. Run:
    # kubectl get nodes
    
    on the server machine. The output should look something like this:
    NAME     STATUS   ROLES                       AGE    VERSION
    node01   Ready    control-plane,etcd,master   12m   v1.30.4+rke2r1
    node02   Ready    <none>                      5m    v1.30.4+rke2r1
    

Deploying the GPU Operator

Now, with the K8s cluster (hopefully) running, you are ready to deploy the GPU Operator. The following steps need to be performed on the server node only, regardless of whether it has a GPU installed. The correct driver will be installed on any node that has a GPU.

  1. To simplify configuration, create a file /root/build-variables.sh on the server node (note the quoted 'EOF', which defers variable expansion until the file is sourced):
    # cat > /root/build-variables.sh << 'EOF'
    export LEAP_MAJ="15"
    export LEAP_MIN="6"
    export DRIVER_VERSION="555.42.06"
    export OPERATOR_VERSION="v24.6.1"
    export DRIVER_IMAGE=nvidia-driver-container
    export REGISTRY="registry.opensuse.org/network/cluster/containers/containers-${LEAP_MAJ}.${LEAP_MIN}"
    EOF
    
  2. and source this file from the shell you run the following commands from:
    # source /root/build-variables.sh
    
    Note that in the script above we are using kernel driver version 555.42.06 for CUDA 12.5 instead of CUDA 12.6, as NVIDIA introduced some dependency issues in 12.6 which have not been fully resolved yet. This limits the CUDA version used in the payload to 12.5 or older, since a kernel driver will only work for CUDA versions older than or equal to the one it was shipped with. This will be fixed in future releases, so that later driver or GPU Operator versions can be used. Also note that $REGISTRY points to a driver container in https://build.opensuse.org/package/show/network:cluster:containers/nv-driver-container which is built specifically for Leap 15.6 and SLE 15 SP6. The nvidia-driver-ctr container will look for a container image ${REGISTRY}/${DRIVER_IMAGE} tagged ${DRIVER_VERSION}-${ID}${VERSION_ID}, where ${ID} and ${VERSION_ID} are taken from /etc/os-release on the container host (see the sketch at the end of this section). Currently, the container above is tagged for Leap 15.6 and SLE 15 SP6.
  3. Add the NVIDIA Helm repository:
    # helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
    
  4. and update it:
    # helm repo update
    
  5. Now deploy the operator using the nvidia/gpu-operator Helm chart:
    # helm install -n gpu-operator \
      --generate-name   --wait \
      --create-namespace \
      --version=${OPERATOR_VERSION} \
      nvidia/gpu-operator \
      --set driver.repository=${REGISTRY} \
      --set driver.image=${DRIVER_IMAGE} \
      --set driver.version=${DRIVER_VERSION} \
      --set operator.defaultRuntime=containerd \
      --set toolkit.env[0].name=CONTAINERD_CONFIG \
      --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
      --set toolkit.env[1].name=CONTAINERD_SOCKET  \
      --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
      --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
      --set toolkit.env[2].value=nvidia \
      --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
      --set-string toolkit.env[3].value=true
    
    After a while, the command will return.
  6. Now, you can view the additional pods that have started in the gpu-operator namespace:
    # kubectl get pods --namespace gpu-operator
    
  7. To verify that everything has been deployed correctly, run:
    # kubectl logs -n gpu-operator -l app=nvidia-operator-validator
    
    This should return a result like:
    Defaulted container "nvidia-operator-validator" out of: nvidia-operator-validator, driver-validation (init), toolkit-validation (init), cuda-validation (init), plugin-validation (init)
    all validations are successful
    
    Also, run:
     # kubectl logs -n gpu-operator -l app=nvidia-cuda-validator
    
    which should result in:
    Defaulted container "nvidia-cuda-validator" out of: nvidia-cuda-validator, cuda-validation (init)
    cuda workload validation is successful
    
    To obtain information on the NVIDIA hardware installed on each node, run:
    # kubectl exec -it "$(for EACH in \
      $(kubectl get pods -n gpu-operator \
      -l app=nvidia-driver-daemonset \
      -o jsonpath={.items..metadata.name}); \
      do echo ${EACH}; done)" -n gpu-operator -- nvidia-smi
    

One should note that most arguments to helm install ... above are specific to the RKE2 variant of K8s. Some of them may be different for an 'upstream' Kubernetes, or may not be needed at all.
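
To double-check which driver image the operator will pull on a given node, you can resolve the tag by hand. A small sketch; the resulting tag shown below is an assumption based on the /etc/os-release contents of Leap 15.6:

    # source /root/build-variables.sh
    # source /etc/os-release
    # echo "${REGISTRY}/${DRIVER_IMAGE}:${DRIVER_VERSION}-${ID}${VERSION_ID}"
    registry.opensuse.org/network/cluster/containers/containers-15.6/nvidia-driver-container:555.42.06-opensuse-leap15.6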

Mon, Sep 23rd, 2024

GNOME 47 Wallpapers

With GNOME 47 out, it’s time for my bi-annual wallpaper deep dive. For many, these may seem like simple background images, but GNOME wallpapers are the visual anchors of the project, defining its aesthetic and identity. The signature blue wallpaper with its dark top bar remains a key part of that.

GNOME 47 Wallpapers

In this release, GNOME 47 doesn’t overhaul the default blue wallpaper. It’s more of a subtle tweak than a full redesign. The familiar rounded triangles remain, but here’s something neat: the dark variant mimics real-world camera behavior. When it’s darker, the camera’s aperture widens, creating a shallower depth of field. A small but nice touch for those who notice these things.

The real action this cycle, though, is in the supplemental wallpapers.

We haven’t had to remove much this time around, thanks to the JXL format keeping file sizes manageable. The focus has been on variety rather than cutting old designs. We aim to keep things fresh, though you might notice that photographic wallpapers are still missing (we’ll get to that eventually, promise).

In terms of fine-tuning changes, the classic Pixels has been updated to feature newer apps from GNOME Circle.

The dark variant of Pills also got some love with lighting and shading tweaks, including a subtle subsurface scattering effect.

As for the new wallpapers, there are a few cool additions this release. I collaborated with Dominik Baran to create a tube-map-inspired vector wallpaper, which I’m particularly into. There’s also Mollnar, a nod to Vera Molnar, using simple geometric shapes in SVG format.

Most of our wallpapers are still bitmaps, largely because our rendering tools don’t yet handle color banding well with vectors. For now, even designs that would work better as vectors—like mesh gradients—get converted to bitmaps.

We’ve introduced some new abstract designs as well – meet Sheet and Swoosh. And for fans of pixel art, we’ve added LCD and its colorful sibling, LCD-rainbow. Both give off that retro screen vibe, even if the color gradient realism isn’t real-world accurate.

Lastly, there’s Symbolic Soup, which is, well… a bit chaotic. It might not be everyone’s cup of tea, but it definitely adds variety.

Preview

LCD, Pills, Map, Mollnar, LCD-rainbow, Pixels, Sheet, Swoosh, Symbolic Soup

If you’re wondering about the strange square aspect ratio, take a look at the wallpaper sizing guide in our GNOME Interface Guidelines.

Also worth noting is the fact that all of these wallpapers have been created by humans. While I’ve experimented with image generation for some parts of the workflow in some of my personal projects, all this work is AI-generation-free and explicitly credited.

Fri, Sep 20th, 2024

Tumbleweed – Review of the week 2024/38

Dear Tumbleweed users and hackers,

The main task completed this week was bisecting/testing Mesa 24.1.7 together with Stefan Dirsch. Getting things tested was a bit nasty, but at least we managed to work through it and update Tumbleweed to Mesa 24.1.7 as part of snapshot 0915. Of course, that’s only one update picked out and it’s not the biggest one, just the one that consumed the most attention. In total, we have released six snapshots during this week (0912, 0913, 0915, 0916, 0917, and 0918).

The most relevant changes were:

  • cURL 8.10.0
  • KDE Gear 24.08.1
  • Bluez 5.78
  • Boost 1.86.0
  • LibreOffice 24.8.1.2
  • Qemu 9.1.0
  • KDE Frameworks 6.6.0
  • Mesa 24.1.7
  • strace 6.11, linux-glibc-devel 6.11
  • Python Numpy 2.1.1
  • Python Sphinx 8.0.2
  • GTK 4.16.1
  • GNOME Shell & mutter 46.5

Based on the currently staged submit requests, we know that these items are being worked on at the moment:

  • Linux kernel 6.10.11 (no 6.11 just yet)
  • timezone 2024b: postgresql16 currently fails the test suite
  • PostgreSQL 17 as new default
  • Audit 4.0
  • grub2 change: Introduces a new package, grub2-x86_64-efi-bls; some scenarios do not install the proper branding package
  • Python Sphinx 8.0.2
  • Change of the default LSM (opted in at installation) to SELinux. AppArmor is still an option, just not the default. This change only impacts new installations
  • perl-Bootloader will be renamed to update-bootloader: it has not contained any Perl code for a while now. Some openQA tests need to be adjusted for this (https://progress.opensuse.org/issues/165686)
  • Mesa 24.2.x: identified an issue with ‘wrong’ colors (https://gitlab.freedesktop.org/mesa/mesa/-/issues/11840)

Quickstart in Full Disk Encryption with TPM and YaST2

This is a quick start guide for Full Disk Encryption with TPM or FIDO2 and YaST2 on openSUSE Tumbleweed. It focuses on the few steps to install openSUSE Tumbleweed with YaST2 and using Full Disk Encryption secured by a TPM2 chip and measured boot or a FIDO2 key.

Hardware Requirements:

  • UEFI Firmware
  • TPM2 Chip or FIDO2 key which supports the hmac-secret extension
  • 2GB Memory

Installation of openSUSE MicroOS

There is a separate Quickstart for openSUSE MicroOS.

Installation of openSUSE Tumbleweed

Boot installation media

  • Follow the workflow until “Suggested Partitioning”:
    • Partitioning: Select “Guided Setup” and “Enable Disk Encryption”, keep the other defaults
  • Continue Installation until “Installation Settings”:
    • Booting:
      • Change Boot Loader Type from “GRUB2 for EFI” to “Systemd Boot”, ignore “Systemd-boot support is work in progress” and continue
    • Software:
      • Install the additional packages tpm2.0-tools, tpm2-0-tss and libtss2-tcti-device0
  • Finish Installation

Finish FDE Setup

Boot new system

  • Enter passphrase to unlock disk during boot
  • Login
  • Enroll system:
    • With TPM2 chip: sdbootutil enroll --method tpm2
    • With FIDO2 key: sdbootutil enroll --method fido2
  • Optional, but recommended:
    • Upgrade your LUKS key derivation function (do that for every encrypted device listed in /etc/crypttab):
            # cryptsetup luksConvertKey /dev/vdaX --pbkdf argon2id
            # cryptsetup luksConvertKey /dev/vdaY --pbkdf argon2id
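
      If you have several entries, a small shell sketch along these lines covers them all (assuming the second column of /etc/crypttab holds plain device paths rather than UUID= references):
            for dev in $(awk '!/^#/ && NF {print $2}' /etc/crypttab); do
                cryptsetup luksConvertKey "$dev" --pbkdf argon2id
            done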
      

Adjusting kernel boot parameters

The configuration file for kernel command line options is /etc/kernel/cmdline.

After editing this file, call sdbootutil update-all-entries to update the bootloader configuration. If that option does not exist yet or does not work, a workaround is: sdbootutil remove-all-kernels && sdbootutil add-all-kernels.
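
For example, appending a parameter and regenerating the entries could look like this (the parameter shown is only an illustration):

  # echo "$(cat /etc/kernel/cmdline) mitigations=auto" > /etc/kernel/cmdline
  # sdbootutil update-all-entries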

Re-enrollment

If the prediction system fails, a new policy must be created for the new measurements to replace the policy stored in the TPM2.

If you have a recovery PIN:

  # sdbootutil --ask-pin update-predictions

If you don’t have the recovery PIN, you can set one with these steps:

  # sdbootutil unenroll --method=tpm2
  # PIN=<new recovery PIN> sdbootutil enroll --method=tpm2

Virtual Machines

If your machine is a VM, it is recommended to remove the “0” from the FDE_SEAL_PCR_LIST variable in /etc/sysconfig/fde-tools. An update of the hypervisor can change PCR0. Since such an update is not visible inside the VM, the PCR values cannot be updated. As a result, the disk cannot be decrypted automatically at the next boot; the recovery key needs to be entered and a manual re-enrollment is necessary.
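
The change itself is a one-liner; a sketch (the exact default value of FDE_SEAL_PCR_LIST may differ from what is shown here):

  # grep FDE_SEAL_PCR_LIST /etc/sysconfig/fde-tools
  FDE_SEAL_PCR_LIST="0,2,4,7,9"
  # sed -i 's/^FDE_SEAL_PCR_LIST="0,/FDE_SEAL_PCR_LIST="/' /etc/sysconfig/fde-tools

Afterwards, a re-enrollment as described above is likely necessary so that the new PCR selection takes effect.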

Next Steps

The next steps will be:

  • Support grub2-BLS (grub2 following the Boot Loader Specification)
  • Add support to the installers (YaST2 and Agama)
  • Make this the default if a TPM2 chip is present

Any help is welcome!

(Image made with DALL-E)

Wed, Sep 18th, 2024

pcp: pmcd network daemon review (CVE-2024-45769), (CVE-2024-45770)

1) Introduction

Earlier this year we already reported a local symlink attack in Performance Co-Pilot (PCP). The rather complex PCP software suite was difficult to judge just from a cursory look, so we decided to take a closer look especially at PCP’s networking logic at a later time. This report contains two CVEs and some non-CVE related findings we also gathered during the follow-up review.

2) Overview of the PCP Network Protocol and Design

Since PCP is a complex system, this section gives a short overview of the components and network logic found in PCP that are relevant for this report.

Network Access

The central component of PCP is the pmcd daemon. It implements a custom network protocol that is accessible either only locally, or on all available network interfaces, depending on the configuration. On openSUSE it only listens on the loopback device by default. On other distributions, like Debian, it listens on all interfaces by default. Even then, PCP-specific configuration is in place that denies certain operations for remote connections, like so-called store operations, based on access rules. On Debian these access rules are set up so that only connections considered to be “local” are allowed to perform data store operations.

Whether a connection is local is determined either from the type of connection (e.g. UNIX domain socket connections are considered local) or from the sender’s IP address (loopback IP addresses are considered local). Using sender IP addresses for security decisions is generally not considered safe, since IP addresses can be spoofed. As this is the special case of checking for loopback IP addresses, it can be considered safe here, since the Linux kernel should not allow packets received on remote interfaces to carry loopback IP addresses as sender.

The access configuration is found in the “pmcd.conf” configuration file.
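
For illustration, such access rules look roughly like this (a sketch modeled on the defaults described above, not a verbatim copy of any distribution's file):

    [access]
    # deny store operations for remote hostname/IPv4 and IPv6 peers
    disallow ".*" : store;
    disallow ":*" : store;
    # connections over the local UNIX domain socket may do everything
    allow "local:*" : all;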

Daemon and Agent Credentials

The PCP system can collect more or less arbitrary data in a generic manner. In the protocol, metric IDs are specified that are used to identify an agent responsible for managing the actual data of interest. A PCP agent can be a shared object (plugin) which is loaded directly into the pmcd daemon, or a separate program or script that communicates with pmcd via a pipe file descriptor.

pmcd itself drops privileges to an unprivileged pcp user and group, but a privileged special component, pmdaroot, is always kept around to perform privileged operations if necessary. Also, separate agents can (and usually do) run with full root privileges.

Typical agents that are configured by default are:

  • /var/lib/pcp/pmdas/proc/pmdaproc: gathers data about every process listed in /proc.
  • /var/lib/pcp/pmdas/linux/pmdalinux: gathers a plethora of Linux specific data e.g. from the /proc and /sys file systems.
  • /var/lib/pcp/pmdas/kvm/pmdakvm: tracks performance data related to KVM virtual machine emulation.

The actual agent configuration on a system is also found in the “pmcd.conf” configuration file.

3) Scope of the Review

For the review we looked into PCP release 6.2.1. For this final report we verified and updated everything to match the more recent 6.3.0 tag.

Our focus during the review was on the networking protocol implemented in the pmcd daemon. Furthermore we peeked into the most common agents and helper processes like pmdaroot, pmdaproc, pmdalinux and pmdakvm. We only looked into the situation of PCP running on Linux.

4) Reproducer Files

Together with this report, we provide a couple of reproducers for vulnerabilities that can be triggered over the network. They will be mentioned in the respective sections. Every reproducer contains a complete binary client-side protocol exchange that can trigger the issue. A simple way to run such a reproducer is by using the netcat utility in this manner:

nc -U /run/pcp/pmcd.socket <reproducer-file

5) Findings

Bugfixes for these issues are found in the recent 6.3.1 upstream release. Individual bugfixes are pointed out in the following sections, as far as possible.

A) __pmDecodeValueSet() Miscalculates Available Buffer Space Leading to a Possible Heap Corruption (CVE-2024-45769)

There is a miscalculation in __pmDecodeValueSet(): vindex jumps in 32-bit offsets, while the check in p_result.c:415 (vindex > pdulen) uses byte offsets. This makes it possible to address data beyond the actual packet payload. Since __ntohpmValueBlock() in line 432 also swaps bytes in these places, this represents a full remote DoS leading to SIGABRT, SIGSEGV and/or corruption of the heap. By very skillfully corrupting the heap, this might even allow more advanced attacks like privilege escalation or integrity violation. For an in-depth look at exploiting this issue, see section 6) below.

The reproducer file decode-value-set-out-of-bound-write can trigger this issue. When running pmcd in Valgrind, the following output can be seen:

Invalid read of size 4
   at 0x48B57DC: __pmDecodeValueSet (p_result.c:432)
   by 0x4D007BF: ???
   by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
   by 0x11BC8F: DoStore (dostore.c:149)
   by 0x111F25: HandleClientInput (pmcd.c:445)
   by 0x110984: ClientLoop (pmcd.c:880)
   by 0x110984: main (pmcd.c:1192)
 Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"

Invalid write of size 4
   at 0x48E06C4: __ntohpmValueBlock (endian.c:283)
   by 0x48B57E0: __pmDecodeValueSet (p_result.c:432)
   by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
   by 0x11BC8F: DoStore (dostore.c:149)
   by 0x111F25: HandleClientInput (pmcd.c:445)
   by 0x110984: ClientLoop (pmcd.c:880)
   by 0x110984: main (pmcd.c:1192)
 Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"

Invalid read of size 4
   at 0x48B57E1: __pmDecodeValueSet (p_result.c:433)
   by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
   by 0x11BC8F: DoStore (dostore.c:149)
   by 0x111F25: HandleClientInput (pmcd.c:445)
   by 0x110984: ClientLoop (pmcd.c:880)
   by 0x110984: main (pmcd.c:1192)
 Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"

Since remote connections are by default not allowed to enter this code path (this is a store operation), the issue is less severe than it looks at first.

This issue is fixed in upstream commit 3fc59861174a.

B) __pmDecodeCreds() Accesses numcreds Even if There is not Enough Data

__pmDecodeCreds() checks the amount of available data too late, so that the numcreds field of creds_t is accessed and byte swapped even if it wasn’t supplied by the client. This happens in p_creds.c:78.

The reproducer file numcreds-undefined-data can trigger the issue. When running pmcd in Valgrind then the following output can be seen:

Conditional jump or move depends on uninitialised value(s)
   at 0x48B83A5: __pmDecodeCreds (p_creds.c:74)
   by 0x11BFFD: DoCreds (dopdus.c:1427)
   by 0x111F1C: HandleClientInput (pmcd.c:469)
   by 0x110A74: ClientLoop (pmcd.c:880)
   by 0x110A74: main (pmcd.c:1192)

Since the heap allocated buffer returned from pmGetPDU() is bigger than the actual payload (at least 1024 bytes), this only results in an undefined data error. No practical exploit should result from this.

This issue is fixed in upstream commit 3561a367072b.

C) __pmDecodeCreds() Shaky need Calculation When numcred == 0

__pmDecodeCreds() behaves shakily when numcred == 0: the need calculation ends up using a negative value of -1 in p_creds.c:86. An attacker can get past the need != len check while providing insufficient data. Luckily, the negative need is not used for anything else. The result of the call will be a zero-length credlist that will not be processed further by the calling DoCreds() function.

This issue is addressed by the same bugfix commit as for issue 5.B).

D) ntohEventArray() Blindly Processes Client Provided nrecords

The function ntohEventArray() does not check whether there is enough input data (and cannot check, since it is missing a length input parameter). It processes the nrecords provided by the client and starts byte swapping away, leading to out of bound heap read and write operations.

The Valgrind output for an attack of this function looks like this:

Invalid read of size 4
   at 0x48E168A: __bswap_32 (byteswap.h:52)
   by 0x48E168A: ntohEventArray (endian.c:250)
   by 0x48B67DA: __pmDecodeValueSet (p_result.c:432)
   by 0x48B737B: __pmDecodeResult_ctx (p_result.c:806)
   by 0x11BC8F: DoStore (dostore.c:149)
   by 0x111F25: HandleClientInput (pmcd.c:445)
   by 0x110984: ClientLoop (pmcd.c:880)
   by 0x110984: main (pmcd.c:1192)
 Address 0x4fc109c is 2,891,036 bytes inside an unallocated block of size 3,382,368 in arena "client"

Invalid write of size 4
   at 0x48E168E: ntohEventArray (endian.c:250)
   by 0x48B67DA: __pmDecodeValueSet (p_result.c:432)
   by 0x48B737B: __pmDecodeResult_ctx (p_result.c:806)
   by 0x11BC8F: DoStore (dostore.c:149)
   by 0x111F25: HandleClientInput (pmcd.c:445)
   by 0x110984: ClientLoop (pmcd.c:880)
   by 0x110984: main (pmcd.c:1192)
 Address 0x4fc109c is 2,891,036 bytes inside an unallocated block of size 3,382,368 in arena "client"

The reproducer ntohevent-array-out-of-bound-write is able to provoke this situation. We found this problem by using AFL fuzzing. The problematic function is nested rather deeply in the parsing logic and it escaped manual review efforts.

Regarding the severity of this issue, there is not much degree of freedom for an attacker, because the function simply linearly swaps data past the end of the valid pdubuf. It would only have impact beyond DoS if the immediately following heap block contains relevant application data. Chances are that the data is corrupted so much that the program will crash anyway, though.

This issue is fixed in upstream commit 3561a367072b.

E) Profile Message Allows Adding Infinite Profiles

The “profile” message allows unauthenticated users to DoS the pmcd daemon: memory is allocated for the lifetime of the TCP session for every new ctx index, which is 32 bits wide and thus allows storing up to 2^32 profiles, likely leading to an out-of-memory situation. See DoProfile().

It might make sense to limit the number of profiles at least for unauthenticated users, if this is possible.

The issue is fixed in upstream commit 1e54aa7de51b0e6c6cceab2a52e3f6893070f70f.

F) Fetch Message Allows Allocating Unlimited nPmids

In HandleFetch() the client-controlled nPmids is assigned to maxnpmids, which is in turn used to allocate memory via pmAllocResult(). This could also lead to memory hogging or a network DoS.

A fix for this issue is found in upstream commit c9b1a2ecb4.

This issue is somewhat related to CVE-2023-6917 we reported earlier this year.

G) pmpost Symlink Attack Allows Escalation from pcp to root (CVE-2024-45770)

pmpost is used to append messages to the “PCP notice board”. It is called from different contexts; one of them is as root from within the pmcd startup script (called rc_pmcd in the repository). The program writes the message provided on the command line to the file /var/log/pcp/NOTICES. The relevant code for opening the file is found in pmpost’s main() function (found in pmpost.c):

    if ((fd = open(notices, O_WRONLY|O_APPEND, 0)) < 0) {
        if ((fd = open(notices, O_WRONLY|O_CREAT|O_APPEND, 0664)) < 0) {
            fprintf(stderr, "pmpost: cannot open or create file \"%s\": %s\n",
                notices, osstrerror());
            goto oops;
        }
#ifndef IS_MINGW
        /* if root, try to fix ownership */
        if (getuid() == 0) {
            if ((fchown(fd, uid, gid)) < 0) {
                fprintf(stderr, "pmpost: cannot set file gid \"%s\": %s\n",
                    notices, osstrerror());
            }
        }
#endif
        lastday = LAST_NEWFILE;
    }

The directory /var/log/pcp belongs to pcp:pcp. The file is opened without passing the O_NOFOLLOW flag, thus it will open symlinks placed there by the pcp user. This allows an attacker to trick pmpost into creating new files in arbitrary locations, or into corrupting arbitrary existing files on the system. It thus poses a local denial-of-service vector.

Furthermore, if the NOTICES file is newly created and pmpost runs as root, then a fchown() to pcp:pcp is executed on the file. This allows passing the ownership of arbitrary newly created files on the system to pcp:pcp, which is likely a full local root exploit from the pcp user. Possible attack vectors include placing files into one of the various .d drop-in configuration directories in /etc.

Since the directory /var/log/pcp does not have a sticky bit set, the protected_symlinks setting of the Linux kernel does not protect from harm in this context.
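
To illustrate the attack described above, a hypothetical sketch from the perspective of a compromised pcp account (the target path is purely illustrative):

    # acting as the unprivileged pcp user, which owns /var/log/pcp:
    $ ln -sf /etc/cron.d/pcp-owned /var/log/pcp/NOTICES
    # the next time pmpost runs as root (e.g. from the pmcd startup script), it
    # follows the symlink, creates /etc/cron.d/pcp-owned and fchown()s it to
    # pcp:pcp - afterwards the pcp user can write a cron job of their choice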

This issue is addressed in upstream commit 22505f9a43.

H) GetContextLabels() Uses Untrusted PCP_ATTR_CONTAINER to Construct JSON Document

When a client connects to pmcd, attributes can be passed (found in ClientInfo.attrs). One of these attributes, PCP_ATTR_CONTAINER, is stored without further verification in ConnectionAttributes(). This value is used in the function GetContextLabels() to construct a JSON document. It is not checked whether the data contains any JSON syntax elements, which allows injecting arbitrary additional data into the JSON document by crafting a suitable CONTAINER attribute value.

The reproducer label-req-container-json-injection demonstrates this problem by injecting an "evilkey": "evilvalue" element into the JSON document via a crafted container attribute value. It seems that by doing this a client can only fool itself; this doesn’t have any practical value for an attacker.

We followed the use of the CONTAINER attribute into the pmdaroot helper program, where the attribute can also arrive, to query data regarding a specific container in root_container_search(). For a while it looked like this might even allow command line parameter injection, e.g. in lxc.c, where the container name is passed to lxc-info. It turned out, however, that the caller-provided value is only used for comparison against the container names found locally, so crafted data should not cause any harm in this spot.

The fix for this issue is found in upstream commit d68bd777ae.

I) Issues with __pmProcessPipe() and File Descriptors not Marked O_CLOEXEC

Most, if not all, file descriptors opened by PCP code are not marked O_CLOEXEC. This may cause problems when executing child processes that operate in a different security context than the parent, or that are not prepared to safely handle unexpectedly inherited open files and might leak them to further child processes.

This is not a problem when starting agents from within pmcd, because CreateAgentPOSIX() explicitly closes any file descriptors larger than 2. Similarly in the pmdaroot process in function root_create_agent() any non-std file descriptors are closed in the child context, before running execvp(). It is a problem in the context of the __pmProcessPipe() function, though, which executes arbitrary command lines in child processes in a popen() style.

The latter function does not close excess file descriptors. Depending on the context in which the function is invoked, sensitive file descriptors may leak into unexpected contexts. One such context we identified is in the pmdaroot process when it executes lxc-info to obtain information about LXC containers. To verify this, we replaced the lxc-info binary by a custom script and triggered the execution of lxc-info via pmcd. The custom script received the following open file descriptors:

lr-x------ 1 root root 64 Aug  2 12:23 0 -> pipe:[104916]
l-wx------ 1 root root 64 Aug  2 12:23 1 -> pipe:[107248]
l-wx------ 1 root root 64 Aug  2 12:23 2 -> /var/log/pcp/pmcd/root.log
lrwx------ 1 root root 64 Aug  2 12:23 3 -> socket:[105912]
lrwx------ 1 root root 64 Aug  2 12:23 4 -> socket:[105913]
lrwx------ 1 root root 64 Aug  2 12:23 5 -> socket:[105914]
lrwx------ 1 root root 64 Aug  2 12:23 6 -> socket:[105917]
lrwx------ 1 root root 64 Aug  2 12:23 7 -> socket:[105922]

As can be seen from this, the process inherited all open socket connections from the pmdaroot process. This could prove a viable local root exploit if the sockets end up in the wrong hands, since clients of pmdaroot can start arbitrary commands as root via the PDUROOT_STARTPMDA_REQ message.

Another use of __pmProcessPipe() that could be problematic in this respect is in the Perl module glue code, where the __pmProcessPipe() function is made available as $pmda->add_pipe(...) (see function local_pipe() in perl/PMDA/local.c). The in-tree Perl modules that make use of this function don’t seem to open any additional files that could leak, though.

This issue is addressed in upstream commit 1d5a8d1c6fe8b3d5b35a9cc0ed6644696c67ec91.

6) Exploiting the Heap Corruption in Issue 5.A)

This section investigates to what ends the heap corruption issue outlined in section 5.A) can be exploited by a skillful attacker.

The location where the out-of-bound write occurs in issue 5.A) is under quite some attacker control. As we know from the issue, there is a boundary check in p_result.c:415, but the check is in bytes, while we can address 32-bit offsets from the start of pdubuf. The PDU (protocol data unit) is received in LIMIT_SIZE mode, thus at most 64 KiB of data can be transferred for the attack. This means the attacker can specify a vindex of up to 65536. The valid pdulen will be 65536, but the vindex will address up to 4 * 65536 = 256 KiB. Thus an attacker can cause heap corruption in the heap memory area made up of the 192 KiB following the pdubuf.

An interesting data structure that caught our interest is also found on the heap: the client array, holding the ClientInfo data structures for all connected clients. When sending small PDUs, the client buffer will already be located some 10 KiB after the pdubuf in memory. Sending a small PDU won’t do, though, because then the vindex cannot address far enough into the heap to reach it. When sending a larger PDU of a few kilobytes, pdubuf will be located after the client buffer on the heap, making it again unreachable for the attack.

Things can be turned around by creating a lot of connections to pmcd, though. The client buffer is realloc()‘d in the NewClient() function, when new clients are coming in that no longer fit into the client array. By temporarily creating e.g. 200 connections to pmcd, it is possible to force realloc() to move the client buffer to a larger heap address. This in turn makes it possible to send an attack payload that is large enough to cause heap corruption 10 to 20 KiB beyond the end of pdubuf, while pdubuf will still be located at a smaller address than the client buffer.

The heap addresses used are relatively deterministic. ASLR protection does not help much here, because the attack is not about absolute addresses, but about relative offsets between data structures on the heap. When freshly starting pmcd and sending an initial attack payload, the offset between client and pdubuf is always the same. When doing more complex operations that are needed to perform a full attack, the offsets are somewhat less deterministic, but still patterns can be observed. Thus a successful guess is well within reach, as we believe, especially since the local attacker also has the possibility to force the daemon to crash and be restarted, allowing for multiple attempts.

A full attack scenario that we came up with is the following:

  1. The attacker creates a connection from remote, which ends up with ClientInfo->denyOps == 0x2 which means PMCD_OP_STORE is denied for the remote connection. This connection only sends the initial message and then stays idle, but connected.
  2. The attacker sends a valid PDU_RESULT of a somewhat larger size (3 KiB) using a local connection.
  3. The attacker creates 200 parallel idling connections towards pmcd, to force the client buffer to be moved to a larger heap address. Then the connections are terminated again.
  4. The attacker sends an attack PDU_RESULT payload of 3-4 KiB size using a local connection. The attack payload contains just one bad vindex that is tuned just so that __ntohpmValueBlock() will operate exactly on the address of client[0].denyOps for the connection still open from step 1).
  5. The attack will corrupt the ClientInfo from step 1) in such a way that denyOps no longer contains PMCD_OP_STORE. The connection will thus be “upgraded” to be treated like a local connection, although it is remote.

We verified this scenario in a practical example on openSUSE Tumbleweed against pmcd version 6.2.1. Arriving at step 4), the distance to cross to reach the client[0] structure was a bit over 5 KiB:

(gdb) p (char*)client - (char*)pdubuf
$2 = 5520

Before the processing of the attack payload, the client[] structure is intact:

(gdb) p client[0]
$8 = {fd = 17, status = {connected = 1, changes = 0, attributes = 0}, profile = {nodes = 0, hsize = 0,
    hash = 0x0, next = 0x0, index = 0}, denyOps = 2, pduInfo = {features = 3652, licensed = 1, version = 0,
    zero = 0}, seq = 1, start = 1723726720, addr = 0x5625f56b9160, attrs = {nodes = 0, hsize = 0, hash = 0x0,
    next = 0x0, index = 0}}

After the attack has been carried out, it is corrupted like this:

(gdb) p client[0]
$30 = {fd = 17, status = {connected = 1, changes = 0, attributes = 0}, profile = {nodes = 0, hsize = 0,
    hash = 0x0, next = 0x0, index = 0}, denyOps = 33554432, pduInfo = {features = 0, licensed = 0, version = 1,
    zero = 0}, seq = 1141768448, start = 1723726720, addr = 0x5625f56b9160, attrs = {nodes = 0, hsize = 0,
    hash = 0x0, next = 0x0, index = 0}}

As can be seen this also corrupts features, licensed, version and seq. This did not stop the connection from step 1) from sending a PDU_RESULT message without being denied. So the upgrade of the remote connection was carried out successfully. The effects of the attack could be tuned further by changing the vindex offset to a smaller or larger value, to maybe cause less fallout in the ClientInfo structure, depending on the needs of the attacker.

As this shows, the heap corruption issue offers more possibilities than it might seem at first. It allows violating the integrity of the pmcd daemon in unexpected ways.

7) About CVE Assignments

The PCP maintainers don’t consider denial-of-service attacks CVE-worthy, since the service will be restarted automatically via systemd. For this reason, no CVEs have been assigned for this class of issues.

A similar consideration has been made by the PCP maintainers regarding the memory corruption issues: as long as the service only crashes, it’s not considered CVE-worthy. For this reason, among the memory corruption issues, a CVE has been assigned only for issue 5.A), which proved to be exploitable as shown in section 6).

8) Timeline

2024-08-06 We shared a comprehensive report with findings and recommendations with the PCP maintainers at pcp-maintainers@groups.io. We offered coordinated disclosure according to our disclosure policy
2024-08-14 The date of 2024-09-17 has been agreed upon for publication of the findings
2024-08-15 There was some uncertainty about the severity of the heap corruption issue 5a), so we investigated it more deeply and shared our findings with the PCP maintainers
2024-09-09 We recommended to the PCP maintainers to obtain CVEs from the RedHat security team, and they received the two CVEs by this date
2024-09-17 A bugfix release has been published as planned by the PCP upstream maintainers

9) References

Tue, Sep 17th, 2024

Why sudo 1.9.16 enables secure_path by default?

Sudo 1.9.16 is now out, containing mostly bug fixes. However, there are also some new features, like the json_compact option I wrote about a while ago. The other major change is that secure_path is now enabled by default in the sudoers file, and there is a new option to fine-tune its content.
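
In sudoers syntax the option looks like this (the path list is just an example; the shipped default depends on the distribution):

    Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"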

Read more at https://www.sudo.ws/posts/2024/09/why-sudo-1.9.16-enables-secure_path-by-default/
