Wed, Sep 25th, 2024
Huge improvements for syslog-ng in MacPorts
Last week I wrote about a campaign we started to resolve issues on GitHub. Some of the fixes come from our enthusiastic community. Thanks to this, there is a new syslog-ng-devel port in MacPorts, where you can enable almost all syslog-ng features even on older macOS versions and PowerPC hardware. Some of the freshly enabled modules include support for Kafka, GeoIP and OpenTelemetry. From this blog entry, you can learn how to install a legacy or an up-to-date syslog-ng version from MacPorts.
Read the rest of my blog at https://www.syslog-ng.com/community/b/blog/posts/huge-improvements-for-syslog-ng-in-macports
Improving Labels to Foster Collaboration
Tue, Sep 24th, 2024
20 Years of Linux | Blathering
Installing the NVIDIA GPU Operator on Kubernetes on openSUSE Leap
This article shows how to install and deploy Kubernetes (K8s) using RKE2 by SUSE Rancher on openSUSE Leap 15.6 with the NVIDIA GPU Operator. This operator deploys and loads any driver stack components required by CUDA on K8s cluster nodes without touching the container host, and makes sure the correct driver stack is made available to driver containers. We use a driver container specifically built for openSUSE Leap 15.6 and SLE 15 SP6. GPU acceleration with CUDA is used in many AI applications, and AI application workflows are frequently deployed through K8s.
Introduction
NVIDIA's Compute Unified Device Architecture (CUDA) plays a crucial role in AI today. Only with the enormous compute power of state-of-the-art GPUs is it possible to process training and inferencing with an acceptable amount of resources and compute time.
Most AI workflows rely on containerized workloads deployed and managed by Kubernetes (K8s). To deploy the entire compute stack - including kernel modules - to a K8s cluster, NVIDIA has designed its GPU Operator, which, together with a set of containers, is able to perform this task without ever touching the container hosts.
Most of the components used by the GPU Operator are 'distribution agnostic'; however, one container needs to be built specifically for the target distribution: the driver container. This is owed to the fact that drivers are loaded into kernel space and therefore need to be built specifically for that kernel.
For a long time, NVIDIA kernel drivers were proprietary and closed source. More recently, NVIDIA has published a kernel driver that's entirely open source. This enables Linux distributions to publish pre-built drivers for their products, which allows for a much quicker installation. Also, pre-built drivers are signed with the key that's used for the distribution kernel. This way, the driver will work seamlessly on systems with Secure Boot enabled. The container utilized below makes use of a pre-built driver.
In the next section we will explore how to deploy K8s on openSUSE Leap 15.6. Once this is done, we will deploy the NVIDIA GPU Operator in the following section and run some initial tests. If you have K8s running already, you may want to skip ahead to the second part.
Install RKE2 on openSUSE Leap 15.6
We have chosen RKE2 from SUSE Rancher for K8s over the K8s packages shipped with openSUSE Leap: RKE2 is a well curated and maintained Kubernetes distribution which works right out of the box, while openSUSE's K8s packages have been broken pretty much ever since openSUSE Kubic was dropped.
RKE2 does not come as an RPM package. This seems strange at first; however, it is owed to the fact that Rancher wants to ensure maximal portability across various Linux distributions. Instead, it comes as a tarball, which is not unusual for application-layer software.
Most of what's described in this document has been taken from a great article by Alex Arnoldy on how to deploy NVIDIA's GPU Operator on RKE2 and SLE BCI. Unfortunately, it was no longer fully up-to-date and thus has been taken down.
Install the K8s server
Kubernetes consists of at least one server which serves as a control node for the entire cluster. Additionally, clusters may have any number of agents, i.e. machines which workloads will be spread across. Servers act as agents as well. If your K8s cluster consists of just one machine, you will be done once your server is installed and may skip the following section. For system requirements, you may want to check here. We assume you have a Leap 15.6 system installed already (a minimal installation is sufficient and even preferred).
- Make sure you have all components installed which are required for installation or runtime:
  # zypper -n install -y curl tar gawk iptables helm
- For the installation, a convenient installation script exists. This downloads the required components, performs a checksum verification and installs them. The installation is minimal. When RKE2 is started for the first time, it will install itself to /var/lib/rancher and /etc/rancher. Download the installation script:
  # cd /root
  # curl -o rke2.sh -fsSL https://get.rke2.io
- and run it:
  # sh rke2.sh
- To make sure that the binaries provided by RKE2 - most importantly, kubectl - are found and will find their config files, you may want to create a separate shell profile:
  # cat > /etc/profile.d/rke2.sh << 'EOF'
  export PATH=$PATH:/var/lib/rancher/rke2/bin
  export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
  export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
  EOF
- Now enable and start the rke2-server service:
  # systemctl enable --now rke2-server
  With this, the installation is completed.
- To check if all pods have come up properly and are running or have completed successfully, run:
  # kubectl get pods -n kube-system
Install Agents
If you are running a single node cluster, you are done now and may skip this chapter. Otherwise, you will need to perform the steps below for every node you want to install as an agent.
- As above, make sure all required prerequisites are installed:
  # zypper -n install -y curl tar gawk iptables
- Download the installation script:
  # cd /root
  # curl -o rke2.sh -fsSL https://get.rke2.io
- and run it:
  # INSTALL_RKE2_TYPE="agent" sh rke2.sh
- Obtain the token from the server node; it can be found on the server at /var/lib/rancher/rke2/server/node-token. Add it to the config file for the RKE2 agent service:
  # mkdir -p /etc/rancher/rke2/
  # cat > /etc/rancher/rke2/config.yaml << 'EOF'
  server: https://<server>:9345
  token: <obtained token>
  EOF
  (You have to replace <server> by the name or IP of the RKE2 server host and <obtained token> by the agent token mentioned above.)
- Now you are able to start the agent:
  # systemctl enable --now rke2-agent
- After a while you should see that the node has been picked up by the server. Run:
  # kubectl get nodes
  on the server machine. The output should look something like this:
  NAME     STATUS   ROLES                       AGE   VERSION
  node01   Ready    control-plane,etcd,master   12m   v1.30.4+rke2r1
  node02   Ready    <none>                      5m    v1.30.4+rke2r1
Deploying the GPU Operator
Now, with the K8s cluster (hopefully) running, you are ready to deploy the GPU Operator. The following steps need to be performed on the server node only, regardless of whether it has a GPU installed or not. The correct driver will be installed on any node that has a GPU installed.
- To simplify configuration, create a file /root/build-variables.sh on the server node:
  # cat > /root/build-variables.sh << 'EOF'
  export LEAP_MAJ="15"
  export LEAP_MIN="6"
  export DRIVER_VERSION="555.42.06"
  export OPERATOR_VERSION="v24.6.1"
  export DRIVER_IMAGE=nvidia-driver-container
  export REGISTRY="registry.opensuse.org/network/cluster/containers/containers-${LEAP_MAJ}.${LEAP_MIN}"
  EOF
- and source this file from the shell you run the following commands from:
  # source /root/build-variables.sh
  Note that in the script above we are using kernel driver version 555.42.06 for CUDA 12.5 instead of CUDA 12.6, as in 12.6 NVIDIA has introduced some dependency issues which have not been fully resolved yet. This limits CUDA used in the payload to 12.5 or older, since a kernel driver version will only work for CUDA versions older than or equal to the version it was provided with. This will be fixed in future versions so that later driver or GPU Operator versions can be used. Also note that $REGISTRY points to a driver container in https://build.opensuse.org/package/show/network:cluster:containers/nv-driver-container - a driver container specifically built for Leap 15.6 and SLE 15 SP6. The nvidia-driver-ctr container will look for a container image ${REGISTRY}/${DRIVER_IMAGE} tagged :${DRIVER_VERSION}-${ID}${VERSION_ID}. ${ID} and ${VERSION_ID} are taken from /etc/os-release on the container host. Currently, the container above is tagged for Leap 15.6 and SLE 15 SP6.
- Add the NVIDIA Helm repository:
# helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
- and update it:
# helm repo update
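As an illustration of the tag lookup described in the note above, the image reference the driver container ends up pulling can be sketched as a tiny standalone program. This is only a sketch with hypothetical helper names; the actual lookup happens inside the operator:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: compose the driver image reference
 * ${REGISTRY}/${DRIVER_IMAGE}:${DRIVER_VERSION}-${ID}${VERSION_ID},
 * where ID and VERSION_ID come from /etc/os-release on the container host. */
static void driver_image_ref(char *out, size_t outlen,
                             const char *registry, const char *image,
                             const char *driver_version,
                             const char *id, const char *version_id)
{
    snprintf(out, outlen, "%s/%s:%s-%s%s",
             registry, image, driver_version, id, version_id);
}
```

For a SLE 15 SP6 host (ID=sles, VERSION_ID=15.6) and the driver version above, this yields a tag ending in 555.42.06-sles15.6.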
- Now deploy the operator using the nvidia/gpu-operator Helm chart:
  # helm install -n gpu-operator \
      --generate-name --wait \
      --create-namespace \
      --version=${OPERATOR_VERSION} \
      nvidia/gpu-operator \
      --set driver.repository=${REGISTRY} \
      --set driver.image=${DRIVER_IMAGE} \
      --set driver.version=${DRIVER_VERSION} \
      --set operator.defaultRuntime=containerd \
      --set toolkit.env[0].name=CONTAINERD_CONFIG \
      --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
      --set toolkit.env[1].name=CONTAINERD_SOCKET \
      --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
      --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
      --set toolkit.env[2].value=nvidia \
      --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
      --set-string toolkit.env[3].value=true
  After a while, the command will return.
- Now you can view the additional pods that have started in the gpu-operator namespace:
  # kubectl get pods --namespace gpu-operator
- To verify that everything has been deployed correctly, run:
  # kubectl logs -n gpu-operator -l app=nvidia-operator-validator
  This should return a result like:
  Defaulted container "nvidia-operator-validator" out of: nvidia-operator-validator, driver-validation (init), toolkit-validation (init), cuda-validation (init), plugin-validation (init)
  all validations are successful
  Also, run:
  # kubectl logs -n gpu-operator -l app=nvidia-cuda-validator
  which should result in:
  Defaulted container "nvidia-cuda-validator" out of: nvidia-cuda-validator, cuda-validation (init)
  cuda workload validation is successful
- To obtain information on the NVIDIA hardware installed on each node, run:
  # kubectl exec -it "$(for EACH in \
      $(kubectl get pods -n gpu-operator \
        -l app=nvidia-driver-daemonset \
        -o jsonpath={.items..metadata.name}); \
      do echo ${EACH}; done)" -n gpu-operator -- nvidia-smi
One should note that most arguments to helm install ... above are specific to the RKE2 variant of K8s. Some of them may be different for an 'upstream' Kubernetes or may not be needed at all.
Mon, Sep 23rd, 2024
GNOME 47 Wallpapers
With GNOME 47 out, it’s time for my bi-annual wallpaper deep dive. For many, these may seem like simple background images, but GNOME wallpapers are the visual anchors of the project, defining its aesthetic and identity. The signature blue wallpaper with its dark top bar remains a key part of that.
In this release, GNOME 47 doesn’t overhaul the default blue wallpaper. It’s more of a subtle tweak than a full redesign. The familiar rounded triangles remain, but here’s something neat: the dark variant mimics real-world camera behavior. When it’s darker, the camera’s aperture widens, creating a shallower depth of field. A small but nice touch for those who notice these things.
The real action this cycle, though, is in the supplemental wallpapers.
We haven’t had to remove much this time around, thanks to the JXL format keeping file sizes manageable. The focus has been on variety rather than cutting old designs. We aim to keep things fresh, though you might notice that photographic wallpapers are still missing (we’ll get to that eventually, promise).
In terms of fine-tuning changes, the classic Pixels has been updated to feature newer apps from GNOME Circle. The dark variant of Pills also got some love with lighting and shading tweaks, including a subtle subsurface scattering effect.
As for the new wallpapers, there are a few cool additions this release. I collaborated with Dominik Baran to create a tube-map-inspired vector wallpaper, which I’m particularly into. There’s also Mollnar, a nod to Vera Molnar, using simple geometric shapes in SVG format.
Most of our wallpapers are still bitmaps, largely because our rendering tools don’t yet handle color banding well with vectors. For now, even designs that would work better as vectors—like mesh gradients—get converted to bitmaps.
We’ve introduced some new abstract designs as well – meet Sheet and Swoosh. And for fans of pixel art, we’ve added LCD and its colorful sibling, LCD-rainbow. Both give off that retro screen vibe, even if the color gradient realism isn’t real-world accurate.
Lastly, there’s Symbolic Soup, which is, well… a bit chaotic. It might not be everyone’s cup of tea, but it definitely adds variety.
Preview
If you’re wondering about the strange square aspect ratio, take a look at the wallpaper sizing guide in our GNOME Interface Guidelines.
Also worth noting is the fact that all of these wallpapers have been created by humans. While I’ve experimented with image generation for some parts of the workflow in some of my personal projects, all this work is AIgen-free and explicitly credited.
Fri, Sep 20th, 2024
Tumbleweed – Review of the week 2024/38
Dear Tumbleweed users and hackers,
The main task completed this week was bisecting/testing Mesa 24.1.7 together with Stefan Dirsch. Getting things tested was a bit nasty, but at least we managed to work through it and update Tumbleweed to Mesa 24.1.7 as part of snapshot 0915. Of course, that’s only one update picked out and it’s not the biggest one, just the one that consumed the most attention. In total, we have released six snapshots during this week (0912, 0913, 0915, 0916, 0917, and 0918).
The most relevant changes were:
- cURL 8.10.0
- KDE Gear 24.08.1
- Bluez 5.78
- Boost 1.86.0
- LibreOffice 24.8.1.2
- Qemu 9.1.0
- KDE Frameworks 6.6.0
- Mesa 24.1.7
- strace 6.11, linux-glibc-devel 6.11
- Python Numpy 2.1.1
- Python Sphinx 8.0.2
- GTK 4.16.1
- GNOME Shell & mutter 46.5
Based on the currently staged submit requests, we know that these items are being worked on at the moment:
- Linux kernel 6.10.11 (no 6.11 just yet)
- timezone 2024b: postgresql16 currently fails the test suite
- PostgreSQL 17 as new default
- Audit 4.0
- grub2 change: Introduces a new package, grub2-x86_64-efi-bls; some scenarios do not install the proper branding package
- Python Sphinx 8.0.2
- Change of the default LSM (opted in at installation) to SELinux. AppArmor is still an option, just not the default. This change only impacts new installations
- perl-Bootloader will be renamed to update-bootloader: there has been no Perl code in it for a while now. Some openQA tests need to be adjusted for this (https://progress.opensuse.org/issues/165686)
- Mesa 24.2.x: identified an issue with ‘wrong’ colors (https://gitlab.freedesktop.org/mesa/mesa/-/issues/11840)
Quickstart in Full Disk Encryption with TPM and YaST2
This is a quick start guide for Full Disk Encryption with TPM or FIDO2 and YaST2 on openSUSE Tumbleweed. It focuses on the few steps to install openSUSE Tumbleweed with YaST2 and using Full Disk Encryption secured by a TPM2 chip and measured boot or a FIDO2 key.
Hardware Requirements:
- UEFI Firmware
- TPM2 Chip or FIDO2 key which supports the hmac-secret extension
- 2GB Memory
Installation of openSUSE MicroOS
There is a separate Quickstart for openSUSE MicroOS
Installation of openSUSE Tumbleweed
Boot installation media
- Follow the workflow until “Suggested Partitioning”:
- Partitioning: Select “Guided Setup” and “Enable Disk Encryption”, keep the other defaults
- Continue Installation until “Installation Settings”:
- Booting:
- Change Boot Loader Type from “GRUB2 for EFI” to “Systemd Boot”, ignore “Systemd-boot support is work in progress” and continue
- Software:
- Install the additional packages tpm2.0-tools, tpm2-0-tss and libtss2-tcti-device0
- Finish Installation
Finish FDE Setup
Boot new system
- Enter passphrase to unlock disk during boot
- Login
- Enroll system:
  - With TPM2 chip:
    # sdbootutil enroll --method tpm2
  - With FIDO2 key:
    # sdbootutil enroll --method fido2
- Optional, but recommended:
  - Upgrade your LUKS key derivation function (do that for every encrypted device listed in /etc/crypttab):
    # cryptsetup luksConvertKey /dev/vdaX --pbkdf argon2id
    # cryptsetup luksConvertKey /dev/vdaY --pbkdf argon2id
Adjusting kernel boot parameters
The configuration file for kernel command line options is /etc/kernel/cmdline. After editing this file, call sdbootutil update-all-entries to update the bootloader configuration. If that option does not exist yet or does not work, a workaround is: sdbootutil remove-all-kernels && sdbootutil add-all-kernels.
Re-enrollment
If the prediction system fails, a new policy must be created for the new measurements to replace the policy stored in the TPM2.
If you have a recovery PIN:
# sdbootutil --ask-pin update-predictions
If you don’t have the recovery PIN, you can set one with these steps:
# sdbootutil unenroll --method=tpm2
# PIN=<new recovery PIN> sdbootutil enroll --method=tpm2
Virtual Machines
If your machine is a VM, it is recommended to remove the “0” from the FDE_SEAL_PCR_LIST variable in /etc/sysconfig/fde-tools. An update of the hypervisor can change PCR0. Since such an update is not visible inside the VM, the PCR values cannot be updated. As a result, the disk cannot be decrypted automatically at the next boot, the recovery key needs to be entered, and a manual re-enrollment is necessary.
Next Steps
The next steps will be:
- Support grub2-BLS (grub2 following the Boot Loader Specification)
- Add support to the installers (YaST2 and Agama)
- Make this the default if a TPM2 chip is present
Any help is welcome!
Further Documentation
(Image made with DALL-E)
Wed, Sep 18th, 2024
pcp: pmcd network daemon review (CVE-2024-45769), (CVE-2024-45770)
Table of Contents
- 1) Introduction
- 2) Overview of the PCP Network Protocol and Design
- 3) Scope of the Review
- 4) Reproducer Files
- 5) Findings
  - A) __pmDecodeValueSet() Miscalculates Available Buffer Space Leading to a Possible Heap Corruption (CVE-2024-45769)
  - B) __pmDecodeCreds() Accesses numcreds Even if There is not Enough Data
  - C) __pmDecodeCreds() shaky need calculation when numcred == 0
  - D) ntohEventArray() Blindly Processes Client Provided nrecords
  - E) Profile Message Allows to Add Infinite Profiles
  - F) Fetch Message Allows to Allocate Unlimited nPmids
  - G) pmpost Fosters a Symlink Attack Allowing to Escalate from pcp to root (CVE-2024-45770)
  - H) GetContextLabels() Uses Untrusted PCP_ATTR_CONTAINER to Construct JSON Document
  - I) Issues with __pmProcessPipe() and File Descriptors not Marked O_CLOEXEC
- 6) Exploiting the Heap Corruption in Issue 5.A)
- 7) About CVE Assignments
- 8) Timeline
- 9) References
1) Introduction
Earlier this year we already reported a local symlink attack in Performance Co-Pilot (PCP). The rather complex PCP software suite was difficult to judge just from a cursory look, so we decided to take a closer look especially at PCP’s networking logic at a later time. This report contains two CVEs and some non-CVE related findings we also gathered during the follow-up review.
2) Overview of the PCP Network Protocol and Design
Since PCP is a complex system, this section gives a short overview of the components and network logic found in PCP that are relevant for this report.
Network Access
The central component of PCP is the pmcd
daemon. It implements a custom
network protocol that is accessible either only locally, or on all available
network interfaces, depending on the configuration. On openSUSE it only
listens on the loopback device by default. On other distributions, like
Debian, it listens on all interfaces by default. Even then, PCP specific
configuration is in place that denies certain operations for remote
connections, like so-called store operations, based on access rules. On Debian
these accesses are set up so that only connections considered to be “local” are
allowed to perform data store operations.
Whether a connection is local or not is determined either from the type of connection (e.g. UNIX domain socket connections are considered local) or by the sender’s IP address (loopback IP addresses are considered local). Using sender IP addresses for security decisions is generally not considered safe, since IP addresses can be spoofed. As this is a special case of checking for loopback IP addresses, it can be considered safe, since the Linux kernel should not allow packets received on remote interfaces to carry loopback IP addresses as sender.
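The classification described above can be illustrated with a small standalone sketch. The function name and structure are hypothetical, not PCP's actual implementation:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Sketch: a connection counts as "local" if it arrives via a UNIX domain
 * socket, or if the peer address is a loopback address (127.0.0.0/8 for
 * IPv4, ::1 for IPv6). The kernel will not deliver packets with loopback
 * sender addresses on non-loopback interfaces, which is what makes this
 * check safe in practice. */
static int peer_is_local(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_UNIX)
        return 1;  /* UNIX domain sockets are always local */
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
        /* 127.0.0.0/8: the first octet of the address is 127 */
        return (ntohl(in->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
    }
    return 0;  /* unknown address families are treated as remote */
}
```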
The access configuration is found in the “pmcd.conf” configuration file.
Daemon and Agent Credentials
The PCP system can collect more or less arbitrary data in a generic manner. In the protocol, metric IDs are specified that are used to identify an agent responsible for managing the actual data of interest. A PCP agent can be a shared object (plugin) which is loaded directly into the pmcd daemon, or a separate program or script that communicates with pmcd via a pipe file descriptor.
pmcd itself drops privileges to an unprivileged pcp user and group, but a privileged special component pmdaroot is always kept around to perform privileged operations, if necessary. Also, separate agents can (and usually do) run with full root privileges.
Typical agents that are configured by default are:
- /var/lib/pcp/pmdas/proc/pmdaproc: gathers data about every process listed in /proc.
- /var/lib/pcp/pmdas/linux/pmdalinux: gathers a plethora of Linux specific data e.g. from the /proc and /sys file systems.
- /var/lib/pcp/pmdas/kvm/pmdakvm: tracks performance data related to KVM virtual machine emulation.
The actual agent configuration on a system is also found in the “pmcd.conf” configuration file.
3) Scope of the Review
For the review we looked into PCP release 6.2.1. For this final report we verified and updated everything to match the more recent 6.3.0 tag.
Our focus during the review was on the networking protocol implemented in the pmcd daemon. Furthermore we peeked into the most common agents and helper processes like pmdaroot, pmdaproc, pmdalinux and pmdakvm. We only looked into the situation of PCP running on Linux.
4) Reproducer Files
Together with this report, we provide a couple of reproducers for vulnerabilities that can be triggered over the network. They will be mentioned in the respective sections. Every reproducer contains a complete binary client-side protocol exchange that can trigger the issue. A simple way to run such a reproducer is by using the netcat utility in this manner:
nc -U /run/pcp/pmcd.socket <reproducer-file
5) Findings
Bugfixes for these issues are found in the recent 6.3.1 upstream release. Individual bugfixes are pointed out in the following sections, as far as possible.
A) __pmDecodeValueSet() Miscalculates Available Buffer Space Leading to a Possible Heap Corruption (CVE-2024-45769)
There is a miscalculation in __pmDecodeValueSet(). The vindex jumps to 32-bit offsets, while the check in p_result.c:415 (vindex > pdulen) uses byte offsets. This makes it possible to address data beyond the actual packet payload. Since __ntohpmValueBlock() in line 432 also swaps bytes in these places, this represents a full remote DoS leading to SIGABRT, SIGSEGV and/or corruption of the heap. By very skillfully corrupting the heap, this might even allow more advanced attacks like privilege escalation or integrity violation. For an in-depth look at exploiting this issue, see section 6) below.
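To make the unit mismatch concrete, here is a small standalone sketch of the bug class. The function names and the fixed variant are illustrative, not PCP's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* BROKEN sketch: vindex counts 32-bit words, but the bound pdulen_bytes
 * counts bytes, so word indexes almost 4x past the payload still pass. */
static int vindex_in_bounds_broken(uint32_t vindex, uint32_t pdulen_bytes)
{
    return vindex <= pdulen_bytes;
}

/* FIXED sketch: convert the word index to a byte offset before comparing,
 * including the 4 bytes the access itself will touch. The 64-bit cast
 * avoids overflow for large vindex values. */
static int vindex_in_bounds_fixed(uint32_t vindex, uint32_t pdulen_bytes)
{
    return (uint64_t)vindex * sizeof(uint32_t) + sizeof(uint32_t)
           <= pdulen_bytes;
}
```

With a 64-byte payload (16 words), a word index of 60 points 244 bytes in, far past the packet, yet the broken check accepts it.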
The reproducer file decode-value-set-out-of-bound-write can trigger this issue. When running pmcd in Valgrind, the following output can be seen:
Invalid read of size 4
at 0x48B57DC: __pmDecodeValueSet (p_result.c:432)
by 0x4D007BF: ???
by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
by 0x11BC8F: DoStore (dostore.c:149)
by 0x111F25: HandleClientInput (pmcd.c:445)
by 0x110984: ClientLoop (pmcd.c:880)
by 0x110984: main (pmcd.c:1192)
Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"
Invalid write of size 4
at 0x48E06C4: __ntohpmValueBlock (endian.c:283)
by 0x48B57E0: __pmDecodeValueSet (p_result.c:432)
by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
by 0x11BC8F: DoStore (dostore.c:149)
by 0x111F25: HandleClientInput (pmcd.c:445)
by 0x110984: ClientLoop (pmcd.c:880)
by 0x110984: main (pmcd.c:1192)
Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"
Invalid read of size 4
at 0x48B57E1: __pmDecodeValueSet (p_result.c:433)
by 0x48B633B: __pmDecodeResult_ctx (p_result.c:806)
by 0x11BC8F: DoStore (dostore.c:149)
by 0x111F25: HandleClientInput (pmcd.c:445)
by 0x110984: ClientLoop (pmcd.c:880)
by 0x110984: main (pmcd.c:1192)
Address 0x4d012c0 is 2,320 bytes inside an unallocated block of size 3,372,592 in arena "client"
Since remote connections are by default not allowed to enter this code path (this is a store operation), the issue is less severe than it looks at first.
This issue is fixed in upstream commit 3fc59861174a.
B) __pmDecodeCreds() Accesses numcreds Even if There is not Enough Data
__pmDecodeCreds() checks the amount of available data too late, so that the numcreds field of creds_t is accessed and byte swapped even if it wasn’t supplied by the client. This happens in p_creds.c:78.
The reproducer file numcreds-undefined-data can trigger the issue. When running pmcd in Valgrind, the following output can be seen:
Conditional jump or move depends on uninitialised value(s)
at 0x48B83A5: __pmDecodeCreds (p_creds.c:74)
by 0x11BFFD: DoCreds (dopdus.c:1427)
by 0x111F1C: HandleClientInput (pmcd.c:469)
by 0x110A74: ClientLoop (pmcd.c:880)
by 0x110A74: main (pmcd.c:1192)
Since the heap allocated buffer returned from pmGetPDU() is bigger than the actual payload (at least 1024 bytes), this only results in an undefined data error. No practical exploit should result from this.
This issue is fixed in upstream commit 3561a367072b.
C) __pmDecodeCreds() shaky need calculation when numcred == 0
__pmDecodeCreds() behaves shakily when numcred == 0. The need calculation ends up using a negative value of -1 in p_creds.c:86. An attacker can get past the need != len check, providing insufficient data. Luckily the negative need is not used for anything else. The result of the call will be a zero length credlist that will not be processed further by the calling DoCreds() function.
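The shape of the arithmetic can be sketched as follows. The sizes and helper names are hypothetical, not PCP's actual layout; the point is that a need computed from (numcred - 1) goes negative for numcred == 0, so a strict "need != len" equality check can be satisfied by a short packet:

```c
#include <assert.h>

/* Hypothetical sizes: a fixed header plus (numcred - 1) trailing entries,
 * mirroring a struct that embeds its first credential. */
enum { HDR_LEN = 4, CRED_LEN = 8 };

/* SHAKY sketch: underflows to a negative value when numcred == 0 */
static int creds_need_shaky(int numcred)
{
    return HDR_LEN + (numcred - 1) * CRED_LEN;
}

/* GUARDED sketch: refuse numcred == 0 instead of computing a bogus size */
static int creds_need_checked(int numcred, int *need)
{
    if (numcred < 1)
        return -1;
    *need = HDR_LEN + (numcred - 1) * CRED_LEN;
    return 0;
}
```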
This issue is addressed by the same bugfix commit as for issue 5.B).
D) ntohEventArray() Blindly Processes Client Provided nrecords
The function ntohEventArray() does not check whether there is enough input data (and cannot check, since it is missing a length input parameter). It processes the nrecords provided by the client and starts byte swapping away, leading to out of bound heap read and write operations.
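The missing-length-parameter pattern can be sketched in isolation. Names and layout are illustrative, not the actual PCP structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00u)
         | ((v << 8) & 0xff0000u) | (v << 24);
}

/* BROKEN sketch: swaps nrecords words no matter how much data the
 * buffer really contains - there is no parameter it could check. */
static void ntoh_records_broken(uint32_t *buf, uint32_t nrecords)
{
    for (uint32_t i = 0; i < nrecords; i++)
        buf[i] = bswap32(buf[i]);
}

/* FIXED sketch: the caller passes the number of bytes remaining in the
 * PDU buffer, and a claimed record count exceeding it is rejected. */
static int ntoh_records_checked(uint32_t *buf, uint32_t nrecords,
                                size_t remaining)
{
    if ((uint64_t)nrecords * sizeof(uint32_t) > remaining)
        return -1;  /* more records claimed than the packet contains */
    for (uint32_t i = 0; i < nrecords; i++)
        buf[i] = bswap32(buf[i]);
    return 0;
}
```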
The Valgrind output for an attack of this function looks like this:
Invalid read of size 4
at 0x48E168A: __bswap_32 (byteswap.h:52)
by 0x48E168A: ntohEventArray (endian.c:250)
by 0x48B67DA: __pmDecodeValueSet (p_result.c:432)
by 0x48B737B: __pmDecodeResult_ctx (p_result.c:806)
by 0x11BC8F: DoStore (dostore.c:149)
by 0x111F25: HandleClientInput (pmcd.c:445)
by 0x110984: ClientLoop (pmcd.c:880)
by 0x110984: main (pmcd.c:1192)
Address 0x4fc109c is 2,891,036 bytes inside an unallocated block of size 3,382,368 in arena "client"
Invalid write of size 4
at 0x48E168E: ntohEventArray (endian.c:250)
by 0x48B67DA: __pmDecodeValueSet (p_result.c:432)
by 0x48B737B: __pmDecodeResult_ctx (p_result.c:806)
by 0x11BC8F: DoStore (dostore.c:149)
by 0x111F25: HandleClientInput (pmcd.c:445)
by 0x110984: ClientLoop (pmcd.c:880)
by 0x110984: main (pmcd.c:1192)
Address 0x4fc109c is 2,891,036 bytes inside an unallocated block of size 3,382,368 in arena "client"
The reproducer ntohevent-array-out-of-bound-write is able to provoke this situation. We found this problem by using AFL fuzzing. The problematic function is nested rather deeply in the parsing logic and it escaped manual review efforts.
Regarding the severity of this issue, there is not much degree of freedom for an attacker, because the function simply linearly swaps data past the end of the valid pdubuf. It would only have impact beyond DoS if the immediately following heap block contains relevant application data. Chances are that the data is corrupted so much that the program will crash anyway, though.
This issue is fixed in upstream commit 3561a367072b.
E) Profile Message Allows to Add Infinite Profiles
The “profile” message allows unauthenticated users to DoS the pmcd daemon. Memory is allocated for the lifetime of the TCP session for every new ctx index, which is 32-bit wide and thus allows to store up to 2^32 profiles, likely leading to an out of memory situation. See DoProfile().
It might make sense to limit the number of profiles at least for unauthenticated users, if this is possible.
The issue is fixed in upstream commit 1e54aa7de51b0e6c6cceab2a52e3f6893070f70f.
F) Fetch Message Allows to Allocate Unlimited nPmids
In HandleFetch() the client controlled nPmids is assigned to maxnpmids and is in turn used to allocate memory via pmAllocResult(). This could also lead to memory hogging or a network DoS.
A fix for this issue is found in upstream commit c9b1a2ecb4.
G) pmpost Fosters a Symlink Attack Allowing to Escalate from pcp to root (CVE-2024-45770)
This issue is somewhat related to CVE-2023-6917 we reported earlier this year.
pmpost is used to append messages to the “PCP notice board”. It is called from different contexts, one of them as root from within the pmcd startup script (called rc_pmcd in the repository). The program writes the message provided on the command line to the file /var/log/pcp/NOTICES. The relevant code for opening the file is found in pmpost’s main() function (found in pmpost.c):
if ((fd = open(notices, O_WRONLY|O_APPEND, 0)) < 0) {
    if ((fd = open(notices, O_WRONLY|O_CREAT|O_APPEND, 0664)) < 0) {
        fprintf(stderr, "pmpost: cannot open or create file \"%s\": %s\n",
                notices, osstrerror());
        goto oops;
    }
#ifndef IS_MINGW
    /* if root, try to fix ownership */
    if (getuid() == 0) {
        if ((fchown(fd, uid, gid)) < 0) {
            fprintf(stderr, "pmpost: cannot set file gid \"%s\": %s\n",
                    notices, osstrerror());
        }
    }
#endif
    lastday = LAST_NEWFILE;
}
The directory /var/log/pcp belongs to pcp:pcp. The file is opened without passing the O_NOFOLLOW flag, thus it will open symlinks placed there by the pcp user. This allows to trick pmpost into creating new files in arbitrary locations, or to corrupt arbitrary existing files in the system. It thus poses a local denial of service vector.
Furthermore, if the NOTICES file is newly created and pmpost runs as root, then a fchown() to pcp:pcp is executed on the file. This makes it possible to pass the ownership of arbitrary newly created files in the system to pcp:pcp. This is likely a full local root exploit from pcp to root. Possible attack vectors are placing files into one of the various .d drop-in configuration file directories in /etc.
Since the directory /var/log/pcp does not have the sticky bit set, the protected_symlinks setting of the Linux kernel does not protect from harm in this context.
This issue is addressed in upstream commit 22505f9a43.
H) GetContextLabels() Uses Untrusted PCP_ATTR_CONTAINER to Construct JSON Document
When a client connects to pmcd, attributes can be passed along (found in ClientInfo.attrs). One of these attributes, PCP_ATTR_CONTAINER, is stored without further verification in ConnectionAttributes(). This value is used in the function GetContextLabels() to construct a JSON document. It is not checked whether the data contains any JSON syntax elements, which allows arbitrary additional data to be injected into the JSON document by crafting a suitable CONTAINER attribute value.
The reproducer label-req-container-json-injection demonstrates this problem by injecting an "evilkey": "evilvalue" element into the JSON document via a crafted container attribute value. It seems that by doing this a client can only fool itself; it doesn’t have any practical value for an attacker.
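The underlying pattern is classic string-splicing injection: the attribute value goes verbatim into a JSON template. A minimal sketch of the flawed construction (buffer handling and the document shape are illustrative, not the PCP code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Splice a caller-supplied container name verbatim into a JSON
 * document, without escaping any JSON syntax characters it contains. */
static void build_context_labels(char *out, size_t n, const char *container)
{
    snprintf(out, n, "{\"container\":\"%s\"}", container);
}
```

A value such as `c1","evilkey":"evilvalue` closes the container string early and smuggles an extra key/value pair into the resulting document; proper JSON string escaping of the attribute value prevents this.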
We also followed the use of the CONTAINER attribute into the pmdaroot helper program, where the attribute can likewise arrive, to query data regarding a specific container in root_container_search(). For a while it looked like this might even allow command line parameter injection, e.g. in lxc.c, where the container name is passed to lxc-info. It turned out, however, that the caller-provided value is only used for comparison against the container names found locally, so crafted data should not cause any harm in this spot.
The fix for this issue is found in upstream commit d68bd777ae.
I) Issues with __pmProcessPipe() and File Descriptors not Marked O_CLOEXEC
Most, if not all, file descriptors opened by PCP code are not marked O_CLOEXEC. This may cause problems when executing child processes that operate in a different security context than the parent, or that are not prepared to safely handle unexpectedly inherited open files and might leak them in turn to further child processes.
This is not a problem when starting agents from within pmcd, because CreateAgentPOSIX() explicitly closes any file descriptors larger than 2. Similarly, in the pmdaroot process, the function root_create_agent() closes any non-standard file descriptors in the child context before running execvp(). It is a problem in the context of the __pmProcessPipe() function, though, which executes arbitrary command lines in child processes in a popen() style.
The latter function does not close excess file descriptors. Depending on the context in which it is invoked, sensitive file descriptors may leak into unexpected places. One such context we identified is in the pmdaroot process when it executes lxc-info to obtain information about LXC containers. To verify this, we replaced the lxc-info binary with a custom script and triggered the execution of lxc-info via pmcd. The custom script received the following open file descriptors:
lr-x------ 1 root root 64 Aug 2 12:23 0 -> pipe:[104916]
l-wx------ 1 root root 64 Aug 2 12:23 1 -> pipe:[107248]
l-wx------ 1 root root 64 Aug 2 12:23 2 -> /var/log/pcp/pmcd/root.log
lrwx------ 1 root root 64 Aug 2 12:23 3 -> socket:[105912]
lrwx------ 1 root root 64 Aug 2 12:23 4 -> socket:[105913]
lrwx------ 1 root root 64 Aug 2 12:23 5 -> socket:[105914]
lrwx------ 1 root root 64 Aug 2 12:23 6 -> socket:[105917]
lrwx------ 1 root root 64 Aug 2 12:23 7 -> socket:[105922]
As can be seen from this, the process inherited all open socket connections from the pmdaroot process. This could prove a vital local root exploit if the sockets end up in the wrong hands, since clients of pmdaroot can start arbitrary commands as root via the PDUROOT_STARTPMDA_REQ message.
Another use of __pmProcessPipe() that could be problematic in this respect is in the Perl module glue code, where the __pmProcessPipe() function is made available as $pmda->add_pipe(...) (see function local_pipe() in perl/PMDA/local.c). The in-tree Perl modules that make use of this function don’t seem to open any additional files that could leak, though.
This issue is addressed in upstream commit 1d5a8d1c6fe8b3d5b35a9cc0ed6644696c67ec91.
6) Exploiting the Heap Corruption in Issue 5.A)
This section investigates to what ends the heap corruption issue outlined in section 5.A) can be exploited by a skillful attacker.
The location where the out-of-bounds write occurs in issue 5.A) is under quite some attacker control. As we know from the issue, there is a boundary check in p_result.c:415, but the check is in bytes, while we can address 32-bit offsets from the start of pdubuf. The PDU (protocol data unit) is received in LIMIT_SIZE mode, thus at most 64 KiB of data can be transferred for the attack. This means the attacker can specify a vindex of up to 65536. The valid pdulen will be 65536, but the vindex will address up to 4 * 65536 = 256 KiB. Thus an attacker can cause heap corruption in the 192 KiB of heap memory following the pdubuf.
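The unit mismatch can be condensed into two lines: the check compares vindex against a byte length, while the access multiplies by the element size. A sketch of the flawed logic as we understand it (names follow the advisory; the code itself is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* The boundary check is performed in bytes ... */
static int passes_check(uint32_t vindex, uint32_t pdulen)
{
    return vindex < pdulen;
}

/* ... but vindex later addresses 32-bit words, so the real byte
 * offset is four times larger than what the check validated. */
static uint32_t accessed_byte_offset(uint32_t vindex)
{
    return vindex * 4;
}
```

For a 64 KiB PDU, a vindex of 65535 passes the byte check yet reaches byte offset 262140, roughly 192 KiB past the validated range.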
An interesting data structure is also found on the heap: the client array, holding the ClientInfo data structures for all connected clients. When sending small PDUs, the client buffer will already be located some 10 KiB after the pdubuf in memory. Sending a small PDU won’t do, though, because then the vindex cannot address far enough into the heap to reach it. When sending a larger PDU of a few kilobytes, pdubuf will be located after the client buffer on the heap, making it again unreachable for the attack.
Things can be turned around by creating a lot of connections to pmcd, though. The client buffer is realloc()’d in the NewClient() function when new clients come in that no longer fit into the client array. By temporarily creating e.g. 200 connections to pmcd, it is possible to force realloc() to move the client buffer to a larger heap address. This in turn makes it possible to send an attack payload that is large enough to cause heap corruption 10 to 20 KiB beyond the end of pdubuf, while pdubuf will still be located at a smaller address than the client buffer.
The heap addresses used are relatively deterministic. ASLR protection does not help much here, because the attack is not about absolute addresses, but about relative offsets between data structures on the heap. When freshly starting pmcd and sending an initial attack payload, the offset between client and pdubuf is always the same. When doing the more complex operations needed for a full attack, the offsets are somewhat less deterministic, but patterns can still be observed. Thus we believe a successful guess is well within reach, especially since the local attacker also has the possibility to force the daemon to crash and be restarted, allowing multiple attempts.
A full attack scenario that we came up with is the following:
1. The attacker creates a connection from remote, which ends up with ClientInfo->denyOps == 0x2, meaning PMCD_OP_STORE is denied for the remote connection. This connection only sends the initial message and then stays idle, but connected.
2. The attacker sends a valid PDU_RESULT of a somewhat larger size (3 KiB) using a local connection.
3. The attacker creates 200 parallel idling connections towards pmcd, to force the client buffer to be moved to a larger heap address. Then the connections are terminated again.
4. The attacker sends an attack PDU_RESULT payload of 3-4 KiB size using a local connection. The attack payload contains just one bad vindex that is tuned just so that __ntohpmValueBlock() will operate exactly on the address of client[0].denyOps for the connection still open from step 1).
5. The attack corrupts the ClientInfo from step 1) in such a way that denyOps no longer contains PMCD_OP_STORE. The connection will thus be “upgraded” to be treated like a local connection, although it is remote.
We verified this scenario in a practical example on openSUSE Tumbleweed against pmcd version 6.2.1. Arriving at step 4), the distance to cross to reach the client[0] structure was a bit over 5 KiB:
(gdb) p (char*)client - (char*)pdubuf
$2 = 5520
Before the processing of the attack payload, the client[] structure is intact:
(gdb) p client[0]
$8 = {fd = 17, status = {connected = 1, changes = 0, attributes = 0}, profile = {nodes = 0, hsize = 0,
hash = 0x0, next = 0x0, index = 0}, denyOps = 2, pduInfo = {features = 3652, licensed = 1, version = 0,
zero = 0}, seq = 1, start = 1723726720, addr = 0x5625f56b9160, attrs = {nodes = 0, hsize = 0, hash = 0x0,
next = 0x0, index = 0}}
After the attack has been carried out, it is corrupted like this:
(gdb) p client[0]
$30 = {fd = 17, status = {connected = 1, changes = 0, attributes = 0}, profile = {nodes = 0, hsize = 0,
hash = 0x0, next = 0x0, index = 0}, denyOps = 33554432, pduInfo = {features = 0, licensed = 0, version = 1,
zero = 0}, seq = 1141768448, start = 1723726720, addr = 0x5625f56b9160, attrs = {nodes = 0, hsize = 0,
hash = 0x0, next = 0x0, index = 0}}
As can be seen, this also corrupts features, licensed, version and seq.
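The corrupted denyOps value is consistent with the goal of the attack: 33554432 is 0x02000000, a value in which the PMCD_OP_STORE bit (0x2, per the gdb output above) is no longer set. A small sketch of the permission check this defeats (the bit value follows the advisory; the helper function is illustrative):

```c
#include <assert.h>

/* Deny bit for store operations, per the advisory (denyOps == 0x2). */
#define PMCD_OP_STORE 0x2

/* A store request from a client is denied when this bit is set
 * in the per-connection denyOps mask. */
static int store_denied(unsigned int denyOps)
{
    return (denyOps & PMCD_OP_STORE) != 0;
}
```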
This did not stop the connection from step 1) from working: it was subsequently able to send a PDU_RESULT message without being denied, so the upgrade of the remote connection was carried out successfully. The effects of the attack could be tuned further by changing the vindex offset to a smaller or larger value, to maybe cause less fallout in the ClientInfo structure, depending on the needs of the attacker.
As this shows, the heap corruption issue offers more possibilities than it might seem at first. It allows the integrity of the pmcd daemon to be violated in unexpected ways.
7) About CVE Assignments
The PCP maintainers don’t consider denial-of-service attacks CVE-worthy, since the service will be restarted automatically via systemd. For this reason no CVEs have been assigned for this class of issues.
A similar consideration has been made by the PCP maintainers regarding the memory corruption issues: as long as the service only crashes, it is not considered CVE-worthy. For this reason a CVE has been assigned only for issue 5.A), which proved to be exploitable as shown in section 6).
8) Timeline
2024-08-06: We shared a comprehensive report with findings and recommendations with the PCP maintainers at pcp-maintainers@groups.io. We offered coordinated disclosure according to our disclosure policy.
2024-08-14: The date of 2024-09-17 was agreed upon for publication of the findings.
2024-08-15: There was some uncertainty about the severity of the heap corruption issue 5.A), so we investigated it more deeply and shared our findings with the PCP maintainers.
2024-09-09: We recommended that the PCP maintainers obtain CVEs from the Red Hat security team, and they received the two CVEs by this date.
2024-09-17: A bugfix release was published as planned by the PCP upstream maintainers.
9) References
Tue, Sep 17th, 2024
Why sudo 1.9.16 enables secure_path by default?
Sudo 1.9.16 is now out, containing mostly bug fixes. However, there are also some new features, like the json_compact option I wrote about a while ago. The other major change is that secure_path is now enabled by default in the sudoers file, and there is a new option to fine-tune its content.
Read more at https://www.sudo.ws/posts/2024/09/why-sudo-1.9.16-enables-secure_path-by-default/