TLP: Polkit Authentication Bypass in Profiles Daemon in Version 1.9.0 (CVE-2025-67859)

1) Introduction

TLP is a utility for saving laptop battery power when running Linux (note: the TLP acronym has no special meaning). In version 1.9.0 of TLP a profiles daemon similar to GNOME’s power profiles daemon has been added to the project, providing a D-Bus API for controlling some of TLP’s settings.

Our SUSE TLP package maintainer asked us for a review of the changes contained in the new TLP release, leading us to discover issues in the Polkit authentication logic used in TLP’s profiles daemon, which allow a complete authentication bypass. While looking into the daemon we also found some additional security problems in the area of local Denial-of-Service (DoS).

We reported the issues to upstream in December and performed coordinated disclosure. TLP release 1.9.1 contains fixes for the issues described below. This report is based on TLP 1.9.0.

The next section provides a quick overview of the TLP power daemon. Section 3 discusses the security issues we discovered in detail. Section 4 looks into the CVEs we assigned. Section 5 provides a summary of the coordinated disclosure process we followed for these findings.

2) Overview of the TLP Daemon

The new TLP power daemon is implemented in a Python script of moderate size. The daemon runs with full root privileges and accepts D-Bus client connections from arbitrary users. For authorization of clients a Polkit policy defines a couple of actions which are checked in the daemon’s _check_polkit_auth() function. Some of these actions are allowed for local users in an active session without providing further credentials, others require admin credentials.

3) Security Issues

3.1 Polkit Authorization Check can be Bypassed

The _check_polkit_auth() function relies on Polkit’s “unix-process” subject in an unsafe way. The function obtains the caller’s PID and passes this information to the Polkit daemon for authorization, which is inherently subject to a race condition: at the time the Polkit daemon looks up the provided PID, the process can already have been replaced by a different one with higher privileges than the D-Bus client actually has.

As a result of this, the Polkit authorization check in the TLP power daemon can be bypassed by local users, allowing them to arbitrarily control the power profile in use as well as the daemon’s log settings.

This is a well-known issue when using the “unix-process” Polkit subject which was assigned CVE-2013-4288 in the past. For this reason the subject has been marked as deprecated in Polkit. The “unix-process” subject is seeing new use these days, however, when combined with the use of Linux PID file descriptors, which are not affected by the race condition.

Upstream Bugfix

We suggested to upstream to switch to Polkit’s D-Bus “system bus name” subject instead, which is a robust way to authenticate D-Bus clients based on the UNIX domain socket the client uses to connect to the bus. This is what upstream did in commit 08aa9cd.
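
The difference between the two subject kinds can be sketched as follows. This is only an illustration and not the upstream patch: the helper names are hypothetical, and plain Python values stand in for the D-Bus types a real daemon would send. The subject layout (a kind string plus a dictionary of details) follows Polkit's CheckAuthorization D-Bus API as we understand it.

```python
# Sketch only: helper names are hypothetical, and plain Python values stand in
# for the D-Bus types a real daemon would pass to Polkit's CheckAuthorization.


def racy_subject(caller_pid, start_time=0):
    """Deprecated pattern: the PID may belong to a different process by the
    time Polkit looks it up."""
    return ("unix-process", {"pid": caller_pid, "start-time": start_time})


def bus_name_subject(sender):
    """Robust pattern: identify the caller by its unique D-Bus connection
    name (for example ':1.42'), which is tied to the client's bus connection."""
    if not sender.startswith(":"):
        raise ValueError("expected a unique bus name such as ':1.42'")
    return ("system-bus-name", {"name": sender})
```

With the second form Polkit resolves the caller's credentials from the bus connection itself, so there is no PID that could be recycled between the method call and the authorization check.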

3.2 Predictable Cookie Values in HoldProfile Method Allow to Release Holds

The D-Bus methods “HoldProfile” and “ReleaseProfile” can be used by locally logged-in users without admin authentication and allow to establish a “profile hold”, preventing the profile from being automatically switched until it is released again.

The “HoldProfile” method returns a cookie value to the caller which needs to be presented to the “ReleaseProfile” method again to release it. This cookie value is a simple integer which starts counting at zero and is incremented for each call to “HoldProfile”. This makes the cookie value predictable and allows other, unrelated users or applications to release an active profile hold by trying to guess the cookie value in use.

Upstream Bugfix

We suggested to upstream to make the cookie value unpredictable by generating a random number. This is what upstream did in commit a88002e.
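
A minimal sketch of such an unpredictable cookie is shown below. It is not the code from the upstream commit: the function name is hypothetical, and we assume the cookie has to fit an unsigned 32-bit D-Bus integer.

```python
import secrets


def new_cookie(active_cookies):
    """Return an unpredictable cookie that is not already in use.

    Assumption: the cookie must fit an unsigned 32-bit integer.
    """
    while True:
        cookie = secrets.randbits(32)
        if cookie not in active_cookies:
            return cookie
```

The secrets module draws from the operating system's cryptographic random source, so observing earlier cookies does not help in predicting later ones.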

3.3 ReleaseProfile Method Raises an Exception on Non-Integer Cookie Values

As described in the previous section, the “ReleaseProfile” D-Bus method expects an integer cookie parameter as input. The Python D-Bus framework used to implement the method allows clients to pass non-integer types as cookie, however, which causes an exception to be thrown in the daemon. This does not lead to the daemon exiting, since the framework catches the exception.

The issue can be reproduced via the following command line:

user$ dbus-send --system --dest=org.freedesktop.UPower.PowerProfiles \
      --type=method_call --print-reply /org/freedesktop/UPower/PowerProfiles \
      org.freedesktop.UPower.PowerProfiles.ReleaseProfile string:test
Error org.freedesktop.DBus.Python.ValueError: Traceback (most recent call
last):
  File "/usr/lib/python3.13/site-packages/dbus/service.py", line 712, in
_message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/sbin/tlp-pd", line 223, in ReleaseProfile
    cookie = int(cookie)
ValueError: invalid literal for int() with base 10: dbus.String('test')

Upstream Bugfix

While this is not strictly a security issue, we still suggested to make the daemon more robust by actively catching type mismatch issues for the cookie input parameter. Upstream followed this suggestion and implemented it in the same commit as above which introduces unpredictable cookie values.
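
One way to catch such type mismatches early is sketched below. The helper name is hypothetical and this is not the upstream implementation. It relies on the assumption that integer D-Bus values arrive as subclasses of Python's int, while strings arrive as subclasses of str.

```python
def parse_cookie(value):
    """Return the cookie as a non-negative int, or None if it is unusable."""
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    if value < 0:
        return None
    return int(value)
```

A method handler could then return a well-defined D-Bus error when None comes back, instead of letting the framework report a traceback to the client.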

3.4 Unlimited Number of Profile Holds Provides DoS Attack Surface

The profile hold mechanism described in section 3.2 allows local users in an active session to create an unlimited number of profile holds without admin authentication. This can lead to resource exhaustion in the TLP power daemon, since an integer is entered into a Python dictionary along with arbitrary strings reason and application_id which are also supplied by the client. This API thus offers Denial-of-Service attack surface.

We found a similar issue in GNOME’s power profile daemon some years ago, but GNOME upstream disagreed with our analysis at the time, which is why SUSE distributions are applying a custom patch to limit the number of parallel profile holds.

Upstream Bugfix

We asked upstream whether there are any valid use cases for supporting a large number of profile holds in parallel, and it turns out that the typical use case is only to support a single profile hold at any given time. Thus upstream agreed to restrict the number of profile holds to a maximum of 16, which is implemented in commit 6a637c9.
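
The effect of such a limit can be sketched as follows. Only the maximum of 16 is taken from the text above. The class and exception names are hypothetical and do not mirror the upstream code.

```python
MAX_HOLDS = 16  # limit named above; everything else is illustrative


class HoldLimitError(Exception):
    """Raised when no further profile holds can be created."""


class ProfileHolds:
    def __init__(self, max_holds=MAX_HOLDS):
        self._max_holds = max_holds
        self._holds = {}

    def add(self, cookie, profile, reason, application_id):
        # Refuse new holds once the limit is reached, so that client-supplied
        # strings cannot grow the dictionary without bound.
        if len(self._holds) >= self._max_holds:
            raise HoldLimitError("too many active profile holds")
        self._holds[cookie] = (profile, reason, application_id)

    def release(self, cookie):
        """Remove a hold; return True if the cookie was known."""
        return self._holds.pop(cookie, None) is not None
```

A real daemon would probably also want to cap the length of the reason and application_id strings, which this sketch does not do.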

4) CVE Assignment

We assigned CVE-2025-67859 to track issue 3.1 (Polkit authentication bypass). Issues 3.2 (predictable cookie values) and 3.4 (unlimited number of profile holds) would formally also justify CVE assignments; their severity is low, however, and we agreed with upstream to focus on the main aspect of the Polkit authentication bypass.

5) Coordinated Disclosure

We reached out to the upstream author on December 16 with details about the issues and offered coordinated disclosure. Upstream confirmed the issues and accepted coordinated disclosure. We discussed patches and further details over the course of the following two weeks. Due to the approaching Christmas holiday season we decided to set the general publication date to January 7.

We want to express our thanks to the TLP upstream author for the smooth cooperation in handling these issues.

6) Timeline

2025-12-16 We reached out to the upstream developer by email providing a detailed report and offered coordinated disclosure.
2025-12-17 We received a reply discussing details of the report. Coordinated disclosure was established with a preliminary publication date set to 2026-01-27.
2025-12-20 We received a set of patches from upstream for review. 2026-01-07 was suggested as new publication date.
2025-12-23 We provided positive feedback on the patches and agreed to the new publication date. We also pointed out the additional problem of the unlimited number of profile holds (issue 3.4).
2025-12-25 We received a follow-up patch from upstream limiting the number of profile holds.
2025-12-29 We reviewed the follow-up patch and provided positive feedback to upstream.
2026-01-07 Upstream published bugfix release 1.9.1 as planned.
2026-01-07 Publication of this report.


Foomuuri: Lack of Client Authorization and Input Verification allow Control over Firewall Configuration (CVE-2025-67603, CVE-2025-67858)

1) Introduction

Foomuuri is an nftables-based firewall manager for Linux. The project includes a D-Bus daemon which offers an API similar to firewalld. In early December an openSUSE community member asked us to review Foomuuri for addition to openSUSE Tumbleweed.

During the review we quickly noticed a lack of client authorization and input validation in the implementation of Foomuuri’s D-Bus service. We reported the issues to upstream and performed coordinated disclosure. Upstream published version 0.31 of Foomuuri on 2026-01-07 which contains bugfixes for the security issues.

The next section provides an overview of the Foomuuri D-Bus service. Section 3) discusses the security issues in detail. Section 4) provides an overview of the upstream bugfixes to address the issues. Section 5) looks into the CVEs which were assigned. Section 6) gives insight into the coordinated disclosure process which was established for these findings.

This report is based on Foomuuri release v0.29.

2) Overview of the D-Bus Service

Foomuuri runs with full root privileges and registers a D-Bus interface under the name “fi.foobar.Foomuuri1”. Optionally a firewalld drop-in replacement interface is also registered under “org.fedoraproject.FirewallD1”. Both interfaces hook into the same logic, however, and there is no need to look at them separately.

There are only a few methods provided by the D-Bus interface: getting the list of available zones and managing the assignment of network interfaces to zones.

3) Security Issues

3.1) Lack of Client Authorization

There is no authentication layer like Polkit present in the Foomuuri D-Bus service, and there are also no restrictions on D-Bus configuration level as to who is allowed to connect to the D-Bus interfaces provided.

As a result any local user, including low privilege service user accounts or even nobody, can invoke the D-Bus interface and change the firewall configuration. The only state which can be modified this way is the assignment of interfaces to zones, but this is enough to weaken the firewall configuration or to perform a limited Denial-of-Service.

3.2) Missing Input Parameter Verification

Apart from the lack of access restrictions pointed out above, the input parameters to the D-Bus methods are not carefully scrutinized. While the zone input parameter is at least checked against currently configured zones, no further checks are performed on the interface parameter. This means that, e.g. via the “addInterface” D-Bus method, arbitrary strings can be passed as interface name. There is also intentionally no check if the specified name corresponds to an existing network device in the system (to allow seamless coverage of network devices even before they are added to the system).

One result from this can be log spoofing, since the interface name is passed to logging functions unmodified. The string could contain control characters or newlines, which can manipulate the log.

In DbusCommon.add_interface() the possibly crafted interface name is added to the to-be-generated JSON configuration via the out() method. While we did not verify whether this works in practice, a local attacker could attempt to largely control the JSON configuration passed to nftables, by skillfully embedding additional JSON configuration in the interface parameter.

We were worried that this could even lead to arbitrary code execution by abusing features of nftables like loading external files or plugin code, but it turned out that there are no such features available in the nftables configuration format.

3.3) Unsafe umask used in Daemonize Code

Foomuuri contains optional support to daemonize itself. Normally this is done by systemd and the code in question is not invoked. It contains logic to set the daemon’s umask to 0, however, which is a bad default, since applications or libraries which intend to foster user control of the file mode of newly created files can pass modes like 0666 to open(), rendering them world-writable.

Foomuuri does not contain any code paths that create new files, but the umask setting is also inherited by child processes, for example. While we did not think this was a tangible security issue in this form, we suggested to choose a more conservative value here to prevent future issues.
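
The effect of the umask on newly created files can be demonstrated with a small experiment. The helper mode_created_with() is hypothetical and exists only for this illustration; it is not part of Foomuuri.

```python
import os
import stat
import tempfile


def mode_created_with(umask, requested=0o666):
    """Create a file requesting `requested` under `umask`; return its mode bits."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "demo")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the previous umask of the calling process.
        os.umask(old)
```

On a typical Linux system mode_created_with(0) yields 0o666 (world-writable), while mode_created_with(0o022) yields 0o644.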

4) Upstream Bugfixes

We suggested the following fixes to upstream:

  • restrict access to the D-Bus interfaces to root only, maybe also to members of a dedicated opt-in group. Alternatively Polkit could be used for authentication of callers, which is more effort and complex, however.
  • the interface input parameter should be verified right from the beginning of each D-Bus method to make sure that it does not contain any whitespace or special characters and is not longer than IFNAMSIZ bytes (which is currently 16 bytes on Linux).
  • as an additional hardening measure we also suggested to apply systemd directives like ProtectSystem=full to Foomuuri’s systemd services, to prevent possible privilege escalation should anything go wrong at the first line of defense.
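
The second suggestion could be sketched like this. The function name and the exact set of allowed characters are assumptions made for this illustration and are not taken from the upstream fix. Since IFNAMSIZ includes the terminating NUL byte, the sketch accepts at most 15 bytes.

```python
import re

IFNAMSIZ = 16  # Linux value cited above; it includes the trailing NUL byte
# Assumption: a conservative allowlist of characters for interface names.
_IFNAME_RE = re.compile(r"[A-Za-z0-9_.-]+")


def is_valid_interface_name(name):
    """Return True if `name` is short enough and uses only allowed characters."""
    if not isinstance(name, str):
        return False
    if len(name.encode("utf-8")) >= IFNAMSIZ:
        return False
    # fullmatch() rejects whitespace, quotes, newlines and control characters.
    return _IFNAME_RE.fullmatch(name) is not None
```

Rejecting quotes, braces and newlines at the entry point addresses both the log spoofing and the JSON manipulation concerns from section 3.2 before the value reaches any other code.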

Upstream decided to implement Polkit authentication for Foomuuri’s D-Bus service and otherwise closely followed our suggestions:

  • commit 5944a42 adds Polkit authentication to the D-Bus service. Changing firewall settings now requires admin authorization. The use of Polkit can be disabled in Foomuuri, in which case only clients with UID 0 are allowed to perform the operations.
  • commit d1961f4 adds verification of the interface parameter to prevent manipulation of the JSON configuration data.
  • commit 806e11d sets the umask used in the daemonize code to a more conservative 0o022 setting, preventing world- or group-writable files from coming into existence.
  • commit 5fcf125 adds the ProtectSystem=full directive to all Foomuuri systemd service units.

All of the bugfixes are contained in version 0.31 of Foomuuri.

5) CVE Assignment

In agreement with upstream we assigned the following two CVEs corresponding to this report:

  • CVE-2025-67603: lack of client authorization allows arbitrary users to influence the firewall configuration (issue 3.1).

  • CVE-2025-67858: a crafted interface input parameter to D-Bus methods can lead to integrity loss of the firewall configuration or further unspecified impact by manipulating the JSON configuration passed to nft (issue 3.2).

6) Coordinated Disclosure

We reported these issues to the upstream developer on 2025-12-11, offering coordinated disclosure. We soon got a reply and discussed the details of the non-disclosure process. Upstream quickly shared patches with us for review and we agreed on the final patches already on 2025-12-19. In light of the approaching Christmas season we agreed on a publication date of 2026-01-07 for general disclosure.

We want to thank the upstream author for the prompt reaction and cooperation in fixing the issues.

7) Timeline

2025-12-11 We contacted the Foomuuri developer by email providing a detailed report about the D-Bus related findings and offered coordinated disclosure.
2025-12-12 The upstream author confirmed the issues, agreed to coordinated disclosure and asked us to assign CVEs the way we suggested them. 2026-01-07 was suggested for publication date.
2025-12-15 We discussed some additional technical details like the umask issue and the question of whether arbitrary code execution could result from the ability to control the JSON configuration passed to nft.
2025-12-18 Upstream shared with us a first version of patches for the issues we reported. The patches for minor issues and hardening were already published on GitHub at this point.
2025-12-19 We provided feedback on the patches, suggesting minor improvements.
2025-12-19 With the fixes ready we discussed whether earlier publication would make sense, but we agreed to stick to the date of 2026-01-07 to accommodate the Christmas holiday season.
2026-01-07 Upstream release v0.31 was published.
2026-01-07 Publication of this report.

the avatar of FreeAptitude

openSUSE 15.6 to 16.0 upgrade notes

In a previous article I showed how to upgrade a distro using zypper and the zypper-upgradedistro plugin. Some issues can always come up for a specific version, however, which is why I collected all the changes and tweaks I applied when switching from openSUSE Leap 15.6 to 16.0, both during and after the installation process.

the avatar of Klaas Freitag

Kraft 2.0 Announcement

With the start of the new year, I am very happy to announce the release of Kraft version 2.0.0.

Kraft provides effective invoicing and document management for small businesses on Linux. Check the feature list.

This new version is a big step ahead for the project. It does not only deliver the outstanding ports to Qt6 and KDE Frameworks 6 and tons of modernizations and cleanups, but for the first time, it also does some significant changes in the underlying architecture and drops outdated technology.

Kraft no longer stores documents in a relational database, but as XML documents in the filesystem. While separate files are more natural for documents anyway, this paves the way to let Kraft integrate with private cloud infrastructures like OpenCloud or Nextcloud via sync. That is not only useful for backup and web app purposes, but also for synced data that makes it possible to run Kraft as a distributed system. An example is office staff working from different home offices. Expect this and related use cases to be supported in the near future of Kraft.

But there are more features: For example, the document lifecycle was changed to be more compliant: Documents now remain in draft status until they are finalized, at which point they get their final document number. From that point on, they can no longer be altered.

There is too much on the long Changes-List to mention here.

However, what is important is that after more than 20 years of developing and maintaining this app, I continue to be motivated to work on this bit. It is not a big project, but I think it is important that we have this kind of “productivity”-applications available for Linux to make it attractive for people to switch to Linux.

Around Kraft, a small but beautiful community has built up. I would like to thank everybody who contributed in any way to Kraft over the years. It is great fun to work with you all!

If you are interested, please get in touch.

pgtwin as OCF Agent

When I was looking for a solution that could provide High Availability across two datacenters, the only solution that remained viable and comprehensible for me was Corosync/Pacemaker. The reason I actually need this is that mainframe environments typically use two datacenters, since z/OS can operate nicely with that. The application that I had to set up is Kubernetes on Linux on Z, and since Kubernetes itself normally runs with 3 or more nodes, I had to find a different solution. I found that I could use an external database to run Kubernetes with https://github.com/k3s-io/kine, and being no DBA, I selected PostgreSQL as a first try.

For pacemaker, there already exists an OCF Agent called pgsql https://linux.die.net/man/7/ocf_heartbeat_pgsql that is included with the clusterlabs OCF agents. In addition, RedHat created another OCF agent, called PAF https://clusterlabs.github.io/PAF/ that sounded promising. However, I first had to build it on my own, and later I found that it was really nicely promoted, but was missing out on some needed features.

Then a colleague asked whether I wanted to try his AI, and countless improvements and bug fixes later, the pgtwin https://github.com/azouhr/pgtwin agent really seems quite stable. Now, on to some of the main design concepts.

Make use of the promotable clone resource

PostgreSQL’s primary/standby model maps perfectly to promoted/unpromoted. This is actually how you also would configure pgsql with a current pacemaker release. All documentation relies on the current schema of this configuration.

Use Physical Replication with Slots

  • Prevent WAL files from being recycled while standby is offline
  • Enable standby to catch up after brief disconnections
  • Automatically created/managed by pgtwin
  • Automatically cleaned up when excessive (prevents disk fill)

Why physical, and not logical replication?

  • Byte-identical replica (all databases, all tables, all objects)
  • Lower overhead than logical replication
  • Supports pg_rewind for timeline divergence recovery

Automatic Standby Initialization

Traditionally, the database admin would have to set up the replication, and the OCF agent would then take over the management. However, since we already had basebackup functionality ready in case the WAL had been cleaned up, it was just a small step to provide full initialization.

The only steps on the secondary for the admin after configuring the primary are:

  • Create the PostgreSQL Data Directory with correct ownership/permissions
  • Setup the password file .pgpass

The remaining tasks of creating a sync streaming replication are done by pgtwin during startup of the node.

Timeline Divergence and pg_rewind

After a failover, the old primary may have diverged from the new primary, and thus the synchronous replication will fail. pgtwin handles this as follows:

  1. Detects divergence (timeline check in pgsql_demote)
  2. Runs pg_rewind to sync from the new primary
  3. Replays the necessary WAL to reconcile
  4. Starts as standby

This is much faster than trying to do a full basebackup, at least with big databases. Typical failover times are merely seconds.

Replication Health Monitoring

Every monitor cycle, pgtwin does not only check if PostgreSQL is running, but also the replication health. This includes the replication state (streaming, catchup, etc.) as well as the replication lag and the synchronous state.

If the replication check fails for 5 consecutive monitor cycles (configurable), pgtwin automatically triggers recovery. It first tries pg_rewind; if that fails, it falls back to pg_basebackup.
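
The counting and fallback logic described here can be sketched in a few lines. This is an illustration in Python and not pgtwin's actual code; the class and function names are hypothetical, and the two recovery steps are passed in as callables that report success or failure.

```python
class ReplicationWatchdog:
    """Counts consecutive failed replication checks."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.failures = 0

    def observe(self, healthy):
        """Record one monitor result; return True when recovery should start."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            return True
        return False


def recover(try_rewind, try_basebackup):
    """Try the cheap recovery first, then fall back to a full base backup."""
    if try_rewind():
        return "pg_rewind"
    if try_basebackup():
        return "pg_basebackup"
    return "failed"
```

A single healthy check resets the counter, so only an uninterrupted run of failures triggers recovery.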

Configuration Validation

At startup, pgtwin validates the PostgreSQL configuration for a number of settings that it considers critical. There are hard checks like “restart_after_crash = off”, which must be set to off to prevent PostgreSQL from trying to promote itself instead of letting pacemaker handle the situation, but also checks for a number of other parameters.

To check the startup validation, have a look at the pacemaker system logs:

journalctl -u pacemaker -f

State Machine and Lifecycle

pgtwin has a clear idea about the state of PostgreSQL lifecycle:

┌─────────────────────────────────────────────────────────────┐
│                      STOPPED STATE                          │
│  PostgreSQL not running                                     │
└──────────────────────┬──────────────────────────────────────┘
                       │ start operation
                       ↓
              ┌────────────────┐
              │ PGDATA valid?  │
              └────┬───────┬───┘
                   │       │
             NO ←──┘       └──→ YES
              │                 │
              ↓                 ↓
    ┌──────────────────┐  ┌─────────────────┐
    │ Auto-initialize  │  │ Start PostgreSQL│
    │ (pg_basebackup)  │  │ as standby      │
    └────────┬─────────┘  └────────┬────────┘
             │                     │
             └──────────┬──────────┘
                        ↓
┌─────────────────────────────────────────────────────────────┐
│                   UNPROMOTED STATE                          │
│  PostgreSQL running as standby                              │
│  - Replaying WAL from primary                               │
│  - Read-only queries allowed                                │
│  - Monitor checks replication health                        │
└──────────────────────┬──────────────────────────────────────┘
                       │ promote operation
                       ↓
              ┌────────────────────┐
              │ pg_ctl promote     │
              │ (remove standby    │
              │  signal)           │
              └────────┬───────────┘
                       ↓
┌─────────────────────────────────────────────────────────────┐
│                    PROMOTED STATE                           │
│  PostgreSQL running as primary                              │
│  - Accepts write operations                                 │
│  - Streams WAL to standby                                   │
│  - Manages replication slot                                 │
│  - Monitor checks replication health                        │
└──────────────────────┬──────────────────────────────────────┘
                       │ demote operation
                       ↓
              ┌────────────────────┐
              │ Stop PostgreSQL    │
              │ Check timeline     │
              │ pg_rewind if needed│
              │ Create standby     │
              │ signal             │
              └────────┬───────────┘
                       ↓
       (returns to UNPROMOTED STATE)

Failure Handling

The following failures are handled completely automatically and are designed to provide seamless operation without data loss:

  1. Primary Failure and Recovery
  2. Standby Failure and Recovery
  3. Replication Failure
  4. Split-Brain Prevention

For the Split-Brain Prevention, additional Pacemaker configurations like a second corosync with direct network connection as well as a third ring with IPMI will be needed.

Container Mode

pgtwin is prepared to also support containers instead of a locally installed PostgreSQL database. However, the current implementation is too sluggish and has too much overhead during management of the database.

For future releases, I plan to change the implementation by switching from “podman run” to the use of “nsexec”. We will see if this makes the implementation usable. Still, the following is currently implemented:

  • A version check that prevents using the wrong container PostgreSQL version with the current PGDATA
  • An additional PostgreSQL user that allows the PGDATA user ID to be used within the container
  • A wrapper that runs all PostgreSQL commands, so that seamless integration between bare-metal and container operations is guaranteed

Single-Node Startup

The original authors of pgsql were very concerned about the data even in the case of a double crash of the cluster. The scenario they had in mind was like this:

  • Primary crashes
  • Secondary takes over and handles applications
  • Secondary crashes
  • Primary comes up with outdated data and continues as primary

Now, with pgtwin, a number of considerations go into the startup:

  1. If both nodes come up, pgtwin will check the timeline on who should become promoted
  2. If cluster was down, and one node comes up:
    • If Node was primary and had sync mode enabled: Node likely crashed, should not be promoted.
    • If Node was primary and had async mode enabled: Node likely crashed when other node was missing. This node should become primary
    • If Node was secondary: Cluster probably crashed, or was restarted after the secondary crashed, node should not be promoted

The key insight here is, that in case just one node is restarted, it only should be promoted standalone if it was primary before, and in addition it had async streaming replication activated even though the cluster was configured for sync streaming replication.
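
Expressed as code, the rule for a node that comes up alone might look like the following sketch. It is an illustration in Python rather than pgtwin's implementation, and the function and parameter names are hypothetical.

```python
def may_promote_alone(was_primary, replication_mode):
    """Decide whether a node that starts alone may become primary.

    replication_mode is the mode that was last active on this node,
    either "sync" or "async".
    """
    # Only a former primary that had already fallen back to async replication
    # can be sure that no other node holds newer data.
    return was_primary and replication_mode == "async"
```

All other combinations return False, which corresponds to the cases where the node should not be promoted without an explicit override by the admin.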

Otherwise, the cluster will refuse to start with a single node. If startup is really needed, the admins will have to override this.

pgtwin-migrate

In a future blog entry, I will cover the features of the currently experimental pgtwin-migrate OCF agent. This agent allows to fail over between two PostgreSQL Clusters, like two Versions or between different Vendors.

What does it mean to write in 2026?

I've been writing for something like 50 years now. I started by scribbling letters on paper as a child because I was fascinated that these expressed meaning. I wrote a lot for school, for university, for work, and privately. I wrote letters, emails, posts on social media, articles, papers, documentation, diaries, opinion pieces, and presentations. I've been writing my blog for more than 20 years.

Writing always has been a way for me to connect to the people, to the community, around me, communicating with my tribe. It also has always been a way to express, refine and archive my thoughts, a bit like building a memory of insights. It also has been a way to record some of my personal history and the history of the projects I'm involved with.

My writing has changed over the last couple of years. I'm writing less publicly and more focused on specific projects. It feels like it has become less personal and more utilitarian.

Part of this is that the Internet has lost a good part of its strength as a neutral platform to reach the world. For a long time I knew where to reach the people I wanted to address and had control about my content and how it was distributed. Nowadays social media platforms act as distributors, but we are prey to their algorithms. So while publishing content is still simple, it's much harder to get it to your audience without compromising to the mechanisms which make the algorithms tick.

Another part is the disrupting advance of AI writing capabilities. While I have relied on humans to give me feedback in the past, to get into a conversation on the topics of my posts to refine the thoughts in them, now there is this all-powerful-seeming assistant in my editor who is eager to take over those roles. And it would even write for me in my own style. So what's the value of writing in 2026? Is it even worth bothering with trying to express your thoughts in writing, when a machine can produce content which looks the same, much faster and in much larger quantity? What does this do to readers, do they still care about what I would write?

My feeling is that it's still worth to put in effort to create genuine, trustworthy, truthful writing. The format, the tools, the channels might change, but the values don't. The challenge will be to figure out how to create a signal which transports these values.

I have always liked the format and style of a blog, as a stream of thoughts, coming from a personal perspective, but focused on topics of relevance to others. I enjoy reading this from others and I enjoy writing in this style. And I don't have to rely on a platform I don't control, but can use my own.

So it looks like this blog won't go away, but will channel my thoughts in 2026 as well.

Tumbleweed – Review of the week 2026/1

Dear Tumbleweed users and hackers,

Happy New Year to you all! While people all around the world are celebrating the new year, Tumbleweed has been tirelessly rolling ahead and has published six snapshots (20251227 – 20251231, 20260101). Naturally, there are no groundbreaking changes, as many developers and maintainers are out celebrating, and any greater coordinated effort is taking a bit more time.

Nevertheless, the six snapshots brought you these changes:

  • Python 3.13.11 (some CVE fixes)
  • libgit2 1.9.2
  • Neon 0.36.0
  • Harfbuzz 12.3.0
  • NetworkManager 1.54.3
  • GStreamer 1.26.10
  • VLC 3.0.22 & 3.0.23: finally linking ffmpeg-8
  • GPG 2.5.16
  • upower 1.91.0

The next snapshot is already in progress of syncing out, and the next few changes are pulling up in the staging projects. You can expect these things shortly:

Let’s get rolling for the Year 2026! I’m looking forward to a great year!


Path Aware High Availability (PAHA)

During my work on Kubernetes on Linux on Z and the creation of https://github.com/azouhr/pgtwin, I came across the same issue that most admins have to solve in two-node clusters: how can I get quorum, and which node is to be the primary?

Even with additional techniques, such as a second corosync ring for HA and even a third ring for an IPMI device, the elegance of a three-node quorum could not easily be achieved in my desired environment.

When trying to solve the correct placement of the primary PostgreSQL database in the two-node cluster, it occurred to me that there is an external dependency that could be used as an arbitrator. It does not really help an application if a resource is available but cannot be reached.

The main insight here was:

**Availability without accessibility is useless**

This pattern shifts HA from “server-centric” (is it running?) to “use-case-centric” (can it be used for its intended purpose?). I did some research, but I could not find anyone describing this key principle as a method to determine the placement of resources.

We defined two new terms to make this handy:

Definition of “Critical Path”:

A critical path is any dependency required for the service to fulfill its designed use case.

Definition of “Path-Aware High Availability (PAHA)”:

Path-Aware High Availability is a general clustering pattern where resource promotion decisions explicitly validate critical paths required for service delivery before allowing promotion. Unlike traditional HA which only checks if a service *process* is running, PAHA ensures the service is running on a node where clients can actually use it.
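
To make the difference concrete, here is a minimal sketch of the two promotion decisions. It is only an illustration of the definition above, not code from pgtwin or Pacemaker; the `Node` class, the function names, and the `gateway` check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical cluster node used only for this illustration."""
    name: str
    service_running: bool
    # Maps a critical path name to a check that returns True
    # when the path is usable from this node.
    path_checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)


def traditional_candidates(nodes: List[Node]) -> List[str]:
    """Server-centric view: a node qualifies if the service process runs."""
    return [n.name for n in nodes if n.service_running]


def paha_candidates(nodes: List[Node]) -> List[str]:
    """Path-aware view: the service must run AND every critical path must pass."""
    return [
        n.name
        for n in nodes
        if n.service_running and all(check() for check in n.path_checks.values())
    ]


# Both nodes run the service, but node1 has lost its gateway.
nodes = [
    Node("node1", True, {"gateway": lambda: False}),
    Node("node2", True, {"gateway": lambda: True}),
]

print(traditional_candidates(nodes))  # ['node1', 'node2']
print(paha_candidates(nodes))         # ['node2']
```

In the traditional view both nodes are valid promotion targets; in the path-aware view only the node that can still reach its gateway remains.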

This turned out to be a really interesting thought. Besides network paths, this can also be applied to other paths, totally unrelated to the original use case:

| Use Case | Service | Critical Path | Validation Method |
|---|---|---|---|
| Database clustering | PostgreSQL | Gateway reachability | Ping gateway from node |
| Storage HA | iSCSI target | Multipath to storage | multipath -ll shows paths |
| FibreChannel SAN | SAN LUN | FC fabric connectivity | fcinfo shows active paths |
| RoCE storage | NVMe-oF target | DCB lossless Ethernet | dcbtool shows PFC enabled |
| API gateway | Kong/Nginx | Upstream service reachable | Health check endpoint |
| Load balancer | HAProxy | Backend pool reachable | TCP connect to backends |
| DNS server | BIND | Root server reachability | Query root servers |
| NFS server | NFS daemon | Export filesystem mounted | mount shows filesystem |
| Container orchestrator | Kubernetes | CNI network functional | Pod-to-pod connectivity |
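
As an illustration of one of these validation methods, the "TCP connect to backends" check for a load balancer could be sketched as follows. The function names are hypothetical, and treating the pool as reachable when at least one backend answers is an assumption; a real deployment might require a minimum number of backends instead.

```python
import socket
from typing import Iterable, Tuple


def tcp_path_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def backend_pool_reachable(
    backends: Iterable[Tuple[str, int]], timeout: float = 2.0
) -> bool:
    """Assumption: the critical path is intact if any backend accepts a connection."""
    return any(tcp_path_ok(host, port, timeout) for host, port in backends)
```

A cluster resource agent could run such a check periodically from each node and publish the result as a node attribute, in the same way the ping resource does for gateway reachability.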

This can even be used to mitigate sick-but-not-dead conditions. For example, in a multipath environment you might want to disable a path that sometimes shows CRC errors. Even from the storage side, you would know whether sufficient paths are available, and could disable the sick path.

Now to the fun part. It speaks well of Pacemaker that such functionality can be implemented by simple configuration, at least for networks. For pgtwin, the question was what happens if ring0 (with the PostgreSQL resource) is partially broken. The other ring would keep the cluster running, but the primary with read-write capability would have to be placed on the node with service access.

All we had to do was create a ping resource, set up a clone of it, and create a location rule that tells Pacemaker where to place the primary resource. In the case of pgtwin, we additionally prevent the unpromoted resource from running on a node without ping connectivity, because it will likely not be able to sync with the primary. The configuration looks like this:

primitive ping-gateway ocf:pacemaker:ping \
    params \
        host_list="192.168.1.1" \
        multiplier="100" \
        attempts="3" \
        timeout="2" \
    op monitor interval="10s" timeout="20s"
clone ping-clone ping-gateway \
    meta clone-max="2" clone-node-max="1"
location prefer-connected-promoted postgres-clone role=Promoted \
    rule 200: pingd gt 0
location require-connectivity-unpromoted postgres-clone role=Unpromoted \
    rule -inf: pingd eq 0
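
How these rules turn connectivity into placement scores can be modelled in a few lines. The ping agent sets the `pingd` node attribute to the number of reachable hosts times the multiplier; the functions below are a simplified, hypothetical model of the two location rules only. Pacemaker itself combines them with further scores such as resource stickiness.

```python
import math

MULTIPLIER = 100  # matches multiplier="100" in the configuration above


def pingd(reachable_hosts: int, multiplier: int = MULTIPLIER) -> int:
    """Value of the pingd node attribute: reachable hosts times multiplier."""
    return multiplier * reachable_hosts


def promoted_score(pingd_value: int) -> float:
    """Model of 'rule 200: pingd gt 0' for role=Promoted."""
    return 200 if pingd_value > 0 else 0


def unpromoted_score(pingd_value: int) -> float:
    """Model of 'rule -inf: pingd eq 0' for role=Unpromoted."""
    return -math.inf if pingd_value == 0 else 0


# Node with a working gateway versus node that lost its gateway.
for name, reachable in (("connected", 1), ("isolated", 0)):
    value = pingd(reachable)
    print(name, value, promoted_score(value), unpromoted_score(value))
```

With a single host in `host_list`, a connected node gets `pingd=100` and a preference of 200 for the promoted role, while an isolated node gets `pingd=0` and is banned from running the unpromoted instance.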

Now, assuming a dual-datacenter setup, this is what happens if the gateway vanishes on one side:

  1. The cluster makes sure that the primary is on the side with the ping availability.
  2. The secondary is located on the other side.
  3. The secondary may not run there without the ping resource and is stopped.
  4. The primary is notified about the secondary being gone, and switches to async replication mode.

This means that we have lost high availability of the PostgreSQL database, but it still serves the applications as usual. When the gateway comes back, the following happens:

  1. The cluster starts pgtwin on the secondary.
  2. pgtwin initiates a rollback of the database to get the timelines in sync.
  3. If the rollback is unsuccessful, pgtwin initiates a basebackup from the primary.
  4. After the nodes are consistent, the database is started as secondary, and the replication is switched to sync again.
  5. The primary node is not moved back, because we set a resource stickiness by default.

All of this happens without admin intervention. This procedure greatly improves availability of the PostgreSQL database for the intended use.

the avatar of Nathan Wolf

Seamless Windows Apps on openSUSE with WinBoat

The author details their successful integration of openSUSE with Microsoft Office 365 using WinBoat, enabling Windows applications in a Linux environment without dual-booting. Despite minor setup challenges, they achieved significant functionality and security with Windows apps like Milestone XProtect and Rufus, appreciating the performance and seamless integration during their workflow.

the avatar of Greg Kroah-Hartman

Linux kernel security work

Lots of the CVE world seems to focus on “security bugs” but I’ve found that it is not all that well known exactly how the Linux kernel security process works. I gave a talk about this back in 2023 and at other conferences since then, attempting to explain how it works, but I also thought it would be good to explain this all in writing as it is required to know this when trying to understand how the Linux kernel CNA issues CVEs.