TLP: Polkit Authentication Bypass in Profiles Daemon in Version 1.9.0 (CVE-2025-67859)
Table of Contents
- 1) Introduction
- 2) Overview of the TLP Daemon
- 3) Security Issues
- 4) CVE Assignment
- 5) Coordinated Disclosure
- 6) Timeline
- 7) References
1) Introduction
TLP is a utility for saving laptop battery power when running Linux (note: the TLP acronym has no special meaning). In version 1.9.0 of TLP a profiles daemon similar to GNOME’s power profiles daemon has been added to the project, providing a D-Bus API for controlling some of TLP’s settings.
Our SUSE TLP package maintainer asked us for a review of the changes contained in the new TLP release, leading us to discover issues in the Polkit authentication logic used in TLP’s profiles daemon, which allow a complete authentication bypass. While looking into the daemon we also found some additional security problems in the area of local Denial-of-Service (DoS).
We reported the issues to upstream in December and performed coordinated disclosure. TLP release 1.9.1 contains fixes for the issues described below. This report is based on TLP 1.9.0.
The next section provides a quick overview of the TLP power daemon. Section 3 discusses the security issues we discovered in detail. Section 4 looks into the CVEs we assigned. Section 5 provides a summary of the coordinated disclosure process we followed for these findings.
2) Overview of the TLP Daemon
The new TLP power daemon is implemented in a Python script of moderate
size. The daemon runs with full root privileges and accepts
D-Bus client connections from arbitrary users. For authorization of clients a
Polkit policy defines a couple of actions which are
checked in the daemon’s _check_polkit_auth() function.
Some of these actions are allowed for local users in an active session without
providing further credentials, others require admin credentials.
3) Security Issues
3.1 Polkit Authorization Check can be Bypassed
The _check_polkit_auth() function relies on Polkit’s
“unix-process” subject in an unsafe way. The function obtains the caller’s PID
and passes this information to the Polkit daemon for authorization, which is
inherently subject to a race condition: at the time the Polkit daemon looks up
the provided PID, the process can already have been replaced by a different
one with higher privileges than the D-Bus client actually has.
As a result of this, the Polkit authorization check in the TLP power daemon can be bypassed by local users, allowing them to arbitrarily control the power profile in use as well as the daemon’s log settings.
This is a well-known issue when using the “unix-process” Polkit subject which was assigned CVE-2013-4288 in the past. For this reason the subject has been marked as deprecated in Polkit. The “unix-process” subject is seeing new use these days, however, when combined with the use of Linux PID file descriptors, which are not affected by the race condition.
Upstream Bugfix
We suggested to upstream to switch to Polkit’s D-Bus “system bus name” subject instead, which is a robust way to authenticate D-Bus clients based on the UNIX domain socket the client uses to connect to the bus. This is what upstream did in commit 08aa9cd.
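The difference between the two Polkit subject kinds can be sketched in Python. This is our own illustration, not upstream's code: plain Python tuples and dicts stand in for the D-Bus structures a real client library would pass to polkitd's CheckAuthorization call.

```python
# Sketch of the two Polkit subject kinds discussed above; plain Python
# types stand in for D-Bus structures.

def unix_process_subject(pid: int, start_time: int = 0):
    # Racy: the PID can be recycled between the daemon reading it and
    # polkitd resolving it (CVE-2013-4288).
    return ("unix-process", {"pid": pid, "start-time": start_time})

def system_bus_name_subject(sender: str):
    # Race-free: the bus daemon guarantees that the caller's unique name
    # ":1.x" maps to exactly one peer for the lifetime of the connection.
    return ("system-bus-name", {"name": sender})
```

The daemon only needs the sender's unique bus name, which D-Bus hands to every method implementation, making this subject both simpler and safe.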
3.2 Predictable Cookie Values in HoldProfile Method Allow Releasing Foreign Holds
The D-Bus methods “HoldProfile” and “ReleaseProfile” can be used by locally logged-in users without admin authentication and allow establishing a “profile hold”, which prevents the profile from being switched automatically until the hold is released again.
The “HoldProfile” method returns a cookie value to the caller which needs to be presented to the “ReleaseProfile” method again to release it. This cookie value is a simple integer which starts counting at zero and is incremented for each call to “HoldProfile”. This makes the cookie value predictable and allows other, unrelated users or applications to release an active profile hold by trying to guess the cookie value in use.
Upstream Bugfix
We suggested to upstream to make the cookie value unpredictable by generating a random number. This is what upstream did in commit a88002e.
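A minimal sketch of the approach (our own illustration, not the exact upstream code) using Python's secrets module, which provides cryptographically strong random numbers:

```python
import secrets

def new_hold_cookie(active_cookies: set) -> int:
    # Draw 32 unpredictable bits instead of incrementing a counter; retry
    # on the (very unlikely) collision with an already active hold.
    while True:
        cookie = secrets.randbits(32)
        if cookie not in active_cookies:
            active_cookies.add(cookie)
            return cookie
```

With this scheme an unrelated client would have to guess one value out of 2^32, instead of simply trying small integers.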
3.3 Non-Integer cookie Parameter in “ReleaseProfile” Method Leads to Unhandled Exception
As described in the previous section, the “ReleaseProfile” D-Bus
method expects an integer cookie parameter as input.
The Python D-Bus framework used to implement the method, however, allows clients
to pass non-integer types as cookie, which causes an exception to be thrown
in the daemon. Since the framework catches the exception, this does not lead
to the daemon exiting.
The issue can be reproduced via the following command line:
user$ dbus-send --system --dest=org.freedesktop.UPower.PowerProfiles \
--type=method_call --print-reply /org/freedesktop/UPower/PowerProfiles \
org.freedesktop.UPower.PowerProfiles.ReleaseProfile string:test
Error org.freedesktop.DBus.Python.ValueError: Traceback (most recent call
last):
File "/usr/lib/python3.13/site-packages/dbus/service.py", line 712, in
_message_cb
retval = candidate_method(self, *args, **keywords)
File "/usr/sbin/tlp-pd", line 223, in ReleaseProfile
cookie = int(cookie)
ValueError: invalid literal for int() with base 10: dbus.String('test')
Upstream Bugfix
While this is not strictly a security issue, we still suggested to make the
daemon more robust by actively catching type mismatch issues for the cookie
input parameter. Upstream followed this suggestion and implemented it in the
same commit as above which introduces unpredictable
cookie values.
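Defensive parsing of the cookie parameter can be sketched like this; the function name is our own, not taken from the upstream patch:

```python
def parse_cookie(value):
    """Return the cookie as an int, or None if the client passed a bogus type."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Covers non-numeric strings ("test") as well as types that int()
        # cannot convert at all (None, lists, ...).
        return None
```

The caller can then reply with a proper D-Bus error for a None result instead of letting the conversion raise inside the method body.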
3.4 Unlimited Number of Profile Holds Provides DoS Attack Surface
The profile hold mechanism described in section
3.2 allows local users in an active session
to create an unlimited number of profile holds without admin authentication.
This can lead to resource exhaustion in the TLP power daemon, since an integer
is entered into a Python dictionary along with arbitrary strings “reason” and
“application_id” which are also supplied by the client. This API thus
offers Denial-of-Service attack surface.
We found a similar issue in GNOME’s power profiles daemon some years ago, but GNOME upstream disagreed with our analysis at the time, which is why SUSE distributions apply a custom patch to limit the number of parallel profile holds.
Upstream Bugfix
We asked upstream whether there are any valid use cases for supporting a large number of profile holds in parallel, and it turns out that the typical use case is only to support a single profile hold at any given time. Thus upstream agreed to restrict the number of profile holds to a maximum of 16, which is implemented in commit 6a637c9.
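The limit can be enforced with a simple length check before a new hold is stored; this is our own sketch of the idea, with hypothetical names, not the code from commit 6a637c9:

```python
MAX_HOLDS = 16  # upper bound agreed on with upstream

def add_hold(holds: dict, cookie: int, reason: str, application_id: str):
    # Bound the memory an unprivileged client can tie up in the daemon by
    # refusing new holds once the maximum is reached.
    if len(holds) >= MAX_HOLDS:
        raise RuntimeError("maximum number of profile holds reached")
    holds[cookie] = (reason, application_id)
```

In the daemon the RuntimeError would be translated into a proper D-Bus error reply to the client.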
4) CVE Assignment
We assigned CVE-2025-67859 to track issue 3.1 (Polkit authentication bypass). Issues 3.2 (predictable cookie values) and 3.4 (unlimited number of profile holds) would formally also justify CVE assignments; their severity is low, however, and we agreed with upstream to focus on the main aspect of the Polkit authentication bypass.
5) Coordinated Disclosure
We reached out to the upstream author on December 16 with details about the issues and offered coordinated disclosure. Upstream confirmed the issues and accepted coordinated disclosure. We discussed patches and further details over the course of the following two weeks. Due to the approaching Christmas holiday season we decided to set the general publication date to January 7.
We want to express our thanks to the TLP upstream author for the smooth cooperation in handling these issues.
6) Timeline
| 2025-12-16 | We reached out to the upstream developer by email providing a detailed report and offered coordinated disclosure. |
| 2025-12-17 | We received a reply discussing details of the report. Coordinated disclosure was established with a preliminary publication date set to 2026-01-27. |
| 2025-12-20 | We received a set of patches from upstream for review. 2026-01-07 was suggested as new publication date. |
| 2025-12-23 | We provided positive feedback on the patches and agreed to the new publication date. We also pointed out the additional problem of the unlimited number of profile holds (issue 3.4). |
| 2025-12-25 | We received a follow-up patch from upstream limiting the number of profile holds. |
| 2025-12-29 | We reviewed the follow-up patch and provided positive feedback to upstream. |
| 2026-01-07 | Upstream published bugfix release 1.9.1 as planned. |
| 2026-01-07 | Publication of this report. |
7) References
Foomuuri: Lack of Client Authorization and Input Verification allow Control over Firewall Configuration (CVE-2025-67603, CVE-2025-67858)
Table of Contents
- 1) Introduction
- 2) Overview of the D-Bus Service
- 3) Security Issues
- 4) Upstream Bugfixes
- 5) CVE Assignment
- 6) Coordinated Disclosure
- 7) Timeline
- 8) References
1) Introduction
Foomuuri is an nftables-based firewall manager for Linux. The project includes a D-Bus daemon which offers an API similar to firewalld. In early December an openSUSE community member asked us to review Foomuuri for addition to openSUSE Tumbleweed.
During the review we quickly noticed a lack of client authorization and input validation in the implementation of Foomuuri’s D-Bus service. We reported the issues to upstream and performed coordinated disclosure. Upstream published version 0.31 of Foomuuri on 2026-01-07 which contains bugfixes for the security issues.
The next section provides an overview of the Foomuuri D-Bus service. Section 3) discusses the security issues in detail. Section 4) provides an overview of the upstream bugfixes to address the issues. Section 5) looks into the CVEs which were assigned. Section 6) gives insight into the coordinated disclosure process which was established for these findings.
This report is based on Foomuuri release v0.29.
2) Overview of the D-Bus Service
Foomuuri runs with full root privileges and registers a D-Bus interface under the name “fi.foobar.Foomuuri1”. Optionally a firewalld drop-in replacement interface is also registered under “org.fedoraproject.FirewallD1”. Both interfaces hook into the same logic, however, and there is no need to look at them separately.
There are only a few methods provided by the D-Bus interface: getting the list of available zones and managing the assignment of network interfaces to zones.
3) Security Issues
3.1) Lack of Client Authorization
There is no authorization layer like Polkit present in the Foomuuri D-Bus service, and there are also no restrictions at the D-Bus configuration level as to who is allowed to connect to the provided D-Bus interfaces.
As a result any local user, including low privilege service user accounts or
even nobody, can invoke the D-Bus interface and change the firewall
configuration. The only state which can be modified this way is the assignment
of interfaces to zones, but this is enough to weaken the firewall
configuration or to perform a limited Denial-of-Service.
3.2) Missing Input Parameter Verification
Apart from the lack of access restrictions pointed out above, the input
parameters to the D-Bus methods are not carefully scrutinized. While the zone
input parameter is at least checked against currently configured
zones, no further checks are performed on the
interface parameter. This means that, e.g. via the “addInterface” D-Bus
method, arbitrary strings can be passed as interface name. There is also
intentionally no check if the specified name corresponds to an existing
network device in the system (to allow seamless coverage of network devices
even before they are added to the system).
One result of this can be log spoofing, since the interface name is passed
unmodified to logging functions. The string could contain control characters
or newlines, which can be used to manipulate the log.
In DbusCommon.add_interface() the possibly
crafted interface name is added to the to-be-generated JSON configuration via
the out() method. While we did not verify whether this works in practice, a
local attacker could attempt to largely control the JSON configuration passed
to nftables, by skillfully embedding additional JSON configuration in the
interface parameter.
We were worried that this could even lead to arbitrary code execution by
abusing features of nftables like loading external files or plugin code, but
it turned out that there are no such features available in the nftables
configuration format.
3.3) Unsafe umask used in Daemonize Code
Foomuuri contains optional support to daemonize itself. Normally this is
handled by systemd and the code in question is not invoked. The code contains
logic to set the daemon’s umask to 0, however, which is a bad default:
applications or libraries that intend to leave control of the file mode of
newly created files to the user can pass modes like 0666 to open(),
rendering those files world-writable.
Foomuuri does not contain any code paths that create new files, but the umask
setting is also inherited by child processes, for example. While we did not
think this was a tangible security issue in this form, we suggested to choose a
more conservative value here to prevent future issues.
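The effect of the umask on newly created files can be demonstrated with a few lines of Python (an illustration of the general mechanism, not Foomuuri code):

```python
import os
import stat
import tempfile

# With a umask of 0, an open() with mode 0o666 creates a world-writable
# file; the conservative 0o022 masks out the group/other write bits.
os.umask(0o022)
path = os.path.join(tempfile.mkdtemp(), "demo")
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
os.close(fd)
mode = stat.S_IMODE(os.stat(path).st_mode)  # 0o644 with umask 0o022
```

With umask 0, the same open() call would have left the file at mode 0o666.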
4) Upstream Bugfixes
We suggested the following fixes to upstream:
- restrict access to the D-Bus interfaces to root only, maybe also to members of a dedicated opt-in group. Alternatively Polkit could be used for authentication of callers, which is more effort and more complex, however.
- the “interface” input parameter should be verified right from the beginning of each D-Bus method to make sure that it does not contain any whitespace or special characters and is not longer than IFNAMSIZ bytes (which is currently 16 bytes on Linux).
- as an additional hardening measure we also suggested to apply systemd directives like ProtectSystem=full to Foomuuri’s systemd services, to prevent possible privilege escalation should anything go wrong at the first line of defense.
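The suggested interface name check can be sketched in Python; the allow-list below is our own illustration of the idea, not the pattern from the upstream patch:

```python
import re

IFNAMSIZ = 16  # Linux buffer size for interface names, including the NUL

# Letters, digits and a few safe punctuation characters; anything that
# could smuggle whitespace, JSON syntax or log control characters fails.
_VALID_IFACE = re.compile(r"[A-Za-z0-9_.:-]+")

def valid_interface(name: str) -> bool:
    # fullmatch() rejects partial matches, including a trailing newline;
    # the kernel limits names to IFNAMSIZ - 1 characters plus the NUL.
    return len(name) < IFNAMSIZ and bool(_VALID_IFACE.fullmatch(name))
```

Such a check rejects both the log-spoofing and the JSON-injection vectors from issue 3.2 while still allowing names of devices that do not exist yet.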
Upstream decided to implement Polkit authentication for Foomuuri’s D-Bus service and otherwise closely followed our suggestions:
- commit 5944a42 adds Polkit authentication to the D-Bus service. Changing firewall settings now requires admin authorization. The use of Polkit can be disabled in Foomuuri, in which case only clients with UID 0 are allowed to perform the operations.
- commit d1961f4 adds verification of the “interface” parameter to prevent manipulation of the JSON configuration data.
- commit 806e11d sets the umask used in the daemonize code to a more conservative 0o022 setting, preventing world- or group-writable files from coming into existence.
- commit 5fcf125 adds the ProtectSystem=full directive to all Foomuuri systemd service units.
All of the bugfixes are contained in version 0.31 of Foomuuri.
5) CVE Assignment
In agreement with upstream we assigned the following two CVEs corresponding to this report:
- CVE-2025-67603: lack of client authorization allows arbitrary users to influence the firewall configuration (issue 3.1).
- CVE-2025-67858: a crafted “interface” input parameter to D-Bus methods can lead to integrity loss of the firewall configuration or further unspecified impact by manipulating the JSON configuration passed to nft (issue 3.2).
6) Coordinated Disclosure
We reported these issues to the upstream developer on 2025-12-11, offering coordinated disclosure. We soon got a reply and discussed the details of the non-disclosure process. Upstream quickly shared patches with us for review and we agreed on the final patches already on 2025-12-19. In light of the approaching Christmas season we agreed on a publication date of 2026-01-07 for general disclosure.
We want to thank the upstream author for the prompt reaction and cooperation in fixing the issues.
7) Timeline
| 2025-12-11 | We contacted the Foomuuri developer by email providing a detailed report about the D-Bus related findings and offered coordinated disclosure. |
| 2025-12-12 | The upstream author confirmed the issues, agreed to coordinated disclosure and asked us to assign CVEs the way we had suggested. 2026-01-07 was suggested as the publication date. |
| 2025-12-15 | We discussed some additional technical details like the umask issue and the question of whether arbitrary code execution could result from the ability to control the JSON configuration passed to nft. |
| 2025-12-18 | Upstream shared with us a first version of patches for the issues we reported. The patches for minor issues and hardening were already published on GitHub at this point. |
| 2025-12-19 | We provided feedback on the patches, suggesting minor improvements. |
| 2025-12-19 | With the fixes ready we discussed whether earlier publication would make sense, but we agreed to stick to the date of 2026-01-07 to accommodate the Christmas holiday season. |
| 2026-01-07 | Upstream release v0.31 was published. |
| 2026-01-07 | Publication of this report. |
8) References
openSUSE 15.6 to 16.0 upgrade notes
Kraft 2.0 Announcement
With the start of the new year, I am very happy to announce the release of Kraft version 2.0.0.
Kraft provides effective invoicing and document management for small businesses on Linux. Check the feature list.
This new version is a big step ahead for the project. It not only delivers the outstanding ports to Qt6 and KDE Frameworks 6 along with tons of modernizations and cleanups, but for the first time it also makes some significant changes to the underlying architecture and drops outdated technology.
Kraft no longer stores documents in a relational database, but as XML documents in the filesystem. While separate files are more natural for documents anyway, this paves the way for Kraft to integrate with private cloud infrastructures like OpenCloud or Nextcloud via sync. That is not only useful for backup and web-app purposes, but synced data also enables running Kraft as a distributed system, for example if office staff work from different home offices. Expect this and related use cases to be supported in Kraft in the near future.
But there are more features: For example, the document lifecycle was changed to be more compliant: Documents now remain in a draft state until they get finalized, at which point they receive their final document number. From that point on, they can no longer be altered.
There is too much on the long changes list to mention it all here.
However, what is important is that after more than 20 years of developing and maintaining this app, I continue to be motivated to work on it. It is not a big project, but I think it is important that we have this kind of “productivity” application available for Linux, to make it attractive for people to switch to Linux.
Around Kraft, a small but beautiful community has built up. I would like to thank everybody who contributed to Kraft in any way over the years. It is big fun to work with you all!
If you are interested, please get in touch.
pgtwin as OCF Agent
When I was looking for a solution that could provide High Availability across two datacenters, the only solution that remained viable and comprehensible for me was Corosync/Pacemaker. The reason I actually need this is that mainframe environments typically use two datacenters, since z/OS can operate nicely with that. The application I had to set up is Kubernetes on Linux on Z, and since Kubernetes itself normally runs with 3 or more nodes, I had to find a different solution. I found that I could run Kubernetes with an external database using https://github.com/k3s-io/kine, and being no DBA, I selected PostgreSQL as a first try.
For Pacemaker, there already exists an OCF agent called pgsql https://linux.die.net/man/7/ocf_heartbeat_pgsql that is included with the ClusterLabs OCF agents. In addition, there is another OCF agent called PAF https://clusterlabs.github.io/PAF/ that sounded promising. However, I first had to build it on my own, and later I found that, while it was really nicely promoted, it was missing some features I needed.
At that point, a colleague asked if I wanted to try his AI, and countless improvements and bug fixes later, the pgtwin https://github.com/azouhr/pgtwin agent really seems quite stable. Now, on to some of the main design concepts.
Make use of the promotable clone resource
PostgreSQL’s primary/standby model maps perfectly to promoted/unpromoted. This is also how you would configure pgsql with a current Pacemaker release, and all current documentation relies on this configuration schema.
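A promotable clone configuration for such an agent could look roughly like this in crm shell syntax. The resource parameters and the OCF provider segment are illustrative assumptions, not necessarily pgtwin's actual names:

```
primitive pgsql-db ocf:azouhr:pgtwin \
    params pgdata="/var/lib/pgsql/data" \
    op monitor interval="15s" role="Promoted" timeout="30s" \
    op monitor interval="16s" role="Unpromoted" timeout="30s"
clone postgres-clone pgsql-db \
    meta promotable="true" promoted-max="1" clone-max="2" notify="true"
```

The promotable="true" meta attribute is what lets Pacemaker run one instance per node and promote exactly one of them, matching the primary/standby split.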
Use Physical Replication with Slots
- Prevent WAL files from being recycled while standby is offline
- Enable standby to catch up after brief disconnections
- Automatically created/managed by pgtwin
- Automatically cleaned up when excessive (prevents disk fill)
Why physical, and not logical replication?
- Byte-identical replica (all databases, all tables, all objects)
- Lower overhead than logical replication
- Supports pg_rewind for timeline divergence recovery
Automatic Standby Initialization
Traditionally, the database admin would have to set up the replication, and the OCF agent would then take over its management. However, since we already had basebackup functionality ready for the case that the WAL had been cleaned up, it was just a small step to provide full initialization.
The only steps on the secondary for the admin after configuring the primary are:
- Create the PostgreSQL data directory with correct ownership/permissions
- Set up the password file .pgpass
The remaining task of creating a synchronous streaming replication is performed by pgtwin during startup of the node.
Timeline Divergence and pg_rewind
After a failover, the old primary may have diverged from the new primary, and thus the synchronous replication will fail. pgtwin handles this as follows:
- Detects divergence (timeline check in pgsql_demote)
- Runs pg_rewind to sync from the new primary
- Replays the necessary WAL to reconcile
- Starts as standby
This is much faster than trying to do a full basebackup, at least with big databases. Typical failover times are merely seconds.
Replication Health Monitoring
Every monitor cycle, pgtwin not only checks whether PostgreSQL is running, but also the replication health. This includes the replication state (streaming, catchup, etc.) as well as the replication lag and the synchronous state.
If the replication check fails for 5 consecutive monitor cycles (configurable), pgtwin automatically triggers recovery: it first tries pg_rewind, and if that fails, it falls back to pg_basebackup.
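The consecutive-failure logic described above can be sketched as follows; this is our own illustration of the counting behavior, not pgtwin's actual code:

```python
FAIL_THRESHOLD = 5  # consecutive failed monitor cycles before recovery

class ReplicationMonitor:
    """Tracks consecutive replication-health failures across monitor cycles."""

    def __init__(self, threshold: int = FAIL_THRESHOLD):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one monitor cycle; return True when recovery should trigger."""
        if healthy:
            self.failures = 0  # any healthy cycle resets the counter
            return False
        self.failures += 1
        return self.failures >= self.threshold
```

Resetting on every healthy cycle ensures that a brief hiccup (e.g. a short catchup phase) does not accumulate toward the recovery threshold.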
Configuration Validation
At startup, pgtwin validates the PostgreSQL configuration for a number of settings that it considers critical. There are hard checks like “restart_after_crash = off”, which must be set to prevent PostgreSQL from trying to promote itself instead of letting Pacemaker handle the situation, as well as checks on a number of other parameters.
To check the startup validation, have a look at the pacemaker system logs:
journalctl -u pacemaker -f
State Machine and Lifecycle
pgtwin has a clear model of the PostgreSQL lifecycle:
┌─────────────────────────────────────────────────────────────┐
│ STOPPED STATE │
│ PostgreSQL not running │
└──────────────────────┬──────────────────────────────────────┘
│ start operation
↓
┌────────────────┐
│ PGDATA valid? │
└────┬───────┬───┘
│ │
NO ←──┘ └──→ YES
│ │
↓ ↓
┌──────────────────┐ ┌─────────────────┐
│ Auto-initialize │ │ Start PostgreSQL│
│ (pg_basebackup) │ │ as standby │
└────────┬─────────┘ └────────┬────────┘
│ │
└──────────┬──────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ UNPROMOTED STATE │
│ PostgreSQL running as standby │
│ - Replaying WAL from primary │
│ - Read-only queries allowed │
│ - Monitor checks replication health │
└──────────────────────┬──────────────────────────────────────┘
│ promote operation
↓
┌────────────────────┐
│ pg_ctl promote │
│ (remove standby │
│ signal) │
└────────┬───────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ PROMOTED STATE │
│ PostgreSQL running as primary │
│ - Accepts write operations │
│ - Streams WAL to standby │
│ - Manages replication slot │
│ - Monitor checks replication health │
└──────────────────────┬──────────────────────────────────────┘
│ demote operation
↓
┌────────────────────┐
│ Stop PostgreSQL │
│ Check timeline │
│ pg_rewind if needed│
│ Create standby │
│ signal │
└────────┬───────────┘
↓
(returns to UNPROMOTED STATE)
Failure Handling
The following failures are handled completely automatically and are designed to provide seamless operation without data loss:
- Primary Failure and Recovery
- Standby Failure and Recovery
- Replication Failure
- Split-Brain Prevention
For the split-brain prevention, additional Pacemaker configuration is needed, such as a second corosync ring with a direct network connection as well as a third ring via IPMI.
Container Mode
pgtwin is prepared to also support containers instead of a locally installed PostgreSQL database. However, the current implementation is too sluggish and has too much overhead during management of the database.
For future releases, I plan to change the implementation by switching from “podman run” to the use of “nsexec”. We will see if this makes the implementation usable. Still, currently implemented are:
- A version check that prevents using a container PostgreSQL version that does not match the current PGDATA
- An additional PostgreSQL user that allows the PGDATA user ID to be used within the container
- A wrapper that runs all PostgreSQL commands, guaranteeing seamless integration between bare-metal and container operations
Single-Node Startup
The original authors of pgsql were very careful about the data even in the case of a double crash of the cluster. The scenario they had in mind was like this:
- Primary crashes
- Secondary takes over and handles applications
- Secondary crashes
- Primary comes up with outdated data and continues as primary
Now, with pgtwin, a number of considerations go into the startup:
- If both nodes come up, pgtwin will check the timelines to decide which node should become promoted
- If the cluster was down and one node comes up:
- If the node was primary and had sync mode enabled: the node likely crashed and should not be promoted
- If the node was primary and had async mode enabled: the node likely crashed while the other node was missing; this node should become primary
- If the node was secondary: the cluster probably crashed, or was restarted after the secondary crashed; the node should not be promoted
The key insight here is that, in case just one node is restarted, it should only be promoted standalone if it was primary before and, in addition, had async streaming replication activated even though the cluster is configured for sync streaming replication.
Otherwise, the cluster will refuse to start with a single node. If startup is really needed, the admins will have to override this manually.
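The standalone-promotion rule above can be condensed into a tiny predicate; the parameter names are illustrative, not pgtwin's actual state variables:

```python
def may_promote_standalone(was_primary: bool, last_replication_mode: str) -> bool:
    # Only a former primary that had already dropped to async replication
    # (i.e. it was running alone when the cluster went down) may be
    # promoted without the peer node present.
    return was_primary and last_replication_mode == "async"
```

Every other single-node situation (former primary still in sync mode, or a former secondary) suggests the peer may hold newer data, so promotion must wait for the admin or the second node.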
pgtwin-migrate
In a future blog entry, I will cover the features of the currently experimental pgtwin-migrate OCF agent. This agent allows failing over between two PostgreSQL clusters, for example between two versions or between different vendors.
What does it mean to write in 2026?
Tumbleweed – Review of the week 2026/1
Dear Tumbleweed users and hackers,
Happy New Year to you all! While people all around the world are celebrating the new year, Tumbleweed has been tirelessly rolling ahead and has published six snapshots (20251227 – 20251231, 20260101). Naturally, there are no groundbreaking changes, as many developers and maintainers are out celebrating, and any greater coordinated effort is taking a bit more time.
Nevertheless, the six snapshots brought you these changes:
- Python 3.13.11 (some CVE fixes)
- libgit2 1.9.2
- Neon 0.36.0
- Harfbuzz 12.3.0
- NetworkManager 1.54.3
- GStreamer 1.26.10
- VLC 3.0.22 & 3.0.23: finally linking ffmpeg-8
- GPG 2.5.16
- upower 1.91.0
The next snapshot is already in the process of syncing out, and the next few changes are queuing up in the staging projects. You can expect these things shortly:
- SDL3 3.4.0
- Ruby 4.0: currently breaking the build of vim, see https://github.com/vim/vim/issues/18884
- transactional-update 6.0.1
- Shadow 4.19.0
Let’s get rolling for the Year 2026! I’m looking forward to a great year!
Path Aware High Availability (PAHA)
During my work on Kubernetes on Linux on Z and the creation of https://github.com/azouhr/pgtwin, I came across the same issue that most admins have to solve in two-node clusters: how do I get quorum, and which node is to be the primary?
While additional techniques help, like providing a second corosync ring for HA and even a third ring via an IPMI device, the elegance of a three-node quorum could not easily be achieved in my desired environment.
When trying to solve the correct placement of the primary PostgreSQL database in the two-node cluster, it occurred to me that there is an external dependency that could be used as an arbitrator. It does not really help an application if a resource is available but cannot be reached.
The main insight here was:
**Availability without accessibility is useless**
This pattern shifts HA from “server-centric” (is it running?) to “use-case-centric” (can it be used for its intended purpose?). I did some research, however I could not find anyone describing this key principle as a method to determine placement of resources.
We defined new terms to make this handy:
Definition of “Critical Path”:
A critical path is any dependency required for the service to fulfill its designed use case.
Definition of “Path-Aware High Availability (PAHA)”
Path-Aware High Availability is a general clustering pattern where resource promotion decisions explicitly validate critical paths required for service delivery before allowing promotion. Unlike traditional HA which only checks if a service *process* is running, PAHA ensures the service is running on a node where clients can actually use it.
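The pattern can be condensed into a small sketch; the function and the placeholder checks are ours, standing in for real probes like ping or multipath -ll:

```python
def may_promote(service_running: bool, critical_path_checks) -> bool:
    # Traditional HA asks only "is the process running?"; PAHA additionally
    # requires every critical path to validate before promotion is allowed.
    return service_running and all(check() for check in critical_path_checks)

# Example probes: a node whose gateway check fails must not be promoted,
# even though the service process itself is healthy.
gateway_ok = lambda: False  # e.g. ping to the gateway failed
storage_ok = lambda: True   # e.g. multipath reports enough paths
```

In a real cluster the checks run as resources (like ocf:pacemaker:ping) and feed node attributes that location rules evaluate, rather than being called inline.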
This turned out to be a really interesting thought. Besides network paths, this can also be applied to other paths, totally unrelated to the original use case:
| Use Case | Service | Critical Path | Validation Method |
|---|---|---|---|
| Database clustering | PostgreSQL | Gateway reachability | Ping gateway from node |
| Storage HA | iSCSI target | Multipath to storage | multipath -ll shows paths |
| FibreChannel SAN | SAN LUN | FC fabric connectivity | fcinfo shows active paths |
| RoCE storage | NVMe-oF target | DCB lossless Ethernet | dcbtool shows PFC enabled |
| API gateway | Kong/Nginx | Upstream service reachable | Health check endpoint |
| Load balancer | HAProxy | Backend pool reachable | TCP connect to backends |
| DNS server | BIND | Root server reachability | Query root servers |
| NFS server | NFS daemon | Export filesystem mounted | mount shows filesystem |
| Container orchestrator | Kubernetes | CNI network functional | Pod-to-pod connectivity |
This can even be used to mitigate sick-but-not-dead conditions. For example, in a multipath environment you might want to disable a path that sporadically shows CRC errors. Even from the storage side, you would know whether there are sufficient paths available, and could disable the sick path.
Now to the fun part. It speaks for Pacemaker that such functionality can be implemented by simple configuration means, at least for network paths. For pgtwin, the question was what happens if ring0 (with the PostgreSQL resource) is partially broken. The other ring would keep the cluster running, but the placement of the primary with read-write capability would have to go to the node with service access.
What we had to do was merely create a ping resource, set up a clone with it, and create a location rule that tells Pacemaker where to place the promoted resource. In the case of pgtwin, we additionally prevent the unpromoted resource from running on a node without ping connectivity, because it will likely not be able to sync with the primary. The configuration looks like this:
primitive ping-gateway ocf:pacemaker:ping \
params \
host_list="192.168.1.1" \
multiplier="100" \
attempts="3" \
timeout="2" \
op monitor interval="10s" timeout="20s"
clone ping-clone ping-gateway \
meta clone-max="2" clone-node-max="1"
location prefer-connected-promoted postgres-clone role=Promoted \
rule 200: pingd gt 0
location require-connectivity-unpromoted postgres-clone role=Unpromoted \
rule -inf: pingd eq 0
Now, in the assumed case of a Dual Datacenter setup, what happens if the gateway vanishes on one side is:
- The cluster makes sure that the primary is on the side with the ping availability.
- The secondary is located on the other side.
- The secondary may not run there without the ping resource and is stopped.
- The primary is notified about the secondary being gone, and switches to async replication mode.
This means that we lose high availability of the PostgreSQL database, but it still serves the applications as usual. When the gateway comes back, the following happens:
- The cluster starts pgtwin on the secondary
- pgtwin initiates a rollback of the database to get the timelines in sync
- If the rollback is unsuccessful, pgtwin initiates a basebackup from the primary
- After the nodes are consistent, the database is started as secondary, and the replication is switched to sync again.
- The primary node is not moved back, because we set a resource stickiness by default.
All of this happens without admin intervention. This procedure greatly improves availability of the PostgreSQL database for the intended use.
Seamless Windows Apps on openSUSE with WinBoat
Linux kernel security work
Lots of the CVE world seems to focus on “security bugs”, but I have found that it is not all that well known how the Linux kernel security process actually works. I gave a talk about this back in 2023 and at other conferences since then, attempting to explain how it works, but I also thought it would be good to explain all of this in writing, as this knowledge is required when trying to understand how the Linux kernel CNA issues CVEs.