Kraft 2.0 Announcement
With the start of the new year, I am very happy to announce the release of Kraft version 2.0.0.
Kraft provides effective invoicing and document management for small businesses on Linux. Check the feature list.
This new version is a big step ahead for the project. It not only delivers the long-pending ports to Qt6 and KDE Frameworks 6, along with tons of modernizations and cleanups, but for the first time it also makes significant changes to the underlying architecture and drops outdated technology.
Kraft no longer stores documents in a relational database, but as XML documents in the filesystem. While separate files are more natural for documents anyway, this paves the way for integrating Kraft with private cloud infrastructures like OpenCloud or Nextcloud via sync. That is useful not only for backups and web apps, but also for synced data that allows Kraft to run as a distributed system, for example when office staff work from different home offices. Expect this and related use cases to be supported in the near future of Kraft.
But there are more features: For example, the document lifecycle was changed to be more compliant: documents now remain in draft status until they are finalized, at which point they get their final document number. From then on, they can no longer be altered.
There is too much on the long changes list to mention everything here.
However, what is important is that after more than 20 years of developing and maintaining this app, I continue to be motivated to work on it. It is not a big project, but I think it is important that we have this kind of “productivity” application available for Linux to make it attractive for people to switch to Linux.
Around Kraft, a small but beautiful community has built up. I would like to thank everybody who contributed to Kraft in any way over the years. It is great fun to work with you all!
If you are interested, please get in touch.
pgtwin as OCF Agent
When I was looking for a solution that could provide high availability across two datacenters, the only option that remained viable and comprehensible for me was Corosync/Pacemaker. The reason I actually need this is that mainframe environments typically use two datacenters, since z/OS can operate nicely with that. The application I had to set up is Kubernetes on Linux on Z, and since Kubernetes itself normally runs with three or more nodes, I had to find a different solution. I found that I could run Kubernetes with an external database using https://github.com/k3s-io/kine, and being no DBA, I selected PostgreSQL as a first try.
For Pacemaker, there already exists an OCF agent called pgsql (https://linux.die.net/man/7/ocf_heartbeat_pgsql) that is included with the ClusterLabs OCF agents. In addition, RedHat created another OCF agent called PAF (https://clusterlabs.github.io/PAF/) that sounded promising. However, I first had to build it on my own, and later I found that while it is really nicely promoted, it was missing some features I needed.
Then a colleague asked if I wanted to try his AI, and countless improvements and bug fixes later, the pgtwin agent (https://github.com/azouhr/pgtwin) really seems quite stable. Now to some of the main design concepts.
Make use of the promotable clone resource
PostgreSQL’s primary/standby model maps perfectly to promoted/unpromoted. This is also how you would configure pgsql with a current Pacemaker release, and all documentation relies on this configuration schema.
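As an illustration, a promotable clone definition in crm shell syntax could look like the sketch below. The resource agent name and its parameter are assumptions for this sketch; the complete configuration actually used is part of the pgtwin repository.
# sketch only: the agent name (ocf:heartbeat:pgtwin) and the pgdata parameter are assumptions
primitive postgres-db ocf:heartbeat:pgtwin \
    params pgdata="/var/lib/pgsql/data" \
    op monitor interval="15s" role="Promoted" timeout="30s" \
    op monitor interval="16s" role="Unpromoted" timeout="30s"
clone postgres-clone postgres-db \
    meta promotable="true" promoted-max="1" clone-max="2" notify="true"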
Use Physical Replication with Slots
- Prevent WAL files from being recycled while standby is offline
- Enable standby to catch up after brief disconnections
- Automatically created/managed by pgtwin
- Automatically cleaned up when excessive (prevents disk fill)
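To see the slot on the primary, you can query the standard PostgreSQL view; the actual slot name that pgtwin creates is not shown here:
# inspect physical replication slots on the primary
sudo -u postgres psql -x -c "SELECT slot_name, slot_type, active, restart_lsn, wal_status FROM pg_replication_slots;"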
Why physical, and not logical replication?
- Byte-identical replica (all databases, all tables, all objects)
- Lower overhead than logical replication
- Supports pg_rewind for timeline divergence recovery
Automatic Standby Initialization
Traditionally, the database admin would have to set up the replication, and the OCF agent would then take over the management. However, since we already had basebackup functionality ready for the case that the WAL had been cleaned up, it was just a small step to provide full initialization.
The only steps on the secondary for the admin after configuring the primary are:
- Create the PostgreSQL Data Directory with correct ownership/permissions
- Setup the password file .pgpass
The remaining task of creating a synchronous streaming replica is done by pgtwin during startup of the node.
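A minimal sketch of these two admin steps on the standby, using the paths and permissions from later in this series, could look like this:
# prepare an empty data directory with correct ownership and permissions
mkdir -p /var/lib/pgsql/data
chown postgres:postgres /var/lib/pgsql/data
chmod 700 /var/lib/pgsql/data
# provide the password file for the replication user (its contents are shown later in this series)
touch /var/lib/pgsql/.pgpass
chown postgres:postgres /var/lib/pgsql/.pgpass
chmod 600 /var/lib/pgsql/.pgpass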
Timeline Divergence and pg_rewind
After a failover, the old primary may have diverged from the new primary, and thus the synchronous replication will fail. pgtwin handles this as follows:
- Detects the divergence (timeline check in pgsql_demote)
- Runs pg_rewind to sync from the new primary
- Replays the necessary WAL to reconcile
- Starts as standby
This is much faster than trying to do a full basebackup, at least with big databases. Typical failover times are merely seconds.
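For reference, a manual pg_rewind invocation, which pgtwin automates, would look roughly like the following; the data directory path and the connection string are placeholders:
# manual example of what pgtwin automates -- path and connection string are placeholders
pg_rewind --target-pgdata=/var/lib/pgsql/data \
    --source-server="host=pgtwin1 port=5432 user=replicator dbname=postgres" \
    --progress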
Replication Health Monitoring
Every monitor cycle, pgtwin not only checks whether PostgreSQL is running, but also the replication health. This includes the replication state (streaming, catchup, etc.) as well as the replication lag and the synchronous state.
If the replication check fails for 5 consecutive monitor cycles (configurable), pgtwin automatically triggers recovery. It first tries pg_rewind; if that fails, it falls back to pg_basebackup.
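The kind of information evaluated here can also be inspected manually on the primary; the thresholds pgtwin applies are of course not part of this query:
# view replication state, sync state and lag on the primary
sudo -u postgres psql -x -c "SELECT application_name, state, sync_state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;"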
Configuration Validation
At startup, pgtwin validates the PostgreSQL configuration for a number of settings that it considers critical. There are hard checks like “restart_after_crash = off”, which must be set to off to prevent PostgreSQL from trying to promote itself instead of letting Pacemaker handle the situation, as well as a number of other parameters.
To check the startup validation, have a look at the pacemaker system logs:
journalctl -u pacemaker -f
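You can also query some of the critical settings yourself; the parameter list below is only an example and not the complete set that pgtwin validates:
# example check of a few parameters -- not the complete list that pgtwin validates
sudo -u postgres psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('restart_after_crash', 'wal_level', 'hot_standby', 'synchronous_standby_names');"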
State Machine and Lifecycle
pgtwin has a clear idea of the PostgreSQL lifecycle:
┌─────────────────────────────────────────────────────────────┐
│ STOPPED STATE │
│ PostgreSQL not running │
└──────────────────────┬──────────────────────────────────────┘
│ start operation
↓
┌────────────────┐
│ PGDATA valid? │
└────┬───────┬───┘
│ │
NO ←──┘ └──→ YES
│ │
↓ ↓
┌──────────────────┐ ┌─────────────────┐
│ Auto-initialize │ │ Start PostgreSQL│
│ (pg_basebackup) │ │ as standby │
└────────┬─────────┘ └────────┬────────┘
│ │
└──────────┬──────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ UNPROMOTED STATE │
│ PostgreSQL running as standby │
│ - Replaying WAL from primary │
│ - Read-only queries allowed │
│ - Monitor checks replication health │
└──────────────────────┬──────────────────────────────────────┘
│ promote operation
↓
┌────────────────────┐
│ pg_ctl promote │
│ (remove standby │
│ signal) │
└────────┬───────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ PROMOTED STATE │
│ PostgreSQL running as primary │
│ - Accepts write operations │
│ - Streams WAL to standby │
│ - Manages replication slot │
│ - Monitor checks replication health │
└──────────────────────┬──────────────────────────────────────┘
│ demote operation
↓
┌────────────────────┐
│ Stop PostgreSQL │
│ Check timeline │
│ pg_rewind if needed│
│ Create standby │
│ signal │
└────────┬───────────┘
↓
(returns to UNPROMOTED STATE)
Failure Handling
The following failures are handled completely automatically and are designed to provide seamless operation without data loss:
- Primary Failure and Recovery
- Standby Failure and Recovery
- Replication Failure
- Split-Brain Prevention
For split-brain prevention, additional Pacemaker configuration is needed, such as a second Corosync ring with a direct network connection as well as a third ring using IPMI.
Container Mode
pgtwin is prepared to also support containers instead of a locally installed PostgreSQL database. However, the current implementation is too sluggish and has too much overhead when managing the database.
For future releases, I plan to change the implementation by switching from “podman run” to the use of “nsexec”. We will see whether this makes the implementation usable. Still, the following is currently implemented:
- A version check that prevents using a wrong container PostgreSQL version with the current PGDATA
- An additional PostgreSQL user that allows the PGDATA user ID to be used within the container
- A wrapper that runs all PostgreSQL commands, so that seamless integration between bare-metal and container operations is guaranteed
Single-Node Startup
The original authors of pgsql were very careful about protecting the data, even in the case of a double crash of the cluster. The scenario they had in mind was like this:
- Primary crashes
- Secondary takes over and handles applications
- Secondary crashes
- Primary comes up with outdated data and continues as primary
Now, with pgtwin, a number of considerations go into startup:
- If both nodes come up, pgtwin will check the timelines to decide which node should become promoted
- If the cluster was down and one node comes up:
  - If the node was primary and had sync mode enabled: the node likely crashed; it should not be promoted.
  - If the node was primary and had async mode enabled: the node likely crashed while the other node was already missing; this node should become primary.
  - If the node was secondary: the cluster probably crashed, or was restarted after the secondary crashed; the node should not be promoted.
The key insight here is that if just one node is restarted, it should only be promoted standalone if it was primary before and, in addition, had async streaming replication activated even though the cluster was configured for synchronous streaming replication.
Otherwise, the cluster will refuse to start with a single node. If startup is really needed, the admins will have to override this manually.
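As a conceptual sketch of this rule, not the actual pgtwin implementation, the decision boils down to something like this:
# conceptual sketch of the single-node promotion rule -- not the actual pgtwin code
may_promote_standalone() {
    local was_primary="$1"   # "yes" if this node was the primary before the outage
    local repl_mode="$2"     # "sync" or "async" replication mode at the time of the crash
    if [ "$was_primary" = "yes" ] && [ "$repl_mode" = "async" ]; then
        return 0   # the peer was already gone, our data is the latest -- safe to promote
    fi
    return 1       # otherwise refuse and wait for the second node or an admin override
}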
pgtwin-migrate
In a future blog entry, I will cover the features of the currently experimental pgtwin-migrate OCF agent. This agent allows failing over between two PostgreSQL clusters, for example between two versions or between different vendors.
What does it mean to write in 2026?
Tumbleweed – Review of the week 2026/1
Dear Tumbleweed users and hackers,
Happy New Year to you all! While people all around the world are celebrating the new year, Tumbleweed has been tirelessly rolling ahead and has published six snapshots (20251227 – 20251231, 20260101). Naturally, there are no groundbreaking changes, as many developers and maintainers are out celebrating, and any greater coordinated effort is taking a bit more time.
Nevertheless, the six snapshots brought you these changes:
- Python 3.13.11 (some CVE fixes)
- libgit2 1.9.2S
- Neon 0.36.0
- Harfbuzz 12.3.0
- NetworkManager 1.54.3
- GStreamer 1.26.10
- VLC 3.0.22 & 3.0.23: finally linking ffmpeg-8
- GPG 2.5.16
- upower 1.91.0
The next snapshot is already syncing out, and the next changes are queuing up in the staging projects. You can expect these things shortly:
- SDL3 3.4.0
- Ruby 4.0: currently breaking the build of vim; see https://github.com/vim/vim/issues/18884
- transactional-update 6.0.1
- Shadow 4.19.0
Let’s get rolling for the Year 2026! I’m looking forward to a great year!
Path Aware High Availability (PAHA)
During my work on Kubernetes on Linux on Z and the creation of https://github.com/azouhr/pgtwin, I came across the same issue that most admins have to solve in two-node clusters: how do I get quorum, and which node should be the primary?
While I used additional techniques like a second Corosync ring for HA, and even a third ring to an IPMI device, the elegance of a three-node quorum could not easily be achieved in my target environment.
When trying to solve the correct placement of the primary PostgreSQL database in the two-node cluster, it occurred to me that there is an external dependency that could be used as an arbitrator. It does not really help an application if a resource is available but cannot be reached.
The main insight here was:
**Availability without accessibility is useless**
This pattern shifts HA from “server-centric” (is it running?) to “use-case-centric” (can it be used for its intended purpose?). I did some research, however I could not find anyone describing this key principle as a method to determine placement of resources.
We defined a new term to make this handy:
Definition of “Critical Path”:
A critical path is any dependency required for the service to fulfill its designed use case.
Definition of “Path-Aware High Availability (PAHA)”
Path-Aware High Availability is a general clustering pattern where resource promotion decisions explicitly validate critical paths required for service delivery before allowing promotion. Unlike traditional HA which only checks if a service *process* is running, PAHA ensures the service is running on a node where clients can actually use it.
This turned out to be a really interesting thought. Besides network paths, this can also be applied to other paths, totally unrelated to the original use case:
| Use Case | Service | Critical Path | Validation Method |
|---|---|---|---|
| Database clustering | PostgreSQL | Gateway reachability | Ping gateway from node |
| Storage HA | iSCSI target | Multipath to storage | multipath -ll shows paths |
| FibreChannel SAN | SAN LUN | FC fabric connectivity | fcinfo shows active paths |
| RoCE storage | NVMe-oF target | DCB lossless Ethernet | dcbtool shows PFC enabled |
| API gateway | Kong/Nginx | Upstream service reachable | Health check endpoint |
| Load balancer | HAProxy | Backend pool reachable | TCP connect to backends |
| DNS server | BIND | Root server reachability | Query root servers |
| NFS server | NFS daemon | Export filesystem mounted | mount shows filesystem |
| Container orchestrator | Kubernetes | CNI network functional | Pod-to-pod connectivity |
This can even be used to mitigate sick-but-not-dead conditions. For example, in a multipath environment you might want to disable a path that sometimes shows CRC errors. Even from the storage side, you would know whether there are sufficient paths available, and could disable the sick path.
Now to the fun part. It speaks for Pacemaker that such functionality can be implemented with simple configuration means, at least for networks. For pgtwin, the question was what happens if ring0 (with the PostgreSQL resource) is partially broken. The other ring would keep the cluster running, but the primary with read-write capability would have to be placed on the node with service access.
All we had to do was create a ping resource, set up a clone with it, and create a location rule that tells Pacemaker where to place the primary resource. In the case of pgtwin, we additionally prevent the unpromoted resource from running on a node without ping connectivity, because it will likely not be able to sync with the primary. The configuration looks like this:
primitive ping-gateway ocf:pacemaker:ping \
params \
host_list="192.168.1.1" \
multiplier="100" \
attempts="3" \
timeout="2" \
op monitor interval="10s" timeout="20s"
clone ping-clone ping-gateway \
meta clone-max="2" clone-node-max="1"
location prefer-connected-promoted postgres-clone role=Promoted \
rule 200: pingd gt 0
location require-connectivity-unpromoted postgres-clone role=Unpromoted \
rule -inf: pingd eq 0
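After loading this configuration, you can watch the pingd node attribute that drives the placement decision:
# show the cluster state once, including node attributes such as pingd
crm_mon -1A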
Now, in the assumed case of a dual-datacenter setup, this is what happens if the gateway vanishes on one side:
- The cluster makes sure that the primary is on the side with the ping availability.
- The secondary is located on the other side.
- The secondary may not run there without the ping resource and is stopped.
- The primary is notified about the secondary being gone, and switches to async replication mode.
This means that we have lost high availability of the PostgreSQL database, but it still serves the applications as usual. When the gateway comes back, the following happens:
- The cluster starts pgtwin on the secondary
- pgtwin initiates a rollback of the database to get the timelines in sync
- If the rollback is unsuccessful, pgtwin initiates a basebackup from the primary
- After the nodes are consistent, the database is started as secondary, and the replication is switched to sync again.
- The primary node is not moved back, because we set a resource stickiness by default.
All of this happens without admin intervention. This procedure greatly improves availability of the PostgreSQL database for the intended use.
Seamless Windows Apps on openSUSE with WinBoat
Linux kernel security work
Lots of the CVE world seems to focus on “security bugs”, but I’ve found that it is not all that well known how the Linux kernel security process actually works. I gave a talk about this back in 2023, and at other conferences since then, attempting to explain how it works, but I also thought it would be good to explain it all in writing, as knowing this is required to understand how the Linux kernel CNA issues CVEs.
pgtwin — HA PostgreSQL: Configuration
In the previous blog post, pgtwin — HA PostgreSQL: VM Preparation, we set up two VMs with KVM to prepare for a HA PostgreSQL setup. Now we will configure the Corosync cluster engine, prepare PostgreSQL for synchronous streaming replication, and finally configure Pacemaker to provide high availability.
Configure Corosync
Corosync has its main configuration file at ‘/etc/corosync/corosync.conf’. Edit this file with the following content, changing the IP addresses according to your setup:
totem {
version: 2
cluster_name: pgtwin-devel
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
token: 5000
join: 60
max_messages: 20
token_retransmits_before_loss_const: 10
# Dual ring configuration
interface {
ringnumber: 0
mcastport: 5405
}
interface {
ringnumber: 1
mcastport: 5407
}
}
nodelist {
node {
ring0_addr: 192.168.60.13
ring1_addr: 192.168.61.233
name: pgtwin1
nodeid: 1
}
node {
ring0_addr: 192.168.60.83
ring1_addr: 192.168.61.253
name: pgtwin2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: on
}
The next step is to create an authentication key for the cluster. Create the key on the first node, and then copy it to the other node:
corosync-keygen -l
scp /etc/corosync/authkey pgtwin2:/etc/corosync/authkey
Note that by default you will not be allowed to access the remote node as root with ssh. This is a good standard for production sites. If you find it inconvenient, you can change the setting by adding a file to /etc/ssh/sshd_config.d. Don’t do this for production environments or externally reachable VMs though:
# cat /etc/ssh/sshd_config.d/10-permit-root.conf
PermitRootLogin=yes
On both nodes, make sure that the ownership and access rights are correct:
chmod 400 /etc/corosync/authkey
chown root:root /etc/corosync/authkey
Enable and start Corosync and Pacemaker:
systemctl enable corosync
systemctl start corosync
# Wait 10 seconds for Corosync to stabilize
sleep 10
# Check Corosync status
sudo corosync-cfgtool -s
# Enable and start Pacemaker
sudo systemctl enable pacemaker
sudo systemctl start pacemaker
Verify that the cluster is working with ‘crm status’.
Configure PostgreSQL
PostgreSQL will only be configured on the first node. The second node only needs the data directory and the password file ‘.pgpass’ prepared; the pgtwin OCF agent itself will then perform the initial mirroring and the final replication configuration of the database. Find the mentioned postgresql.custom.conf file at https://github.com/azouhr/pgtwin/blob/main/postgresql.custom.conf. This file holds the default configuration for use with pgtwin. You will want to tweak the parameters according to your usage. Also make sure to use a password that is suitable for your environment.
# Initialize database
sudo -u postgres initdb -D /var/lib/pgsql/data
# Copy the provided PostgreSQL HA configuration
sudo cp /path/to/pgtwin/github/postgresql.custom.conf /var/lib/pgsql/data/postgresql.custom.conf
sudo chown postgres:postgres /var/lib/pgsql/data/postgresql.custom.conf
# Include custom config in main postgresql.conf
sudo -u postgres bash -c "echo \"include = 'postgresql.custom.conf'\" >> /var/lib/pgsql/data/postgresql.conf"
# Configure pg_hba.conf for replication
sudo -u postgres tee -a /var/lib/pgsql/data/pg_hba.conf <<EOF
# Replication connections
host replication replicator 192.168.60.0/24 scram-sha-256
host postgres replicator 192.168.60.0/24 scram-sha-256
EOF
# Start PostgreSQL manually (temporary)
sudo -u postgres pg_ctl -D /var/lib/pgsql/data start
# Create replication user
sudo -u postgres psql <<EOF
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'SecurePassword123';
GRANT pg_read_all_data TO replicator;
GRANT EXECUTE ON FUNCTION pg_ls_dir(text, boolean, boolean) TO replicator;
GRANT EXECUTE ON FUNCTION pg_stat_file(text, boolean) TO replicator;
GRANT EXECUTE ON FUNCTION pg_read_binary_file(text) TO replicator;
GRANT EXECUTE ON FUNCTION pg_read_binary_file(text, bigint, bigint, boolean) TO replicator;
EOF
# Stop PostgreSQL (cluster will manage it)
sudo -u postgres pg_ctl -D /var/lib/pgsql/data stop
Also add the connection definitions for your applications to the pg_hba.conf file.
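For example, an entry like the following would allow application connections from the ring0 network; adjust database, user, and network to your environment:
# example application entry for pg_hba.conf -- adjust database, user and network as needed
host    all    all    192.168.60.0/24    scram-sha-256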
The only remaining piece of the PostgreSQL configuration is the password file. It needs to be added on both nodes:
# cat /var/lib/pgsql/.pgpass
# Replication database entries (for streaming replication)
pgtwin1:5432:replication:replicator:SecurePassword123
pgtwin2:5432:replication:replicator:SecurePassword123
192.168.60.13:5432:replication:replicator:SecurePassword123
192.168.60.83:5432:replication:replicator:SecurePassword123
# Postgres database entries (required for pg_rewind and admin operations)
pgtwin1:5432:postgres:replicator:SecurePassword123
pgtwin2:5432:postgres:replicator:SecurePassword123
192.168.60.13:5432:postgres:replicator:SecurePassword123
192.168.60.83:5432:postgres:replicator:SecurePassword123
Also set the correct permissions for this file, otherwise PostgreSQL will refuse to use it:
chmod 600 /var/lib/pgsql/.pgpass
chown postgres:postgres /var/lib/pgsql/.pgpass
After adding .pgpass to the second node, the only thing left to prepare there is an empty data directory:
mkdir -p /var/lib/pgsql/data
chown postgres:postgres /var/lib/pgsql/data
chmod 700 /var/lib/pgsql/data
Configure Pacemaker
The final step before starting the HA PostgreSQL for the first time is to configure Pacemaker. For first-time users of Pacemaker, this is a daunting configuration, and it requires a lot of consideration. For now, retrieve the already prepared file https://github.com/azouhr/pgtwin/blob/main/pgsql-resource-config.crm and adapt it to your environment.
The values that you have to edit are:
- The VIP address (a virtual IP that is migrated between the cluster nodes and serves as the access address for all applications)
- The ping gateway address, which allows the cluster to prefer a node with access to the network
- The node names in several resources; the defaults psql1 and psql2 become pgtwin1 and pgtwin2 respectively
After editing the file, load it into the cluster with the ‘crm’ command. The configuration can be done on any node, and will be available immediately from any node:
crm configure < pgsql-resource-config.crm
That’s it. The cluster will now try to bring up the PostgreSQL database on both nodes in a HA configuration. You can monitor the process with the command ‘crm_mon’. Note that in the beginning, the secondary node will show failed resources. This is because pgtwin has to perform an initial basebackup on that node. After a while, the output should look similar to this:
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: pgtwin1 (version 3.0.1+20250807.16e74fc4da-1.2-3.0.1+20250807.16e74fc4da) - partition WITHOUT quorum
* Last updated: Tue Dec 30 12:55:12 2025 on pgtwin1
* Last change: Tue Dec 30 12:55:07 2025 by hacluster via hacluster on pgtwin2
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ pgtwin1 pgtwin2 ]
Active Resources:
* postgres-vip (ocf:heartbeat:IPaddr2): Started pgtwin1
* Clone Set: postgres-clone [postgres-db] (promotable):
* Promoted: [ pgtwin1 ]
* Unpromoted: [ pgtwin2 ]
* Clone Set: ping-clone [ping-gateway]:
* Started: [ pgtwin1 pgtwin2 ]
After the cluster has stabilized, you can perform a number of tests to check the state:
On pgtwin1 (primary):
# Check replication status
sudo -u postgres psql -x -c "SELECT * FROM pg_stat_replication;"
# Expected: One row showing pgtwin2 connected
On pgtwin2 (standby):
# Check if in recovery mode
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"
# Expected: t (true)
Congratulations, you have a HA PostgreSQL database running. To access the database on the primary, just use the command:
sudo -u postgres psql
Since this uses direct socket access, you will have full access to the database without a password that way. For further tests and more information, have a look at https://github.com/azouhr/pgtwin/blob/main/QUICKSTART_DUAL_RING_HA.md.
pgtwin — HA PostgreSQL: VM Preparation
In my last post, Kubernetes on Linux on Z, I explained why I need a highly available PostgreSQL database to operate K3s. Of course, a HA PostgreSQL that works with just two datacenters has many more use cases. Let me explain how to perform an initial setup like the one that I use for development.

Preparation of two VMs
The openSUSE Project releases readily prepared Tumbleweed images almost every day. Have a look at https://download.opensuse.org/tumbleweed/appliances/; I typically get an image from there with a name like ‘openSUSE-Tumbleweed-Minimal-VM.x86_64-1.0.0-kvm-and-xen-Snapshot20251222.qcow2’. The current image will have a different name, but let’s go with this one for now.
My typical KVM VMs use:
- 2 CPUs
- 2 GB memory
- Raw disk image format
- Two libvirt networks (ring0 and ring1)
- Both, graphical (VNC) and serial console support
First, convert the image to a raw image. The reason I like to use raw is that it is much easier to loop-mount such an image in the local operating system, and also to grow the image with standard commands like kpartx, losetup and dd. You can go with qcow2 if you prefer that format.
qemu-img convert openSUSE-Tumbleweed-Minimal-VM.x86_64-1.0.0-kvm-and-xen-Snapshot20251222.qcow2 pgtwin01.raw
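As an illustration of the loop-mount advantage mentioned above, growing the raw image and inspecting a partition could look like the following sketch; sizes and partition numbers are examples, and growing the partition and filesystem inside the image would need additional steps:
# illustrative only -- sizes and partition numbers depend on your image
dd if=/dev/zero of=pgtwin01.raw bs=1M seek=20480 count=0   # sparsely extend the image file to 20 GiB
losetup -f --show pgtwin01.raw                             # attach as a loop device, prints e.g. /dev/loop0
kpartx -av /dev/loop0                                      # map the partitions to /dev/mapper/loop0pN
mount /dev/mapper/loop0p1 /mnt                             # pick the partition that holds the root filesystem
# ... inspect or modify files ...
umount /mnt
kpartx -d /dev/loop0
losetup -d /dev/loop0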
We will need two images of that kind:
cp -a pgtwin01.raw pgtwin02.raw
Since I like to use two network rings for the HA setup (I will go into details about why this is a good thing in a concepts blog soon), let’s create two libvirt networks. Attaching real Linux bridges would also be possible. Create two files, ring0.xml and ring1.xml:
# cat ring0.xml
<network>
<name>ring0</name>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr10' stp='on' delay='0'/>
<ip address='192.168.60.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.60.2' end='192.168.60.254'/>
</dhcp>
</ip>
</network>
# cat ring1.xml
<network>
<name>ring1</name>
<forward mode='nat'>
<nat>
<port start='1024' end='65535'/>
</nat>
</forward>
<bridge name='virbr11' stp='on' delay='0'/>
<ip address='192.168.61.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.61.2' end='192.168.61.254'/>
</dhcp>
</ip>
</network>
After that, define the libvirt networks, and enable autostart:
virsh net-define ring0.xml
virsh net-define ring1.xml
virsh net-autostart ring0
virsh net-autostart ring1
Now, let’s set up the two VMs. The following command will bring up a VM and ask a number of initial questions. This is just the basic setup of a VM, nothing really special there:
virt-install \
--name pgtwin01 \
--memory 2048 \
--vcpus 2 \
--disk path=/home/claude/images/pgtwin01.raw,format=raw \
--import \
--network network=ring0 \
--network network=ring1 \
--os-variant opensusetumbleweed \
--graphics vnc,listen=0.0.0.0 \
--console pty,target_type=serial
and the same with pgtwin02:
virt-install \
--name pgtwin02 \
--memory 2048 \
--vcpus 2 \
--disk path=/home/claude/images/pgtwin02.raw,format=raw \
--import \
--network network=ring0 \
--network network=ring1 \
--os-variant opensusetumbleweed \
--graphics vnc,listen=0.0.0.0 \
--console pty,target_type=serial
In case you want to connect to Linux bridges, use “bridge=” instead of “network=”. I typically configure ssh access to the two VMs; this has normally already been done during the virt-install process. The minimal image from openSUSE configures both network devices with dhcp by default. This is an issue, because the VM will have two default gateways defined. Let me explain how to fix this:
# nmcli c s
NAME UUID TYPE DEVICE
Wired connection 1 29df9468-975d-3944-91ca-355ed0c82a3c ethernet enp1s0
Wired connection 2 1f45b334-b429-3823-80eb-a3aafeb33195 ethernet enp2s0
lo 611124a1-fa8e-48d6-84ba-f75733093ca6 loopback lo
There are two external interfaces configured here. If you check the routing, you will find two default gateway definitions:
ip r s
In this setup, only ring0 is used to connect to the world, and thus the default gateway of ring1 (connected over enp2s0) can be deleted:
nmcli connection modify 1f45b334-b429-3823-80eb-a3aafeb33195 \
ipv4.gateway "" \
ipv4.never-default yes
Adapt the UUID and settings to your setup.
For the Pacemaker and PostgreSQL configuration later on, also set up your hostnames and the name resolution for the other nodes. The procedure to set the hostname seems to have changed recently; it now uses hostnamectl instead of just writing the name to /etc/HOSTNAME.
On pgtwin01:
hostnamectl set-hostname pgtwin01
On pgtwin02:
hostnamectl set-hostname pgtwin02
Name resolution is handled either by your standard DNS system or with /etc/hosts. Find the used IP addresses with ‘ip a s’:
echo "192.168.60.13 pgtwin01" >> /etc/hosts
echo "192.168.60.83 pgtwin02" >> /etc/hosts
Configure the firewall to allow communication between the two VMs:
# Corosync communication
firewall-cmd --permanent --add-port=5405/udp # Corosync multicast
firewall-cmd --permanent --add-port=5404/udp # Corosync multicast (alternative)
# Pacemaker communication
firewall-cmd --permanent --add-port=2224/tcp # pcsd
firewall-cmd --permanent --add-port=3121/tcp # Pacemaker
# PostgreSQL
firewall-cmd --permanent --add-port=5432/tcp
# Reload firewall
firewall-cmd --reload
The last step for preparing the VMs is installing the cluster software as well as the PostgreSQL database software.
zypper install -y \
pacemaker \
corosync \
crmsh \
sudo \
resource-agents \
fence-agents \
postgresql18 \
postgresql18-server \
postgresql18-contrib
After that, you have two VMs readily installed with two network connections. The next steps will be the setup of Corosync, the initial configuration of the PostgreSQL Database, and finally the cluster resource definitions.
Kubernetes on Linux on Z
This year, I had the task of setting up a Kubernetes environment in a Linux partition on a s390x system. At first sight, this sounds easy; there are offerings out there that you can purchase. A second look, however, can make you wonder. There is a structural mismatch between typical Linux on Z environments and Kubernetes:
While Linux on Z typically uses two datacenters as two high availability zones, Kubernetes requires you to have at least three.
This is a fundamental issue that cannot be overcome by just talking about what you did; you really have to dig into the problem and find a solution. I might not know everything, but there is a solution that the Rancher people developed, and it is called kine. This is an etcd shim that allows replacing the etcd database, which actually requires the three sites for its quorum mechanism, with an external SQL database.
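With kine built into k3s, pointing the control plane at an external PostgreSQL boils down to the datastore endpoint; the host, credentials and database name below are placeholders:
# start the k3s server against an external PostgreSQL -- host, credentials and database are placeholders
k3s server \
    --datastore-endpoint="postgres://kineuser:SecurePassword123@192.168.60.100:5432/kine"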
I am a little adventurous, and thus I told people we could do that. The plan looked like this:

As you can see, Kubernetes talks to PostgreSQL over kine, and the HA functionality would be provided by PostgreSQL. This first thought was kind of naive and needed a number of fixes:
- PostgreSQL can do streaming replication, however the standard version cannot run as a multi-master.
- The only open source cluster solution that works with two nodes and that I am aware of is Corosync with Pacemaker. However, while the OCF agents there are able to fail over, a DBA has to restore high availability afterwards. Patroni, as today’s standard solution for PostgreSQL in cloud environments, does not solve the two-datacenter constraint for me.
- The Kubernetes of choice was k3s, however the Rancher people stopped releasing it for s390x.
- All the needed containers are open source, but many do not release for the s390x architecture, and some even prevent building for it in their build scripts.
Together with a colleague, I started to work on this project. Fortunately, we had already worked on zCX (Container Extensions on z/OS) and had provided many container images that were missing previously. To make development easier, we utilized OBS and worked on the images in a project that I created for that purpose; it can be found at “home:azouhr:d3v” for those interested. Thus, the fourth issue was just work, but not really a challenge.
The main challenges have been a Highly Available PostgreSQL that works on two nodes, as well as building k3s in a reasonable way for s390x. Let me get into some more details of using Corosync and Pacemaker with two node clusters, and what is needed to make that work with PostgreSQL.
Corosync and Pacemaker are the solution used in the HA product of SUSE Linux Enterprise Server, and it actually supports two nodes if you have an SBD (Storage-Based Death) device at hand. This is typically not an issue for mainframe environments, because those machines normally do not have local disks anyway and always operate with a SAN.
Pacemaker uses OCF agents that operate certain programs; in my case, this would be PostgreSQL. I wrote such agents long ago, however it is kind of a daunting prospect to write an agent from scratch, especially when you have to learn the tasks of a PostgreSQL DBA along the way. After pushing the task off for some time, my colleague suggested trying an AI to get started, and what can I say, I was positively surprised by the result. I chose to give it a try. Since I did not want the AI to have too many rights on my home laptop, the setup I am using looks like this:

I know that many developers don’t like what they get from an AI, and I have to admit, I did not even try to run the first three or four versions that the AI produced. However, after a while the solution stabilized, and I could concentrate on smaller aspects of the OCF agent that “we” created. A recent state of what we produced can be found at https://github.com/azouhr/pgtwin. Note that I did not publish all the different design documents by far; that would be more than 250 documents, talking about different aspects of how the OCF agent should operate.
Some experiences with the AI:
- AIs like to proceed, even if a thought is not ready. Working on the design is important; just don’t let an AI produce code when you are not yet confident that you are at the same level of understanding.
- AIs can easily skim through massive amounts of log files, and also find and fix issues on their own. I personally like to challenge solutions to issues when I feel that the solution is not perfect. This may lead to several iterations of newly proposed solutions.
- AIs sometimes solve issue A and break B, only to solve B and break A. They are happy to go on like this forever. Whenever you find a problem reoccurring, you have to get deeper into the issue. Let the AI explain what happens, create assumptions and let it explore different paths.
- AIs sometimes stumble into the same issues that have been discussed earlier. I found that starting the discussion over again is tedious. Instead, ask the AI why it cannot use the solution from the previous location.
- AIs sometimes try to figure things out without having enough data. Instead of adding debug information or tracing like any programmer would do, they just start experimenting. It often helps a great deal just to tell them to switch on tracing, or to use tools like strace to get more information.
- Finally, you always have to manually review the result. My personal procedure is to add comments into the code that can easily be found with grep, and later tell the AI to fix the comments.
- AIs have a date weakness. They like to confuse years and other numbers in dates. That’s why the release dates of pgtwin look confusing.
- A little warning about documentation and promotion: obviously AIs have been trained a lot on marketing material. They typically claim something is enterprise ready as soon as it ran once. After it is declared “enterprise ready”, I typically find quite a number of issues just by looking at the code.
- Still, it is impressive how easy the code is to read, and how well it is documented. For someone who has read quite a lot of code over the years, it is really nice to look at. Also, this amount of code would not have been possible for a normal developer in such a short timeframe.
In my next post, I will go over the design of https://github.com/azouhr/pgtwin and explore the main features and concepts that I have been working on for the last few weeks. I hope to create another bugfix release soon; however, the agent already works quite well as it is.