Simplify GPU Application Development with HMM on Leap
NVIDIA recently introduced Heterogeneous Memory Management (HMM)
in its open source kernel drivers, which simplifies GPU application
development with CUDA.
HMM unifies system memory access across CPUs and GPUs and removes the
need to copy memory content between CPU and GPU memory.
It extends Unified Memory to cover both system-allocated memory and
memory allocated by cudaMallocManaged().
You may ask, "How do I make this work on my Leap system?" If you are a Leap 15.5 user, the open driver is already available to you. So if you have an NVIDIA chipset with a GPU System Processor (GSP), i.e. Turing or later, we have you covered. Here is how:
Installation on openSUSE Leap 15.5
The simplest way to accomplish this is to log in as root and run the following commands in your shell:
zypper ar https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/cuda-opensuse15.repo
zypper --gpg-auto-import-keys refresh
zypper -n install -y --auto-agree-with-licenses --no-recommends nvidia-open-gfxG05-kmp-default cuda
This will add the NVIDIA CUDA repository and install CUDA with the kernel
modules required.
Do you require signed drivers to support secure boot or deploy in a public
cloud environment? In this case, instead of the above, execute:
zypper ar https://developer.download.nvidia.com/compute/cuda/repos/opensuse15/x86_64/cuda-opensuse15.repo
zypper ar https://download.nvidia.com/opensuse/leap/15.5/ NVIDIA-drivers
zypper --gpg-auto-import-keys refresh
zypper -n in -y --auto-agree-with-licenses --no-recommends nvidia-open-driver-G06-signed-kmp-default nvidia-drivers-minimal-G06 cuda
This makes use of the NVIDIA open driver package shipped and signed by
SUSE, like the rest of your kernel. This eliminates the need to enroll
a MOK as well as an extra build stage when the kernel drivers are installed
or updated. It also helps to reduce the size of the cloud image by
removing the need for extra build tools.
To use these kernel drivers, it installs a set of user space driver
packages which are not yet available in the CUDA software repository.
Preparations
For chipsets with a display engine (i.e. which have display outputs), the open driver
support is still considered alpha. Therefore, you may have to add or uncomment
the following option in /etc/modprobe.d/50-nvidia-default.conf:
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
Once these steps have been performed, you may either reboot the system or run
modprobe nvidia
as root to load all required kernel modules.
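Before rebooting, it can be handy to confirm that the option is really active and not still commented out. The following is a minimal sketch: the function name has_unsupported_gpu_option is made up for illustration and is not part of any NVIDIA or openSUSE package. It only reads the configuration file named above and does not load any module.

```shell
# Hypothetical helper (not shipped by any package): checks whether a modprobe
# configuration file contains an active, uncommented
# NVreg_OpenRmEnableUnsupportedGpus=1 option for the nvidia module.
# Exit status: 0 = option active, 1 = not active, 2 = file not readable.
has_unsupported_gpu_option() {
    conf_file="${1:-/etc/modprobe.d/50-nvidia-default.conf}"
    [ -r "$conf_file" ] || return 2
    # The line must start with "options nvidia" (leading blanks allowed);
    # a leading "#" makes it a comment and therefore does not match.
    grep -Eq '^[[:space:]]*options[[:space:]]+nvidia[[:space:]].*NVreg_OpenRmEnableUnsupportedGpus=1' "$conf_file"
}
```

A possible use is: has_unsupported_gpu_option && echo "option is active". Without an argument, the function falls back to the path mentioned above.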
Testing the Installation
To check if HMM is available and enabled, query the 'Addressing Mode' property:
nvidia-smi -q | grep Addressing
Addressing Mode : HMM
If you see the above output, HMM is available on your system.
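For scripted checks (for example in an image build), the same query can be wrapped in two small functions. This is only a sketch: addressing_mode and hmm_enabled are hypothetical names, and the parsing assumes the "Addressing Mode : HMM" line format shown above.

```shell
# Hypothetical helpers: both read the output of `nvidia-smi -q` on stdin.

# Print the value of the first "Addressing Mode" line, trimmed of blanks.
addressing_mode() {
    awk -F':' '/Addressing Mode/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # strip surrounding whitespace
        print $2
        exit
    }'
}

# Succeed (exit status 0) only if the reported addressing mode is HMM.
hmm_enabled() {
    [ "$(addressing_mode)" = "HMM" ]
}
```

On a system with the driver installed this could be used as: nvidia-smi -q | hmm_enabled && echo "HMM is enabled".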
Compile HMM Sample Code
NVIDIA discusses some code examples for HMM in its blog post.
The examples can be found here on GitHub. If you would like to try
them out, here are some hints on building and running them.
Some of these need a newer gcc than the stock version shipped with Leap 15,
which you can install with:
zypper in gcc12-c++
In order to compile the examples, the PATH environment variable needs to be extended to
point to the CUDA binaries:
export PATH=/usr/local/cuda/bin/:${PATH}
You may now compile the examples under the path src using the following commands:
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 atomic_flag.cpp -o atomic_flag
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 file_after.cpp -o file_after
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 file_before.cpp -o file_before
nvcc -std=c++20 -ccbin=/usr/bin/g++-12 ticket_lock.cpp -o ticket_lock
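The four commands differ only in the file name, so they can also be run in a loop. The sketch below assumes it is started from the src directory; build_hmm_examples and the CCBIN variable are names invented here, and passing echo as the first argument only prints the commands instead of running them.

```shell
# Hypothetical helper: builds the four HMM examples with the same flags as above.
# Optional first argument: a command prefix, e.g. "echo" for a dry run.
# CCBIN (assumed variable name) overrides the host compiler, default g++-12.
build_hmm_examples() {
    runner="${1:-}"
    ccbin="${CCBIN:-/usr/bin/g++-12}"
    for name in atomic_flag file_after file_before ticket_lock; do
        # $runner is intentionally unquoted: when empty, nvcc runs directly.
        $runner nvcc -std=c++20 -ccbin="$ccbin" "$name.cpp" -o "$name" || return 1
    done
}
```

Running build_hmm_examples echo first is a cheap way to check the command lines before invoking nvcc for real.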
'weather_app' Example
For this example application, the system gcc compiler is sufficient. Only $PATH has to be
extended to include the CUDA binaries:
export PATH=/usr/local/cuda/bin/:${PATH}
Now, build the binary weather_app by running
make
The blog by NVIDIA describes how to obtain the data required to run the app. If you're unable to download the ~1.3 TB of data, you may also use the random data generator from this PR on GitHub. The random data app can be compiled with
g++ create_random_data.cpp -o create_random_data -O2 -Wall
The application has no command line parameters, so the start and end years for the random data have to be set in the source code itself.
NOTE
If your graphics card doesn't have sufficient VRAM to run the original sample code, you may
scale down the data size by reducing the input_grid_height and input_grid_width parameters
in both create_random_data.cpp and weather_app.cu.
To do a sample run:
mkdir binary_1hr_all
./weather_app
./weather_app 1981 1982 binary_1hr_all/
NOTE
The Makefile doesn't compile CUDA kernels for Turing GPUs and also has faulty error message handling. You might want to check out https://github.com/NVIDIA/HMM_sample_code/pull/2, which fixes these issues.
Summary
- The NVIDIA open driver provides HMM (Heterogeneous Memory Management), which extends the simplicity of the CUDA Unified Memory programming model even further on supported chipsets (Turing and later) by including system allocated memory.
- HMM is available for openSUSE Leap 15.5.
- The open driver allows for pre-built kernel drivers signed by SUSE.
- This greatly simplifies the installation in a secure boot environment.
- It streamlines the installation in public cloud environments by eliminating an extra build stage and reducing the size of the final image.
- We have demonstrated how to install and test HMM on Leap 15.5.
One year of Tumbleweed
More than a year has passed since I switched to the openSUSE Tumbleweed Linux distribution, both on my work computer (for obvious reasons) and on my personal computer, and I can say that I'm really happy with the change.
Tumbleweed is a rolling release distribution, and in this kind of distribution there are a lot of changes every week. If you want the latest software, this kind of distribution is the way to go. But with a high update frequency you are exposed to some instability. It's impossible to have the latest changes without a broken program here and there, because not everyone is able to follow upstream changes without some weeks or months to update.
My distro history
I've always been a Linux user, since I got my first computer in 2003. In those days I was using a Debian-based distribution called Knoppix. Then I switched to Ubuntu when it appeared around 2004. But at that time I was a computer science student and I was exploring the whole free software and Linux ecosystem, so I was changing my distribution every time I found a new one.
Like a lot of distro-hoppers, at some point I landed on ArchLinux and there I discovered the rolling release concept. That was my home for some time, and it was nice to have the latest available software just after the release.
At some point I bought a new computer and it was too new to work correctly with the kernel distributed in ArchLinux, so I tried different distributions. At that moment Fedora was the distro that worked without too many complications on that computer, so I picked that one.
In 2019 I started to work at Endless and at that time I had to try EndlessOS, so I played a bit with dual boot, having Fedora and EndlessOS at the same time. That was the first time I came into contact with immutable distributions, something that's getting more popular every day. But these distributions rely a lot on containers (flatpak, podman) and, even if that is something that could work, as a software engineer I don't feel comfortable needing a container with another distribution to do something that could be done in my own system.
In 2022 I started to work at SUSE and tried the openSUSE distribution for the first time, and I have been using it ever since.
The Tumbleweed
Today I have three different computers running Tumbleweed: one for work, a Thinkpad T14s; one for personal usage, a Dell Inspiron 5490; and another one as a personal media server, a Libre Computer La Frite.
The best thing about having Tumbleweed for me is that I get the latest GNOME as soon as it's released. Another big thing for this distribution is how easy it is to fix something upstream thanks to the Open Build Service, but I work with that every day, so I'm biased. For sure, any other community distribution has different ways to contribute, but I find this one easy enough.
Even being a rolling release distro, Tumbleweed doesn't break a lot. I can't say that it's stable, because the API of everything changes every day, but the distribution is tested for every release and at least some level of package compatibility check is done. That makes Tumbleweed a good distribution and I can update without fearing some weird package breakage.
I usually update my work and personal laptops once a week, and the La Frite not so often, maybe every 6 months.
With the default installation, Tumbleweed uses btrfs with snapshots, and it's really easy to go back and forward using the snapper tool. So if the distribution is broken for some reason, it's easy to go back to a good state and wait for a fix.
The problems that I found during this year
- Some problems with the NVIDIA graphics card in my Dell laptop: sometimes the kernel and the driver were not working correctly together. I had to use snapper to get NVIDIA working again, but it was fixed a few days later.
- Currently I'm having some random crashes because of a bug involving amdgpu, Wayland and mutter, but it's not annoying enough for me to go back, so I didn't use snapper this time and I'm living with these random crashes while waiting for the fix.
Long live the Tumbleweed
So far so good. Tumbleweed is a nice distribution that I'm enjoying. It doesn't get in the way and I can find almost anything that I need for work, programming, gaming, media, etc. I'm really happy with this distribution and it's the perfect distribution for people like me who want to have the latest things.
I know that there are other openSUSE flavors that are interesting, like the immutable ones, Leap or the latest one Slowroll, but Tumbleweed is the one for me.
Tumbleweed's Graphic Updates Shine
This week’s openSUSE Tumbleweed snapshots bring clarity for graphics thanks to updates of multiple graphics and imaging packages.
Package updates for Mesa, GTK, ImageMagick, webkit2gtk3, GraphicsMagick and others arrived in openSUSE’s rolling release.
Mesa has an update in the latest snapshot to be released, 20231003. The 23.2.0 version of Mesa and Mesa-drivers has fixes for handling bindless images, limits flag use to some color surfaces and enables the VK_EXT_mesh_shader extension for the Vulkan graphics Application Programming Interface where supported. The update of gtk4 4.12.3 has widget enhancements and fixes a widget crash in the GIMP Toolkit library as well as memory leaks in the Broadway renderer. The 2.42.1 version of webkit2gtk3 addresses issues such as enabling the HTML5 database setting to properly control the IndexedDB API and switches a package to allow for flexibility in choosing different International Components for Unicode development packages. An update of GraphicsMagick 1.3.42 arrived in the snapshot and brought fixes for TIFF and for reading various BMP sub-formats. The Swiss Army knife of image processing also has new features, including the ability to read and write BMP using JPEG compression and support for reading BMP files with PNG compression. An update of systemd 254.5 made some adjustments in the package spec file for better compatibility with Leap and SUSE Linux Enterprise. Several other packages were updated in the snapshot.
The versatile image manipulation software package ImageMagick updates to version 7.1.1.18 in snapshot 20231001. This beta release addresses multiple static analyzer issues, eliminates compiler warnings and has some cosmetic changes. An update of php8 and apache2-mod_php8 8.2.11 addresses some RISC-V compatibility issues and a couple of memory leaks. An update of suse-module-tools 16.0.36 addresses a critical security vulnerability identified as CVE-2023-1829. This exploit could lead to a privilege escalation. The nvme-cli 2.6 update, a command line device management tool for the storage and transfer protocol, enhances performance after allocating a payload buffer within the create-ns command; the package also blacklists specific modules to address security and compatibility concerns and has various improvements for plugins and utilities. This package, along with the update of libnvme, has the most changes in the snapshot. The libnvme 1.6 update enhances Python compatibility, introduces some functions to parse and retrieve various features and has various improvements related to fabric handling, subsystem matching algorithms, and context checks. Several other packages were updated, including a 1.2.0 beta version of xdg-utils, which enhances support for the LXQt Desktop Environment, provides better handling of spaces in .desktop file paths and fixes some shell scripting.
The 20230929 snapshot updates the Mozilla Firefox browser to new major version 118.0.1. The update addresses 10 Common Vulnerabilities and Exposures, including a heap buffer overflow, memory leak, memory corruption and double-free problems. The browser also temporarily deactivates KDE integration while adding a patch. Another major update in the snapshot was argyllcms 3.0. This color management package has an extensive rewrite of icclib to ensure future-proofing, has new measuring features and fixes instrument-related bugs. The update of gstreamer 1.22.6 and several of its plugins fixes latency regressions for H.264, improves compatibility with various RTMP (Real-Time Messaging Protocol) and RTSP (Real-Time Streaming Protocol) servers and offers enhancements in signal printing for better clarity. The update of mpg123 1.32.2 addresses regressions from the 1.31 series, improves the build logic and handles large files better. Several other packages were updated in the snapshot, including openssl-3 3.1.3, yast2-python-bindings major version 5.0.1 and more.
GNOME 45 Wallpapers
With the 45 release out the door, it would be a shame not to reveal some of the behind the scenes for the new wallpapers.
I'll start off by mentioning a lovely new addition by David Lapshin, Amber, leaning into Inkscape's mesh gradients.
The default has shown no dramatic departure from the triangles/hexagons of the previous releases. The on-brand triangle theme has been kept, but the implementation is very different. The wallpaper is a result of a generated mesh using Blender's geometry nodes and color gradients derived from a pre-existing wallpaper texture. You can check the whole thing out in the provided blender project files in wallpaper-assets.
There have been updates to the existing set. The surprisingly popular Blobs have been tweaked; my personal favorite, Fold, crops a bit better at the most common 16:9 aspect; and Truchet is round yet again. Pixels have been updated to feature Circle apps.
The weirdly named Morphogenesis is one of my favorite additions this release. Based around the concept of reaction-diffusion, described by none other than Alan Turing, it features a little easter egg if you're into chasing those down.
Compressing HTTP traffic in syslog-ng
Network traffic is expensive in the cloud, and even a single syslog-ng instance can easily saturate the full bandwidth of a network connection. Compression of HTTP traffic was introduced in syslog-ng version 4.4.0. Depending on your use case, it lets you cut your networking expenses or send more logs using the same budget or bandwidth.
Development of this feature was done using a locally installed OpenResty web server, and later tested using Sumologic. However, according to the docs it should also work with Splunk, Elasticsearch, and many other services accessible using the http() destination.
https://www.syslog-ng.com/community/b/blog/posts/compressing-http-traffic-in-syslog-ng
Fed Up With Spam and Misconduct? OBS Acts Accordingly
Skull Buster
I've been wanting to make a game for as long as I can remember. Granted, I don't remember much, but it's been on my mind since I was about 10. Somehow it never happened. Regardless of how low a bar I set, I never got to it. Until today!
Pixel Art Obsession
During the lockdown years a good friend of mine became a full-time game dev and I've been getting that sweet sweet smell of pixels in a very high concentration. You probably noticed my fascination for pixel art in recent years.

A couple of weeks ago I was introduced to PICO8. While it's confusingly identified as a fantasy console, what it really is, is a full-blown platform. While writing Lua on a 128x128 screen is not everyone's thing, I have largely enjoyed the super constrained world of PICO8: its 16-color palette, 8x8px sprites, and a super quick iteration workflow with integrated graphics, sound effects and music editors. To put it in perspective, yes, GNOME icons are exactly the same size as the whole PICO8 environment. But its highly integrated editor allows for a very quick iterative approach and I love that.
HTML5 Release
You can give the game a shot here. Initially the game was pretty easy and approachable, but given how short it is, the final boss stage is now only for the devoted arcade lovers. :)
Let me know what you think of it and post your high scores at me on Mastodon/Fediverse!
I'm not planning to ever make another game, but I will definitely use the PICO8 platform to do some small interactive demos for projects like Weeklybeats. It's dope!
Open VFS Framework for the free Desktop
A few days ago Volker Krause posted this blog about the Nextcloud conference - a very interesting read.
One of the topics is the VFS (Virtual Filesystem) API for the Linux desktop. Indeed, that is a topic for us at ownCloud as well, and I'd like to share our perspective on it, discussing it in the scope of the free desktop.
The topic is very important, as “syncing” of data from and to cloud storage has changed over time. From having all files mirrored from client to server and vice versa, it has now shifted to keeping all files in the cloud and having them as so-called placeholders on the desktop. That means that most files on the client appear with size zero to save space, but the complete filesystem structure is available.
If a user starts to interact with such a dehydrated file, the content of the file is downloaded transparently by the cloud system client, for example ownCloud's desktop client. The same happens when an application accesses such a file. As a result, the placeholders look and behave like the normal filesystem we are used to.
On Windows and on MacOSX, the problem is kind of solved. Both have added APIs to their OS that can be used to implement the access of data on the cloud.
On Linux, we do not have this kind of API yet. That means that it is close to impossible to implement this user experience. Volker already said that desktop environment specific solutions probably do not scale, which I agree with.
At ownCloud we have looked into the implementation of a specific FUSE file system. That should certainly be possible, and is probably a part of the solution, but it is a considerable effort because of the asynchronous nature of the topic. Given that the market share of Linux desktop systems is pretty small, it is not attractive for companies to invest a lot into a Linux-only system. Here the power of community could make a difference again.
It would be best if we as the open source community came up with a shared solution as a free desktop standard, which might be modeled on one of the existing APIs, maybe the MacOSX File Provider API: a library and a little framework abstracting the VFS that the Linux desktop environments can work with.
While collaborating on that, all data clouds could implement the bindings to their storage. With that, the extra implementation effort for the Linux solution hopefully wouldn't be dramatic any more.
Let’s call this system openVFS as a working title. How can we evolve it? I’d like to invite all interested parties to discuss it in this temporary GitHub repo to collect ideas and opinions. There is also a little experimental code.