Fri, Mar 3rd, 2023

openSUSE Tumbleweed – Review of the week 2023/09

Dear Tumbleweed users and hackers,

The weather is unpredictable and changes here from almost spring-like back to winter within a few days. In this world, I am happy to have one constant: Tumbleweed delivering 7 snapshots in as many days (0223…0301). Snapshot 0226 was not announced to the mailing list, as it did not contain any change to packages that are part of the DVD.

The snapshots delivered these changes:

  • gimp 2.10.34
  • Node.JS 19.7.0
  • SQLite 3.41.0
  • KDE Plasma 5.27.1
  • NetworkManager 1.42.2
  • MariaDB 10.10.3
  • Linux kernel 6.2.0 & linux-glibc-devel 6.2
  • cURL 7.88.1
  • make 4.4.1
  • Mesa 23.0.0
  • AppArmor 3.1.3 (fixes for log format change in kernel 6.2)
  • zstd 1.5.4

The most relevant changes being prepared in staging projects are:

  • Linux kernel 6.2.1
  • libqt5: drop support for systems without SSE2 to fix boo#1208188 (i586 related)
  • Podman 4.4.2
  • KDE Gear 22.12.3
  • KDE Plasma 5.27.2
  • SELinux 3.5

Systemd Container and Podman in GitHub CI

As D-Installer consists of several components, such as the D-Bus backend, the CLI and the web frontend, we see a need to test in CI that each component can start and communicate properly with the others. For this we use a test framework and, more importantly, GitHub CI, where we need a systemd container, a setup that is not documented at all. In the following paragraphs we would like to share how we did it, so that each of you can be inspired by it or use it for your own project.

A Container Including Systemd

We created a testing container in our build service that includes what is needed for the backend and the frontend. After some iterations, we discovered that we depend on NetworkManager, which is tightly coupled with systemd. Additionally, we needed to access the journal for debugging purposes, which also does not work without systemd. For those reasons, we decided to include systemd.

If you are interested, you can find the container in YaST:Head:Containers/d-installer-testing although you should know that it has some restrictions (e.g., the first process must be systemd init).

GitHub CI

Asking Google, we found no relevant information about running a systemd container on GitHub CI. However, we read a piece of advice about using Podman for such containers due to its advanced support for systemd (which is enabled by default). But for using Podman, most of the answers suggested self-hosted runners, something we do not want to maintain.

Just running the systemd container on GitHub CI does not work because GitHub sets the entry point to tail -f (i.e., systemd init is not the first process). However, we noticed that the Ubuntu VM host in GitHub comes with Podman pre-installed. So the solution became obvious.

GitHub CI, Podman and a Systemd Container

The idea is to run Podman in the GitHub CI steps using our systemd container. We do not define the container keyword at all; we just run the container manually, and each step is encapsulated in a podman container exec.

A configuration example looks like this:

  integration-tests:
    runs-on: ubuntu-latest

    steps:

    - name: Git Checkout
      uses: actions/checkout@v3

    - name: start container
      run: podman run --privileged --detach --name dinstaller --ipc=host -v .:/checkout registry.opensuse.org/yast/head/containers/containers_tumbleweed/opensuse/dinstaller-testing:latest

    - name: show journal
      run:  podman exec dinstaller journalctl -b

This snippet checks out the repository, starts the container and prints the journal’s contents. The important part is to set the name of the container so you can use it in the exec calls. Moreover, it mounts the Git checkout into the container so it is accessible from there.

Of course, how to use the container to do the real testing is up to you, but at least you have a working systemd container and you can inspect the logs. You can see the full configuration in action in the pull request https://github.com/yast/d-installer/pull/425.
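
Building on the snippet above, running the actual tests is just one more step that wraps the command in a podman exec against the named container. The test command below is only a hypothetical placeholder; use whatever entry point your project provides:

    - name: run tests
      # Hypothetical example: replace "make test" with your project's real test command.
      run: podman exec dinstaller sh -c "cd /checkout && make test"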

Remaining Issues

For the integration testing of D-Installer we still face some issues:

  • Different kernel: the Ubuntu VM has a different kernel and we found out that the device mapper kernel module is missing.
  • Restricted privileges: Not all actions are possible, even in a privileged container. For instance, we cannot mount the host /var/log to the container’s /var/log, which would be pretty convenient for collecting log artifacts.
  • Cannot test the whole installation: We cannot “overwrite” the host VM, so it is not possible to perform full end-to-end testing of the whole installation process. Not to mention that it might take quite some time (and that for each pull request). But that’s fine: our openQA instance will take care of running those tests, although we are still discussing the best way to achieve that.

Mesa, Flatpak, Plasma Update in Tumbleweed

This week openSUSE Tumbleweed users learned of the performance optimizations gained with changes for x86-64-v3 and received a few snapshots.

Some of the packages to arrive this week included software for KDE users, gamers and people beginning their Linux journey.

Snapshot 20230301 delivered a new major version of a 3D graphics library. Mesa 23.0.0 was announced by Dylan Baker, who highlighted all the community’s improvements, fixes and changes for the release. A major Link Time Optimization leak was fixed, and several fixes for the Radeon Vulkan driver (RADV) and Zink became available with the release. AppStream 0.16.1 updated its documentation and fixed some behavior with the binding helper macros. Flatpak 1.14.3 introduced splitting an upgrade into two steps for the wrapper. It also includes the filename in the error message if an app has invalid syntax in its overrides or metadata. The Linux Apps Summit, which covers Flatpak, AppImage and Snap, will take place in Brno, Czech Republic, next month and is a great event to hear from developers working on cross-distro solutions in the application space. The second sudo update of the week arrived in the snapshot: version 1.9.13p2 fixed a problem with the --enable-static-sudoers option; the first update had arrived in the 20230225 snapshot. An update of apparmor 3.1.3 added support for more audit.log formats, fixed a parser bug and fixed boo#1065388, which took about five years to get resolved.

The 20230228 snapshot took care of a few Common Vulnerabilities and Exposures, which were addressed in the curl 7.88.1 update. Daniel Stenberg knocked out a video about the bug fixes in 7.88.1, but the video about the 7.88 release covers CVEs like CVE-2023-23916, which would cause a malloc bomb and make curl end up spending enormous amounts of allocated heap memory. CVE-2023-23914 and CVE-2023-23915 were also covered in his video. The kernel-source package was updated to 6.2.0, which refreshed and updated configurations like the disabling of a misdesigned mechanism that doesn’t fit well with the v2 cgroups kernel feature. The make utility, used to maintain groups of programs, updated to version 4.4.1, which had a backward-incompatibility warning related to the visibility of a flag inside a makefile. The make release provides new features like being able to override the built-in rules with a slightly different set of rules so that parallel builds can be used with archives, which previously was not possible. The text editor vim updated to 9.0.1357 and fixed several problems, including a crash when using an unset object variable and the cursor being in the wrong position with virtual text ending in a multi-byte character. The diagnostic, debugging and instructional userspace utility strace updated to version 6.2 and implemented collision resolution for overlapping ioctl commands from tty and subsystems.

Snapshot 20230227 fixed some crashes in the mlterm package and added a patch for CVE-2022-24130.

The 20230225 snapshot updated ImageMagick to 7.1.0.62. The image editor had some security updates, eliminated compiler warnings and had some changes related to Block Compression 5. The update of NetworkManager 1.42.2 added a new setting to control whether to remove the automatically generated local route rule. The network package also fixed a race condition when setting the MAC address of an Open vSwitch interface. Updated translations arrived in glib2 2.74.6 along with some bug fixes, and the mariadb 10.10.3 release fixed a crash recovery issue with InnoDB. The package also removed some InnoDB Buffer Pool load throttling and fixed a shutdown hang when the change buffer is corrupted. A major version of the device memory enabling project arrived in the snapshot; ndctl 76 has a new command to monitor CXL events. Other packages to update in the snapshot were sudo 1.9.13p1, yast2-security 4.5.6, zstd 1.5.4 and more.

There is no doubt that snapshot 20230224 was a Plasma snapshot. All the packages to update in the snapshot were KDE related. The Plasma 5.27.1 update had its fill of bug fixes, and a few were related to packages that would come later in the week. The Discover software center had some fixes related to Flatpak and AppStream. There were a large number of KWin changes, including a couple related to Wayland. A potential crash in the screen management package libkscreen was resolved with new setting configurations. The power management package powerdevil fixed a bug about the charging limit.

For the next two weeks, there won’t be a Tumbleweed blog providing updates on the week’s snapshots. Tumbleweed users are encouraged to subscribe to the Factory mailing list where the release manager posts an update about the rolling release and highlights a few packages that are forthcoming for the distribution.

Thu, Mar 2nd, 2023

openSUSE Tumbleweed gains optional x86-64-v3 optimization

Tumbleweed users who performed a distribution upgrade (zypper dup) in recent weeks on the rolling release with “recommended packages” enabled (the default) and matching hardware automatically received a new package named patterns-glibc-hwcaps-x86_64_v3. This is a new Tumbleweed feature: the pattern in turn automatically installs the “recommended” packages carrying the -x86-64-v3 name suffix, which provide the optimized versions of the respective libraries.

“The performance optimizations people will gain from this change is the result of much effort and discussion,” said Douglas DeMaio, a member of the openSUSE release team. “The x86-64 architecture thread on the mailing list really drove the discussion and the results will immediately provide performance improvements for those with x86-64-v3 hardware. It would be great if people write about these improvements so the results can be shared among users of our rolling release.”

This is the result of many days of effort, recently completed, to leverage the glibc HWCAPS feature introduced in glibc 2.33. This functionality allows the dynamic linker in Tumbleweed to load hardware-optimized versions of shared libraries seamlessly and transparently to the user, which in certain cases provides a measurable performance benefit. Tumbleweed users with hardware that is not compatible fall back to the still-available baseline version of the shared library and hence experience no drawback. This provides a good interoperability experience while allowing for some performance improvements for users on recent enough x86-64 hardware. It is most useful for packages that do not have custom dispatching to optimized routines. For containerized applications, this approach provides compatibility with a wide range of hardware while taking advantage, where possible, of recent CPU capabilities.

Only very few packages are enabled at this time, but more can come over time as individual benchmarking proves a benefit to creating an extra version. For an openSUSE contributor, the creation of these optimized versions is hidden behind a single spec macro that requires little other maintenance or packaging efforts.

If for some reason a Tumbleweed user is not interested in the functionality, they can deinstall the patterns-glibc-hwcaps-x86_64_v3 package and “lock” it so that it will not be selected again. No optimized versions will then be installed on the system in the future.
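
As a minimal sketch of those two steps with standard zypper commands (run as root or via sudo):

sudo zypper remove patterns-glibc-hwcaps-x86_64_v3
sudo zypper addlock patterns-glibc-hwcaps-x86_64_v3

The lock ensures the pattern is not pulled back in as a recommended package on the next distribution upgrade.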

Wed, Mar 1st, 2023

Open Source Policy Update Spotlights AI Considerations

A recent update of SUSE’s Open Source Policy is giving developers, communities and projects food for thought as Artificial Intelligence chatbots and protocols are gaining popularity and are being integrated into the fabric of global society.

The policy applies to all SUSE employees; the ambition, however, is that open-source communities and developers give it careful consideration and that it will inspire other companies to adopt or introduce an open-source policy of their own.

“Our ‘Contributing to Open Source Projects’ policy means that we identify collaboration and contribution opportunities with existing upstream projects for new open source projects as well,” according to text from the updated policy. “The legal constructs around AI pair programming with respect to licensing and potential violations are not resolved.”

Following the policy's licensing recommendations is a good default to avoid future conflicts. The policy is primarily about code, but there are some other points to consider:

SUSE uses Open Source Initiative approved licenses. Other cases are handled on an exceptional basis.

When the project is part of a larger open-source ecosystem, use an existing compatible license from within the ecosystem. This applies to both code and non-code licenses.

SUSE’s licensing recommendation for brand-new software projects is context specific; the default is Apache-2.0. For copyleft-oriented projects, GPL-2.0-or-later is recommended. SUSE recommends CC BY-SA 4.0 for documentation and artwork.

AI pair programming is not currently used by SUSE employees and will not be until an annual review decides to change this. New SUSE employees will be given training on the policy, and the policy is expected to be revised and refreshed on an annual basis.

To see how this topic is viewed by an AI chatbot, ChatGPT was asked what considerations developers and companies need to know about artificial intelligence chatbots and other protocols with regard to open source policies. The answers provided seemed to confirm that SUSE is taking a good approach with its Open Source Policy. The chatbot gave six points to consider: licensing compatibility, intellectual property rights, source code availability, attribution, liability and data privacy. Future changes were also listed in response to a related question about keeping policies fresh to remain compliant with new requirements.

Following the recommendations of the policy could help avoid conversations like the one pictured above with ChatGPT, which relates to GitHub and OpenAI's Copilot project.

Tue, Feb 28th, 2023

Syslog-ng 101, part 9: Filters

This is the ninth part of my syslog-ng tutorial. Last time, we learned about macros and templates. Today, we learn about syslog-ng filters. At the end of the session, we will see a more complex filter and a template function.

You can watch the video on YouTube:

and the complete playlist at https://www.youtube.com/playlist?list=PLoBNbOHNb0i5Pags2JY6-6wH2noLaSiTb

Or you can read the rest of the tutorial as a blog post at: https://www.syslog-ng.com/community/b/blog/posts/syslog-ng-101-part-9-filters


Nheko | Matrix Client written in Qt on openSUSE

Matrix is a secure, decentralised, real-time communication protocol that allows you to send messages and pictures free from the encumbrances of a centralized authority. You can look at Matrix as an alternative to using Telegram, WhatsApp, Discord, etc. Confusingly, Matrix is a protocol, not a client. There are many clients you can choose, the most … Continue reading Nheko | Matrix Client written in Qt on openSUSE

Sun, Feb 26th, 2023

Linux Saloon | 25 Feb 2023 | CentOS Stream 9

This Linux Saloon was very educational in the realm of all things relating to CentOS, Fedora and Red Hat Enterprise Linux and the way they all interact. Without question I learned a lot. At 33 minutes into the show, Neal, a developer and enthusiast of Fedora and CentOS, explains the relationship CentOS has with Red … Continue reading Linux Saloon | 25 Feb 2023 | CentOS Stream 9

Sat, Feb 25th, 2023

Reducing code size in librsvg by removing an unnecessary generic struct

Someone mentioned cargo-bloat the other day and it reminded me that I have been wanting to measure the code size for generic functions in librsvg, and see if there are improvements to be made.

Cargo-bloat can give you a rough estimate of the code size for each Rust crate in a compiled binary, and also a more detailed view of the amount of code generated for individual functions. It needs a [bin] target to work on; if you have just a [lib], it will not do anything. So, for librsvg's purposes, I ran cargo-bloat on the rsvg-convert binary.

$ cargo bloat --release --crates
    Finished release [optimized] target(s) in 0.23s
    Analyzing target/release/rsvg-bench

 File  .text     Size Crate
10.0%  38.7%   1.0MiB librsvg
 4.8%  18.8% 505.5KiB std
 2.5%   9.8% 262.8KiB clap
 1.8%   7.1% 191.3KiB regex
 ... lines omitted ...
25.8% 100.0%   2.6MiB .text section size, the file size is 10.2MiB

Note: numbers above are a result of guesswork. They are not 100% correct and never will be.

The output above is for cargo bloat --release --crates. The --release option is to generate an optimized binary, and --crates tells cargo-bloat to just print a summary of crate sizes. The numbers are not completely accurate since, for example, inlined functions may affect callers of a particular crate. Still, this is good enough to start getting an idea of the sizes of things.

In this case, the librsvg crate's code is about 1.0 MB.

Now, let's find what generic functions we may be able to condense. When cargo-bloat is run without --crates, it prints the size of individual functions. After some experimentation, I ended up with cargo bloat --release -n 0 --filter librsvg. The -n 0 option tells cargo-bloat to print all functions, not just the top N biggest ones, and --filter librsvg is to make it print functions only in that crate, not for example in std or regex.

$ cargo bloat --release -n 0 --filter librsvg

File .text    Size   Crate Name
0.0%  0.0%  1.2KiB librsvg librsvg::element::ElementInner<T>::new
0.0%  0.0%  1.2KiB librsvg librsvg::element::ElementInner<T>::new
0.0%  0.0%  1.2KiB librsvg librsvg::element::ElementInner<T>::new
... output omitted ...
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
0.0%  0.0%    825B librsvg librsvg::element::ElementInner<T>::set_style_attribute
... output omitted ...
0.0%  0.0%    358B librsvg librsvg::element::ElementInner<T>::get_cond
0.0%  0.0%    358B librsvg librsvg::element::ElementInner<T>::get_cond
0.0%  0.0%    358B librsvg librsvg::element::ElementInner<T>::get_cond
... etc ...

After looking a bit at the output, I found the "duplicated" functions I wanted to find. What is happening here is that ElementInner<T> is a type with generics, and rustc is generating one copy of each of its methods for every type instance. So, there is one copy of each method for ElementInner<Circle>, one for ElementInner<Rect>, and so on for all the SVG element types.
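
As a minimal illustration of the effect (not librsvg code), consider a generic struct with a method that never touches the type parameter; rustc still emits one copy of that method for every concrete T that gets used:

struct Wrapper<T> {
    name: String,
    inner: T,
}

impl<T> Wrapper<T> {
    // Only reads `name`, yet it is monomorphized separately for Wrapper<u32>,
    // Wrapper<String>, and so on, once each of those is instantiated.
    fn name(&self) -> &str {
        &self.name
    }
}

fn main() {
    let a = Wrapper { name: "a".to_string(), inner: 1u32 };
    let b = Wrapper { name: "b".to_string(), inner: "hi".to_string() };
    // Two instantiations of name() are generated (they may be inlined away in release builds).
    println!("{} {}", a.name(), b.name());
}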

The code around that is a bit convoluted; it's in a part of the library that hasn't gotten much cleanup after the C to Rust port and initial refactoring. Let's see what it is like.

The initial code

Librsvg parses the XML in an SVG document and builds something that resembles a DOM tree. The tree itself uses the rctree crate; it has reference-counted nodes and functions like first_child or next_sibling. Nodes can represent XML elements, or character content inside XML tags. Here we are interested in elements only.

Consider an element like this:

<path d="M0,0 L10,10 L0,10 Z" fill="black"/>

Let's look at how librsvg represents that. Inside each reference-counted node in an rctree, librsvg keeps a NodeData enum that can differentiate between elements and character content:

enum NodeData {
    Element(Element),
    Text(Chars),
}

Then, Element is an enum that can distinguish between all the elements in the svg namespace that librsvg supports:

enum Element {
    Circle(Box<ElementInner<Circle>>),
    Ellipse(Box<ElementInner<Ellipse>>),
    Path(Box<ElementInner<Path>>),
    // ... about 50 others omitted ...
}

Inside each of the enum's variants there is an ElementInner<T>, a struct with a generic type parameter. ElementInner holds the data for the DOM-like element:

struct ElementInner<T: ElementTrait> {
    element_name: QualName,
    attributes: Attributes,
    // ... other fields omitted
    element_impl: T,
}

For the <path> element above, this struct would contain the following:

  • element_name: a qualified name path with an svg namespace.
  • attributes: an array of (name, value) pairs, in this case (d, "M0,0 L10,10 L0,10 Z"), (fill, "black").
  • element_impl: A concrete type, Path in this case.

The specifics of the Path type are not terribly interesting here; it's just the internal representation for Bézier paths.

struct Path {
    path: Rc<SvgPath>,
}

Let's look at the details of the memory layout for all of this.

Initial memory layout

Here is how the enums and structs above are laid out in memory, in terms of allocations, without taking into account the rctree::Node that wraps a NodeData.

NodeData enum, and ElementInner<T> (description in text)

There is one allocated block for the NodeData enum, and that block holds the enum's discriminant and the embedded Element enum. In turn, the Element enum has its own discriminant and space for a Box (i.e. a pointer), since each of its variants just holds a single box.

That box points to an allocation for an ElementInner<T>, which itself contains a Path struct.

It is awkward that the fields to hold XML-isms like an element's name and its attributes are in ElementInner<T>, not in Element. But more importantly, ElementInner<T> has a little bunch of methods:

impl<T: ElementTrait> ElementInner<T> {
    fn new(...) -> ElementInner<T> {
        // lots of construction
    }

    fn element_name(&self) -> &QualName {
        ...
    }

    fn get_attributes(&self) -> &Attributes {
        ...
    }

    // A bunch of other methods
}

However, none but one of these methods actually use the element_impl: T field! That is, all of them do things that are common to all element types. The only method that really deals with the element_impl field is the ::draw() method, and the only thing it does is to delegate down to the concrete type's implementation of ::draw().
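
A simplified sketch of that delegation (the real draw() in librsvg takes several more arguments, which are omitted here; only the shape of the code matters):

trait ElementTrait {
    fn draw(&self);
}

struct ElementInner<T: ElementTrait> {
    // other fields omitted
    element_impl: T,
}

impl<T: ElementTrait> ElementInner<T> {
    // The only method that really needs the generic parameter: it just
    // forwards to the concrete type's implementation.
    fn draw(&self) {
        self.element_impl.draw();
    }
}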

Removing that generic type

So, let's shuffle things around. I did this:

  • Turn enum Element into a struct Element, with the fields common to all element types.

  • Have an Element.element_data field...

  • ... that is of type ElementData, an enum that actually knows about all supported element types.

There are no types with generics in here:

struct Element {
    element_name: QualName,
    attributes: Attributes,
    // ... other fields omitted
    element_data: ElementData,
}

enum ElementData {
    Circle(Box<Circle>),
    Ellipse(Box<Ellipse>),
    Path(Box<Path>),
    // ...
}

Now the memory layout looks like this:

NodeData enum with boxes, Element, and ElementData (description in text)

One extra allocation, but let's see if this changes the code size.

Code size

We want to know the size of the .text section in the ELF file.

# old
$ objdump --section-headers ./target/release/rsvg-bench
Idx Name          Size      VMA               LMA               File off  Algn
 15 .text         0029fa17  000000000008a060  000000000008a060  0008a060  2**4
(2750999 bytes)

# new
Idx Name          Size      VMA               LMA               File off  Algn
 15 .text         00271ff7  000000000008b060  000000000008b060  0008b060  2**4
(2564087 bytes)

The new code is 186912 bytes smaller. Not earth-shattering, but cargo-bloat no longer shows duplicated functions that have no reason to be monomorphized, since they don't touch the varying data.

old:

$ cargo bloat --release --crates
 File  .text     Size Crate
10.0%  38.7%   1.0MiB librsvg
# lines omitted
25.8% 100.0%   2.6MiB .text section size, the file size is 10.2MiB

new:

$ cargo bloat --release --crates
 File  .text     Size Crate
 9.2%  37.5% 939.5KiB librsvg
24.6% 100.0%   2.4MiB .text section size, the file size is 10.0MiB

Less code should help a bit with cache locality, but the functions involved are not in hot loops. Practically all of librsvg's time is spent in Cairo for rasterization, and Pixman for compositing.

Dynamic dispatch

All the concrete types (Circle, ClipPath, etc.) implement ElementTrait, which has things like a draw() method, although that is not visible in the types above. This is what is most convenient for librsvg; using Box<ElementTrait> for type erasure would be a little awkward there — we used it a long time ago, but not anymore.

Eventually the code needs to find the ElementTrait vtable that corresponds to each of ElementData's variants:

let data: &dyn ElementTrait = match self {
    ElementData::Circle(d) =>   &**d,
    ElementData::ClipPath(d) => &**d,
    ElementData::Ellipse(d) =>  &**d,
    // ...
};

data.some_method_in_the_trait(...);

The ugly &**d is to arrive at the &dyn ElementTrait that each variant implements. It will get less ugly when pattern matching for boxes gets stabilized in the Rust compiler.

This is not the only way of doing things. For librsvg it is convenient to actually know the type of an element, that is, to keep an enum of the known element types. Other kinds of code may be perfectly happy with the type erasure that happens when you have a Box<SomeTrait>. If that code needs to go back to the concrete type, an alternative is to use something like the downcast-rs crate, which lets you recover the concrete type inside the box.
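
For completeness, here is a minimal sketch of what recovering the concrete type could look like with downcast-rs (hypothetical Shape and Circle types, not librsvg code):

use downcast_rs::{impl_downcast, Downcast};

trait Shape: Downcast {}
impl_downcast!(Shape);

struct Circle {
    radius: f64,
}
impl Shape for Circle {}

fn main() {
    let boxed: Box<dyn Shape> = Box::new(Circle { radius: 1.0 });
    // Try to get the concrete type back out of the type-erased box.
    if let Some(circle) = boxed.downcast_ref::<Circle>() {
        println!("a circle with radius {}", circle.radius);
    }
}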

Heap usage actually changed

You may notice in the diagrams above that the original NodeData didn't box its variants, but now it does.

Old:

enum NodeData {
    Element(Element),
    Text(Chars),
}

New:

enum NodeData {
    Element(Box<Element>),
    Text(Box<Chars>),
}

One thing I didn't notice during the first round of memory reduction is that the NodeData::Text(Chars) variant is not boxed. That is, the size of the NodeData enum is the size of the bigger of Element and Chars, plus space for the enum's discriminant. I wanted to make both variants the same size, and by boxing them each variant occupies only a pointer.
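
A quick way to see the effect is to compare the size of an enum with unboxed variants against a boxed version; the types below are stand-ins, not the real Element and Chars:

use std::mem::size_of;

struct Big([u8; 128]); // stand-in for the large variant
struct Small(u32);     // stand-in for the small variant

enum Unboxed {
    Big(Big),
    Small(Small),
}

enum Boxed {
    Big(Box<Big>),
    Small(Box<Small>),
}

fn main() {
    // Roughly the size of the biggest variant plus the discriminant.
    println!("unboxed: {} bytes", size_of::<Unboxed>());
    // Just a pointer plus the discriminant, regardless of the variants' sizes.
    println!("boxed:   {} bytes", size_of::<Boxed>());
}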

I measured heap usage for a reasonably large SVG:

India Roadway Map, from Wikimedia Commons

I used Valgrind's Massif to measure peak memory consumption during loading:

valgrind --tool=massif --massif-out-file=massif.out ./target/release/rsvg-bench --num-load 1 --num-render 0 India_roadway_map.svg
ms_print massif.out

The first thing that ms_print shows is an overview of the program's memory usage over time, and the list of snapshots it created. The following is an extract of its output for the new version of the code, where snapshot 36 is the one with peak memory usage:

MB
14.22^                                                                      : 
     |                                                @#::::::::::::::::::::: 
     |                                              @@@#:      :::: :: ::: :: 
     |                                            @@@@@#:      :::: :: ::: :: 
     |                                          @@@ @@@#:      :::: :: ::: :: 
     |                                        @@@ @ @@@#:      :::: :: ::: :: 
     |                                       @@@@ @ @@@#:      :::: :: ::: :: 
     |                                    @@@@@@@ @ @@@#:      :::: :: ::: :: 
     |                                  @@@@ @@@@ @ @@@#:      :::: :: ::: :: 
     |                                 @@ @@ @@@@ @ @@@#:      :::: :: ::: :::
     |                               @@@@ @@ @@@@ @ @@@#:      :::: :: ::: :::
     |                              @@ @@ @@ @@@@ @ @@@#:      :::: :: ::: :::
     |                             @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: :::
     |                          @@@@@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: :::
     |                        :@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
     |                     @@@:@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
     |                 @@@@@ @:@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
     |              :::@ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
     |            :@:: @ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
     |     @@@@::::@:: @ @ @ @:@@@ @@@ @@ @@ @@@@ @ @@@#:      :::: :: ::: ::@
   0 +----------------------------------------------------------------------->Mi
     0                                                                   380.9

Number of snapshots: 51
 Detailed snapshots: [3, 4, 5, 9, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 (peak), 50]

Since we are just measuring memory consumption during loading, the chart shows that memory usage climbs steadily until it peaks when the complete SVG is loaded, and then it stays more or less constant while librsvg does the initial CSS cascade.

The version of librsvg without changes shows this (note how the massif snapshot with peak usage is number 39 in this one):

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 39    277,635,004       15,090,640       14,174,848       915,792            0

That is, 15,090,640 bytes.

And after making the changes in memory layout, we get this:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 36    276,041,810       14,845,456       13,935,702       909,754            0

I.e. after the changes, the peak usage of heap memory when the whole file is loaded is 14,845,456 bytes. So the changes above not only reduced the code size, but also slightly lowered memory consumption at runtime. Nice!

Wall-clock performance

This file is not huge — say, 15 MB when loaded — so whatever we gained in memory consumption is a negligible win. It's nice to know that code size can be reduced, but it is not a problem for librsvg either way.

I did several measurements of the time used by the old and new versions to render the same file, and there was no significant difference. This is because although we may get better cache locality and everything, the time spent executing the element-related code is much smaller than the rendering code. That is, Cairo takes up most of the runtime of rsvg-convert, and librsvg itself takes relatively little of it.

Conclusion

At least for this case, it was feasible to reduce the amount of code emitted for generics, since this is a case where we definitely didn't need generics! The code size in the ELF file's .text section shrank by 186912 bytes, out of 2.6 MB.

For code that does need generics, one can take different approaches. For example, a function that takes arguments of type AsRef<Path> can first obtain the &Path, and then pass that to an inner function that does the real work. This is what the standard library does:

impl PathBuf {
    pub fn push<P: AsRef<Path>>(&mut self, path: P) {
        self._push(path.as_ref())
    }

    fn _push(&mut self, path: &Path) {
        // lots of code here
    }
}

The push function will be monomorphized into very tiny functions that call _push after converting what you passed to a &Path reference, but the big _push function is only emitted once.
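
As a small usage sketch, each distinct argument type produces another thin push shim, but they all funnel into the same _push:

use std::path::PathBuf;

fn main() {
    let mut p = PathBuf::from("/tmp");
    p.push("subdir");                  // monomorphized shim for &str
    p.push(String::from("file.txt")); // another shim, for String
    // Both shims just call as_ref() and then the single non-generic _push.
    println!("{}", p.display());
}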

There is also the momo crate, which helps do similar things automatically. I have not used it yet, so I can't comment further on it.

You can see the patches for librsvg in the merge request.

Fri, Feb 24th, 2023

Project Killswitch Travel Case Review | SteamDeck

Seems like I’ve been doing a lot of SteamDeck writing lately. It’s well overdue. I have a backlog of things that I need to clear out that are all partially done. My experience of Project Killswitch by dbrand is one of those items. I received it, after I purchased my JSAUX case and am really … Continue reading Project Killswitch Travel Case Review | SteamDeck