
the avatar of Federico Mena-Quintero

Reducing memory consumption in librsvg, part 2: SpecifiedValues

To continue with last time's topic, let's see how to make librsvg's DOM nodes smaller in memory. Since that time, there have been some changes to the code; that is why in this post some of the type names are different from last time's.

Every SVG element is represented with this struct:

pub struct Element {
    element_type: ElementType,
    element_name: QualName,
    id: Option<String>,
    class: Option<String>,
    specified_values: SpecifiedValues,
    important_styles: HashSet<QualName>,
    result: ElementResult,
    transform: Transform,
    values: ComputedValues,
    cond: bool,
    style_attr: String,
    element_impl: Box<dyn ElementTrait>,
}

The two biggest fields are the ones with types SpecifiedValues and ComputedValues. These are the sizes of the whole Element struct and those two types:

sizeof Element: 1808
sizeof SpecifiedValues: 824
sizeof ComputedValues: 704

In this post, we'll reduce the size of SpecifiedValues.

What is SpecifiedValues?

If we have an element like this:

<circle cx="10" cy="10" r="10" stroke-width="4" stroke="blue"/>

The values of the style properties stroke-width and stroke get stored in a SpecifiedValues struct. This struct has a bunch of fields, one for each possible style property:

pub struct SpecifiedValues {
    baseline_shift:              SpecifiedValue<BaselineShift>,
    clip_path:                   SpecifiedValue<ClipPath>,
    clip_rule:                   SpecifiedValue<ClipRule>,
    /// ...
    stroke:                      SpecifiedValue<Stroke>,
    stroke_width:                SpecifiedValue<StrokeWidth>,
    /// ...
}

Each field is a SpecifiedValue<T> for the following reason. In CSS/SVG, a style property can be unspecified, or it can have an inherit value to force the property to be copied from the element's parent, or it can actually have a specified value. Librsvg represents these as follows:

pub enum SpecifiedValue<T>
where
    T: // some trait bounds here
{
    Unspecified,
    Inherit,
    Specified(T),
}

Now, SpecifiedValues has a bunch of fields, 47 of them to be exact — one for each of the style properties that librsvg supports. That is why SpecifiedValues has a size of 824 bytes; it is the largest sub-structure within Element, and it would be good to reduce its size.
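To see why a struct with one SpecifiedValue<T> field per property is expensive, here is a minimal sketch with only three properties. The Length and Paint types and their payload sizes are made-up stand-ins, not librsvg's real property types; the point is only that every field reserves room for a full value, whether or not the property is ever set.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for real property types; the payload sizes are invented.
#[allow(dead_code)]
struct Length(f64);

#[allow(dead_code)]
struct Paint([u8; 24]);

#[allow(dead_code)]
enum SpecifiedValue<T> {
    Unspecified,
    Inherit,
    Specified(T),
}

// One field per property, like the real struct, but with only three properties.
#[allow(dead_code)]
struct ThreeProperties {
    stroke: SpecifiedValue<Paint>,
    stroke_width: SpecifiedValue<Length>,
    fill: SpecifiedValue<Paint>,
}

fn main() {
    // Each field is at least as big as its payload, even when it holds Unspecified.
    println!("Length:                 {}", size_of::<Length>());
    println!("SpecifiedValue<Length>: {}", size_of::<SpecifiedValue<Length>>());
    println!("Paint:                  {}", size_of::<Paint>());
    println!("SpecifiedValue<Paint>:  {}", size_of::<SpecifiedValue<Paint>>());
    println!("ThreeProperties:        {}", size_of::<ThreeProperties>());
}
```

The struct's size is fixed at compile time, so an element with no style properties at all pays the same as one with every property set.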

Not all properties are specified

Let's go back to the chunk of SVG from above:

<circle cx="10" cy="10" r="10" stroke-width="4" stroke="blue"/>

Here we only have two specified properties, so the stroke_width and stroke fields of SpecifiedValues will be set as SpecifiedValue::Specified(something) and all the other fields will be left as SpecifiedValue::Unspecified.

It would be good to store only complete values for the properties that are specified, and just a small flag for unset properties.

Another way to represent the set of properties

Since there is a maximum of 47 properties per element (or more if librsvg adds support for extra ones), we can have a small array of 47 bytes. Each byte contains the index within another array that contains only the values of specified properties, or a sentinel value for properties that are unset.

First, I made an enum that fits in a u8 for all the properties, plus the sentinel value, which also gives us the total number of properties. The #[repr(u8)] guarantees that this enum fits in a byte.

#[repr(u8)]
enum PropertyId {
    BaselineShift,
    ClipPath,
    ClipRule,
    Color,
    // ...
    WritingMode,
    XmlLang,
    XmlSpace,
    UnsetProperty, // the number of properties and also the sentinel value
}
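The code further down calls as_u8() and as_usize() on PropertyId without showing them. A plausible sketch, assuming they are just thin wrappers over `as` casts (the real helpers in librsvg may differ), using a shortened enum with three properties:

```rust
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum PropertyId {
    BaselineShift,
    ClipPath,
    ClipRule,
    UnsetProperty, // the number of properties and also the sentinel value
}

impl PropertyId {
    // Assumed helper: the discriminant as a byte, for storing in the indices array.
    fn as_u8(self) -> u8 {
        self as u8
    }

    // Assumed helper: the discriminant as an array index.
    fn as_usize(self) -> usize {
        self as usize
    }
}

fn main() {
    // Discriminants count up from zero, so the last variant equals the
    // number of real properties before it.
    println!("ClipRule      = {}", PropertyId::ClipRule.as_u8());
    println!("UnsetProperty = {}", PropertyId::UnsetProperty.as_usize());
    println!("size in bytes = {}", std::mem::size_of::<PropertyId>());
}
```

Because the sentinel is the last variant, adding a new property before it automatically bumps both the sentinel value and the array length.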

Before these changes, there was already the following monster of an enum, which represents "which property is this" together with the property's value:

pub enum ParsedProperty {
    BaselineShift(SpecifiedValue<BaselineShift>),
    ClipPath(SpecifiedValue<ClipPath>),
    ClipRule(SpecifiedValue<ClipRule>),
    Color(SpecifiedValue<Color>),
    // ...
}

I changed the definition of SpecifiedValues to have two arrays, one to store which properties are specified, and another only with the values for the properties that are actually specified:

pub struct SpecifiedValues {
    indices: [u8; PropertyId::UnsetProperty as usize],
    props: Vec<ParsedProperty>,
}

There is a thing that is awkward in Rust, or which I haven't found how to solve in a nicer way: given a ParsedProperty, find the corresponding PropertyId for its discriminant. I did the obvious thing:

impl ParsedProperty {
    fn get_property_id(&self) -> PropertyId {
        use ParsedProperty::*;

        match *self {
            BaselineShift(_) => PropertyId::BaselineShift,
            ClipPath(_)      => PropertyId::ClipPath,
            ClipRule(_)      => PropertyId::ClipRule,
            Color(_)         => PropertyId::Color,
            // ...
        }
    }
}

Initialization

First, we want to initialize an empty SpecifiedValues, where every element of the indices array is set to the sentinel value that means that the corresponding property is not set:

impl Default for SpecifiedValues {
    fn default() -> Self {
        SpecifiedValues {
            indices: [PropertyId::UnsetProperty.as_u8(); PropertyId::UnsetProperty as usize],
            props: Vec::new(),
        }
    }
}

That sets the indices field to an array full of the same PropertyId::UnsetProperty sentinel value. Also, the props array is empty; it hasn't even had a block of memory allocated for it yet. That way, SVG elements without style properties don't use any extra memory.

Which properties are specified and what are their indices?

Second, we want a function that will give us the index in props for some property, or that will tell us if the property has not been set yet:

impl SpecifiedValues {
    fn property_index(&self, id: PropertyId) -> Option<usize> {
        let v = self.indices[id.as_usize()];

        if v == PropertyId::UnsetProperty.as_u8() {
            None
        } else {
            Some(v as usize)
        }
    }
}

(If someone passes id = PropertyId::UnsetProperty, the array access to indices will panic, which is what we want, since that is not a valid property id.)

Change a property's value

Third, we want to set the value of a property that has not been set, or change the value of one that was already specified:

impl SpecifiedValues {
    fn replace_property(&mut self, prop: &ParsedProperty) {
        let id = prop.get_property_id();

        if let Some(index) = self.property_index(id) {
            self.props[index] = prop.clone();
        } else {
            self.props.push(prop.clone());
            let pos = self.props.len() - 1;
            self.indices[id.as_usize()] = pos as u8;
        }
    }
}

In the first case in the if, the property was already set and we just replace its value. In the second case, the property was not set; we add it to the props array and store its resulting index in indices.
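Putting the three pieces together, here is a self-contained miniature of the scheme with three properties instead of 47. The property payloads (a String and two f64) and the get() accessor are invented for the example; the structure of indices, props, property_index() and replace_property() follows the snippets above.

```rust
#[repr(u8)]
#[derive(Clone, Copy)]
enum PropertyId {
    Stroke,
    StrokeWidth,
    Opacity,
    UnsetProperty, // count of properties and sentinel
}

const UNSET: u8 = PropertyId::UnsetProperty as u8;
const NUM_PROPERTIES: usize = PropertyId::UnsetProperty as usize;

// Hypothetical payloads; the real enum wraps SpecifiedValue<T>.
#[derive(Clone, Debug, PartialEq)]
enum ParsedProperty {
    Stroke(String),
    StrokeWidth(f64),
    Opacity(f64),
}

impl ParsedProperty {
    fn get_property_id(&self) -> PropertyId {
        match self {
            ParsedProperty::Stroke(_) => PropertyId::Stroke,
            ParsedProperty::StrokeWidth(_) => PropertyId::StrokeWidth,
            ParsedProperty::Opacity(_) => PropertyId::Opacity,
        }
    }
}

struct SpecifiedValues {
    indices: [u8; NUM_PROPERTIES],
    props: Vec<ParsedProperty>,
}

impl Default for SpecifiedValues {
    fn default() -> Self {
        SpecifiedValues { indices: [UNSET; NUM_PROPERTIES], props: Vec::new() }
    }
}

impl SpecifiedValues {
    fn property_index(&self, id: PropertyId) -> Option<usize> {
        let v = self.indices[id as usize];
        if v == UNSET { None } else { Some(v as usize) }
    }

    fn replace_property(&mut self, prop: &ParsedProperty) {
        let id = prop.get_property_id();
        if let Some(index) = self.property_index(id) {
            self.props[index] = prop.clone();
        } else {
            self.props.push(prop.clone());
            self.indices[id as usize] = (self.props.len() - 1) as u8;
        }
    }

    // Invented accessor for the example: look a property up by id.
    fn get(&self, id: PropertyId) -> Option<&ParsedProperty> {
        self.property_index(id).map(|i| &self.props[i])
    }
}

fn main() {
    let mut values = SpecifiedValues::default();
    values.replace_property(&ParsedProperty::StrokeWidth(4.0));
    values.replace_property(&ParsedProperty::Stroke("blue".to_string()));
    values.replace_property(&ParsedProperty::StrokeWidth(2.0)); // overwrites in place

    println!("stored values: {}", values.props.len());
    println!("stroke-width:  {:?}", values.get(PropertyId::StrokeWidth));
    println!("opacity:       {:?}", values.get(PropertyId::Opacity));
}
```

Note that the cast to u8 in replace_property() can never collide with the sentinel: there is at most one entry per property, so the largest index ever stored is one less than the number of properties.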

Results

Before:

sizeof Element: 1808
sizeof SpecifiedValues: 824

After:

sizeof Element: 1056
sizeof SpecifiedValues: 72
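The 72-byte figure is consistent with simple arithmetic: 47 bytes for the indices array, plus the Vec itself, rounded up to the struct's alignment. Element also shrank by 1808 - 1056 = 752 bytes, which is exactly 824 - 72. The following sketch checks that reasoning; it assumes that a Vec is three machine words (24 bytes on a 64-bit target), which is what Rust does in practice but is not a documented guarantee, and the element type of the Vec is a placeholder.

```rust
use std::mem::{align_of, size_of};

// Same shape as the new struct: a 47-byte index table plus a Vec.
// The element type is a placeholder; it does not change the size of the
// Vec header itself, only of the heap block it points to.
#[allow(dead_code)]
struct CompactValues {
    indices: [u8; 47],
    props: Vec<[u8; 64]>,
}

/// Round `n` up to the next multiple of `align`.
fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

fn main() {
    let vec_header = size_of::<Vec<[u8; 64]>>();
    let expected = round_up(47 + vec_header, align_of::<CompactValues>());

    println!("Vec header:     {} bytes", vec_header);
    println!("expected size:  {} bytes", expected);
    println!("actual size:    {} bytes", size_of::<CompactValues>());
    println!("Element shrank: {} bytes", 1808 - 1056);
}
```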

The pathological file from the last time used 463,412,720 bytes in memory before these changes. After the changes, it uses 314,526,136 bytes.

I also measured memory consumption for a normal file, in this case one with a bunch of GNOME's symbolic icons. The old version uses 17 MB; the new version only 13 MB.

How to keep fine-tuning this

For now, I am satisfied with SpecifiedValues, although it could still be made smaller:

  • The crate tagged-box converts an enum like ParsedProperty into an enum-of-boxes, and encodes the enum's discriminant into the box's pointer. This way each variant occupies the minimum possible memory, although in a separately-allocated block, and the container itself uses only a pointer. I am not sure if this is worth it; each ParsedProperty is 64 bytes, but the flat array props: Vec<ParsedProperty> is very appealing in a single block of memory. I have not checked the sizes of each individual property to see if they vary a lot among them.

  • Look for a crate that lets us have the properties in a single memory block, a kind of arena with variable types. This can be implemented with a bit of unsafe, but one has to be careful with the alignment of different types.

  • The crate enum_set2 represents an array of field-less enums as a compact bit array. If we changed the representation of SpecifiedValue, this would reduce the indices array to a minimum.
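To make the bit-array idea from the last bullet concrete, here is one possible shape for it. This is only a sketch of a well-known trick, not the API of enum_set2 or of any other crate: a single u64 has one bit per property, the values are kept sorted by property id, and a property's position in the array is the number of set bits below its own bit. The CompactSet name and its methods are invented.

```rust
/// Hypothetical replacement for the indices array: one bit per property id.
struct CompactSet<T> {
    mask: u64,     // bit `id` is 1 when property `id` is specified
    props: Vec<T>, // values of specified properties, sorted by id
}

impl<T> CompactSet<T> {
    fn new() -> Self {
        CompactSet { mask: 0, props: Vec::new() }
    }

    /// Position in `props` where property `id` lives (or would be inserted):
    /// the number of specified properties with a smaller id.
    fn slot(&self, id: u8) -> usize {
        assert!(id < 64, "this sketch supports at most 64 properties");
        (self.mask & ((1u64 << id) - 1)).count_ones() as usize
    }

    fn get(&self, id: u8) -> Option<&T> {
        let slot = self.slot(id);
        if self.mask & (1u64 << id) != 0 {
            Some(&self.props[slot])
        } else {
            None
        }
    }

    fn set(&mut self, id: u8, value: T) {
        let slot = self.slot(id);
        if self.mask & (1u64 << id) != 0 {
            self.props[slot] = value;
        } else {
            // Inserting in the middle shifts later values; fine for a handful.
            self.props.insert(slot, value);
            self.mask |= 1u64 << id;
        }
    }
}

fn main() {
    let mut set: CompactSet<&str> = CompactSet::new();
    set.set(40, "stroke-width: 4");
    set.set(3, "color: blue");
    println!("{:?} {:?} {:?}", set.get(3), set.get(40), set.get(5));
}
```

The trade-off is that setting a new property is no longer a plain push, since later values have to shift to keep the array sorted. With only a few specified properties per element that is probably cheap, but it would need measuring.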

If someone wants to dedicate some time to implement and measure this, I would be very grateful.

Next steps

According to Massif, the next thing is to keep making Element smaller. The next thing to shrink is ComputedValues. The obvious route is to do exactly the same as I did for SpecifiedValues. I am not sure if it would be better to try to share the style structs between elements.


openSUSE Tumbleweed – Review of the week 2020/11 & 12

Dear Tumbleweed users and hackers,

Last week, for personal reasons, I did not manage to write up the report, so this week I have to catch up. Of course, you are all eager to hear/read what is happening in Tumbleweed. In the period covered, we have released 7 snapshots (0305, 0306, 0307, 0309, 0311, 0312 and 0314). The major changes were:

  • Linux kernel 5.5.7
  • Python 3.8.2, with a lot of python modules being updated
  • Mesa 20.0.1
  • KDE Applications 19.12.3
  • KDE Plasma 5.18.3

Things currently being staged or close to being shipped:

  • RPM: change of database format to ndb
  • Linux kernel 5.5.10
  • Qt 5.15.0 (currently betas being tested)
  • Ruby 2.7 – possibly paired with the removal of Ruby 2.6
  • GCC 10 as the default compiler
  • Removal of Python 2
  • GNU Make 4.3

the avatar of YaST Team

Highlights of YaST Development Sprint 95


Due to recent events, many companies all over the world are switching to a remote working model, and SUSE is not an exception. The YaST team is distributed so, for many members, it is not a big deal because they are already used to working this way. For other folks it might be harder. Fortunately, SUSE is fully supporting us in this endeavor, so the YaST team has been able to deliver quite some stuff during this sprint, and we will keep doing our best in the weeks to come.

Before jumping into what the team has recently done, we would also like to bring your attention to the migration of our blog from the good old openSUSE Lizards blog platform to the YaST website. So, please, if you use some feeds reader, update the YaST blog URL to the new one.

Now, as promised, let’s talk only about software development. These days we are mainly focused on fixing bugs to make the upcoming (open)SUSE releases shine. However, we still have time to introduce some important improvements. Among all the changes, we will have a look at the following ones:

Expanding the Possibilities of Pervasive Encryption

Some months ago, in this dedicated blog post, we introduced the joys and benefits of the so-called pervasive encryption available for s390 mainframes equipped with a Crypto Express cryptographic coprocessor. As you may remember (and you can always revisit the post if you don’t), those dedicated pieces of hardware ensure the information at-rest in any storage device can only be read in the very same system where that information was encrypted.

But, what is better than a cryptographic coprocessor? Several cryptographic coprocessors! An s390 logical partition (LPAR) can have access to multiple crypto express adapters, and several systems can share every adapter. To configure all that, the concept of cryptographic domains is used. Each domain is protected by a master key, thus preventing access across domains and effectively separating the contained keys.

Now YaST detects when it’s encrypting a device in a system with several cryptographic domains. If that’s the case, the dialog for pervasive encryption allows specifying which adapters and domains must be used to generate the new secure key.

To succeed, all the used adapters/domains must be set with the same master key. If that’s not the case, YaST detects the circumstance and displays the corresponding information.

Install Missing Packages during Storage System Analysis

As our readers surely know, YaST always ensures the presence of all the needed utilities when performing any operation on the storage devices, like formatting and/or encrypting them. If some necessary tool is missing from the system, YaST has always shown the following dialog to alert the user and to allow installing the missing packages with a single click.

But the presence of those tools was only checked at the end of the process, when YaST needed them to modify the devices. For example, in the screenshot above, YaST asked for btrfsprogs & friends because it wanted to format a new partition with that file system.

If the needed utility was already missing during the initial phase in which the storage devices are analyzed, the user had no chance to install the corresponding package. For example, if a USB stick formatted with Btrfs had been inserted, the user would get an error like this when executing the YaST Partitioner or when opening the corresponding YaST module to configure the bootloader.

Btrfs old error message

Now that intimidating error is replaced by this new pop-up that allows installing the missing packages and restarting the hardware probing. As usual with YaST, expert users can ignore the warning and continue the process if they understand the consequences outlined in the new pop-up window.

Probe callback

We took the opportunity to fix other small details in the area, like better reporting when the YaST Partitioner fails to install some package, a more up-to-date list of possibly relevant packages per technology, and improvements in the source code organization and the automated tests.

Reporting Conflicting Storage Attributes in AutoYaST Profiles

If you are an AutoYaST user, you undoubtedly know that it is often too quiet and offers little information about inconsistencies or potential problems in the profile. For simple sections, it is not a problem at all, but for complicated stuff, like partitioning, it is far from ideal.

In (open)SUSE 15 and later versions, and given that we had to reimplement the partitioning support using the new storage layer, we decided to add a mechanism to report some of those issues like missing attributes or invalid values. There is a high chance that, using an old profile in newer AutoYaST versions, you have seen some of those warnings.

Recently, a user reported a problem that caused AutoYaST to crash. While debugging the problem, we found both raid_name and lvm_group attributes defined in one of the partition sections. Obviously, they are mutually exclusive, but it is quite easy to overlook this situation. Not to mention that AutoYaST should not crash.

From now on, if AutoYaST detects such an inconsistency, it will automatically select one of the specified attributes, informing the user about the decision. You can see an example in the screenshot below.

AutoYaST conflicting attributes warning

For the time being, this check only applies to those attributes which determine how a device is going to be used (mount, raid_name, lvm_name, btrfs_name, bcache_backing_for, and bcache_caching_for), but we would like to extend this check in the future.

Usability Improvements in iSCSI-LIO-server Module

Recently, one of our developers detected several usability problems in the iSCSI LIO Server module, and he summarized them in a bug report. Apart from minor things, like some truncated and misaligned texts, he reported the UI to be quite confusing: it is not clear when authentication credentials are needed, and some labels are misleading. To add insult to injury, we found a potential crash when clicking the Edit button while we were addressing those issues.

As usual, an image is worth a thousand words. Below you can see what the old and confusing UI looked like.

Old iSCSI LIO Server Module UI

Now, let’s compare it with the new one, which is better organized and more approachable. Isn’t it?

New iSCSI LIO Server Module UI

Conclusion

It is possible that, during the upcoming weeks, we need to make some further adjustments to our workflow, especially when it comes to video meetings. But, at this point, everything is working quite well, and we are pretty sure that we will keep delivering at a good pace.

So, take care, and stay tuned!

the avatar of Federico Mena-Quintero

Librsvg accepting interns for Summer of Code 2020

Are you a student eligible to apply for Summer of Code 2020? I'm willing to mentor the following project for librsvg.

Project: Revamp the text engine in librsvg

Librsvg supports only a few features of the SVG Text specification. It requires extra features to be really useful:

  • Proper bidirectional support. Librsvg supports the direction and unicode-bidi properties for text elements, among others, but in a very rudimentary fashion. It just translates those properties to Pango terminology and asks PangoLayout to lay out the text. SVG really wants finer control of that, for which...

  • ... ideally you would make librsvg use Harfbuzz directly, or a wrapper that is close to its level of operation. Pango is a bit too high level for the needs of SVG.

  • Manual layout of text glyphs. After a text engine like Harfbuzz does the shaping, librsvg would need to lay out the produced glyphs in the way of the SVG attributes dx, dy, x, y, etc. The SVG Text specification has the algorithms for this.

  • The cherry on top: text-on-a-path. Again, the spec has the details. You would make Wikimedia content creators very happy with this!

Requirements: Rust for programming language; some familiarity with Unicode concepts and text layout. Familiarity with Cairo and Harfbuzz would help a lot. Preference will be given to people who can write a right-to-left human language, or a language that requires complex shaping.

Details for students


Maintain release info easily in MetaInfo/Appdata files

This article isn’t about anything “new”, like the previous ones on AppStream – it rather exists to shine the spotlight on a feature I feel is underutilized. From conversations it appears that the reason simply is that people don’t know that it exists, and of course that’s a pretty bad reason not to make your life easier 😉

Mini-Disclaimer: I’ll be talking about appstreamcli, part of AppStream, in this blogpost exclusively. The appstream-util tool from the appstream-glib project has a similar functionality – check out its help text and look for appdata-to-news if you are interested in using it instead.

What is this about?

AppStream permits software to add release information to their MetaInfo files to describe current and upcoming releases. This feature has the following advantages:

  • Distribution-agnostic format for release descriptions
  • Provides versioning information for bundling systems (Flatpak, AppImage, …)
  • Release texts are short and end-user-centric, not technical as the ones provided by distributors usually are
  • Release texts are fully translatable using the normal localization workflow for MetaInfo files
  • Releases can link artifacts (built binaries, source code, …) and have additional machine-readable metadata e.g. one can tag a release as a development release

The disadvantage of all this is that humans have to maintain the release information. Also, people need to write XML for this. Of course, once humans are involved with any technology, things get a lot more complicated. That doesn’t mean we can’t make things easier for people to use though.

Did you know that you don’t actually have to edit the XML in order to update your release information? To make creating and maintaining release information as easy as possible, the appstreamcli utility has a few helpers built in. And the best thing is that appstreamcli, being part of AppStream, is available pretty ubiquitously on Linux distributions.

Update release information from NEWS data

The NEWS file is a loosely defined text file that lists “user-visible changes worth mentioning” for each version. This maps pretty well to what AppStream release information should contain, so let’s generate that from a NEWS file!

Since the news format is not defined, but we need to parse this somehow, the amount of things appstreamcli can parse is very limited. We support a format in this style:

Version 0.2.0
~~~~~~~~~~~~~~
Released: 2020-03-14

Notes:
 * Important thing 1
 * Important thing 2

Features:
 * New/changed feature 1
 * New/changed feature 2 (Author Name)
 * ...

Bugfixes:
 * Bugfix 1
 * Bugfix 2
 * ...

Version 0.1.0
~~~~~~~~~~~~~~
Released: 2020-01-10

Features:
 * ...

When parsing a file like this, appstreamcli will allow a lot of errors/“imperfections” and account for quite a few style and string variations. You will need to check whether this format works for you. You can see it in use in appstream itself and libxmlb for a slightly different style.
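To make the structure of that format a bit more tangible, here is a toy parser that extracts only the version and release date of each entry. It is not how appstreamcli does it (the real parser tolerates many more variations and also reads the Notes/Features/Bugfixes sections); the Release struct and the parse_release_headers function are invented for this sketch.

```rust
#[derive(Debug, PartialEq)]
struct Release {
    version: String,
    date: Option<String>,
}

/// Collect "Version X" headers and the "Released: DATE" line that follows them.
/// Everything else (underlines, section titles, bullet items) is ignored.
fn parse_release_headers(news: &str) -> Vec<Release> {
    let mut releases: Vec<Release> = Vec::new();

    for line in news.lines() {
        let line = line.trim();
        if let Some(version) = line.strip_prefix("Version ") {
            releases.push(Release { version: version.trim().to_string(), date: None });
        } else if let Some(date) = line.strip_prefix("Released:") {
            // Attach the date to the most recently seen version, if any.
            if let Some(last) = releases.last_mut() {
                last.date = Some(date.trim().to_string());
            }
        }
    }

    releases
}

fn main() {
    let news = "Version 0.2.0\n~~~~~~~~~~~~~~\nReleased: 2020-03-14\n\nBugfixes:\n * Bugfix 1\n\n\
                Version 0.1.0\n~~~~~~~~~~~~~~\nReleased: 2020-01-10\n\nFeatures:\n * ...\n";

    for release in parse_release_headers(news) {
        println!("{:?}", release);
    }
}
```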

So, how do you convert this? We first create our NEWS file, e.g. with this content:

Version 0.2.0
~~~~~~~~~~~~~~
Released: 2020-03-14

Bugfixes:
 * The CPU no longer overheats when you hold down spacebar

Version 0.1.0
~~~~~~~~~~~~~~
Released: 2020-01-10

Features:
 * Now plays a "zap" sound on every character input

For the MetaInfo file, we of course generate one using the MetaInfo Creator. Then we can run the following command to get a preview of the generated file:

appstreamcli news-to-metainfo ./NEWS ./org.example.myapp.metainfo.xml -

Note the single dash at the end: this is the explicit way of telling appstreamcli to print something to stdout. This is what the result looks like:

<?xml version="1.0" encoding="utf-8"?>
<component type="desktop-application">
  [...]
  <releases>
    <release type="stable" version="0.2.0" date="2020-03-14T00:00:00Z">
      <description>
        <p>This release fixes the following bug:</p>
        <ul>
          <li>The CPU no longer overheats when you hold down spacebar</li>
        </ul>
      </description>
    </release>
    <release type="stable" version="0.1.0" date="2020-01-10T00:00:00Z">
      <description>
        <p>This release adds the following features:</p>
        <ul>
          <li>Now plays a "zap" sound on every character input</li>
        </ul>
      </description>
    </release>
  </releases>
</component>

Neat! If we want to save this to a file instead, we just exchange the dash with a filename. And maybe we don’t want to add all releases of the past decade to the final XML? No problem either, just pass the --limit flag as well:

appstreamcli news-to-metainfo --limit=6 ./NEWS ./org.example.myapp.metainfo.tmpl.xml ./result/org.example.myapp.metainfo.xml

That’s nice on its own, but we really don’t want to do this by hand… The best way to ensure the MetaInfo file is updated is to simply run this command at build time to generate the final MetaInfo file. For the Meson build system you can achieve this with a code snippet like the one below (but for CMake this shouldn’t be an issue either – you could even make a nice macro for it there):

ascli_exe = find_program('appstreamcli')
metainfo_with_relinfo = custom_target('gen-metainfo-rel',
    input : ['./NEWS', 'org.example.myapp.metainfo.xml'],
    output : ['org.example.myapp.metainfo.xml'],
    command : [ascli_exe, 'news-to-metainfo', '--limit=6', '@INPUT0@', '@INPUT1@', '@OUTPUT@']
)

In order to also translate releases, you will need to add this to your .pot file generation workflow, so (x)gettext can run on the MetaInfo file with translations merged in.

Release information from YAML files

Since parsing a “no structure, somewhat human-readable file” is hard without baking an AI into appstreamcli, there is also a second option available: Generate the XML from a YAML file. YAML is easy to write for humans, but can also be parsed by machines. The YAML structure used here is specific to AppStream, but somewhat maps to the NEWS file contents as well as MetaInfo file data. That makes it more versatile, but in order to use it, you will need to opt into using YAML for writing news entries. If that’s okay for you to consider, read on!

A YAML release file has this structure:

---
Version: 0.2.0
Date: 2020-03-14
Type: development
Description:
- The CPU no longer overheats when you hold down spacebar
- Fixed bugs ABC and DEF
---
Version: 0.1.0
Date: 2020-01-10
Description: |-
  This is our first release!

  Now plays a "zap" sound on every character input

As you can see, the release date has to be an ISO 8601 string, just like it is assumed for NEWS files. Unlike in NEWS files, releases can be defined as either stable or development depending on whether they are a stable or development release, by specifying a Type field. If no Type field is present, stable is implicitly assumed. Each release has a description, which can either be a free-form multi-paragraph text, or a list of entries.

Converting the YAML example from above is as easy as using the exact same command that was used before for plain NEWS files:

appstreamcli news-to-metainfo --limit=6 ./NEWS.yml ./org.example.myapp.metainfo.tmpl.xml ./result/org.example.myapp.metainfo.xml

If appstreamcli fails to autodetect the format, you can help it by specifying it explicitly via the --format=yaml flag. This command would produce the following result:

<?xml version="1.0" encoding="utf-8"?>
<component type="console-application">
  [...]
  <releases>
    <release type="development" version="0.2.0" date="2020-03-14T00:00:00Z">
      <description>
        <ul>
          <li>The CPU no longer overheats when you hold down spacebar</li>
          <li>Fixed bugs ABC and DEF</li>
        </ul>
      </description>
    </release>
    <release type="stable" version="0.1.0" date="2020-01-10T00:00:00Z">
      <description>
        <p>This is our first release!</p>
        <p>Now plays a "zap" sound on every character input</p>
      </description>
    </release>
  </releases>
</component>

Note that the 0.2.0 release is now marked as development release, a thing which was not possible in the plain text NEWS file before.

Going the other way

Maybe you like writing XML, or have some other tool that generates the MetaInfo XML, or you have received your release information from some other source and want to convert it into text. AppStream also has a tool for that! Using appstreamcli metainfo-to-news <metainfo-file> <news-file> you can convert a MetaInfo file that has release entries into a text representation. If you don’t want appstreamcli to autodetect the right format, you can specify it via the --format=<text|yaml> switch.

Future considerations

The release handling is still not something I am entirely happy with. For example, the release information has to be written and translated at release time of the application. For some projects, this workflow isn’t practical. That’s why issue #240 exists in AppStream which basically requests an option to have release notes split out to a separate, remote location (and also translations, but that’s unlikely to happen). Having remote release information is something that will highly likely happen in some way, but implementing this will be quite a disruptive, if not breaking, change. That is why I am holding this change back for the AppStream 1.0 release.

In the meantime, besides improving the XML form of release information, I also hope to support a few more NEWS text styles if they can be autodetected. The format of the systemd project may be a good candidate. The YAML release-notes format variant will also receive a few enhancements, e.g. for specifying a release URL. For all of these things, I very much welcome pull requests or issue reports. I can implement and maintain the things I use myself best, so if I don’t use something or don’t know about a feature many people want, I won’t suddenly implement it or start to add features at random because “they may be useful”. That would be a recipe for disaster. This is why, for these features in particular, contributions from people who are using them in their own projects or want their new use case represented are very welcome.

the avatar of Federico Mena-Quintero

Reducing memory consumption in librsvg, part 1: text nodes

Librsvg's memory consumption has not been a problem so far for GNOME's use cases, which is basically rendering icons. But for SVG files with thousands of elements, it could do a lot better.

Memory consumption in the DOM

Librsvg shares some common problems with web browsers: it must construct a DOM tree in memory with SVG elements, and keep a bunch of information for each of the tree's nodes. For example, each SVG element may have an id attribute, or a class; each one has a transformation matrix; etc.

Apart from the tree node metadata (pointers to sibling and parent nodes), each node has this:

/// Contents of a tree node
pub struct NodeData {
    node_type: NodeType,
    element_name: QualName,
    id: Option<String>,    // id attribute from XML element
    class: Option<String>, // class attribute from XML element
    specified_values: SpecifiedValues,
    important_styles: HashSet<QualName>,
    result: NodeResult,
    transform: Transform,
    values: ComputedValues,
    cond: bool,
    style_attr: String,

    node_impl: Box<dyn NodeTrait>, // concrete struct for node types
}

On a 64-bit box, that NodeData struct is 1808 bytes. And the biggest fields are the SpecifiedValues (824 bytes) and ComputedValues (704 bytes).

Librsvg represents all tree nodes with that struct. Consider an SVG like this:

<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="10" y="20"/>
  <path d="..."/>
  <text x="10" y="20">Hello</text>
  <!-- etc -->
</svg>

There are 4 elements in that file. However, there are also tree nodes for the XML text nodes, that is, the whitespace between tags and the "Hello" inside the <text> element.

The contents of each of those text nodes is tiny (a newline and maybe a couple of spaces), but each node still takes up at least 1808 bytes from the NodeData struct, plus the size of the text string.

Let's refactor this to make it easier to remove that overhead.

First step: separate text nodes from element nodes

Internally, librsvg represents XML text nodes with a NodeChars struct which is basically a string with some extra stuff. All the concrete structs for tree node types must implement a trait called NodeTrait, and NodeChars is no exception:

pub struct NodeChars {
   // a string with the text node's contents
}

impl NodeTrait for NodeChars {
   // a mostly empty impl with methods that do nothing
}

You don't see it in the definition of NodeData in the previous section, but for a text node, the NodeData.node_impl field would point to a heap-allocated NodeChars (it can do that, since NodeChars implements NodeTrait, so it can go into node_impl: Box<dyn NodeTrait>).

First, I turned the NodeData struct into an enum with two variants, and moved all of its previous fields to an Element struct:

// This one is new
pub enum NodeData {
    Element(Element),
    Text(NodeChars),
}

// This is the old struct with a different name
pub struct Element {
    node_type: NodeType,
    element_name: QualName,
    id: Option<String>,
    class: Option<String>,
    specified_values: SpecifiedValues,
    important_styles: HashSet<QualName>,
    result: NodeResult,
    transform: Transform,
    values: ComputedValues,
    cond: bool,
    style_attr: String,
    node_impl: Box<dyn NodeTrait>,
}

The size of a Rust enum is the maximum of the sizes of its variants, plus a little extra for the discriminant (you can think of a C struct with an int for the discriminant, and a union of variants).

The code needed a few changes to split NodeData in this way: I added accessor functions so that callers can conveniently get at the Element or Text case. This is one of those refactors where you can just change the declaration, and walk down the compiler's errors to make each case use the accessors instead of whatever was done before.
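The accessors could look something like the following sketch. The method names `as_element`, `as_text` and `is_element` are assumptions for illustration, and the structs are trimmed down to one field each; the real code may differ.

```rust
// Simplified stand-ins for the real types; only the shape matters here.
pub struct Element {
    pub element_name: String,
}

pub struct NodeChars {
    pub text: String,
}

pub enum NodeData {
    Element(Element),
    Text(NodeChars),
}

impl NodeData {
    /// Returns the element data, or `None` for a text node.
    pub fn as_element(&self) -> Option<&Element> {
        match self {
            NodeData::Element(e) => Some(e),
            NodeData::Text(_) => None,
        }
    }

    /// Returns the text data, or `None` for an element node.
    pub fn as_text(&self) -> Option<&NodeChars> {
        match self {
            NodeData::Text(t) => Some(t),
            NodeData::Element(_) => None,
        }
    }

    pub fn is_element(&self) -> bool {
        matches!(self, NodeData::Element(_))
    }
}

fn main() {
    let nodes = vec![
        NodeData::Element(Element { element_name: "text".to_string() }),
        NodeData::Text(NodeChars { text: "Hello".to_string() }),
    ];

    for node in &nodes {
        if let Some(e) = node.as_element() {
            println!("element <{}>", e.element_name);
        } else if let Some(t) = node.as_text() {
            println!("text {:?}", t.text);
        }
    }
}
```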

Second step: move the Element variant to a separate allocation

Now, we turn NodeData into this:

pub enum NodeData {
    Element(Box<Element>), // This goes inside a Box
    Text(NodeChars),
}

That way, the Element variant is the size of a pointer (a Box is just a pointer to a heap-allocated value), and the Text variant is as big as NodeChars, as before.

This means that Element nodes are just as big as before, plus an extra pointer, plus an extra heap allocation.

However, the Text nodes get a lot smaller!

  • Before: sizeof::<NodeData>() = 1808
  • After: sizeof::<NodeData>() = 72

Because the Element variant is now a lot smaller (the size of a Box, which is just a pointer), it no longer imposes extra overhead on the Text variant.
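The effect can be reproduced with stand-in types. In this sketch, `Element` is just a byte array of the size reported above and `NodeChars` is reduced to a bare String, so the boxed enum will not come out at exactly 72 bytes as in librsvg; only the relative sizes are meaningful.

```rust
use std::mem::size_of;

// Stand-in with the same size as the Element struct reported above.
#[allow(dead_code)]
pub struct Element {
    fields: [u8; 1808],
}

// Stand-in: the real NodeChars has more than just a string.
#[allow(dead_code)]
pub struct NodeChars {
    text: String,
}

#[allow(dead_code)]
pub enum Unboxed {
    Element(Element),
    Text(NodeChars),
}

#[allow(dead_code)]
pub enum Boxed {
    Element(Box<Element>), // only a pointer lives inside the enum
    Text(NodeChars),
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());

    // A Box to a sized type is a single pointer.
    assert_eq!(size_of::<Box<Element>>(), size_of::<usize>());
    // The unboxed enum must fit the big variant; the boxed one need not.
    assert!(size_of::<Unboxed>() >= size_of::<Element>());
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());
}
```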

This means that in the SVG file, all the whitespace between XML elements now takes a lot less memory.

Some numbers from a pathological file

Issue 42 is about an SVG file that is just a <use> element repeated many times, once per line:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <symbol id="glyph0-0">
      <!-- a few elements here -->
    </symbol>
  </defs>

  <use xlink:href="#glyph0-0" x="1" y="10"/>
  <use xlink:href="#glyph0-0" x="1" y="10"/>
  <use xlink:href="#glyph0-0" x="1" y="10"/>
  <!-- about 196,000 similar lines -->
</svg>

So we have around 196,000 elements. According to Valgrind's Massif tool, this makes rsvg-convert allocate 800,501,568 bytes in the old version, versus 463,412,720 bytes in the new version, or about 60% of the space.
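As a back-of-the-envelope check on those numbers, the sketch below works out the ratio and the savings per <use> line. It assumes exactly 196,000 lines, although the file only has "about" that many, and Massif counts other allocations too, so treat the per-line figure as approximate. It comes out at roughly 1,720 bytes, in the same ballpark as the 1808 - 72 = 1736 bytes saved for each whitespace text node.

```rust
// Figures quoted from the Massif measurements above.
const OLD_BYTES: u64 = 800_501_568;
const NEW_BYTES: u64 = 463_412_720;
// Assumption: the text only says "about 196,000" lines.
const USE_LINES: u64 = 196_000;

/// Fraction of the old memory usage that the new version needs.
fn ratio() -> f64 {
    NEW_BYTES as f64 / OLD_BYTES as f64
}

/// Total bytes saved by the new version.
fn saved_bytes() -> u64 {
    OLD_BYTES - NEW_BYTES
}

/// Approximate bytes saved per <use> line (one element plus one text node).
fn saved_per_line() -> f64 {
    saved_bytes() as f64 / USE_LINES as f64
}

fn main() {
    println!("new / old       = {:.3}", ratio());
    println!("bytes saved     = {}", saved_bytes());
    println!("saved per line  = {:.0}", saved_per_line());
}
```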

Next steps

There is a lot of repetition in the text nodes of a typical SVG file. For example, in that pathological file above, most of the whitespace is identical: between each element there is a newline and two spaces. Instead of having thousands of little allocations, all with the same string, there could be a pool of shared strings. Files with "real" indentation could get benefits from sharing the whitespace-only text nodes.
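One way such a pool could work is sketched below. This is not librsvg code: `StringPool` and `intern` are hypothetical names, and the sketch assumes single-threaded use with `Rc<str>` handles. Identical strings share one allocation, and each text node would only keep a reference-counted pointer to it.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical pool that hands out shared copies of identical strings.
#[derive(Default)]
pub struct StringPool {
    strings: HashSet<Rc<str>>,
}

impl StringPool {
    /// Returns a shared handle for `s`, allocating only the first time
    /// a given string is seen.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        let shared: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&shared));
        shared
    }

    /// Number of distinct strings stored in the pool.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}

fn main() {
    let mut pool = StringPool::default();

    // The whitespace between elements is the same string every time.
    let a = pool.intern("\n  ");
    let b = pool.intern("\n  ");
    let hello = pool.intern("Hello");

    // `a` and `b` point at the same allocation; `hello` has its own.
    assert!(Rc::ptr_eq(&a, &b));
    assert!(!Rc::ptr_eq(&a, &hello));
    println!("distinct strings in pool: {}", pool.len());
}
```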

Real browser engines are very careful to share the style structs across elements if possible. Look for "style struct sharing" in "Inside a super fast CSS engine: Quantum CSS". This is going to take some good work in librsvg, but we can get there gradually.


the avatar of openSUSE News

openSUSE.Asia Summit 2020 Announcement

Faridabad, India, Selected for openSUSE.Asia Summit 2020


India was accepted to host the openSUSE.Asia Summit 2020. The openSUSE.Asia Summit is going to be held for the first time in Faridabad, India. Faridabad is a district of Haryana situated in the National Capital Region, bordering the Indian capital New Delhi.

The supporters of openSUSE in India, and of Free/Libre Open Source Software (FLOSS) at large, are excited to organize the much-awaited openSUSE.Asia Summit. At this event, experts, contributors, end users, and technology enthusiasts will gather to share experiences about the development of openSUSE and other things related to FLOSS, and have a lot of fun. The venue for the openSUSE.Asia Summit was chosen after being proposed by the Indian community during openSUSE.Asia Summit 2019 in Bali, Indonesia. Finally, the Asian committee chose Faridabad as the host of openSUSE.Asia Summit 2020, to be held from September 25 to September 27, 2020, at Manav Rachna International Institute of Research & Studies, Faridabad.

Goals to be achieved in the openSUSE.Asia Summit 2020 in Faridabad include:

  • To promote openSUSE in India.
  • To attract new contributors for openSUSE from India and other Asian countries.
  • To show the wider community that FLOSS can be a powerful alternative for doing their daily jobs.
  • To provide a platform for sharing user and developer experiences in person; usually, such discussions only occur online.

In the end, we are proud to present India as one of the best places for the openSUSE.Asia Summit.

Pre-announcement

openSUSE.Asia Summit 2020 will immediately open a call for papers for prospective speakers. In addition, a logo competition for the openSUSE.Asia Summit 2020 will also be opened. This will be an opportunity for designers in Asia to compete with each other to show their abilities and contribute to this event. More details will be announced in the near future through news.opensuse.org.

See you in India!

the avatar of Ish Sookun

DevCon 2020 | Kubernetes - Introducing through openSUSE MicroOS & Kubic

Guest post by Chittesh Sham 😉

DevCon 2020 is just about three weeks away and the hype is real.

People know me as a friendly neighborhood SysAdmin and this year I am excited to announce that I will be co-hosting a presentation alongside Ish at the Developers Conference. It has been a long time coming, and as you might have noticed Ish has been building up to this one, starting with his Kubic Presentation at DevCon 2019, his openSUSE MicroOS in Production talk at openSUSE Conference 2019 and a workshop on Managing Pods & Containers at the openSUSE Asia Summit.

Half Left - Kubernetes Logo ; Half right MicroOS logo 😊

We have been experimenting with a bunch of tools such as Podman, Buildah, Skopeo and openSUSE Kubic for a while now and we would like to share our experience.

Our presentation is scheduled on Saturday 4 April 2020, 2:30pm - 3:15pm in the Kryptone room at the Caudan Arts Centre. We will also be around for all three days at the conference, so if you want to grab a coffee and geek out on openSUSE and Kubernetes, don't hesitate to hit us up!

I would like to express my gratitude to the Linux Foundation and Cloud Native Computing Foundation (CNCF) for their support. I was impressed by Chris Aniszczyk, especially how quickly he reacted to show his support upon noticing our talk.

It was incredible how fast the goodies from store.cncf.io (US) came to Mauritius. Kudos to FedEx for making that possible in just 5 days' time! Here are some goodies that you can expect to be handed out during #DevConMU20:

Yes, those are socks ;)

The Kube Gang

See you at DevCon Geeks! 🤓


Introducing the MetaInfo Creator

This year’s FOSDEM conference was a lot of fun – one of the things I always enjoy most about this particular conference (besides having some of the outstanding food you can get in Brussels and meeting with friends from the free software world) is the ability to meet a large range of new people who I wouldn’t usually have interacted with, or getting people from different communities together who otherwise would not meet in person as each bigger project has their own conference (for example, the amount of VideoLAN people is much lower at GUADEC and Akademy compared to FOSDEM). It’s also really neat to have GNOME and KDE developers within reach at the same place, as I care about both desktops a lot.

An unexpected issue

This blog post however is not about that. It’s about what I learned when talking to people there about AppStream, and the outcome of that. Especially when talking to application authors but also to people who deal with larger software repositories, it became apparent that many app authors don’t really want to deal with the extra effort of writing metadata at all. This was a bit of a surprise to me, as I thought that there would be a strong interest for application authors to make their apps look as good as possible in software catalogs.

A bit less surprising was the fact that people apparently don’t enjoy reading a large specification, reading a long-ish intro guide with lots of dos and don’ts or basically reading any longer text at all before being able to create an AppStream MetaInfo/AppData file describing their software.

Another common problem seems to be that people don’t immediately know what a “reverse-DNS ID” is, the format AppStream uses for uniquely identifying each software component. So naturally, people either have to read about it again (bah, reading! 😜) or make something up, which occasionally is wrong and not the actual component-ID their software component should have.

The MetaInfo Creator

It was actually suggested to me twice that what people really would like to have is a simple tool to put together a MetaInfo file for their software. Basically a simple form with a few questions which produces the final file. I always considered this a “nice to have, but not essential” feature, but now I was convinced that this actually has a priority attached to it.

So, instead of jumping into my favourite editor and writing a bunch of C code to create this “make MetaInfo file” form as part of appstreamcli, this time I decided to try what the cool kids are doing and make a web application that runs in your browser and creates all metadata there.

So, behold the MetaInfo Creator! If you click this link, you will end up at an Angular-based web application that will let you generate MetaInfo/AppData files for a few component-types simply by answering a set of questions.

The intent was to make this tool as easy to use as possible for someone who basically doesn’t know anything about AppStream at all. Therefore, the tool will:

  • Generate a rDNS component-ID suggestion automatically based on the software’s homepage and name
  • Fill out default values for anything it thinks it has enough data for
  • Show short hints for what values we expect for certain fields
  • Interactively validate the entered value, so people know immediately when they have entered something invalid
  • Produce a .desktop file as well for GUI applications, if people select the option for it
  • Show additional hints about how to do more with the metadata
  • Create some Meson snippets as pointers how people can integrate the MetaInfo files into projects using the Meson build system
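To make the first point more concrete, here is a rough sketch of how a reverse-DNS ID could be derived from a homepage URL and an application name. It is written in Rust purely for illustration and is not the tool's actual algorithm; the function name `suggest_component_id` and its parsing rules are assumptions.

```rust
/// Hypothetical helper: turn a homepage URL and an application name into
/// a reverse-DNS style component ID such as "org.example.MyApp".
/// Returns `None` if the homepage has no usable domain or the name is empty.
fn suggest_component_id(homepage: &str, app_name: &str) -> Option<String> {
    // Drop the scheme ("https://") if there is one.
    let without_scheme = homepage.split("://").last()?;

    // Keep only the host: everything before a path, port, query or fragment.
    let host = without_scheme
        .split(|c: char| c == '/' || c == ':' || c == '?' || c == '#')
        .next()?;
    let host = host.trim_start_matches("www.");

    // Reverse the domain labels: "example.org" becomes ["org", "example"].
    let mut labels: Vec<String> = host
        .split('.')
        .filter(|label| !label.is_empty())
        .map(|label| label.to_ascii_lowercase())
        .collect();
    if labels.len() < 2 {
        return None;
    }
    labels.reverse();

    // Strip characters from the name that do not belong in an ID segment.
    let name: String = app_name
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect();
    if name.is_empty() {
        return None;
    }

    labels.push(name);
    Some(labels.join("."))
}

fn main() {
    let id = suggest_component_id("https://www.example.org/myapp", "My App");
    println!("{:?}", id);
}
```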

For the Meson feature, the tool simply cannot generate a “use this and be done” script, as each Meson snippet needs to be adjusted for the individual project. So this option is disabled by default, but when enabled, a few simple Meson snippets will be produced which can be easily adjusted to the project they should be part of.

The tool currently does not generate any release information for a MetaInfo file at all. This may be added in the future. The initial goal was to have people create any MetaInfo file in the first place; having projects also ship release details would be the icing on the cake.

I hope people find this project useful and use it to create better MetaInfo files, so distribution repositories and Flatpak repos look better in software centers. Also, since MetaInfo files can be used to create an “inventory” of software and to install missing stuff as-needed, having more of them will help to build smarter software managers, create smaller OS base installations and introspect what software bundles are made of easily.

I welcome contributions to the MetaInfo Creator! You can find its source code on GitHub. This is my first web application ever, the first time I wrote TypeScript and the first time I used Angular, so I’d bet a veteran developer more familiar with these tools will cringe at what I produced. So, scratch that itch and submit a PR! 😉 Also, if you want to create a form for a new component type, please submit a patch as well.

C developer’s experience notes for Angular, TypeScript, NodeJS

This section is just to ramble a bit about random things I found interesting as a developer who mostly works with C/C++ and Python and stepped into the web-application developer’s world for the first time.

For a project like this, I would usually have gone with my default way of developing something for the web: Creating a Flask-based application in Python. I really love Python and Flask, but of course using them would have meant that all processing would have had to be done on the server. On the one hand I could have used libappstream that way to create the XML, format it and validate it, but on the other hand I would have had to host the Python app on my own server, find a place at Purism/Debian/GNOME/KDE or get it housed at Freedesktop somehow (which would have taken a while to arrange) – and I really wanted to have a permanent location for this application immediately. Additionally, I didn’t want people to send the details of new unpublished software to my server.

TypeScript

I must say that I really like TypeScript as a language compared to JavaScript. It is not really revolutionary (I looked into Dart and other ways to compile $stuff to JavaScript first), but it removes just enough JavaScript weirdness to be pleasant to use. At the same time, since TS is a superset of JS, JavaScript code is valid TypeScript code, so you can integrate with existing JS code easily. Picking TS up took me much less than an hour, and most of its features you learn organically when working on a project. The optional type-safety is a blessing and actually helped me a few times to find an issue. It being so close to JS is both a strength and weakness: On the one hand you have all the JS oddities in the language (implicit type conversion is really weird sometimes) and have to basically refrain from using them or count on the linter to spot them, but on the other hand you can immediately use the massive amount of JavaScript code available on the web.

Angular

The Angular web framework took a few hours to pick up – there are a lot of concepts to understand. But ultimately, it’s manageable and pretty nice to use. When working at the system level, a lot of complexity is in understanding how the CPU is processing data, managing memory and using the low-level APIs the operating system provides. With the web application stuff, a lot of the complexity for me was in learning about all the moving parts the system is comprised of, what their names are, what they are, and what works with which. And that is not a flat learning curve at all. As a C developer, you need to know how the computer works to be efficient; as a web developer, you need to know a bunch of different tools really well to be productive.

One thing I am still a bit puzzled about is the amount of duplicated HTML templates my project has. I haven’t found a way to reuse template blocks in multiple components with Angular, like I would with Jinja2. The documentation suggests this feature does not exist, but maybe I simply can’t find it or there is a completely different way to achieve the same result.

NPM Ecosystem

The MetaInfo Creator application ultimately doesn’t do much. But according to GitHub, it has 985 (!!!) dependencies in NPM/NodeJS. And that is the bare minimum! I only added one dependency myself to it. I feel really uneasy about this, as I prefer the Python approach of having a rich standard library instead of billions of small modules scattered across the web. If there is a bug in one of the standard library functions, I can submit a patch to Python where some core developer is there to review it. In NodeJS, I imagine fixing some module is much harder.

That being said though, using npm is actually pretty nice – there is a module available for most things, and adding a new dependency is easy. NPM will also manage all the details of your dependency chain, GitHub will warn about security issues in modules you depend on, etc. So, from a usability perspective, there isn’t much to complain about (unlike with Python, where creating or using a module ends up as a “fight the system” event way too often and the question “which random file do I need to create now to achieve what I want?” always exists. Fortunately, Poetry made this a bit more pleasant for me recently).

So, tl;dr for this section: The web application development excursion was actually a lot of fun, and I may make more of those in future, now that I learned more about how to write web applications. Ultimately though, I enjoy the lower-level software development and backend development a bit more.

Summary

Check out the MetaInfo Creator and its source code, if you want to create MetaInfo files for a GUI application, console application, addon or service component quickly.

the avatar of openSUSE News

openSUSE Summit Dublin Canceled

The openSUSE Summit Dublin has been canceled due to the cancellation of some talks and the cancellation of the in-person SUSECON 2020 in Dublin.

Concerns over the developing situation of COVID-19 coronavirus led to the decision to cancel the openSUSE Summit Dublin, as the venue would have been shared with SUSECON and is no longer available for the summit.

Contact ddemaio (@) opensuse.org if you have any questions concerning the summit.