
openSUSE News

People of openSUSE: An Interview with Ish Sookun

Can you tell us a bit about yourself?

I live on an island in the middle of the Indian Ocean (20°2’ S, 57°6’ E), called Mauritius. I work for a company that supports me in contributing to the openSUSE Project. That being said, we also heavily use openSUSE at the workplace.

Tell us about your early interactions with computers. How did your journey with Linux get started?

My early interaction with computers only started in my late college years, and I picked up Linux after a few students attending the computer classes whispered the term “Linux” as if it were something super complicated. It caught my attention and I have been hooked ever since. I spent a few years distro hopping until, in 2009, I settled down with openSUSE.

Can you tell us more about your participation in openSUSE and why it started?

I joined the “Ambassador” program in 2009, which was later renamed openSUSE Advocate before the program was finally dropped. In 2013, I joined the openSUSE Local Coordinators to help coordinate activities in the region. It was my way of contributing back. During those years, I would also test openSUSE RCs and report bugs, organize local meetups about Linux in general (sometimes openSUSE in particular) and blog about those activities.

Then, in 2018, after an inspiring conversation with Richard Brown while he was the openSUSE Chairman, I stepped up and joined the openSUSE Elections Committee to volunteer in election tasks. It was a nice and enriching learning experience along with my fellow election officials back then, Gerry Makaro and Edwin Zakaria.

I attended my first openSUSE Conference in May 2019 in Nuremberg, where I gave a presentation on how we use Podman in production at my workplace. I was extremely nervous giving this first talk in front of the openSUSE community, but I met folks who cheered me up. I can’t forget the encouragement from Richard, Gertjan, Harris, Doug, Marina and the countless friends I made at the conference. Later during the conference I was back on stage for the Lightning Talks, and I spoke holding the openSUSE beer in one hand and the microphone in the other. The nervousness was all gone thanks to the magic of the community.

Edwin and Ary told me about their activities in Indonesia, particularly about the openSUSE Asia Summit. When the CfP for oSAS 2019 was opened, I did not hesitate to submit a talk, which was accepted, and months later I stood among some awesome openSUSE contributors in Bali, Indonesia. It was a great Summit where I discovered more of the openSUSE community. I met Gerald Pfeifer, the new chairman of openSUSE, and we talked about yoga, surrounded by all of the geeko fun, talks and workshops happening.

Back to your question, to answer the second part about “why openSUSE”, I can safely, gladly and proudly say that openSUSE was (and still is) the most welcoming community and easiest project to start contributing to.

Tea or coffee?

Black coffee w/o sugar please.

Can you describe the work of the Election Committee for us? What challenges does it face when election time comes?

An election official should be familiar with the election rules. These help us plan an election and set the duration for every phase. The planning phase is crucial and requires the officials to consult each other often. With officials sometimes in time zones that are hours apart, it is not easy to hold long chats, so we rely on threaded emails, which take more time to reach consensus on a matter. The election process becomes challenging if members do not step up for board candidacy as the deadline approaches. When the election begins, the next challenge is not to miss any member. We make sure that we obtain an up-to-date list of openSUSE members and that they receive their voter links/credentials. We attend to requests from members having issues finding the email containing their voter link. Very often it turns out to be something as trivial as a member using two different email addresses: one on the mailing list and a different one in their openSUSE Connect account.

I call these challenges to address the question, but in reality it’s fun to be part of all this and to ensure everything runs smoothly. Gerry set a good example in the 2018-2019 Board election, which we still follow. Edwin has been extremely supportive in the three elections where we worked together. Ariez Vachha, who joined recently, has proven to be a great addition to the team.

What do you like the most about being involved in the community?

The people.

What is one feature, project, that you think needs more attention in openSUSE?

Documentation.

What side projects or hobbies do you work on outside of openSUSE?

I experiment with containers using Podman. It’s a fairly recent love but it keeps me busy. Community-wise, I like to attend local meetups and events and blog about those activities. I often help with the planning or any other task within my capacity for the Developers Conference of Mauritius, a yearly event that brings the local geeks together for three days of fun. Luckily I have a supportive wife who bears with the geek tantrums, and she volunteers in some of the community activities too. Oh, I might get kicked if I do not mention her and give her credit for the openSUSE goodie packs she prepares for my local talks.

What is your desktop environment of choice / preferred desktop setup?

GNOME until recently. I switched to KDE after my developer colleagues would not stop bragging about how good their KDE environment is and my GNOME/Wayland environment started acting weird.

What is your favorite food?

Paneer Makhani (Indian cottage cheese in spicy curry gravy).

What do you think the future holds for the openSUSE project?

With the example set by the openSUSE Asia community, I think the future of the project is having a strong openSUSE presence on every habitable continent.

Any final thoughts or message to our readers?

Let’s paint the world green!


Do you CI?

When I ask people about their approach to continuous integration, I often hear a response like

“yes of course, we have CI, we use…”.

When I ask people about doing continuous integration I often hear “that wouldn’t work for us…”

It seems the practice of continuous integration is still quite extreme. It’s hard, takes time, requires skill, discipline and humility.

What is CI?

Continuous integration is often confused with build tooling & automation. CI is not something you have, it’s something you do.

Continuous integration is about continually integrating. Regularly (several times a day) integrating your changes (in small & safe chunks) with the changes being made by everyone else working on the same system.

Teams often think they are doing continuous integration, but are using feature branches that live for hours or even days to weeks.

Code branches that live for much more than an hour are an indication you’re not continually integrating. You’re using branches to maintain some degree of isolation from the work done by the rest of the team.

I like the current Wikipedia definition: “continuous integration (CI) is the practice of merging all developer working copies to a shared mainline several times a day.”

It’s worth calling out a few parts of it.

CI is a practice. Something you do, not something you have. You might have “CI Tooling”. Automated build/test running tooling that helps check all changes.

Such tooling is good and helpful, but having it doesn’t mean you’re continually integrating.

Often the same tooling is even used to make it easier to develop code in isolation from others. The opposite of continuous integration.

I don’t mean to imply that developing in isolation and using the tooling this way is bad. It may be the best option in context. Long-lived branches and asynchronous tooling have enabled collaboration amongst large groups of people across distributed geographies and timezones.

CI is a different way of working. Automated build and test tooling may be a near-universal good (even a hygiene factor). The practice of Continuous Integration is very helpful in some contexts, even if less universally beneficial.

…all developer working copies…

All developers on the team integrate their code, and not just the small changes. If bigger features are worked on in isolation for days or until they’re complete, you’re not integrating continuously.

…to a shared mainline…

Code is integrated into the same branch. Often “master” in git parlance. It’s not just about everyone pushing their code to be checked by a central service. It’s about knowing it works when combined with everyone else’s work in progress, and visible to the rest of the team.

…several times a day

This is perhaps the most extreme part. The part that highlights just how unusual a practice continuous integration really is. Despite everyone talking about it.

Imagine you’re in a team of five developers, working independently, practising CI. Aiming to integrate your changes roughly once an hour. You might see 40 commits to master in a single day. Each commit representing a functional, working, potentially releasable state of the system.

(Teams I’ve worked on haven’t seen quite such a high commit rate; it’s reduced by pairing and non-coding work. Nonetheless, CI means a high rate of commits to the mainline branch.)

Working in this way is hard, requires a lot of discipline and skill. It might seem impossible to make large scale changes this way at first glance. It’s not surprising it’s uncommon.

To visualise the difference


Why CI?

Get Feedback

Why would we work in such a way? Integrating our changes will incur some overhead. It likely means taking time out every single hour to review changes so far, tidy, merge, and deal with any conflicts arising.

Continuously integrating helps us get feedback as fast as possible. Like most Extreme Programming practices. It’s worth practising CI if that feedback is more valuable to you than the overhead.

Team mates

We may get feedback from other team members—who will see our code early when they pull it. Maybe they have ideas for doing things better. Maybe they’ll spot a conflict or an opportunity from their knowledge and perspective. Maybe you’ve both thought to refactor something in subtly different ways and the difference helps you gain a deeper insight into your domain.

Code

CI amplifies feedback from the code itself. Listening to this feedback can help us write more modular, supple code that’s easier to change.

If our very small change conflicts with someone else’s work on a different feature, it’s worth considering whether the code being changed has too many responsibilities. Why did it need to change to support both features? CI promotes modularity by creating micro-pain when multiple people change the same thing at the same time.

Making a large-scale change to our system via small sub-hour changes forces us to take a tidy-first approach. Often the next change we want to make is hard, not possible in less than an hour. Instead of taking our preconceived path towards our preconceived design, we are pressured to first make the change we want to make easier. Improve the design of the existing code so that the change we want to make becomes simple.

Even with this approach we’re unlikely to be able to make large scale changes in a single step. CI encourages mechanisms for integrating the code for incomplete changes. Such as branch by abstraction which further encourages modularity.
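As a sketch of branch by abstraction (in Python, with invented names for illustration), the incomplete new implementation can live on mainline behind an abstraction seam, so every small step still integrates:

```python
# Hypothetical sketch of branch by abstraction. Callers depend only on
# the PriceStore abstraction, so the half-finished new implementation
# can sit on mainline and be integrated continuously.

class PriceStore:
    """The abstraction seam both implementations satisfy."""
    def get(self, sku):
        raise NotImplementedError

class LegacyCsvPriceStore(PriceStore):
    def __init__(self, rows):
        self._rows = dict(rows)
    def get(self, sku):
        return self._rows[sku]

class NewDbPriceStore(PriceStore):
    def __init__(self, db):
        self._db = db
    def get(self, sku):
        return self._db[sku]

def make_price_store(use_new, rows, db):
    # Flipping this switch is the final step of the migration, after
    # which LegacyCsvPriceStore is deleted in its own small commit.
    return NewDbPriceStore(db) if use_new else LegacyCsvPriceStore(rows)
```

Each step of the migration (introducing the seam, adding the new implementation, flipping the switch, deleting the old code) is a small change that can be integrated on its own.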

CI also exerts pressure to do more and better automated testing. If we don’t have automated checks for the behaviour of our code it may break when changed rapidly.

If our tests are brittle (coupled to the current structure of the code rather than the important behaviour), they will fail frequently when the code is changed. If our tests are slow, we’d waste lots of time running them regularly, which hopefully incentivises us to invest in speeding them up.
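A hypothetical Python illustration of the behaviour-focused alternative (names are invented for this sketch): asserting only on observable output lets the test survive refactorings of the implementation.

```python
# Illustrative only. A behaviour-focused test asserts on observable
# output, so refactoring normalise() (say, to use a regex) leaves it
# green. A structure-coupled test that patched or inspected the
# internals would fail on every such refactor, despite the behaviour
# being unchanged.

def normalise(text):
    # current implementation detail: lowercase, then collapse whitespace
    return " ".join(text.lower().split())

def test_collapses_whitespace_and_case():
    assert normalise("  Small   SAFE  steps ") == "small safe steps"
```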

Continuous integration of small changes exposes us to this feedback regularly.

If we’re integrating hourly then this feedback is also timely. We can get feedback on our code structure and designs before it becomes expensive to change direction.

Production

CI is a useful foundation for continuous delivery, and continuous deployment. Having the code always in an integrated state that’s safe to release.

Continuously deploying (not the same as releasing) our changes to production enables feedback from customers and users, and from the changes’ impact on production health.

Combat Risk

Arguably the most significant benefit of CI is that it forces us to make our changes in small, safe, low-risk steps. Constant practice ensures it’s possible when it really matters.

It’s easy to approach a radical change to our system from the comforting isolation of a feature branch. We can start pulling things apart across the codebase and shaping them into our desired structure, freed from the constraints of keeping tests passing or even our code compiling, coming back to getting it working, the code compiling, and the tests passing afterwards.

The problem with this approach is that it’s high risk. There’s a high risk that our change takes a lot longer than expected and we’ll have nothing to integrate for quite some time. There’s a high risk that we get to the end and discover unforeseen problems only at integration time. There’s a high risk that we introduce bugs that we don’t detect until after our entire change is complete. There’s a high risk that our product increment and commercial goals are missed because they are blocked by our big radical change. There’s a risk we feel pressured into rushing and sacrificing code quality when problems are only discovered late during an integration phase.

CI liberates us from these risks. Rather than embarking on a grand plan all at once, we break it down into small steps that we can complete and integrate swiftly. Steps that only take a few minutes to complete.

Eventually the accumulation of these small changes unlocks product capabilities and enables releasing value. Working in small steps becomes predictable. No longer is there a big delay from “we’ve got this working” to “this is ready for release”.

This does not require us to be certain of our eventual goal and design. Quite the opposite. We start with a small step towards our expected goal. When we find something hard to change, we stop and change tack, first making a small refactoring to try to make our originally intended change easy. Once we’ve made it easy, we can go back and make the actual change.

What if we realise we’re going in the wrong direction? Well we’ve refactored our code to make it easier to change. What if we’ve made our codebase better for no reason? We’ve still won.

Collaborate Effectively

Meetings are not always popular, especially ceremonies such as standups. Nevertheless, it’s important for a team of people working towards a common goal to understand where the others have got to. To be able to react to new information, change direction if necessary, and help each other out.

The more we work separately in isolation, the more costly and painful synchronisation points like standups can become. Catching each other up on big changes in order to know whether to adjust the plan.

Contrast this with everyone working in small, easy-to-digest steps, making their progress visible to everyone else on the team frequently. It’s more likely that everyone already has a good idea of where the rest of the team is, and less time must be spent catching up. When everyone on the team is aware of where everyone else has got to, the team can actually work as a team, helping each other out to reach a goal sooner.

No-one likes endless discussions that get in the way of making progress. No-one likes costly re-work when they discover their approach conflicts with other work in the team. No-one likes wasting time duplicating work. CI enables constant progress of the whole team, at a rate the whole team can keep up with.

Arguably the most extreme continuous integration is mob programming. The whole team working on the same thing, at the same time, all the time.

Obstacles

“but we’re making a large scale change”

We touched on this above. It’s usually possible to make a large scale change via small, safe, steps. First making the change easier, then making the change. Developing new functionality side by side in the same codebase until we’re satisfied it can replace older functionality.

Indeed the discipline required to make changes this way can be a positive influence on code quality.

“but code review”

Many teams have a process of blocking code review prior to integrating changes into a mainline branch. If this code review requires interrupting someone else every few minutes this may be impractical.

Continuous integration likely requires being comfortable with changes being integrated without such a blocking pull-request-review style gate.

It’s worth asking yourself why you do such review and whether a blocking approach is the only way. There are alternatives that may even achieve better results.

Pair programming means all code is reviewed at the point in time it was written. It also gives the most timely feedback from someone else who fully understands the context. Pairing tends to generate feedback that improves the code quality. Asynchronous reviews all too often focus on whether the code meets some arbitrary bar—focusing on minutiae such as coding style and the contents of the diff, rather than the implications of the change on our understanding of the whole system.

Pair programming doesn’t necessarily give all the benefits of a code review. It may be beneficial for more people to be aware of each change, and to gain the perspective of people who are fresher or more detached. This can be achieved to a large extent by rotating people through pairs, but review may still be useful.

Another mechanism is non-blocking code review. Treating code review more like a retrospective. Rather than “is this code good enough to be merged” ask “what can we learn from this change, and what can we do better?”.

Consider starting each day reviewing as a team the changes made the previous day and what you can learn from them. Or stopping and reviewing recent changes when rotating who you are pair-programming with. Or having a team retrospective session where you read code together and share ideas for different approaches.

“but master will be imperfect”

Continuous integration implies master is always in an imperfect state. There will be incomplete features. There may be code that would have been blocked by a code review. This may seem uncomfortable if you strive to maintain a clean mainline that the whole team is happy with and is “complete”.

Imperfection in master is scary if you’re used to master representing the final state of code: once it’s there, it’s unlikely to change any time soon. In such a context, being protective of it is a sensible response. We want to avoid mistakes we might need to live with for a long time.

However, an imperfect master is less of a problem in a CI context. What is the cost of a coding style violation that only lives for a few hours? What is the cost of temporary scaffolding (such as a branch by abstraction) living in the codebase for a few days?

CI suggests instead a habitable master branch. A workspace that’s being actively worked in. It’s not clinically clean; it’s a safe and useful environment to get work done in. An environment you’re comfortable spending lots of time in. How clean a workspace needs to be depends on the context. Compare a gardener’s or plumber’s work environment to a medical work environment.

“but how will we test it?”

Some teams separate the activities of software development from software testing. One pattern is testing features when each feature is complete, during an integration and stabilisation phase.

This allows teams to maintain a master branch that they think works, with uncertain work in progress in isolation.

However thorough our automated, manual, and exploratory testing we’re never going to have perfect software quality. Integration-testing might be a pattern to ensure integrated code meets some arbitrary quality bar but it won’t be perfect.

CI implies a different approach. Continuous exploratory testing of the master version. Continually improving our understanding of the current state of the system. Continuously improving it as our understanding improves. Combine this with TDD and high levels of automated checks and we can have some confidence that each micro change we integrate works as intended.

Again, this sort of approach requires being comfortable with master being imperfect. Or perhaps a recognition that it is always going to be imperfect, whatever we do.

“but we need to be able to do bugfixes”

Many teams work in batches. Deploying and releasing one set of features, working on more features in feature branches, then integrating, deploying, and releasing the next batch.

Under this model they can keep a branch that represents the current deployed version of the software. When an urgent bug is discovered in production they can fix it on this branch and deploy just that change.

From such a position, the prospect of making a bugfix on top of a bunch of other already-integrated changes might seem alarming. What if one of our other changes causes a regression?

CI is a fundamentally different way of working. Where our current state of master always captures the team’s current understanding of the most progressed, safest, least buggy system. Always deployable. Zero-bugs (bugs fixed when they’re discovered). Constantly evolving through small, safe steps.

A good way to make it safe to deploy bugfixes in a CI context is to also practise continuous deployment. Every micro-change deployed to production (not necessarily released). Doing this we’ll always have confidence we can deploy fixes rapidly. We’re forced to ensure that master is always safe for bugfixes.

“but…”

There are also plenty of circumstances in which CI is not feasible or not the right approach for you. Maybe you’re the only developer! Occasional integration works well for sporadic collaboration, such as spare-time open source contributions. For teams distributed across wide timezones there are fewer benefits to CI. You’re not going to get fast feedback while your colleague is asleep! You can still work in and benefit from small steps regardless of whether anyone is watching.

Sometimes feedback is less important than hammering out code. If you’re working on something that you could do in your sleep, and all that holds you back is how fast you can hammer out lines of code, the value of CI is much less.

Perhaps your team is very used to working with long lived branches. Used to having the code/tests broken for extended periods while working on a problem. It’s not feasible to “just” switch to a continuous integration style. You need to get used to working in small, safe, steps.

How…

Try it

Make “could we integrate what we’ve done” a question you ask yourself habitually. It fits naturally into the TDD cycle. When the tests are green consider integration. It should be safe.

Listen to the feedback

Listen to the feedback. Ok, you tried integrating more frequently and something broke, or things were slower. Why was that really? How could you avoid similar problems occurring while still being able to integrate regularly?

Tips when it’s hard

Combine with other Extreme Programming practices.

CI is easier with other Extreme Programming practices, not just TDD (which makes it safer and lends a useful cadence to development).

It’s easier when pair programming. Someone else helping remember the wider context. Someone to suggest stepping back and integrating a smaller set before going down a rabbit hole. Pairing also helps our chances of each change being safe to make. It’s more likely that others on the team will be happy with our change if our pair is on board.


CI is a lot easier with collective ownership. Where you are free to change any part of the codebase to make your desired change easy.

When your change is hard to do in small steps, first tackle one thing that makes it hard. “First make the change easy”

Separate expanding and contracting. Start building your new functionality in several steps alongside the old, then migrate existing usages, then finally remove the old. This can be done in several steps.
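A minimal Python sketch of the expand phase (the function names are invented for illustration):

```python
# Hypothetical expand/contract sketch. Expand: the new entry point is
# added beside the old one. The old entry point delegates to the new
# code, so there is a single implementation while callers migrate one
# by one. Contract: once no callers remain, format_fullname is deleted
# in a final small commit.

def format_name(first, last):
    # the new, preferred entry point
    return f"{first} {last}"

def format_fullname(fullname):
    # deprecated shim, kept only for the duration of the migration
    first, _, last = fullname.partition(" ")
    return format_name(first, last)
```

Because old and new coexist, each caller migration is its own small, safe, integrable change.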

Separate integrating and releasing. Integrating your code should not mean that the code necessarily affects your users. Make releasing a product/business decision with feature toggles.
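A feature toggle can be as simple as the following sketch (invented names; real projects often use a feature-flag library instead):

```python
# Minimal feature-toggle sketch. The new discount logic is integrated
# into mainline (and can be deployed) immediately, but it is only
# released to users when the flag is switched on.

class FeatureToggles:
    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_enabled(self, name):
        return name in self._enabled

def checkout_total(cart, toggles):
    total = sum(item["price"] for item in cart)
    if toggles.is_enabled("bulk-discount") and len(cart) >= 10:
        total *= 0.9  # deployed either way; the toggle releases it
    return total
```

Turning "bulk-discount" on becomes a product/business decision, decoupled from integrating and deploying the code.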

Invest in fast tooling. If your build and test suite takes more than 5 minutes you’re going to struggle to do continuous integration. A 5-minute build and test run is feasible even with tens of thousands of tests. However, it does require constant investment in keeping the tooling fast. This is a cost of CI, but it’s also a benefit: CI requires you to keep the tooling you need to safely integrate and release a change fast and reliable. Something you’ll be thankful for when you need to make a change fast.

That’s a lot of work…

Unlike having CI [tooling], doing CI is not for all teams. It seems uncommonly practised. In some contexts it’s impractical; in others it’s not worth the overhead. It may be worth considering whether the feedback and risk reduction would help your team.

If you’re not doing CI and you try it out, things will likely be hard. You may break things. Try to reflect deeper than “we tried it and it didn’t work”. What made it hard to work in and integrate small changes? Should you address those things regardless?

The post Do you CI? appeared first on Benji's Blog.

Richard Brown

Regular Release Distributions Are Wrong

Why I No Longer Use openSUSE Leap

For those who don’t already know, openSUSE is a Linux Distribution Project with two Linux distributions, Tumbleweed and Leap.

  • Tumbleweed is what is known as a Rolling Release, in that the distribution is constantly updating. Unlike Operating Systems with specific versions (e.g. Windows 7, Windows 10, iOS 13, etc.), there is just ‘openSUSE Tumbleweed’, and anyone downloading it fresh or updating it today gets all the latest software.
  • Leap is what is known as a Regular Release, in that it does have specific versions (e.g. 15.0, 15.1, 15.2) released on a regular cadence. In Leap’s case, this is annual. Leap is a variation of the Regular Release known as an LTS Release, because each of those ‘minor versions’ (X.1, .2, .3) is intended to include only minor changes, with a major new version (e.g. 16.0) expected only every few years.

It’s a long-documented fact that I am a big proponent of Rolling Releases and use them as my main operating system for Work & Play on my Desktops/Laptops.
However, in the 4 years since writing that blog post, I have always had a number of Leap machines in my life, mostly running as servers.

As of today, my last Leap machine is no more, and I do not foresee ever going back to Leap or any Linux distribution like it.

This post seeks to answer why I have fallen out of love with the Regular Release approach to developing & using Operating Systems and provide an introduction to how you too could rely on Rolling Releases (specifically Tumbleweed & MicroOS) for everything.

Disclaimer

First, a few disclaimers apply to this post. I fully realise that I am expressing a somewhat controversial point and am utterly expecting some people to disagree and be dismissive of my point of view. That’s fine, we’re all entitled to our opinions, this post is mine.

I am also distinctly aware that my views expressed here run counter to the business decisions of the customers of my employer, who does a very good job of selling a very commercially successful Enterprise Regular Release distribution.
The views expressed here are my own and not those of my employer.

And I have no problem that my employer is doing a very good job making a lot of money from customers who are currently making decisions I feel are ‘wrong’.
If my opinion is correct then I hope I can help my employer make even more money when customers start making decisions I feel are more ‘right’.

Regular & LTS Releases Mean Well

Regular & LTS Releases (hereafter referred to as just Regular Releases) have all of the best intentions. The Open Source world is made up of thousands if not millions of discrete Free Software & Open Source projects, and Linux distributions exist to take all of that often chaotic, ever-evolving software and condense it into a consumable format that is then put to very real work by its users. This might be something as ‘simple’ as a single Desktop or Server computer, or something far larger and more complex, such as a SystemZ Mainframe or a 100+ node Kubernetes cluster.

The traditional mindset for distribution builders is that the regular release gives a nice, predictable, plan-able schedule in which the team can carefully select appropriate software from the various upstream projects.
This software can then be carefully curated, integrated, and maintained for several, sometimes many, years.
This maintenance often comes in the form of making minimal changes, seeking only to address specific security issues or customer requests, taking great care not to break systems currently in use.

This mindset is often appreciated by users, who also want from their computing a nice, predictable, reliable experience. But they also want new stuff to keep up with their peers, either in communities or commercially, and here begins the first problem.

All Change Is Dangerous, but Small Changes Can Be Worse

Firstly, whether the change is a security update or a new feature, that change is going to be made by humans. Humans are flawed, and no matter how great we all get with fancy release processes and automated testing, we will never avoid the fact that humans make mistakes.

Therefore, the nature of the change has to be looked at. “Is this change too risky?” is a common question, and quite often highly desired features take years to deliver in regular releases because the answer is “yes”.

When changes are made, they are made with the intention of minimising the risks introduced by changing the existing software. That often means avoiding updating software to an entirely new version, and instead opting to backport the smallest necessary amounts of code and merge them with (often much) older versions already in the Regular Release. We call these patches, or updates, or maintenance updates, but we avoid referring to them as what they really are … franken-software.


No matter how skilled the engineers doing it, no matter how great the processes & testing around their backporting, the result is fundamentally a hybrid combination of old and new software which was never originally intended to work together.
In the process of trying to avoid risk, backports instead introduce entirely new vectors for bugs to appear.

Regular Releases Neglect The Strengths Of Open Source

Linus’s Law states “given enough eyeballs, all bugs are shallow.”
I believe this to be a fundamental truth and one of the strongest benefits of the Open Source development model. The more people involved in working on something, the more eyeballs looking at the code, the better that code is, not just in a ‘lack of bugs’ sense but also often in a ‘number of working features’ sense.

And yet the process of building Regular Releases actively avoids this benefit. The large swath of contributors in various upstream projects are familiar with codebases often months, if not years, ahead of the versions used in Regular Releases.
Even inside Community Distribution Projects, it is my experience that the vast majority of volunteer distribution developers are more enthused about targeting ‘the next release’ than about backporting complex features into a codebase months or years old.
This leaves a small handful of committed volunteers, and those employees of companies selling commercial regular releases. These limited resources are often siloed, with only time and resources to work on their specific distribution, with their backports and patches often hard to reuse by other communities.

With Regular Releases, there are not many eyes. Does that mean all bugs are deep?

I am not suggesting the people building these Releases do not do a good job, they most certainly do. But when you consider the best possible job all Regular Release maintainers possibly could do and compare it to the much broader masses of the entire open source ecosystem, how did we ever think this problem was light enough for such narrow shoulders?

A Different Perspective

I’ve increasingly come to the realisation that not only is change unavoidable in software, it’s desired. It not only happens in Rolling Releases, but it still happens in Regular Releases.
Instead of trying to avoid it, why don’t we embrace it and deal with whatever problems that brings?

Increasingly, I hear users demanding “we want everything stable forever, and secure, but we want all the new features too”.

Security and Stability cannot be achieved by standing still.
New Features cannot be easily added if good engineering practice is discouraged in the name of appearing to be stable.
In software as complicated as a Linux distribution, quite often, the right way to make a change requires a significant amount of changes across the entire codebase.
Or in other words “in order to be able to change any ONE thing, you must be able to change EVERYTHING”.

Rolling Releases can already do that, and you can read my earlier blog post if you haven’t already for some reasons why Tumbleweed is the best at that.
But it’s not enough, otherwise I would have been running Tumbleweed on my servers 4 years ago.

I Am A Lazy Sysadmin

Before my life as a distribution developer I was a sysadmin, and like all sysadmins I am lazy. I want my servers to be as ‘zero-effort’ as possible.
Once they’re working I don’t want to have to patch them, reboot them, touch them, look at them, or ideally even think about them ever again. And this is a hard proposition if I am running my server on a regular Rolling Release like Tumbleweed.

Tumbleweed is made up of over 15,000 packages, all moving at the rate of contribution. Worse, Tumbleweed is designed as a multi-purpose operating system.
You can quite happily set it up to be a mail server, web server, proxy server, virtualisation host, and heck, why not even a desktop all at the same time.
This is one of Tumbleweed’s greatest strengths, but in this case it is also a weakness.
openSUSE has to make sure all these things can work together. That often means installing more ‘recommended packages’ on a system than strictly necessary, ‘just in case’ the user wants to use every feature their combination of packages could allow.
And with this complexity comes an increase of risk, and an increase of updates, which themselves bring yet more risk. Perfectly fine for my desktop (for now), but that’s far too much work for a Server, especially when I typically need a Server to just do one job.

Minimal Risk, Maximum Benefits with openSUSE MicroOS

openSUSE MicroOS is the newest member of the openSUSE Family. From a code perspective, it is a derivative of openSUSE Tumbleweed, using exactly the same packages and integrated into its release process, but from a philosophical perspective, MicroOS is a totally different beast.

Whereas Tumbleweed & other traditional distributions are multi-purpose, MicroOS is designed from the ground up to be single-purpose. It’s a Linux distribution you deploy on a bit of hardware, a VM, a Cloud instance, and once it is there it is intended to do just one job.

  • The package selection is lean, with all you need to run on bare metal if you install from the ISO, or even smaller if you choose a VM Image for a platform where hardware support isn’t necessary. Fewer packages mean fewer reasons to update, helping lower the risks of change traditionally introduced by a rolling release. Recommended packages are disabled by default.
  • Being a transactional system, it has a read-only root filesystem, further cutting down the risk of changes to the system, ensuring that any unwanted change that does happen can be rolled back. Not only that, but such roll-backs can be automated with health-checker.
  • With rebootmgr, I can even schedule maintenance windows to ensure my updates only take effect during times I’m happy with.
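Setting up such a maintenance window is just a couple of rebootmgrctl calls. A sketch (the calendar syntax follows rebootmgr's documentation; the 03:30 window and 1h30m duration below are arbitrary examples):

```
# rebootmgrctl set-strategy maint-window
# rebootmgrctl set-window "03:30" "1h30m"
# rebootmgrctl status
```

With that in place, pending reboots are deferred until the next window opens.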

Auto updating, auto rebooting, auto rolling back? I can be a lazy sysadmin even with a rolling release!

Just One Job Per Machine?

MicroOS is designed to do just one job, and this is fine for machines or VMs where all you would want to do is something like a self-maintaining webserver. That is as simple as running transactional-update pkg in nginx.
But that can be rather limiting. Wouldn’t it be nice if there was some way of running services that minimised the introduction of risk to the base operating system, and could be updated independently of that base operating system?

Oh, right, that already exists, and they’re called containers.

MicroOS makes a perfect container host, so much so that we deliver VM Images and a System Role on the ISO which already come configured with podman.

Especially now that openSUSE has a growing collection of official containers, a good number of services are a simple podman run registry.opensuse.org/opensuse/foo away.

In my case, I have moved all of my old Leap servers to now use MicroOS with Containers, this includes

All of the custom containers are built on openSUSE’s tiny busybox container which is just small and tiny and magical and I have no idea why anyone would use alpine.

All of my servers are now running MicroOS. The software is newer with all the latest features, but through a combination of containerisation and transactional-updates I find myself spending significantly less time maintaining my servers compared to Leap.

I can update the containers when I want, using container tags to pin the services to use specific versions until I decide when I want to update them.
I can easily add new containers to the MicroOS hosts just by adding another systemd service file running podman to /etc/systemd/system.
And I never need to worry about my base operating system which just takes care of itself, rebooting in the maintenance window I defined in rebootmgr.
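Such a unit can be as small as the sketch below. The foo image name and the 1.2 tag are placeholders (matching the earlier example), and podman generate systemd can produce a more complete unit for an existing container:

```
[Unit]
Description=foo service container
Wants=network-online.target
After=network-online.target

[Service]
ExecStartPre=-/usr/bin/podman rm -f foo
ExecStart=/usr/bin/podman run --rm --name foo registry.opensuse.org/opensuse/foo:1.2
ExecStop=/usr/bin/podman stop foo
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Drop it into /etc/systemd/system/foo.service, run systemctl enable --now foo, and the service's update cadence is fully decoupled from the host's.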

I’m going to be writing further blog posts about my life with MicroOS and podman, but meanwhile I couldn’t be happier, and I sincerely hope more people take this approach to running infrastructure.
Why? Well, we’re still only human, but when things do go wrong it’ll be even easier with more people looking at any problems :)

Now all I need to do is see if any of these benefits make sense for a desktop….

the avatar of openSUSE Heroes

Introducing debuginfod service for Tumbleweed

We are happy to pre-announce a new service entering the openSUSE world:

https://debuginfod.opensuse.org

debuginfod is an HTTP file server that serves debugging resources to debugger-like tools.

Instead of the old way of installing the needed debugging packages one by one as root, like:

zypper install $package-debuginfo

the new debuginfod service lets you debug anywhere, anytime.

Right now the service serves only openSUSE Tumbleweed packages for the x86_64 architecture and runs in an experimental mode.

The simplest way to use debuginfod for openSUSE Tumbleweed is:

export DEBUGINFOD_URLS="https://debuginfod.opensuse.org/"
gdb ...

For every lookup, the client sends a query to the debuginfod server and gets back the requested information, so you download only the debugging binaries you really need.

More information is available at the start page https://debuginfod.opensuse.org - feel free to contact the initiator marxin directly for more information or error reports.

the avatar of openSUSE Heroes

Database monitoring

While we have monitored the basic functionality of our MariaDB (running as a Galera cluster) and PostgreSQL databases for years, we lacked an easy overview of what's really happening inside our databases in production. Peaks that slow down the response times, in particular, are not easy to detect.

That's why we set up our own Grafana instance. The dashboard is public and allows everyone to have a look at:

  • The PostgreSQL cluster behind download.opensuse.org. An average of around 230 and peaks of up to 500 queries per second are not that bad...
  • The Galera cluster behind the opensuse.org wikis and other MariaDB driven applications like Matomo or Etherpad. One interesting detail here is - for example - the archiving job of Matomo, triggering some peaks every hour.
  • The Elasticsearch cluster behind the wiki search. Here we have a relatively high JVM memory footprint. Something to look at...

Both the Grafana dashboard and the databases drive big parts of the openSUSE infrastructure. And while everything is still up and running, we would love to hear from experts how we could improve. If you are an expert or know someone who is, feel free to contact us via Email or in our [IRC channel](irc://irc.opensuse.org/#opensuse-admin).


About me

I’m Marcos, a Kernel Livepatch developer at SUSE. I have contributed to interesting open source projects, like the Linux kernel, the Linux Test Project, btrfs-progs, virtme-ng, libvirt, LXC, bubblewrap, supportutils and others. Feel free to reach out to me on Mastodon, LinkedIn or by email (me at mpdesouza.com).

the avatar of Andrés G. Aragoneses

Xamarin forks and whatnots

Busy days in geewallet world! I just released version 0.4.2.198 which brings some interesting fixes, but I wanted to talk about the internal work that had to be done for each of them, in case you're interested.
  • In Linux(GTK), cold storage mode when pairing was broken, because the absence of internet connection was not being detected properly. The bug was in a 3rd-party nuget library we were using: Xam.Plugin.Connectivity. But we couldn't migrate to Xamarin.Essentials for this feature because Xamarin.Essentials lacks support for some platforms that we already supported (not officially, but we know geewallet worked on them even if we haven't released binaries/packages for all of them yet). The solution? We forked Xamarin.Essentials to include support for these platforms (macOS and Linux), fixed the bug in our fork, and published our fork in nuget under the name `DotNetEssentials`. Whenever Xamarin.Essentials starts supporting these platforms, we will stop using our fork.
  • The clipboard functionality in geewallet depended on another 3rd-party nuget library: Xamarin.Plugins.Clipboard. The GTK bits of this were actually contributed by me to their github repository as a Pull Request some time ago, so we just packaged the same code to include it in our new DotNetEssentials fork. One dependency less to care about!
  • Xamarin.Forms had a strange bug that caused some buttons sometimes to not be re-enabled. This bug has been fixed by one of our developers and its fix was included in the new pre-release of Xamarin.Forms 4.5, so we have upgraded geewallet to use this new version instead of v4.3.
PS: Apologies if the previous blogpost or this one shows up in planets again, as it might be a side-effect of updating its links to point to the new git repo!

the avatar of YaST Team

Highlights of YaST Development Sprint 93

The Contents

Lately, the YaST team has been quite busy fixing bugs and finishing some features for the upcoming (open)SUSE releases. Although we worked on quite a few things, in this report we will take a closer look at just a few topics:

  • A feature to search for packages across all SLE modules has arrived in YaST.
  • Improved support for S390 systems in the network module.
  • The YaST command-line interface now returns a proper exit code.
  • Added progress feedback to the Expert Partitioner.
  • Partial support for Bitlocker and, as a lesson learned from that, a new warning about resizing empty partitions.

The Online Search Feature Comes to YaST

As you already know, starting in version 15, SUSE Linux follows a modular approach. Apart from the base products, the packages are spread through a set of different modules that the user can enable if needed (Basesystem module, Desktop Applications Module, Server Applications Module, Development Tools Module, you name it).

In this situation, you may want to install a package but not know which module contains it. As YaST only knows about the packages included in your registered modules, you would have to do a manual search.

Fortunately, zypper introduced a new search-packages command some time ago that allows you to find out which module contains a given package. And now it is time to bring this feature to YaST.
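On the command line, that looks like this. The command comes from a zypper plugin and queries the SUSE Customer Center, so it needs network access on a registered system; the package name below is just an example:

```
# zypper search-packages vagrant
```

The output lists each matching package together with the module or extension that provides it.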

For technical reasons, this online search feature cannot be implemented within the package manager, so it is available via the Extra menu.

Search Online Menu Option

YaST offers a simple way to search for the package you want across all available modules and extensions, no matter whether they are registered or not. And, if you find the package you want, it will ask you about activating the needed module/extension right away so you can finally install the package.

Online Search: Enable Containers Module

If you want to see this feature in action, check out the demonstration video.

Like any other new YaST feature, we are looking forward to your feedback.

Fixing and Improving Network Support for S390 Systems

We have mentioned many times that we recently refactored the Network module, fixing some long-standing bugs and preparing the code for the future. However, as a result, we also introduced a few new bugs. One of them was accidentally dropping the network devices activation dialog for S390 systems. Thus, during this sprint, we re-introduced the dialog and, what is more, made a few improvements, as the old one was pretty tricky. Let’s have a look at them.

The first obvious change is that the overview shows only one row per S390 group device, instead of one row per channel as the old overview did.

New YaST Network Overview for S390 Systems

Moreover, the overview is updated after the activation, displaying the Linux device that corresponds to the just-activated device.

YaST2 Network Overview After Activation

Last but not least, we have improved the error reporting too. Now, when the activation fails, YaST gives more details to help the user solve the problem.

YaST2 Network Error Reporting in S390 Systems

Fixing the CLI

The YaST command-line interface is a rather unknown feature, although it has been there forever. Recently, we got some bug reports about its exit codes. We discovered that, due to a technical limitation of our internal API, it always returned a non-zero exit code for any command that was just reading values but not writing anything. Fortunately, we were able to fix the problem and, along the way, we improved the behavior in several situations where, despite the non-zero exit code, YaST did not give any feedback. Now that the CLI works again, maybe it is time to give it a try, especially if this is the first time you have heard of it.
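If you have never used it, each YaST module exposes its commands via yast <module>, and the exit code can now be relied upon in scripts. For example (the lan module is just one of many, and must be installed):

```
# yast lan list
# echo $?
```

A read-only command like this now exits with 0 on success instead of a spurious non-zero code.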

Adding Progress Feedback to the Partitioner

The Expert Partitioner is a very powerful tool. It allows you to perform very complex configurations of your storage devices. At any time, you can review the changes you have made by using the Installation Summary option in the left bar. None of those changes are applied to the system until you confirm them by clicking the Next button. But once you confirm the changes, the Expert Partitioner simply closes without giving any feedback about the progress of the changes being performed.

Actually, this is a kind of regression after migrating YaST to its new Storage Stack (a.k.a. storage-ng). The old Partitioner had a final step which did inform the user about the progress of the changes. That dialog has been brought back, allowing you to be aware of what is happening once you decide to apply the configuration. This progress dialog will be available in SLE 15 SP2, openSUSE 15.2 and, of course, openSUSE Tumbleweed.

YaST Partitioner Progress Feedback

Recognizing Bitlocker Partitions

Bitlocker is a filesystem encryption technology included with Windows. Until the previous sprint, YaST was not able to recognize that a given partition was encrypted with that technology.

As a consequence, the automatic partitioning proposal of the (open)SUSE installer would happily delete any partition encrypted with Bitlocker to reclaim its space, even for users that had specified they wanted to keep Windows untouched. Moreover, YaST would allow users to resize such partitions using the Expert Partitioner without any warning (more about that below).

All that is fixed. Now Bitlocker partitions are correctly detected and displayed as such in the Partitioner, which will not allow users to resize them, explaining that such operation is not supported. And the installer’s Guided Setup will consider those partitions to be part of a Windows installation for all matters.

Beware of Empty Partitions

As explained before, whenever YaST is unable to recognize the content of a partition or a disk, it considers the device to be empty. Although that is no longer the case for Bitlocker devices, there are many more technologies out there (and more to come). So users should not blindly trust that a partition displayed as empty in the YaST Partitioner can actually be resized safely.

In order to prevent data loss, in the future YaST will inform the user about a potential problem when trying to resize a partition that looks empty.

YaST Expert Partitioning Warning when Resizing Empty Partitions

Hack Week is coming…

That special time of the year is already around the corner. Christmas? No, Hack Week! From February 10 to February 14 we will be celebrating the 19th Hack Week at SUSE. The theme of this edition is Simplify, Modernize & Accelerate. If you are curious about the projects that we are considering, have a look at SUSE Hack Week’s Page. Bear in mind that the event is not limited to SUSE employees, so if you are interested in any project, do not hesitate to join us.


openSUSE Tumbleweed – Review of the week 2020/06

Dear Tumbleweed users and hackers,

This week I canceled more snapshots than I released – only 2 snapshots have been sent out (0201 and 0205). Feels quite bad, but on the other hand, I’m glad we have openQA protecting you, the openSUSE Tumbleweed users, from those issues. As the -factory mailing list shows this week, despite all the testing, we can’t ever predict all the special cases found on our users’ machines.

So, what was happening this week:

  • Qt 5.14.1
  • SQLite 3.31.1
  • Virtualbox 6.1.2
  • Mesa 19.3.3
  • chkconfig moved from aaa_base to insserv-compat (if you have some legacy init scripts around from a package that does not specify this dependency, please file a bug)
  • netcfg – the topic on the mailing list: /etc/services, /etc/protocols and /etc/ethers moved to /usr/etc. Two major sources for errors on user machines have been identified:
    • users ignoring *.rpmnew files and not merging the config changes from packages into their own config (in this specific case, for /etc/nsswitch.conf)
    • Some people seem to have removed even patterns-base-minimal_base, which resulted in libnss_usrfiles2 not being pulled in. As a result, even if the config file was maintained/corrected, the services file could not be found.
    • We are attempting to resolve both issues: libnss_usrfiles2 is now required not only by the pattern but also by netcfg (netcfg sets the default config, so we thought this is the best place), and aaa_base tries to correct /etc/nsswitch.conf (though that results in usrfiles being added again, even if the user had explicitly removed it)
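For the first issue, forgotten *.rpmnew files are easy to spot with a small helper. This is just a sketch (the function name is made up): it lists *.rpmnew leftovers under a config root and shows how each differs from its live counterpart, so the changes can be merged by hand.

```shell
# Sketch: list forgotten *.rpmnew files under a config root (default
# /etc) and diff each against its live counterpart for manual merging.
list_unmerged() {
    root="${1:-/etc}"
    find "$root" -name '*.rpmnew' | while read -r new; do
        live="${new%.rpmnew}"
        echo "== $live =="
        diff -u "$live" "$new" || true
    done
}
```

Running list_unmerged /etc after an update makes it obvious which config files still need attention; once merged, remove the .rpmnew file.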

I hope we have helped everybody recover their systems properly by now, and that the rationale behind the future goal – shipping as few distribution config files in /etc as possible – makes sense to you. Besides that, the stagings are still filled with these things:

  • KDE Applications 19.12.2
  • KDE Plasma 5.18
  • Linux Kernel 5.5.1
  • Python 3.8 (salt, hopefully going to be unblocking soon)
  • Removal of python 2
  • glibc 2.31
  • GNU make 4.3
  • libcap 2.30: breaks fakeroot and drpm
  • RPM: change of the database format to ndb
  • elfutils: adding support for debuginfod

the avatar of Kubic Project

Reaching the login prompt in 2.5 seconds - a journey

Not only in development environments is it very handy to have a quick turnaround time, which can include reboots. Especially for transactional systems, where changes to the system only take effect after booting into the new state, this can have a significant impact.

So let’s see what can be done. Remember: “The difference between screwing around and science is writing it down”!

Starting point

Starting point for this experiment is a VM (KVM), 4GiB RAM, 2 CPU cores, no EFI. Tumbleweed was installed as Server (textmode) with just defaults.

# systemd-analyze
Startup finished in 1.913s (kernel) + 2.041s (initrd) + 22.104s (userspace) = 25.958s

Almost 26 seconds just to get to the login prompt of a pretty minimal system, that’s not great. What can we do?

Low-hanging fruit

systemd-analyze blame tells us what the worst offenders are:

# systemd-analyze blame --no-pager
18.769s btrfsmaintenance-refresh.service    
17.027s wicked.service                      
 3.170s plymouth-quit.service               
 3.170s plymouth-quit-wait.service          
 1.078s postfix.service                     
 1.023s apparmor.service                    
  839ms systemd-udev-settle.service         
  601ms systemd-logind.service              
  532ms firewalld.service

btrfsmaintenance-refresh.service is a bit special: it calls systemctl during execution to enable/disable and start/stop the btrfs-*.timer units. Those depend on time-sync.target, which itself needs network.service through chronyd.service.

wicked.service is the next item on the list. Before the unit is considered active, it tries to fully configure and set up all configured interfaces, which includes DHCPv4 and DHCPv6 by default. This is used directly as the state for network.service and thus network.target; wicked makes no distinction between network.service and network-online.target.

To make the bootup quicker, switching to NetworkManager is an option: it interprets network.service in a more asynchronous way and thus reaches the active state much faster. Note that with DHCP, switching between wicked and NetworkManager might result in a different IP address!

# zypper install NetworkManager
# systemctl disable wicked
Removed /etc/systemd/system/multi-user.target.wants/wicked.service.
Removed /etc/systemd/system/network.service.
Removed /etc/systemd/system/network-online.target.wants/wicked.service.
Removed /etc/systemd/system/dbus-org.opensuse.Network.Nanny.service.
Removed /etc/systemd/system/dbus-org.opensuse.Network.AUTO4.service.
Removed /etc/systemd/system/dbus-org.opensuse.Network.DHCP4.service.
Removed /etc/systemd/system/dbus-org.opensuse.Network.DHCP6.service.
# systemctl enable NetworkManager
Created symlink /etc/systemd/system/network.service → /usr/lib/systemd/system/NetworkManager.service.

Let’s also remove plymouth - except for eyecandy it does not provide any useful features.

# zypper rm -u plymouth
Reading installed packages...
Resolving package dependencies...

The following 23 packages are going to be REMOVED:
  gnu-unifont-bitmap-fonts libdatrie1 libdrm2 libfribidi0 libgraphite2-3 libharfbuzz0 libpango-1_0-0 libply5 libply-boot-client5 libply-splash-core5 libply-splash-graphics5 libthai0 libthai-data libXft2 plymouth plymouth-branding-openSUSE
  plymouth-dracut plymouth-plugin-label plymouth-plugin-label-ft plymouth-plugin-two-step plymouth-scripts plymouth-theme-bgrt plymouth-theme-spinner

23 packages to remove.
After the operation, 4.8 MiB will be freed.
Continue? [y/n/v/...? shows all options] (y):
...

Plymouth is still started in the initrd, but as it’s not part of the root filesystem anymore it’s not stopped by plymouth-quit.service. This combination would result in a broken boot! Normally, the initrd should be regenerated automatically if anything relevant changes, but for removals that’s not implemented (boo#966057).

# mkinitrd
...
# reboot
...

Let’s see how much time this saved:

# systemd-analyze
Startup finished in 1.675s (kernel) + 2.066s (initrd) + 2.696s (userspace) = 6.438s

Over 19s saved, that’s quite a lot already!

# systemd-analyze blame --no-pager
1.411s btrfsmaintenance-refresh.service    
893ms systemd-logind.service              
849ms apparmor.service

Now the biggest contributor to the bootup time is btrfsmaintenance-refresh.service. As it is not quite clear how much value it provides with a recent kernel (boo#1063638#c106), let’s just remove it.

# zypper rm -u btrfsmaintenance
# reboot
...
# systemd-analyze
Startup finished in 1.700s (kernel) + 2.010s (initrd) + 2.367s (userspace) = 6.079s
multi-user.target reached after 2.347s in userspace

That’s a bit better again.

# systemd-analyze blame --no-pager
873ms apparmor.service                    
550ms systemd-logind.service              
504ms postfix.service                     
405ms firewalld.service

Both apparmor and systemd-logind.service are needed, so no low hanging fruit remain.

Accelerating early boot

There’s still one part remaining in the boot time equation we can completely eliminate!

Startup finished in 1.700s (kernel) + 2.010s (initrd) + 2.367s (userspace) = 6.079s
                                      ^^^^^^^^^^^^^^^

So, what exactly is the initrd for anyway? In the vast majority of installations, it’s very clearly defined: Mount the real root filesystem and switch to it. Depending on the configuration, this can range from simple (local ext4) to very complex (encrypted block device over the network accepting the password over ssh). Additionally, the kernel binary as loaded by the bootloader is very small, so does not include drivers for every system. Those are part of modules included in the initrd.
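You can get a feeling for how much your initrd actually carries with dracut's lsinitrd tool, for example by counting the bundled kernel modules (output varies per system, of course):

```
# lsinitrd | grep -c '\.ko'
```

Without arguments, lsinitrd inspects the initrd of the currently running kernel.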

Turns out, in simple cases (the majority of VM guest systems) we can boot without an initrd just fine. The current configuration is not tuned for this though, so a few adjustments are required.

Drivers for mounting / without loading modules

The kernel needs drivers for both the virtual devices connecting to the storage device and for the filesystem on it. The former part is dealt with by using the kernel-kvmsmall flavor, but unfortunately it does not have btrfs built-in.

Fortunately, this is easy to fix by rebuilding the kernel with a custom config. By putting CONFIG_BTRFS_FS=y (and some other required options) into config.addon.tar.bz2 next to the standard openSUSE kernel in OBS, it spits out an .rpm with a working binary.
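Creating the tarball itself is trivial. A sketch, with the caveat that the fragment path config.addon/x86_64/kvmsmall below is an assumption for illustration only; the exact naming has to match the kernel-source package's config layout:

```shell
# Illustrative only: build a config.addon.tar.bz2 enabling btrfs as
# built-in. The arch/flavor fragment path is an assumed layout.
cd "$(mktemp -d)"
mkdir -p config.addon/x86_64
echo 'CONFIG_BTRFS_FS=y' > config.addon/x86_64/kvmsmall
tar cjf config.addon.tar.bz2 config.addon
```

The resulting file is then placed next to the kernel spec files in the OBS package, and OBS rebuilds the kernel with the merged config.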

# zypper ar obs://devel:kubic:quickboot/ devel:kubic:quickboot
# zypper in --from devel:kubic:quickboot kernel-kvmsmall

kernel-kvmsmall does not have all kernel features enabled (not even as modules), which means that in some cases it might be necessary to apply the changes on kernel-default instead, which has a complete set of modules.

Mounting root by UUID

However, if you now reboot and comment out the initrd command in the grub config, you will notice that the boot fails because the kernel is unable to find the root device. This is because, by default, the GRUB configuration uses root=UUID=deadbeef-1234... as parameter, which is interpreted by the initrd in userspace. To be exact, when a block device is recognized by the kernel, udev reacts by reading the filesystem UUID and creating a link in /dev/disk/by-uuid/... which is then used as the root device. Without an initrd, that does not happen and the kernel is unable to continue.

Workaround here is to set GRUB_DISABLE_LINUX_UUID=true in /etc/default/grub. This means that device paths like root=/dev/vda2 are used, which can lead to issues when changing the disk layout or order. By additionally setting GRUB_DISABLE_LINUX_PARTUUID=false, it uses root=PARTUUID=cafebabe-4554... which is supported by the kernel as well, but is more reliable.

# echo GRUB_DISABLE_LINUX_UUID=true >> /etc/default/grub
# echo GRUB_DISABLE_LINUX_PARTUUID=false >> /etc/default/grub
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
... (comment out the initrd call in grub, after pressing "e" in the menu and prepending "#" in the last line)
# systemd-analyze
Startup finished in 1.778s (kernel) + 2.725s (userspace) = 4.504s

Almost a third shaved off again, awesome! However, there are now error messages shown in the console, which is not that awesome. Something about systemd-gpt-auto-generator and systemd-remount-fs being unable to find the root device - just like the kernel earlier. The cause is actually the same - /etc/fstab still contains the mounts in UUID= format and those errors happen before systemd-udevd.service is started and udev has settled.

No matter how systemd is configured, it’s not possible to get rid of the first error - generators run before any units. So we have to start udev even before that - before systemd!

But first, a quick detour.

Getting transactional

Let’s take a look at how MicroOS is doing. As the name already says, it’s supposed to be lighter than plain Tumbleweed out of the box.

# systemd-analyze 
Startup finished in 1.788s (kernel) + 2.036s (initrd) + 21.243s (userspace) = 25.068s

# systemd-analyze blame --no-pager
17.669s btrfsmaintenance-refresh.service
16.177s wicked.service
 3.377s apparmor.service
 1.356s health-checker.service                
 1.179s systemd-udev-settle.service
  968ms systemd-logind.service
  811ms kdump.service

A second quicker, ok. Plymouth is gone, but we gained health-checker and kdump. The time is dominated by wicked slowing down the startup though, so let’s replace it. Additionally, disable btrfsmaintenance-refresh.service. It’s not possible to remove it as the microos_base pattern requires it.

# transactional-update shell
transactional update # zypper install NetworkManager
transactional update # systemctl disable wicked
transactional update # systemctl enable NetworkManager
transactional update # systemctl disable btrfsmaintenance-refresh.service
transactional update # exit
# reboot
...
# systemd-analyze 
Startup finished in 1.744s (kernel) + 1.989s (initrd) + 2.342s (userspace) = 6.075s

# systemd-analyze blame
1.251s apparmor.service                    
1.066s kdump.service                       
 824ms NetworkManager-wait-online.service  
 742ms systemd-logind.service              
 730ms kdump-early.service                 
 638ms systemd-udevd.service               
 563ms create-dirs-from-rpmdb.service

Much better again.

Booting a read only system without initrd

In a system with a read-only root filesystem like MicroOS (or transactional server), the initrd has another task: Make sure that /var and /etc are mounted already, so that early boot can store logs and read configuration.

So we actually have to mount /var and /etc before starting systemd. How? By having our own init script! It is started directly by the kernel by setting init=/sbin/init.noinitrd and, as its last step, just does exec /sbin/init to replace itself as PID 1 with systemd.

Unfortunately it’s not quite as easy as just doing mount /var and calling it a day, as the mount for /var uses a UUID= as source, so it needs udev running… Luckily, udev actually works in that environment, after mounting /sys, /proc and /run manually.

Here the circle closes - we have udev running before systemd now. So by just using the script we need for read-only systems everywhere, that issue is solved too.
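Put together, such a pre-init script could look roughly like the following. This is a hypothetical sketch, not the actual MicroOS script: paths, mount options and the udevd invocation are assumptions. Since it could only really run as PID 1, it is merely written out and syntax-checked here:

```shell
# Hypothetical sketch of a pre-init script in the spirit of
# /sbin/init.noinitrd; written to a file and syntax-checked only.
cat > /tmp/init.noinitrd.sketch <<'EOF'
#!/bin/sh
# Mount the API filesystems udev needs
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /run

# Start udev so that UUID= sources in fstab can be resolved
/usr/lib/systemd/systemd-udevd --daemon
udevadm trigger --action=add
udevadm settle

# Mount the writable parts of the read-only system
mount /var
mount /etc

# Stop udevd again; systemd starts its own instance later
udevadm control --exit

# Replace ourselves with systemd as PID 1
exec /sbin/init "$@"
EOF
sh -n /tmp/init.noinitrd.sketch && echo "syntax OK"
```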

Making initrd-less booting simpler

As the setup for initrd-less booting is quite complex, there’s now a package which does the needed setup automatically (except for installing a suitable kernel).

This contains the needed “pre-init” wrapper script /sbin/init.noinitrd as well as a GRUB configuration module which automatically adds entries to boot the system without an initrd. Those entries are only generated for kernels which have support for the root filesystem built in. It also takes care of setting the root/rootflags and init parameters properly. The boot options with initrd are still there as a failsafe.
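The generated entries look roughly like this (a hypothetical sketch; the real module derives the kernel path, root device and flags from the installed system):

```
# Sketch of a generated GRUB entry for initrd-less booting
menuentry 'openSUSE MicroOS (no initrd)' {
    linux /boot/vmlinuz root=UUID=... rootflags=... init=/sbin/init.noinitrd
}
```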

# zypper ar obs://devel:kubic:quickboot/openSUSE_Tumbleweed devel:kubic:quickboot
# transactional-update initrd shell pkg in --from devel:kubic:quickboot kernel-kvmsmall noinitrd
...
transactional update # grub2-set-default 0
transactional update # exit
# reboot
...

# systemd-analyze 
Startup finished in 1.889s (kernel) + 2.246s (userspace) = 4.135s

# systemd-analyze blame
1.022s apparmor.service                    
 847ms kdump.service                       
 820ms NetworkManager-wait-online.service  
 782ms systemd-logind.service              
 608ms kdump-early.service                 
 542ms dev-vda3.device

Just over 4 seconds! That the /var partition now shows up in the top six of systemd-analyze blame means that we’re getting close to the limits.

An issue with issue-generator

Booting without an initrd seems to have introduced a bug: instead of showing the active network interface enp1s0 and its addresses, there’s just a lonely eth0: on the login screen. The journal reveals that the interface got renamed from eth0 to enp1s0 during boot. Usually udev already runs in the initrd, so after switch-root there’s an add event carrying the new name, which issue-generator picks up. Without an initrd, the rename happens in the booted system, and issue-generator has to handle that somehow.

How can this be implemented? To find out which udev events are triggered by a rename, the udev monitor is very helpful. With the --property option it shows which properties are attached to the triggered events:

# udevadm monitor --udev --property &
# ip link set lo down
# ip link set lo name lonew
# ip link set lonew name lo
...
UDEV  [630.458943] move     /devices/virtual/net/lo (net)
ACTION=move
DEVPATH=/devices/virtual/net/lo
SUBSYSTEM=net
DEVPATH_OLD=/devices/virtual/net/lonew
INTERFACE=lo
IFINDEX=1
SEQNUM=3616
USEC_INITIALIZED=1054433
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
SYSTEMD_ALIAS=/sys/subsystem/net/devices/lonew /sys/subsystem/net/devices/lonew
TAGS=:systemd:
# fg
^C
# ip link set lo up

So using the DEVPATH_OLD and INTERFACE properties, the rename can be handled in issue-generator’s udev rules.
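What such a rule could look like, as a hypothetical sketch (the helper script name is made up; the actual issue-generator rule may differ):

```
# Sketch: react to "move" (rename) events on network interfaces and
# regenerate the issue entry under the new name
ACTION=="move", SUBSYSTEM=="net", ENV{DEVPATH_OLD}=="?*", \
    RUN+="/usr/lib/issue-generator/handle-rename $env{INTERFACE}"
```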

After applying those changes and a reboot, enp1s0 is shown properly now!

Optimizing apparmor.service

Those who paid attention to the output of systemd-analyze blame will have noticed that compared to plain Tumbleweed, apparmor.service takes longer to start on MicroOS. Why is that so?

# systemd-analyze plot > plot.svg

systemd-analyze plot result

From this plot it’s possible to infer explicit as well as implicit dependencies between services. If a service starts after a different service ends, it’s probably an explicit dependency. If a service only finishes after a different service started, it’s probably waiting for it in some way. That’s what we can see here: apparmor.service only finished startup after create-dirs-from-rpmdb.service started up. So getting the latter to start earlier or finish quicker would accelerate apparmor.service as well. To confirm this theory, just disable the service temporarily:

# systemctl disable create-dirs-from-rpmdb.service
# reboot
# systemd-analyze blame
927ms systemd-logind.service              
824ms NetworkManager-wait-online.service  
721ms kdump.service                       
689ms kdump-early.service                 
554ms apparmor.service
# systemctl enable create-dirs-from-rpmdb.service

Confirmed. So how can this be optimized properly?

The purpose of this service is to create package-owned directories which are not part of the system snapshots. It orders itself between local-fs.target and systemd-tmpfiles-setup.service: some tmpfiles.d files might rely on packaged directories being present, so it has to run before systemd-tmpfiles-setup.service. Except for replacing the broad local-fs.target ordering with RequiresMountsFor=/var /opt /srv, there isn’t much potential for optimization.
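As a drop-in, that change could be sketched like this (the drop-in path is an assumption, and the unit’s existing local-fs.target ordering would still have to be removed from the unit file itself, since drop-ins only add settings):

```
# /etc/systemd/system/create-dirs-from-rpmdb.service.d/mounts.conf (sketch)
[Unit]
# Wait only for the mount points the service actually writes to,
# instead of ordering after all of local-fs.target
RequiresMountsFor=/var /opt /srv
```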

However, instead of running on every boot, the service only has to do its work if the set of installed packages changed. Luckily, rpm 4.15 introduced a new function to ease exactly such checks (rpmdbCookie), and it was easy to make use of it in the service. With this deployed, the service only does the actual work when necessary and otherwise just spends the short time needed to fetch the cookie from the rpm database:

# systemd-analyze blame
872ms NetworkManager-wait-online.service  
832ms systemd-logind.service              
811ms kdump.service                       
645ms dev-vda3.device                     
597ms kdump-early.service                 
526ms apparmor.service
# systemd-analyze blame | grep create-dirs-from-rpmdb
 52ms create-dirs-from-rpmdb.service

For some reason this doesn’t always help though, sometimes apparmor.service is back at >1s, so this needs some more investigation.
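The skip logic can be sketched as follows. rpmdbCookie is a librpm C function, so a hash of a package list stands in for the real cookie here, and all paths are made up for the example:

```shell
# Sketch: only do the expensive work when the package set changed.
pkglist=$(mktemp) stamp=$(mktemp)
printf 'bash-5.0\nrpm-4.15\n' > "$pkglist"   # stand-in for the rpm database
cookie() { sha256sum < "$pkglist" | cut -d' ' -f1; }

cookie > "$stamp"                            # first boot: work done, cookie stored
if [ "$(cookie)" = "$(cat "$stamp")" ]; then
    echo "package set unchanged - skipping"  # later boots take this cheap path
else
    echo "package set changed - creating directories"
fi
```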

rebootmgr

At the top of the blame list we have NetworkManager-wait-online.service. This service can take a variable amount of time depending on the network configuration and the environment, and in most cases it is not needed for getting services up and running. So what is currently pulling it into multi-user.target?

# systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

multi-user.target @2.472s
└─rebootmgr.service @2.284s +24ms
  └─network-online.target @2.282s
    └─NetworkManager-wait-online.service @1.311s +970ms
      └─NetworkManager.service @1.246s +61ms

It’s rebootmgr.service! The reason it orders itself After=network-online.target is that it can directly communicate with etcd. However, support for that is currently disabled in rebootmgr anyway and it appears to handle the case with no network connection on start just fine. So until that change ends up in the package, let’s just adjust that manually:

# systemctl edit --full rebootmgr.service
(Remove lines with network-online.target)
# reboot

Note that this doesn’t really improve the perceived speed of booting as only multi-user.target itself depended on it and sshd/getty are started before that already. The critical chain to multi-user.target is now:

# systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

multi-user.target @2.188s
└─kdump.service @1.287s +899ms
  └─NetworkManager.service @1.202s +82ms
    └─dbus.service @1.197s
      └─basic.target @1.195s
        └─sockets.target @1.195s
          └─dbus.socket @1.195s

Getting close to the edge

It’s time to get creative now: what’s left to optimize? Everything from here on is arguably a hack, much more so than the previous changes.

Disabling non-essential services

Let’s just disable everything which isn’t actually needed to get the system up.

apparmor.service: Used for system hardening. If the system is not security relevant (e.g. an isolated VM), this can be disabled. Not recommended though.

rebootmgr.service: If a reboot is scheduled (e.g. by the automatic transactional-update.timer), it triggers an automatic reboot in the configured timeframe (default 3:30am). If the system is rebooted manually, this can be disabled.

kdump.service: Loads a crash kernel and initrd into RAM for capturing kernel crash dumps. Unless the system is a highly critical production machine where every crash has to be analyzed, this can be disabled.

With those changes applied:

# systemd-analyze 
Startup finished in 1.899s (kernel) + 1.505s (userspace) = 3.405s
# systemd-analyze blame
629ms systemd-journald.service            
532ms systemd-logind.service              
480ms dev-vda3.device                     
479ms dev-vda2.device                     
303ms systemd-hostnamed.service

Over a second saved again. systemd-analyze tells us that there’s not much left in userspace to optimize.

Kernel configuration

Currently the kernel spends quite some time during boot benchmarking RAID6 algorithms, so that it knows which of the available implementations is the quickest on the CPU it’s running on. Ironically, this means that booting actually takes a bit longer on CPUs with newer features. If RAID6 performance is not important, this can simply be disabled by setting CONFIG_RAID6_PQ_BENCHMARK=n.

After building such a kernel and installing it:

# systemd-analyze 
Startup finished in 1.083s (kernel) + 1.356s (userspace) = 2.439s

This saves almost a second during kernel startup.

Direct kernel boot

The bootloader also takes some time during boot (boot menu, kernel loading), which can be optimized too. Apart from the obvious option of decreasing the time the boot menu is shown (or hiding it by default), it’s also possible to skip the bootloader altogether! By supplying the kernel and cmdline from the VM host, booting gets even quicker. This is only a good idea in setups with a custom kernel build that has everything built in, though, as otherwise the modules in the VM can get out of sync with the kernel image supplied by the host. It also breaks automatic rollbacks (health-checker needs GRUB for that) and booting old snapshots.
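For a libvirt guest, direct kernel boot can be configured in the domain XML’s <os> element (a sketch; the paths and cmdline are examples and must match the guest’s setup):

```
<os>
  <type arch='x86_64'>hvm</type>
  <!-- Host-supplied kernel and cmdline: no bootloader runs in the guest -->
  <kernel>/var/lib/libvirt/images/vmlinuz-kvmsmall</kernel>
  <cmdline>root=/dev/vda3 rw init=/sbin/init.noinitrd quiet</cmdline>
</os>
```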

I’m not aware of a way to measure the time it takes to load the kernel, so no measurement here. This mostly removes the time which systemd-analyze doesn’t show (on non-EFI systems).

Conclusion

Getting from 25.958s to 2.439s (with hacks) means that over 90% of boot time can be optimized away.

The next task is to push those optimizations into the distro and make them the default or at least easy to apply.

Have a lot of fun!