
the avatar of Klaas Freitag

Open Search Foundation

Recently I learned about the Open Search Foundation on public broadcast radio (Bayern 2 Radio Article). That surprised me: I had not heard about the OSF before, even though I am active in the field of free software and culture. Yet this new foundation has already made it into mainstream broadcasting. Reason enough to take a closer look.

It is a very good sign to have the topic of internet search in the news. One company has a gigantic market share in search, which is indeed a threat to the freedom of internet users. Being findable on the web is the key to success for whatever message or service a website might offer, and all of that is controlled by a single enterprise driven by commercial interests. A broad audience should be aware of that.

The Open Search Foundation has the clear vision to build up a publicly owned search index as an alternative for Europe.

Geographical and Political Focus

The whitepaper talks about working on a search engine specifically for Europe. It mentions that there are search indexes in the US, China and Russia, but none rooted in Europe. While this is a geographical statement in the first place, it is of course also a political one, because some of the existing services are probably politically controlled.

It is good to start with a focus on Europe, but the idea of a free and publicly controlled project should not be limited to Europe's borders. In fact, it will not stop there if it is attractive, because it might offer a way to escape from potentially controlled systems.

On the other hand, Europe (as opposed to any single European country alone) seems like a good base from which to start this huge effort, as it is able to come up with the needed resources.

Organization

The founding members of the Open Search Foundation are not well-known members of the wider open source community. That is good, as it shows that the topics around the free internet do not only concern nerds in the typical communities, but also people who work for an open and future-proof society in other areas like academia, research and medicine.

On the other hand, an organization like, for example, Wikimedia e.V. might have been a more obvious candidate to address this topic. Neither on the website nor in the whitepaper did I find mentions of any of the “usual suspects” or other organizations and companies that have already tried to set up alternative indices. I wonder if there have been discussions, cooperations or plans to work together?

I am very curious to see how the collaboration between the more “traditional” open data/open source community and the Open Search Foundation will develop, as I think it is crucial to bring together all players in this area without falling into the “endless discussion trap” of not achieving tangible results. It is the question of building an efficient community.

Pillars of Success

Does the idea of the OSF have a realistic chance to succeed? The following four pillars might play an important role in the success of the idea to build a free search index for the internet:

1. Licenses and Governance

The legal framework has to be well defined and thought through, so that it will be resilient in the long term. Since there is huge commercial potential in controlling this index, parties might try to gain control of it.

Only a strong governance and legal framework can ensure that the idea lasts.

The OSF mentions in the whitepaper that setting this up is one of the first steps.

2. Resources

A search index requires large amounts of computing power in the wider sense, including storage, networking, redundancy and so on. Additionally, there need to be people who take care of it. For that, there needs to be financial support for staffing, marketing, legal support and all that.

The whitepaper mentions ideas to collect the computing power from academia or from company donations.

For the financial backing, the OSF will have to find sources like EC money, funds from governments and academia, and maybe private fundraising. Organizations like Wikimedia already have experience with that.

If that is not enough, the idea of selling better search results for money or offering SEO help to fund development will quickly come up. These will be interesting discussions that require the strong governance.

3. Technical Excellence

Who will use a search index that does not come up with reasonable search results?
To be able to compete with the existing solutions, which have even made it into our daily communication habits, the service needs to be just great in terms of search results and user experience.

Many existing approaches that use the Google index as a backend have already shown that even then it is not easy to provide comparable results.

It is a fact that users of the commercial competition trade their personal data for optimal search results, even if they don't do so consciously. Delivering that is more difficult for a privacy-oriented service, so this is another handicap.

The whitepaper mentions ideas on how to work on this huge task and also acknowledges that it will be challenging. But that is no reason not to try. We all know plenty of examples where tasks like this succeeded even though nobody believed in them in the beginning.

4. Community

To achieve all these points, a strong community is a key factor.

There need to be people who do technical work like administering the data centers, developers who code, technical writers for documentation, translators and much more. But that is only the technical part.

For the financial, marketing and legal support, other people are needed, not to mention political lobbying and such.

All these parts have to be built up, managed and kept intact long term.

The Linux kernel, which was mentioned as a model in the whitepaper, is different. Not even the technical work is comparable between the free search index and the Linux kernel.

The long-term, stable development of the Linux kernel is based on people who work full time on the kernel while being employed by companies that are actually competitors. But on the kernel, they collaborate.

This way, the companies share the cost of inevitable base development work. Their differentiators in the market do not depend on their work on the kernel, but on the levels above the kernel.

How is that for the OSF? I fail to see how enough sustainable business can be built on an open, privacy-respecting search index for companies to be happy to fund engineers working on it.

Apart from that, the kernel had the benefit that strong companies like RedHat, SUSE and IBM pushed Linux in the early days, so no special marketing budgets etc. were needed for the kernel specifically. That, too, is different for the OSF, as quite some marketing and community management money will be required to get started.

Conclusion

Building a lasting, productive and well established community will be the vital question for the whole project in my opinion. Offering a great idea, which this initiative is without question, will not be enough to motivate people to participate long term.

There has to be an interesting offer for potential contributors at all levels, from individuals and companies making contributions, to universities donating hardware, to governments and the European Community providing money. There needs to be some kind of benefit they gain from their engagement in the project. It will be interesting to see whether the OSF can come up with a model that gets that kickstarted.

I very much hope that this gets traction, as it would be an important step towards a freer internet again. And I also hope that there will be collaboration on this topic with the traditional free culture communities and their foundations.

the avatar of Nathan Wolf

Noodlings 14 | LeoCAD, DeWalt and a UPS

Dusting off for the 14th installment. 14th Noodling of nonsense and clamoring LeoCAD from Design to Publication Designing, organizing the timeline and publishing a MOC (My Own Creation) on Rebrickable.com using LeoCAD on openSUSE DeWalt cordless Power tool platform A little trip outside the cubicle for my appreciation for a great cordless tool platform that … Continue reading Noodlings 14 | LeoCAD, DeWalt and a UPS

openSUSE Tumbleweed – Review of the week 2020/24

Dear Tumbleweed users and hackers,

Another week has passed. There have been a few technical issues around the publishing of our snapshots. Two were flagged for release, but never actually made it to the mirrors. It turned out that kiwi renamed some of the live images from *-i686-* to *-ix86-*, but nothing else knew about it. As we even have links on the web pointing to those image names, we opted to revert to the original name. So, due to this, we only released 3 snapshots (0604, 0609, and 0610; 0609 contained the changes of 0605 and 0607 – the ones that did not get synced out).

The changes in these snapshots were:

  • GNOME 3.36.3 (sadly, upstream did not release a gnome-desktop update; this package would be responsible for the version number shown in the control center)
  • Mozilla Firefox 77.0.1
  • Libvirt 6.4.0
  • Transmission 3.00
  • KDE Plasma 5.19.0

Things being worked on include:

  • KDE Applications 20.04.2
  • SQLite 3.32.2
  • Linux kernel 5.7.1
  • Mesa 20.1.0
  • RPM change: %{_libexecdir} is being changed to /usr/libexec. This exposes quite a lot of packages that abuse %{_libexecdir} and fail to build
  • openSSL 3.0

the avatar of Joe Shaw

Abusing go:linkname to customize TLS 1.3 cipher suites

This post has been translated into Chinese by a Gopher in Beijing. 巧用go:linkname 定制 TLS 1.3 加密算法套件

When Go 1.12 was released, I was very excited to test out the new opt-in support for TLS 1.3. TLS 1.3 is a major improvement to the main security protocol of the web.

I was eager to try it out in a tool I had written for work which allowed me to scan what TLS parameters were supported by a server. In TLS, the client presents a set of cipher suites to the server that it supports, and the server chooses the best one to use, where “best” is typically a reasonable trade-off of security and performance.

In order to enumerate what cipher suites a server supports, a client must make individual connections, each offering a single cipher suite at a time. If the server rejects the handshake, you know the cipher suite is not supported.

For TLS 1.2 and below, this is pretty straightforward:

func supportedTLS12Ciphers(hostname string) []uint16 {
	// Taken from https://golang.org/pkg/crypto/tls/#pkg-constants
	var allCiphers = []uint16{
		tls.TLS_RSA_WITH_RC4_128_SHA,
		tls.TLS_RSA_WITH_3DES_EDE_CBC_SHA,
		tls.TLS_RSA_WITH_AES_128_CBC_SHA,
		tls.TLS_RSA_WITH_AES_256_CBC_SHA,
		tls.TLS_RSA_WITH_AES_128_CBC_SHA256,
		tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
		tls.TLS_ECDHE_RSA_WITH_RC4_128_SHA,
		tls.TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
		tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
		tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
	}

	var supportedCiphers []uint16

	for _, c := range allCiphers {
		cfg := &tls.Config{
			ServerName:   hostname,
			CipherSuites: []uint16{c},
			MinVersion:   tls.VersionTLS12,
			MaxVersion:   tls.VersionTLS12,
		}

		conn, err := net.Dial("tcp", hostname+":443")
		if err != nil {
			panic(err)
		}

		client := tls.Client(conn, cfg)
		client.Handshake()
		client.Close()

		if client.ConnectionState().CipherSuite == c {
			supportedCiphers = append(supportedCiphers, c)
		}
	}

	return supportedCiphers
}

After writing the barebones code to support TLS 1.3 in the tool, I discovered something unfortunate: Go does not allow you to select what TLS 1.3 cipher suites are sent to the server. The rationale makes sense: TLS 1.3 greatly simplified both what is contained within a cipher suite and how many are supported. Unless and until there is a weakness in a TLS 1.3 cipher suite, there’s nothing to be gained in allowing them to be customized.
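To make the limitation concrete, here is a minimal sketch (not part of the original tool, assuming Go 1.12+ with TLS 1.3 enabled, the crypto/tls and fmt imports, and a hypothetical host example.com): whatever we put into CipherSuites only restricts TLS 1.0-1.2, so a TLS 1.3 handshake still offers the full default TLS 1.3 list.

func demoTLS13CiphersNotConfigurable() {
	cfg := &tls.Config{
		ServerName: "example.com",
		MinVersion: tls.VersionTLS13,
		// Ignored for TLS 1.3: this field only restricts TLS 1.0-1.2 suites.
		CipherSuites: []uint16{tls.TLS_CHACHA20_POLY1305_SHA256},
	}

	conn, err := tls.Dial("tcp", "example.com:443", cfg)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The server may well negotiate TLS_AES_128_GCM_SHA256 even though we
	// tried to allow only CHACHA20-POLY1305.
	fmt.Printf("negotiated suite: %#x\n", conn.ConnectionState().CipherSuite)
}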

Still, this tool was one of the rare situations where it makes sense, and I wanted to see if I could hack it in. Enter go:linkname. Buried deep in Go’s compiler documentation:

//go:linkname localname importpath.name

The //go:linkname directive instructs the compiler to use “importpath.name” as the object file symbol name for the variable or function declared as “localname” in the source code. Because this directive can subvert the type system and package modularity, it is only enabled in files that have imported “unsafe”.

Well hello! This looks promising. If there is a function or variable in Go’s standard library that specifies the list of TLS 1.3 ciphers, we can override it in our tool by instructing the Go compiler to use our local implementation instead of the one in the standard library.

Let’s dig into the standard library’s TLS 1.3 implementation. In crypto/tls/handshake_client.go [link], we have:

if hello.supportedVersions[0] == VersionTLS13 {
	hello.cipherSuites = append(hello.cipherSuites, defaultCipherSuitesTLS13()...)
	// ...
}

Great! Let’s just override this defaultCipherSuitesTLS13() function. In crypto/tls/common.go [link]:

func defaultCipherSuitesTLS13() []uint16 {
	once.Do(initDefaultCipherSuites)
	return varDefaultCipherSuitesTLS13
}

This complicates things a bit. This calls an initialization function lazily on first use, and that function manipulates a bunch of internal default lists beyond just the TLS 1.3 cipher suites list. We don’t want to mess with any of that. But in that initDefaultCipherSuites function, we have this [link]:

varDefaultCipherSuitesTLS13 = []uint16{
    TLS_AES_128_GCM_SHA256,
    TLS_CHACHA20_POLY1305_SHA256,
    TLS_AES_256_GCM_SHA384,
}

Ah ha! A package global variable is assigned the cipher suite values. And because this initialization function is only ever called once, we can initialize the list and then take control of it in our code.

// Using go:linkname requires us to import unsafe
import (
    "crypto/tls"
    _ "unsafe" 
)

// We bring the real defaultCipherSuitesTLS13 function from the
// crypto/tls package into our own package.  This lets us perform
// that lazy initialization of the cipher list when we want.

//go:linkname defaultCipherSuitesTLS13 crypto/tls.defaultCipherSuitesTLS13
func defaultCipherSuitesTLS13() []uint16

// Next we bring the `varDefaultCipherSuitesTLS13` slice into our
// package.  This is what we manipulate to get the cipher suites.

//go:linkname varDefaultCipherSuitesTLS13 crypto/tls.varDefaultCipherSuitesTLS13
var varDefaultCipherSuitesTLS13 []uint16

// Also keep a variable around for the real default set, so we
// can reset it once we're finished.
var realDefaultCipherSuitesTLS13 []uint16

func init() {
    // Initialize the TLS 1.3 ciphersuite set; this populates
    // varDefaultCipherSuitesTLS13 under the covers
    realDefaultCipherSuitesTLS13 = defaultCipherSuitesTLS13()
}

func supportedTLS13Ciphers(hostname string) []uint16 {
	var supportedCiphers []uint16

	for _, c := range realDefaultCipherSuitesTLS13 {
		cfg := &tls.Config{
			ServerName: hostname,
			MinVersion: tls.VersionTLS13,
		}

		// Override the internal slice!
		varDefaultCipherSuitesTLS13 = []uint16{c}

		conn, err := net.Dial("tcp", hostname+":443")
		if err != nil {
			panic(err)
		}

		client := tls.Client(conn, cfg)
		client.Handshake()
		client.Close()

		if client.ConnectionState().CipherSuite == c {
			supportedCiphers = append(supportedCiphers, c)
		}
	}

	// Reset the internal slice back to the full set
	varDefaultCipherSuitesTLS13 = realDefaultCipherSuitesTLS13

	return supportedCiphers
}

As you can see, we used go:linkname to subvert package modularity for both a function and a variable. We use a package init function to populate the default cipher suites list, and then we override it as we iterate and attempt connections with only a single supported cipher suite. Finally, we make sure to clean things up and set the default list back to the full set for any future uses.

Lastly, let’s glue things together:

func main() {
	hostname := os.Args[1]
	fmt.Println("Supported TLS 1.2 ciphers")
	for _, c := range supportedTLS12Ciphers(hostname) {
		fmt.Printf("  %s\n", tls.CipherSuiteName(c))
	}
	fmt.Println()
	fmt.Println("Supported TLS 1.3 ciphers")
	for _, c := range supportedTLS13Ciphers(hostname) {
		fmt.Printf("  %s\n", tls.CipherSuiteName(c))
	}
}
$ go run cipherlist.go joeshaw.org
Supported TLS 1.2 ciphers
  TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

Supported TLS 1.3 ciphers
  TLS_AES_128_GCM_SHA256
  TLS_CHACHA20_POLY1305_SHA256
  TLS_AES_256_GCM_SHA384

There you have it.

go:linkname should be used very sparingly. Consider carefully whether you must use it, or whether you can solve your problem another way. For me, the alternative was to import all of crypto/tls to make some minor edits. It would also freeze me into a point in time of the Go TLS stack and put the burden of upgrading onto me. While I know that there are no compatibility guarantees with Go’s crypto/tls internals, using go:linkname allows me to use the TLS stack provided by current and future versions of Go as long as the particular pieces I am using don’t change. I can live with that.

The full code for this test program lives in this GitHub repository.


the avatar of YaST Team

Digest of YaST Development Sprint 101

As explained in our previous blog post, this YaST development report is presented as a collection of links to rather descriptive GitHub pull requests. With that, our readers can dive into the particular topics they find interesting.

Of course, that’s not a full list of all the pull requests merged into the YaST repositories during the sprint, just a selection of the most interesting ones.

Additionally, the YaST Team also invested quite some time researching all the new concepts introduced by systemd regarding filesystem management and investigating several aspects of the current implementation of AutoYaST. We have no concrete links for the results of that research, but you will see the outcome soon in the form of upcoming changes in (Auto)YaST.

Enjoy the links and see you again in two weeks!

the avatar of Jos Poortvliet

Collabora vs ONLYOFFICE

Since the Nextcloud Hub release switched from ONLYOFFICE to Collabora Online as the default, lots of people have asked why. Is one better than the other? Let's talk about this.

History

Let me first say - the decision wasn't purely technical. As always, relations and other reasons play a role. I'll try to cover both aspects, but there is always more. With that out of the way, let's first look at how ONLYOFFICE got into Nextcloud.

Frank, myself and others in the Nextcloud community have wanted to integrate office into our collaboration platform for most of the past decade. Previously, we* had invested quite a bit in getting a collaborative document editor into our private cloud. The Documents app was a from-the-ground-up developed ODF editor with a unique and very clever design, built by KO GmbH (now sadly defunct). Together we put resources into integration and further development, and we hoped other (open source) businesses would invest and contribute too, so the solution would grow over time. We had also hoped some customers would be willing to pay for it. Neither of these really came true, and KO sadly didn't survive.

* Note that I use 'we' here loosely as I wasn't really involved back then, so think 'the core team', as a slowly-changing team of people, including Frank, Jan, Arthur and others.

Fast forward to our launch on June 2 2016 (happy birthday!), and a few months later we announced Collabora Online integration. We had worked with Collabora to make this available not just to enterprise customers, as before, but to all users thanks to the 'CODE' docker image. As you know, we care deeply about community/private home users and this was of course a great step forward.

But running docker, setting up a reverse proxy on a second domain with proper certificates - it isn't easy and does not work for everyone. So we had to keep maintaining the Documents app a little, as some users still could only use that.

ONLYOFFICE vs Collabora

Meanwhile, a new open source online office solution came around: ONLYOFFICE. Let's talk for a sec about how it compares to Collabora, as the two could not be more different, both technically and non-technically!

Technical: how they work

The way Collabora Online works is:
An embedded version of Libreoffice runs on the server. It reads the document, then 'streams' the rendered document as image tiles to the browser client, which shows it to the user. The browser client does some of the menu's and lots of smart things like showing the cursor, other users, text selection etc, but many other components like pop-up menu's and sidebars are also streamed from the back-end, giving relatively good feature parity with LibreOffice. This strategy is responsible for giving LibreOffice, for example, desktop-level table style editing, better than any other online office solution.

The way ONLYOFFICE works is:
The document is converted on the server to a JSON file which is streamed to the browser client. The browser client is the full office suite, editing the document. Once done, it sends back the JSON and the server merges and exports it back to a file. A fully html5 canvas based front-end means a relatively pretty user interface and any javascript dev can go hacking.

So what does this mean?

  • The Collabora/LibreOffice approach is much heavier on the server and network connection, but uses slightly fewer client resources, which tends to help mobile devices with battery life during editing
  • You get the full LibreOffice file type support. Decades' worth of obscure file formats, it is all there.
  • ONLYOFFICE has a more modern UI; writing it all in Javascript makes it far easier to be mobile-friendly. You can imagine how useless those old LibreOffice paragraph settings dialogs are on a mobile phone screen!
  • In theory, ONLYOFFICE would be much easier to integrate into web apps in general. Most app frameworks can consume a Javascript or JSON component; a simply streamed, tiled image is far less flexible...

Compatibility

On document support, three things.

First, with regard to Microsoft file compatibility - this is ALWAYS hit and miss. I can't objectively claim either is better or worse; you will always find a file that works well in one but not the other. But you will also find lots of MS Office files that won't work in Office 365, or break between the Mac and Windows desktop versions, or even just coming from older versions, because Microsoft screwed up their own compatibility.

Second, one thing I can say: if you migrate from Collabora Online to ONLYOFFICE and most of your files are ODF files, because that's what Collabora uses by default, you're in for a bad experience. The ODF support in ONLYOFFICE is quite basic. But with MS Office files they feel on par to me, and that's probably what matters for most people. (Sadly, yes.)

Third, if you need any other file types - Collabora can handle a LOT, due to its long legacy. Word Perfect anyone?

For other technical capabilities - I am probably best off simply pointing to the comparisons both made themselves:

Social/historical differences

Let's talk about the second big difference between Collabora and ONLYOFFICE: their roots. Collabora builds on and is part of the LibreOffice community, a decades-old project, and consists of long-time open source believers. Development is open and accessible, and there are lots of individuals and companies that work on its code base and can provide services for it. ONLYOFFICE, on the other hand, is quite new to open source, and only a bit over a dozen people have contributed to its code base. Their open core model is of course less than favored in the open source world, though it is still miles better than proprietary - some people seem to lose sight of that sometimes, if you ask me. For an end user, the development model makes little difference, in either case.

Let me emphasize two things.
First, it is awesome that we have TWO open source office suites. Building one is an amazing accomplishment - we have had others in the past, but most are no longer really viable due to the massive amount of resources required to keep up.
Second, I think it is great that ONLYOFFICE decided to open source their product. I believe most people really underestimate what it takes to turn your business model around so radically. And if you're unhappy with decisions made, in either case - contribute, get involved. That is how you change things in open source.

Getting Office in Nextcloud

So, as I said in the History section, by 2017 we had three office solutions integrated in Nextcloud. One was easy to install but unmaintained and quickly deteriorating. The other two were harder to install but much more complete.

You know we're ambitious people, so indeed we have thought about and discussed this situation forever. And at some point, Robin started to really investigate what would be possible. After looking deeply at both, he finally managed to create a proof of concept with ONLYOFFICE. What he did was:

1. Split ONLYOFFICE into the 'converter' part, the javascript front-end and the 'rest'
2. Make a separate binary of the converter, package the javascript and rewrite all the glue that lets them interact in PHP
3. Make this thing installable as one big blob, acting as an alternative 'server' with a proxy component that ties it all together

This was a LOT of work, but after polishing it, we had something we could show to the ONLYOFFICE people. They were initially not huge fans of what we did - no surprise, as it was an ugly solution. We discussed this for a fair bit and in the end, we agreed on an approach.

The result was what we made available last January with the first release of Nextcloud hub. We saw it as a first step towards deeper integration. Watch the video below to get an idea of what it looked like!

📺 view video on YouTube

And then...

After release, two things happened.
First, ONLYOFFICE has sadly been unable to focus much on the integration with Nextcloud. We had a long wish list - there is a lot you can do to make the experience nicer, from removing/disabling/hiding duplicated features like the built-in chat and file handling, to making file collaboration work in other apps like Talk, or adding features that connect even deeper, like @mentioning users. Unfortunately, this didn't happen. No blame, there is a lot happening in the world right now!
Second, Collabora was inspired by the work, and while we didn't think we could make it installable with such ease, they obviously know their own technology better. And indeed, they did make it happen! Besides that, we worked with them to improve the already pretty good integration further, allowing you to edit documents while in a video call or chat in Talk.

As our focus continues to be on providing the best experience possible, we simply looked at that: what gives, right now, the best experience. And thus our latest video shows Collabora instead...

📺 view video on YouTube

Note that this doesn't mean we don't like ONLYOFFICE. 😍 This just changed the default you get on installation. Both solutions are very good and continue to be available for users! And perhaps things will change for the next release. Given the large differences at every level between the two, I consider it a benefit to have both approaches available for Nextcloud users!

So is Collabora better?

I will let Captain Marvel answer that.


openSUSE Tumbleweed – Review of the weeks 2020/21 – 23

Dear Tumbleweed users and hackers,

It has been a while since I wrote a ‘weekly’ review. My own fault for taking some days off, right? At least I had good weather and enjoyed the time – a lot, even. But now I am in debt to you: I owe you a review of what has happened over the last three weeks. Since the last review, openSUSE Tumbleweed has seen 11 new snapshots (0514, 0515, 0516, 0517, 0519, 0520, 0523, 0526, 0528, 0602 and 0603). Thanks to Max for taking care of it during my absence.

The most notable changes in those snapshots were:

  • Mozilla Firefox 76.0.1
  • Linux kernel 5.6.12 & 5.6.14
  • The YaST changes, as announced by the YaST Team in their sprint reports
  • Mesa 20.0.7
  • VirtualBox 6.1.8
  • KDE Applications 20.04.1
  • Sudo 1.9.0 (final release)
  • GCC 10 as the default distro compiler (a rebuild of all packages has been attempted; currently, ~5% of packages fail to build)
  • Qt 5.15.0
  • Inkscape 1.0
  • TexLive 2020
  • Guile 3.0.2

This means almost all of the things from the last ‘weekly review’s’ ‘things in progress’ have been delivered by now. But that does not mean there is nothing left. Currently, the staging projects are filled with these major changes:

  • RPM change: %{_libexecdir} is being changed to /usr/libexec. This exposes quite a lot of packages that abuse %{_libexecdir} and fail to build
  • Mozilla Firefox 77.0
  • KDE Plasma 5.19
  • SQLite 3.32.1
  • Linux kernel 5.7.0
  • Mesa 20.1.0
  • openSSL 3.0

the avatar of openSUSE Heroes

Post-Mortem: download.opensuse.org outage

Summary

As the current storage used on download.opensuse.org was reaching its end of service, we started to move to new storage via the pvmove command. The first 12TB were transferred without any problem and with no noticeable impact on production. After that, the old storage produced some (maybe longstanding, but unnoticed) problems on some drives, resulting in "unreadable sectors" failure messages at the upper filesystem levels. We managed to recover some data by restarting the pvmove with an offset (like pvmove /dev/vdd:4325375+131072 /dev/vde) over and over again - and finally triggered a bug in dm_mirror at the kernel level, which is used by pvmove, and a bad block on a hard drive...

Details

As a result, we needed to reboot download.opensuse.org to get the system back to work. As we wanted to get all data transferred to the new storage device, this became a loop:

  1. starting pvmove with an offset
  2. waiting for the old storage to run into hard drive timeouts and reset a drive
  3. watching pvmove/dm_mirror run into trouble
  4. seeing the by now familiar kernel oops
  5. rebooting the machine; starting again at 1

And as everyone knows: the last steps are always the hardest. While reaching the end of the transfer, the loop started to happen more often. Eventually too often for our liking - so we decided to switch over to our 2nd mirror in Provo, which normally holds all the data (21T) as well, but is often a bit outdated because of latency and bandwidth. But this mirror was running stable, so better old content than no content.

So we finally switched the DNS entries for download.opensuse.org and downloadcontent.opensuse.org at 23:00 CEST, pointing to the mirror server in Provo.

The next morning, around 08:00 CEST, people notified us that the SSL certificate for download.opensuse.org was not correct. Right: we had forgotten to renew the "Let's Encrypt" certificate on the Provo mirror to also contain the new DNS entries. This was a one-minute job, but an important one we forgot after the long day before.

Our openSUSE Kernel Guru Jeff Mahoney and our Bugfinder Rüdiger Oertel helped us with the main problem and provided debug information and new test kernels throughout, which helped us to track down and finally eliminate the original problem. A big THANK YOU for this, Jeff and Rudi!

So finally, on the morning of Wednesday, 3rd June 2020, around 10:00, we were able to finish the pvmove to the new storage. But with all the problems, we decided to run an xfs_check/xfs_repair on the filesystem - and this takes some time on 21TB of storage. So we decided to leave the DNS pointing to Provo, but instead provide the redirector database there, to free up some bandwidth that is needed to run the office in Provo. Luckily, we still had the DB server, configs and other stuff ready to use there. So all we needed to do was transfer a current database dump from Nuremberg to Provo, restore the dump and check the old backup setup. This was done in ~30 minutes and Provo was "the new download.opensuse.org" redirector.

After checking the xfs on the new storage, we finally declared the machine in Nuremberg production ready again around 12:00 CEST and switched the DNS back to the old system in Nuremberg with the new storage.

Lessons Learned

What Went Well

  • As always our power users and admins are very fast and vocal about problems they see.
  • The close cooperation with our kernel guru and the live chat helped to identify and solve at least the kernel problem
  • Having a full secondary mirror server at hand which is running in another DC and even in another continent is very helpful, if you need to switch over
  • Having the needed backups and setups ready before a problem occurs also helps to keep the downtime low

What Went Wrong

  • the full secondary mirror server did not contain up-to-date data for all the 21TB of packages and files. This led to some (luckily small) confusion, as some repositories suddenly contained old data
  • our OBS was not directly affected by the outage, but could not push new packages to the secondary mirror directly. The available bandwidth did not allow us to keep everything in sync.

Where We Got Lucky

  • having the experts together and having the ability for them to talk directly with each other solves problems way quicker than anything else
  • the setup we used during a power outage of the Nuremberg office 3 years ago was still up and running (and maintained) after all these years. This helped us to set up the backup system very quickly.

Action Items

Limited to the available bandwidth in Provo:

  • try to establish a sync between the databases in Provo and Nuremberg, which would give us a hot standby
  • evaluate possibilities to sync the Provo mirror more often

General:

  • As the filesystem on the standard download.opensuse.org machine is now some years old, has been hot-resized multiple times and has now seen some problems (which could somehow be repaired by xfs_repair, but nevertheless), we will try to copy the data over to a completely new XFS version 5 filesystem during the next days
  • Try to get an additional full mirror closer to the one in Nuremberg, one that does not have the bandwidth and latency problems - and establish it as a "hot standby" or even a load-balanced system.