

GSoC 2013 Experiences


Finally, GSoC 2013 comes to an end. Being an admin for openSUSE was an enriching experience, and I learnt a lot in the process. I realized that it takes as much effort to manage a program in the community as it takes to participate in one. With a few hiccups along the way, I can safely say that GSoC 2013 was a success for openSUSE. In this post, I am highlighting the work done along the way for openSUSE this summer, and my own experiences.

Work Done over the Summer:

Overall, we had 12 selections under the openSUSE umbrella, which comprised openSUSE itself, Balabit (syslog-ng), Hedgewars and ownCloud. Six projects were for openSUSE and two each for the other organizations. Towards the end, we had 10 successful projects.

The projects completed over the summer were:

  • OBS Discussion System by Shayon Mukherjee
  • Automatic Resizing of LVM Volumes by Akif Khan
  • OSEM by Stella Rouzi
  • AppArmor Profiling Tools by Kshitij Gupta
  • git-review by Xystushi
  • User Management Application for ownCloud by Raghu Nayyar
  • Music App for ownCloud by Morris Jobke
  • Syslog-ng MySQL Destination by Gyula Petrovics
  • Syslog-ng Redis Destination by Tihamér Petrovics
  • Hedgewars Mission Campaign by Periklis Natansis

My Experiences:

  1. It took us a long time to get started for GSoC this year. In the early months, Manu and I faced difficulties in getting mentors to add projects, perhaps due to previous experiences. A recurring issue with GSoC over the years has been that students don’t stick around in their respective communities. Mentors invest a lot of time in the students, and that effort is *wasted* when the student doesn’t stick around. This got us into a debate over whether we should give preference to existing contributors, or give potentially new contributors a chance. This year we had a good mixture of new and existing contributors, which bodes well for openSUSE as a whole.
  2. We got very few slots this time. With the number of proposals we received, and the sheer number of projects on our ideas pages, we expected around 16-17 slots, which would have accommodated most of the openSUSE projects as well as our co-participating orgs. Getting 12 slots was a bit of a shock for us, because we had a good number of projects. Some of the mentors were understandably upset that their project was overlooked, as we had 6 projects for openSUSE and 2 each for the other orgs. This was beyond our control, but it would have helped if we had got a few more slots.
  3. We were also hit this time by the problem of disappearing students. One of the accepted students disappeared for long periods of time, and ultimately the project failed. The worst part was that the student had prior contributions, and the mentors had verified them. This calls for stricter rules to select students and verify their credibility.
  4. Student health proved to be the stumbling block for two projects. One of our students had severe health issues up until the midterm. His mentor chose to pass him at the midterm, but valuable time was lost, and even with a lot of effort there was not enough output for the mentor to pass him at the final evaluation. Hats off to the student for still staying motivated to contribute to openSUSE. In another case, the student fell sick after the midterm, and because of that not all the project goals could be completed, though the project is in good shape to be finished.
  5. At the openSUSE Conference, Manu and I discussed the program with the community and the mentors. We realised that the potential of GSoC lies in getting worthwhile contributions and having the contributors stick around (which is what the program is about). Our efforts in the coming years will be based on that, and hopefully we won’t face as many problems.

Andrew Wafaa

EuroBSDCon Day 4

The final day of EuroBSDCon kicked off (summaries of Days 1, 2 & 3), and I gave my talk on “Introducing the 64-bit ARMv8 Architecture” in the morning; in fact, the whole morning in Track 3 was taken over by ARM-related talks :-). My talk went down well (no rotten fruit/veg was thrown! \o/), and the room was nearly full, with people from various flavours of BSD, which was encouraging. There were some good questions, and some suggestions for things that could make it easier for developers to support ARM moving forward, which is always useful to hear.


Use the Scan to PC function on a Samsung Multifunction

So I went out and bought myself a spanking new multifunction creature from Samsung called a CLX3305FN. Generally, this fits into the CLX3300 series, but it has LAN only – no wifi (that would be the CLX3305FW). One of the reasons I decided I wanted this was because of the advertised “Scan to PC” function. I figured it would be simple on Windows and that I’d be able to get it working on Linux through YaST or sane/scanimage etc – i.e. a PITA but it would work.

As it turned out, it didn’t work at all. The function is supposed to be Windows-only. However, the clever lads over at bchemnet.com reverse-engineered the protocol used between the scanner and a Windows PC and managed to hack together a script which runs as a server daemon. It just sits there twiddling its thumbs until a user presses the “Scan to PC” button on the printer/scanner. Then it kicks into action and uses sane to send scan commands to the scanner. The result is that the scan lands in $HOME/Scans/ – thus, the Scan to PC function is neatly implemented for Linux. There are, of course, rough edges (such as the scanner sending in RAW rather than JPEG) but nothing that couldn’t be fixed in a hackweek.

So where can you get it? The package is available at http://software.opensuse.org/package/python-samsungScannerServer – but you’ll need to install the Samsung Unified Driver for Linux first. I found it at http://www.samsung.com/us/support/owners/product/CLX-3305FN but apparently bchemnet.com has a repo for debian/ubuntu where you can download it too. Once you install that and my package, you’ll probably have to do a systemctl start samsungScannerServer (“probably” because I don’t really know how systemd works and cobbled together a .service file based on Google search results).

Another nice hackweek project would be to use something like inotify to discover incoming scanned files, gpg to encrypt them and email them to the user (and then delete the unencrypted version). I also need to look into getting the unified samsung driver working on ARM so I can use my raspberry as a scan server which sends encrypted scans to my email address…
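A rough sketch of the watching part, in Python for brevity (polling rather than inotify, since inotify needs a third-party binding; the gpg/email steps are left as comments, and all the names here are hypothetical):

```python
import os

def snapshot(scan_dir):
    # Remember which files are already present when the daemon starts.
    return set(os.listdir(scan_dir))

def new_scans(scan_dir, seen):
    # Return full paths of files that appeared since `seen`,
    # updating `seen` in place so each file is reported only once.
    current = set(os.listdir(scan_dir))
    fresh = sorted(current - seen)
    seen |= current
    return [os.path.join(scan_dir, name) for name in fresh]

# A daemon would loop over new_scans(...), gpg-encrypt each path,
# email the encrypted file, then delete the unencrypted original.
```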


Problems with pylab and ipython on openSUSE – ImportError

These days I’m trying to use python+numpy+scipy to solve some academic problems, and when I tried to import the pylab module I got these errors:

    In [63]: import pylab
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython console> in <module>()
    ----> 1 import pylab

    /usr/lib64/python2.7/site-packages/pylab.py in <module>()
    ----> 1 from matplotlib.pylab import *
          2 import matplotlib.pylab
          3 __doc__ = matplotlib.pylab.__doc__

    /usr/lib64/python2.7/site-packages/matplotlib/pylab.py in <module>()
        263 from numpy.linalg import *
        264
    --> 265 from matplotlib.pyplot import *
        266
        267 # provide the recommended module abbrevs in the pylab namespace

    /usr/lib64/python2.7/site-packages/matplotlib/pyplot.py in <module>()
         95
         96 from matplotlib.backends import pylab_setup
    ---> 97 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
         98
         99 @docstring.copy_dedent(Artist.findobj)

    /usr/lib64/python2.7/site-packages/matplotlib/backends/__init__.pyc in pylab_setup()
         23     backend_name = 'matplotlib.backends.%s' % backend_name.lower()
         24     backend_mod = __import__(backend_name,
    ---> 25                              globals(), locals(), [backend_name])
         26
         27     # Things we pull in from all backends

    ImportError: No module named backend_tkagg

After a few web searches, the problem was solved by installing the package python-matplotlib-tk.

In openSUSE it means, as root:

zypper install python-matplotlib-tk

Reference: http://forums.opensuse.org/english/other-forums/development/programming-scripting/416182-python-matplolib.html

Jeffrey Stedfast

Optimization Tips & Tricks used by MimeKit: Part 1

One of the goals of MimeKit, other than being the most robust MIME parser, is to be the fastest C# MIME parser this side of the Mississippi. Scratch that, fastest C# MIME parser in the World.

Seriously, though, I want to get MimeKit to be as fast and efficient as my C parser, GMime, which is one of the fastest (if not the fastest) MIME parsers out there right now, and I don't expect that any parser is likely to smoke GMime anytime soon, so using it as a baseline to compare against means that I have a realistic goal to set for MimeKit.

Now that you know the why, let's examine the how.

First, I'm using one of those rarely used features of C#: unsafe pointers. While that alone is not all that interesting, it's a cornerstone for one of the main techniques I've used. In C#, the fixed statement (which is how you get a pointer to a managed object) pins the object to a fixed location in memory to prevent the GC from moving that memory around while you operate on that buffer. Keep in mind, though, that telling the GC to pin a block of memory is not free, so you should not use this feature without careful consideration. If you're not careful, using pointers could actually make your code slower. Now that we've got that out of the way...

MIME is line-based, so a large part of every MIME parser is going to be searching for the next line of input. One of the reasons most MIME parsers (especially C# MIME parsers) are so slow is because they use a ReadLine() approach and most TextReaders likely use a naive algorithm for finding the end of the current line (as well as all of the extra allocating and copying into a string buffer):

    // scan for the end of the line
    while (inptr < inend && *inptr != (byte) '\n')
        inptr++;

The trick I used in GMime was to make sure that my read buffer was 1 byte larger than the max number of bytes I'd ever read from the underlying stream at a given time. This allowed me to set the first byte in the buffer beyond the bytes I just read from the stream to '\n', thus allowing for the ability to remove the inptr < inend check, opting to do the bounds check after the loop has completed instead. This nearly halves the number of instructions used per loop, making it much, much faster. So, now we have:

    // scan for the end of the line
    while (*inptr != (byte) '\n')
        inptr++;
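The same sentinel idea can be sketched in Python (hypothetical helper names; MimeKit itself does this in C# with unsafe pointers):

```python
import io

def fill_buffer(stream, bufsize=4096):
    # Allocate one byte more than we will ever read, so there is
    # always room to plant a '\n' sentinel after the valid data.
    buf = bytearray(bufsize + 1)
    n = stream.readinto(memoryview(buf)[:bufsize])
    buf[n] = 0x0A  # sentinel: guarantees the scan below terminates
    return buf, n

def scan_line(buf, start=0):
    # No bounds check in the loop: the sentinel stops us at worst.
    i = start
    while buf[i] != 0x0A:
        i += 1
    return i  # index of the '\n' (may be the sentinel itself)
```

After the loop you compare the index against the number of valid bytes to tell a real newline from the sentinel.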

But is that the best we can do?

Even after using this trick, profiling showed it was still the hottest loop in my parser.

We've got no choice but to use a linear scan, but that doesn't mean that we can't do it faster. If we could somehow reduce the number of loops and likewise reduce the number of pointer increments, we could eliminate a bunch of the overhead of the loop. This technique is referred to as loop unrolling. Here's what brianonymous (from the ##csharp irc channel on freenode) and I came up with (with a little help from Sean Eron Anderson's bit twiddling hacks):

    uint* dword = (uint*) inptr;
    uint mask;

    do {
        mask = *dword++ ^ 0x0A0A0A0A;
        mask = ((mask - 0x01010101) & (~mask & 0x80808080));
    } while (mask == 0);

That optimization made a clear difference in the profile.

Now, keep in mind that on many architectures other than x86, in order to employ the trick above, inptr must first be 4-byte aligned (uint is 32-bit), or the unaligned reads could cause a SIGBUS or other crash. This is fairly easy to solve, though. All you need to do is increment inptr until you know it is 4-byte aligned, and then you can switch over to reading 4 bytes at a time as in the above loop. We'll also need to figure out which of those 4 bytes contained the '\n'. An easy way to solve that problem is to linearly scan those 4 bytes using our previous single-byte-per-loop implementation, starting at dword - 1. Here it is, your moment of Zen:

    // Note: we can always depend on byte[] arrays being
    // 4-byte aligned on 32bit and 64bit architectures
    int alignment = ((inputIndex + 3) & ~3) - inputIndex;
    byte* aligned = inptr + alignment;
    byte* start = inptr;
    uint mask;

    while (inptr < aligned && *inptr != (byte) '\n')
        inptr++;

    if (inptr == aligned) {
        // -funroll-loops
        uint* dword = (uint*) inptr;

        do {
            mask = *dword++ ^ 0x0A0A0A0A;
            mask = ((mask - 0x01010101) & (~mask & 0x80808080));
        } while (mask == 0);

        inptr = (byte*) (dword - 1);
        while (*inptr != (byte) '\n')
            inptr++;
    }

Note: In the above code snippet, 'inputIndex' is the byte offset of 'inptr' into the byte array. Since we can safely assume that index 0 is 4-byte aligned, we can round 'inputIndex' up to the next multiple of 4; the difference between the two tells us how many bytes 'inptr' must advance to reach the next 4-byte aligned pointer.
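That arithmetic is easy to check in isolation (Python just to illustrate; the helper name is mine, not MimeKit's):

```python
def bytes_to_alignment(input_index):
    # (input_index + 3) & ~3 rounds up to the next multiple of 4;
    # subtracting input_index gives how many single bytes the scan
    # must consume before 4-byte reads become safely aligned.
    return ((input_index + 3) & ~3) - input_index
```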

That's great, but what does all that hex mumbo jumbo do? And why does it work?

Let's go over this 1 step at a time...

    mask = *dword++ ^ 0x0A0A0A0A;

This xor's the value of dword with 0x0A0A0A0A (0x0A0A0A0A is just 4 bytes of '\n'). The xor sets every byte that is equal to 0x0A to 0 in mask. Every other byte will be non-zero.

    mask - 0x01010101

When we subtract 0x01010101 from mask, the result will be that only bytes greater than 0x80 will contain any high-order bits (and any byte that was originally 0x0A in our input will now be 0xFF).

    ~mask & 0x80808080

This inverts the value of mask, resulting in no bytes having the highest bit set except for those that had a 0 in that slot before (including the byte we're looking for). By then bitwise-AND'ing it with 0x80808080, we get 0x80 for each byte that was originally 0x0A in our input or otherwise had the highest bit set after the bit inversion.

Because there's no way for any byte to have the highest bit set in both sides of the encompassing bitwise-and except for the character we're looking for (0x0A), the mask will always be 0 unless any of the bytes within were originally 0x0A, which would then break us out of the loop.
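The whole computation can be verified by transcribing it into Python (with explicit 32-bit masking, since Python integers are unbounded):

```python
def newline_mask(dword):
    # dword: 4 input bytes packed little-endian into a 32-bit int.
    mask = (dword ^ 0x0A0A0A0A) & 0xFFFFFFFF  # bytes equal to '\n' become 0x00
    # the "haszero" step: every byte that became 0x00 in mask is
    # flagged with 0x80 in the result, so any non-zero result means
    # at least one of the 4 input bytes was a '\n'
    return ((mask - 0x01010101) & (~mask & 0x80808080)) & 0xFFFFFFFF
```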

Well, that concludes part 1 as it is time for me to go to bed so I can wake up at a reasonable time tomorrow morning.

Good night!

Andrew Wafaa

EuroBSDCon DevSummit Day 3

OK, so I lied: EuroBSDCon actually has 3 days of FreeBSD DevSummit. The third day built upon some of the things covered on Day 2. The first part of the morning was taken over by the FreeBSD working groups; these are the groups that do the work and integrate the things that make up the full FreeBSD operating system. There were presentations by groups like Toolchain, Security, Packaging, Desktop, etc.

Jeffrey Stedfast

MimeKit: Coming to a NuGet near you.

If, like me, you've been trapped in the invisible box of despair, bemoaning the woeful inadequacies of every .NET MIME library you've ever found on the internets, cry no more: MimeKit is here.

I've just released MimeKit v0.5 as a NuGet Package. There's still plenty of work left to do, mostly involving writing more API documentation, but I don't expect to change the API much between now and v1.0. For all the mobile MIME lovers out there, you'll be pleased to note that in addition to the .NET Framework 4.0 assembly, the NuGet package also includes assemblies built for Xamarin.Android and Xamarin.iOS. It's completely open source and licensed under the MIT/X11 license, so you can use it in any project you want - no restrictions. Once MimeKit goes v1.0, I plan on adding it to Xamarin's Component Store as well for even easier mobile development. If that doesn't turn that frown upside down, I don't know what will.

For those that don't already know, MimeKit is a really fast MIME parser that uses a real tokenizer instead of regular expressions and string.Split() to parse and decode headers. Among numerous other things, it can properly handle rfc2047 encoded-word tokens containing quoted-printable and base64 payloads which have been improperly broken apart (i.e. a quoted-printable triplet or a base64 quartet split between 2 or more encoded-word tokens). Thanks to the state-machine nature of MimeKit's rfc2047 text and phrase decoders, it also handles cases where multibyte character sequences are split between words (yes, there are 2 types of encoded-word tokens - something most other MIME parsers have failed to take notice of). With the use of MimeKit.ParserOptions, the user can specify his or her own fallback charset (in addition to the UTF-8 and ISO-8859-1 that MimeKit has built in), allowing MimeKit to gracefully handle undeclared 8bit text in headers.

When constructing MIME messages, MimeKit provides the user with the ability to specify any character encoding available on the system for encoding each individual header (or, in the case of address headers: each individual email address). If none is specified, UTF-8 is used unless the characters will fit nicely into ISO-8859-1. MimeKit's rfc2047 and rfc2231 encoders do proper breaking of text (i.e it avoids breaking between surrogate pairs) before the actual encoding step, thus ensuring that each encoded-word token (or parameter value) is correctly self-contained.

S/MIME support is also available in the .NET Framework 4.0 assembly (not yet supported in the Android or iOS assemblies due to the System.Security assembly being unavailable on those platforms). MimeKit supports signing, encrypting, decrypting, and verifying S/MIME message parts. For signing, you can either use the preferred multipart/signed approach or the application/[x-]pkcs7-signature mime-type, whichever you prefer.

I'd love to support PGP/MIME as well, but this is a bit more complicated as I would likely need to depend on external native libraries and programs (such as GpgME and GnuPG) which means that MimeKit would likely have to become 32bit-only (currently, libgpgme is only available for 32bit Windows).

I hope you enjoy using MimeKit as much as I have enjoyed implementing it!

Note: For those using my GMime library, fear not! I have not forgotten about you! I plan to bring many of the API and parser improvements that I've made to MimeKit back to GMime in the near future.

For those using the C# bindings, I'd highly recommend that you consider switching to MimeKit instead. I've based MimeKit's API on my GMime API, so porting to MimeKit should be fairly straightforward.

Andrew Wafaa

EuroBSDCon DevSummit Day 2

Day 2 of EuroBSDCon (overview of Day 1) kicked off for me with a morning dedicated to virtualization. In Linux we’re all used to the usual suspects - KVM/Xen/LXC/VMWare - so I was interested to hear what’s available on FreeBSD, especially bhyve, which is probably as close to KVM as one will get. Xen support is available in FreeBSD, but only as DomU (guest VM); Dom0 (host server) support is actively being worked on to rectify this shortfall (this does not apply to NetBSD).

Andrew Wafaa

EuroBSDCon DevSummit Day 1

As part of my role at work I get to interact with various Open Source projects, and not all of them are Linux related. This week I’m on the beautiful, historic, sunny and warm island of Malta attending EuroBSDCon. As you can work out from the name, it is the European gathering of BSD developers and users, covering all the BSD variants like FreeBSD/NetBSD/OpenBSD/PCBSD/Dragonfly. I’m giving a talk on ARMv8 and AArch64 on Sunday; hopefully that will go well. I’m somewhat nervous, as I finished my slides before I left, which is somewhat unusual.
Klaas Freitag

DAV Torture

Currently we talk a lot about the performance of the ownCloud WebDAV server. Speaking with a computer programmer about performance is like speaking with a doctor about pain: it needs to be qualified - the pain, and likewise the performance concerns.

As a step in that direction, here is a little script collection for you to play with if you like: the DAV torture collection. We started it quite some time ago but never really introduced it. It is still very rough.

What it does

The first idea is that we need a reproducible set of files to test the server with. We don’t want to send around huge tarballs of files, so Danimo wrote two perl scripts called torture_gen_layout.pl and torture_create_files.pl. With torture_gen_layout.pl one can create a file that contains the layout of the test file tree, a so-called layout (or .lay) file. The .lay file describes the test file tree completely: names, structure and sizes.

torture_create_files.pl takes the .lay file and actually creates the file tree on a machine. The cool thing about this is that we can agree on a .lay file as our standard test tree and just pass around a file of a couple of kbytes that describes the whole tree.

Now that there is a standard file tree to test with, I wrote a little script called dav_torture.pl. It copies the whole tree, described by a .lay file and created on the local file system, to an ownCloud WebDAV server using PUT requests. Along the way, it produces performance-relevant output.
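In essence the script walks the generated tree, PUTs each file, and measures throughput. A sketch of that core loop (Python rather than perl, and put() standing in for whatever performs the real WebDAV PUT against the URL from t1.cfg):

```python
import os
import time

def upload_tree(tree_root, put):
    # Walk the local reference tree and PUT every file, timing the
    # transfer. put(rel_path, data) performs the actual upload.
    total_bytes = 0
    start = time.time()
    for dirpath, _dirs, files in os.walk(tree_root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            put(os.path.relpath(path, tree_root), data)
            total_bytes += len(data)
    elapsed = time.time() - start
    # average transmission rate - the kind of number logged per run
    rate = total_bytes / elapsed if elapsed > 0 else float("inf")
    return total_bytes, rate
```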

Try it

Download the tarball and unpack it, or clone it from github.

After having installed a couple of perl deps (probably only the modules Data::Random::WordList, HTTP::DAV and HTTP::Request::Common are not in perl’s core) you should be able to run the scripts from within the directory.

First, you need to create a config file. For that, copy t1.cfg.in to t1.cfg (don’t ask about the name) and edit it. For this example, we only need user, passwd and url to access ownCloud. Be careful with the syntax, it gets sourced into a perl script.

Now, create the local reference tree with a .lay file which I put into the tarball:

    ./torture_create_files.pl small.lay tree

This command will build the file tree described by small.lay in the directory called tree.

Now you can already treat your server:

    ./dav_torture.pl small.lay tree

This will perform PUT commands against the WebDAV server and output some useful information. It also appends to two files, results.dat and puts.tsv. results.dat just logs the results of subsequent calls. The .tsv file is the data file for the html page index.html in the same directory; opened in a browser, it shows a curve of the average transmission rate over all subsequent runs of dav_torture.pl (you have to run dav_torture.pl a couple of times to make that visible). The dav_torture.pl script can now be hooked into our Jenkins CI and run after every server checkin. The resulting curve must never drop :-)

To create your own .lay file, open torture_gen_layout.pl and play with the variables at the top of the script. Then simply call the script and redirect its output into a file to create the .lay file.

All this is pretty experimental, but I thought it would help us get to a more objective discussion about performance. I wanted to open this up at a pretty early stage because I am hoping it might be interesting for some of you: torture your own server, create interesting .lay files, or improve the script set (testing plain PUTs is rather boring) or the result html presentation.

What do you think?