Linux Display Server Primer
Freedom is the essential principle around which Linux (a.k.a. GNU/Linux) revolves as an operating system. You’re free to choose your distribution. You’re free to change your office suite. You’re free to change your kernel! It is only natural that this freedom extends to the visual appearance of your system. Virtually anything can be swapped for something else and configured to the nth degree to achieve any imaginable style. Customising the appearance of your system is known as ricing.
Before we attempt any of these customisations, it is important to understand what the major elements involved are and what role each of them plays in producing the final rendering of your desktop. I’m going to give a bottom-up overview of these elements and their most popular implementations. Please bear in mind that this is by no means intended to be an exhaustive or authoritative explanation of the Linux display stack. If you wish to learn more about a specific topic covered here, refer to the references section at the end of this article, or consult the appropriate documentation.
Even though most of the ideas covered here apply equally to all distros, I’m assuming an Arch Linux installation. You will find that the equivalent steps in your preferred distro are nearly the same (or even easier), but that much I’ll leave you to figure out.
The Display Server
Text-based consoles can be very useful and fun. But, fortunately, staring at a flickering monochromatic screen stopped being the only way to use a computer long ago. The invention of the graphical user interface (GUI) at Xerox’s Palo Alto Research Center (PARC) was a big turning point: it transformed how computers were regarded by the mainstream population and helped make them useful not only to academics but to everyone else. Many layers of software make this possible, but the very first one (just above raw graphics rendering) is the display server, the first protagonist of our whirlwind tour of the stack.
A display server, or windowing system, is a userland software layer responsible for capturing interactions from the user’s mouse and keyboard as well as communicating with the kernel to render graphical artifacts on the screen, namely WIMP (windows, icons, menus, pointer) elements and fonts. (Userland applications run in user mode, as opposed to privileged or kernel mode, so they have much more restricted control over hardware resources.) As its name implies, it follows a client-server approach whereby all applications must send requests to it to draw to the screen, as, in this classic model, only the display server is allowed to talk directly to the kernel to use the system’s graphics resources. We’ll explain in more detail how this actually works in the next section; for now, let us simply say that when an application wants to render its container window on the screen, it must do so via the display server. The server will provide it with the screen real estate it needs and a fixed position on the screen, alongside the graphics primitives necessary to draw the window borders and other usual elements like the close, minimise or maximise buttons at the top. The display server doesn’t have any real knowledge about what it’s drawing: it doesn’t understand what a pointer or a window is, conceptually speaking. It doesn’t need to, as that is someone else’s job. All it cares about is shapes, boundaries and the intersections between them (combining overlapping windows into one final picture is known as compositing, which we’ll also explain later).
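To make the client-server split concrete, here is a minimal Python sketch of a toy display server. Everything in it (the `ToyDisplayServer` class, its `create_window` and `fill_rect` requests, the character "framebuffer") is hypothetical and unrelated to X’s real API; it only illustrates the idea that clients ask, the server translates and clips, and nobody but the server touches the framebuffer.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Screen area that the server has granted to one client."""
    x: int
    y: int
    width: int
    height: int


class ToyDisplayServer:
    """Hypothetical display server: the only component that writes to the framebuffer."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        # Private framebuffer: clients can only change it by sending requests.
        self._framebuffer = [["."] * width for _ in range(height)]
        self._windows: dict[int, Window] = {}
        self._next_id = 1

    def create_window(self, x: int, y: int, width: int, height: int) -> int:
        """Client request: reserve screen real estate at a fixed position."""
        wid = self._next_id
        self._next_id += 1
        self._windows[wid] = Window(x, y, width, height)
        return wid

    def fill_rect(self, wid: int, x: int, y: int, width: int, height: int, char: str) -> None:
        """Client request: fill a rectangle given in window-relative coordinates.

        The server translates it to screen coordinates and clips it to the
        window (and the screen); it has no idea what the shape represents.
        """
        win = self._windows[wid]
        for row in range(max(y, 0), min(y + height, win.height)):
            for col in range(max(x, 0), min(x + width, win.width)):
                sx, sy = win.x + col, win.y + row
                if 0 <= sx < self.width and 0 <= sy < self.height:
                    self._framebuffer[sy][sx] = char

    def snapshot(self) -> list[str]:
        """Return the framebuffer contents as rows of text."""
        return ["".join(row) for row in self._framebuffer]


if __name__ == "__main__":
    server = ToyDisplayServer(8, 4)
    wid = server.create_window(x=2, y=1, width=4, height=2)
    # The client asks for more than its window can hold; the server clips it.
    server.fill_rect(wid, 0, 0, 10, 10, "#")
    print("\n".join(server.snapshot()))
```

Running it shows a 4-by-2 block of `#` starting at column 2, row 1 of the 8-by-4 screen, even though the client asked for a 10-by-10 rectangle: the client never decides where its pixels actually land.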
The X Window System, currently at version 11 of its protocol (hence the common name X11), is by far the most ubiquitous display server protocol, and X.Org Server is its de facto implementation.
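Since X11 is, at heart, a protocol, a client only needs a socket and the right bytes to talk to the server. The Python sketch below only builds bytes and does not connect anywhere. It assumes the usual conventions that display `:N` listens on the Unix socket `/tmp/.X11-unix/XN` locally or on TCP port 6000+N remotely, and it encodes the initial connection-setup request following the layout in the X11 protocol specification (little-endian, with no authorization data unless you pass some). The function names are my own.

```python
import struct


def pad4(n: int) -> int:
    """Padding bytes needed to round n up to a multiple of 4 (X11 aligns fields)."""
    return (-n) % 4


def connection_setup(auth_name: bytes = b"", auth_data: bytes = b"") -> bytes:
    """Encode the first request an X11 client sends: the connection setup.

    Layout (little-endian): byte-order byte, 1 unused byte, major and minor
    protocol version (CARD16 each), lengths of the authorization protocol
    name and data (CARD16 each), 2 unused bytes, then name and data, each
    padded to a multiple of 4 bytes.
    """
    header = struct.pack(
        "<BxHHHHxx",
        0x6C,  # 'l': least significant byte first
        11,    # protocol major version
        0,     # protocol minor version
        len(auth_name),
        len(auth_data),
    )
    return (header
            + auth_name + b"\x00" * pad4(len(auth_name))
            + auth_data + b"\x00" * pad4(len(auth_data)))


def display_address(display: str) -> tuple:
    """Map a DISPLAY value such as ':0' or 'host:1.0' to where the server listens."""
    host, _, rest = display.rpartition(":")
    number = int(rest.split(".")[0])  # drop the optional '.screen' suffix
    if host in ("", "unix"):
        return ("unix", f"/tmp/.X11-unix/X{number}")
    return ("tcp", (host, 6000 + number))


if __name__ == "__main__":
    print(display_address(":0"))
    print(connection_setup().hex())
```

Writing these 12 bytes to the socket is how every X conversation starts; toolkits and Xlib/XCB just hide it from you.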
X.Org Server can be installed with the xorg-server-apps packages in Arch Linux, which include functionality for debugging and further configuration. Even though xorg-server is the main one, the others will provide you with some indispensable tools, including debugging utilities and xinit, which is what you use to start a new instance of X on your system, that is, to transform that flickering one-dimensional terminal interface into a colourful gateway to your computer. I recommend you install all of these packages to save yourself some trouble.
With X, installing it is just half the battle. You may have to go through a configuration process that involves generating and/or editing X.Org’s settings file, xorg.conf. This file contains all the information X needs to know about your hardware: screens, keyboard, mouse… (Recent X.Org versions can usually autodetect this hardware and start without an xorg.conf at all, so you may only need one to override the defaults.) How painstaking this task is depends on how many tools are available in your particular environment. There are two main ways to configure X:
- Run one of the configuration tools available to you, such as xorgconfig. If one fails to generate a sane config file, just try the next one. If none work, go to the second option.
- Create your own config file from a template, or just edit the existing default one if provided. X.Org places its default config file in /etc/X11/xorg.conf. You could try editing that file to make it work with your system, or attempt to write one from scratch using /etc/X11/xorg.conf.install as a reference, if you have that available.
Writing X config files is no easy task, and I’m not going to explain how to write one in this guide, as I’m far from being an expert on that myself. Just to give you an idea of what a working X config file looks like, here’s mine (for a machine with both an NVIDIA and an Intel GPU):
# Drive the screen with the NVIDIA card and keep the Intel GPU inactive
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    # PCI bus ID of the NVIDIA card (see lspci); yours may differ
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
    Option "AllowEmptyInitialConfiguration" "Yes"
EndSection

Section "Device"
    Identifier "intel"
    Driver "modesetting"
    Option "AccelMethod" "none"
EndSection

Section "Screen"
    Identifier "intel"
    Device "intel"
EndSection
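The sections in that file refer to each other by `Identifier`: the `ServerLayout` names a `Screen`, the `Screen` names a `Device`, and the `Device` finally picks the `Driver`. As a rough sketch (a deliberately simplified reader that ignores `SubSection` blocks and most of the real grammar; the function names are my own), this is how you could follow that chain in Python to see which driver ends up behind screen 0:

```python
import shlex
from typing import Optional


def parse_xorg_conf(text: str) -> list[dict]:
    """Parse xorg.conf-style text into sections of (keyword, args) entries.

    Simplified on purpose: SubSection blocks and other details are not handled.
    """
    sections, current = [], None
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles quotes and '#' comments
        if not tokens:
            continue
        keyword, args = tokens[0], tokens[1:]
        if keyword == "Section":
            current = {"name": args[0], "entries": []}
        elif keyword == "EndSection":
            sections.append(current)
            current = None
        elif current is not None:
            current["entries"].append((keyword, args))
    return sections


def lookup(section: dict, keyword: str) -> Optional[list[str]]:
    """Arguments of the first entry with the given keyword, if any."""
    return next((args for key, args in section["entries"] if key == keyword), None)


def screen0_driver(sections: list[dict]) -> str:
    """Follow ServerLayout -> Screen -> Device and return that device's Driver."""
    # Identifiers are only unique per section type ("nvidia" is both a Screen and a Device).
    by_id = {(s["name"], lookup(s, "Identifier")[0]): s
             for s in sections if lookup(s, "Identifier")}
    layout = next(s for s in sections if s["name"] == "ServerLayout")
    # 'Screen [number] "id" [position...]': the first non-numeric argument is the id.
    screen_id = next(a for a in lookup(layout, "Screen") if not a.isdigit())
    screen = by_id[("Screen", screen_id)]
    device = by_id[("Device", lookup(screen, "Device")[0])]
    return lookup(device, "Driver")[0]


SAMPLE = """
Section "ServerLayout"
    Identifier "layout"
    Screen 0 "nvidia"
    Inactive "intel"
EndSection
Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
EndSection
Section "Screen"
    Identifier "nvidia"
    Device "nvidia"
EndSection
Section "Device"
    Identifier "intel"
    Driver "modesetting"
EndSection
"""

if __name__ == "__main__":
    print(screen0_driver(parse_xorg_conf(SAMPLE)))  # prints: nvidia
```

Note how the lookup has to be keyed by section type as well as identifier: reusing the same name for a `Screen` and a `Device`, as in the config above, is perfectly legal.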
There are several other config files for X.Org you’ll see mentioned at some point or another, like .Xdefaults (and its newer counterpart), xinitrc, etc. These files control how X is initialised when launched, system fonts, console appearance and much more. I won’t go into any more detail on these; I just want to make you aware of them, because I’m sure you’ll come across them sooner or later if you’re setting up X from scratch.
Architecture of the X Window System
The X Window System exposes a protocol to communicate asynchronously with its clients (there can be any number of them), either over a network or locally. You can read its specification if you want (not that you have to). To illustrate how X works, let’s assume a common scenario where a user is interacting with an application and has just clicked a radio button. The actions that ensue are as follows:
- The kernel’s device driver registers that a left mouse click was performed and translates its device-specific data into the format of evdev, the Linux input event interface.
- evdev passes this information on to the display server, X, which pairs it with the pointer’s absolute position on the screen.
- X has no knowledge of where the mouse click was performed in the context of the target window, so it simply notifies the client application that owns the window of this event.
- The client application receives the event and decides what to do next. In this case, the clicked radio button has to be redrawn so it appears selected, so the client sends a rendering request back to the display server.
- X then notifies the compositor that the window’s contents have changed (a so-called damage event). The compositor has full knowledge of the current layout of the screen: how many windows there are, whether any of them intersect, etc. It decides what the next frame should look like.
- After calculating the next frame, the compositor sends a request to the X server. X then communicates with the kernel again to get the frame into the graphics adapter’s framebuffer, so it can be shown on the screen.
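The steps above can be replayed as a toy message trace. This is only a model of who talks to whom (the component names and payload strings are illustrative, not real X protocol messages), but counting the hops makes X’s position as the hub of the whole exchange easy to see:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    sender: str
    receiver: str
    payload: str  # illustrative description, not a real protocol message


def x11_click_trace() -> list[Hop]:
    """Who talks to whom when a radio button is clicked under X11 (toy model)."""
    return [
        Hop("kernel driver", "evdev", "raw left-button press"),
        Hop("evdev", "X server", "button press event"),
        Hop("X server", "client", "ButtonPress in your window"),
        Hop("client", "X server", "redraw the radio button as selected"),
        Hop("X server", "compositor", "this window's contents changed"),
        Hop("compositor", "X server", "next composited frame"),
        Hop("X server", "kernel driver", "put the frame in the framebuffer"),
    ]


def hops_involving(trace: list[Hop], component: str) -> int:
    """Count the hops in which a component sends or receives a message."""
    return sum(1 for hop in trace if component in (hop.sender, hop.receiver))


if __name__ == "__main__":
    trace = x11_click_trace()
    for hop in trace:
        print(f"{hop.sender:>13} -> {hop.receiver:<13} {hop.payload}")
    print(f"{hops_involving(trace, 'X server')} of {len(trace)} hops go through X")
```

The last line reports that 6 of the 7 hops go through X, which is the crux of the criticism discussed below.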
As stated before, X is not aware of what it’s drawing to the screen, nor does it remember what it has sent to the framebuffer, so every time something needs to be redrawn, X sends a request to the appropriate client application to take care of it. This is known as running in legacy mode. Nowadays, it is preferable to use an external composite manager, or compositor. A composite manager’s main job is to receive the next frame of every individual X client (i.e. every windowed application) and then combine all of these windows, working out their intersections, stacking order (z-order, or depth), etc. to obtain the final scene to display on the screen. Composite managers can add many more features, like menu and window animations, transparency and shadows. Notable examples of compositors are compiz, compton, xcompmgr and Cairo Compmgr.
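At its simplest, combining the windows is a painter’s algorithm: paint every client’s frame from the bottom of the stack to the top, so whatever sits higher wins wherever windows intersect. The sketch below assumes solid, rectangular windows drawn with characters, and all names in it are hypothetical; real compositors work with pixel buffers on the GPU, track damaged regions and blend transparency, none of which is modelled here. Because the compositor keeps each client’s last frame, restacking windows needs no help from the clients, unlike legacy mode.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClientFrame:
    """Latest frame from one client, simplified to a solid rectangle."""
    name: str
    x: int
    y: int
    width: int
    height: int
    z: int     # stacking order: higher values are closer to the viewer
    fill: str  # single character standing in for the window's pixels


def composite(frames: list[ClientFrame], width: int, height: int, background: str = ".") -> list[str]:
    """Combine per-client frames into the final scene, bottom of the stack first."""
    scene = [[background] * width for _ in range(height)]
    for frame in sorted(frames, key=lambda f: f.z):
        # Clip each window to the screen, then paint over whatever is below it.
        for row in range(max(frame.y, 0), min(frame.y + frame.height, height)):
            for col in range(max(frame.x, 0), min(frame.x + frame.width, width)):
                scene[row][col] = frame.fill
    return ["".join(row) for row in scene]


if __name__ == "__main__":
    terminal = ClientFrame("terminal", x=0, y=0, width=5, height=3, z=1, fill="T")
    browser = ClientFrame("browser", x=3, y=1, width=5, height=3, z=2, fill="B")
    print("\n".join(composite([terminal, browser], 8, 4)))
    print()
    # Raising the terminal only changes its z; no client has to redraw anything.
    print("\n".join(composite([replace(terminal, z=3), browser], 8, 4)))
```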
2D and 3D elements can be drawn onto the screen through X using libraries such as Cairo and GLX (OpenGL), respectively. If you’re installing a system from scratch, it might be worth checking whether you already have the necessary graphics drivers installed. The mesa package is fairly standard and will work just fine on most architectures and with most graphics cards, although without a driver specific to your graphics adapter it will not be very fast. For example, in the case of NVIDIA GPUs, you’d need the open-source nouveau driver or NVIDIA’s proprietary driver (the nvidia package) to squeeze the maximum performance out of your card.
Unsurprisingly, X.Org is written in C. Even though it is possible to code directly against it with Xlib or the newer XCB interface, you’ll rarely need to do such a thing to develop GUI applications on Unix systems. Instead, you code using a GUI toolkit. The three most popular GUI toolkits are GTK+, Qt and Tk. GTK+ is the most widespread one, whereas Qt is the toolkit used by KDE applications. Tk is the oldest of the three as well as the easiest one to program with, although it looks very arcane compared to its modern counterparts.
You can read more about X’s architecture on its wiki. Apart from X11, there are other well-known display server protocols in very active development, such as Wayland and Ubuntu’s Mir. Wayland and Weston, its reference implementation, look quite promising, and Wayland is being rolled out in Fedora, Red Hat’s cutting-edge distribution, starting from Fedora 21. Wayland and Mir each have their own reasons for existing, but both sprang from the same desire to modernise GNU/Linux by removing the clutter and quirky artifacts inherited from its Unix roots. One of these jewels from the past is the topic under discussion, the X Window System, which has been around since the mid-1980s. Now, how does Wayland work, and in what areas does it improve on X? The improvements are manifold but can be boiled down to one: Wayland removes the middle-man.
If you paid attention to the example event process outlined a moment ago, you’ll have noticed that X plays an unavoidable central role in the chain. Nowadays, most components of the architecture (udev, evdev, the compositor) have become self-sufficient in accomplishing their individual tasks and therefore need X for nothing more than a dumb data relay. The proponents of Wayland realised this and decided it would make more sense to replace X altogether with the compositor, merging the display server and the composite manager into a single component.
In Wayland, the compositor talks directly to the clients and the kernel to manipulate the appearance of the screen. The implications of this approach are substantially positive. Performance increases, as the overhead of having every component communicate via a piece of middleware is eliminated. Also, the overall structure and workflows of the system are greatly simplified, as is the codebase, which gets an extreme makeover, going from X’s ancient source to Wayland’s or, properly speaking, Weston’s. The net result is that GUI applications should render more quickly and cleanly, especially when animations and interactions between different windows are involved.
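To put the "middle-man" argument in numbers, here is the same radio-button click modelled as a list of hops under both designs. It is a simplified illustration that loosely follows the architecture overview in Wayland’s documentation (the compositor receives input from evdev, forwards it to the client, and presents the client’s buffer itself, for instance through a KMS page flip); the component names and payloads are my own, not protocol messages.

```python
# Each hop is (sender, receiver, what travels). Illustrative names, not real messages.
X11_PATH = [
    ("kernel driver", "evdev", "raw input"),
    ("evdev", "X server", "input event"),
    ("X server", "client", "input event"),
    ("client", "X server", "rendering request"),
    ("X server", "compositor", "damage notification"),
    ("compositor", "X server", "composited frame"),
    ("X server", "kernel driver", "frame for the framebuffer"),
]

WAYLAND_PATH = [
    ("kernel driver", "evdev", "raw input"),
    ("evdev", "compositor", "input event"),
    ("compositor", "client", "input event in window coordinates"),
    ("client", "compositor", "rendered buffer plus damaged region"),
    ("compositor", "kernel driver", "new frame (e.g. via a KMS page flip)"),
]


def hops_involving(path: list[tuple[str, str, str]], component: str) -> int:
    """Count the hops in which a component sends or receives something."""
    return sum(1 for sender, receiver, _ in path if component in (sender, receiver))


if __name__ == "__main__":
    for name, path in (("X11", X11_PATH), ("Wayland", WAYLAND_PATH)):
        print(f"{name}: {len(path)} hops, {hops_involving(path, 'X server')} through an X server")
```

The other difference hidden in the payloads is that a Wayland client renders into its own buffer and simply hands it over, instead of asking a separate server process to draw on its behalf.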
Unfortunately, the fact that Wayland exists doesn’t mean it has been adopted across the board. Migrating from X, which is deeply intertwined with all the existing GUI participants, to Wayland is no easy task, as the change cannot occur on a single layer of the chain: it’s not as simple as uninstalling X.Org and installing Weston. X is a dependency in one way or another for most tiers of the desktop rendering hierarchy: toolkits, WMs and, naturally, DEs have all been built on top of X. Thus, the switch was never going to happen quickly and painlessly. GNOME and KDE maintainers have done a great job of incorporating Wayland support into the latest releases of their software (as I said before, Fedora can run on Wayland out of the box now), but there’s still much work to be done, especially for standalone WMs like awesome, which doesn’t even have plans to support it. But wait… what was a WM again?
Window managers are key backend components of a GUI environment. They control the location, appearance and animations (moving, resizing…) of the windows shown on the screen, collaborating closely with the display server to achieve this purpose. Besides manipulating the size and position of the windows on your desktop, WMs render their borders, title bars and icons. They also control other elements such as desktop menus, dialogs and popups (e.g. notify-send popups).
Managing and rendering the visual appearance of a window frame, that is, everything surrounding the application window, is known as decoration. Decoration can be performed by two different actors in the display model: the client (client-side decoration or CSD) or the server (server-side decoration or SSD).
- Client-side decoration delegates the rendering of the window frame and title bar to the application itself.
- Server-side decoration places the responsibility of managing a window frame on the window manager or the display server.
Both CSD and SSD have their pros and cons, and each has been embraced by one DE or another. CSD’s strength is that it offers much more freedom to application developers, as they can tailor the look of their application all around, including an element that is usually off-limits: the window frame itself. The problem with CSD is that, should the application crash at any point, the whole window will become stuck and will be a pain to remove from the screen. This is a common scenario on Windows, with its infamous "Not Responding" problem, which is often caused by a bad design in which the UI is not given its own thread separate from the backend logic. On the other hand, SSD is safer and performs better, as the display server or WM is in full control of the window frame of every application. This means that, even if the application crashes or becomes unresponsive for a while, you will at least be able to move, close or minimise the window, and shutting down the application becomes less of a fuss.
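The robustness argument boils down to who receives the click on the close button. In this toy model (the `Client` class and `click_close_button` function are hypothetical), a hung client never processes events; with SSD the window manager handles the click itself, while with CSD the click is just another event waiting for the frozen client. Real WMs are typically politer than this: they ask the client to close first and only offer to kill it when it doesn’t answer.

```python
class Client:
    """Hypothetical application; a hung one never gets around to handling events."""

    def __init__(self, responsive: bool) -> None:
        self.responsive = responsive
        self.closed = False

    def handle(self, event: str) -> None:
        if self.responsive and event == "close":
            self.closed = True


def click_close_button(client: Client, decoration: str) -> bool:
    """Return True if clicking the close button actually closes the window."""
    if decoration == "ssd":
        # The WM drew the button, so it can act on the click (e.g. kill the
        # client) without the application's cooperation.
        client.closed = True
    elif decoration == "csd":
        # The application drew the button, so only the application can react.
        client.handle("close")
    else:
        raise ValueError(f"unknown decoration mode: {decoration!r}")
    return client.closed


if __name__ == "__main__":
    for mode in ("ssd", "csd"):
        print(mode, "closes a hung window:", click_close_button(Client(responsive=False), mode))
```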
There was once talk of a novel approach to rendering window frames, namely Dynamic Window Decorations (DWD), which looked promising, attempting to bring the best of both worlds, but I guess it died out over time, as not much has been said recently on the matter. The basic idea behind it was that client applications would publish a specification of the widgets and custom elements they wanted drawn, and then the display server, together with another participant called the DWD console, would render native controls on the screen according to it.
Since WMs play such a vital role in how your desktop appearance is composed, they are capable of transforming your experience in many different ways. There are three major types of WM:
- Stacking or floating: these are by far the most commonplace window managers. The default choice in Windows and OS X as well as all mainstream Linux distributions, floating WMs do not arrange windows in a predefined manner. Windows can be moved around freely by the user and can potentially be hidden by other windows (which are floating on top). The most representative example (in Linux) would be Openbox, the default WM for LXDE.
- Tiled: tiling window managers arrange windows in a predefined manner, in an attempt to make the most of the available screen real estate by assigning a fixed size and position to every window on display. Tiling WMs’ default behaviour is to give every window an equal share of the available screen space. The layout of the root window (using X terms) can be changed by the user. For example, you could have a layout where windows are tiled horizontally (the screen is split side by side) or vertically (one above another). Think of it like doing a :vsplit or :split in Vim, or C-x 3 or C-x 2 in Emacs. Tiling WMs also work quite nicely with multi-monitor setups. Examples are bspwm, XMonad and Ratpoison, although the most popular one by far is i3, which is remarkably well documented, I must say.
- Dynamic or hybrid: these try to offer the best of both worlds and give you the option to select the paradigm you want to use. The best example is awesome, the one I use, which lets you assign a layout (tiled or floating, among others) to every individual workspace.
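What a tiling WM does every time a window appears or disappears boils down to a small geometry calculation. The sketch below splits the screen into equal columns or rows and, in the spirit of a dynamic WM like awesome, lets each workspace pick its own layout function. The workspace names, the cascading `floating` placement and all function names are hypothetical; real WMs add gaps, borders, master areas and much more.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def side_by_side(n: int, screen_w: int, screen_h: int) -> list[Rect]:
    """n equal-width columns (like :vsplit in Vim or C-x 3 in Emacs)."""
    if n == 0:
        return []
    # Integer edges so the columns cover the screen exactly, with no gaps.
    edges = [i * screen_w // n for i in range(n + 1)]
    return [Rect(edges[i], 0, edges[i + 1] - edges[i], screen_h) for i in range(n)]


def stacked(n: int, screen_w: int, screen_h: int) -> list[Rect]:
    """n equal-height rows (like :split in Vim or C-x 2 in Emacs)."""
    if n == 0:
        return []
    edges = [i * screen_h // n for i in range(n + 1)]
    return [Rect(0, edges[i], screen_w, edges[i + 1] - edges[i]) for i in range(n)]


def floating(n: int, screen_w: int, screen_h: int) -> list[Rect]:
    """No imposed arrangement: just cascade half-screen windows (hypothetical policy)."""
    return [Rect(32 * i, 32 * i, screen_w // 2, screen_h // 2) for i in range(n)]


Layout = Callable[[int, int, int], list[Rect]]

# A dynamic WM lets every workspace choose its own layout (names are made up).
WORKSPACE_LAYOUTS: dict[str, Layout] = {
    "code": side_by_side,
    "logs": stacked,
    "media": floating,
}


def arrange(workspace: str, n_windows: int, screen_w: int, screen_h: int) -> list[Rect]:
    """Compute window geometries using the given workspace's layout."""
    return WORKSPACE_LAYOUTS[workspace](n_windows, screen_w, screen_h)


if __name__ == "__main__":
    for ws in WORKSPACE_LAYOUTS:
        print(ws, arrange(ws, 3, 1920, 1080))
```

Keeping layouts as plain functions from "how many windows, how big a screen" to rectangles is what makes switching paradigms per workspace cheap.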
Installing a window manager should be as easy as grabbing the package through your package manager and then appending this line to your ~/.xinitrc file once it’s installed:
exec awesome
However, this will give you a very crude rendering of your desktop. You would rarely go through the trouble of installing a window manager yourself without intending to customise it. Each window manager has its own, pretty unique configuration approach: Openbox’s basic settings can be set from a menu, but in reality they are stored in an XML file; awesome is configured using Lua; XMonad uses Haskell; i3 has its own rather simple, custom config file format. From these configuration files you’ll be able to control the appearance of your desktop to ridiculous extents: global keyboard shortcuts, wallpaper, workspace layout, desktop menus, taskbar, window appearance (duh) and much, much more. Explaining how to configure your own WM from zero to beautiful is beyond the scope of this article. Maybe some day I’ll write a guide to configuring awesome, my favourite WM, but for the time being it’s up to you to embark on that journey of discovery alone. You’ll find the Arch Linux wiki most useful in figuring out the darker areas of WM configuration, no matter which one you choose.
Desktop environments lie at the highest level of abstraction of the graphical desktop chain. They encompass all the services explained above and deliver them, and much more, with a consistent theme and structure, so that end users can use their system through a unified interface. A desktop environment is what you’ll see as soon as you log in to any of the mainstream Linux distros of today. Debian and Fedora have GNOME; openSUSE has KDE; Ubuntu has Unity. There are also Xfce and LXDE, which are really popular lightweight alternatives, but they don’t come preinstalled in any of the major upstream distributions.
Desktop environments provide a completely out-of-the-box visual experience: display servers, window managers, file managers, session managers, login managers, icon, font and wallpaper packages… DEs take care of all of that and much more. There’s no setup required other than installing them, and customising them is very easy in case you’re not too fond of their default look. In particular, KDE is well known for being one of the most highly configurable DEs. I especially recommend the version of KDE Plasma released with the Fedora KDE spin.
To DE or not to DE?
One of the questions that comes up most often on ricing and Linux enthusiast forums is whether it’s worth going through the trouble of installing and configuring a custom WM instead of taking the easy way and using a preconfigured DE developed to cater for everyone’s needs. The short answer is no: it’s not worth it unless you’re extremely obsessed with performance or maximum customisation. However, if you do fancy having a completely personalised environment built entirely by yourself, you’ll definitely find it a fun experience. WM ricing is doubtless much more common in DIY distros like Slackware, Gentoo or Arch Linux, but it’s certainly possible in some of the mainstream ones. I managed to get a pretty decent “awesome WM” setup on Ubuntu, for example.
It is viable to have a full-blown DE installed side by side with a barebones WM and then sign in to either of them, just as it’s possible to have multiple DEs installed on the same machine. I’ve personally found that this sort of approach tends to get dirty and cumbersome eventually, though.
For instance, I installed Linux Mint with its default Cinnamon interface on a laptop and then decided to go for the glossy look of KDE Plasma, so I ended up installing both (to be honest, I don’t quite recall whether I actually uninstalled Cinnamon or just gave up and installed KDE on top of it). Even though the desktop turned out alright, I still find a few wrinkles in it that bother me occasionally, like the fact that all of the system settings are now split across two control panels: cinnamon-settings, which I have to invoke from the terminal, and KDE’s control panel, which is accessed normally. I’ve found that certain settings only take effect when changed from one particular control panel. Moral of the story? Try out as many different DEs/WMs as you want, but do so in throwaway environments like virtual machines. Then, once you’ve settled on a particular one, install a system that comes with it preinstalled, like the Ubuntu variants Lubuntu (LXDE), Xubuntu (Xfce), Kubuntu (KDE)… or any of the Fedora spins.
The intention of this article was just to provide a quick overview of the principles behind the Linux desktop. There are many hands-on tutorials on how to customise your Linux environment using X, Y or Z piece of software, and massive amounts of documentation to configure any WM or DE under the sun. Generic ricing tutorials are abundant and compendious too. However, there are not that many introductions to how it all really works behind the scenes, how all the different components fit together in the greater picture of the graphical desktop set of technologies. I hope this article contributes a bit towards helping those fellows like me who feel vertigo at the mere idea of running something on their system which they don’t understand.
If you’re still not sure whether ricing is for you, or want to get some inspiration, there are many outstanding examples on the web. For example, check out /r/unixporn if you want to see a few.
I want to clarify that, even though this article assumes Linux as the system to customise, it’s perfectly possible to rice most of the BSD variants (including OS X), as most of the components described are so fundamental that the concepts covered here will probably carry over to platforms other than Linux (even Windows). PC-BSD, for example, lets you choose from a great number of WMs in its first install wizard. In fact, installing all the ones on offer and then switching between them is a great way to test them out without the hassle of setting them all up on your own.
The best piece of advice I can give you is to completely disregard mine or anyone else’s opinion on what the best-looking WM/DE/system should be and just find your own style! Install every DE, every WM you can get hold of, configure them, get used to them, like them or hate them. Rinse and repeat. You’ll eventually find the best one for you. Good luck!
References and Further Reading
Dalheimer MK, Welsh M, Running Linux (ch. 16), 5th Edition, USA: O’Reilly
Presenting DWD: A Candidate for KDE Window Decorations
Wayland article in Fedora wiki
Wayland Protocol Specification Rev. 1.3