CoreOS’s rkt started at the beginning of 2014 as a security-focused alternative to Docker. The project aimed to verify the signatures of cloud-native apps by default, in order to guarantee their integrity. It also stepped away from Docker’s central-daemon design, which requires root privileges for all operations. By contrast, the rkt process is short-lived, which limits the window for exploitation, and some rkt commands can be executed as an unprivileged user.
The project has come a long way since it was conceived. It is stable, fully featured, and it supports a variety of ways to fetch and start cloud-native apps with security being top of mind. For example, it can download apps from the Docker registry and use virtualization to run them.
Rkt champions open standards and supports the Open Container Initiative image and runtime specifications. A couple of months ago, it was accepted into the Cloud Native Computing Foundation, becoming a member of the same family as Kubernetes, the popular container orchestration service. Rkt already has excellent support for Kubernetes, and it will strengthen further now that they live under the same roof.
Why Rkt Works So Well: It’s The Architecture
The primary benefit of rkt is its versatility, which stems from an architecture built on three stages of execution:
Stage0 is responsible for readying the images and is implemented by the rkt executable.
Stage1 is in charge of creating isolated environments to run cloud-native apps. Stage1s are distributed in Application Container Image format (also known as ACI), which is a tarball containing a rootfs and a JSON manifest. It is the same format used for cloud-native apps.
Stage2 is the environment in which the applications actually run.
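To make the ACI layout mentioned above concrete, here is a minimal sketch that builds a tarball with a JSON manifest next to a rootfs tree and reads the manifest back. The helper names, the image name, and the version string are assumptions chosen for illustration; a real image carries more manifest fields and is usually compressed and signed.

```python
import io
import json
import tarfile


def _add_file(tar, name, data):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_aci(name, files):
    """Return tarball bytes laid out like an ACI:
    a top-level `manifest` file plus a `rootfs/` tree."""
    manifest = {
        "acKind": "ImageManifest",
        "acVersion": "0.8.11",  # assumed version string, illustration only
        "name": name,
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        _add_file(tar, "manifest", json.dumps(manifest).encode())
        for path, data in files.items():
            _add_file(tar, "rootfs/" + path, data)
    return buf.getvalue()


def read_manifest(aci_bytes):
    """Parse the JSON manifest out of an ACI-style tarball."""
    with tarfile.open(fileobj=io.BytesIO(aci_bytes), mode="r") as tar:
        return json.load(tar.extractfile("manifest"))
```

Because stage1s and apps share this layout, the same two helpers would work for either kind of image.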
Taking a lesson from Kubernetes, the basic execution unit of rkt is a pod: a small set of cloud-native apps to be run in a shared context. Typically, a pod is just a couple of apps, for example, a server app and a log parsing app. The log parsing application needs access to the logs of the other app; hence, the two apps share filesystem access.
When the user executes rkt run to start a pod, rkt unpacks the stage1 tarball, sets up each stage2 rootfs at a known location under the stage1 filesystem hierarchy, and runs the stage1 application with the right arguments. The stage1 application to run is specified in the stage1 manifest. The stage1 binary sets up a fresh environment, then runs the stage2 applications.
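The sequence above can be sketched in a few lines. This is a toy model, not rkt's implementation: the directory names (stage1/rootfs, opt/stage2) and the function name are assumptions made for illustration, and the entrypoint is passed in as a parameter instead of being read from a stage1 manifest.

```python
import tempfile
from pathlib import Path


def prepare_pod(pod_dir, stage1_entrypoint, apps):
    """Lay out a pod directory and return the command stage0 would run.

    All directory names here are illustrative assumptions.
    """
    stage1_root = Path(pod_dir) / "stage1" / "rootfs"
    # Stand-in for unpacking the stage1 tarball.
    stage1_root.mkdir(parents=True)
    for app in apps:
        # Each app's rootfs sits at a known location under the stage1 tree.
        (stage1_root / "opt" / "stage2" / app / "rootfs").mkdir(parents=True)
    # In rkt the entrypoint is named by the stage1 manifest; here it is an argument.
    entrypoint = stage1_root / stage1_entrypoint.lstrip("/")
    return [str(entrypoint), str(pod_dir)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(prepare_pod(Path(tmp) / "pod", "/init", ["server", "log-parser"]))
```

The point of the model is that stage0 only needs to know where to put the app filesystems and which binary to start; everything about isolation is left to the stage1 it hands over to.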
The beauty of this architecture is that stage1s are entirely independent and self-contained. Developers can implement new stage1s easily. They can be maintained, built, and shipped separately from rkt. Today, rkt supports five in-tree stage1s, plus two out-of-tree, including a stage1 based on Linux namespaces named coreos, and a stage1 based on KVM. End-users are given the choice of multiple stage1s with different trade-offs; they can pick the best for their use cases at runtime, with a simple command-line option.
The industry has come a long way since the early days of Docker, when many people confused cloud-native apps with Linux namespaces, because they were both called containers. Linux namespaces are only one of the many technologies to run applications. Similarly, cloud-native apps are packaged according to the ACI format, which is only one of many ways to package application binaries. The two technologies are orthogonal. The distinction between them is extremely stark in rkt.
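As a rough illustration of that runtime choice, the helper below assembles an rkt run command line with a stage1 selected either by image name or by a local file. It assumes the --stage1-name and --stage1-path options of rkt run; the function name, image names, and paths in the usage are placeholders, not values taken from any project's documentation.

```python
def rkt_run_argv(image, stage1_name=None, stage1_path=None):
    """Build an argv list for `rkt run`, optionally selecting a stage1.

    With neither option set, rkt falls back to its default stage1.
    """
    if stage1_name and stage1_path:
        raise ValueError("choose a stage1 by name or by path, not both")
    argv = ["rkt", "run"]
    if stage1_name:
        argv.append("--stage1-name=" + stage1_name)
    if stage1_path:
        argv.append("--stage1-path=" + stage1_path)
    argv.append(image)
    return argv
```

For example, rkt_run_argv("example.com/app", stage1_path="/path/to/stage1-xen.aci") yields a list that could be passed to subprocess.run; swapping the stage1 changes the isolation technology without touching the app image.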
Xen Joins the Party
A couple of weeks ago, the rkt community gained stage1-xen, a new stage1 based on the Xen Project hypervisor. It is still in its very early days, but it is a good proof of concept. Xen Project offers a few unique properties, not just in terms of technology, but also in terms of community and processes.
Xen Project is known as the enabler of many strong isolation and privilege separation architectures. Projects like Qubes OS and OpenXT, aimed at highly secure environments, take the security-by-compartmentalization approach, using the Xen Project hypervisor to create multiple isolated compartments. Each workload runs on a separate virtual machine. Infrastructure components, such as the network stack and the network drivers, can also be moved into their own separate VMs, named driver domains. Even if an attacker manages to penetrate and assume control of a driver domain, the intruder still does not gain full system access.
The Xen stage1 enables users to take advantage of rkt’s powerful, easy-to-use app management features, together with the Xen Project’s security and isolation properties. It creates a separate, secure-by-default Xen virtual machine for each pod.
Configuring Linux namespaces for isolation is hard; it is a daunting task at any scale. SELinux is the top technology to do it, but it has a steep learning curve, and often end-users disable it. It is hard to believe that the first completion suggestion for “how to disable” on Google Search is actually “selinux.” As companies are redesigning their software stacks around microservices, they’ll benefit from a Xen Project solution which is secure and doesn’t need additional settings to increase isolation.
Xen is most often associated with the largest public clouds in production, but the target of this project is not limited to servers. In fact, cloud-native apps are becoming the new way of packaging and distributing applications across all market segments. Stage1-xen will be of great help to developers in embedded environments, such as the automotive industry, where higher security standards are to be upheld. It will allow them to download and deploy new apps to vehicles, keeping them strongly isolated from the critical functions of the car.
Xen and Its Proclivity for Cloud Computing
There are many reasons why Xen is a great hypervisor for cloud-native applications; one of them is that Xen can run anywhere, from the latest and greatest physical servers to the smallest Amazon AWS instances. Let’s start by looking at virtualization technologies to understand how this is possible.
Xen offers two virtual machine types on Intel and AMD processors: PV and HVM guests. The Xen stage1 uses PV guests because they are lightweight and they don’t require any hardware emulation or additional processes on the host. Also, they have short boot times as they don’t run any guest firmware (i.e., there is no UEFI or SeaBIOS to be run inside the virtual machine). They are a good match for cloud-native apps.
A fundamental characteristic of PV guests is that they don’t require hardware virtualization extensions, which Intel calls VT-x and AMD calls AMD-V. These extensions were introduced around 2006. All modern x86 machines support them, but cloud instances typically do not expose them.
Although both Xen and KVM can create virtual machines with a virtual version of VT-x and AMD-V, cloud providers do not enable this feature. As a consequence, Amazon and Google Cloud instances look like pre-2006 hardware: they have neither VT-x nor AMD-V. Thus, it is not possible to create a nested KVM virtual machine on top of an Amazon AWS instance, but it is possible to start a nested Xen PV guest in the same environment because it doesn’t require virtualization extensions. With stage1-xen, rkt users gain the ability to execute cloud-native apps as virtual machines on top of AWS and Google Cloud, the same way they do today with the default coreos stage1.
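One way to see this on a Linux machine is to look at the CPU flags in /proc/cpuinfo, where VT-x shows up as vmx and AMD-V as svm. The sketch below parses that text and lists which virtual machine flavors could be started; the function names and the "xen-pv" and "kvm" labels are illustrative assumptions, not identifiers from rkt or Xen.

```python
def has_hw_virt(cpuinfo_text):
    """Return True if any `flags` line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and set(value.split()) & {"vmx", "svm"}:
            return True
    return False


def nested_vm_options(cpuinfo_text):
    """List VM flavors usable on this CPU (labels are illustrative).

    PV guests need no virtualization extensions, so they are always listed;
    KVM is listed only when the extensions are visible.
    """
    options = ["xen-pv"]
    if has_hw_virt(cpuinfo_text):
        options.append("kvm")
    return options


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(nested_vm_options(f.read()))
```

Run on a bare-metal x86 server, this would normally report both options; on an instance without nested virtualization, only the PV entry remains.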
Beyond the Technicalities: The Security Process
Besides the technical features, Xen Project has a strong security track record and a fully transparent security policy that supports responsible disclosure.
Security fixes are easy to track, apply, and deploy. Stable trees are maintained for two years. It is possible to patch production systems before the public disclosure date when a fix doesn’t expose technicalities that could introduce the risk of re-discovery of the vulnerability. Security management is one of the top reasons for choosing the Xen Project hypervisor, which makes it a great fit for a security-focused project like rkt.
Stage1-xen is still in its infancy, and we need your help in making it fully supported and ready for primetime.
If you are interested in cloud-native apps and security, join the community and take the opportunity to shape its future. If you are located in or near Budapest, Hungary, I’ll be talking more about this topic during the Xen Project Developer and Design Summit happening July 11 – 13, 2017.