Jake Sanders | 02 October 2015
It's been a while since we wrote anything about Kubernetes, and seeing as the project gets upwards of 30 pull requests per day, it moves really fast! In this post we're going to explore a number of exciting upcoming features, which are being added to the experimental (now called v1alpha1) section of the API. If you're unfamiliar with Kubernetes, you can read our introduction, or take a look at their official documentation.
Firstly, a couple of features that are mostly complete and are useful to anyone running a cluster in production:
In an idealised situation, every container deployed on Kubernetes would be stateless and ephemeral, and it would not matter where it was scheduled. However, in practice you often want to run certain cluster services on every node. Some examples include running GlusterFS or Ceph daemons to provide cluster-wide persistent storage, or fluentd to collect logs. With a DaemonSet, Kubernetes will run the provided Pod template on every Node that is added to the cluster, and clean up the Pod when the Node is removed. If the Pod fails, Kubernetes will attempt to restart it. If, for whatever reason, a matching Pod already exists, Kubernetes will assume the Pod was created already and will not create another!
A common thought after reading about this feature would be "Why would I use this feature over the normal init system?" One of the advantages of using a DaemonSet is you can control which nodes the Pods are created on with labels. For example, you may only want to run your GlusterFS storage daemons on specific servers that have a large HDD. If you label these nodes while adding them to the cluster, Kubernetes will deal with this for you, meaning you don't have to customise your OS image per server type.
Another advantage of using a DaemonSet is that you can still use Kubernetes' service discovery to communicate with the DaemonSet Pods, rather than keeping a separate record of your storage servers' IP addresses, as you would have to with an init system.
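As a sketch of how this might look, here is a DaemonSet manifest that runs a storage daemon only on nodes labelled with a large disk. The image name, label key, and resource names are illustrative rather than official, and the experimental API group shown here has been renamed in later Kubernetes releases:

```yaml
# Run a GlusterFS daemon on every Node carrying the disk=large-hdd label.
# Image name and label are hypothetical examples.
apiVersion: experimental/v1alpha1
kind: DaemonSet
metadata:
  name: glusterfs-daemon
spec:
  template:
    metadata:
      labels:
        app: glusterfs
    spec:
      # Restrict scheduling to the labelled storage nodes
      nodeSelector:
        disk: large-hdd
      containers:
      - name: glusterfs
        image: example/glusterfs:latest
```

You would create this with `kubectl create -f` like any other resource, and Kubernetes takes care of placing a Pod on each matching Node as it joins the cluster.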
Addressing another common use case for a cluster scheduler, a Job creates a specified number of Pods and makes sure that a specified number of them terminate successfully. This can be anything from a single-Pod backup Job to a large MapReduce operation using hundreds of Pods running in parallel!
We've emulated this functionality before by deploying bare Pods through the Kubernetes API. However, using a Job object means you gain more control over Job failures. Rather than manually watching the API for Pod error conditions, why not have the Job Controller do it for you? Timed (cron) jobs are unfortunately not currently implemented, but are coming soon.
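A minimal sketch of a one-shot Job might look like the following. The image and command are hypothetical, and the exact field names tracked the experimental API at the time of writing, so check the current documentation before using them:

```yaml
# A single-completion backup Job; the Job Controller re-creates the Pod
# until it terminates successfully. Image and command are illustrative.
apiVersion: experimental/v1alpha1
kind: Job
metadata:
  name: nightly-backup
spec:
  # How many Pods must finish successfully for the Job to be complete
  completions: 1
  template:
    metadata:
      labels:
        job: nightly-backup
    spec:
      # Never or OnFailure; a Job's Pods should not restart forever
      restartPolicy: OnFailure
      containers:
      - name: backup
        image: example/backup-tool:latest
        command: ["/bin/sh", "-c", "run-backup"]
```

For a parallel MapReduce-style workload, you would raise `completions` (and run many Pods at once) rather than creating hundreds of bare Pods by hand.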
The other features below are still being discussed, but are interesting enough to have caught our eye:
Currently, the way to do a rolling update to your current ReplicationControllers is to interact with the master node via kubectl rolling-update, which is similar to manually deleting old pods and scheduling new ones. This works OK, but seeing as the update logic happens on your computer rather than inside Kubernetes, it is vulnerable to your internet connection going down or your laptop crashing. A Deployment lets you run a declarative configuration update for running pods that is overseen by the DeploymentController on the cluster itself, with the aim of supporting many deployment strategies and allowing rollback on failure.
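A Deployment manifest is essentially a declaration of the desired state, which the cluster then converges towards. A hedged sketch (names and image are illustrative, and the experimental schema was still in flux when this was written):

```yaml
# Declare the desired state; the DeploymentController running on the
# cluster performs the rollout, so your laptop can safely disconnect.
apiVersion: experimental/v1alpha1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        # Updating this image tag and re-applying the manifest
        # triggers a server-side rolling update
        image: example/frontend:v2
```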
The horizontal pod autoscaler automatically schedules more Pods (up to a defined limit) when existing Pods exceed a defined level of CPU or memory consumption. This would be extremely useful if your app gets posted to /r/InternetIsBeautiful and suddenly needs a tonne of resources it doesn't usually require. If you're running your cluster as a service or a "private cloud," being able to keep apps up under extreme load will ensure your customers don't leave!
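To make the idea concrete, here is a rough sketch of an autoscaler that keeps a ReplicationController between 2 and 10 Pods, scaling up when average CPU usage passes 80%. The field names are illustrative of the experimental schema being discussed at the time and have changed in later releases:

```yaml
# Scale the "frontend" ReplicationController between 2 and 10 Pods,
# targeting 80% CPU utilisation. Field names are illustrative.
apiVersion: experimental/v1alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-autoscaler
spec:
  # The object whose replica count the autoscaler adjusts
  scaleRef:
    kind: ReplicationController
    name: frontend
  minReplicas: 2
  maxReplicas: 10
  cpuUtilization:
    targetPercentage: 80
```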
If you're playing around with Kubernetes, the documentation for all of these features is constantly changing, so the best place to look would be the kubernetes master branch on GitHub. If you want to try any experimental features, make sure to start your apiserver with the --runtime-config=experimental/v1alpha1 command line flag, or you won't be able to create any of them.