LiveWyer | 11 May 2018
This week the LiveWyer team took some time out from discussing “This Is America” to watch some similarly engaging, provocative and artful videos from this year’s KubeCon + CloudNativeCon Europe 2018.
We’ve picked out a few of our highlights so far, and if you want to check out the full playlist you can do so here.
Laura Frank provides an introduction to Raft, the consensus algorithm that powers etcd, Kubernetes’ source of truth. Getting a group of machines to agree on a single state is a fundamental problem in distributed computing. For cloud engineers, it is well worth understanding how distributed nodes use leader election and quorum to decide what data should be added to the source of truth. Laura gives visual demonstrations of how Raft lets nodes reach consensus when the leader fails. She also introduces how Raft resolves inconsistencies between nodes during failure recovery in etcd and Docker Swarm.
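Here is a minimal Go sketch of the majority arithmetic Raft relies on, to make quorum concrete. The function names are our own and are not etcd's API. The sketch deliberately omits Raft's rule that a leader only commits entries from its current term by counting replicas. It simply shows two things: why a cluster of n members needs n/2+1 votes to elect a leader, and how a leader can work out which log index is safely replicated on a majority.

```go
package main

import (
	"fmt"
	"sort"
)

// quorum returns the number of votes (or replicas) that form a majority
// in a cluster of n voting members.
func quorum(n int) int {
	return n/2 + 1
}

// wonElection reports whether a candidate holding `votes` votes
// (including its own) becomes leader in a cluster of n members.
func wonElection(votes, n int) bool {
	return votes >= quorum(n)
}

// commitIndex returns the highest log index stored on a majority of
// members, given each member's highest replicated index (leader included).
// After sorting in descending order, the value at position quorum-1 is
// held by at least quorum members.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	return sorted[quorum(len(sorted))-1]
}

func main() {
	fmt.Println(quorum(3), quorum(5))              // 2 3
	fmt.Println(wonElection(2, 5))                 // false
	fmt.Println(commitIndex([]int{7, 5, 5, 3, 2})) // 5
}
```

A four-member cluster needs three votes, the same number as a five-member one. This is why etcd clusters are usually run with an odd number of members.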
Pod Anomaly Detection and Eviction using Prometheus Metrics - David Benque & Cedric Lamoriniere, Amadeus
David Benque and Cedric Lamoriniere show us how a failing node can set off a domino effect, even with load balancers in place. The failure spreads to the services that depend on that node, right across your distributed system. They cover several ways to prevent this by increasing reliability and making sure requests to dependencies can still be served:

- proximity-based load balancing
- setting pod resource limits
- well-designed liveness and readiness probes
- service meshes

This provides a segue into kubervisor, a Kubernetes operator that uses Prometheus metrics to determine whether the pods in a service’s replica set are healthy and behaving as expected. It removes pods that fail the specified health checks from the service’s endpoints.
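kubervisor's own configuration and detection logic may well differ, but the general idea of metric-driven eviction can be sketched in a few lines of Go. In this hypothetical example, `PodStats`, `selectEvictions` and the thresholds are our own inventions. The sketch works in three steps:

1. Compute each pod's error ratio from request and error counts, the kind of values you might query from Prometheus.
2. Treat pods above a threshold as eviction candidates.
3. Cap how many pods are removed at once, so that a cluster-wide problem cannot empty a service's endpoints entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// PodStats is a hypothetical per-pod summary, e.g. derived from Prometheus
// counters for total and failed requests over some time window.
type PodStats struct {
	Name     string
	Requests float64
	Errors   float64
}

// selectEvictions returns the names of pods whose error ratio exceeds
// maxErrorRatio, worst first, and never more than maxEvictions of them.
func selectEvictions(pods []PodStats, maxErrorRatio float64, maxEvictions int) []string {
	type candidate struct {
		name  string
		ratio float64
	}
	var bad []candidate
	for _, p := range pods {
		if p.Requests == 0 {
			continue // no traffic, so no evidence either way
		}
		if r := p.Errors / p.Requests; r > maxErrorRatio {
			bad = append(bad, candidate{p.Name, r})
		}
	}
	// Evict the worst offenders first, capped so that a problem affecting
	// every pod cannot remove the whole endpoint list.
	sort.Slice(bad, func(i, j int) bool { return bad[i].ratio > bad[j].ratio })
	if len(bad) > maxEvictions {
		bad = bad[:maxEvictions]
	}
	names := make([]string, 0, len(bad))
	for _, c := range bad {
		names = append(names, c.name)
	}
	return names
}

func main() {
	pods := []PodStats{
		{"web-a", 1000, 5},
		{"web-b", 1000, 300},
		{"web-c", 1000, 120},
		{"web-d", 0, 0},
	}
	fmt.Println(selectEvictions(pods, 0.1, 1)) // [web-b]
}
```

The cap is a deliberate design choice. If every pod suddenly looks unhealthy, the fault probably lies in a shared dependency, and evicting all of them would only turn a partial outage into a full one.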
Good Enough for the Finance Industry: Achieving High Security at Scale with Microservices - Zachary Arnold & Austin Adams, Ygrene
Pitched as a talk on running microservices securely on Kubernetes, this session has Zachary Arnold and Austin Adams run us through a high-level description of how their finance-industry company has built security into both its build pipeline and its Kubernetes cluster. They demonstrate how they enforce policies on code and containers by adding a dedicated security pipeline. Its tests are designed to close the security holes that can be introduced when adopting containers. They also give an overview of the security steps and native Kubernetes features that help keep cluster access compliant with policy. Finally, they cover additional tooling that can be introduced at the cluster level to make security compliance easier.
Improving your Kubernetes Workload Security with Hardware Virtualisation - Fabian Deutsch & Samuel Ortiz, Intel
Fabian Deutsch and Samuel Ortiz demonstrate two complementary approaches to integrating hardware virtualisation into a Kubernetes cluster, introducing two projects that use virtual machines to support different use cases. Samuel shows us how Kata Containers can run containers in hardware-virtualised pods, each with its own kernel, rather than in namespaced pods that all share the host kernel. Fabian demonstrates how KubeVirt lets legacy applications in virtual machines run on a cluster.
Most people using Kubernetes have grasped the concept of deploying to a cluster, so I’ve chosen four talks about understanding and reacting to unexpected events once real-world workloads arrive.
An all-too-common response to the questions “What’s your expected load? How much resource allocation do these services need?” is a blank stare. Without metrics, your operations team is effectively blind. Matt Layher from DigitalOcean gives an introduction to Prometheus and explains how to write your own Prometheus metrics exporter in Go by parsing the metrics in the /proc/stat text file. He then continues with a real-world example: exporting and parsing metrics from his networked cable TV tuner. Finally, he shows how to collect metrics from system calls. Armed with the knowledge from this talk, there’s no excuse for not exporting your application metrics!
Horizontal Pod Autoscaler Reloaded - Scale on Custom Metrics - Maciej Pytel, Google & Solly Ross, Red Hat
One of the most touted benefits of Kubernetes is the ability to scale on demand with no sysadmin intervention. The dynamic duo walk through the old and new manifest formats for telling Kubernetes how it should autoscale applications. Previously, the autoscaler could only scale on CPU usage as a percentage of the pods’ CPU requests. This talk shows how to use a custom metrics adapter to expose any metric you like, so you can scale on arbitrary conditions. That could even be a metric you exported after watching the previous video!
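The Kubernetes HPA documentation describes the core scaling decision as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The same formula applies whether the metric is CPU or a custom one. This Go sketch implements that formula together with the default 10% tolerance and the min/max replica bounds. The real controller also handles missing metrics, unready pods and scale-down stabilisation, all of which are left out here.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the HPA default of 0.1: if the metric is within 10%
// of its target, the replica count is left alone to avoid flapping.
const tolerance = 0.1

// desiredReplicas applies ceil(current * currentMetric / targetMetric)
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// A custom per-pod metric averaging 150 requests/s across 4 pods,
	// against a target of 100 requests/s per pod.
	fmt.Println(desiredReplicas(4, 150, 100, 1, 10)) // 6
}
```

Because the formula only needs a current value and a target, any metric exposed through a custom metrics adapter can drive it, as long as the metric grows roughly in proportion to load per pod.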
Allison talks about the importance of structured logging: adding a grammar, verbs and quantities to your log entries gives you important insights into the internal behaviour of your microservices. She then dives into OpenTracing and Jaeger, two components of the CNCF landscape that let you see the entire lifecycle of an API call to your service. Combining structured logging with OpenTracing tags gives you better analytics. Together, they can be used to diagnose problems and improve performance: “Why does this query take more than 30 seconds under load?”
Who Shot the Cluster? - Audit Logging in Kubernetes - Marian Lobur & Mik Vyatskov, Google
Recently in Kubernetes news, a popular electric car manufacturer made headlines when one of its clusters was compromised and used to mine cryptocurrency. Luckily, the Kubernetes API server can log every request that passes through it, along with the identity of the user who issued it, so you can track down compromised credentials or malicious users. Audit logs can be sent to a range of backends, and Marian shows how to search for events in Google Cloud’s Stackdriver UI. We’re then shown how to control the granularity of audit logs with a policy.
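A Kubernetes audit policy lists rules in order, and the first rule that matches a request sets its audit level: None, Metadata, Request or RequestResponse. This Go sketch models that first-match evaluation with heavily simplified, hypothetical types. Real rules can also match on user groups, namespaces, API groups, non-resource URLs and stages. The sketch is meant to show why rule order matters, not to reimplement the API server's policy checker. Here, a request that matches no rule is not logged.

```go
package main

import "fmt"

// Level mirrors the four audit levels defined by Kubernetes audit policy.
type Level string

const (
	LevelNone            Level = "None"
	LevelMetadata        Level = "Metadata"
	LevelRequest         Level = "Request"
	LevelRequestResponse Level = "RequestResponse"
)

// Rule is a simplified, hypothetical stand-in for a policy rule.
// An empty list matches any value.
type Rule struct {
	Level     Level
	Users     []string
	Verbs     []string
	Resources []string
}

// Event is a hypothetical, minimal view of an API request.
type Event struct {
	User, Verb, Resource string
}

// matches reports whether v is in list, treating an empty list as "any".
func matches(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// levelFor returns the level of the first rule matching the event, so
// narrow exceptions must come before broad catch-all rules.
func levelFor(policy []Rule, e Event) Level {
	for _, r := range policy {
		if matches(r.Users, e.User) && matches(r.Verbs, e.Verb) && matches(r.Resources, e.Resource) {
			return r.Level
		}
	}
	return LevelNone // no rule matched: the event is not logged
}

func main() {
	policy := []Rule{
		// Skip noisy watch requests from kube-proxy.
		{Level: LevelNone, Users: []string{"system:kube-proxy"}, Verbs: []string{"watch"}},
		// Never record secret payloads, only who touched them.
		{Level: LevelMetadata, Resources: []string{"secrets"}},
		// Full detail for anything that creates or changes pods.
		{Level: LevelRequestResponse, Verbs: []string{"create", "update", "patch"}, Resources: []string{"pods"}},
		// Everything else: metadata only.
		{Level: LevelMetadata},
	}
	fmt.Println(levelFor(policy, Event{"alice", "get", "secrets"})) // Metadata
	fmt.Println(levelFor(policy, Event{"alice", "create", "pods"})) // RequestResponse
}
```

Putting the secrets rule above the pod rule, and the catch-all last, is what keeps sensitive payloads out of the log while still capturing enough metadata to trace a compromised account.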