Cost optimisation
Forecast saving of $2.4 million annually.
Reduced operational effort
Karpenter implementation will save 6,000+ engineering hours over the next year.
Zero downtime
Seamless integration with GitOps workflows and zero customer impact.
The client struggled with high costs and operational inefficiencies due to reliance on AWS EKS Node Groups. These rigid configurations prevented the use of cost-saving Spot Instances, requiring predefined instance types and frequent manual adjustments for changing workload needs. Scaling was slow, as EKS Node Groups depended on EC2 Auto Scaling Groups, and managing AMI updates across clusters became a burden. Engineers faced increased complexity, navigating both AWS and Kubernetes contexts, leading to higher costs, delays, and strained resources.
The client replaced AWS EKS Node Groups with Karpenter, a dynamic Kubernetes autoscaler, to optimise resource usage and reduce costs. Karpenter prioritises Spot Instances for cost efficiency, dynamically provisions instances to match workloads, and automates instance lifecycle management, significantly simplifying server patching across over 100 clusters. The solution, deployed via the client’s GitOps processes using Rancher and Fleet, eliminated manual intervention, enhanced agility, and reduced operational complexity, freeing up engineering resources for higher-value tasks.
The implementation of Karpenter brought significant benefits to the client and their Platform Team. Engineers now have more time to focus on delivering new features, supporting a “Platform as a Product” approach to enhance value for software engineers. The effort required for AMI upgrades is forecast to fall by 6,000 engineering hours per year, while infrastructure cost savings amounted to approximately $2.4 million annually across 130 clusters. Platform Team morale improved with reduced operational overhead, and the solution was implemented with no downtime, ensuring uninterrupted service.
2 months
FinOps engagement
$2.4M
Infrastructure cost reduction
6,000
Engineer hours saved
LiveWyer has partnered with the client for many years, architecting and building their Kubernetes Platform. Faced with rising infrastructure costs, LiveWyer was tasked with finding an elegant way to reduce cloud spend while optimising AWS Spot Instance usage. LiveWyer delivered through its technical FinOps services, designing and implementing an effective, cost-saving technical solution and an ongoing strategy.
The client encountered several challenges related to both costs and engineering effort, primarily stemming from their use of AWS EKS Node Groups and their inability to effectively leverage Spot Instances:

- Rigid node group configurations prevented the use of cost-saving Spot Instances and required predefined instance types.
- Changing workload needs demanded frequent manual adjustments.
- Scaling was slow, as EKS Node Groups depended on EC2 Auto Scaling Groups.
- Managing AMI updates across clusters had become a significant burden.
- Engineers had to navigate both AWS and Kubernetes contexts, increasing complexity.
These issues not only limited the client’s ability to effectively reduce cloud costs but also strained their engineering resources, diminishing agility and delaying critical updates and workload optimisations.
The solution replaced AWS EKS Node Groups with Karpenter, a flexible and intelligent Kubernetes cluster autoscaler. This significantly improved resource efficiency and reduced operational overhead.
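To make this concrete, the sketch below shows the shape of a Karpenter NodePool that enables this behaviour: it permits both Spot and On-Demand capacity (Karpenter favours Spot when a workload can run on either) and declares broad instance requirements instead of predefined instance types. It is a minimal illustration rather than the client’s actual configuration; the pool name, instance categories, CPU limit and referenced EC2NodeClass are placeholder values, and it assumes Karpenter’s v1 API.

```python
# Minimal sketch of a Karpenter NodePool (v1 API) rendered as YAML.
# All names and limits below are illustrative placeholders.
import yaml

node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "default"},
    "spec": {
        "template": {
            "spec": {
                "requirements": [
                    # Allow both Spot and On-Demand; Karpenter prefers Spot
                    # when a workload is compatible with either.
                    {"key": "karpenter.sh/capacity-type",
                     "operator": "In", "values": ["spot", "on-demand"]},
                    # Broad instance categories rather than predefined types.
                    {"key": "karpenter.k8s.aws/instance-category",
                     "operator": "In", "values": ["c", "m", "r"]},
                    {"key": "kubernetes.io/arch",
                     "operator": "In", "values": ["amd64"]},
                ],
                # Reference to an EC2NodeClass holding AMI and networking settings.
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
            }
        },
        # Consolidate empty or underutilised nodes to keep costs down.
        "disruption": {"consolidationPolicy": "WhenEmptyOrUnderutilized"},
        "limits": {"cpu": "1000"},
    },
}

print(yaml.safe_dump(node_pool, sort_keys=False))
```

Because the EC2NodeClass owns AMI selection, refreshing AMIs becomes a matter of updating one resource and letting Karpenter replace the affected nodes, rather than rolling EC2 Auto Scaling Groups by hand.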
The implementation was seamlessly rolled out using the client’s GitOps processes, managed via Rancher and Fleet. This ensured a smooth, consistent deployment across clusters, aligning with existing workflows.
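As an illustration of that GitOps flow, the sketch below shows a Fleet GitRepo resource of the kind used to roll a Karpenter bundle out to downstream clusters managed through Rancher. The repository URL, path and cluster labels are hypothetical placeholders, not the client’s actual setup.

```python
# Illustrative sketch of a Rancher Fleet GitRepo rendered as YAML.
# Repository URL, path and cluster labels are hypothetical placeholders.
import yaml

git_repo = {
    "apiVersion": "fleet.cattle.io/v1alpha1",
    "kind": "GitRepo",
    "metadata": {"name": "karpenter", "namespace": "fleet-default"},
    "spec": {
        "repo": "https://git.example.com/platform/karpenter-config",
        "branch": "main",
        "paths": ["bundles/karpenter"],
        "targets": [
            # Target every downstream cluster labelled for the Karpenter rollout.
            {"clusterSelector": {"matchLabels": {"karpenter": "enabled"}}}
        ],
    },
}

print(yaml.safe_dump(git_repo, sort_keys=False))
```

With a resource like this, a change merged to the Git repository is reconciled by Fleet across every targeted cluster, which is how a single configuration change can reach 100+ clusters without manual intervention.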
By adopting Karpenter, the client achieved dynamic scaling, reduced cloud costs through efficient resource utilisation, and alleviated the operational complexity of managing and upgrading nodes across multiple clusters. This enhanced agility and freed up engineering resources for higher-value tasks.
The implementation of Karpenter significantly benefited both the client and their Platform Team. Engineers now have more time to focus on developing new features, aligning with a “Platform as a Product” mindset that adds greater value for software engineers.
Operationally, AMI upgrade efforts were reduced by an estimated 6,000 hours per year, and cloud infrastructure cost savings reached approximately $2.4 million annually across 130 clusters. Team morale improved due to reduced operational overhead, and the platform maintained zero downtime, ensuring uninterrupted service for its users.