KumoMind
KumoMind's Blog

DevOps Security Checklist For Kubernetes

KumoMind
Jul 24, 2022

Kubernetes is a container orchestration platform that many companies have adopted. Deploying a production-ready cluster requires a good understanding of its ecosystem. Putting sound working principles and tools in place is therefore essential. This effort needs all teams (operations, development, security, etc.) so that anomalies are detected as early as possible, which raises the security level of the orchestrator and its resources.

Pre-commit Hooks

The DevSecOps world now recognizes a core principle: add automated processes as early as possible in the continuous integration pipeline, with the primary objective of minimizing impacts in production.

This practice, called “shift security left”, was introduced to facilitate collaboration between development, security, and operations teams. The idea is to ensure application security early in the development cycle. Security and testing processes move to the left on the traditional linear SDLC representation, starting with pre-commit hooks.

Several tools have emerged in recent years to facilitate this integration, in order to:

  • Format YAML code
  • Detect anomalies in Kubernetes resource configurations
  • Enforce configuration and security policies that follow good development practices
  • Detect sensitive data before any source code is committed

Here are three good examples of tools to check YAML definition files:

  • YAML Lint
  • Checkov
  • K8svalidate
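The last goal listed earlier is detecting sensitive data before committing. Here is a minimal sketch of a pre-commit style check in Python that illustrates it. The patterns, the rule names, and the find_secrets and main helpers are hypothetical and far less thorough than dedicated secret scanners. The script assumes it receives the staged file paths as command-line arguments, which is how the pre-commit framework calls hooks by default.

```python
import re
import sys

# Hypothetical patterns for illustration only; real scanners ship many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "generic_password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(paths: list[str]) -> int:
    """Scan each file and return 1 if anything suspicious is found, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, name in find_secrets(handle.read()):
                print(f"{path}:{lineno}: possible secret ({name})")
                status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Git aborts the commit when a pre-commit hook exits with a non-zero status. A single finding is therefore enough to block it.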

Continuous Integration Checks

Pre-commit checks can rarely be enforced by the DevSecOps team; they are usually recommended to facilitate teamwork. In some cases, implementing pre-commit tasks can be cumbersome, especially for large teams. These checks are still necessary and should therefore also run later, in the continuous integration process.

As with pre-commit hooks, these CI tasks can be split into two steps. First, format the code. Then, scan it to validate the conformance of the configuration files.

The tools mentioned in the previous section are still good candidates. Datree can be added to this list. It is an easy-to-use web platform that lets DevSecOps teams develop, version, view, and enforce security rules, so that YAML files are compliant before deployment.
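The two CI steps described above can be sketched as follows. To stay within the Python standard library, this sketch assumes manifests are stored as JSON, which Kubernetes accepts as well as YAML. The formatting convention (sorted keys, two-space indentation) and the required fields are illustrative assumptions. They are not the rules of any of the tools mentioned, and the helper names are hypothetical.

```python
import json
import sys

REQUIRED_TOP_LEVEL = ("apiVersion", "kind", "metadata")


def format_manifest(raw: str) -> str:
    """Step 1: normalise a JSON manifest (sorted keys, 2-space indent, final newline)."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True) + "\n"


def conformance_errors(manifest: dict) -> list[str]:
    """Step 2: report basic structural problems in a single manifest."""
    errors = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in manifest]
    if "metadata" in manifest and not manifest["metadata"].get("name"):
        errors.append("missing field: metadata.name")
    return errors


def check_file(path: str) -> bool:
    """Return True when the file is already formatted and conformant."""
    with open(path, encoding="utf-8") as handle:
        raw = handle.read()
    ok = True
    if raw != format_manifest(raw):
        print(f"{path}: not formatted")
        ok = False
    for error in conformance_errors(json.loads(raw)):
        print(f"{path}: {error}")
        ok = False
    return ok


if __name__ == "__main__":
    results = [check_file(path) for path in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)
```

A CI job would run the script over the changed manifests. A non-zero exit status then fails the pipeline.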

Image Scanning

Scanning images before deployment is an important and sometimes overlooked step, because many people assume that official images are secure. Scanning them still matters: new vulnerabilities are discovered every day. Any system with a known vulnerability must be updated to reduce the attack surface available to malicious actors.

These scans must be performed at different stages of a container’s life cycle:

  • Before publishing an image to a remote registry, to ensure that the image complies with security rules even before it is deployed
  • During the runtime of a container, to identify as soon as possible the images that need to be rebuilt to fix a newly identified vulnerability
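To make these two stages concrete, here is a simplified sketch of what a scanner does once it has listed the packages inside an image: it compares installed versions against advisories. The Advisory type, the advisory IDs, and the purely numeric version format are hypothetical simplifications. Real scanners handle distribution-specific version schemes and rely on continuously updated vulnerability databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory: installed versions below fixed_in are vulnerable."""
    advisory_id: str
    package: str
    fixed_in: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def matching_advisories(inventory: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return the IDs of advisories affecting an image's package inventory."""
    hits = []
    for advisory in advisories:
        installed = inventory.get(advisory.package)
        if installed is not None and parse_version(installed) < parse_version(advisory.fixed_in):
            hits.append(advisory.advisory_id)
    return hits


def needs_rebuild(images: dict[str, dict[str, str]], advisories: list[Advisory]) -> list[str]:
    """Re-check every running image, e.g. each time the advisory feed is updated."""
    return sorted(name for name, inventory in images.items()
                  if matching_advisories(inventory, advisories))
```

Running matching_advisories before pushing an image covers the first stage. Re-running needs_rebuild against the deployed images whenever the advisory feed changes covers the second.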

Several open source tools can be used to collect this data and notify teams of problems identified, like:

Cluster Scanning

Securing a Kubernetes cluster depends on the company’s security governance. The policies applied must take into account accessibility, maintenance, data management, etc.

Nevertheless, it is important to follow a number of rules identified by the community, which provide a good baseline level of security as soon as a cluster is installed. It is also recommended to scan a Kubernetes cluster regularly to identify known anomalies at runtime, mainly configuration-related ones.
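A periodic configuration scan can be sketched as a script that reads the JSON output of "kubectl get pods -A -o json" and flags a few risky settings. The checks chosen here are illustrative assumptions: host namespaces, privileged containers, and missing resource limits. The script name scan_pods.py is hypothetical, and dedicated cluster scanners cover far more rules.

```python
import json
import sys


def pod_findings(pod: dict) -> list[str]:
    """Return configuration anomalies for a single pod object."""
    meta = pod.get("metadata", {})
    where = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
    spec = pod.get("spec", {})
    findings = []
    # Sharing the host's namespaces weakens isolation between the pod and the node.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            findings.append(f"{where}: {flag} is enabled")
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"{where}/{container['name']}: privileged container")
        if not (container.get("resources") or {}).get("limits"):
            findings.append(f"{where}/{container['name']}: no resource limits")
    return findings


def scan(pod_list: dict) -> list[str]:
    """Scan the List document returned by 'kubectl get pods -A -o json'."""
    return [finding for pod in pod_list.get("items", []) for finding in pod_findings(pod)]


if __name__ == "__main__":
    # Usage: kubectl get pods -A -o json | python scan_pods.py
    report = scan(json.load(sys.stdin))
    print("\n".join(report) or "no findings")
    sys.exit(1 if report else 0)
```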

There are several tools to automate and generate anomaly detection reports, such as:

Security Context

Kubernetes Security Context adheres to the principle of least privilege: a subject should receive only the privileges it needs to perform its task.

A security context lets administrators define security-related parameters on a per-resource basis. Each workload can thus receive the specific permissions it needs to access resources on the host, while being denied access to those it does not require. In Kubernetes, a security context defines privilege and access control settings for a pod or for individual containers within it.

Managing security contexts requires a solid understanding of the cluster, its installation, and its ecosystem. They can be complex to implement. Even so, these settings remain a very effective, native way to limit what any pod or container can do.
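The following Python function sketches these settings. It builds a Pod manifest with a restrictive security context and prints it as JSON. The field names are standard Kubernetes securityContext fields. The values chosen, the user ID 10001, and the image name are hypothetical and must be adapted to each application.

```python
import json


def restricted_pod(name: str, image: str, uid: int = 10001) -> dict:
    """Build a Pod manifest whose security context follows least privilege."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level settings apply to every container in the pod.
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": uid,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Container-level settings add to, or override, the pod-level ones.
                    "securityContext": {
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "capabilities": {"drop": ["ALL"]},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g.: python restricted_pod.py | kubectl apply -f -
    print(json.dumps(restricted_pod("web", "registry.example.com/web:1.4.2"), indent=2))
```

A read-only root filesystem often requires mounting an emptyDir volume for the paths the application writes to.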

Role Based Access Control

Taking advantage of Kubernetes Role-Based Access Control (RBAC) is a basic first step toward securing clusters and the applications running on them. The RBAC principle is simple: define who can access what, based on identities.

Kubernetes offers fine-grained access management:

  • User accounts, for access by humans
  • Service accounts, for access by applications
  • Roles, to define permissions on the resources of a single namespace
  • ClusterRoles, to define cluster-wide permissions

Combined with an external identity provider (such as Okta, Google, or LDAP), this granularity allows very fine-grained access management. It ensures control over resources and, often, their auditability.
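The following simplified model illustrates how Kubernetes evaluates a Role's rules. RBAC has no deny rules, so a request is allowed as soon as one rule matches its API group, resource, and verb, with "*" acting as a wildcard. The PolicyRule type and the pod_reader role are hypothetical. The sketch ignores resourceNames, subresources, and the RoleBinding or ClusterRoleBinding that attaches a role to a subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified RBAC rule: the apiGroups, resources and verbs fields of a Role."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _matches(allowed: tuple[str, ...], requested: str) -> bool:
    """A field matches when it lists the requested value or the '*' wildcard."""
    return "*" in allowed or requested in allowed


def is_allowed(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """RBAC is purely additive: the request is allowed if any single rule grants it."""
    return any(
        _matches(rule.api_groups, api_group)
        and _matches(rule.resources, resource)
        and _matches(rule.verbs, verb)
        for rule in rules
    )


# Hypothetical Role for a monitoring service account: read-only access to pods.
# "" designates the core API group, which contains pods, services, secrets, etc.
pod_reader = [PolicyRule(api_groups=("",), resources=("pods",), verbs=("get", "list", "watch"))]
```

On a real cluster, kubectl auth can-i answers the same question. One example is "kubectl auth can-i list pods -n monitoring --as=system:serviceaccount:monitoring:prometheus", where the namespace and service account names are hypothetical.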

Network Policy

Managing network security rules can be part of the shift security left concept. Kubernetes NetworkPolicy resources let administrators and developers define rules that specify which network traffic is allowed. Following the shift left principle, developers can secure traffic to and from their applications without needing to understand low-level networking concepts.

The DevOps team can enforce default policies, while developers manage the specific access rules of their applications, giving them some autonomy.
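Here is a minimal sketch of this split of responsibilities. The DevOps team applies a default policy that denies all ingress traffic in each namespace, and developers add narrow allow rules for their applications. The namespace, labels, and port are hypothetical. These policies only take effect if the cluster's CNI plugin supports NetworkPolicy.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Default policy: deny all ingress traffic to every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches every pod in the namespace
            "policyTypes": ["Ingress"],
        },
    }


def allow_from(namespace: str, app: str, client: str, port: int) -> dict:
    """Developer-owned rule: let pods labelled app=<client> reach app=<app> on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{client}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and labels; pipe the output to: kubectl apply -f -
    policies = [default_deny_ingress("shop"), allow_from("shop", "api", "frontend", 8080)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```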

Network policies are enforced by the Container Network Interface (CNI) plugin deployed on the cluster. Several CNI plugins provide this feature, but two of them deserve specific attention:

  • Calico, probably the best-known and most widely used CNI in the Kubernetes ecosystem. In its free version, Calico manages network rules up to layer 4 of the OSI model; managing higher layers requires the paid version.
  • Cilium, a good alternative or add-on, offers many free features from layer 3 up to layer 7 of the OSI model.

Policy Enforcement

Kubernetes is built around its API. This design has made it possible to develop tools that control access to its resources in order to audit and secure them.

Admission controllers intercept requests that clients such as kubectl send to the Kubernetes API server. They act after authentication and authorization, but before the objects are persisted. Adding checkpoints at this stage lets you validate every request before it takes effect. Any behavior that deviates from the company’s security governance can thus be prevented.

These control points can be used to:

  • Check that CPU and memory limits are set
  • Ensure that users don’t change default network policies
  • Ensure that specific resources always carry a specific label
  • Deny permissions on particular resources
  • Prevent the use of the latest tag
  • Generate a default network policy for every new namespace
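Here is a sketch of the decision such a checkpoint makes. It shows the validation logic a validating admission webhook could apply to a Pod, covering two of the checks listed above: CPU and memory limits, and the latest tag. It is a hand-written illustration, not how OPA or Kyverno work internally. Serving the webhook over HTTPS and registering it with a ValidatingWebhookConfiguration are left out, and the function names are hypothetical.

```python
def image_uses_latest(image: str) -> bool:
    """True if the image is tagged 'latest' or untagged (which defaults to latest)."""
    if "@" in image:  # pinned by digest, so the tag does not matter
        return False
    last_segment = image.rsplit("/", 1)[-1]  # skip a registry host:port prefix
    return ":" not in last_segment or last_segment.endswith(":latest")


def violations(pod_spec: dict) -> list[str]:
    """Collect every policy violation found in a Pod spec."""
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"container {name}: missing {resource} limit")
        if image_uses_latest(container.get("image", "")):
            problems.append(f"container {name}: image must not use the latest tag")
    return problems


def review(admission_review: dict) -> dict:
    """Build the AdmissionReview (admission.k8s.io/v1) response for a Pod request."""
    request = admission_review["request"]
    problems = violations(request["object"].get("spec", {}))
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```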

Several tools can be used today to manage these security rules:

  • Open Policy Agent (OPA) is probably the best-known tool for enforcing policies on Kubernetes. OPA policies are expressed in a high-level declarative language called Rego.
  • Kyverno can validate, mutate, and generate configurations using admission controls and background scans. Kyverno policies are written in YAML like any other Kubernetes resource and do not require learning a new language.

It is strongly recommended to include one of these tools in the basic profile of any Kubernetes cluster.

Runtime Threat Detection

Kubernetes is not a security platform. It lacks native tooling to handle most security-related tasks, such as detecting vulnerabilities within applications and monitoring for breaches.

Real-time detection of anomalies and threats is an important part of any security governance, and the Kubernetes platform is no exception. On the contrary, it is widely used in domains such as data management, which makes its security key to business operations.

Several open source tools have emerged on the market to identify any deviating behavior of containers and applications, such as:

Both detect threats at runtime and send alerts, supplemented by incident reports, so that the necessary measures can be taken.
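A toy engine can illustrate the rule-based approach behind runtime detection. Real tools collect events from the kernel (for example with eBPF) and ship large rule sets. This sketch assumes a hypothetical, already collected stream of simplified Event records. It only shows how rules turn events into alerts, and the two rules are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SHELLS = {"sh", "bash", "ash", "zsh"}


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified runtime event reported by a node-level sensor."""
    container: str
    process: str
    action: str       # e.g. "exec" or "open_write"
    target: str = ""  # file path for file events, empty otherwise


Rule = Callable[[Event], Optional[str]]


def shell_in_container(event: Event) -> Optional[str]:
    """Interactive shells rarely belong in production containers."""
    if event.action == "exec" and event.process in SHELLS:
        return f"shell '{event.process}' started in container {event.container}"
    return None


def write_below_etc(event: Event) -> Optional[str]:
    """Writing to /etc at runtime often signals tampering with system configuration."""
    if event.action == "open_write" and event.target.startswith("/etc/"):
        return f"{event.process} wrote {event.target} in container {event.container}"
    return None


def detect(events: list[Event], rules: list[Rule]) -> list[str]:
    """Apply every rule to every event and collect the resulting alerts."""
    alerts = []
    for event in events:
        for rule in rules:
            alert = rule(event)
            if alert is not None:
                alerts.append(alert)
    return alerts
```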

Outdated Resources

Kubernetes is a major player in the DevOps world today thanks to its flexibility and its impact on delivering new features to production. Because it lets development teams deploy faster, the resources in use need special attention so that outdated resources do not compromise the security of the cluster.

There are different types of outdated resources:

  • Deprecated Kubernetes APIs
  • Old / Deprecated Kubernetes resources (like Helm releases)

Deprecated Kubernetes APIs are not necessarily security vulnerabilities, but they can affect the life cycle of a cluster and therefore compromise its maintenance. Pluto, a utility developed by Fairwinds, helps users find deprecated Kubernetes apiVersions in their code repositories and their Helm releases. This type of analysis must be performed before upgrading a cluster, so that the upgrade succeeds and a potential security breach is prevented.
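The idea behind this kind of check can be sketched in a few lines: compare each manifest's apiVersion and kind against a table of APIs removed in a given Kubernetes release. The table below is only an illustrative subset of real removals, and blocking_apis is a hypothetical helper. Pluto maintains a complete list and can also inspect Helm releases.

```python
# Illustrative subset: (apiVersion, kind) -> (release that removed it, replacement apiVersion).
REMOVED_APIS = {
    ("extensions/v1beta1", "Deployment"): ("v1.16", "apps/v1"),
    ("apps/v1beta1", "Deployment"): ("v1.16", "apps/v1"),
    ("extensions/v1beta1", "Ingress"): ("v1.22", "networking.k8s.io/v1"),
    ("networking.k8s.io/v1beta1", "Ingress"): ("v1.22", "networking.k8s.io/v1"),
    ("batch/v1beta1", "CronJob"): ("v1.25", "batch/v1"),
    ("policy/v1beta1", "PodDisruptionBudget"): ("v1.25", "policy/v1"),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Turn 'v1.25' or '1.25.3' into (1, 25)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def blocking_apis(manifests: list[dict], target_version: str) -> list[str]:
    """List manifests whose apiVersion no longer exists in the target Kubernetes version."""
    target = parse_minor(target_version)
    report = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key in REMOVED_APIS:
            removed_in, replacement = REMOVED_APIS[key]
            if target >= parse_minor(removed_in):
                name = manifest.get("metadata", {}).get("name", "?")
                report.append(f"{key[1]}/{name}: {key[0]} removed in {removed_in}, use {replacement}")
    return report
```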

Protecting a Kubernetes cluster also involves scanning the active resources. Helm, as a package manager, requires close attention to the life cycle of charts and a weekly or monthly update plan for the resources it manages. Nova, another tool developed by Fairwinds, automates this type of scan. It quickly detects charts that need to be updated, or even removed from the cluster if they have been deprecated.
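Here is a sketch of the classification such a scan produces. It assumes the installed and latest chart versions have already been collected from the cluster and the chart repositories. The Release type, the release and chart names, and the version numbers are hypothetical, and the version parser ignores pre-release suffixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical summary of a Helm release and the newest chart version available."""
    name: str
    chart: str
    installed: str
    latest: str
    deprecated: bool = False


def semver(version: str) -> tuple[int, int, int]:
    """Parse 'v1.2.3' or '1.2.3-rc.1' into (1, 2, 3), ignoring any pre-release suffix."""
    major, minor, patch = version.lstrip("v").split("-")[0].split(".")
    return int(major), int(minor), int(patch)


def audit(releases: list[Release]) -> dict[str, list[str]]:
    """Group releases into charts to remove, major upgrades, and minor/patch upgrades."""
    report: dict[str, list[str]] = {"remove": [], "major": [], "minor": []}
    for release in releases:
        current, newest = semver(release.installed), semver(release.latest)
        if release.deprecated:
            report["remove"].append(release.name)
        elif newest[0] > current[0]:
            report["major"].append(release.name)
        elif newest > current:
            report["minor"].append(release.name)
    return report
```

Separating major upgrades from minor ones reflects a practical choice. Major chart versions more often contain breaking changes and deserve a manual review.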

Next?

For more information, please refer to this documentation:

About the author

Nicolas Giron — Site Reliability Engineer (SRE) — DevOps

 