For GKE, at a high level, we are responsible for protecting the control plane, and you are responsible for protecting your worker nodes and the workloads that run on them.
Google is responsible for securing the control plane: the component of Kubernetes that manages how Kubernetes communicates with the cluster and applies the user's desired state. The control plane includes the master VM, API server, scheduler, controller manager, cluster CA, root-of-trust key material, IAM authenticator and authorizer, audit logging configuration, etcd, and various other controllers. All of your control plane components run on Compute Engine instances that we own and operate. These instances are single-tenant: each instance runs the control plane and its components for only one customer. (You can learn more about GKE control plane security here.)
We harden these control plane components on an ongoing basis: as attacks occur in the wild, as vulnerabilities are announced, and as new patches become available. For example, we updated clusters to use RBAC rather than ABAC by default, and locked down and eventually disabled the Kubernetes dashboard.
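To see what the RBAC default means in practice, here is a minimal sketch of a Role and RoleBinding that grant a single user read-only access to pods in one namespace; the names (`pod-reader`, `read-pods`, `jane@example.com`) are illustrative, not anything GKE creates for you:

```yaml
# Role: read-only access to pods, scoped to the "default" namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]          # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: grant the Role above to a specific user.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane@example.com   # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Unlike ABAC, which relied on a policy file on the master, RBAC rules like these are ordinary API objects you can review, version, and audit.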
How we respond to a vulnerability depends on the component in which it is found. In each case, we make patches available as part of general GKE releases (patch releases and bug fixes) as soon as possible, given the level of risk, any embargo period, and other contextual factors.
Your worker nodes in Kubernetes Engine consist of several surfaces that need to be protected: the node OS, the container runtime, Kubernetes components such as the kubelet and kube-proxy, and Google system containers for monitoring and logging. We're responsible for developing and releasing patches for these components, but you are responsible for upgrading your nodes to apply those patches.
Kubernetes components like kube-proxy and kube-dns, and Google-specific add-ons that provide logging, monitoring, and other services, run in separate containers. We're responsible for these containers' control plane compatibility, scalability, upgrade testing, and security configuration. If they need to be patched, it's your responsibility to upgrade and apply the patches.
To ease patch deployment, you can use node auto-upgrade. Node auto-upgrade applies updates to nodes on a regular basis, including security patches, keeping the operating system and Kubernetes components on the latest stable version. Notably, if a patch contains a critical fix that can be rolled out before the public vulnerability announcement without breaking embargo, your GKE environment is upgraded before the vulnerability is even announced.
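As a rough sketch, enabling node auto-upgrade on an existing node pool is a single gcloud command; the cluster, node pool, and zone names below are placeholders for your own:

```
# Enable node auto-upgrade on an existing node pool
# (my-cluster, default-pool, and us-central1-a are placeholders).
gcloud container node-pools update default-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --enable-autoupgrade
```

You can also pass `--enable-autoupgrade` when creating a cluster or node pool, so new nodes are covered from the start.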
So far we've been talking about the underlying infrastructure that runs your workload, but of course you still have the workload itself. Application security and other protections for your workload are your responsibility.
You're also responsible for the Kubernetes configurations that pertain to your workloads. This includes setting up a NetworkPolicy to restrict pod-to-pod traffic and using a PodSecurityPolicy to restrict pod capabilities. For an up-to-date list of the best practices we recommend for protecting your clusters, including node configurations, see Hardening your cluster's security.
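As one example of restricting pod-to-pod traffic, here is a minimal NetworkPolicy sketch that only allows pods labeled `app: frontend` to reach pods labeled `app: backend`; the labels, names, and namespace are illustrative assumptions, and remember that NetworkPolicy enforcement must be enabled on the cluster for this to take effect:

```yaml
# Allow ingress to backend pods only from frontend pods;
# all other ingress to pods labeled app=backend is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend   # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend               # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend          # the only pods allowed in
```

Because selecting a pod with any NetworkPolicy switches it to default-deny for the listed policy types, everything not explicitly allowed here is blocked.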
If there is a vulnerability in your container image or application, however, patching it is fully your responsibility. There are, nonetheless, tools you can use to help.
So what if you've done your part, we've done ours, and your cluster is still attacked? Don't panic.
Google Cloud takes the security of our infrastructure, including where user workloads run, very seriously, and we have documented processes for incident response. Our security team's job is to protect Google Cloud from potential attacks and to protect the components outlined above. For the pieces you're responsible for, if you're looking to further protect yourself from container-specific attacks, Google Cloud already has a range of container security partners integrated with the Cloud Security Command Center.
If you are responding to an incident, you can leverage Stackdriver Incident Response & Management (alpha) to help you reduce your time to incident mitigation, refer to sample queries for Kubernetes audit logs, and check out the Cloud Forensics 101 talk from Next ’18 to learn more about conducting forensics.
What's the tl;dr of GKE security? We're responsible for protecting the control plane, which includes your master VM, etcd, and controllers; you're responsible for protecting your worker nodes, including deploying patches to the OS, container runtime, and Kubernetes components, and of course for securing your own workload. An easy way to do your part is to enable node auto-upgrade.
If you do your part and we do ours, together we can build GKE environments that are resilient to attacks and vulnerabilities, and that deliver great uptime and performance.