How to detect and prevent network outages, and stay compliant too


By some estimates, 75% of network outages and performance issues are the result of a misconfiguration, and more often than not, these misconfigurations aren’t discovered until they’re in production. That’s stressful for network administrators and architects: not knowing how a change to firewall rules or routing rules will affect the network makes monitoring reactive rather than proactive, introduces risk, and leads to long troubleshooting times.

We recently introduced Network Intelligence Center, Google Cloud’s comprehensive network monitoring, verification and optimization platform that works across the cloud and on-premises data centers, including an initial set of modules that can predict and heal network failures. In this post, we’ll take a deep dive into the Connectivity Test module, which helps diagnose connectivity issues and predicts the impact of configuration changes, so you can better prevent outages.

Connectivity Test enables you to self-diagnose connectivity issues within Google Cloud, or from Google Cloud to an external endpoint that is on-premises or even in another cloud. You can also create, save and run tests. With these capabilities, Connectivity Test can help you perform a variety of important network administration tasks, such as:

  1. Understand and verify network design and architecture

  2. Troubleshoot and fix connectivity issues

  3. Verify the impact of configuration changes

  4. Ensure network security

  5. Make your security and compliance audits easier and more manageable

We’ll discuss each of these use cases in greater depth below, but first, let’s look at the Connectivity Test architecture.

Connectivity Test technical overview

The Connectivity Test module is powered by a network reachability analysis platform, which determines whether there’s connectivity between a source and a destination. If there’s no connectivity, Connectivity Test pinpoints where it’s broken and identifies the root cause, for example, a firewall rule blocking the connection. Rather than the traditional approach of looking at live traffic flows or sending traffic through the data plane, this reachability analysis platform uses a network verification approach based on formal verification techniques. It creates an accurate and comprehensive model of the network based on the current network design, configurations and network state. The model can reason about all possible behaviors and help troubleshoot configuration issues or prove compliance with an intended policy. Thus, network verification can exhaustively prove or disprove reachability in ways that traditional approaches cannot.
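To make the difference between probing and verification concrete, here is a minimal sketch in Python. It is not how Connectivity Test is implemented. It only illustrates the idea of reasoning over a model of the configuration instead of sending packets. The names Rule, decide and allowed_ports are hypothetical, and the rules are simplified to TCP destination ports only (real firewall rules also match on direction, protocol, source ranges and targets). The sketch assumes the GCP conventions that a lower priority number wins and that unmatched ingress traffic is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified firewall rule (TCP destination ports only)."""
    priority: int   # lower number is evaluated first
    action: str     # "allow" or "deny"
    ports: range    # destination ports the rule matches


def decide(rules, port):
    """Return the action of the highest-priority rule that matches the port."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if port in rule.ports:
            return rule.action
    return "deny"  # assumed implied deny when no rule matches


def allowed_ports(rules):
    """Check every possible port in the model rather than probing a few."""
    return {p for p in range(1, 65536) if decide(rules, p) == "allow"}


rules = [
    Rule(priority=1000, action="allow", ports=range(443, 444)),
    Rule(priority=900, action="deny", ports=range(8000, 9000)),
    Rule(priority=1000, action="allow", ports=range(8080, 8081)),
]

reachable = allowed_ports(rules)
# Intended policy: only HTTPS is reachable.
policy_holds = reachable == {443}
print(sorted(reachable), policy_holds)
```

Because the check covers the whole port space, it can show that the intended policy holds for every port, including port 8080, where the allow rule is shadowed by a higher-priority deny. A handful of live probes could not establish that.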

Connectivity Test uses two key components in particular to perform this analysis.

Data plane model

To perform static reachability analysis, Connectivity Test relies on an idealized data plane model. Connectivity Test derives instances, networks, firewall rules, routes, VPN tunnels and so on from GCP project configurations, then analyzes them to verify whether one endpoint can reach another. The most important configurations that it uses are VPC network properties, network services (load balancers), hybrid cloud configurations (VPN, Interconnect, Cloud Routers), and VM and Google Kubernetes Engine endpoint configurations.
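As a rough illustration of what deriving a model from configuration can look like, the sketch below builds a tiny in-memory model from a dictionary and then answers a routing question without sending any traffic. Everything here is an assumption made for the example: the config layout, the Instance and Route types, the build_model and select_route helpers, and the resource names are hypothetical and do not reflect the product’s internal data structures. Route selection is simplified to longest-prefix match.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    ip: str
    network: str


@dataclass(frozen=True)
class Route:
    dest_range: str  # CIDR block the route covers
    next_hop: str    # name of the next-hop resource


def build_model(config):
    """Turn a (hypothetical) project configuration into model objects."""
    instances = {i["name"]: Instance(**i) for i in config["instances"]}
    routes = [Route(**r) for r in config["routes"]]
    return instances, routes


def select_route(routes, dest_ip):
    """Pick the most specific route that contains the destination address."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [r for r in routes if addr in ipaddress.ip_network(r.dest_range)]
    if not matches:
        return None  # no route: the packet would be dropped
    return max(matches, key=lambda r: ipaddress.ip_network(r.dest_range).prefixlen)


config = {
    "instances": [
        {"name": "vm-a", "ip": "10.0.1.5", "network": "vpc-1"},
        {"name": "vm-b", "ip": "192.168.7.9", "network": "on-prem"},
    ],
    "routes": [
        {"dest_range": "10.0.0.0/16", "next_hop": "vpc-1"},
        {"dest_range": "192.168.0.0/16", "next_hop": "vpn-tunnel-1"},
        {"dest_range": "0.0.0.0/0", "next_hop": "default-internet-gateway"},
    ],
}

instances, routes = build_model(config)
route = select_route(routes, instances["vm-b"].ip)
print(route.next_hop)
```

In this example, traffic from vm-a to vm-b matches both the default route and the more specific 192.168.0.0/16 route, so the model selects the VPN tunnel as the next hop.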

Network Abstract State Machine

Connectivity Test also relies on a Network Abstract State Machine, an idealized model of how a Google Cloud VPC network processes packets. Specifically, Google Cloud processes a packet in several logical steps that are modeled as a finite state machine, which takes a bounded number of steps between discrete states until the packet has been delivered or dropped.
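The finite state machine idea can be sketched in a few lines of Python. The states, their order and the boolean model below are assumptions chosen for illustration. The real state machine has more states and is driven by the full data plane model. The sketch shows the two properties described above: each packet moves between discrete states, and the trace always ends in a delivered or dropped state within a bounded number of steps.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    EGRESS_FIREWALL = auto()
    ROUTING = auto()
    INGRESS_FIREWALL = auto()
    DELIVERED = auto()
    DROPPED = auto()


FINAL_STATES = {State.DELIVERED, State.DROPPED}


def step(state, model):
    """Advance the simulated packet by one logical processing step."""
    if state is State.START:
        return State.EGRESS_FIREWALL
    if state is State.EGRESS_FIREWALL:
        return State.ROUTING if model["egress_allowed"] else State.DROPPED
    if state is State.ROUTING:
        return State.INGRESS_FIREWALL if model["route_found"] else State.DROPPED
    if state is State.INGRESS_FIREWALL:
        return State.DELIVERED if model["ingress_allowed"] else State.DROPPED
    return state  # final states do not change


def trace(model, max_steps=10):
    """Return the list of states visited until the packet is delivered or dropped."""
    state = State.START
    path = [state]
    while state not in FINAL_STATES:
        if len(path) > max_steps:
            raise RuntimeError("step bound exceeded")
        state = step(state, model)
        path.append(state)
    return path


blocked = trace({"egress_allowed": True, "route_found": True, "ingress_allowed": False})
print([s.name for s in blocked])
```

The last non-final state in the returned path shows where the packet was dropped, which is the kind of information needed to point at a root cause such as a blocking ingress firewall rule.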

The diagram below shows a model for how Connectivity Test simulates trace traffic between two VMs. Depending on your GCP network and resource configurations, this traffic could go through, for example, a Cloud VPN tunnel, a GCP load balancer, or a peered VPC network before reaching the destination VM.
