The ultimate goal of the Network Design Arena is to help network designers and architects on their journey to become better network designers by focusing on not only the required technical skills, but the mindset required to obtain the most recognized expert level network design certification in the industry (CCDE).
When you provision workload in the cloud to serve an application, having a load balancer at the front end of the applications’ tier is almost always a must, to ensure that users’ requests are redirected to the workload instances that have the capacity to serve the request with better performance.
Load balancing in Google Cloud Platform GCP is a fully scalable and redundant managed service offered in different flavors (global external, regional external, regional internal) This article focuses on the global external load balancer. Figure 1 below illustrates the high level architecture of Google global load balancer that will be discussed in more details in this article.
Generally, Load balancing in the cloud offers two main functions:
Traffic load balancing or distribution with optional auto scaling capability
Auto Healing in case a failed or an unhealthy instance presented in the backend, it can be replaced with a functional one.
Its, obvious from the name this type of LB, it is external (client/user facing) and it’s a global, also, it’s an application layer (HTTP(S)) type of LB. As highlighted in my previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations” with Google external global load balancer all what you need is a single IP to front end your application stack that cloud be distributed globally without the need to deploy a load balancer per region!
Since the Google global LB is not a single box, it’s not a VM instance and it’s not a cluster, so how does it work and where does it reside?
GCP global LB is constructed of the following components
Google global network: Google has over 100 points of presence, and a well-provisioned global network with 100,000s of miles of fiber optic cable, and the global LB is located in many of these POPs, which offer the ability to ingest user traffic into Google’s backbone as close as possible to the source of the traffic request, which offer an optimized experience, as illustrated below:
GCP LB is capable to scale quickly and effortlessly, according to GCP “Cloud Load Balancing is built on the same front-end serving infrastructure that powers Google. It supports 1 Million+ queries per second with consistent high performance and low latency. Traffic enters Cloud Load Balancing through 80+ distinct global load balancing locations, maximizing the distance traveled on Google’s fast private network backbone.”
Software defined load balancing: it is a SDN based LB that can be used by all projects on GCP, its also used by other services such as Google Search and Google Mail. The SDN construct of it, includes the global forwarding rules at the Google global front end to the targeted proxy service. “The global forwarding rule provides a single global IPv4 or IPv6 address (Anycast IP) that you can use in DNS records for your site, and it can only be used with HTTP(S), SSL Proxy, and TCP Proxy load balancing. You can use more than one forwarding rule with a given proxy, so you can have one for IPv4 traffic and another for IPv6 traffic”
If LB for encrypted traffic (HTTPS), the target proxy requires signed certificate in order to terminate the SSL/TLS session and the proxy will re initiate a new session with the back for the session/request. If the new connection to the back is also encrypted, then the target VM instance need to have a certificate installed as well.
Once, the requested URL is received by the proxy, the URL policy map will be evaluated. Unless its configured for SSL or TCP proxy, it will be sent directly to the backend without URL map check.
URL Policy Map: traffic distribution with Google Cloud global external load balancer can be deployed in two main scenarios (or it can be combination of both):
Cross-region load balancer: also known as Proximity based routing, according to GCP: “You can use a global IP address that can intelligently route users based on proximity. For example, if you set up instances in North America, Europe, and Asia, users around the world will be automatically sent to the backends closest to them, assuming those instances have enough capacity. If the closest instances do not have enough capacity, cross-region load balancing automatically forwards users to the next closest region” . With this deployment option, the URL map will only contain the default mapping.
Content-based load balancer, with the visibility of the application layer using HTTP/HTTPs LB functions GCP global LB can provide traffic routing based on URL content, for instance request for certain part of the application like multimedia can be redirect to instances group with higher capacity while traffic distended to static content can be sent to different set of instances.
The URL map illustrated in the figure below, shows an example of a website or service that offer uploading photos (static content), in which traffic with the /photo/* is redirected to a multi-regional cloud storage bucket. While traffic destined to multimedia video content, is distributed across different instances’ groups if its HD or non-HD video content.
Targeted backend service: it’s a local grouping of the actual application instances, the backend along with health check can determine which instance(s) is healthy, or over utilized (CPU utilization, request per second per instance), and when to trigger auto scaling. You configure your load balancing service to route requests to your backend service.
Instance Group: used to add and manage virtual machines. GCP Compute Engine offers two flavors of instance groups: managed and unmanaged instance groups. The managed instance group is always the preferred type in which you create instances based on a template and then you will be able to enable auto-scaling to scale out and down the instances based on pre-defined metrics. On the other hand, the unmanaged instance group does not support this capability because simplify it is based on different VM instances, the common use scenario to such method is when you are migrating your workloads with different types of machines or trying to use existing configuration setup to load balance the traffic.
Although, the processing of SSL can be CPU intensive, especially when the used ciphers are not CPU efficient, it is always recommended to use secure sessions from the proxy to the backend instances and avoid sending the traffic over unencrypted TCP as it it typically will reduce the level of security between the GCP global load balancer and the backend instances. In addition, SSL proxy may handle HTTPS but it is recommended to create HTTPS target proxy with at least one signed SSL certificate installed on the target HTTPS proxy.
Moreover, GCP Global HTTP/HTTPS LB allows you to create custom request headers, in case the default ones are not sufficient or do not meet your requirements.
“User-defined request headers allow you to specify additional headers that the load balancer adds to requests. These headers can include information that the load balancer knows about the client connection, including the latency to the client, the geographic location of the client’s IP address, and parameters of the TLS connection.User-defined request headers are supported for backend services associated with HTTP(S) Load Balancers”
At the time of this blog writing, this capability is in Beta release, which means it is not covered by any SLA or deprecation policy and might be subject to backward-incompatible changes.
When it comes to instances health check, typically the GCP LB health checks are used to decide if an instance(s) is “healthy” and functioning. Functioning here might be checked using an application layer probe such as HTTP(s) probe. With GCP LB, If you check the logs on the instance, you may notice that the health check polling is happening more frequently than what you may have configured. This is because GCP LB offer the ability to create redundant copies of each health checker, which are used to probe your instances. If any health checker fails, a redundant one can take over without delay.
As discussed in the previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, each VPC in GCP has a single virtual software distributed FW, in order to make sure the LB and health check can communicate with the intended VMs in the respective instance group, traffic needs to be explicitly allowed by the FW rules. “You must create a firewall rule that allows traffic from 22.214.171.124/22 and 126.96.36.199/16 to reach your instances. These are IP address ranges that the load balancer uses to connect to backend instances. This rule allows traffic from both the load balancer and the health checker, also, keep in mind that GCP firewall rules block and allow traffic at the instance level, not at the edges of the network. They cannot prevent traffic from reaching the load balancer itself”
One of the key aspects of connectivity design with GCP LB is, that by default, HTTP(S) load balancing distributes requests evenly among available instances. However, some applications behind a NAT device they will appear as sourced from the same IP, also stateful servers used by ads, gaming applications etc. may go through multiple applications’ tiers requests before the user end up on the targeted instance. When the session is disconnected due to poor quality or a moving mobile user, it can lead to bad user experience. That’s why considering “session affinity” to identify requests from a user by the client IP or preferably by the value of a cookie to re-direct client request to the same instance in a consistent manner, assuming the intended instance is healthy and has capacity to handle the request.
Beware that, when auto scaling functionality adds or removes instances within an instance group, technically the backend service may reallocate load, and the target instance may move, therefore to reduce the impact of such situation, you need to ensure that the initial minimum number of instances provisioned by the auto scaling is sufficient to handle the anticipated load, and auto scaling is kicked-off only when there is an unexpected load increase. However, this may not always be the case, because it requires good understanding of the expect load and required workload to handle it, also, the load may not be consistent during the day or day of the week in which the minimum number of instances can not be pre-provisioned to cover the expected load for the entire week (this is where auto-scaling is required).
Furthermore, Kubernetes Engine offers integrated support of Google HTTP LB, where an ingress controller can be created in a cluster, and then Kubernetes Engine creates an HTTP(S) load balancer and configures it to route traffic to application, also, path matcher can be leveraged for more specific requests routing into multiple containers with different images or functions. According to GCP “If you are exposing an HTTP(S) service hosted on Kubernetes Engine, HTTP(S) load balancing is the recommended method for load balancing.”