What is Service Mesh #
Introduction #
Istio is a popular service mesh solution for microservices, which naturally leads to the question: what is a service mesh? To answer it, let's take a step back and examine the architectural challenges of operating a microservice architecture, or what we used to call a service-oriented architecture.
What we tend to find is that we're really dealing with a collection of discrete services: a front-end login portal, balance checking for a bank account, wire transfers, and bill payments. All of these discrete services are part of the larger bank, for example. These services communicate with each other over a network, and the challenge becomes how services that live at different IP addresses on different machines within our infrastructure can discover, route to, and communicate with one another, so that our login portal knows how to reach, say, the balance-checking service.
The Challenges of Microservices Architectures #
Routing and Discovery #
The first challenge involves routing and discovery. Most organizations address this by placing a load balancer in front of every service. Each service then hardcodes the IP address of the load balancer, which directs traffic back to the back-end service. However, this approach results in a proliferation of load balancers, requiring many of them in an environment. Additionally, they are often manually managed, leading to a lengthy process of filing a ticket and waiting several weeks for someone to add a new instance to the load balancer before traffic can be directed to it. This introduces both a cost penalty and an agility penalty, affecting the speed at which organizations can deploy new services, scale up and down, and react to machine failures. As a result, it's crucial to examine these challenges surrounding routing and discovery.
Security #
The second set of major challenges involves security. Historically, networks were flat and open, with security focused on the network perimeter, like the castle walls. Firewalls, intrusion detection systems, and other devices were used to filter all traffic entering the data center. However, once inside, large flat namespaces were common. While there might have been some firewalls in the east-west path, these firewalls often became a management burden due to their hundreds or thousands of rules, which were manually managed.
This created a mismatch in agility, as updating firewall rules could take days, while launching a new application only took seconds. This friction affected development teams as they tried to scale up, scale down, and deploy new versions but were constrained by the speed at which firewalls and load balancers could be updated. These issues are present in both microservice and service-oriented architectures.
Service mesh #
The goal of a service mesh is to address these challenges holistically.
Routing and Discovery #
When addressing this issue, a central registry is created to maintain a record of all running services. Whenever a new application comes online, it is added to the central registry, making information such as the service instance’s IP address readily available. This allows anyone seeking to discover a service to query the central registry for its location, IP address, and communication method. As a result, a more dynamic infrastructure can be built, where servers can be added or removed, and scaling can occur without having to file a ticket and wait days for load balancers and firewalls to be updated. Instead, the service is added to the registry and can be discovered immediately.
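The registry idea above can be sketched as a minimal in-memory stand-in. The names `register`, `deregister`, and `discover` are illustrative, not any particular mesh's API: services announce themselves when they come online, and callers look up live instances by logical service name instead of a hardcoded load balancer IP.

```python
from collections import defaultdict


class ServiceRegistry:
    """Minimal in-memory service registry sketch (illustrative only)."""

    def __init__(self):
        # logical service name -> set of "ip:port" instance addresses
        self._instances = defaultdict(set)

    def register(self, service: str, address: str) -> None:
        """Called by an instance when it comes online."""
        self._instances[service].add(address)

    def deregister(self, service: str, address: str) -> None:
        """Called on shutdown (or after a failed health check)."""
        self._instances[service].discard(address)

    def discover(self, service: str) -> list[str]:
        """Return all live addresses for a logical service name."""
        return sorted(self._instances[service])


registry = ServiceRegistry()
registry.register("balance-checking", "10.0.1.5:8080")
registry.register("balance-checking", "10.0.1.6:8080")
print(registry.discover("balance-checking"))
```

A production registry adds health checking and pushes changes out to the edge rather than waiting to be polled, but the core contract is the same: new instances become discoverable the moment they register, with no ticket filed.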
Security #
The other aspect of the challenges above is securing east-west traffic within the network "castle walls." Relying solely on a secure network perimeter is no longer enough; we must be realistic and assume that attackers may gain access to the network. To mitigate this risk, every service-to-service call should require explicit authorization: a web server, for example, should be allowed to talk to the database only if a rule explicitly permits it.
Service meshes aim to address this challenge by allowing explicit rules to be defined for which services can communicate with one another. Rules are established at a logical service level, not at the IP level, making it scale-independent. Regardless of whether there are one, ten, or a thousand web servers, the rule for web server to database communication remains the same.
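A minimal sketch of such scale-independent rules, assuming a simple allow-list keyed by logical service names (the rule format is illustrative, not any specific mesh's policy syntax):

```python
# Rules are written against logical service names, never IP addresses,
# so they hold no matter how many instances of each service exist.
ALLOWED = {
    ("web", "database"),
    ("web", "api"),
}


def is_allowed(source: str, destination: str) -> bool:
    """Check whether traffic from `source` to `destination` is permitted."""
    return (source, destination) in ALLOWED


# One rule covers one web server or a thousand of them.
print(is_allowed("web", "database"))      # -> True: covered by a rule
print(is_allowed("billing", "database"))  # -> False: no rule, so denied
```

Because the rule names the service rather than its addresses, scaling the web tier up or down never requires a policy change.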
To implement identity-based security, service meshes distribute TLS certificates to the various applications, such as web servers, databases, and APIs. These certificates are used to identify services during communication. When a web server communicates with a database, it presents a certificate proving its identity, while the database does the same. This process establishes mutual TLS, allowing both sides to verify each other’s identities and create an encrypted communication channel.
This core concept behind zero trust ensures that the network is not trusted, relying on TLS to provide both identity verification and encryption for confidentiality.
The final aspect of this issue is ensuring that communications between services, such as a web server and a database, are compliant with centrally managed rules. Service meshes distribute certificates and rules to the edge, enabling services to communicate directly without a centralized bus. This decentralization avoids introducing single points of failure and potential bottlenecks.
Key Considerations for Evaluating Service Mesh Technologies #
When adopting a service mesh, organizations should consider several requirements.
- Scalability: The first challenge is scalability, i.e., how decisions regarding authentication and authorization can be pushed to the edge without creating a centralized bottleneck. This is crucial when dealing with thousands or tens of thousands of nodes.
- Compatibility: Another critical factor is compatibility. The network serves as a compatibility layer, enabling communication between various technologies, operating systems, and languages. A service mesh must also maintain this compatibility, allowing modern containerized applications to communicate with older systems like mainframes, without creating network silos.
- Application awareness: It's also important to determine how much applications need to be retooled to make use of service mesh technology. With potentially thousands of applications using custom protocols or off-the-shelf solutions, it's vital to ensure they can integrate into the environment. A service mesh should provide layer 3 and layer 4 compatibility, working across all types of protocols without requiring protocol awareness. Additionally, it should offer layer 7 capabilities, enabling traffic shaping and intelligent traffic management for protocols it understands, without sacrificing compatibility for those it does not.
- Operations: The last aspect to consider when evaluating service mesh technologies is the operational side of the system. Since such a system sits at the core of our network, any outage could impact everything running on top of it, potentially affecting the entire data center. Therefore, it's essential to ensure that these systems are:
- Easy to scale: The service mesh should have the ability to grow and adapt to the needs of the organization without significant challenges or reconfigurations.
- Highly reliable: The service mesh must be designed with redundancy and fault tolerance in mind to minimize the risk of downtime and service disruptions.
- Available 24/7: The service mesh should be built for constant operation, ensuring that services can communicate and function without interruption.
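The application-awareness point above can be sketched as the dispatch decision a mesh proxy makes per connection. This is a simplified illustration, not any real proxy's logic: traffic in a protocol the proxy understands (here, HTTP) gets layer 7 treatment such as path-based routing, while everything else is relayed as an opaque layer 4 byte stream, preserving compatibility.

```python
def handle_connection(dest_port: int, first_bytes: bytes) -> str:
    """Sketch of a proxy's per-connection dispatch decision (illustrative only)."""
    http_methods = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ")
    if first_bytes.startswith(http_methods):
        # Known protocol: parse the request line and apply layer 7 routing.
        path = first_bytes.split(b" ", 2)[1].decode()
        return f"layer-7 route for HTTP path {path}"
    # Unknown protocol: no parsing, just relay bytes to the upstream service.
    return f"layer-4 passthrough to port {dest_port}"


print(handle_connection(80, b"GET /balance HTTP/1.1\r\n"))    # layer 7
print(handle_connection(5432, b"\x00\x00\x00\x08\x04\xd2"))   # layer 4
```

The key property is that the fallback path never inspects the payload, so services speaking custom or legacy protocols keep working without modification.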
In summary, when evaluating different service mesh technologies, organizations should focus on scalability, compatibility, adaptability to existing applications and protocols, and operational aspects like reliability and availability. These factors will help ensure that the chosen service mesh solution can effectively support and enhance the organization’s microservices architecture and overall network infrastructure.