A service mesh handles the communication between microservices, providing traffic management, security, and observability without changing application code.
What is a Service Mesh?#
Without service mesh:
┌─────────────┐         ┌─────────────┐
│  Service A  │─────────│  Service B  │
│             │         │             │
│  (handles   │         │  (handles   │
│  retries,   │         │  auth,      │
│  timeouts,  │         │  logging,   │
│  tracing)   │         │  etc.)      │
└─────────────┘         └─────────────┘
Every service implements networking concerns.
With service mesh:
┌─────────────┐        ┌─────────────┐
│  Service A  │        │  Service B  │
├─────────────┤        ├─────────────┤
│   Sidecar   │────────│   Sidecar   │
│    Proxy    │        │    Proxy    │
└─────────────┘        └─────────────┘
Networking handled by sidecar proxies.
Application code stays simple.
Core Components#
Data Plane (Sidecar Proxies)#
# Envoy proxy injected alongside each service
# Handles all network traffic

# In Kubernetes, this is automatic:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  containers:
    - name: my-app
      image: my-app:latest
    # Sidecar container automatically injected here

Control Plane#
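Control plane components (separate services before Istio 1.5, consolidated into the single istiod binary since then):
- Pilot: Traffic management
- Citadel: Security (mTLS, certificates)
- Galley: Configuration validation
- Mixer: Policy and telemetry (deprecated in Istio 1.5, removed in 1.8)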
          ┌─────────────────────┐
          │    Control Plane    │
          │  (Istiod in Istio)  │
          └──────────┬──────────┘
                     │ Configuration
     ┌───────────────┼───────────────┐
     ▼               ▼               ▼
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Sidecar │     │ Sidecar │     │ Sidecar │
└─────────┘     └─────────┘     └─────────┘
Traffic Management#
Request Routing#
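Header-based routing evaluates the HTTP rules in order and uses the first one whose match conditions hold; a rule with no match block acts as a catch-all. The TypeScript below is a rough mental model of that first-match behavior, not Istio's or Envoy's implementation. The `HttpRule` type and the `resolveSubset` function are hypothetical names introduced for illustration, and only exact header matches are modeled.

```typescript
// Simplified model of ordered, first-match HTTP routing.
// HttpRule and resolveSubset are illustrative names, not Istio APIs.
interface HttpRule {
  // Exact-match header conditions; omitted means "match everything".
  exactHeaders?: Record<string, string>;
  subset: string;
}

// requestHeaders is assumed to use lower-case header names.
function resolveSubset(
  rules: HttpRule[],
  requestHeaders: Record<string, string>,
): string | undefined {
  for (const rule of rules) {
    const conditions = rule.exactHeaders ?? {};
    const matches = Object.keys(conditions).every(
      (name) => requestHeaders[name.toLowerCase()] === conditions[name],
    );
    if (matches) {
      return rule.subset; // first matching rule wins
    }
  }
  return undefined; // no rule matched
}

// Rules mirroring the reviews configuration in this section:
// beta users go to v2, everyone else to v1.
const reviewsRules: HttpRule[] = [
  { exactHeaders: { "x-user-type": "beta" }, subset: "v2" },
  { subset: "v1" },
];

console.log(resolveSubset(reviewsRules, { "x-user-type": "beta" })); // v2
console.log(resolveSubset(reviewsRules, {})); // v1
```

Because evaluation stops at the first match, the catch-all rule has to come last; placed first, it would shadow the beta rule. The equivalent Istio configuration follows.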
# Route based on headers
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    # Route beta users to v2
    - match:
        - headers:
            x-user-type:
              exact: beta
      route:
        - destination:
            host: reviews
            subset: v2

    # Everyone else to v1
    - route:
        - destination:
            host: reviews
            subset: v1

---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

Traffic Splitting#
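Traffic splitting sends each request to one of several destinations with probability proportional to its weight. As a rough sketch of that idea (not the proxy's actual load-balancing code), the selection can be modeled as below. `WeightedDestination` and `pickDestination` are hypothetical names, and the random roll is passed in as a parameter so the behavior can be checked deterministically.

```typescript
// Simplified model of weighted destination selection.
// WeightedDestination and pickDestination are illustrative names, not Istio APIs.
interface WeightedDestination {
  subset: string;
  weight: number; // relative share of traffic
}

// `roll` is a number in [0, 1); callers would normally pass Math.random().
function pickDestination(
  destinations: WeightedDestination[],
  roll: number,
): string {
  const total = destinations.reduce((sum, d) => sum + d.weight, 0);
  if (destinations.length === 0 || total <= 0) {
    throw new Error("at least one destination with positive weight is required");
  }
  let threshold = roll * total;
  let chosen = "";
  for (const d of destinations) {
    chosen = d.subset;
    if (threshold < d.weight) {
      break; // the roll falls inside this destination's slice
    }
    threshold -= d.weight;
  }
  // If rounding pushed the roll past every slice, the last destination is used.
  return chosen;
}

// The 90/10 canary split used in this section.
const canary: WeightedDestination[] = [
  { subset: "v1", weight: 90 },
  { subset: "v2", weight: 10 },
];

console.log(pickDestination(canary, 0.5)); // v1
console.log(pickDestination(canary, 0.95)); // v2
```

Shifting more traffic to the canary is then only a matter of changing the two weights. In Istio the same split is expressed declaratively: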
# Canary deployment: 90% v1, 10% v2
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
    - api
  http:
    - route:
        - destination:
            host: api
            subset: v1
          weight: 90
        - destination:
            host: api
            subset: v2
          weight: 10

Retries and Timeouts#
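Three settings interact here: the overall `timeout` for the request, the number of retry `attempts`, and the `perTryTimeout` applied to each individual try. The sketch below estimates the worst-case wait a caller can see. It is a simplified model that assumes `attempts` counts retries after the initial try and ignores the short back-off between retries; `RetryPolicy` and `worstCaseSeconds` are hypothetical names.

```typescript
// Simplified model of how the overall timeout and per-try timeout interact.
// RetryPolicy and worstCaseSeconds are illustrative names, not Istio APIs.
interface RetryPolicy {
  timeoutSeconds: number; // overall deadline for the request
  attempts: number; // retries allowed after the initial try (assumed)
  perTryTimeoutSeconds: number; // deadline for each individual try
}

function worstCaseSeconds(policy: RetryPolicy): number {
  const tries = 1 + policy.attempts;
  const allTriesTimedOut = tries * policy.perTryTimeoutSeconds;
  // The overall timeout caps the total, so later tries may be cut short.
  return Math.min(policy.timeoutSeconds, allTriesTimedOut);
}

// The values used for the ratings service in this section.
const ratings: RetryPolicy = {
  timeoutSeconds: 10,
  attempts: 3,
  perTryTimeoutSeconds: 3,
};

console.log(worstCaseSeconds(ratings)); // 10
```

Under these assumptions, four tries of 3s each would need 12s, so the 10s overall timeout is the binding limit and the last retry has only about 1s left. The configuration itself: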
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
    - ratings
  http:
    - route:
        - destination:
            host: ratings
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: connect-failure,refused-stream,503

Circuit Breaking#
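Circuit breaking in a mesh combines connection limits with outlier detection: a host that returns too many consecutive 5xx responses is temporarily ejected from the load-balancing pool. The class below is a minimal sketch of the consecutive-error part only. It is not Envoy's implementation, it ignores settings such as `interval` and `maxEjectionPercent`, and `HostHealth` is a hypothetical name.

```typescript
// Simplified model of consecutive-5xx outlier detection for one host.
// HostHealth is an illustrative name, not an Istio or Envoy API.
class HostHealth {
  private consecutive5xx = 0;
  private ejectedUntilMs = 0;
  private readonly threshold: number;
  private readonly baseEjectionMs: number;

  constructor(threshold: number, baseEjectionMs: number) {
    this.threshold = threshold;
    this.baseEjectionMs = baseEjectionMs;
  }

  // Record a response status observed at time nowMs.
  record(status: number, nowMs: number): void {
    if (status >= 500 && status <= 599) {
      this.consecutive5xx += 1;
      if (this.consecutive5xx >= this.threshold) {
        this.ejectedUntilMs = nowMs + this.baseEjectionMs;
        this.consecutive5xx = 0;
      }
    } else {
      this.consecutive5xx = 0; // any non-5xx response resets the streak
    }
  }

  // An ejected host receives no traffic until its ejection time has passed.
  isEjected(nowMs: number): boolean {
    return nowMs < this.ejectedUntilMs;
  }
}

// Five consecutive errors with a 30s ejection, as configured in this section.
const host = new HostHealth(5, 30000);
for (let i = 0; i < 5; i++) {
  host.record(503, 1000);
}
console.log(host.isEjected(2000)); // true
console.log(host.isEjected(31000)); // false
```

The declarative equivalent, together with connection-pool limits, looks like this: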
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 100

Security#
Mutual TLS (mTLS)#
# Strict mTLS for entire mesh
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT

# All traffic between services is encrypted
# Certificates managed automatically

Authorization Policies#
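An ALLOW policy admits a request only if at least one of its rules matches; once any ALLOW policy applies to a workload, requests that match no rule are denied, which is why a policy with an empty spec denies everything. The sketch below models just that decision. It is a simplification that leaves out DENY and CUSTOM actions, and `AllowRule`, `MeshRequest`, and `isAllowed` are hypothetical names.

```typescript
// Simplified model of ALLOW-policy evaluation for a single workload.
// AllowRule, MeshRequest, and isAllowed are illustrative names, not Istio APIs.
interface AllowRule {
  principals: string[]; // permitted caller identities
  methods: string[]; // permitted HTTP methods
}

interface MeshRequest {
  principal: string;
  method: string;
}

// policies: the rule lists of every ALLOW policy that selects the workload.
function isAllowed(policies: AllowRule[][], request: MeshRequest): boolean {
  if (policies.length === 0) {
    return true; // no ALLOW policy applies, so traffic is not restricted
  }
  // With at least one policy in place, some rule must match the request.
  return policies.some((rules) =>
    rules.some(
      (rule) =>
        rule.principals.indexOf(request.principal) !== -1 &&
        rule.methods.indexOf(request.method) !== -1,
    ),
  );
}

const productpage = "cluster.local/ns/default/sa/productpage";

// The reviews workload is selected by its own policy and by the empty deny-all policy.
const reviewsPolicies: AllowRule[][] = [
  [{ principals: [productpage], methods: ["GET"] }],
  [], // empty spec: no rules, so it never matches
];

console.log(isAllowed(reviewsPolicies, { principal: productpage, method: "GET" })); // true
console.log(isAllowed(reviewsPolicies, { principal: productpage, method: "POST" })); // false
```

The two policies modeled above are written in Istio as follows: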
# Only allow specific services to call reviews
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-policy
  namespace: default
spec:
  selector:
    matchLabels:
      app: reviews
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/productpage"]
      to:
        - operation:
            methods: ["GET"]

---
# Deny all other traffic
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: default
spec: {} # Empty spec denies all

JWT Authentication#
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: default
spec:
  selector:
    matchLabels:
      app: api
  jwtRules:
    - issuer: "https://auth.example.com"
      jwksUri: "https://auth.example.com/.well-known/jwks.json"

---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: default
spec:
  selector:
    matchLabels:
      app: api
  rules:
    - from:
        - source:
            requestPrincipals: ["*"]

Observability#
Distributed Tracing#
# Traces collected automatically
# View in Jaeger/Zipkin

# Application only needs to propagate headers:
# x-request-id
# x-b3-traceid
# x-b3-spanid
# x-b3-parentspanid
# x-b3-sampled
# x-b3-flags

// Example: Propagate tracing headers
async function callService(url: string, incomingHeaders: Headers) {
  // Headers that must be copied from the incoming request to outgoing calls
  const tracingHeaders = [
    'x-request-id',
    'x-b3-traceid',
    'x-b3-spanid',
    'x-b3-parentspanid',
    'x-b3-sampled',
    'x-b3-flags',
  ];

  const headers: Record<string, string> = {};
  for (const header of tracingHeaders) {
    const value = incomingHeaders.get(header);
    if (value) {
      headers[header] = value;
    }
  }

  return fetch(url, { headers });
}

Metrics (Prometheus)#
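The error-rate query in this section divides the rate of 5xx responses by the rate of all responses over the same window. As a rough illustration of that arithmetic, the sketch below computes the ratio from two samples of cumulative counters. The sample values are made up, `CounterSample` and `errorRatio` are hypothetical names, and Prometheus's `rate()` additionally handles counter resets and extrapolation, which this ignores.

```typescript
// Simplified model of a counter-based error ratio over one window.
// CounterSample and errorRatio are illustrative names, not Prometheus APIs.
interface CounterSample {
  total: number; // cumulative count of all requests
  errors: number; // cumulative count of 5xx responses
}

function errorRatio(start: CounterSample, end: CounterSample): number {
  const requests = end.total - start.total;
  const errors = end.errors - start.errors;
  // The window length cancels out of the ratio, so only the increases matter.
  // Returning 0 for an idle window is a choice made for this sketch.
  return requests > 0 ? errors / requests : 0;
}

// Hypothetical samples taken five minutes apart.
const ratio = errorRatio(
  { total: 10000, errors: 40 },
  { total: 13000, errors: 70 },
);
console.log(ratio); // 0.01
```

Here 30 of the 3,000 requests in the window failed, giving an error ratio of 1%.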
# Automatic metrics collection
# - Request count
# - Request duration
# - Request size
# - Response size
# - TCP connections

# Prometheus queries
# (destination_service_name holds the short service name;
#  destination_service holds the full host name)
# Request rate
rate(istio_requests_total{destination_service_name="reviews"}[5m])

# Error rate
sum(rate(istio_requests_total{destination_service_name="reviews", response_code=~"5.."}[5m])) /
sum(rate(istio_requests_total{destination_service_name="reviews"}[5m]))

# Latency (p99)
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{destination_service_name="reviews"}[5m])) by (le))

Access Logs#
# Enable access logging
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
    - providers:
        - name: envoy

# Logs include:
# - Source/destination
# - Request path
# - Response code
# - Duration
# - Bytes sent/received

When to Use a Service Mesh#
Good fit:
✓ Many microservices
✓ Need mTLS between services
✓ Complex traffic management
✓ Require observability across services
✓ Zero-trust security model
Consider alternatives:
✗ Simple architectures (< 10 services)
✗ Performance-critical, low-latency systems
✗ Teams without Kubernetes experience
✗ Resource-constrained environments
Trade-offs#
Pros:
+ Consistent security and observability
+ Traffic management without code changes
+ Centralized policy enforcement
+ Automatic mTLS
Cons:
- Added latency (proxy hop)
- Increased resource usage
- Operational complexity
- Learning curve
Conclusion#
Service meshes solve real problems in microservices architectures: security, observability, and traffic management. But they add complexity and resource overhead.
Start without a service mesh. Add one when you have enough services that manual management becomes unsustainable, typically around 10-20 services. Istio is feature-rich, while Linkerd is simpler to operate.