Infrastructure Expert
The Infrastructure Expert agent specializes in cloud infrastructure, Infrastructure-as-Code, Kubernetes, and DevOps practices for scalable systems.
Expertise Areas#
- Cloud Platforms - AWS, GCP, Azure architecture
- Terraform - Infrastructure-as-Code, modules, state management
- Kubernetes - Deployment, services, scaling, monitoring
- Docker - Containerization, multi-stage builds, optimization
- CI/CD - GitHub Actions, GitLab CI, Jenkins pipelines
- Networking - VPCs, load balancers, CDN, DNS
- Security - IAM, secrets management, compliance
- Monitoring - Prometheus, Grafana, CloudWatch
Usage Examples#
AWS Infrastructure#
Use the infrastructure-expert agent to design AWS infrastructure for a Next.js app
with a PostgreSQL database and Redis cache.
Response includes:
- VPC and networking setup
- ECS/EKS deployment options
- RDS and ElastiCache configuration
- CloudFront CDN setup
Terraform Module#
Use the infrastructure-expert agent to create a Terraform module for deploying
a Kubernetes cluster on GCP.
Response includes:
- GKE cluster configuration
- Node pool setup
- Networking resources
- IAM roles and service accounts
Kubernetes Deployment#
Use the infrastructure-expert agent to create Kubernetes manifests for a
microservices application with 3 services.
Response includes:
- Deployment manifests
- Service definitions
- Ingress configuration
- ConfigMaps and Secrets
Infrastructure Patterns#
Terraform AWS Module#
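# modules/web-app/main.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "app_name" {
  type = string
}

variable "environment" {
  type = string
}

# Database master password; supply it from a secret store, never from source control
variable "db_password" {
  type      = string
  sensitive = true
}

# Availability zones currently usable in the configured region
data "aws_availability_zones" "available" {
  state = "available"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.app_name}-vpc"
    Environment = var.environment
  }
}

# Public subnets, one per availability zone
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "${var.app_name}-public-${count.index + 1}"
  }
}

# Private subnets for the database, one per availability zone
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 11}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.app_name}-private-${count.index + 1}"
  }
}

# Subnet group referenced by the RDS instance
resource "aws_db_subnet_group" "main" {
  name       = "${var.app_name}-db"
  subnet_ids = aws_subnet.private[*].id
}

# Security group referenced by the RDS instance
resource "aws_security_group" "db" {
  name   = "${var.app_name}-db"
  vpc_id = aws_vpc.main.id

  # Allow PostgreSQL traffic from inside the VPC only
  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.app_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# RDS Instance
resource "aws_db_instance" "main" {
  identifier          = "${var.app_name}-db"
  engine              = "postgres"
  engine_version      = "15"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  storage_encrypted   = true
  skip_final_snapshot = var.environment != "production"

  # Required whenever skip_final_snapshot is false
  final_snapshot_identifier = "${var.app_name}-db-final"

  # PostgreSQL database names cannot contain hyphens
  db_name = replace(var.app_name, "-", "_")
  # "admin" is a reserved word for RDS PostgreSQL, so use a different master user
  username = "app_admin"
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}
Kubernetes Deployment#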
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          # Prefer an immutable tag (such as a commit SHA) over :latest in production
          image: ghcr.io/org/web-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: web-app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app
                port:
                  number: 80
GitHub Actions CI/CD#
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  # Image names must be lowercase; adjust if the repository path contains capitals
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # The manifests in k8s/ are only available after checking out the repository
      - uses: actions/checkout@v4

      # Assumes a repository secret named KUBE_CONFIG that holds the cluster kubeconfig
      - name: Set cluster context
        uses: azure/k8s-set-context@v4
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v4
        with:
          manifests: k8s/
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
Best Practices#
Infrastructure-as-Code#
- Use modules - Reusable, versioned infrastructure components
- State management - Remote state with locking (S3 + DynamoDB)
- Environment separation - Workspaces or separate state files
- Documentation - Inline comments and README files
- CI/CD for infra - Plan on PR, apply on merge
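The "plan on PR, apply on merge" practice above could be wired up roughly as follows. This is a minimal sketch rather than a complete pipeline: the workflow file name, the `infra` working directory, and the `main` branch are assumptions, and cloud credentials and remote state configuration are left out.

```yaml
# .github/workflows/terraform.yml (hypothetical file name)
name: Terraform

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra # assumed location of the root module
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      - name: Init
        run: terraform init -input=false

      # Runs for both pull requests and pushes, so reviewers see the plan first
      - name: Plan
        run: terraform plan -input=false -out=tfplan

      # Runs only for pushes to main, that is, after the pull request is merged
      - name: Apply
        if: github.event_name == 'push'
        run: terraform apply -input=false tfplan
```

Applying the saved `tfplan` file, instead of re-planning during apply, keeps the applied changes identical to the plan produced in the same run.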
Kubernetes#
- Resource limits - Always set requests and limits
- Health checks - Liveness and readiness probes
- Secrets management - External secrets operator or sealed secrets
- Horizontal scaling - HPA for auto-scaling
- Network policies - Restrict pod-to-pod communication
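As an illustration of the scaling and network policy items above, the following sketch adds an HPA and a NetworkPolicy for the `web-app` Deployment from the earlier manifest. The replica ceiling, the 70% CPU target, and the `ingress-nginx` namespace name are assumptions; the HPA also assumes a metrics source such as metrics-server is installed in the cluster.

```yaml
# Scale web-app between 3 and 10 replicas based on average CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10 # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # assumed target, percent of the CPU request
---
# Allow traffic to web-app pods only from the ingress controller namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-ingress
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx # assumed namespace
      ports:
        - protocol: TCP
          port: 3000
```

Utilization is measured against the container's CPU request, which is one reason requests should always be set alongside an HPA.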
Security#
- Least privilege - Minimal IAM permissions
- Encryption - At rest and in transit
- Audit logging - CloudTrail, GCP audit logs
- Secrets rotation - Automated secret rotation
- Vulnerability scanning - Container and dependency scanning
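Dependency scanning, the last item above, might be added to CI along these lines. This sketch uses the Trivy GitHub Action as one possible scanner; the workflow name, the pinned action version, and the severity threshold are assumptions to adapt to your own policy.

```yaml
# .github/workflows/scan.yml (hypothetical file name)
name: Scan

on:
  pull_request:

jobs:
  trivy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan repository dependencies
        uses: aquasecurity/trivy-action@0.28.0 # assumed version; pin to a release you have vetted
        with:
          scan-type: fs # scan the checked-out files rather than a container image
          scan-ref: .
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          exit-code: '1' # fail the job when matching vulnerabilities are found
```

The same action can scan a built container image through its `image-ref` input, which covers the container half of the practice.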
When to Use#
Use the Infrastructure Expert when you need to:
- Design cloud architecture
- Write Terraform configurations
- Create Kubernetes manifests
- Set up CI/CD pipelines
- Configure networking and security
- Implement monitoring and alerting
- Plan for high availability and scaling
Related Agents#
- DevOps Expert - CI/CD and deployment workflows
- Security Expert - Security best practices
- Backend Expert - Application architecture
- Monitoring Expert - Observability and alerting