Infrastructure Expert

The Infrastructure Expert agent specializes in cloud infrastructure, Infrastructure-as-Code, Kubernetes, and DevOps practices for scalable systems.

Expertise Areas#

  • Cloud Platforms - AWS, GCP, Azure architecture
  • Terraform - Infrastructure-as-Code, modules, state management
  • Kubernetes - Deployment, services, scaling, monitoring
  • Docker - Containerization, multi-stage builds, optimization
  • CI/CD - GitHub Actions, GitLab CI, Jenkins pipelines
  • Networking - VPCs, load balancers, CDN, DNS
  • Security - IAM, secrets management, compliance
  • Monitoring - Prometheus, Grafana, CloudWatch

Usage Examples#

AWS Infrastructure#

Use the infrastructure-expert agent to design AWS infrastructure for a Next.js app with a PostgreSQL database and Redis cache.

Response includes:

  • VPC and networking setup
  • ECS/EKS deployment options
  • RDS and ElastiCache configuration
  • CloudFront CDN setup

Terraform Module#

Use the infrastructure-expert agent to create a Terraform module for deploying a Kubernetes cluster on GCP.

Response includes:

  • GKE cluster configuration
  • Node pool setup
  • Networking resources
  • IAM roles and service accounts

Kubernetes Deployment#

Use the infrastructure-expert agent to create Kubernetes manifests for a microservices application with 3 services.

Response includes:

  • Deployment manifests
  • Service definitions
  • Ingress configuration
  • ConfigMaps and Secrets

Infrastructure Patterns#

Terraform AWS Module#

    # modules/web-app/main.tf

    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }
    }

    variable "app_name" {
      type = string
    }

    variable "environment" {
      type = string
    }

    # Master password for the database; supplied by the caller, never hard-coded.
    variable "db_password" {
      type      = string
      sensitive = true
    }

    # Availability zones in the current region, used to spread the subnets.
    data "aws_availability_zones" "available" {
      state = "available"
    }

    # VPC
    resource "aws_vpc" "main" {
      cidr_block           = "10.0.0.0/16"
      enable_dns_hostnames = true
      enable_dns_support   = true

      tags = {
        Name        = "${var.app_name}-vpc"
        Environment = var.environment
      }
    }

    # Subnets (one per availability zone: 10.0.1.0/24 and 10.0.2.0/24)
    resource "aws_subnet" "public" {
      count             = 2
      vpc_id            = aws_vpc.main.id
      cidr_block        = "10.0.${count.index + 1}.0/24"
      availability_zone = data.aws_availability_zones.available.names[count.index]

      map_public_ip_on_launch = true

      tags = {
        Name = "${var.app_name}-public-${count.index + 1}"
      }
    }

    # ECS Cluster
    resource "aws_ecs_cluster" "main" {
      name = "${var.app_name}-cluster"

      setting {
        name  = "containerInsights"
        value = "enabled"
      }
    }

    # RDS Instance
    # aws_security_group.db and aws_db_subnet_group.main are expected to be
    # defined elsewhere in this module.
    resource "aws_db_instance" "main" {
      identifier        = "${var.app_name}-db"
      engine            = "postgres"
      engine_version    = "15"
      instance_class    = "db.t3.micro"
      allocated_storage = 20
      storage_encrypted = true

      # Keep a final snapshot in production; skip it everywhere else.
      skip_final_snapshot       = var.environment != "production"
      final_snapshot_identifier = "${var.app_name}-db-final"

      # PostgreSQL database names cannot contain hyphens.
      db_name = replace(var.app_name, "-", "_")
      # RDS rejects "admin" as a PostgreSQL master username (reserved word).
      username = "app_admin"
      password = var.db_password

      vpc_security_group_ids = [aws_security_group.db.id]
      db_subnet_group_name   = aws_db_subnet_group.main.name
    }

Kubernetes Deployment#

    # k8s/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
      labels:
        app: web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
            - name: web-app
              image: ghcr.io/org/web-app:latest
              ports:
                - containerPort: 3000
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: app-secrets
                      key: database-url
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"
              # Restart the container if /health stops responding.
              livenessProbe:
                httpGet:
                  path: /health
                  port: 3000
                initialDelaySeconds: 30
                periodSeconds: 10
              # Route traffic to the pod only once /ready responds.
              readinessProbe:
                httpGet:
                  path: /ready
                  port: 3000
                initialDelaySeconds: 5
                periodSeconds: 5
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web-app
    spec:
      selector:
        app: web-app
      ports:
        - port: 80
          targetPort: 3000
      type: ClusterIP
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web-app
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      # Replaces the deprecated kubernetes.io/ingress.class annotation.
      ingressClassName: nginx
      tls:
        - hosts:
            - app.example.com
          secretName: web-app-tls
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: web-app
                    port:
                      number: 80
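
The Deployment above reads DATABASE_URL from a Secret named app-secrets with a database-url key, but the Secret itself is not shown. A minimal sketch of what it might look like follows. The file name and the connection string are placeholders, and in practice the value would more likely be supplied by an external secrets operator or a sealed secret than by a file committed to the repository.

```yaml
# k8s/secret.yaml (hypothetical file name)
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  # Placeholder value; do not commit real credentials.
  database-url: postgres://USER:PASSWORD@HOST:5432/DBNAME
```

Using stringData lets the value be written as plain text; the API server stores it base64-encoded under data.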

GitHub Actions CI/CD#

    # .github/workflows/deploy.yml
    name: Deploy

    on:
      push:
        branches: [main]

    env:
      REGISTRY: ghcr.io
      IMAGE_NAME: ${{ github.repository }}

    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write

        steps:
          - uses: actions/checkout@v4

          - name: Set up Docker Buildx
            uses: docker/setup-buildx-action@v3

          - name: Login to Container Registry
            uses: docker/login-action@v3
            with:
              registry: ${{ env.REGISTRY }}
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}

          - name: Build and push
            uses: docker/build-push-action@v5
            with:
              context: .
              push: true
              tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
              cache-from: type=gha
              cache-to: type=gha,mode=max

      deploy:
        needs: build
        runs-on: ubuntu-latest
        steps:
          # Each job starts on a fresh runner, so the manifests in k8s/
          # must be checked out again before they can be deployed.
          - uses: actions/checkout@v4

          # Cluster credentials (a kubeconfig context) must be configured
          # before this step; that setup is not shown here.
          - name: Deploy to Kubernetes
            uses: azure/k8s-deploy@v4
            with:
              manifests: k8s/
              images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

Best Practices#

Infrastructure-as-Code#

  1. Use modules - Reusable, versioned infrastructure components
  2. State management - Remote state with locking (S3 + DynamoDB)
  3. Environment separation - Workspaces or separate state files
  4. Documentation - Inline comments and README files
  5. CI/CD for infra - Plan on PR, apply on merge
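
As an illustration of "plan on PR, apply on merge", here is a minimal GitHub Actions sketch. The workflow file name and the infra working directory are assumptions, not part of the original setup. Cloud credentials and the remote state backend are assumed to be configured separately.

```yaml
# .github/workflows/terraform.yml (hypothetical file name)
name: Terraform

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        # Assumed location of the root Terraform configuration.
        working-directory: infra
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      # Cloud provider credentials are assumed to be available to the
      # steps below; that setup is not shown here.
      - name: Init
        run: terraform init -input=false

      # Pull requests only get a plan, so reviewers can see the changes.
      - name: Plan
        if: github.event_name == 'pull_request'
        run: terraform plan -input=false

      # Merges to main apply the changes.
      - name: Apply
        if: github.event_name == 'push'
        run: terraform apply -input=false -auto-approve
```

Note that this sketch re-plans at apply time rather than applying the exact plan that was reviewed. Teams that need that guarantee typically save the plan as an artifact and apply the saved file.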

Kubernetes#

  1. Resource limits - Always set requests and limits
  2. Health checks - Liveness and readiness probes
  3. Secrets management - External secrets operator or sealed secrets
  4. Horizontal scaling - HPA for auto-scaling
  5. Network policies - Restrict pod-to-pod communication
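
Items 4 and 5 can be sketched for the web-app Deployment shown earlier. The replica bounds, the 70% CPU target, and the ingress-nginx namespace name are illustrative assumptions. The HorizontalPodAutoscaler needs a metrics source such as metrics-server, and the NetworkPolicy only takes effect if the cluster's network plugin enforces policies.

```yaml
# k8s/scaling-and-network.yaml (hypothetical file name)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Percentage of the CPU request, averaged across pods.
          averageUtilization: 70
---
# Allow traffic to web-app pods only from the ingress controller namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-ingress
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label that Kubernetes sets automatically on every namespace.
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
```

Utilization targets are computed against the container's CPU request, which is one reason item 1 (always set requests and limits) matters for autoscaling.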

Security#

  1. Least privilege - Minimal IAM permissions
  2. Encryption - At rest and in transit
  3. Audit logging - CloudTrail, GCP audit logs
  4. Secrets rotation - Automated secret rotation
  5. Vulnerability scanning - Container and dependency scanning
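
Container scanning (item 5) could be added to the build job of the deploy workflow as an extra step after "Build and push". The sketch below uses the Trivy GitHub Action as one possible scanner. The pinned version is an assumption and should be checked against the action's releases, and pulling the image is assumed to work with the registry login performed earlier in the same job.

```yaml
# Extra step for the build job, placed after "Build and push".
- name: Scan image for vulnerabilities
  # Version pin is an assumption; verify it against the action's releases.
  uses: aquasecurity/trivy-action@0.28.0
  with:
    image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
    format: table
    # Fail the job when HIGH or CRITICAL findings are present.
    exit-code: "1"
    severity: HIGH,CRITICAL
    # Skip findings that have no fixed version available yet.
    ignore-unfixed: true
```

Because the deploy job depends on build, a failing scan would also stop the deployment.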

When to Use#

Use the Infrastructure Expert when you need to:

  • Design cloud architecture
  • Write Terraform configurations
  • Create Kubernetes manifests
  • Set up CI/CD pipelines
  • Configure networking and security
  • Implement monitoring and alerting
  • Plan for high availability and scaling

Related Agents#

  • DevOps Expert - CI/CD and deployment workflows
  • Security Expert - Security best practices
  • Backend Expert - Application architecture
  • Monitoring Expert - Observability and alerting