When something goes wrong in production, observability is the difference between quick resolution and hours of guessing. Good logging, tracing, and metrics let you understand system behavior, debug issues, and prevent problems before users notice.
The Three Pillars of Observability#
Logs#
Discrete events with context. "User 123 placed order 456 at 14:32:05"
Metrics#
Aggregated measurements over time. "Orders per minute: 150"
Traces#
Request flow across services. "Request X took 500ms: API (50ms) → DB (300ms) → Cache (150ms)"
Structured Logging#
Why Structure Matters#
```typescript
// ❌ Unstructured logging
console.log('User ' + userId + ' ordered ' + productId + ' for $' + price);
// Output: "User 123 ordered abc for $50"

// ✅ Structured logging
logger.info('Order placed', {
  userId: '123',
  productId: 'abc',
  price: 50,
  currency: 'USD'
});
// Output: {"level":"info","message":"Order placed","userId":"123","productId":"abc","price":50,"currency":"USD","timestamp":"2024-01-15T14:32:05Z"}
```

Structured logs are:
- Searchable: `userId:123 AND level:error`
- Aggregatable: "Count errors by productId"
- Parseable: machines can process them
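The searchable and aggregatable properties can be demonstrated in a few lines. This is a sketch that queries newline-delimited JSON logs in memory; in practice a log backend (Elasticsearch, Loki, etc.) does this at scale, and the function names here are illustrative.

```typescript
// Sketch: querying newline-delimited JSON logs in memory.
type LogLine = { level: string; message: string; [key: string]: unknown };

function parseLogs(ndjson: string): LogLine[] {
  return ndjson.trim().split('\n').map((line) => JSON.parse(line));
}

// Equivalent of "userId:123 AND level:error"
function search(logs: LogLine[], query: Partial<LogLine>): LogLine[] {
  return logs.filter((log) =>
    Object.entries(query).every(([key, value]) => log[key] === value)
  );
}

// Equivalent of "Count errors by productId"
function countBy(logs: LogLine[], field: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const log of logs) {
    const key = String(log[field] ?? 'unknown');
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```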
Logger Configuration#
```typescript
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
    bindings: (bindings) => ({
      pid: bindings.pid,
      host: bindings.hostname,
      service: 'my-api',
      version: process.env.APP_VERSION
    })
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: {
    paths: ['password', 'token', 'authorization', '*.password', '*.token'],
    censor: '[REDACTED]'
  }
});

export default logger;
```

Log Levels#
```typescript
// TRACE: Very detailed debugging (rarely enabled in production)
logger.trace({ query, params }, 'Executing SQL query');

// DEBUG: Detailed information useful for debugging
logger.debug({ userId, sessionId }, 'User session validated');

// INFO: Normal operational events
logger.info({ orderId, total }, 'Order completed successfully');

// WARN: Unexpected but handled situations
logger.warn({ userId, attempts }, 'Rate limit approaching for user');

// ERROR: Errors that need attention
logger.error({ err, orderId }, 'Payment processing failed');

// FATAL: System is unusable
logger.fatal({ err }, 'Database connection lost, shutting down');
```

Contextual Logging#
```typescript
import { AsyncLocalStorage } from 'async_hooks';

interface LogContext {
  requestId: string;
  userId?: string;
  sessionId?: string;
  traceId?: string;
}

const asyncLocalStorage = new AsyncLocalStorage<LogContext>();

// Middleware to set request context
app.use((req, res, next) => {
  const context: LogContext = {
    requestId: req.headers['x-request-id'] || generateId(),
    userId: req.user?.id,
    sessionId: req.session?.id,
    traceId: req.headers['x-trace-id']
  };

  asyncLocalStorage.run(context, () => next());
});

// Logger that includes context
function getLogger() {
  const context = asyncLocalStorage.getStore() || {};
  return logger.child(context);
}

// Usage anywhere in request handling
const log = getLogger();
log.info('Processing order'); // Automatically includes requestId, userId, etc.
```

Distributed Tracing#
OpenTelemetry Setup#
```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'order-service',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});

sdk.start();
```

Manual Spans#
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

async function processOrder(order: Order) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', order.id);
      span.setAttribute('order.total', order.total);

      // Child span for payment (end in finally so the span closes even on error)
      await tracer.startActiveSpan('processPayment', async (paymentSpan) => {
        try {
          paymentSpan.setAttribute('payment.method', order.paymentMethod);
          await paymentService.charge(order);
        } finally {
          paymentSpan.end();
        }
      });

      // Child span for inventory
      await tracer.startActiveSpan('updateInventory', async (inventorySpan) => {
        try {
          await inventoryService.reserve(order.items);
        } finally {
          inventorySpan.end();
        }
      });

      span.setStatus({ code: SpanStatusCode.OK });
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

Metrics#
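Before reaching for a metrics library, it helps to see the three metric types as plain semantics. This sketch is deliberately simplified and is not prom-client's API: a counter only goes up, a gauge moves freely, and a histogram sorts observations into cumulative-style buckets.

```typescript
// Simplified metric semantics -- the concepts, not prom-client's API.
class SimpleCounter {
  private value = 0;
  inc(by = 1): void {
    if (by < 0) throw new Error('counters only go up');
    this.value += by;
  }
  get(): number { return this.value; }
}

class SimpleGauge {
  private value = 0;
  set(v: number): void { this.value = v; } // can go up or down
  get(): number { return this.value; }
}

class SimpleHistogram {
  readonly counts: number[];
  constructor(private buckets: number[]) {
    this.counts = new Array(buckets.length + 1).fill(0); // last slot = +Inf
  }
  observe(v: number): void {
    const i = this.buckets.findIndex((b) => v <= b);
    this.counts[i === -1 ? this.buckets.length : i] += 1;
  }
}
```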
Key Metrics Types#
```typescript
import { Counter, Histogram, Gauge, Registry } from 'prom-client';

const registry = new Registry();

// Counter: Things that only go up
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'path', 'status'],
  registers: [registry]
});

// Histogram: Distribution of values
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'path'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
  registers: [registry]
});

// Gauge: Value that can go up or down
const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of active connections',
  registers: [registry]
});
```

Request Metrics Middleware#
```typescript
app.use((req, res, next) => {
  const start = process.hrtime.bigint();

  res.on('finish', () => {
    const duration = Number(process.hrtime.bigint() - start) / 1e9;

    httpRequestsTotal.inc({
      method: req.method,
      path: req.route?.path || 'unknown',
      status: res.statusCode
    });

    httpRequestDuration.observe(
      { method: req.method, path: req.route?.path || 'unknown' },
      duration
    );
  });

  next();
});
```

Business Metrics#
```typescript
// Track business-relevant events (registered on the same registry as above)
const ordersCreated = new Counter({
  name: 'orders_created_total',
  help: 'Total orders created',
  labelNames: ['payment_method', 'country'],
  registers: [registry]
});

const orderValue = new Histogram({
  name: 'order_value_dollars',
  help: 'Order value in dollars',
  buckets: [10, 50, 100, 500, 1000, 5000],
  registers: [registry]
});

async function createOrder(order: Order) {
  // ... create order logic

  ordersCreated.inc({
    payment_method: order.paymentMethod,
    country: order.shippingCountry
  });

  orderValue.observe(order.total);
}
```

Alerting Strategy#
Alert Definition#
```yaml
# Prometheus alerting rules
groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: SlowResponses
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
          > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow response times"
          description: "P95 latency is {{ $value }}s"

      - alert: LowOrderRate
        expr: rate(orders_created_total[1h]) < 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Order rate is unusually low"
```

Runbook Links#
```yaml
- alert: DatabaseConnectionsExhausted
  expr: pg_stat_activity_count >= pg_settings_max_connections * 0.9
  annotations:
    runbook: "https://runbooks.company.com/database/connection-exhaustion"
```

Log Aggregation#
Shipping Logs#
```typescript
// Using Winston with multiple transports
import winston from 'winston';

const logger = winston.createLogger({
  transports: [
    // Console for development
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      )
    }),

    // File for local debugging
    new winston.transports.File({
      filename: 'logs/error.log',
      level: 'error'
    }),

    // HTTP transport for log aggregation service
    new winston.transports.Http({
      host: 'logs.company.com',
      path: '/ingest',
      ssl: true
    })
  ]
});
```

Query Patterns#
```
# Find errors for a specific user
service:order-api AND level:error AND userId:123

# Find slow requests
service:order-api AND duration:>1000

# Trace a request across services
traceId:abc123

# Error patterns in last hour
service:* AND level:error | stats count by message | sort -count | head 10
```
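The last pattern (count by message, sort descending, take the top 10) is a classic aggregation pipeline. For intuition, here is the same pipeline as a small function over in-memory log objects; the function name is illustrative.

```typescript
// "stats count by message | sort -count | head 10" as code:
// group by message, sort by count descending, keep the top N.
function topMessages(logs: { message: string }[], limit = 10): [string, number][] {
  const counts = new Map<string, number>();
  for (const log of logs) {
    counts.set(log.message, (counts.get(log.message) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```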
Dashboard Design#
Key Dashboards#
Request to AI:

```
Design observability dashboards for an e-commerce API:

Dashboards needed:
1. Overview (health at a glance)
2. Request performance
3. Business metrics
4. Infrastructure
5. Errors and debugging

For each dashboard:
- Key metrics to display
- Visualization types
- Time ranges
- Alert thresholds
```
Example Overview Dashboard#
```
┌─────────────────────────────────────────────────────────────┐
│ Request Rate        Error Rate        P95 Latency           │
│ [ 1,234 req/s ]     [ 0.3% ]          [ 145ms ]             │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Requests per Second (last 6 hours)                          │
│ ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁▂▃▄▅▆▇█▇▆▅▄▃▂▁▂▃▄▅▆▇█▇▆▅▄▃▂▁                │
└─────────────────────────────────────────────────────────────┘
┌──────────────────────────┬──────────────────────────────────┐
│ Top Endpoints            │ Recent Errors                    │
│ GET /products     45%    │ • PaymentError: timeout          │
│ GET /users        25%    │ • ValidationError: email         │
│ POST /orders      15%    │ • NotFoundError: product         │
│ Other             15%    │                                  │
└──────────────────────────┴──────────────────────────────────┘
```
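The headline numbers on an overview dashboard like this come from simple aggregations over raw request data. Here is a sketch of those aggregations; the record shape is hypothetical, and a real dashboard would query them from the metrics backend rather than compute them in the app.

```typescript
// Headline stats for an overview dashboard, from raw request records.
interface RequestRecord { durationMs: number; status: number }

// Nearest-rank P95: the value below which 95% of observations fall.
function p95(durations: number[]): number {
  const sorted = [...durations].sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

function overviewStats(records: RequestRecord[], windowSeconds: number) {
  const errors = records.filter((r) => r.status >= 500).length;
  return {
    requestRate: records.length / windowSeconds,              // req/s
    errorRate: records.length ? errors / records.length : 0,  // fraction
    p95LatencyMs: p95(records.map((r) => r.durationMs)),
  };
}
```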
Conclusion#
Observability isn't a feature—it's a capability that enables everything else. Without visibility into system behavior, you're flying blind.
Start with structured logging. Add metrics for key operations. Implement tracing for distributed systems. Build dashboards that answer questions before they're asked. Set up alerts that catch problems before users do.
AI helps implement these patterns correctly, from logger configuration to alert thresholds. The investment in observability pays dividends every time you need to debug production issues—which is always sooner than you expect.