Events & Troubleshooting

Access Kubernetes events and container logs with descriptions and troubleshooting guidance.

What Are Events?

Kubernetes generates events as things happen in your cluster. Each event is shown with added context and guidance:

  • What happened: Description of the event
  • Why it matters: Impact on your applications
  • What to do: Recommended actions if needed
  • Context: Related resources and timeline

Tip: Event Context Provided

Events include descriptions and troubleshooting steps to help you quickly identify and resolve issues.

Event Categories

✅ Normal Events (Good News)

These events mean things are working as expected:

What you'll see:

  • "Pod successfully started on node worker-1"
  • "Container image pulled and ready to run"
  • "Application deployment completed successfully"
  • "Storage volume attached and ready"

Why we show these: To confirm your operations completed successfully and provide an audit trail.

⚠️ Warning Events (Pay Attention)

These events indicate potential issues that haven't caused failures yet:

What you'll see (with explanations):

  • "Pod waiting to be scheduled"

    • What it means: No node currently satisfies this pod's resource requests or scheduling constraints
    • What to do: Check whether you need to add nodes, reduce resource requests, or adjust node selectors and tolerations
  • "Image pull is slow"

    • What it means: Container image is taking a while to download
    • What to do: Usually resolves itself. If persistent, check network connectivity
  • "Health check failing"

    • What it means: Your application isn't responding to its readiness or liveness probes
    • What to do: Check application logs for startup issues

🔴 Error Events (Action Required)

These events indicate active problems:

What you'll see (with troubleshooting):

  • "Container crashed with exit code 1"

    • What it means: Your application exited with an error
    • What to do: Check container logs below for the error message
  • "Out of memory (OOMKilled)"

    • What it means: Container used more memory than its limit
    • What to do: Increase memory limits or optimize your application
  • "Cannot pull image"

    • What it means: Can't download the container image
    • What to do: Verify image name, registry access, and credentials

Event Sources

Events are generated by various Kubernetes components:

Scheduler

  • Pod scheduling decisions
  • Node selection
  • Resource constraints

Kubelet

  • Container lifecycle
  • Volume operations
  • Node status changes

Controller Manager

  • Deployment rollouts
  • ReplicaSet scaling
  • Job execution

API Server

  • Resource creation/deletion
  • Authentication/Authorization events

Event Information

Each event contains:

  • Type: Normal or Warning
  • Reason: Event classification (e.g., Failed, Started, Killing)
  • Message: Detailed description
  • Source: Component that generated the event
  • Object: Related Kubernetes resource
  • Timestamp: When the event occurred
  • Count: Number of occurrences
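
As a minimal sketch, the fields listed above could be modeled in client code as a small data class. The class name `Event`, the `from_dict` helper, and the lowercase key names are assumptions made for illustration; the source does not define a client library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    type: str            # "Normal" or "Warning"
    reason: str          # e.g. "Started", "Failed", "Killing"
    message: str         # detailed description
    source: str          # component that generated the event
    object: str          # related Kubernetes resource
    timestamp: datetime  # when the event occurred
    count: int           # number of occurrences

    @classmethod
    def from_dict(cls, raw: dict) -> "Event":
        # Key names are assumed to match the lowercase field names above.
        return cls(
            type=raw["type"],
            reason=raw["reason"],
            message=raw["message"],
            source=raw.get("source", ""),
            object=raw.get("object", ""),
            # fromisoformat() before Python 3.11 rejects a trailing "Z".
            timestamp=datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00")),
            count=int(raw.get("count", 1)),
        )
```

Parsing the timestamp into a timezone-aware `datetime` makes it straightforward to sort events into a timeline.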

Event Monitoring

Resource Events

Monitor events for specific resources to track their lifecycle and identify issues.

Configuration:

yaml
clusterPirate:
  monitoring:
    resourceEventsEnabled: true

System Events

Track cluster-wide and node-level events to monitor infrastructure health.

Configuration:

yaml
clusterPirate:
  monitoring:
    systemEventsEnabled: true

Common Event Scenarios

Pod Failures

Failed Scheduling

  • Reason: FailedScheduling
  • Common Causes: Insufficient resources, node selectors, taints/tolerations
  • Resolution: Check node capacity, adjust resource requests, verify node labels

Image Pull Errors

  • Reason: Failed, ErrImagePull, ImagePullBackOff
  • Common Causes: Invalid image name, missing credentials, network issues
  • Resolution: Verify image name, check image pull secrets, test registry connectivity

Container Crashes

  • Reason: CrashLoopBackOff (the kubelet reports the underlying event with reason BackOff), Error
  • Common Causes: Application errors, missing dependencies, configuration issues
  • Resolution: Check container logs, verify environment variables, review application code

OOM Kills

  • Reason: OOMKilled
  • Common Causes: Insufficient memory limits, memory leaks
  • Resolution: Increase memory limits, profile application memory usage, fix memory leaks

Volume Issues

Mount Failures

  • Reason: FailedMount
  • Common Causes: Volume not available, incorrect PVC configuration, storage class issues
  • Resolution: Verify PVC exists, check storage class, ensure volume is bound

Volume Full

  • Reason: VolumeResizeFailed
  • Common Causes: Disk space exhausted, volume resize not supported
  • Resolution: Clean up disk space, resize volume, migrate to larger volume

Readiness/Liveness Failures

Probe Failures

  • Reason: Unhealthy
  • Common Causes: Application not ready, incorrect probe configuration, network issues
  • Resolution: Check application startup time, adjust probe settings, verify endpoint availability
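
The scenarios above can be condensed into a lookup from event reason to a first resolution step, for example to annotate events in a script. This is only a sketch: the names `FIRST_STEP` and `first_step` are hypothetical, and the mapping simply restates the first resolution listed for each reason.

```python
# First resolution step per event reason, taken from the scenarios above.
FIRST_STEP = {
    "FailedScheduling": "Check node capacity",
    "ErrImagePull": "Verify image name",
    "ImagePullBackOff": "Verify image name",
    "CrashLoopBackOff": "Check container logs",
    "OOMKilled": "Increase memory limits",
    "FailedMount": "Verify PVC exists",
    "VolumeResizeFailed": "Clean up disk space",
    "Unhealthy": "Check application startup time",
}


def first_step(reason: str) -> str:
    """Return a suggested first action for an event reason."""
    # The fallback text is an illustrative default, not product behavior.
    return FIRST_STEP.get(reason, "Review the event message and pod logs")
```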

Pod Logs

Accessing Logs

Logs are available for all containers in running and recently terminated pods.

Via Web Console:

  1. Navigate to the cluster in the portal
  2. Select the namespace and pod
  3. Choose a container (if the pod has more than one)
  4. View the real-time logs

Log Features

Real-time Streaming

  • Live tail of container stdout/stderr
  • Automatic updates as new logs are written

Historical Logs

  • Access logs from previous container runs
  • View logs from terminated containers

Filtering

  • Search log content
  • Filter by timestamp
  • Filter by log level (if structured)
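
The same three filters can be approximated offline on exported log lines. The sketch below assumes structured lines are JSON objects with `ts` and `level` keys; those key names and the `filter_logs` function are assumptions, not the console's actual implementation.

```python
import json
from datetime import datetime, timezone


def filter_logs(lines, text=None, since=None, level=None):
    """Yield lines that pass every filter that was supplied.

    Assumes structured lines are JSON objects with hypothetical "ts"
    (ISO 8601) and "level" keys. Unstructured lines can only be matched
    by text search and are skipped when a timestamp or level filter is set.
    """
    for line in lines:
        if text is not None and text not in line:
            continue
        if since is not None or level is not None:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(record, dict):
                continue
            if level is not None and record.get("level") != level:
                continue
            if since is not None:
                ts = record.get("ts")
                if ts is None:
                    continue
                # fromisoformat() before Python 3.11 rejects a trailing "Z".
                if datetime.fromisoformat(ts.replace("Z", "+00:00")) < since:
                    continue
        yield line


logs = [
    '{"ts": "2024-01-15T10:29:00Z", "level": "INFO", "msg": "starting"}',
    '{"ts": "2024-01-15T10:30:00Z", "level": "ERROR", "msg": "db connection refused"}',
    "plain text line without structure",
]
errors = list(filter_logs(logs, level="ERROR"))
```

Level and timestamp filters only work on structured lines, which is why the level filter above is marked "if structured".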

Log Retention

  • Active Containers: Logs are available while the container is running
  • Terminated Containers: Logs are retained according to the Kubernetes configuration
  • Pod Deletion: Logs are lost when the pod is deleted

Use Cases

Troubleshooting Application Issues

  1. Check Pod Events: Identify scheduling or startup issues
  2. Review Container Logs: Look for application errors or exceptions
  3. Monitor Resource Events: Track deployment updates and rollouts
  4. Examine System Events: Identify infrastructure problems

Debugging Crashes

  1. Find OOM Events: Check for memory-related kills
  2. Review Exit Codes: Understand how containers terminated
  3. Analyze Error Patterns: Identify recurring issues
  4. Check Previous Logs: Review logs from failed containers
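
For the exit-code step, a common convention is that codes above 128 mean the process was terminated by a signal numbered `code - 128`, so 137 corresponds to SIGKILL (typical of OOM kills) and 143 to SIGTERM. The helper below is a sketch of that convention; the function name and wording of the messages are illustrative, and the signal numbers are the usual Linux values.

```python
# Common Linux signal numbers; other platforms may differ.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        # SIGKILL is what the kernel OOM killer sends, but it is not the only sender.
        hint = " (often an OOM kill or a forced kill)" if signum == 9 else ""
        return f"terminated by {name}{hint}"
    return "application error (non-zero exit)"
```

An exit code of 137 alone does not prove an OOM kill; confirm it against the OOM event from the first step.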

Monitoring Deployments

  1. Track Rollout Events: Monitor deployment progress
  2. Identify Pod Failures: Catch issues during updates
  3. Verify Configuration: Ensure correct settings applied
  4. Watch Resource Updates: Track replica changes

Audit Trail

  1. Resource Creation: Track when resources were created
  2. Configuration Changes: Monitor updates to workloads
  3. Access Events: Review authentication/authorization events
  4. Deletion Events: Track resource cleanup

Kubernetes Events

Access events through Kubernetes resource endpoints.

Get Resource with Events

When retrieving a specific resource, events are included in the response:

http
GET /v1/workspaces/{workspaceId}/observability/{observabilityInstanceId}/clusters/{clusterId}/namespaces/{namespace}/pods/{podName}
Authorization: Bearer <access-token>

Response includes:

json
{
  "resource": {
    /* pod details */
  },
  "events": [
    {
      "type": "Normal",
      "reason": "Started",
      "message": "Started container nginx",
      "timestamp": "2024-01-15T10:30:00Z",
      "count": 1
    }
  ]
}
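
A client might call this endpoint and pull out the non-Normal events roughly as follows. This is a sketch under stated assumptions: `BASE_URL` is a placeholder because the source does not give a host, and the function names are hypothetical. Only the path and the `Authorization` header come from the request shown above.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not from the source


def pod_url(workspace_id, instance_id, cluster_id, namespace, pod_name):
    """Build the pod endpoint URL, escaping each path segment."""
    parts = [quote(str(p), safe="") for p in
             (workspace_id, instance_id, cluster_id, namespace, pod_name)]
    return (f"{BASE_URL}/v1/workspaces/{parts[0]}/observability/{parts[1]}"
            f"/clusters/{parts[2]}/namespaces/{parts[3]}/pods/{parts[4]}")


def fetch_pod(url: str, token: str) -> dict:
    """GET the pod resource together with its events."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def non_normal_events(payload: dict) -> list:
    """Return every event in the response whose type is not "Normal"."""
    return [e for e in payload.get("events", []) if e.get("type") != "Normal"]
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a live cluster.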

Best Practices

Event Monitoring

  • Enable both resource and system events for complete visibility
  • Set up alerts for critical event types (OOMKilled, CrashLoopBackOff)
  • Regularly review warning events to catch issues early
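
One possible alert rule, sketched here, always alerts on the critical reasons named above and alerts on other warnings only once they repeat. The function name and the `repeat_threshold` default are hypothetical choices, not product settings.

```python
CRITICAL_REASONS = {"OOMKilled", "CrashLoopBackOff"}


def events_to_alert(events, repeat_threshold=5):
    """Select events that warrant an alert.

    Critical reasons always qualify. Other Warning events qualify once
    their count reaches repeat_threshold (an illustrative default).
    """
    selected = []
    for event in events:
        if event.get("reason") in CRITICAL_REASONS:
            selected.append(event)
        elif event.get("type") == "Warning" and event.get("count", 1) >= repeat_threshold:
            selected.append(event)
    return selected
```

Using the event count as a threshold keeps one-off warnings, which often resolve on their own, from paging anyone.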

Log Management

  • Implement structured logging in applications
  • Include correlation IDs for request tracing
  • Use appropriate log levels (DEBUG, INFO, WARN, ERROR)
  • Avoid logging sensitive information
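
The first three practices can be combined in a small formatter that writes one JSON object per line to stdout. This is a sketch using Python's standard `logging` module; the `JsonFormatter` class, the `make_logger` helper, and the key names `ts`, `level`, `msg`, and `correlation_id` are assumptions, not a required schema.

```python
import json
import logging
import sys
import time
import uuid


class JsonFormatter(logging.Formatter):
    converter = time.gmtime  # emit UTC timestamps instead of local time

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Set per call via extra={"correlation_id": ...}; None if absent.
            "correlation_id": getattr(record, "correlation_id", None),
        })


def make_logger(name: str = "app") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = make_logger()
logger.info("order created", extra={"correlation_id": str(uuid.uuid4())})
```

Writing to stdout keeps the output visible in container logs, and one JSON object per line lets log-level filtering work on it.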

Troubleshooting Workflow

  1. Start with events to identify the problem type
  2. Review pod logs for application-specific details
  3. Check resource configuration for misconfigurations
  4. Examine metrics for resource constraints
  5. Review cluster-wide events for infrastructure issues