Events & Troubleshooting

Access Kubernetes events and container logs with descriptions and troubleshooting guidance.

What Are Events?

Kubernetes generates events as things happen in your cluster. Each event is shown with added context and guidance:

What happened: Description of the event
Why it matters: Impact on your applications
What to do: Recommended actions if needed
Context: Related resources and timeline

Tip: Event Context Provided

Events include descriptions and troubleshooting steps to help you quickly identify and resolve issues.

Event Categories

✅ Normal Events (Good News)

These events mean things are working as expected:

What you'll see:

"Pod successfully started on node worker-1"
"Container image pulled and ready to run"
"Application deployment completed successfully"
"Storage volume attached and ready"

Why we show these: To confirm your operations completed successfully and provide an audit trail.

⚠️ Warning Events (Pay Attention)

These events indicate potential issues that haven't caused failures yet:

What you'll see (with explanations):

"Pod waiting to be scheduled"
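- What it means: No node currently satisfies the pod's requirements (most often insufficient CPU or memory, but node selectors and taints can also block scheduling)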
- What to do: Check if you need to add nodes or reduce resource requests
"Image pull is slow"
- What it means: Container image is taking a while to download
- What to do: Usually resolves itself. If persistent, check network connectivity
"Health check failing"
- What it means: Your application isn't responding to its readiness or liveness probes
- What to do: Check application logs for startup issues

🔴 Error Events (Action Required)

These events indicate active problems:

What you'll see (with troubleshooting):

"Container crashed with exit code 1"
- What it means: Your application exited with an error
- What to do: Check container logs below for the error message
"Out of memory (OOMKilled)"
- What it means: Container used more memory than its limit
- What to do: Increase memory limits or optimize your application
"Cannot pull image"
- What it means: Can't download the container image
- What to do: Verify image name, registry access, and credentials

Event Sources

Events are generated by various Kubernetes components:

Scheduler

Pod scheduling decisions
Node selection
Resource constraints

Kubelet

Container lifecycle
Volume operations
Node status changes

Controller Manager

Deployment rollouts
ReplicaSet scaling
Job execution

API Server

Resource creation/deletion
Authentication/Authorization events

Event Information

Each event contains:

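Type: Normal or Warning (Kubernetes defines only these two types; events listed under Error above are Warning-type events that indicate an active failure)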
Reason: Event classification (e.g., Failed, Started, Killing)
Message: Detailed description
Source: Component that generated the event
Object: Related Kubernetes resource
Timestamp: When the event occurred
Count: Number of occurrences
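The fields above map onto a small record type. The following is a minimal sketch, assuming the JSON key names used in the sample API response on this page (`type`, `reason`, `message`, `timestamp`, `count`). The `ClusterEvent` class, the `parse_event` helper, and the `source` and `object` keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ClusterEvent:
    """One event record (illustrative; not an official client type)."""
    event_type: str                 # "Normal" or "Warning"
    reason: str                     # e.g. "Started", "Failed", "Killing"
    message: str
    timestamp: datetime
    count: int = 1
    source: Optional[str] = None            # assumed key name
    involved_object: Optional[str] = None   # assumed key name ("object")


def parse_event(raw: dict) -> ClusterEvent:
    # Replace the trailing "Z" (UTC) so that older Python versions of
    # datetime.fromisoformat() can parse the timestamp.
    ts = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    return ClusterEvent(
        event_type=raw["type"],
        reason=raw["reason"],
        message=raw["message"],
        timestamp=ts,
        count=int(raw.get("count", 1)),
        source=raw.get("source"),
        involved_object=raw.get("object"),
    )


event = parse_event({
    "type": "Normal",
    "reason": "Started",
    "message": "Started container nginx",
    "timestamp": "2024-01-15T10:30:00Z",
    "count": 1,
})
print(event.reason, event.timestamp.isoformat())
```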

Event Monitoring

Resource Events

Monitor events for specific resources to track their lifecycle and identify issues.

Configuration:

```yaml
clusterPirate:
  monitoring:
    resourceEventsEnabled: true
```

System Events

Track cluster-wide and node-level events to monitor infrastructure health.

Configuration:

```yaml
clusterPirate:
  monitoring:
    systemEventsEnabled: true
```

Common Event Scenarios

Pod Failures

Failed Scheduling

Reason: FailedScheduling
Common Causes: Insufficient resources, node selectors, taints/tolerations
Resolution: Check node capacity, adjust resource requests, verify node labels

Image Pull Errors

Reason: Failed, ErrImagePull, ImagePullBackOff
Common Causes: Invalid image name, missing credentials, network issues
Resolution: Verify image name, check image pull secrets, test registry connectivity

Container Crashes

Reason: CrashLoopBackOff, Error
Common Causes: Application errors, missing dependencies, configuration issues
Resolution: Check container logs, verify environment variables, review application code

OOM Kills

Reason: OOMKilled (typically reported as the container's termination reason in the pod status)
Common Causes: Insufficient memory limits, memory leaks
Resolution: Increase memory limits, profile application memory usage, fix memory leaks

Volume Issues

Mount Failures

Reason: FailedMount
Common Causes: Volume not available, incorrect PVC configuration, storage class issues
Resolution: Verify PVC exists, check storage class, ensure volume is bound

Volume Full

Reason: VolumeResizeFailed
Common Causes: Disk space exhausted, volume resize not supported
Resolution: Clean up disk space, resize volume, migrate to larger volume

Readiness/Liveness Failures

Probe Failures

Reason: Unhealthy
Common Causes: Application not ready, incorrect probe configuration, network issues
Resolution: Check application startup time, adjust probe settings, verify endpoint availability
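The scenarios above amount to a lookup from event reason to suggested resolution. The sketch below encodes them as a Python dictionary. The reasons, causes, and resolutions are transcribed from this section, while the `SCENARIOS` table, the `resolution_for` helper, and the fallback text are hypothetical and only illustrate one way to automate a first triage step.

```python
# Reason -> (common causes, resolution), transcribed from the scenarios above.
_IMAGE_PULL = (
    "Invalid image name, missing credentials, network issues",
    "Verify image name, check image pull secrets, test registry connectivity",
)

SCENARIOS = {
    "FailedScheduling": (
        "Insufficient resources, node selectors, taints/tolerations",
        "Check node capacity, adjust resource requests, verify node labels",
    ),
    "ErrImagePull": _IMAGE_PULL,
    "ImagePullBackOff": _IMAGE_PULL,
    "CrashLoopBackOff": (
        "Application errors, missing dependencies, configuration issues",
        "Check container logs, verify environment variables, review application code",
    ),
    "OOMKilled": (
        "Insufficient memory limits, memory leaks",
        "Increase memory limits, profile application memory usage, fix memory leaks",
    ),
    "FailedMount": (
        "Volume not available, incorrect PVC configuration, storage class issues",
        "Verify PVC exists, check storage class, ensure volume is bound",
    ),
    "VolumeResizeFailed": (
        "Disk space exhausted, volume resize not supported",
        "Clean up disk space, resize volume, migrate to larger volume",
    ),
    "Unhealthy": (
        "Application not ready, incorrect probe configuration, network issues",
        "Check application startup time, adjust probe settings, verify endpoint availability",
    ),
}

# Hypothetical fallback for reasons this page does not cover.
FALLBACK = "No documented scenario; start with the pod's events and container logs"


def resolution_for(reason: str) -> str:
    """Return the documented resolution for a reason, or the fallback text."""
    entry = SCENARIOS.get(reason)
    return entry[1] if entry else FALLBACK


print(resolution_for("FailedMount"))
```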

Pod Logs

Accessing Logs

Logs are available for all containers in running and recently terminated pods.

Via Web Console:

Navigate to the cluster in the portal
Select the namespace and pod
Choose a container (if the pod has more than one)
View the real-time logs

Log Features

Real-time Streaming

Live tail of container stdout/stderr
Automatic updates as new logs are written

Historical Logs

Access logs from previous container runs
View logs from terminated containers

Filtering

Search log content
Filter by timestamp
Filter by log level (if structured)
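The same three filters can be applied to exported log lines outside the console. The sketch below assumes the application writes one JSON object per line with `ts`, `level`, and `msg` keys. Those key names, the `LEVELS` ordering, and the `filter_logs` function are assumptions for illustration, not a format the platform prescribes.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def _parse_ts(value: object) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; return None if it is missing or invalid."""
    try:
        return datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None


def filter_logs(lines: Iterable[str], min_level: str = "DEBUG",
                since: Optional[datetime] = None,
                contains: Optional[str] = None) -> Iterator[str]:
    """Yield lines matching the text, level, and timestamp filters.

    `since` and the logged timestamps must both be timezone-aware.
    """
    threshold = LEVELS[min_level]
    for line in lines:
        if contains is not None and contains not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            # Unstructured line: level and time filters cannot be evaluated,
            # so keep it only when neither filter is active.
            if threshold == LEVELS["DEBUG"] and since is None:
                yield line
            continue
        # Records with an unknown or missing level are treated as DEBUG.
        level = LEVELS.get(str(record.get("level", "")).upper(), LEVELS["DEBUG"])
        if level < threshold:
            continue
        if since is not None:
            ts = _parse_ts(record.get("ts"))
            if ts is None or ts < since:
                continue
        yield line


SAMPLE = [
    '{"ts": "2024-01-15T10:29:00Z", "level": "INFO", "msg": "starting"}',
    '{"ts": "2024-01-15T10:30:00Z", "level": "ERROR", "msg": "db connection refused"}',
    "plain text line",
    '{"ts": "2024-01-15T10:31:00Z", "level": "WARN", "msg": "retrying db"}',
]

for hit in filter_logs(SAMPLE, min_level="WARN", contains="db"):
    print(hit)
```

Filtering by level only works when the level is a parseable field, which is why the list above qualifies it with "if structured".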

Log Retention

Active Containers: Logs are available while the container is running
Terminated Containers: Logs are retained according to the node's Kubernetes configuration (for example, kubelet log rotation limits)
Pod Deletion: Logs are lost when the pod is deleted

Use Cases

Troubleshooting Application Issues

Check Pod Events: Identify scheduling or startup issues
Review Container Logs: Look for application errors or exceptions
Monitor Resource Events: Track deployment updates and rollouts
Examine System Events: Identify infrastructure problems

Debugging Crashes

Find OOM Events: Check for memory-related kills
Review Exit Codes: Understand how containers terminated
Analyze Error Patterns: Identify recurring issues
Check Previous Logs: Review logs from failed containers
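When reviewing exit codes, it helps to know that codes above 128 conventionally mean the process was terminated by a signal, with the signal number equal to the code minus 128. The sketch below decodes a few common cases using standard Linux signal numbers. The `describe_exit_code` helper is hypothetical and covers only the signals listed in its table.

```python
# Standard Linux signal numbers for the cases seen most often in containers.
SIGNAL_NAMES = {
    6: "SIGABRT",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def describe_exit_code(code: int) -> str:
    """Give a short, human-readable interpretation of a container exit code."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    return f"application error (exit code {code})"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Exit code 137 (SIGKILL) often accompanies an out-of-memory kill, but it is not proof on its own. Confirm by checking for the OOMKilled reason before raising memory limits.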

Monitoring Deployments

Track Rollout Events: Monitor deployment progress
Identify Pod Failures: Catch issues during updates
Verify Configuration: Ensure correct settings applied
Watch Resource Updates: Track replica changes

Audit Trail

Resource Creation: Track when resources were created
Configuration Changes: Monitor updates to workloads
Access Events: Review authentication/authorization events
Deletion Events: Track resource cleanup

Kubernetes Events

Access events through Kubernetes resource endpoints.

Get Resource with Events

When retrieving a specific resource, events are included in the response:

```http
GET /v1/workspaces/{workspaceId}/observability/{observabilityInstanceId}/clusters/{clusterId}/namespaces/{namespace}/pods/{podName}
Authorization: Bearer <access-token>
```

Response includes:

```json
{
  "resource": {
    /* pod details */
  },
  "events": [
    {
      "type": "Normal",
      "reason": "Started",
      "message": "Started container nginx",
      "timestamp": "2024-01-15T10:30:00Z",
      "count": 1
    }
  ]
}
```
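A client for this endpoint could look roughly like the sketch below, which uses only the Python standard library. The path and the bearer-token header come from the request shown above. The API host is not given on this page, so `https://api.example.com` is a placeholder, and the function names `pod_url`, `fetch_pod`, and `warning_events` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder; the real host is not documented here


def pod_url(workspace_id: str, instance_id: str, cluster_id: str,
            namespace: str, pod_name: str, base_url: str = BASE_URL) -> str:
    """Build the pod endpoint URL, percent-encoding each path segment."""
    parts = [
        "v1", "workspaces", workspace_id,
        "observability", instance_id,
        "clusters", cluster_id,
        "namespaces", namespace,
        "pods", pod_name,
    ]
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_pod(url: str, token: str) -> dict:
    """GET the pod and return the decoded JSON body (performs a network call)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def warning_events(payload: dict) -> list:
    """Return only the Warning-type events from a pod response."""
    return [e for e in payload.get("events", []) if e.get("type") == "Warning"]


# Offline demonstration with a payload shaped like the sample response.
sample = {
    "resource": {},
    "events": [
        {"type": "Normal", "reason": "Started", "message": "Started container nginx",
         "timestamp": "2024-01-15T10:30:00Z", "count": 1},
        {"type": "Warning", "reason": "Unhealthy", "message": "Readiness probe failed",
         "timestamp": "2024-01-15T10:31:00Z", "count": 3},
    ],
}
print(pod_url("ws1", "obs1", "c1", "default", "nginx-abc"))
print([e["reason"] for e in warning_events(sample)])
```

In real use, the result of `fetch_pod(pod_url(...), token)` would be passed to `warning_events` in place of the sample payload.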

Best Practices

Event Monitoring

Enable both resource and system events for complete visibility
Set up alerts for critical event types (OOMKilled, CrashLoopBackOff)
Regularly review warning events to catch issues early
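One way to turn these practices into a rule is to page on the critical reasons named above and escalate warnings that keep repeating. The sketch below assumes event dictionaries shaped like the API response on this page. The `alert_level` function, the action names, and the repeat threshold of 5 are hypothetical choices, not platform defaults.

```python
# Critical reasons are the two named in the best practice above.
CRITICAL_REASONS = {"OOMKilled", "CrashLoopBackOff"}
WARNING_REPEAT_THRESHOLD = 5  # hypothetical; tune to your environment


def alert_level(event: dict) -> str:
    """Classify an event as "page", "ticket", "review", or "none"."""
    if event.get("reason") in CRITICAL_REASONS:
        return "page"
    if event.get("type") == "Warning":
        # A warning that keeps recurring is escalated; a rare one is
        # left for the regular review of warning events.
        if int(event.get("count", 1)) >= WARNING_REPEAT_THRESHOLD:
            return "ticket"
        return "review"
    return "none"


events = [
    {"type": "Warning", "reason": "OOMKilled", "count": 1},
    {"type": "Warning", "reason": "Unhealthy", "count": 7},
    {"type": "Warning", "reason": "Unhealthy", "count": 2},
    {"type": "Normal", "reason": "Started", "count": 1},
]
print([alert_level(e) for e in events])
```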

Log Management

Implement structured logging in applications
Include correlation IDs for request tracing
Use appropriate log levels (DEBUG, INFO, WARN, ERROR)
Avoid logging sensitive information
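As an illustration of the first three points, the sketch below emits one JSON object per log line with a level and a correlation ID, using Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, the key names, and the `req-123` identifier are hypothetical. In a container you would pass `sys.stdout` as the stream so the platform can collect the output.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (illustrative key names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Set per call through `extra={"correlation_id": ...}`.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


buffer = io.StringIO()  # stands in for sys.stdout in this demonstration
log = build_logger(buffer)
log.info("order accepted", extra={"correlation_id": "req-123"})
log.debug("suppressed because the logger level is INFO")
print(buffer.getvalue().strip())
```

Note that Python names its warning level `WARNING` rather than `WARN`, so any downstream level filter needs to account for the spelling your logging library emits.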

Troubleshooting Workflow

Start with events to identify the problem type
Review pod logs for application-specific details
Check resource configuration for misconfigurations
Examine metrics for resource constraints
Review cluster-wide events for infrastructure issues
