Cluster Components Management

Task 7: Managing and Monitoring Kubernetes Cluster Critical Components

To maintain a healthy Kubernetes cluster, you must monitor and manage its critical components:

✅ API Server (controls all cluster operations)
✅ etcd (stores cluster state)
✅ Controller Manager & Scheduler (manage workloads)
✅ Kubelet & Kube Proxy (handle nodes & networking)


Step 1: Monitor API Server Health

The API server is the most critical component. If it goes down, kubectl won’t work.

1️⃣ Check API Server Logs

kubectl logs -n kube-system -l component=kube-apiserver

Or, on installations where the API server runs as a systemd service, directly on a control-plane (master) node:

journalctl -u kube-apiserver --no-pager | tail -50

2️⃣ Check API Server Health

kubectl get --raw='/readyz'

If the output is ok, the API server is healthy.

3️⃣ Restart API Server (If Unhealthy)

systemctl restart kube-apiserver

Or on a static pod setup:

docker restart $(docker ps | grep kube-apiserver | awk '{print $1}')
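On clusters that use containerd instead of Docker (the default on recent kubeadm versions), the same idea applies with crictl: stop the container and the kubelet recreates the static pod. The container name below is the kubeadm default:

sudo crictl stop $(sudo crictl ps --name kube-apiserver -q)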

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| kubectl get nodes hangs | API Server is down. | Restart the API Server. |
| failed to connect to API Server | Firewall or network issue. | Check iptables -L and allow port 6443. |


Step 2: Monitor etcd Health

Since etcd stores cluster state, monitoring it is crucial.

1️⃣ Check etcd Pod Status
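On a kubeadm-based cluster, etcd runs as a static pod in kube-system labelled component=etcd (adjust the selector if you run external etcd):

kubectl get pods -n kube-system -l component=etcd -o wide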

2️⃣ Check etcd Health
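A health probe with etcdctl from a control-plane node; the certificate paths below are the kubeadm defaults and may differ in your environment:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health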

3️⃣ Check etcd Leader Election
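The endpoint status table shows which member is currently the leader (IS LEADER column); the same kubeadm certificate paths are assumed:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --write-out=table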

4️⃣ Restart etcd (If Needed)
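How you restart etcd depends on the installation. For a systemd-managed etcd, restart the service; for a kubeadm static pod, stop the container and let the kubelet recreate it (container name is the kubeadm default):

systemctl restart etcd                               # systemd-managed etcd
sudo crictl stop $(sudo crictl ps --name etcd -q)    # kubeadm static pod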

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| etcdctl: connection refused | etcd is down. | Restart etcd. |
| etcd database is corrupted | Data corruption. | Restore from a snapshot. |


Step 3: Monitor Controller Manager & Scheduler

The Controller Manager runs the control loops that reconcile cluster state (deployments, replicas, node lifecycle), and the Scheduler assigns pods to nodes.

1️⃣ Check Controller Manager Logs
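On kubeadm clusters the Controller Manager runs as a static pod labelled component=kube-controller-manager (the selector is an assumption for other install methods):

kubectl logs -n kube-system -l component=kube-controller-manager --tail=50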

2️⃣ Check Scheduler Logs
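Same pattern for the Scheduler:

kubectl logs -n kube-system -l component=kube-scheduler --tail=50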

3️⃣ Restart If Required
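For systemd-managed control planes, restart the services; on kubeadm static pod setups, stop the containers and the kubelet recreates them (container names are the kubeadm defaults):

systemctl restart kube-controller-manager kube-scheduler                # systemd installs
sudo crictl stop $(sudo crictl ps --name kube-controller-manager -q)    # kubeadm static pods
sudo crictl stop $(sudo crictl ps --name kube-scheduler -q)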

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Pods stuck in Pending | Scheduler issue. | Restart the Scheduler. |
| failed to create deployment | Controller Manager issue. | Restart the Controller Manager. |


Step 4: Monitor Node Components (Kubelet & Proxy)

1️⃣ Check Kubelet Status on Nodes
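The kubelet runs as a systemd service on every node:

systemctl status kubelet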

If the Kubelet is not running, restart it:
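systemctl restart kubelet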

2️⃣ Check Kube Proxy
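kube-proxy normally runs as a DaemonSet in kube-system with the k8s-app=kube-proxy label (the kubeadm default; adjust if your CNI replaces kube-proxy):

kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide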

To check logs:
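kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50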

3️⃣ Restart Kube Proxy (If Needed)
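Because kube-proxy is a DaemonSet, a rolling restart recreates its pod on every node:

kubectl rollout restart daemonset kube-proxy -n kube-system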

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Node NotReady | Kubelet issue. | Restart the Kubelet. |
| Pods can't communicate | Kube Proxy issue. | Restart Kube Proxy. |


Step 5: Use Prometheus & Grafana for Monitoring

For real-time monitoring, use Prometheus + Grafana.

1️⃣ Install Prometheus
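One common approach (an assumption, not the only option) is the kube-prometheus-stack Helm chart, which bundles Prometheus, Alertmanager, node_exporter, and Grafana:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace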

2️⃣ Install Grafana
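If you installed kube-prometheus-stack above, Grafana is already included. To install it separately instead, the official chart can be used:

helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana -n monitoring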

3️⃣ Expose Grafana
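A port-forward is the quickest way to reach the UI locally; the service name below assumes the kube-prometheus-stack release named prometheus from step 1 above:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80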

Now, access the Grafana UI at http://localhost:3000.

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Grafana dashboard empty | Data source not added. | Add Prometheus as a data source. |
| Prometheus alerts missing | Misconfigured rules. | Reapply the monitoring configuration. |


Step 6: Set Up Alerts for Cluster Issues

1️⃣ Alert for High CPU Usage

Add this rule in Prometheus:
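A minimal sketch of such a rule, assuming node_exporter metrics are scraped (kube-prometheus-stack does this by default); the 80% threshold and rule names are examples. Put it in a Prometheus rule file (or a PrometheusRule resource when using the Prometheus Operator):

groups:
- name: node-alerts
  rules:
  - alert: HighNodeCPUUsage
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.instance }} CPU usage above 80% for 5 minutes"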

2️⃣ Alert for etcd Failure
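A minimal sketch; the job label depends on how etcd is scraped in your setup, so treat the selector as an assumption:

groups:
- name: etcd-alerts
  rules:
  - alert: EtcdDown
    expr: up{job=~".*etcd.*"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "etcd target {{ $labels.instance }} is down"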


Step 7: Regular Health Checks & Maintenance

Check cluster status daily:
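For example, confirm node readiness and the API server health endpoints:

kubectl get nodes
kubectl get --raw='/readyz?verbose'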

Check failed pods:
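kubectl get pods --all-namespaces --field-selector=status.phase=Failed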

Check node disk space:
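Kubernetes surfaces low disk as a DiskPressure node condition; for exact usage, run df on the node itself:

kubectl describe nodes | grep -i diskpressure
df -h /var/lib/kubelet    # run directly on the node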

Automate alerts for failures.


Summary

✅ API Server monitored & restarted if needed
✅ etcd checked & restored from snapshot
✅ Controller Manager & Scheduler logs monitored
✅ Kubelet & Kube Proxy issues resolved
✅ Prometheus & Grafana set up for real-time monitoring
✅ Alerting system configured for failures


Next Task: Cluster Security & Access Control.
