Health Check

Linux System Health Checks and Monitoring Guide with Explanations

This guide walks through routine Linux system health checks. For each command it explains what happens when you run it, and it covers common issues along with their solutions.


🚀 Step 1: Monitor CPU, Memory, and Disk Usage

1.1 Check CPU Usage

🔹 Check real-time CPU usage:

top

What Happens?

  • Displays real-time CPU, memory, and process usage.

  • Press q to exit.

🔹 Alternative (more user-friendly):

htop

What Happens?

  • Similar to top, but with better visuals and colors.

  • Press F9 to kill processes directly.

🔹 Check CPU usage history:

What Happens?

  • Samples CPU usage every 5 seconds, 10 times.
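One way to take these samples, as a sketch: it assumes the sysstat package (which provides `sar`) is installed and falls back to `vmstat` from procps otherwise. Expect it to run for about 50 seconds.

```shell
# Sample CPU utilisation every 5 seconds, 10 times (about 50 seconds in total).
INTERVAL=5
COUNT=10
if command -v sar >/dev/null 2>&1; then
    sar -u "$INTERVAL" "$COUNT"    # sysstat: %user, %system, %iowait, %idle per sample
elif command -v vmstat >/dev/null 2>&1; then
    vmstat "$INTERVAL" "$COUNT"    # procps fallback: see the us, sy, id, wa columns
else
    echo "neither sar nor vmstat is installed (packages: sysstat, procps)" >&2
fi
```

When sysstat's periodic data collection is enabled, running `sar -u` with no interval prints the CPU history already recorded for today instead of taking new samples.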

🔹 View CPU usage by each core:

What Happens?

  • Shows CPU utilization per core.
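With sysstat installed, `mpstat -P ALL 1 1` prints one line per core. The dependency-free sketch below shows what such tools compute: it reads the per-core counters in /proc/stat twice, one second apart (an arbitrary window), and reports each core's busy share. Counting iowait as idle time is a simplifying choice.

```shell
# Per-core CPU utilisation from two /proc/stat snapshots taken 1 second apart.
before=$(mktemp)
after=$(mktemp)
grep -E '^cpu[0-9]+ ' /proc/stat > "$before"
sleep 1
grep -E '^cpu[0-9]+ ' /proc/stat > "$after"

# Fields 2-9: user nice system idle iowait irq softirq steal (in clock ticks).
# Busy % = (total delta - idle delta - iowait delta) / total delta
per_core=$(awk '
    NR == FNR { for (i = 2; i <= 9; i++) prev[$1, i] = $i; next }
    {
        total = 0
        for (i = 2; i <= 9; i++) total += $i - prev[$1, i]
        idle = ($5 - prev[$1, 5]) + ($6 - prev[$1, 6])
        printf "%-6s %5.1f%% busy\n", $1, (total > 0 ? 100 * (total - idle) / total : 0)
    }' "$before" "$after")
rm -f "$before" "$after"
echo "$per_core"
```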

✅ Common Issue: High CPU Usage

🔹 Find the process consuming CPU:

What Happens?

  • Lists top CPU-consuming processes.
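A sketch using procps `ps`; the `--sort` option is a procps-ng feature (BusyBox `ps` does not accept it), and the column list and top-5 cut-off are arbitrary choices.

```shell
# Header plus the 5 processes using the most CPU, highest first.
top_cpu=$(ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 6)
echo "$top_cpu"
```

Note that `ps` reports %CPU averaged over each process's lifetime, so a process that only just started spinning can rank lower here than in `top`.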

🔹 Kill a high CPU-consuming process:

What Happens?

  • Immediately terminates the process (SIGKILL); it gets no chance to clean up.

  • Useful when a service or process is consuming too much RAM or CPU: kill it, then restart it.

  • Can keep the server responsive when a runaway process is starving everything else of CPU.

  • Useful for clearing a port conflict: find the process holding the port with sudo lsof -i :80, then stop it with sudo kill -9 <PID>.

  • Useful when a service hangs and no longer responds.
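To try SIGKILL without risking a real service, this sketch starts a throwaway `sleep` as a stand-in for the runaway process. On a real system the PID would come from `ps`, `top`, or `lsof` instead.

```shell
# Throwaway stand-in for a runaway or hung process.
sleep 300 &
PID=$!

# SIGKILL (signal 9) cannot be caught or ignored: the kernel ends the process at once.
kill -9 "$PID"
status=0
wait "$PID" || status=$?
echo "exit status: $status"   # above 128 means "killed by a signal" (typically 128 + 9 = 137)

# Port-conflict variant (needs root): lsof -t prints only the PIDs using port 80.
#   sudo kill -9 $(sudo lsof -t -i :80)
```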

🔹 Safer alternative:

What Happens?

  • Sends a graceful termination signal, allowing the process to shut down properly.
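A sketch of the graceful path, using a throwaway shell loop that traps SIGTERM the way a well-behaved service would. `kill -15 PID` and plain `kill PID` send the same signal.

```shell
# Throwaway process that cleans up and exits 0 when it receives SIGTERM.
sh -c 'trap "echo cleaned up; exit 0" TERM; while :; do sleep 1; done' &
PID=$!
sleep 1                        # give the child time to install its trap

kill -15 "$PID"                # SIGTERM: unlike SIGKILL, it can be caught and handled
status=0
wait "$PID" || status=$?
echo "exit status: $status"    # 0 here: the handler ran and exited cleanly
```

A common pattern is to send SIGTERM first, wait a few seconds, and fall back to `kill -9` only if `kill -0 PID` shows the process is still alive.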


1.2 Check Memory Usage

🔹 Check available and used memory:

What Happens?

  • Displays memory usage in MB.
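The usual command here is `free` from procps; `-m` prints mebibytes, while `-h` would pick human-readable units.

```shell
# Memory and swap in MiB. "available" estimates how much memory new programs can use
# without swapping; it is a better health signal than the "free" column.
mem=$(free -m)
echo "$mem"
```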

🔹 Detailed memory usage:

What Happens?

  • Shows swap, active, and inactive memory details.
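Two ways to see these counters, as a sketch: read /proc/meminfo directly, or use `vmstat -s` from procps (assumed installed). The field selection is only an example.

```shell
# Selected kernel memory counters (values in kB): totals, active/inactive memory, swap.
meminfo=$(grep -E '^(MemTotal|MemAvailable|Active|Inactive|SwapTotal|SwapFree):' /proc/meminfo)
echo "$meminfo"

# The same kind of data as a one-value-per-line summary, if procps vmstat is present.
if command -v vmstat >/dev/null 2>&1; then
    vmstat -s
fi
```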

✅ Common Issue: Memory Running Low

🔹 Find high memory-consuming processes:

What Happens?

  • Lists top memory-consuming processes.
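The memory counterpart of the CPU query, again assuming procps-ng `ps`; the top-5 cut-off is arbitrary.

```shell
# Header plus the 5 processes with the largest share of RAM; RSS is resident memory in KiB.
top_mem=$(ps -eo pid,user,%mem,rss,comm --sort=-%mem | head -n 6)
echo "$top_mem"
```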

🔹 Clear cached memory:

What Happens?

  • Drops the page cache and cached dentries/inodes without stopping running applications; performance may dip briefly while the cache refills.
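The kernel interface for this is /proc/sys/vm/drop_caches, and writing to it requires root. A sketch that only attempts the write from a root shell:

```shell
# Write dirty pages to disk first so they become clean, droppable cache.
sync
# 1 = page cache, 2 = reclaimable slab objects (dentries and inodes), 3 = both.
if [ "$(id -u)" -eq 0 ]; then
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "not root: use  echo 3 | sudo tee /proc/sys/vm/drop_caches" >&2
fi
```

`sudo echo 3 > /proc/sys/vm/drop_caches` does not work because the redirection runs in the unprivileged shell, hence the `tee` form. Cache memory is already reclaimable on demand, so dropping it rarely fixes real memory pressure; it is mostly useful for benchmarking and troubleshooting.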


1.3 Check Disk Usage

🔹 View disk space usage:

What Happens?

  • Shows disk usage in human-readable format.
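A sketch built on GNU coreutils `df` (`--output` is GNU-specific). Adding `df -i` is worthwhile because a filesystem that runs out of inodes also reports "No space left on device" even when `df -h` shows free space.

```shell
# Space per mounted filesystem in human-readable units (K, M, G).
df -h
# Inode usage: many tiny files can exhaust inodes while blocks are still free.
df -i
# Only the usage percentage of the root filesystem, handy in scripts.
root_use=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "root filesystem: ${root_use}% used"
```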

🔹 Check individual file sizes:

What Happens?

  • Shows size of each file in /var/log.
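A sketch using `du` with GNU `sort -h`, which understands the K/M/G suffixes so the largest entries land at the bottom.

```shell
# Size of each file and directory directly under /var/log, smallest to largest.
# Some entries are readable only by root; run with sudo for complete numbers.
log_sizes=$(du -sh /var/log/* 2>/dev/null | sort -h)
echo "$log_sizes"
```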

🔹 Find large files (above 1GB):

What Happens?

  • Searches for files larger than 1GB.
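A sketch with GNU `find`, where `-size +1G` matches files strictly larger than 1 GiB. Restricting the search to the root filesystem with `-xdev` is a choice; drop it to scan other mounts as well.

```shell
# Regular files larger than 1 GiB on the root filesystem, listed with their sizes.
# -xdev stays on one filesystem (skipping /proc, /sys and other mounts); errors are hidden.
big_files=$(find / -xdev -type f -size +1G -exec ls -lh {} + 2>/dev/null)
echo "$big_files"
```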

✅ Common Issue: Disk Full

🔹 Find and delete old logs:

What Happens?

  • Deletes compressed log files.
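A cautious sketch: list first, delete second. The 30-day age limit is an assumption to adapt, and `-delete` is a GNU find action.

```shell
# Rotated, compressed logs older than 30 days (the cut-off is an example; adjust it).
old_logs=$(find /var/log -type f -name '*.gz' -mtime +30 2>/dev/null)
echo "$old_logs"

# Once the list looks right, delete exactly that set as root:
#   sudo find /var/log -type f -name '*.gz' -mtime +30 -delete
```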

🔹 Remove unnecessary packages:

What Happens?

  • Removes unused dependencies.
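The command depends on the package manager. This sketch assumes either apt (Debian/Ubuntu) or dnf (Fedora/RHEL) and only previews the removal; the real commands are in the comments.

```shell
# Preview which no-longer-needed dependencies would be removed.
PM=none
if command -v apt-get >/dev/null 2>&1; then
    PM=apt-get
    apt-get --simulate autoremove     # to remove for real: sudo apt autoremove
elif command -v dnf >/dev/null 2>&1; then
    PM=dnf
    dnf --assumeno autoremove         # answers "no"; dnf may still require sudo. Real run: sudo dnf autoremove
fi
echo "package manager: $PM"
```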


🚀 Step 2: Monitor System Load

🔹 Check system load (1, 5, 15 min avg):

What Happens?

  • Shows system load averages over time.
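`uptime` prints the three averages. They are easier to judge next to the CPU count: on Linux the load counts tasks that are running, waiting for a CPU, or blocked on I/O, so a sustained value above the number of cores means work is queuing.

```shell
uptime
# Raw values: 1, 5 and 15-minute averages, running/total tasks, most recent PID.
cat /proc/loadavg
# Put the 1-minute average next to the number of available CPUs.
cores=$(nproc)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
echo "1-minute load: $load1 on $cores CPU(s)"
```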

🔹 Detailed load average:

What Happens?

  • Displays CPU load statistics.
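One way to look behind the averages, as a sketch: `vmstat` (procps) shows how many tasks are runnable or blocked at each sample, and `sar -q` (sysstat, assumed installed) reports run-queue length alongside the load averages.

```shell
# r = tasks waiting to run, b = tasks blocked in uninterruptible sleep (usually disk I/O).
# Five samples, 1 second apart; the first line is an average since boot.
vm=$(vmstat 1 5 2>/dev/null)
echo "$vm"

# With sysstat installed: run-queue length and the 1/5/15-minute load averages.
if command -v sar >/dev/null 2>&1; then
    sar -q 1 5
fi
```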

✅ Common Issue: High Load Average

🔹 Check which process is causing load:

What Happens?

  • Sorts processes by CPU usage.
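Inside interactive `top`, pressing P sorts by CPU and M by memory. For a scriptable snapshot, a sketch using procps-ng `top` in batch mode (`-o` selects the sort field):

```shell
# One non-interactive snapshot sorted by %CPU: summary lines plus the top processes.
snapshot=$(top -b -n 1 -o %CPU | head -n 15)
echo "$snapshot"
```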

🔹 Reduce load by stopping unnecessary services:

What Happens?

  • Stops the Apache web server.
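A sketch that stops Apache only if it is actually running. The unit name `apache2` is an assumption matching Debian/Ubuntu; on RHEL/Fedora the unit is `httpd`.

```shell
UNIT=apache2   # assumption: Debian/Ubuntu unit name; use httpd on RHEL/Fedora
if systemctl is-active --quiet "$UNIT" 2>/dev/null; then
    sudo systemctl stop "$UNIT"
    # Optional: keep it from starting again at the next boot.
    sudo systemctl disable "$UNIT"
else
    echo "$UNIT is not running"
fi
```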


🚀 Step 3: Check Running Services

🔹 List all running services:

What Happens?

  • Lists active systemd services.
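On systemd machines, a sketch of the listing; `--no-legend` drops the header and footer so the output is easy to count or grep.

```shell
# Service units that are currently running.
running=$(systemctl list-units --type=service --state=running --no-legend --no-pager 2>/dev/null)
echo "$running"
echo "running services: $(printf '%s\n' "$running" | grep -c '\.service')"
```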

🔹 List only failed services:

What Happens?

  • Displays failed services.
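`systemctl --failed` is the short form. This sketch narrows the list to services and prints a clear message when nothing has failed.

```shell
# Failed service units (same data as "systemctl --failed", limited to services).
failed=$(systemctl list-units --type=service --state=failed --no-legend --no-pager 2>/dev/null)
if [ -z "$failed" ]; then
    echo "no failed services"
else
    echo "$failed"
fi
```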

🔹 Check service status:

What Happens?

  • Shows the running status of the SSH service.
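A sketch for the SSH daemon. The unit name is distribution-specific: `sshd` on RHEL/Fedora, `ssh` on Debian/Ubuntu. `is-active` gives a one-word answer that suits scripts.

```shell
UNIT=sshd   # RHEL/Fedora name; on Debian/Ubuntu the unit is called ssh
systemctl status "$UNIT" --no-pager
# One-word state for scripts: active, inactive, failed, ...
state=$(systemctl is-active "$UNIT" 2>/dev/null)
echo "$UNIT is ${state:-unknown}"
```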

✅ Common Issue: Service Not Running

🔹 Restart a failed service:

What Happens?

  • Stops and then starts the service.
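A sketch of the restart-and-verify routine; `nginx` is only a hypothetical placeholder for whichever unit failed.

```shell
SERVICE=nginx   # hypothetical example; use the name of the failed unit
sudo systemctl restart "$SERVICE"
# Confirm it came back: prints "active" on success.
systemctl is-active "$SERVICE"
# If it failed again, its most recent log lines usually explain why.
journalctl -u "$SERVICE" -n 20 --no-pager
```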

🔹 Enable a service on boot:

What Happens?

  • Ensures the service starts on boot.
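A sketch, again with `nginx` as a hypothetical unit name. The `--now` flag also starts the service immediately, so no separate start command is needed.

```shell
SERVICE=nginx   # hypothetical example
# Start at every boot; --now also starts it right away.
sudo systemctl enable --now "$SERVICE"
# Prints "enabled" once the boot-time links are in place.
systemctl is-enabled "$SERVICE"
```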


🚀 Step 4: Log Analysis

🔹 View system logs:

What Happens?

  • Shows detailed logs of recent issues.
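On systemd machines the journal is the usual source. A sketch; without sudo (or membership in a group such as systemd-journal or adm), a regular user may see only their own entries.

```shell
# -x adds explanatory text where available, -e jumps to the newest entries.
journalctl -xe --no-pager | tail -n 50
# Only messages of priority "err" and worse from the current boot.
errors=$(journalctl -p err -b --no-pager 2>/dev/null)
echo "$errors"
```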

🔹 Check kernel logs:

What Happens?

  • Displays the last 20 kernel log messages.
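A sketch with util-linux `dmesg`. If `kernel.dmesg_restrict` is set, reading the kernel buffer needs sudo.

```shell
# Last 20 kernel messages; -T prints human-readable timestamps.
kernel_tail=$(dmesg -T 2>/dev/null | tail -n 20)
echo "$kernel_tail"
# Only errors and warnings.
dmesg --level=err,warn 2>/dev/null | tail -n 20
```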

✅ Common Issue: Too Many Log Errors

🔹 Clear logs safely:

What Happens?

  • Rotates logs based on configured policies.
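A sketch that previews a rotation before forcing it. /etc/logrotate.conf is the usual main configuration path, and the 500 MB journal limit is only an example.

```shell
# Dry run: -d prints what logrotate would do without changing any file.
plan=$(logrotate -d /etc/logrotate.conf 2>&1)
echo "$plan"

# When that looks right, force a rotation now under the same policies (root only):
#   sudo logrotate -f /etc/logrotate.conf
# For the systemd journal, shrink archived journal files to about 500 MB:
#   sudo journalctl --vacuum-size=500M
```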


🚀 Step 5: Network Diagnostics

5.1 Check Network Usage

🔹 Check network connections:

What Happens?

  • Shows open ports and listening services.
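A sketch with `ss` from iproute2, the modern replacement for `netstat`. Process names for sockets owned by other users appear only when run with sudo.

```shell
# -t TCP, -u UDP, -l listening sockets only, -n numeric ports, -p owning process.
listening=$(ss -tulpn 2>/dev/null)
echo "$listening"
# Older systems with net-tools: sudo netstat -tulpn
```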

🔹 Check bandwidth usage:

What Happens?

  • Displays real-time network usage.
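`iftop` (assumed installed) shows live traffic per connection, e.g. `sudo iftop -i eth0`, where eth0 is an example interface name. For a non-interactive check, this sketch reads the byte counters in /proc/net/dev twice, one second apart, and prints per-interface rates.

```shell
# Print "interface rx_bytes tx_bytes" for every interface.
snap() { awk 'NR > 2 { sub(/:/, " "); print $1, $2, $10 }' /proc/net/dev; }

before=$(snap)
sleep 1
after=$(snap)

# The first occurrence of each interface is the old sample, the second is the new one.
rates=$(printf '%s\n%s\n' "$before" "$after" | awk '
    seen[$1]++ == 0 { rx[$1] = $2; tx[$1] = $3; next }
    { printf "%-10s rx %8.1f KiB/s  tx %8.1f KiB/s\n", $1, ($2 - rx[$1]) / 1024, ($3 - tx[$1]) / 1024 }')
echo "$rates"
```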

5.2 Test Network Connectivity

🔹 Ping a server:

What Happens?

  • Sends 4 ICMP packets to check connectivity.
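A sketch with iputils `ping`; the target address is only an example. Pinging an IP address and then a hostname helps separate routing problems from DNS problems.

```shell
HOST=8.8.8.8   # example target (Google Public DNS); any reachable host works
# 4 echo requests, waiting at most 2 seconds for each reply.
ping_status=0
ping -c 4 -W 2 "$HOST" || ping_status=$?
# iputils ping: 0 = replies received, 1 = no replies, 2 = other error
echo "ping exit status: $ping_status"
```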

🔹 Check if a port is open:

What Happens?

  • Tests if port 443 (HTTPS) is open.
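A sketch with `nc`; `example.com` is a hypothetical target, and the flags below are the ones OpenBSD netcat accepts (other netcat variants differ slightly).

```shell
HOST=example.com   # hypothetical target
PORT=443
# -z: only test whether the port accepts a connection, -v: report it, -w 5: 5 s timeout
if nc -z -v -w 5 "$HOST" "$PORT"; then
    echo "$HOST:$PORT is reachable"
else
    echo "$HOST:$PORT is closed, filtered, or unreachable"
fi
```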

✅ Common Issue: Internet Not Working

🔹 Check if default gateway is set:

What Happens?

  • Displays network routing table.
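A sketch with iproute2 `ip` (the older equivalent is `route -n`). An empty default route is a common reason for "no internet".

```shell
# Full routing table.
ip route
# Only the default route: "default via <gateway> dev <interface> ...".
default_route=$(ip route show default 2>/dev/null)
echo "${default_route:-no default route set}"
```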

🔹 Restart network service:

What Happens?

  • Restarts all network interfaces.
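Which service to restart depends on how the distribution manages networking. This sketch only identifies the active manager from a list of common candidates (the list is an assumption) and prints the restart command, because restarting the network over SSH can drop your session.

```shell
# Find which common network service is active; the candidate list is an assumption.
NET_UNIT=""
for unit in NetworkManager systemd-networkd networking; do
    if systemctl is-active --quiet "$unit" 2>/dev/null; then
        NET_UNIT=$unit
        break
    fi
done

if [ -n "$NET_UNIT" ]; then
    echo "network is managed by $NET_UNIT; restart it with: sudo systemctl restart $NET_UNIT"
else
    echo "none of the usual network services is active"
fi
```

On Ubuntu releases that use netplan, `sudo netplan apply` re-applies the network configuration.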


🚀 Step 6: Security Health Check

🔹 Check failed login attempts:

What Happens?

  • Finds failed login attempts, which can point to unauthorized access attempts.
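A sketch for file-based SSH logs; the paths are the usual defaults, and reading them needs root. On journal-only systems, `journalctl -u sshd` (or `-u ssh`) holds the same messages, and `sudo lastb` lists failed logins recorded in /var/log/btmp.

```shell
# Debian/Ubuntu log SSH logins to /var/log/auth.log, RHEL/Fedora to /var/log/secure.
LOG=/var/log/auth.log
[ -f "$LOG" ] || LOG=/var/log/secure

# Last 20 failed password attempts.
grep 'Failed password' "$LOG" 2>/dev/null | tail -n 20
# Attempts per source address, most frequent first.
grep 'Failed password' "$LOG" 2>/dev/null \
    | grep -oE 'from [0-9a-fA-F:.]+' | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn | head -n 10
```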

🔹 Check for running root processes:

What Happens?

  • Lists processes running as root.
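A sketch using the procps `ps` selection for processes whose real and effective user is root.

```shell
# Every process running as root (real and effective user ID), in user-oriented format.
root_procs=$(ps -U root -u root u)
echo "$root_procs" | head -n 20
# How many there are ("pid=" suppresses the header).
echo "root processes: $(ps -U root -u root -o pid= | wc -l)"
```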

✅ Common Issue: SSH Brute Force Attacks

🔹 Block repeated failed login attempts:

What Happens?

  • Displays the ban status for failed SSH attempts, i.e. which addresses are currently blocked after repeated failures.
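A sketch assuming Fail2Ban is the tool doing the banning; its SSH jail is usually named `sshd`, and 203.0.113.10 is a documentation example address.

```shell
HAVE_F2B=no
if command -v fail2ban-client >/dev/null 2>&1; then
    HAVE_F2B=yes
    # All active jails, then failure and ban counts plus banned IPs for the SSH jail.
    sudo fail2ban-client status
    sudo fail2ban-client status sshd
else
    echo "fail2ban is not installed (Debian/Ubuntu: sudo apt install fail2ban)"
fi
# Lift a mistaken ban:
#   sudo fail2ban-client set sshd unbanip 203.0.113.10
```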

🔹 Manually block IP:

What Happens?

  • Blocks the IP from accessing the system.
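A sketch with `iptables`; the address is an example from the RFC 5737 documentation range. Rules added this way are lost on reboot unless saved (for example with `iptables-save` or the netfilter-persistent package).

```shell
BAD_IP=203.0.113.10   # example address (RFC 5737 documentation range); replace it
# Insert a DROP rule at the top of the INPUT chain for that source address.
sudo iptables -I INPUT -s "$BAD_IP" -j DROP
# Verify: the new rule should be listed as rule 1.
sudo iptables -L INPUT -n --line-numbers | head -n 5
# ufw equivalent:          sudo ufw deny from "$BAD_IP"
# Undo the iptables rule:  sudo iptables -D INPUT -s "$BAD_IP" -j DROP
```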


🚀 Summary

✔ Monitored CPU, Memory, and Disk Usage

✔ Checked System Load and Running Services

✔ Analyzed Logs for Errors

✔ Diagnosed Network Issues

✔ Performed Security Audits


Next: configure system alerts using Prometheus and Grafana. 🚀
