Health Check
Linux System Health Checks and Monitoring Guide with Explanations
This guide explains Linux system health monitoring step by step. For each check it describes what happens when you run the command, then covers common issues and their solutions.
Step 1: Monitor CPU, Memory, and Disk Usage
1.1 Check CPU Usage
Check real-time CPU usage:
top
What Happens?
Displays real-time CPU, memory, and process usage.
Press q to exit.
Alternative (more user-friendly):
htop
What Happens?
Similar to top, but with better visuals and colors.
Press F9 to kill processes directly.
Check CPU usage history:
What Happens?
Captures CPU usage every 5 seconds, 10 times in total.
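The command for this step did not survive in the text above. A likely candidate is sar from the sysstat package, which is an assumption and is often not installed by default. The sketch below wraps it in a hypothetical helper, cpu_history, so the interval and sample count stay adjustable:

```shell
# Hypothetical helper: sample CPU utilisation at a fixed interval.
# Defaults match the description above: every 5 seconds, 10 samples.
cpu_history() {
    interval="${1:-5}"
    count="${2:-10}"
    if command -v sar >/dev/null 2>&1; then
        sar -u "$interval" "$count"
    else
        echo "sar not found; install the sysstat package first" >&2
        return 1
    fi
}
```

Calling cpu_history with no arguments runs sar -u 5 10, while cpu_history 1 30 samples once a second for 30 seconds.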
View CPU usage per core:
What Happens?
Shows CPU utilization per core.
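One way to get a per-core view is mpstat, which also ships with sysstat, so treat its availability as an assumption. The fallback branch reads the kernel's own per-core counters, which are cumulative tick counts rather than percentages:

```shell
if command -v mpstat >/dev/null 2>&1; then
    mpstat -P ALL                 # one row per core, averaged since boot
else
    grep '^cpu[0-9]' /proc/stat   # raw cumulative ticks per core
fi
```

Adding an interval and count, as in mpstat -P ALL 1 5, prints five one-second samples instead of the since-boot average.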
Common Issue: High CPU Usage
Find the process consuming CPU:
What Happens?
Lists top CPU-consuming processes.
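A listing like the one described can be produced with ps. This is a sketch that assumes the procps-ng version of ps found on most distributions; the column selection is illustrative, not taken from the original:

```shell
# Ten busiest processes by CPU share (the first line is the column header).
# --sort is a procps-ng option; BusyBox ps does not support it.
ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 11
```

Note that ps reports %CPU as CPU time divided by the time the process has been running, so it is a lifetime average. For the live figure, top or htop is the better source.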
Kill a high CPU-consuming process:
What Happens?
Sends SIGKILL (kill -9), which terminates the process immediately with no chance to clean up.
Useful when a service or process is consuming too much RAM or CPU: you can kill it and then restart it.
This can keep the server from crashing under sustained high CPU usage.
It also resolves port conflicts. Find the process holding the port, then kill it by PID:
sudo lsof -i :80
sudo kill -9 <PID>
It is also useful when a service hangs and stops responding.
Safer alternative:
What Happens?
Sends a graceful termination signal, allowing the process to shut down properly.
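The two approaches can be combined: ask politely first and force termination only if the process ignores the request. The helper below, stop_process, is hypothetical, and its 10-second default grace period is an arbitrary choice:

```shell
# Hypothetical helper: send SIGTERM, escalate to SIGKILL only after a grace period.
# $1 = PID, $2 = grace period in seconds (default 10)
stop_process() {
    pid="$1"
    grace="${2:-10}"
    kill -TERM "$pid" 2>/dev/null || return 1   # no such PID, or no permission
    waited=0
    while kill -0 "$pid" 2>/dev/null; do        # signal 0 only tests for existence
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null       # last resort: cannot be caught
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, stop_process 1234 5 gives PID 1234 five seconds to shut down cleanly. Processes owned by another user need the kill calls to run under sudo.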
1.2 Check Memory Usage
Check available and used memory:
What Happens?
Displays memory usage in MB.
Detailed memory usage:
What Happens?
Shows swap, active, and inactive memory details.
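Neither memory command is preserved above. Assuming the usual tools, free covers the summary and /proc/meminfo holds the detailed figures (vmstat -s reports similar numbers); the selection of fields below is illustrative:

```shell
free -m     # totals in MiB; the "available" column is the realistic headroom
# Swap, active and inactive figures straight from the kernel (values in kB)
grep -E '^(MemTotal|MemAvailable|Active|Inactive|SwapTotal|SwapFree):' /proc/meminfo
```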
Common Issue: Memory Running Low
Find high memory-consuming processes:
What Happens?
Lists top memory-consuming processes.
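As a sketch of such a listing, again assuming procps-ng ps, sorting on %mem and including the rss column makes the absolute footprint visible next to the percentage:

```shell
# Ten largest processes by resident memory; rss is reported in KiB.
ps -eo pid,user,%mem,rss,comm --sort=-%mem | head -n 11
```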
Clear cached memory:
What Happens?
Drops the kernel's disk cache. Running applications are not interrupted, although reads may be slower until the cache refills.
1.3 Check Disk Usage
View disk space usage:
What Happens?
Shows disk usage in human-readable format.
Check individual file sizes:
What Happens?
Shows the size of each file in /var/log.
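The two disk checks above most plausibly map to df and du. The exact flags in the original are unknown, so the lines below are one reasonable reading; the sort and head stages are an addition to put the largest entries first:

```shell
df -h       # per-filesystem totals in human-readable units
# Largest entries in /var/log first; prefix du with sudo if some are unreadable
du -sh /var/log/* 2>/dev/null | sort -rh | head -n 10
```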
Find large files (above 1GB):
What Happens?
Searches for files larger than 1GB.
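A search like this is typically done with find. The helper name find_large_files and its parameters are hypothetical; the defaults reproduce the 1GB threshold from the text:

```shell
# Hypothetical helper. $1 = start directory (default /), $2 = size threshold (default 1G)
# -xdev keeps find on one filesystem, so /proc and network mounts are skipped.
find_large_files() {
    find "${1:-/}" -xdev -type f -size "+${2:-1G}" 2>/dev/null
}
```

Run as find_large_files for the whole root filesystem, or find_large_files /var 500M to narrow the search. Because of -xdev, separately mounted filesystems such as /home have to be searched explicitly.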
Common Issue: Disk Full
Find and delete old logs:
What Happens?
Deletes compressed log files.
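Since deletion cannot be undone, it helps to list the candidates before removing anything. The sketch below only prints them; old_gz_logs is a hypothetical helper, and the 30-day cut-off is an assumed value, not one from the original:

```shell
# Hypothetical helper: list compressed logs older than a cut-off.
# $1 = directory (default /var/log), $2 = minimum age in days (default 30)
old_gz_logs() {
    find "${1:-/var/log}" -type f -name '*.gz' -mtime +"${2:-30}" -print
}
```

Once the list looks right, the same find expression with -delete in place of -print removes the files. Under /var/log that usually requires sudo.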
Remove unnecessary packages:
What Happens?
Removes unused dependencies.
Step 2: Monitor System Load
Check system load (1, 5, 15 min avg):
What Happens?
Shows system load averages over time.
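The uptime command prints these three averages. To judge whether a value is high, a common rule of thumb (an addition here, not from the original) is to compare it with the number of CPU cores. A minimal sketch that reads the same numbers from /proc/loadavg:

```shell
# The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages
read -r load1 load5 load15 _ < /proc/loadavg
cores=$(nproc)
echo "load: $load1 (1 min) $load5 (5 min) $load15 (15 min) on $cores core(s)"
# awk handles the floating-point comparison that the shell cannot
awk -v load="$load1" -v cores="$cores" 'BEGIN {
    if (load + 0 > cores + 0) {
        print "1-minute load exceeds the core count"
    } else {
        print "1-minute load is within the core count"
    }
}'
```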
Detailed load average:
What Happens?
Displays CPU load statistics.
Common Issue: High Load Average
Check which process is causing load:
What Happens?
Sorts processes by CPU usage.
Reduce load by stopping unnecessary services:
What Happens?
Stops the Apache web server.
Step 3: Check Running Services
List all running services:
What Happens?
Lists active systemd services.
List only failed services:
What Happens?
Displays failed services.
Check service status:
What Happens?
Shows the running status of the SSH service.
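On a systemd host, the three read-only checks above can be sketched with systemctl. The wrapper service_overview is hypothetical, and the unit name ssh is an assumption: some distributions call it sshd.

```shell
# Hypothetical helper: read-only overview. $1 = unit to inspect (default ssh)
service_overview() {
    unit="${1:-ssh}"
    systemctl --no-pager list-units --type=service --state=running
    systemctl --no-pager list-units --type=service --state=failed
    systemctl --no-pager status "$unit"
}
```

The --no-pager flag keeps the output from opening in less, which matters when the commands run from a script.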
Common Issue: Service Not Running
Restart a failed service:
What Happens?
Stops and then starts the service.
Enable a service on boot:
What Happens?
Ensures the service starts on boot.
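Restart and enable are often run together when recovering a service. The helper recover_service below is hypothetical; the final is-active call is an added verification step, not part of the original instructions:

```shell
# Hypothetical helper: restart a unit, enable it at boot, then report its state.
# $1 = unit name
recover_service() {
    unit="$1"
    sudo systemctl restart "$unit" || return 1
    sudo systemctl enable "$unit"
    systemctl is-active "$unit"     # prints "active" when the restart worked
}
```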
Step 4: Log Analysis
View system logs:
What Happens?
Shows detailed logs of recent issues.
Check kernel logs:
What Happens?
Displays the last 20 kernel log messages.
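The log commands are missing from the text. Assuming a systemd host, journalctl and dmesg are the usual tools for these two checks; the wrapper names below are hypothetical:

```shell
# Recent journal entries: -x adds explanatory text, -e shows the most recent ones
recent_issues() {
    journalctl -xe --no-pager
}

# Last N kernel messages (default 20). This may need sudo on systems
# where kernel.dmesg_restrict is enabled.
recent_kernel_messages() {
    dmesg | tail -n "${1:-20}"
}
```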
Common Issue: Too Many Log Errors
Clear logs safely:
What Happens?
Rotates logs based on configured policies.
Step 5: Network Diagnostics
Check network connections:
What Happens?
Shows open ports and listening services.
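Listening sockets are usually listed with ss, or with netstat from the older net-tools package. Which one the original used is not known, so this hypothetical helper tries both:

```shell
# Hypothetical helper: t = TCP, u = UDP, l = listening only, n = numeric ports
listening_ports() {
    if command -v ss >/dev/null 2>&1; then
        ss -tuln
    else
        netstat -tuln
    fi
}
```

Adding -p under sudo, as in sudo ss -tulnp, also shows which process owns each socket.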
Check bandwidth usage:
What Happens?
Displays real-time network usage.
5.2 Test Network Connectivity
Ping a server:
What Happens?
Sends 4 ICMP packets to check connectivity.
Check if a port is open:
What Happens?
Tests if port 443 (HTTPS) is open.
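Both connectivity tests can be wrapped in one check. This is a sketch: check_host is a hypothetical helper, the port test assumes a netcat build that supports -z, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: $1 = host, $2 = TCP port (default 443)
check_host() {
    host="$1"
    port="${2:-443}"
    if ping -c 4 "$host" >/dev/null 2>&1; then
        echo "ping: $host reachable"
    else
        echo "ping: $host unreachable (or ICMP is filtered)"
    fi
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "tcp/$port: open"
    else
        echo "tcp/$port: closed, filtered, or nc is not installed"
    fi
}
```

A failed ping with an open port is common, because many hosts drop ICMP while still serving HTTPS.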
Common Issue: Internet Not Working
Check if default gateway is set:
What Happens?
Displays network routing table.
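With the iproute2 tools, which are assumed here, the default route can be pulled out of the routing table directly. The helper name default_gateway is hypothetical:

```shell
# Hypothetical helper: print the gateway address of the default route, if any.
default_gateway() {
    ip route show default |
        awk '/^default/ { for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}
```

Empty output means no default route with a gateway is configured, which would explain a host that reaches its local network but not the internet.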
Restart network service:
What Happens?
Restarts all network interfaces.
Step 6: Security Health Check
Check failed login attempts:
What Happens?
Finds failed login attempts, which can point to unauthorized access attempts.
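A sketch of such a search, assuming a Debian-style /var/log/auth.log (RHEL-family systems use /var/log/secure) and the standard OpenSSH "Failed password" message. The helper failed_ssh_logins is hypothetical and only counts IPv4 sources:

```shell
# Hypothetical helper: count failed SSH password attempts per source address.
# $1 = log file (default /var/log/auth.log)
failed_ssh_logins() {
    log="${1:-/var/log/auth.log}"
    grep 'Failed password' "$log" |
        grep -oE 'from [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' |
        awk '{ print $2 }' | sort | uniq -c | sort -rn
}
```

The output has one line per address with the attempt count first, highest count at the top. Reading the log usually requires sudo.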
Check for running root processes:
What Happens?
Lists processes running as root.
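One way to produce this list, assuming procps-ng ps, is to filter the full process table on the user column while keeping the header row:

```shell
# Keep the header (NR == 1) plus every row whose owner is root
ps -eo user,pid,comm | awk 'NR == 1 || $1 == "root"'
```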
Common Issue: SSH Brute Force Attacks
Block repeated failed login attempts:
What Happens?
Displays ban status for failed SSH attempts.
Manually block an IP:
What Happens?
Blocks the IP from accessing the system.
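The descriptions above suggest fail2ban for the ban status and a firewall rule for the manual block. Both are assumptions: the jail name sshd is only the common default, and hosts managed with ufw, firewalld or nftables should use those tools instead of raw iptables. The helper names are hypothetical:

```shell
# Show the ban status of a fail2ban jail. $1 = jail name (default sshd)
ban_status() {
    sudo fail2ban-client status "${1:-sshd}"
}

# Drop all inbound traffic from one address. $1 = IP address to block
block_ip() {
    sudo iptables -A INPUT -s "$1" -j DROP
}
```

The -A flag appends the rule, so an earlier ACCEPT rule can still match first; -I INPUT 1 inserts it at the top instead. Rules added this way do not survive a reboot unless they are saved.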
Summary
Monitored CPU, Memory, and Disk Usage
Checked System Load and Running Services
Analyzed Logs for Errors
Diagnosed Network Issues
Performed Security Audits
Next: Do you want to configure system alerts using Prometheus and Grafana?