Alerting

chevron-rightkubernetes failure alerthashtag

https://medium.com/@kopp0510/how-i-made-kubernetes-monitoring-smarter-with-ai-b16ff3888e41arrow-up-right

covering almost all common problem scenarios:

Cluster Core Component Alerts (10)

  • AlertmanagerClusterCrashlooping, AlertmanagerClusterDown

  • AlertmanagerClusterFailedToSendAlerts, AlertmanagerConfigInconsistent

  • AlertmanagerFailedReload, KubeAPIDown

  • KubeAPIErrorBudgetBurn, PrometheusBadConfig

  • PrometheusTargetSyncFailure, PrometheusRuleFailures

Workload Related Alerts (12)

  • KubeDeploymentReplicasMismatch, KubeStatefulSetReplicasMismatch

  • KubePodCrashLooping, KubePodNotReady

  • KubeJobFailed, KubeHpaReplicasMismatch

  • KubeHpaMaxedOut, KubeHpaHighUtilization

  • KubeHpaFrequentScaling, KubeImagePullBackOff

  • KubeServiceEndpointsUnavailable, KubeConfigReloadFailed

Container and Resource Alerts (12)

  • KubeContainerOOMKilled

  • KubeContainerCPUNearLimit

  • KubeContainerMemoryNearLimit

  • KubeTooManyPendingPods

  • KubeCPUOvercommit

  • KubeMemoryOvercommit

  • KubeQuotaExceeded

  • KubeQuotaAlmostFull

  • CPUThrottlingHigh

  • NodeCPUHighUsage

  • NodeMemoryHighUtilization

  • NodeSystemSaturation

Node and Storage Alerts (10)

  • KubeletDown, KubeNodeNotReady

  • NodeFileDescriptorLimit, KubePersistentVolumeErrors

  • KubePersistentVolumeFillingUp, KubePersistentVolumeInodesFillingUp

  • KubeVolumeMountFailed, NodeFilesystemAlmostOutOfSpace

  • NodeFilesystemSpaceFillingUp, KubeStateMetricsDown

Certificate and Monitoring Alerts (5)

  • KubeClientCertificateExpiration

  • KubeletClientCertificateExpiration

  • KubeletServerCertificateExpiration

  • KubeStateMetricsListErrors

  • Watchdog

Last updated