PromQL
To view all available metrics, open node-exporter:9100/metrics and use Ctrl+F to search through all of them.
📌 Complete PromQL Tutorial – Querying Metrics in Prometheus
PromQL (Prometheus Query Language) is used to query and analyze time-series metrics stored in Prometheus. It allows us to filter, aggregate, and visualize data from applications, servers, and infrastructure.
🔹 1. Basics of PromQL
Prometheus stores metrics in time-series format, consisting of:
Metric Name – Represents a resource, e.g., http_requests_total
Labels – Key-value pairs for filtering, e.g., method="GET", status="200"
Timestamps – Each metric has a timestamp
Example:
http_requests_total{method="GET", status="200"}
http_requests_total → Total HTTP requests
{method="GET", status="200"} → Filters by method and status
Result: Returns all GET requests with status 200.
🔹 2. Querying Time-Series Data
📌 2.1 Instant Vectors (Single Data Point)
An instant vector represents the latest value of a metric.
Example 1: Fetch all values of a metric
http_requests_total
Returns the current value for all instances.
Example 2: Filter using labels
http_requests_total{method="POST"}Returns only
POSTrequests.
📌 2.2 Range Vectors (Multiple Data Points)
A range vector fetches multiple data points over time.
Example: Last 5 minutes of request count
http_requests_total[5m]
Returns all request counts from the last 5 minutes.
Example: CPU Usage in the last 10 minutes
node_cpu_seconds_total[10m]
Returns CPU usage data from the last 10 minutes.
🔹 3. Aggregation Operators
Aggregation helps in calculating totals, averages, min, max, etc.
📌 3.1 Sum (Total Count)
sum(http_requests_total)
Returns the total count of HTTP requests across all instances.
📌 3.2 Average
avg(node_memory_Active_bytes)
Returns the average active memory usage.
📌 3.3 Maximum and Minimum
max(node_cpu_seconds_total)
min(node_cpu_seconds_total)
Finds the highest and lowest CPU usage.
📌 3.4 Count (Number of Instances Reporting)
count(node_cpu_seconds_total)
Counts the number of instances reporting CPU metrics.
🔹 4. Mathematical Operations
PromQL supports basic math operations on metrics.
📌 4.1 Rate of Increase
rate(http_requests_total[5m])
Calculates requests per second over the last 5 minutes.
📌 4.2 Convert Bytes to MB
node_memory_Active_bytes / (1024 * 1024)
Converts memory from bytes to megabytes.
📌 4.3 CPU Usage Percentage
100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Calculates CPU usage percentage (100 minus the idle percentage).
🔹 5. Time-Based Queries
PromQL supports time functions to filter historical data.
📌 5.1 Query Data at a Specific Time
http_requests_total @ 1694000000
Fetches the metric at Unix timestamp 1694000000.
📌 5.2 Fetch Metrics from 1 Hour Ago
http_requests_total offset 1h
Returns values from 1 hour ago.
🔹 6. Histogram Queries
Prometheus stores histograms with:
_count → Total occurrences
_sum → Total sum of values
_bucket → Buckets for distribution analysis
📌 6.1 Histogram Quantiles
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Returns the 95th percentile response time.
📌 6.2 Average Response Time
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Calculates the average request duration.
🔹 7. Advanced Queries
📌 7.1 Conditional Filtering
node_cpu_seconds_total > 100
Returns CPU metrics greater than 100.
rate(http_requests_total[5m]) > 10
Returns endpoints with more than 10 requests/sec.
📌 7.2 Comparing Two Metrics
sum(http_requests_total{method="GET"}) / sum(http_requests_total)
Compares GET requests against total requests (the fraction of traffic that is GET).
📌 7.3 Calculating Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100
Calculates the percentage of 5xx errors.
📌 7.4 Uptime Check
up == 0
Returns all down instances.
🔹 8. Recording Rules (Performance Optimization)
Instead of querying real-time data, we can precompute metrics.
Example: Create a Custom Query
groups:
  - name: custom_rules
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: rate(http_requests_total[5m])
Saves the result of rate(http_requests_total[5m]) as job:http_requests:rate5m.
🔹 9. Alerts in Prometheus
📌 9.1 Alert When CPU is Above 90%
groups:
  - name: cpu_alerts
    rules:
      - alert: HighCPUUsage
        expr: (100 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
Triggers an alert if CPU usage exceeds 90% for 2 minutes.
🔹 10. Exporting Data from Prometheus
📌 10.1 Query Data via API
curl "http://localhost:9090/api/v1/query?query=rate(http_requests_total[5m])"Fetches request rate data.
📌 10.2 Export Data to CSV
curl -G "http://localhost:9090/api/v1/query_range" \
--data-urlencode "query=node_cpu_seconds_total" \
--data-urlencode "start=1694000000" \
--data-urlencode "end=1694003600" \
--data-urlencode "step=60s" | jq '.data.result[] | {metric, values}'Extracts metrics in CSV format.
🔹 Summary
✅ Metric Types → Instant vectors (single data point), Range vectors (time-based data).
✅ Filtering & Aggregation → sum(), avg(), max(), min(), rate().
✅ Histogram Analysis → histogram_quantile() for latency tracking.
✅ Time Manipulation → offset, @ for historical data.
✅ Alerts & Automation → Use recording rules and alerts.
🚀 With this, you can fully leverage PromQL to analyze, visualize, and optimize your system performance in Prometheus!
HTTP
Here are the most commonly used PromQL queries for HTTP metrics along with filters:
🔹 1. Basic HTTP Metrics
📌 1.1 Total HTTP Requests
http_requests_total
Returns total HTTP requests for all endpoints.
📌 1.2 Total HTTP Requests with Filter
http_requests_total{method="GET"}Filters only GET requests.
http_requests_total{status="500"}Filters only 500 Internal Server Error responses.
🔹 2. HTTP Request Rate (Requests per Second)
📌 2.1 Requests per Second (RPS)
rate(http_requests_total[5m])
Calculates the number of requests per second over the last 5 minutes.
📌 2.2 Requests per Second by Method
rate(http_requests_total{method="POST"}[5m])Calculates POST request rate.
📌 2.3 Requests per Second by Status Code
rate(http_requests_total{status="200"}[5m])Calculates only 200 OK responses.
🔹 3. HTTP Error Rate (4xx & 5xx Errors)
📌 3.1 4xx Error Rate
sum(rate(http_requests_total{status=~"4.."}[5m]))Calculates the rate of all 4xx errors.
📌 3.2 5xx Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m]))Calculates the rate of all 5xx errors.
📌 3.3 Error Rate Percentage
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100Returns the percentage of 5xx errors.
🔹 4. HTTP Response Time
📌 4.1 Average Response Time
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Calculates the average request duration.
📌 4.2 95th Percentile Response Time
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Returns the 95th percentile response time.
📌 4.3 Response Time by Status Code
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{status="500"}[5m]))
Returns the 95th percentile response time for 500 errors.
🔹 5. Active HTTP Connections
📌 5.1 Current Active Connections
nginx_connections_active
Shows the current number of active connections in Nginx.
📌 5.2 Connections Accepted Per Second
rate(nginx_connections_accepted[5m])
Shows the rate of accepted connections.
📌 5.3 Dropped Connections
rate(nginx_connections_dropped[5m])
Shows the rate of dropped connections.
🔹 6. HTTP Uptime & Availability
📌 6.1 Check Which Instances Are Down
up == 0
Lists all instances that are down.
📌 6.2 Percentage of Healthy Instances
(sum(up) / count(up)) * 100
Shows the percentage of healthy instances.
🔹 7. HTTP Traffic & Data Transfer
📌 7.1 Total Data Sent
sum(rate(http_response_size_bytes_sum[5m]))
Shows total bytes sent by the server.
📌 7.2 Average Response Size
rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])
Calculates the average response size.
🔹 8. HTTP Queries with Advanced Filtering
📌 8.1 Filter Requests from a Specific Region
http_requests_total{region="us-west"}
Filters requests from the US-West region.
📌 8.2 Requests for a Specific API Endpoint
rate(http_requests_total{path="/api/v1/users"}[5m])
Filters only requests to /api/v1/users.
📌 8.3 Requests to a Specific Server
rate(http_requests_total{instance="server1:9090"}[5m])
Filters requests to server1.
🔹 9. HTTP Load Balancer Queries
📌 9.1 Requests per Instance
sum(rate(http_requests_total[5m])) by (instance)
Shows the number of requests per instance.
📌 9.2 Load Distribution Across Servers
sum(rate(http_requests_total[5m])) by (instance) / scalar(sum(rate(http_requests_total[5m]))) * 100
Shows the percentage of total load handled by each server.
🔹 10. Slowest API Endpoints
📌 10.1 Find Slowest Endpoints (90th Percentile)
histogram_quantile(0.90, sum(rate(http_request_duration_seconds_bucket[5m])) by (path))
Lists the slowest API endpoints by response time.
🔹 Summary
🚀 With these PromQL queries, you can:
✅ Monitor total requests, error rates, and latency.
✅ Filter by method, status code, region, and instance.
✅ Analyze load distribution and traffic trends.
✅ Track uptime, dropped connections, and slow endpoints.
💡 Use these in Grafana dashboards and alerts to monitor web performance effectively!
Popular Metrics
Get free disk in GB: node_filesystem_free_bytes{fstype="ext4"} / 1024 / 1024 / 1024
Get free memory in MB: node_memory_MemFree_bytes / 1024 / 1024
System information: node_uname_info
The same PromQL queries work in Grafana if Prometheus is configured as a data source.
CPU consumption in the last hour:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1h])) * 100)
When monitoring infrastructure with Grafana and Prometheus, these are the most commonly used metrics for system health, performance, and resource utilization:
🔹 CPU Metrics
1. CPU Usage (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__interval])) * 100)Shows CPU utilization per instance.
Use mode="user" and mode="system" separately to see user/system CPU usage.
2. CPU Load Average
node_load1
node_load1, node_load5, and node_load15 show system load over 1, 5, and 15 minutes.
🔹 Memory Metrics
3. Memory Usage (%)
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
4. Memory Used (Bytes)
Memory used in MB
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024 / 1024
Divide by 1024 for KB, again for MB, and again for GB.
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
Displays total used memory.
🔹 Disk Metrics
5. Disk Space Usage (%)
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)
Shows the percentage of disk space used for / (the root partition).
6. Disk Read/Write Speed
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
Monitors disk read and write speed.
🔹 Network Metrics
7. Network Traffic (In & Out)
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
Tracks incoming (receive) and outgoing (transmit) network traffic.
8. Network Errors
rate(node_network_receive_errs_total[5m])
rate(node_network_transmit_errs_total[5m])
Helps identify network issues.
🔹 System & Process Metrics
9. Uptime (Seconds)
node_time_seconds - node_boot_time_seconds
Displays system uptime.
10. Running Processes
node_procs_running
Number of active processes.
💡 Recommended Dashboards in Grafana
System Overview Dashboard → CPU, Memory, Disk, Network
Node Exporter Full Dashboard → Prebuilt dashboards for detailed metrics
Alerting Dashboard → Set alerts for high CPU, memory, or disk usage
🔥 These metrics help in troubleshooting, capacity planning, and alerting.
Memory
Metrics: Memory
Visualization: Stat
Total: node_memory_MemTotal_bytes
Available: node_memory_MemAvailable_bytes
Used: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
To convert bytes, divide by 1024 for KB, again for MB, and again for GB.
Eg. node_memory_MemTotal_bytes / 1024 / 1024 / 1024
Used in %: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
Total in %: total memory serves as the 100% baseline.
Free in %: 100 * (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
SWAP Memory
Total: node_memory_SwapTotal_bytes
Free: node_memory_SwapFree_bytes
Used: (Total - Free)
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes )
Percentage for total Used = ((Total - Free ) / Total) * 100
Percentage of Used Swap = ((node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes) * 100
Advanced Operations
You can perform advanced operations in Prometheus using arithmetic, rate functions, and aggregations. Let's go over some disk, memory, and percentage-based queries for better monitoring.
1. Disk Operations
a. Free Disk Space Calculation
To get free disk space, subtract used space from total disk space:
node_filesystem_size_bytes: Total disk size.
node_filesystem_avail_bytes: Available disk space (node_exporter exposes available/free space rather than a "used" metric, so used = size - available).
Filters by mountpoint "/" (root partition).
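A sketch of both calculations, using the standard node_exporter filesystem metrics:
# Free (available) disk space on the root partition, in bytes
node_filesystem_avail_bytes{mountpoint="/"}

# Used disk space = total size minus available space
node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}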
b. Disk Usage Percentage
To get the percentage of disk used:
This gives the percentage of disk space used on the / (root) partition.
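One way to express this (the same query shown in the Disk Metrics section above):
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)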
c. Disk Read & Write Speed
Monitor disk read/write speed over time:
Calculates disk read speed over the last 5 minutes.
Calculates disk write speed over the last 5 minutes.
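Using the node_exporter disk counters:
# Disk read throughput (bytes/sec) over the last 5 minutes
rate(node_disk_read_bytes_total[5m])

# Disk write throughput (bytes/sec) over the last 5 minutes
rate(node_disk_written_bytes_total[5m])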
2. Memory Operations
a. Free Memory Calculation
To calculate free memory:
node_memory_MemTotal_bytes: Total system memory.
node_memory_MemAvailable_bytes: Memory available for new workloads (used = total - available).
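A minimal sketch with the standard node_exporter memory metrics:
# Memory available (a good proxy for free memory)
node_memory_MemAvailable_bytes

# Used memory = total minus available
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes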
b. Memory Usage Percentage
To get RAM usage percentage:
Subtracts free, cached, and buffered memory from total memory.
Gives the percentage of RAM used.
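Following that description (subtract free, buffered, and cached memory from the total):
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes)
/ node_memory_MemTotal_bytes * 100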
c. Memory Usage per Pod
For Kubernetes memory usage per pod:
Converts to MB.
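A sketch using cAdvisor's container memory metric (metric and pod label names assume a standard kubelet/cAdvisor setup; the namespace is illustrative):
# Memory usage per pod, converted from bytes to MB
sum(container_memory_usage_bytes{namespace="default"}) by (pod) / 1024 / 1024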
3. CPU Operations
a. CPU Usage Percentage
To calculate CPU usage per core:
Adds user and system CPU time over the last 5 minutes.
Multiplies by 100 for percentage.
For total CPU usage across all cores:
Subtracts idle CPU time from 100%.
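Sketches of both variants, based on node_cpu_seconds_total:
# Per-core usage: user + system CPU time as a percentage, per CPU
(rate(node_cpu_seconds_total{mode="user"}[5m]) + rate(node_cpu_seconds_total{mode="system"}[5m])) * 100

# Total usage across all cores: 100% minus the average idle percentage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)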
4. Network Operations
a. Network Bandwidth Usage
Measure incoming and outgoing network traffic:
Incoming traffic rate.
Outgoing traffic rate.
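Using the node_exporter network counters:
# Incoming traffic rate (bytes/sec)
rate(node_network_receive_bytes_total[5m])

# Outgoing traffic rate (bytes/sec)
rate(node_network_transmit_bytes_total[5m])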
b. Network Usage Percentage
Calculate percentage of network usage (assuming 1Gbps link = 125MBps):
Gives the percentage of bandwidth used.
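A sketch based on that assumption (1 Gbps ≈ 125,000,000 bytes/sec):
# Received traffic as a percentage of a 1 Gbps link
rate(node_network_receive_bytes_total[5m]) / 125000000 * 100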
5. Kubernetes-Specific Metrics
a. Pod Restart Rate
Shows number of pod restarts in the last 1 hour.
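A sketch assuming kube-state-metrics is installed (it exposes kube_pod_container_status_restarts_total):
# Pod restarts in the last hour
increase(kube_pod_container_status_restarts_total[1h])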
b. Node CPU Usage Across Cluster
CPU usage per Kubernetes node.
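One way to sketch this with node_exporter metrics, grouping by instance (one node per instance):
# CPU usage percentage per node (instance)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)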
c. Running vs. Pending Pods
Percentage of running pods in the cluster.
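A sketch assuming kube-state-metrics (it exposes kube_pod_status_phase):
# Percentage of pods currently in the Running phase
sum(kube_pod_status_phase{phase="Running"}) / sum(kube_pod_status_phase) * 100

# Pods stuck in Pending
sum(kube_pod_status_phase{phase="Pending"})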
6. Combining Metrics
a. Percentage of CPU Used by Specific Pod
Filters CPU usage by nginx-pod. Divides by total CPU cores for the percentage.
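A sketch combining cAdvisor's per-pod CPU counter with node_exporter's core count (the pod name nginx-pod is illustrative):
# CPU used by nginx-pod as a percentage of all available cores
sum(rate(container_cpu_usage_seconds_total{pod="nginx-pod"}[5m]))
/ count(node_cpu_seconds_total{mode="idle"}) * 100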
Final Thoughts
Use rate() for speed/throughput calculations (network, disk, CPU).
Use arithmetic operations (+, -, *, /) for advanced calculations.
Use sum by(), avg(), count(), and increase() for aggregations.
Cadvisor Labels
cAdvisor (Container Advisor) is a tool that collects, aggregates, and exports container resource usage and performance metrics. When Prometheus scrapes metrics from cAdvisor, it automatically includes several default labels (filters) that you can use to filter and query container-specific data.
Default Labels (Filters) Exposed by cAdvisor
cAdvisor provides detailed container metrics with labels that help filter data based on container name, pod, namespace, and resource types.
1. Common Labels (Filters) in cAdvisor Metrics
id – The cgroup path, usually representing the container ID.
name – The short name of the container (e.g., /nginx).
container_label_io_kubernetes_pod_name – The pod name associated with the container.
container_label_io_kubernetes_pod_namespace – The Kubernetes namespace where the pod is running.
container_label_io_kubernetes_container_name – The actual name of the container inside a Kubernetes pod.
container_label_io_kubernetes_node_name – The Kubernetes node hosting the container.
container_label_io_kubernetes_pod_uid – The unique pod UID assigned by Kubernetes.
image – The container image name (e.g., nginx:latest).
pod – The Kubernetes pod name (alternative to container_label_io_kubernetes_pod_name).
namespace – The Kubernetes namespace (alternative to container_label_io_kubernetes_pod_namespace).
container – The actual container name (alternative to container_label_io_kubernetes_container_name).
cpu – The CPU core number being monitored.
device – Disk or network device name (e.g., eth0, sda).
2. Example Metrics and Queries
These labels are useful when querying container metrics in Prometheus.
a. CPU Usage per Container
Filters by container name (nginx). The metric shows total CPU usage in seconds.
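For example (the container name nginx is illustrative):
container_cpu_usage_seconds_total{name="nginx"}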
b. Memory Usage per Pod
Filters by pod name (nginx-deployment) in the default namespace. The metric shows memory usage in bytes.
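For example (pod and namespace values are illustrative):
container_memory_usage_bytes{pod="nginx-deployment", namespace="default"}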
c. Disk I/O per Container
Filters by container nginx and disk sda. The metric shows total disk reads.
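A sketch; the device value depends on your environment (it may appear as /dev/sda rather than sda):
container_fs_reads_total{container="nginx", device="/dev/sda"}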
d. Network Usage per Pod
Filters by pod name (frontend) in the namespace production. The metric shows total transmitted network bytes.
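For example (pod and namespace values are illustrative):
container_network_transmit_bytes_total{pod="frontend", namespace="production"}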
e. CPU Usage per Kubernetes Node
Shows total CPU usage on worker-node-1. Filters out non-containerized processes (container!="").
3. Summary of Key Labels in cAdvisor
id – Container ID or cgroup path
name – Short container name
container – Actual container name inside the pod
pod – Pod name in Kubernetes
namespace – Namespace in Kubernetes
container_label_io_kubernetes_pod_name – Pod name (long form)
container_label_io_kubernetes_pod_namespace – Namespace (long form)
container_label_io_kubernetes_node_name – Kubernetes node name
image – Container image
cpu – CPU core ID
device – Disk or network device
These labels allow you to filter cAdvisor metrics effectively when monitoring Kubernetes or Docker containers.
4. How Prometheus Scrapes cAdvisor Metrics
To collect these metrics, Prometheus must scrape cAdvisor. Example Prometheus scrape config for cAdvisor:
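A minimal sketch of such a scrape job (the target address is an example):
scrape_configs:
  - job_name: "cadvisor"
    static_configs:
      - targets: ["localhost:8080"]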
Job name = cadvisor
Instance = localhost:8080 (where cAdvisor is running)
Once configured, Prometheus will store all container metrics with the labels above, enabling filtering by pod, namespace, container, and more.
5. Example Query for Multi-Filter
To get CPU usage only for containers in the backend namespace running on worker-node-2:
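A sketch using the namespace and node-name labels described above (values are illustrative):
sum(rate(container_cpu_usage_seconds_total{namespace="backend", container_label_io_kubernetes_node_name="worker-node-2", container!=""}[5m]))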
Final Thoughts
cAdvisor automatically exposes several useful labels related to containers, pods, namespaces, and nodes. These labels allow you to filter metrics efficiently in Prometheus queries, helping with Kubernetes monitoring, troubleshooting, and optimization.
Node Exporter label
This section covers the default filters (labels) that Node Exporter automatically exposes, as well as those you can configure when setting up Prometheus to scrape metrics from it. These filters are typically set when you define your Prometheus scrape configuration, such as in the prometheus.yml file, and they relate to how Prometheus labels and organizes the scraped metrics.
Default Filters Exposed by Node Exporter
When Prometheus scrapes metrics from the Node Exporter, it adds several default labels to the metrics automatically. These labels can be used for filtering in your Prometheus queries. The most common default filters (labels) are:
job: This label represents the job name defined in your Prometheus scrape configuration (typically in the prometheus.yml file).
Example: If your Prometheus scrape config is scraping metrics from Node Exporter, it might look like the snippet below.
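A minimal sketch of such a scrape job (the target address is illustrative):
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]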
In this case, job="node" will be added automatically to all metrics scraped from Node Exporter.
instance: This label represents the instance of the target being scraped, which is typically the hostname or IP address of the machine exposing the metrics. It is automatically added by Prometheus based on the target's address.
Example: For a target localhost:9100, the instance label will be instance="localhost:9100".
__name__: This is a built-in label used to represent the metric name (e.g., node_cpu_seconds_total, node_memory_MemFree_bytes, etc.). While it isn't explicitly defined in prometheus.yml, it is a fundamental label for querying metrics.
Example of a Prometheus Configuration (prometheus.yml)
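A sketch of such a configuration, scraping two Node Exporter targets (addresses are illustrative):
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100", "192.168.1.100:9100"]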
With the above configuration, the following labels will be applied:
job="node"
instance="localhost:9100"
instance="192.168.1.100:9100"
Additional Labels and Filters
In addition to the default labels like job and instance, Node Exporter exposes other labels for filtering based on the type of metric being collected.
1. CPU Metrics (e.g., node_cpu_seconds_total)
mode: The CPU mode (e.g., user, system, idle).
Example: node_cpu_seconds_total{mode="user"}
2. Network Metrics (e.g., node_network_receive_bytes_total)
device: The network interface (e.g., eth0, lo).
Example: node_network_receive_bytes_total{device="eth0"}
3. Filesystem Metrics (e.g., node_filesystem_free_bytes)
mountpoint: The mount path (e.g., /, /mnt/data).
Example: node_filesystem_free_bytes{mountpoint="/"}
fstype: The filesystem type (e.g., ext4, xfs).
Example: node_filesystem_free_bytes{fstype="ext4"}
4. Disk Metrics (e.g., node_disk_read_bytes_total)
device: The disk device (e.g., sda, sdb).
Example: node_disk_read_bytes_total{device="sda"}
5. Memory Metrics (e.g., node_memory_MemTotal_bytes)
Memory state is encoded in the metric name rather than a label (node_exporter memory metrics have no state label).
Example: node_memory_Active_bytes, node_memory_MemFree_bytes
6. Uptime Metrics (e.g., node_boot_time_seconds)
instance: The instance label, which is often used in combination with other labels to get more granular data.
Filtering by Labels
You can use these labels to filter or group metrics in your Prometheus queries. Here are a few examples of how you'd query the metrics exposed by Node Exporter using those labels:
By job and instance (e.g., for CPU usage):
By network interface:
By filesystem and mountpoint:
By memory usage:
By CPU and mode:
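Illustrative queries for each of the filters above (label values are examples):
# By job and instance (CPU usage)
rate(node_cpu_seconds_total{job="node", instance="localhost:9100"}[5m])

# By network interface
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# By filesystem and mountpoint
node_filesystem_free_bytes{fstype="ext4", mountpoint="/"}

# By memory usage
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100

# By CPU mode
rate(node_cpu_seconds_total{mode="idle"}[5m])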
Summary of Common Labels from Node Exporter Metrics
job: Name of the scraping job (e.g., node).
instance: Instance of the target (typically the hostname or IP address of the target).
mode: Mode of CPU usage (e.g., user, system, idle).
device: Disk or network interface (e.g., sda, eth0).
mountpoint: The filesystem mount path (e.g., /).
fstype: The type of filesystem (e.g., ext4).
state: Used by some collectors; memory states appear in metric names such as node_memory_Active_bytes.
sensor: For hardware sensors, e.g., CPU temperature readings.
These labels are added automatically by Prometheus when scraping metrics from Node Exporter, and you can use them to filter and group metrics in your Prometheus queries for detailed analysis.
Exporter
Prometheus officially maintains a set of exporters that are considered reliable and actively supported by the Prometheus team. These are listed on the official Prometheus GitHub and website.
🔹 Officially Maintained Prometheus Exporters
These exporters are directly managed by the Prometheus team and are considered stable.
🔹 Prometheus Community Exporters
While not directly managed by Prometheus, these exporters are part of the Prometheus Community and are well-maintained.
🔹 Where to Find Officially Managed Exporters?
🔗 Prometheus Official Exporters: https://prometheus.io/docs/instrumenting/exporters/
Prometheus supports a wide range of exporters to collect metrics from different systems. Below is a list of commonly used Prometheus exporters categorized by their use case:
1. System & Infrastructure Monitoring
2. Cloud & Virtualization Monitoring
3. Database Monitoring
4. Messaging & Streaming Services
5. Web & API Monitoring
6. Storage & Backup Monitoring
7. Network & Security Monitoring
8. Kubernetes Monitoring
9. Custom Exporters
You can write custom exporters in Go, Python, Node.js, etc. if an official one is unavailable. The official Prometheus client libraries (client_golang for Go, prometheus_client for Python, prom-client for Node.js) are the usual starting point.
How to Find More Exporters?
You can find additional official and community exporters here: 🔗 Prometheus Exporters List: https://prometheus.io/docs/instrumenting/exporters/