PromQL
To view all available metrics, open node-exporter:9100/metrics and use Ctrl+F to search through all of them.
📌 Complete PromQL Tutorial – Querying Metrics in Prometheus
PromQL (Prometheus Query Language) is used to query and analyze time-series metrics stored in Prometheus. It allows us to filter, aggregate, and visualize data from applications, servers, and infrastructure.
🔹 1. Basics of PromQL
Prometheus stores metrics in time-series format, consisting of:
Metric Name – Represents a resource, e.g., http_requests_total
Labels – Key-value pairs for filtering, e.g., method="GET", status="200"
Timestamps – Each metric has a timestamp
Example:
http_requests_total{method="GET", status="200"}
http_requests_total → Total HTTP requests
{method="GET", status="200"} → Filters by method and status
Result: Returns all GET requests with status 200.
🔹 2. Querying Time-Series Data
📌 2.1 Instant Vectors (Single Data Point)
An instant vector represents the latest value of a metric.
Example 1: Fetch all values of a metric
http_requests_total
Returns the current value for all instances.
Example 2: Filter using labels
http_requests_total{method="POST"}Returns only
POSTrequests.
📌 2.2 Range Vectors (Multiple Data Points)
A range vector fetches multiple data points over time.
Example: Last 5 minutes of request count
http_requests_total[5m]
Returns all request counts from the last 5 minutes.
Example: CPU Usage in the last 10 minutes
node_cpu_seconds_total[10m]
Returns CPU usage data from the last 10 minutes.
🔹 3. Aggregation Operators
Aggregation helps in calculating totals, averages, min, max, etc.
📌 3.1 Sum (Total Count)
sum(http_requests_total)
Returns the total count of HTTP requests across all instances.
📌 3.2 Average
avg(node_memory_Active_bytes)
Returns the average active memory usage.
📌 3.3 Maximum and Minimum
max(node_cpu_seconds_total)
min(node_cpu_seconds_total)
Finds the highest and lowest CPU usage.
📌 3.4 Count (Number of Instances Reporting)
count(node_cpu_seconds_total)
Counts the number of instances reporting CPU metrics.
🔹 4. Mathematical Operations
PromQL supports basic math operations on metrics.
📌 4.1 Rate of Increase
rate(http_requests_total[5m])
Calculates requests per second over the last 5 minutes.
📌 4.2 Convert Bytes to MB
node_memory_Active_bytes / (1024 * 1024)
Converts memory from bytes to megabytes.
📌 4.3 CPU Usage Percentage
100 - (avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Calculates CPU usage percentage (100 minus the idle percentage).
🔹 5. Time-Based Queries
PromQL supports time functions to filter historical data.
📌 5.1 Query Data at a Specific Time
http_requests_total @ 1694000000
Fetches the metric at Unix timestamp 1694000000.
📌 5.2 Fetch Metrics from 1 Hour Ago
http_requests_total offset 1h
Returns values from 1 hour ago.
🔹 6. Histogram Queries
Prometheus stores histograms with:
_count → Total occurrences
_sum → Total sum of values
_bucket → Buckets for distribution analysis
📌 6.1 Histogram Quantiles
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Returns the 95th percentile response time.
📌 6.2 Average Response Time
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Calculates the average request duration.
🔹 7. Advanced Queries
📌 7.1 Conditional Filtering
node_cpu_seconds_total > 100
Returns CPU metrics greater than 100.
rate(http_requests_total[5m]) > 10
Returns endpoints with more than 10 requests/sec.
📌 7.2 Comparing Two Metrics
sum(http_requests_total{method="GET"}) / sum(http_requests_total)
Compares GET requests against total requests (the fraction of traffic that is GET).
📌 7.3 Calculating Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100
Calculates the percentage of 5xx errors.
📌 7.4 Uptime Check
up == 0
Returns all down instances.
🔹 8. Recording Rules (Performance Optimization)
Instead of querying real-time data, we can precompute metrics.
Example: Create a Custom Query
groups:
  - name: custom_rules
    interval: 30s
    rules:
      - record: job:http_requests:rate5m
        expr: rate(http_requests_total[5m])
Saves the result of rate(http_requests_total[5m]) as job:http_requests:rate5m.
🔹 9. Alerts in Prometheus
📌 9.1 Alert When CPU is Above 90%
groups:
  - name: cpu_alerts
    rules:
      - alert: HighCPUUsage
        expr: (100 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
Triggers an alert if CPU usage exceeds 90% for 2 minutes.
🔹 10. Exporting Data from Prometheus
📌 10.1 Query Data via API
curl "http://localhost:9090/api/v1/query?query=rate(http_requests_total[5m])"Fetches request rate data.
📌 10.2 Export Data to CSV
curl -G "http://localhost:9090/api/v1/query_range" \
--data-urlencode "query=node_cpu_seconds_total" \
--data-urlencode "start=1694000000" \
--data-urlencode "end=1694003600" \
--data-urlencode "step=60s" | jq '.data.result[] | {metric, values}'Extracts metrics in CSV format.
🔹 Summary
✅ Metric Types → Instant vectors (single data point), Range vectors (time-based data).
✅ Filtering & Aggregation → sum(), avg(), max(), min(), rate().
✅ Histogram Analysis → histogram_quantile() for latency tracking.
✅ Time Manipulation → offset, @ for historical data.
✅ Alerts & Automation → Use recording rules and alerts.
🚀 With this, you can fully leverage PromQL to analyze, visualize, and optimize your system performance in Prometheus!
HTTP
Here are the most commonly used PromQL queries for HTTP metrics along with filters:
🔹 1. Basic HTTP Metrics
📌 1.1 Total HTTP Requests
http_requests_total
Returns total HTTP requests for all endpoints.
📌 1.2 Total HTTP Requests with Filter
http_requests_total{method="GET"}Filters only GET requests.
http_requests_total{status="500"}Filters only 500 Internal Server Error responses.
🔹 2. HTTP Request Rate (Requests per Second)
📌 2.1 Requests per Second (RPS)
rate(http_requests_total[5m])
Calculates the number of requests per second over the last 5 minutes.
📌 2.2 Requests per Second by Method
rate(http_requests_total{method="POST"}[5m])Calculates POST request rate.
📌 2.3 Requests per Second by Status Code
rate(http_requests_total{status="200"}[5m])Calculates only 200 OK responses.
🔹 3. HTTP Error Rate (4xx & 5xx Errors)
📌 3.1 4xx Error Rate
sum(rate(http_requests_total{status=~"4.."}[5m]))Calculates the rate of all 4xx errors.
📌 3.2 5xx Error Rate
sum(rate(http_requests_total{status=~"5.."}[5m]))Calculates the rate of all 5xx errors.
📌 3.3 Error Rate Percentage
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) * 100Returns the percentage of 5xx errors.
🔹 4. HTTP Response Time
📌 4.1 Average Response Time
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
Calculates the average request duration.
📌 4.2 95th Percentile Response Time
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
Returns the 95th percentile response time.
📌 4.3 Response Time by Status Code
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{status="500"}[5m]))
Returns the 95th percentile response time for 500 errors.
🔹 5. Active HTTP Connections
📌 5.1 Current Active Connections
nginx_connections_active
Shows the current number of active connections in Nginx.
📌 5.2 Connections Accepted Per Second
rate(nginx_connections_accepted[5m])
Shows the rate of accepted connections.
📌 5.3 Dropped Connections
rate(nginx_connections_dropped[5m])
Shows the rate of dropped connections.
🔹 6. HTTP Uptime & Availability
📌 6.1 Check Which Instances Are Down
up == 0
Lists all instances that are down.
📌 6.2 Percentage of Healthy Instances
(sum(up) / count(up)) * 100
Shows the percentage of healthy instances.
🔹 7. HTTP Traffic & Data Transfer
📌 7.1 Total Data Sent
sum(rate(http_response_size_bytes_sum[5m]))
Shows total bytes sent by the server.
📌 7.2 Average Response Size
rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])
Calculates the average response size.
🔹 8. HTTP Queries with Advanced Filtering
📌 8.1 Filter Requests from a Specific Region
http_requests_total{region="us-west"}
Filters requests from the US-West region.
📌 8.2 Requests for a Specific API Endpoint
rate(http_requests_total{path="/api/v1/users"}[5m])
Filters only requests to /api/v1/users.
📌 8.3 Requests to a Specific Server
rate(http_requests_total{instance="server1:9090"}[5m])
Filters requests to server1.
🔹 9. HTTP Load Balancer Queries
📌 9.1 Requests per Instance
sum(rate(http_requests_total[5m])) by (instance)
Shows the number of requests per instance.
📌 9.2 Load Distribution Across Servers
sum(rate(http_requests_total[5m])) by (instance) / scalar(sum(rate(http_requests_total[5m]))) * 100
Shows the percentage of total load handled by each server.
🔹 10. Slowest API Endpoints
📌 10.1 Find Slowest Endpoints (90th Percentile)
histogram_quantile(0.90, sum(rate(http_request_duration_seconds_bucket[5m])) by (path))
Lists the slowest API endpoints by response time.
🔹 Summary
🚀 With these PromQL queries, you can:
✅ Monitor total requests, error rates, and latency.
✅ Filter by method, status code, region, and instance.
✅ Analyze load distribution and traffic trends.
✅ Track uptime, dropped connections, and slow endpoints.
💡 Use these in Grafana dashboards and alerts to monitor web performance effectively!
Popular Metrics
Get free disk in GB: node_filesystem_free_bytes{fstype="ext4"} / 1024 / 1024 / 1024
Get free memory in MB: node_memory_MemFree_bytes / 1024 / 1024
System information: node_uname_info
The same PromQL queries work in Grafana if Prometheus is configured as a data source.
CPU consumption in the last hour:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1h])) * 100)
When monitoring infrastructure with Grafana and Prometheus, these are the most commonly used metrics for system health, performance, and resource utilization:
🔹 CPU Metrics
1. CPU Usage (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[$__interval])) * 100)Shows CPU utilization per instance.
Use mode="user" and mode="system" separately to see user/system CPU usage.
2. CPU Load Average
node_load1
node_load1, node_load5, and node_load15 show system load over 1, 5, and 15 minutes.
🔹 Memory Metrics
3. Memory Usage (%)
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
4. Memory Used (Bytes)
Memory used in MB
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024 / 1024
Divide by 1024 for KB, again for MB, and again for GB.
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
Displays total used memory.
🔹 Disk Metrics
5. Disk Space Usage (%)
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)
Shows the percentage of disk space used for / (the root partition).
6. Disk Read/Write Speed
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
Monitors disk read and write speed.
🔹 Network Metrics
7. Network Traffic (In & Out)
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])
Tracks incoming (receive) and outgoing (transmit) network traffic.
8. Network Errors
rate(node_network_receive_errs_total[5m])
rate(node_network_transmit_errs_total[5m])
Helps identify network issues.
🔹 System & Process Metrics
9. Uptime (Seconds)
node_time_seconds - node_boot_time_seconds
Displays system uptime.
10. Running Processes
node_procs_running
Number of active processes.
💡 Recommended Dashboards in Grafana
System Overview Dashboard → CPU, Memory, Disk, Network
Node Exporter Full Dashboard → Prebuilt dashboards for detailed metrics
Alerting Dashboard → Set alerts for high CPU, memory, or disk usage
🔥 These metrics help in troubleshooting, capacity planning, and alerting.
Memory
Metrics: Memory
Visualization: Stat
Total: node_memory_MemTotal_bytes
Available: node_memory_MemAvailable_bytes
Used: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
To convert bytes, divide by 1024 for KB, again for MB, and again for GB.
Eg. node_memory_MemTotal_bytes / 1024 / 1024 / 1024
Used in %: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))
Total in %: total memory serves as the 100% baseline.
Free in %: 100 * (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
SWAP Memory
Total: node_memory_SwapTotal_bytes
Free: node_memory_SwapFree_bytes
Used: (Total - Free)
(node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes )
Percentage for total Used = ((Total - Free ) / Total) * 100
Percentage of Used Swap = ((node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes) * 100
Advanced Operations
You can perform advanced operations in Prometheus using arithmetic, rate functions, and aggregations. Let's go over some disk, memory, and percentage-based queries for better monitoring.
1. Disk Operations
a. Free Disk Space Calculation
To get free disk space, subtract used space from total disk space:
node_filesystem_size_bytes: Total disk size.
node_filesystem_avail_bytes: Available disk space (node_exporter exposes available/free space rather than a "used" metric, so used = size - available).
Filters by mountpoint "/" (root partition).
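A sketch of both calculations, using the standard node_exporter filesystem metrics:
# Free (available) disk space on the root partition, in bytes
node_filesystem_avail_bytes{mountpoint="/"}

# Used disk space = total size minus available space
node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}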
b. Disk Usage Percentage
To get the percentage of disk used:
This gives the percentage of disk space used on the / (root) partition.
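One way to express this (the same query shown in the Disk Metrics section above):
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)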
c. Disk Read & Write Speed
Monitor disk read/write speed over time:
Calculates disk read speed over the last 5 minutes.
Calculates disk write speed over the last 5 minutes.
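Using the node_exporter disk counters:
# Disk read throughput (bytes/sec) over the last 5 minutes
rate(node_disk_read_bytes_total[5m])

# Disk write throughput (bytes/sec) over the last 5 minutes
rate(node_disk_written_bytes_total[5m])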
2. Memory Operations
a. Free Memory Calculation
To calculate free memory:
node_memory_MemTotal_bytes: Total system memory.
node_memory_MemAvailable_bytes: Memory available for new workloads (used = total - available).
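A minimal sketch with the standard node_exporter memory metrics:
# Memory available (a good proxy for free memory)
node_memory_MemAvailable_bytes

# Used memory = total minus available
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes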
b. Memory Usage Percentage
To get RAM usage percentage:
Subtracts free, cached, and buffered memory from total memory.
Gives the percentage of RAM used.
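Following that description (subtract free, buffered, and cached memory from the total):
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes)
/ node_memory_MemTotal_bytes * 100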
c. Memory Usage per Pod
For Kubernetes memory usage per pod:
Converts to MB.
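A sketch using cAdvisor's container memory metric (metric and pod label names assume a standard kubelet/cAdvisor setup; the namespace is illustrative):
# Memory usage per pod, converted from bytes to MB
sum(container_memory_usage_bytes{namespace="default"}) by (pod) / 1024 / 1024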
3. CPU Operations
a. CPU Usage Percentage
To calculate CPU usage per core:
Adds user and system CPU time over the last 5 minutes.
Multiplies by 100 for percentage.
For total CPU usage across all cores:
Subtracts idle CPU time from 100%.
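Sketches of both variants, based on node_cpu_seconds_total:
# Per-core usage: user + system CPU time as a percentage, per CPU
(rate(node_cpu_seconds_total{mode="user"}[5m]) + rate(node_cpu_seconds_total{mode="system"}[5m])) * 100

# Total usage across all cores: 100% minus the average idle percentage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)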
4. Network Operations
a. Network Bandwidth Usage
Measure incoming and outgoing network traffic:
Incoming traffic rate.
Outgoing traffic rate.
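Using the node_exporter network counters:
# Incoming traffic rate (bytes/sec)
rate(node_network_receive_bytes_total[5m])

# Outgoing traffic rate (bytes/sec)
rate(node_network_transmit_bytes_total[5m])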
b. Network Usage Percentage
Calculate percentage of network usage (assuming 1Gbps link = 125MBps):
Gives the percentage of bandwidth used.
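A sketch based on that assumption (1 Gbps ≈ 125,000,000 bytes/sec):
# Received traffic as a percentage of a 1 Gbps link
rate(node_network_receive_bytes_total[5m]) / 125000000 * 100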
5. Kubernetes-Specific Metrics
a. Pod Restart Rate
Shows number of pod restarts in the last 1 hour.
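A sketch assuming kube-state-metrics is installed (it exposes kube_pod_container_status_restarts_total):
# Pod restarts in the last hour
increase(kube_pod_container_status_restarts_total[1h])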
b. Node CPU Usage Across Cluster
CPU usage per Kubernetes node.
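One way to sketch this with node_exporter metrics, grouping by instance (one node per instance):
# CPU usage percentage per node (instance)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)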
c. Running vs. Pending Pods
Percentage of running pods in the cluster.
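A sketch assuming kube-state-metrics (it exposes kube_pod_status_phase):
# Percentage of pods currently in the Running phase
sum(kube_pod_status_phase{phase="Running"}) / sum(kube_pod_status_phase) * 100

# Pods stuck in Pending
sum(kube_pod_status_phase{phase="Pending"})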
6. Combining Metrics
a. Percentage of CPU Used by Specific Pod
Filters CPU usage by nginx-pod. Divides by total CPU cores for the percentage.
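A sketch combining cAdvisor's per-pod CPU counter with node_exporter's core count (the pod name nginx-pod is illustrative):
# CPU used by nginx-pod as a percentage of all available cores
sum(rate(container_cpu_usage_seconds_total{pod="nginx-pod"}[5m]))
/ count(node_cpu_seconds_total{mode="idle"}) * 100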
Final Thoughts
Use rate() for speed/throughput calculations (network, disk, CPU).
Use arithmetic operations (+, -, *, /) for advanced calculations.
Use sum by(), avg(), count(), and increase() for aggregations.
Cadvisor Labels
cAdvisor (Container Advisor) is a tool that collects, aggregates, and exports container resource usage and performance metrics. When Prometheus scrapes metrics from cAdvisor, it automatically includes several default labels (filters) that you can use to filter and query container-specific data.
Default Labels (Filters) Exposed by cAdvisor
cAdvisor provides detailed container metrics with labels that help filter data based on container name, pod, namespace, and resource types.
1. Common Labels (Filters) in cAdvisor Metrics
id – The cgroup path, usually representing the container ID.
name – The short name of the container (e.g., /nginx).
container_label_io_kubernetes_pod_name – The pod name associated with the container.
container_label_io_kubernetes_pod_namespace – The Kubernetes namespace where the pod is running.
container_label_io_kubernetes_container_name – The actual name of the container inside a Kubernetes pod.
container_label_io_kubernetes_node_name – The Kubernetes node hosting the container.
container_label_io_kubernetes_pod_uid – The unique pod UID assigned by Kubernetes.
image – The container image name (e.g., nginx:latest).
pod – The Kubernetes pod name (alternative to container_label_io_kubernetes_pod_name).
namespace – The Kubernetes namespace (alternative to container_label_io_kubernetes_pod_namespace).
container – The actual container name (alternative to container_label_io_kubernetes_container_name).
cpu – The CPU core number being monitored.
device – Disk or network device name (e.g., eth0, sda).
2. Example Metrics and Queries
These labels are useful when querying container metrics in Prometheus.
a. CPU Usage per Container
Filters by container name (nginx). The metric shows total CPU usage in seconds.
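For example (the container name nginx is illustrative):
container_cpu_usage_seconds_total{name="nginx"}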
b. Memory Usage per Pod
Filters by pod name (nginx-deployment) in the default namespace. The metric shows memory usage in bytes.
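For example (pod and namespace values are illustrative):
container_memory_usage_bytes{pod="nginx-deployment", namespace="default"}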
c. Disk I/O per Container
Filters by container nginx and disk sda. The metric shows total disk reads.
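A sketch; the device value depends on your environment (it may appear as /dev/sda rather than sda):
container_fs_reads_total{container="nginx", device="/dev/sda"}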
d. Network Usage per Pod
Filters by pod name (frontend) in the namespace production. The metric shows total transmitted network bytes.
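For example (pod and namespace values are illustrative):
container_network_transmit_bytes_total{pod="frontend", namespace="production"}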
e. CPU Usage per Kubernetes Node
Shows total CPU usage on worker-node-1. Filters out non-containerized processes (container!="").
3. Summary of Key Labels in cAdvisor
id – Container ID or cgroup path
name – Short container name
container – Actual container name inside the pod
pod – Pod name in Kubernetes
namespace – Namespace in Kubernetes
container_label_io_kubernetes_pod_name – Pod name (long form)
container_label_io_kubernetes_pod_namespace – Namespace (long form)
container_label_io_kubernetes_node_name – Kubernetes node name
image – Container image
cpu – CPU core ID
device – Disk or network device
These labels allow you to filter cAdvisor metrics effectively when monitoring Kubernetes or Docker containers.
4. How Prometheus Scrapes cAdvisor Metrics
To collect these metrics, Prometheus must scrape cAdvisor. Example Prometheus scrape config for cAdvisor:
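A minimal sketch of such a scrape job (the target address is an example):
scrape_configs:
  - job_name: "cadvisor"
    static_configs:
      - targets: ["localhost:8080"]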
Job name = cadvisor
Instance = localhost:8080 (where cAdvisor is running)
Once configured, Prometheus will store all container metrics with the labels above, enabling filtering by pod, namespace, container, and more.
5. Example Query for Multi-Filter
To get CPU usage only for containers in the backend namespace running on worker-node-2:
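A sketch using the namespace and node-name labels described above (values are illustrative):
sum(rate(container_cpu_usage_seconds_total{namespace="backend", container_label_io_kubernetes_node_name="worker-node-2", container!=""}[5m]))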
Final Thoughts
cAdvisor automatically exposes several useful labels related to containers, pods, namespaces, and nodes. These labels allow you to filter metrics efficiently in Prometheus queries, helping with Kubernetes monitoring, troubleshooting, and optimization.
Node Exporter label
This section covers the default filters (labels) that Node Exporter automatically exposes, as well as those you can configure when setting up Prometheus to scrape metrics from it. These filters are typically set when you define your Prometheus scrape configuration, such as in the prometheus.yml file, and they relate to how Prometheus labels and organizes the scraped metrics.
Default Filters Exposed by Node Exporter
When Prometheus scrapes metrics from the Node Exporter, it adds several default labels to the metrics automatically. These labels can be used for filtering in your Prometheus queries. The most common default filters (labels) are:
job: This label represents the job name defined in your Prometheus scrape configuration (typically in the prometheus.yml file).
Example: If your Prometheus scrape config is scraping metrics from Node Exporter, it might look like the snippet below.
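A minimal sketch of such a scrape job (the target address is illustrative):
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]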
In this case, job="node" will be added automatically to all metrics scraped from Node Exporter.
instance: This label represents the instance of the target being scraped, which is typically the hostname or IP address of the machine exposing the metrics. It is automatically added by Prometheus based on the target's address.
Example: For a target localhost:9100, the instance label will be instance="localhost:9100".
__name__: This is a built-in label used to represent the metric name (e.g., node_cpu_seconds_total, node_memory_MemFree_bytes, etc.). While it isn't explicitly defined in prometheus.yml, it is a fundamental label for querying metrics.
Example of a Prometheus Configuration (prometheus.yml)
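A sketch of such a configuration, scraping two Node Exporter targets (addresses are illustrative):
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100", "192.168.1.100:9100"]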
With the above configuration, the following labels will be applied:
job="node"
instance="localhost:9100"
instance="192.168.1.100:9100"
Additional Labels and Filters
In addition to the default labels like job and instance, Node Exporter exposes other labels for filtering based on the type of metric being collected.
1. CPU Metrics (e.g., node_cpu_seconds_total)
mode: The CPU mode (e.g., user, system, idle).
Example: node_cpu_seconds_total{mode="user"}
2. Network Metrics (e.g., node_network_receive_bytes_total)
device: The network interface (e.g., eth0, lo).
Example: node_network_receive_bytes_total{device="eth0"}
3. Filesystem Metrics (e.g., node_filesystem_free_bytes)
mountpoint: The mount path (e.g., /, /mnt/data).
Example: node_filesystem_free_bytes{mountpoint="/"}
fstype: The filesystem type (e.g., ext4, xfs).
Example: node_filesystem_free_bytes{fstype="ext4"}
4. Disk Metrics (e.g., node_disk_read_bytes_total)
device: The disk device (e.g., sda, sdb).
Example: node_disk_read_bytes_total{device="sda"}
5. Memory Metrics (e.g., node_memory_MemTotal_bytes)
Memory state is encoded in the metric name rather than a label (node_exporter memory metrics have no state label).
Example: node_memory_Active_bytes, node_memory_MemFree_bytes
6. Uptime Metrics (e.g., node_boot_time_seconds)
instance: The instance label, which is often used in combination with other labels to get more granular data.
Filtering by Labels
You can use these labels to filter or group metrics in your Prometheus queries. Here are a few examples of how you'd query the metrics exposed by Node Exporter using those labels:
By job and instance (e.g., for CPU usage):
By network interface:
By filesystem and mountpoint:
By memory usage:
By CPU and mode:
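Illustrative queries for each of the filters above (label values are examples):
# By job and instance (CPU usage)
rate(node_cpu_seconds_total{job="node", instance="localhost:9100"}[5m])

# By network interface
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# By filesystem and mountpoint
node_filesystem_free_bytes{fstype="ext4", mountpoint="/"}

# By memory usage
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100

# By CPU mode
rate(node_cpu_seconds_total{mode="idle"}[5m])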
Summary of Common Labels from Node Exporter Metrics
job: Name of the scraping job (e.g., node).
instance: Instance of the target (typically the hostname or IP address of the target).
mode: Mode of CPU usage (e.g., user, system, idle).
device: Disk or network interface (e.g., sda, eth0).
mountpoint: The filesystem mount path (e.g., /).
fstype: The type of filesystem (e.g., ext4).
state: Used by some collectors; memory states appear in metric names such as node_memory_Active_bytes.
sensor: For hardware sensors, e.g., CPU temperature readings.
These labels are added automatically by Prometheus when scraping metrics from Node Exporter, and you can use them to filter and group metrics in your Prometheus queries for detailed analysis.
Exporter
Prometheus officially maintains a set of exporters that are considered reliable and actively supported by the Prometheus team. These are listed on the official Prometheus GitHub and website.
🔹 Officially Maintained Prometheus Exporters
These exporters are directly managed by the Prometheus team and are considered stable.
🔹 Prometheus Community Exporters
While not directly managed by Prometheus, these exporters are part of the Prometheus Community and are well-maintained.
🔹 Where to Find Officially Managed Exporters?
🔗 Prometheus Official Exporters: https://prometheus.io/docs/instrumenting/exporters/
Prometheus supports a wide range of exporters to collect metrics from different systems. Below is a list of commonly used Prometheus exporters categorized by their use case:
1. System & Infrastructure Monitoring
2. Cloud & Virtualization Monitoring
3. Database Monitoring
4. Messaging & Streaming Services
5. Web & API Monitoring
6. Storage & Backup Monitoring
7. Network & Security Monitoring
8. Kubernetes Monitoring
9. Custom Exporters
You can write custom exporters in Go, Python, Node.js, etc. if an official one is unavailable. The official Prometheus client libraries (client_golang for Go, prometheus_client for Python, prom-client for Node.js) are the usual starting point.
How to Find More Exporters?
You can find additional official and community exporters here: 🔗 Prometheus Exporters List: https://prometheus.io/docs/instrumenting/exporters/