Monitoring systems are critical in modern IT environments to ensure reliability and uptime. Prometheus and Alertmanager are open-source tools that simplify monitoring, metric collection, and alerting.
This guide will cover everything you need to start using Prometheus and Alertmanager, even if you are a complete beginner.
What is Prometheus?
Prometheus is a robust monitoring tool designed for cloud-native environments. It collects metrics from configured targets at given intervals, evaluates rule expressions, and triggers alerts when thresholds are breached.
Key Features:
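- Multi-dimensional data model: each time series is identified by a metric name plus a set of key-value pairs called labels.
- Powerful query language: PromQL lets you select, aggregate, and compute over time series.
- Efficient storage: a built-in local time-series database.
- Visualization: integrates well with Grafana.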
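To make the data model concrete: in Prometheus's text exposition format, each sample line carries a metric name, its labels, and a value, and the name together with the full label set identifies one time series. The sketch below parses such lines with the standard library. It is a simplified illustration that assumes label values contain no escaped quotes, commas, or braces and that lines carry no trailing timestamp; the metric name http_requests_total is just an example.

```python
import re

# Matches: name{label="value",...} value  (the label block is optional).
# Simplified: assumes no escaped quotes/commas/braces and no trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (series_key, value) for one exposition-format sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    labels = dict(LABEL_RE.findall(label_text or ""))
    # A series is identified by its name plus the (order-independent) label pairs.
    series_key = (name, tuple(sorted(labels.items())))
    return series_key, float(value)


lines = [
    'http_requests_total{method="GET",status="200"} 1027',
    'http_requests_total{method="POST",status="200"} 3',
    'http_requests_total{status="200",method="GET"} 1027',
]
series = {parse_sample(line)[0] for line in lines}
print(len(series))  # -> 2 (the first and third lines name the same series)
```

Changing any single label value, for example method="POST", produces a different series, which is why label choices matter so much for storage.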
What is Alertmanager?
Alertmanager receives the alerts that Prometheus fires, deduplicates and groups them, and routes them to receivers such as email, Slack, or PagerDuty.
Key Features:
- Alert grouping.
- Silencing alerts.
- Integration with multiple receivers.
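Grouping and deduplication both work on alert labels. The sketch below is a simplified model of the idea, not Alertmanager's implementation: alerts with identical label sets collapse into one, and the remaining alerts are bucketed by the values of the group_by labels (group_by is a real route option in alertmanager.yml; the sample alerts are made up).

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by label set, then group them by the group_by labels."""
    # Deduplicate: alerts with the same full label set are the same alert.
    unique = {tuple(sorted(a.items())): a for a in alerts}
    groups = defaultdict(list)
    for labels in unique.values():
        # Alerts sharing the same values for the group_by labels share one notification.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "InstanceDown", "instance": "web-1:9100"},
    {"alertname": "InstanceDown", "instance": "web-2:9100"},
    {"alertname": "InstanceDown", "instance": "web-1:9100"},  # duplicate
    {"alertname": "HighCPUUsage", "instance": "web-1:9100"},
]
for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, len(members))
```

Grouping by alertname means two machines going down at once produce one InstanceDown notification listing both, instead of two separate messages.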
Installing Prometheus
Prerequisites:
- A Linux server with root access.
- Docker or direct installation via binaries.
Steps for Docker Installation:
- Create a prometheus.yml configuration file:
global:
  scrape_interval: 15s # Default scrape interval
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
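- Start Prometheus. This command mounts the current directory (which holds prometheus.yml, and later alert_rules.yml) as /etc/prometheus, uses host networking (Linux only) so that localhost targets such as Alertmanager and Node Exporter are reachable from inside the container, and enables the reload endpoint used later in this guide. Arguments after the image name replace the image's defaults, so the config file and storage path are passed explicitly:
docker run -d --name=prometheus \
  --network=host \
  -v "$(pwd)":/etc/prometheus \
  prom/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --web.enable-lifecycle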
- Access the Prometheus web UI:
- Open http://localhost:9090.
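Besides the web UI, Prometheus serves an HTTP API: GET /api/v1/query evaluates a PromQL expression at a single instant and returns JSON. The sketch below uses only the standard library and assumes the server from the steps above is listening at http://localhost:9090. The helper names are illustrative, and parse_vector only handles the "vector" result type that a query such as up returns.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body):
    """Turn an instant-query JSON response into {sorted-label-tuple: float}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    if payload["data"]["resultType"] != "vector":
        raise ValueError("this sketch only handles vector results")
    results = {}
    for item in payload["data"]["result"]:
        labels = tuple(sorted(item["metric"].items()))
        _timestamp, value = item["value"]  # a [unix_time, "value-as-string"] pair
        results[labels] = float(value)
    return results


def instant_query(promql, base_url="http://localhost:9090", timeout=5):
    """Run a query against a live Prometheus server (needs network access)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read())
```

For example, instant_query("up") should return one entry per scrape target, with 1.0 for targets whose last scrape succeeded and 0.0 for those that failed.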
Installing Alertmanager
- Create an alertmanager.yml configuration file:
global:
  resolve_timeout: 5m
route:
  receiver: 'email-alert'
receivers:
  - name: 'email-alert'
    email_configs:
      - to: 'your-email@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'your-username'
        auth_password: 'your-password'
- Start Alertmanager:
docker run -d --name=alertmanager \
-p 9093:9093 \
-v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager
- Access Alertmanager’s web UI:
- Open http://localhost:9093.
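Both Prometheus and Alertmanager expose /-/healthy and /-/ready endpoints that answer HTTP 200 when the process is up and ready, which is handy for checking both containers from a script. A minimal standard-library sketch, assuming the default ports used in this guide; is_ok and check_stack are hypothetical helper names.

```python
import http.client
import urllib.request


def is_ok(url, timeout=3):
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
        return False


def check_stack(prometheus="http://localhost:9090", alertmanager="http://localhost:9093"):
    """Report readiness of both services as a dict of name -> bool."""
    return {
        "prometheus": is_ok(f"{prometheus}/-/ready"),
        "alertmanager": is_ok(f"{alertmanager}/-/ready"),
    }
```

Calling check_stack() after starting both containers should return True for each service; a False usually means the container is not running or is still starting.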
Configuring Prometheus to Use Alertmanager
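Update prometheus.yml so it points at Alertmanager and loads a rule file. Everything belongs in this one file, so keep the global section from the first step:
global:
  scrape_interval: 15s # Default scrape interval
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files:
  - 'alert_rules.yml' # Relative paths are resolved against this file's directory
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']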
Create an alert_rules.yml file in the same directory as prometheus.yml:
groups:
  - name: example-alert
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "No response from {{ $labels.instance }} for over 1 minute."
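The for: 1m line means the expression must stay true for a full minute before the alert fires; until then the alert is "pending", and it resets if the condition clears. The sketch below models that inactive, pending, and firing logic over a series of rule evaluations. It is a simplified model rather than Prometheus's own code, and the timestamps are made up.

```python
def alert_states(evaluations, hold_seconds):
    """Map (timestamp, condition_true) evaluations to inactive/pending/firing."""
    states = []
    active_since = None  # when the condition first became true in the current run
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # condition cleared: the alert resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        # Fire only once the condition has held for at least hold_seconds.
        states.append("firing" if ts - active_since >= hold_seconds else "pending")
    return states


# Rule evaluated every 15 s; "up == 0" is true from t=15 onward, with for: 1m.
evals = [(0, False), (15, True), (30, True), (45, True), (60, True), (75, True)]
print(alert_states(evals, hold_seconds=60))
# -> ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing']
```

Only firing alerts are sent to Alertmanager, so a short blip that clears while pending never produces a notification.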
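Reload Prometheus so it picks up the changes. The /-/reload endpoint only works when Prometheus was started with the --web.enable-lifecycle flag; otherwise restart the container with docker restart prometheus.
curl -X POST http://localhost:9090/-/reload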
Best Practices for Prometheus
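- Use Node Exporter for OS Metrics:
- Install Node Exporter. Host networking, the host PID namespace, and a read-only mount of / let it report on the host rather than on its own container:
docker run -d --net="host" --pid="host" -v "/:/host:ro,rslave" prom/node-exporter --path.rootfs=/host
- Add a job for it to the existing scrape_configs list in prometheus.yml:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']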
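- Label Your Metrics Wisely:
- Avoid high cardinality: every unique combination of label values is a separate time series, so keep unbounded values such as user IDs or email addresses out of labels.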
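- Set Retention Period:
- Limit how long data is kept to save disk space. This is a Prometheus command-line flag; with Docker, pass it after the image name together with --config.file=/etc/prometheus/prometheus.yml, because arguments after the image name replace the image's default ones:
--storage.tsdb.retention.time=15d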
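- Scale with Prometheus Federation:
- For large environments, use a hierarchical setup in which a global Prometheus server scrapes aggregated series from the /federate endpoint of lower-level Prometheus servers.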
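Cardinality and retention together determine how much disk Prometheus needs. The Prometheus storage documentation gives a rough rule of thumb: needed disk space is about retention time in seconds × ingested samples per second × bytes per sample, with compressed samples averaging 1 to 2 bytes. The sketch below applies that rule using the 15s scrape interval and 15d retention from this guide; the label counts are hypothetical, the series count is a worst case (every combination occurs), and the result is an order-of-magnitude estimate only.

```python
from math import prod


def series_count(label_value_counts):
    """Worst case: every combination of label values becomes its own series."""
    return prod(label_value_counts.values())


def disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2):
    """Rule of thumb: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 3600
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical metric labelled by method (5 values), status (10) and instance (20).
bounded = series_count({"method": 5, "status": 10, "instance": 20})
# Adding a user_id label with 10,000 distinct users multiplies the series count.
unbounded = series_count({"method": 5, "status": 10, "instance": 20, "user_id": 10_000})

print(bounded, unbounded)  # -> 1000 10000000
print(round(disk_bytes(bounded, 15, 15) / 1e6, 1), "MB")  # -> 172.8 MB
print(round(disk_bytes(unbounded, 15, 15) / 1e9, 1), "GB")  # -> 1728.0 GB
```

A single unbounded label turns a metric costing a few hundred megabytes into one needing terabytes, which is why high-cardinality labels are the most common cause of Prometheus resource problems.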
Creating Custom Metrics
Prometheus client libraries let you expose your own application metrics for Prometheus to scrape:
Example (Python):
Install the library:
pip install prometheus_client
Create a simple exporter:
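from prometheus_client import start_http_server, Gauge
import random
import time
# Define a gauge: a metric whose value can go up and down
my_gauge = Gauge('random_number', 'A random number generator')
if __name__ == "__main__":
    # Serve metrics over HTTP on port 8000 (scraped at /metrics)
    start_http_server(8000)
    while True:
        # Set the gauge to a new random value every 5 seconds
        my_gauge.set(random.randint(0, 100))
        time.sleep(5)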
Add a job for it to the existing scrape_configs list in prometheus.yml:
scrape_configs:
  - job_name: 'custom-metrics'
    static_configs:
      - targets: ['localhost:8000']
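By default, start_http_server serves metrics in Prometheus's plain-text exposition format: a # HELP line, a # TYPE line, then one sample per line. To show what Prometheus actually scrapes, here is a hypothetical standard-library-only equivalent of the exporter above. It is meant for understanding the format, not for production; the client library also handles escaping, labels, other metric types, and extra process metrics for you.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name, help_text, value):
    """Render one unlabelled gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {float(value)}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve /metrics; every scrape gets a freshly generated random value."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_gauge(
            "random_number", "A random number generator", random.randint(0, 100)
        ).encode("utf-8")
        self.send_response(200)
        # Content type for version 0.0.4 of the text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

One design difference: this version computes the value when Prometheus scrapes, instead of updating it every 5 seconds in a loop. Run either this or the client-library exporter on port 8000, not both.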
Monitoring a Linux System
Use the Node Exporter to monitor Linux system metrics such as CPU, memory, disk usage, and network.
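- Start Node Exporter with host networking, the host PID namespace, and a read-only mount of / so it reports on the host rather than on its own container:
docker run -d --net="host" --pid="host" -v "/:/host:ro,rslave" prom/node-exporter --path.rootfs=/host
- Add a job for it to the existing scrape_configs list in prometheus.yml:
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']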
- Access metrics in Prometheus:
- Query examples:
node_cpu_seconds_total
node_memory_Active_bytes
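node_cpu_seconds_total is a counter: per CPU and per mode, it only increases (except for resets when the exporter restarts), so its raw value is rarely useful on its own. Queries usually wrap it in rate(), which turns it into a per-second increase, for example rate(node_cpu_seconds_total{mode="idle"}[5m]). The sketch below computes a simplified rate from raw samples, including counter-reset handling; Prometheus's real rate() also extrapolates to the edges of the range window, so its numbers will differ slightly. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase across (timestamp_seconds, counter_value) samples.

    Simplified version of PromQL rate(): handles counter resets but does not
    extrapolate to the window boundaries the way Prometheus does.
    """
    if len(samples) < 2:
        return None  # rate() likewise needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. exporter restart); count from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Idle CPU seconds for one core, scraped every 15 s.
idle = [(0, 1000.0), (15, 1012.0), (30, 1024.0), (45, 1036.0), (60, 1048.0)]
idle_rate = simple_rate(idle)  # 48 s idle over 60 s = 0.8
print(f"busy: {100 - idle_rate * 100:.0f}%")  # -> busy: 20%
```

Subtracting the idle fraction from 100% is the same arithmetic the HighCPUUsage rule below applies per core before averaging by instance.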
Sending Alerts with Alertmanager
- Add a CPU alert to alert_rules.yml. The file has a single groups: key, so list the new group alongside the existing example-alert group:
groups:
  - name: example-alert
    # ... InstanceDown rule from earlier ...
  - name: high_cpu_usage
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 2 minutes."
- Reload Prometheus:
curl -X POST http://localhost:9090/-/reload
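- Make sure Alertmanager has a receiver for these alerts, such as the email receiver in alertmanager.yml or a Slack receiver configured with slack_configs.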
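To check the Alertmanager side without waiting for a real CPU spike, you can post a hand-made alert to Alertmanager's v2 API: POST /api/v2/alerts accepts a JSON list of alerts with labels, annotations, and optional startsAt/endsAt times. A standard-library sketch, assuming Alertmanager at http://localhost:9093; the TestAlert name, labels, and helper names are placeholders.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name="TestAlert", severity="warning", minutes=5):
    """Build a one-alert payload that resolves on its own after `minutes`."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": severity, "instance": "manual-test"},
        "annotations": {"summary": "Manual test alert sent from a script"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url="http://localhost:9093", timeout=5):
    """POST alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

With the single route in alertmanager.yml, send_alerts(build_test_alert()) should produce an email to the configured address once the route's group_wait delay (30 seconds by default) has passed.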
Conclusion
By following this guide, you can set up a powerful monitoring and alerting system with Prometheus and Alertmanager. Start small, experiment with metrics and alerts, and refine as you scale.
Feel free to share your questions or experiences in the comments below!