Prometheus + Alertmanager: Sending Alerts to Telegram
Setting Up Prometheus
1. Install Prometheus
mkdir -p /apps && cd /apps
wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz
tar xf prometheus-2.35.0.linux-amd64.tar.gz
# Create a symlink so the service can use a version-independent path
ln -sv prometheus-2.35.0.linux-amd64 /apps/prometheus
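The Prometheus and Alertmanager release archives share one URL pattern, so switching versions during an upgrade can be scripted. The sketch below assumes that the <name>-<version>.linux-<arch>.tar.gz naming used in this guide also holds for the release you pick. release_url is a hypothetical helper and is not part of any Prometheus tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the GitHub download URL for a Prometheus-project release.
# Assumes the naming pattern used in this guide: <name>-<version>.linux-<arch>.tar.gz
release_url() {
  local name=$1 version=$2 arch=${3:-amd64}
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-%s.tar.gz\n' \
    "$name" "$version" "$name" "$version" "$arch"
}

# Example (not run here):
#   cd /apps && wget "$(release_url prometheus 2.35.0)"
#   cd /apps && wget "$(release_url alertmanager 0.24.0)"
```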
2. Manage Prometheus with systemd
vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/apps/prometheus/prometheus \
--config.file=/apps/prometheus/prometheus.yml \
--storage.tsdb.path=/apps/prometheus/data \
--web.console.templates=/apps/prometheus/consoles \
--web.console.libraries=/apps/prometheus/console_libraries \
--web.listen-address=:9090 \
--storage.tsdb.retention.time=15d \
--web.enable-lifecycle \
--web.enable-admin-api
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
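3. Create the service user and set permissions
# Create a system user with no home directory and no login shell
useradd --no-create-home --shell /bin/false prometheus
# Give that user ownership of the install directory
chown -R prometheus:prometheus /apps/prometheus/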
4. Start the service
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
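systemctl start can return before Prometheus is actually listening. The sketch below waits for the port to open before you move on. It assumes bash was built with /dev/tcp support, and wait_for_port is a hypothetical helper. Once the port is open, the /-/ready endpoint that Prometheus exposes confirms the server is ready to serve requests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections.
# Relies on bash's built-in /dev/tcp redirection, so no extra tools are needed.
# Returns 0 as soon as the port is open, 1 after `tries` failed attempts.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 1; i <= tries; i++)); do
    # Opening fd 3 succeeds only if the TCP connect succeeds;
    # the subshell closes the descriptor again when it exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < tries)); then
      sleep 1
    fi
  done
  return 1
}

# Example (not run here):
#   wait_for_port 127.0.0.1 9090 && curl -s http://localhost:9090/-/ready
```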
Setting Up Alertmanager
1. Install Alertmanager
cd /apps
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
ln -sv alertmanager-0.24.0.linux-amd64 /apps/alertmanager
2. Manage Alertmanager with systemd
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/apps/alertmanager/alertmanager \
--config.file=/apps/alertmanager/alertmanager.yml \
--storage.path=/apps/alertmanager/data \
--web.listen-address=:9093 \
--web.external-url=http://localhost:9093
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
3. Configure Alertmanager
vim /apps/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'your-email@163.com'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'telegram-notifications'
receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - api_url: 'https://api.telegram.org'
        bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID  # must be an integer; group chat IDs are negative
        message: |
          {{ range .Alerts -}}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels:
          {{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value }}
          {{ end }}
          {{ end }}
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
4. Set permissions and start the service
chown -R prometheus:prometheus /apps/alertmanager/
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager
systemctl status alertmanager
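You can check Telegram delivery without waiting for a real rule to fire. One way is to push a hand-made alert into Alertmanager through its v2 API (POST /api/v2/alerts), which accepts a JSON list of alerts with labels and annotations. The sketch below builds that request body. am_test_alert is a hypothetical helper, and the label values are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON body for Alertmanager's POST /api/v2/alerts.
# Arguments are inserted verbatim, so keep them free of quotes and backslashes.
am_test_alert() {
  local name=${1:-TelegramTest} severity=${2:-warning} instance=${3:-localhost}
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' \
    "$name" "$severity" "$instance"
  printf '"annotations":{"summary":"Test alert %s","description":"Manually injected to test Telegram delivery"}}]\n' \
    "$name"
}

# Example (not run here): inject the alert directly into Alertmanager.
#   am_test_alert TelegramTest warning | curl -s -X POST \
#     -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/alerts
```

The body sets no endsAt, so Alertmanager treats the alert as resolved once resolve_timeout expires (5m by default). With group_wait: 10s, the Telegram message should arrive shortly after the POST.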
Configuring Prometheus Alert Rules
1. Create the alert rule file
mkdir -p /apps/prometheus/rules
vim /apps/prometheus/rules/alert-rules.yml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
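The HighMemoryUsage expression computes (MemTotal - MemAvailable) / MemTotal * 100. node_exporter takes both values from /proc/meminfo, so you can cross-check a firing alert directly on the host. The sketch below uses awk. mem_used_pct is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the HighMemoryUsage formula to a meminfo file:
#   (MemTotal - MemAvailable) / MemTotal * 100
# /proc/meminfo reports both fields in kB, so the units cancel out.
# Exits non-zero if MemTotal is missing.
mem_used_pct() {
  local file=${1:-/proc/meminfo}
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END { if (t > 0) printf "%.1f\n", (t - a) / t * 100; else exit 1 }' "$file"
}

# Example (not run here): compare with the rule's 80% threshold.
#   pct=$(mem_used_pct) && awk -v p="$pct" 'BEGIN { exit !(p > 80) }' && echo "above 80%"
```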
2. Update the Prometheus configuration
vim /apps/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - "rules/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    # Assumes node_exporter is running on this host (default port 9100);
    # the CPU and memory rules depend on its metrics
    static_configs:
      - targets: ['localhost:9100']
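Prometheus runs with --web.enable-lifecycle, so configuration changes can be applied with POST /-/reload instead of a full restart. The sketch below validates the configuration first with promtool check config, which also checks the files listed under rule_files. It reloads only if that check passes. reload_prometheus and the PROMTOOL / PROM_URL overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate prometheus.yml (and its rule files), then reload.
# The /-/reload endpoint works because the unit file passes --web.enable-lifecycle.
# PROMTOOL and PROM_URL are illustrative overrides, not Prometheus settings.
reload_prometheus() {
  local cfg=${1:-/apps/prometheus/prometheus.yml}
  local promtool=${PROMTOOL:-/apps/prometheus/promtool}
  local url=${PROM_URL:-http://localhost:9090}
  if ! "$promtool" check config "$cfg"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # -f makes curl fail on HTTP errors, so a rejected reload is reported.
  curl -fsS -X POST "$url/-/reload"
}

# Example (not run here):
#   reload_prometheus /apps/prometheus/prometheus.yml
```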
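Creating the Telegram Bot
1. Create the bot
- Search for @BotFather in Telegram
- Send the /newbot command and follow the prompts to set the bot's name and username
- Copy the bot token that BotFather returns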
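2. Get the chat ID
- Add the bot to a group, or open a private chat with it
- Send the bot a message
- Open https://api.telegram.org/bot<TOKEN>/getUpdates
- Find the chat id in the returned JSON (under message.chat.id)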
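When getUpdates returns several messages, picking the chat id out by eye is error-prone. The grep/sed sketch below assumes Telegram's compact JSON formatting. extract_chat_ids is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct chat IDs in a getUpdates response on stdin.
# Assumes compact JSON ("chat":{"id":... with no spaces). Group and channel IDs
# are negative, so an optional leading minus sign is kept.
extract_chat_ids() {
  grep -o '"chat":{"id":-\{0,1\}[0-9]*' | sed 's/.*://' | sort -u
}

# Example (not run here; replace <TOKEN> with your bot token):
#   curl -s "https://api.telegram.org/bot<TOKEN>/getUpdates" | extract_chat_ids
```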
3. Test the bot
# Send a test message
curl -X POST "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/sendMessage" \
-H "Content-Type: application/json" \
-d '{"chat_id": "<YOUR_CHAT_ID>", "text": "Test message from Prometheus"}'
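The -d string in the test above is hand-written JSON, so a message containing double quotes or line breaks would break it. The sketch below builds the payload and escapes those characters first. tg_payload is a hypothetical helper, and it handles only the characters named in its comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the sendMessage JSON body with the text escaped.
# Handles backslashes, double quotes, newlines and tabs only.
tg_payload() {
  local chat_id=$1 text=$2
  text=${text//\\/\\\\}      # backslash -> \\ (must run first)
  text=${text//\"/\\\"}      # double quote -> \"
  text=${text//$'\n'/\\n}    # newline -> \n
  text=${text//$'\t'/\\t}    # tab -> \t
  printf '{"chat_id": "%s", "text": "%s"}\n' "$chat_id" "$text"
}

# Example (not run here):
#   tg_payload "<YOUR_CHAT_ID>" $'Line 1\nLine "2"' | curl -s -X POST \
#     -H "Content-Type: application/json" -d @- \
#     "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/sendMessage"
```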
Verifying the Setup
1. Restart the services
systemctl restart prometheus
systemctl restart alertmanager
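2. Check the loaded configuration
# Prometheus: show the configuration it has loaded
curl http://localhost:9090/api/v1/status/config
# Alertmanager: the v2 status endpoint includes the loaded configuration
curl http://localhost:9093/api/v2/status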
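3. Trigger a test alert
# Stop node_exporter (assumed to run as a systemd unit scraped at localhost:9100) to trigger InstanceDown
systemctl stop node_exporter
# Check alert state: InstanceDown stays pending for 1 minute (for: 1m) before it fires and is sent to Alertmanager
curl http://localhost:9090/api/v1/alerts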
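The /api/v1/alerts response lists each active alert with a state of "pending" or "firing". The sketch below counts alerts by state and assumes Prometheus' compact JSON output. count_alerts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count alerts in a given state ("pending" or "firing")
# in a Prometheus /api/v1/alerts response read from stdin. Prints 0 if none match.
count_alerts() {
  local state=${1:-firing}
  grep -o "\"state\":\"$state\"" | grep -c . || true
}

# Example (not run here):
#   curl -s http://localhost:9090/api/v1/alerts | count_alerts firing
```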
Advanced Configuration
1. Custom message template
receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - api_url: 'https://api.telegram.org'
        bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID
        parse_mode: 'HTML'
        # StartsAt is a per-alert field, so it is read inside range .Alerts
        message: |
          <b>🚨 {{ .Status | toUpper }}</b>
          <b>Alert:</b> {{ .GroupLabels.alertname }}
          <b>Severity:</b> {{ .CommonLabels.severity }}
          <b>Instance:</b> {{ .CommonLabels.instance }}
          <b>Summary:</b> {{ .CommonAnnotations.summary }}
          <b>Description:</b> {{ .CommonAnnotations.description }}
          {{ range .Alerts }}<b>Started:</b> {{ .StartsAt.Format "2006-01-02 15:04:05" }}
          {{ end }}
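2. Inhibition rules
While a critical alert is firing, warning alerts with the same alertname and instance are suppressed.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']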
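3. Routing rules
Critical and warning alerts go to different receivers with different repeat intervals. Every receiver named here (default-receiver, critical-receiver, warning-receiver) must also be defined under receivers; otherwise Alertmanager rejects the configuration.
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: critical
      receiver: 'critical-receiver'
      repeat_interval: 5m
    - match:
        severity: warning
      receiver: 'warning-receiver'
      repeat_interval: 30m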
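Troubleshooting
1. Common problems
- Wrong bot token: make sure the token copied from BotFather is complete and correct
- Wrong chat ID: get the correct chat id from the getUpdates API (group chat IDs are negative)
- Network problems: check that the server can reach the Telegram API at api.telegram.org
- Permission problems: make sure the prometheus user owns the files and directories the services use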
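For the first and third problems, you can check locally that the token has the right shape and then call getMe. That separates a bad token (getMe answers 401 Unauthorized) from a network problem (curl cannot connect at all). The sketch below assumes the usual <bot id>:<secret> token shape. Telegram does not formally document the secret's length, so only the shape is checked. looks_like_bot_token is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: catch copy/paste mistakes (spaces, missing parts) in a token.
# Checks only the "<digits>:<letters, digits, _ or ->" shape, not validity.
looks_like_bot_token() {
  [[ $1 =~ ^[0-9]+:[A-Za-z0-9_-]+$ ]]
}

# Example (not run here): getMe returns the bot's details when the token is valid.
#   looks_like_bot_token "$TOKEN" && curl -s "https://api.telegram.org/bot$TOKEN/getMe"
```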
2. Viewing logs
# Follow the Prometheus logs
journalctl -u prometheus -f
# Follow the Alertmanager logs (Telegram delivery errors appear here)
journalctl -u alertmanager -f
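3. Debugging commands
# Evaluate an alert expression against the running Prometheus
/apps/prometheus/promtool query instant http://localhost:9090 'up == 0'
# Check the alert rule file syntax
/apps/prometheus/promtool check rules /apps/prometheus/rules/alert-rules.yml
# Validate the Alertmanager configuration
/apps/alertmanager/amtool check-config /apps/alertmanager/alertmanager.yml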
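Summary
With the configuration above, you have a working Prometheus + Alertmanager + Telegram alerting system that:
- monitors system metrics
- fires alerts based on predefined rules
- pushes alert notifications to Telegram
- provides flexible alert routing and inhibition
Review and update the alert rules regularly so the alerting stays accurate and useful.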