
Prometheus + Alertmanager: Sending Alerts to Telegram



Setting up Prometheus

1. Install Prometheus

mkdir -p /apps && cd /apps

wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz
tar xf prometheus-2.35.0.linux-amd64.tar.gz

# Create a version-independent symlink
ln -sv prometheus-2.35.0.linux-amd64 /apps/prometheus

2. Manage Prometheus with systemd

vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/apps/prometheus/prometheus \
--config.file=/apps/prometheus/prometheus.yml \
--storage.tsdb.path=/apps/prometheus/data \
--web.console.templates=/apps/prometheus/consoles \
--web.console.libraries=/apps/prometheus/console_libraries \
--web.listen-address=:9090 \
--storage.tsdb.retention.time=15d \
--web.enable-lifecycle \
--web.enable-admin-api
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

3. Create the service user and set permissions

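# Create a system user with no home directory and no login shell
useradd --no-create-home --shell /bin/false prometheus

# Give the prometheus user ownership of the install directory
# (the trailing slash makes chown -R follow the /apps/prometheus symlink)
chown -R prometheus:prometheus /apps/prometheus/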

4. Start the service

systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
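Right after systemctl start, Prometheus may still be loading its TSDB, so a status check can look inconclusive. A minimal sketch that polls the /-/ready endpoint until it answers; wait_ready is a hypothetical helper name, and curl is assumed to be installed.

```shell
# Hypothetical helper: poll an HTTP readiness endpoint until it returns 2xx.
# Usage: wait_ready <url> [attempts]   (one attempt per second, default 30)
wait_ready() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes HTTP errors count as failures, -s keeps curl quiet
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

For example, wait_ready http://localhost:9090/-/ready. Alertmanager serves the same /-/ready path on port 9093, so the helper can be reused after starting it.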

Setting up Alertmanager

1. Install Alertmanager

cd /apps
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
ln -sv alertmanager-0.24.0.linux-amd64 /apps/alertmanager

2. Manage Alertmanager with systemd

vim /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/apps/alertmanager/alertmanager \
--config.file=/apps/alertmanager/alertmanager.yml \
--storage.path=/apps/alertmanager/data \
--web.listen-address=:9093 \
--web.external-url=http://localhost:9093
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

3. Configure Alertmanager

vim /apps/alertmanager/alertmanager.yml

global:
  # Only needed for email receivers; the Telegram receiver below does not use it
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'your-email@163.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'telegram-notifications'

receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - api_url: 'https://api.telegram.org'
        bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID   # numeric ID without quotes; group chat IDs are negative
        message: |
          {{ range .Alerts -}}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels:
          {{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value }}
          {{ end }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

4. Set permissions and start the service

chown -R prometheus:prometheus /apps/alertmanager/
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager
systemctl status alertmanager

Configuring Prometheus alert rules

1. Create the alert rule file

# The rules directory is not part of the release tarball
mkdir -p /apps/prometheus/rules
vim /apps/prometheus/rules/alert-rules.yml

groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
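To see what the HighCPUUsage and HighMemoryUsage expressions compute, here is a worked sketch with awk. irate() takes the per-second rate from the last two samples in the [5m] window; the sketch uses a single CPU, whereas the real rule averages the idle rate over all CPUs of an instance with avg by(instance). The function names and sample values are illustrative assumptions.

```shell
# Per-second rate of a counter between two samples (what irate() does with
# the last two samples in its window). Usage: counter_rate <v1> <v2> <seconds>
counter_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN { printf "%.4f\n", (b - a) / dt }'
}

# HighCPUUsage: 100 - idle_rate * 100
cpu_usage_pct() {
  awk -v r="$1" 'BEGIN { printf "%.1f\n", 100 - r * 100 }'
}

# HighMemoryUsage: (MemTotal - MemAvailable) / MemTotal * 100
mem_usage_pct() {
  awk -v t="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (t - a) / t * 100 }'
}

# Exit status 0 when value > threshold, like the "> 80" comparison
above() {
  awk -v v="$1" -v th="$2" 'BEGIN { exit !(v > th) }'
}
```

With an idle counter moving from 1000.0 to 1002.25 seconds over one 15 s scrape interval, counter_rate gives 0.1500 idle seconds per second, so cpu_usage_pct 0.15 prints 85.0, which is above 80. Likewise 8000000000 bytes total with 1200000000 available gives 85.0% memory usage. Each alert still has to stay above the threshold for its 5m "for" period before it fires.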

2. Update the Prometheus configuration

vim /apps/prometheus/prometheus.yml

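global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "rules/*.yml"   # relative to the directory of prometheus.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Assumes node_exporter is already running on port 9100 (not covered here);
  # the CPU and memory rules depend on its metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']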
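Since the unit file starts Prometheus with --web.enable-lifecycle, a running server can re-read prometheus.yml and the rule files through POST /-/reload instead of a full restart. A minimal sketch that validates first with promtool check config (which also checks the referenced rule files) and reloads only if that passes; validate_and_reload, PROM_HOME and PROM_URL are hypothetical names.

```shell
# Hypothetical override point for the install directory
PROM_HOME=${PROM_HOME:-/apps/prometheus}

validate_and_reload() {
  # Stop here if prometheus.yml or any rules/*.yml file fails validation
  "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml" || return 1
  # Accepted only because the server runs with --web.enable-lifecycle
  curl -fsS -X POST "${PROM_URL:-http://localhost:9090}/-/reload"
}
```

A failed check leaves the running configuration untouched, which is the main reason to validate before reloading.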

Creating the Telegram bot

1. Create the bot

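  1. Search for @BotFather in Telegram
  2. Send the /newbot command
  3. Follow the prompts to set the bot's name and username (the username must end in "bot")
  4. Copy the bot token that BotFather returns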

2. Get the chat ID

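  1. Add the bot to a group, or start a private chat with it
  2. Send the bot a message
  3. Open https://api.telegram.org/bot<TOKEN>/getUpdates
  4. Find the chat id in the returned JSON (group chat IDs are negative)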
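Picking the ID out of the getUpdates JSON by eye is error-prone when several chats appear. A minimal sketch that lists the unique chat IDs from a response on stdin, assuming Telegram's compact JSON layout ("chat":{"id":...}); chat_ids is a hypothetical name, and if jq is installed, jq '.result[].message.chat.id' is the more robust choice.

```shell
# Print unique chat IDs (negative for groups) from a getUpdates response on stdin.
# Assumes compact JSON with no spaces, as the Bot API returns it.
chat_ids() {
  grep -oE '"chat":\{"id":-?[0-9]+' | grep -oE -- '-?[0-9]+$' | sort -u
}
```

Usage: curl -s "https://api.telegram.org/bot<TOKEN>/getUpdates" | chat_ids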

3. Test the bot

# Send a test message
curl -X POST "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/sendMessage" \
-H "Content-Type: application/json" \
-d '{"chat_id": "<YOUR_CHAT_ID>", "text": "Test message from Prometheus"}'
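The curl call above only prints the raw JSON reply, so a rejected request is easy to miss. A minimal sketch that wraps the same sendMessage call and fails unless the reply reports "ok":true; tg_ok and tg_send are hypothetical names, and the naive JSON quoting assumes the text contains no double quotes.

```shell
# Succeeds when a Bot API response body reports "ok":true
tg_ok() {
  case $1 in
    *'"ok":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: tg_send <bot_token> <chat_id> <text>   (text must not contain double quotes)
tg_send() {
  resp=$(curl -sS -X POST "https://api.telegram.org/bot$1/sendMessage" \
    -H 'Content-Type: application/json' \
    -d "{\"chat_id\": \"$2\", \"text\": \"$3\"}")
  if tg_ok "$resp"; then
    echo "sent"
  else
    # The description field explains why Telegram refused the request
    echo "Telegram error: $resp" >&2
    return 1
  fi
}
```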

Verifying the setup

1. Restart the services

systemctl restart prometheus
systemctl restart alertmanager

2. Check the loaded configuration

# Configuration currently loaded by Prometheus
curl http://localhost:9090/api/v1/status/config

# Alertmanager status, including the loaded configuration (v2 API)
curl http://localhost:9093/api/v2/status

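3. Trigger a test alert

# Stop node_exporter to fire InstanceDown for the node job
# (assumes node_exporter runs as a systemd unit named node_exporter)
systemctl stop node_exporter

# List active alerts; InstanceDown goes from pending to firing after its 1m "for" period
curl http://localhost:9090/api/v1/alerts

# Start node_exporter again once the Telegram message has arrived
systemctl start node_exporter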
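Instead of stopping node_exporter, the Alertmanager-to-Telegram path can be tested on its own by posting a synthetic alert to Alertmanager's v2 API. A minimal sketch; am_test_payload, am_send_test, the TelegramTest alert name and the AM_URL variable are hypothetical, and the alert resolves by itself because it carries no endsAt.

```shell
# Build a one-alert JSON array in the format accepted by POST /api/v2/alerts.
# Usage: am_test_payload <alertname> <severity>
am_test_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"},' "$1" "$2"
  printf '"annotations":{"summary":"Manual test alert","description":"Sent with curl to test the Telegram receiver"}}]\n'
}

# Post the test alert (AM_URL defaults to the local Alertmanager)
am_send_test() {
  am_test_payload "$1" "$2" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}
```

After am_send_test TelegramTest warning, the message should reach Telegram once the 10s group_wait has passed.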

Advanced configuration

1. Customizing the alert template

receivers:
  - name: 'telegram-notifications'
    telegram_configs:
      - api_url: 'https://api.telegram.org'
        bot_token: 'YOUR_BOT_TOKEN'
        chat_id: YOUR_CHAT_ID
        parse_mode: 'HTML'
        # StartsAt belongs to each alert, so it is read inside range .Alerts
        message: |
          <b>🚨 {{ .Status | toUpper }}</b>

          <b>Alert:</b> {{ .GroupLabels.alertname }}
          <b>Severity:</b> {{ .CommonLabels.severity }}
          <b>Instance:</b> {{ .CommonLabels.instance }}

          <b>Summary:</b> {{ .CommonAnnotations.summary }}
          <b>Description:</b> {{ .CommonAnnotations.description }}

          {{ range .Alerts }}<b>Started:</b> {{ .StartsAt.Format "2006-01-02 15:04:05" }}
          {{ end }}

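2. Inhibition rules

While a matching critical alert is firing, warning alerts with the same alertname and instance are muted.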

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
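The matching condition of this rule can be modelled in a few lines. This is a toy sketch of the logic, not Alertmanager's implementation, and inhibited is a hypothetical name. It makes one consequence visible: in the rule file above the critical and warning alerts have different alertnames (InstanceDown versus HighCPUUsage/HighMemoryUsage), so with alertname in equal they never mute each other. If the goal is for any critical alert to mute the warnings of the same instance, equal would have to list only instance.

```shell
# Toy model: returns 0 when the target alert would be muted by a firing source alert.
# Usage: inhibited <src_severity> <src_alertname> <src_instance> \
#                  <tgt_severity> <tgt_alertname> <tgt_instance>
inhibited() {
  [ "$1" = critical ] &&   # source_match: severity=critical
  [ "$4" = warning ] &&    # target_match: severity=warning
  [ "$2" = "$5" ] &&       # equal: alertname
  [ "$3" = "$6" ]          # equal: instance
}
```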

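3. Routing rules

Every receiver named here (default-receiver, critical-receiver, warning-receiver) must also be defined under receivers; otherwise Alertmanager refuses to load the configuration.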

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: critical
      receiver: 'critical-receiver'
      repeat_interval: 5m
    - match:
        severity: warning
      receiver: 'warning-receiver'
      repeat_interval: 30m
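Alertmanager walks the child routes in order and, unless a route sets continue: true, stops at the first one whose match fits; anything unmatched falls back to the top-level receiver and repeat_interval. A toy sketch of that decision for the severity label only (route_for is a hypothetical name; real routing also considers every other matcher).

```shell
# Toy model of the routing tree above, keyed on the severity label.
# Prints "<receiver> <repeat_interval>".
route_for() {
  case $1 in
    critical) echo "critical-receiver 5m" ;;   # first child route
    warning)  echo "warning-receiver 30m" ;;   # second child route
    *)        echo "default-receiver 12h" ;;   # top-level defaults
  esac
}
```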

Troubleshooting

1. Common problems

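  • Wrong bot token: make sure the token copied from BotFather is complete and correct
  • Wrong chat ID: get the correct chat id from the getUpdates API (group IDs are negative)
  • Network problems: check that the server can reach api.telegram.org
  • Permission problems: make sure the prometheus user owns /apps/prometheus and /apps/alertmanager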
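The first and third problems can be told apart by calling the Bot API getMe method with the token. A minimal sketch that maps the HTTP status to a likely cause; the function names are hypothetical, and the 401/404 mapping reflects typical Bot API behaviour rather than a documented guarantee.

```shell
# Map the HTTP status of a getMe call to the likely cause.
explain_getme_status() {
  case $1 in
    200) echo "token OK, API reachable" ;;
    401|404) echo "token rejected: check the value from BotFather" ;;   # typical for bad tokens
    000) echo "no HTTP response: check DNS, firewall or proxy for api.telegram.org" ;;
    *) echo "unexpected HTTP status $1" ;;
  esac
}

# Usage: check_telegram <bot_token>   (curl prints 000 when no response arrives)
check_telegram() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    "https://api.telegram.org/bot$1/getMe")
  explain_getme_status "$code"
}
```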

2. Viewing logs

# Follow the Prometheus logs
journalctl -u prometheus -f

# Follow the Alertmanager logs
journalctl -u alertmanager -f

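3. Debugging commands

# Validate the alert rule file
/apps/prometheus/promtool check rules /apps/prometheus/rules/alert-rules.yml

# Evaluate an alert expression against the running server
/apps/prometheus/promtool query instant http://localhost:9090 'up == 0'

# Validate the Alertmanager configuration
/apps/alertmanager/amtool check-config /apps/alertmanager/alertmanager.yml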

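Summary

With the configuration above, you now have a Prometheus + Alertmanager + Telegram alerting pipeline that can:

  1. Collect system metrics
  2. Fire alerts based on predefined rules
  3. Push alert notifications to Telegram
  4. Route and inhibit alerts flexibly

Review and update the alert rules regularly so that they keep matching what actually matters in your environment.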