标签:node memory rules labels prometheus Prometheus 规则 告警 yml
https://www.cnblogs.com/linuxk/p/12036193.html
- 修改prometheus配置文件
指定prometheus的规则文件路径或者文件名
vim prometheus.yml
rule_files:
- 'rules/*_rules.yml'
# - 'prometheus_rules.yml'
# - "./rule/*.yaml"
# - "first_rules.yml"
# - "second_rules.yml"
chown -R prometheus:prometheus /opt/prometheus/rules/
mv prometheus_rules.yml rules/
此配置所有规则都写入一个文件里面。
- 重启prometheus
systemctl restart prometheus
journalctl -u prometheus -fn 200
#停止
#ps -ef | grep prometheus | grep -v grep | awk '{print $2}' | xagrs kill -9
#或者
#curl -XPOST http://localhost:9090/-/quit
#重载
#curl -XPOST http://localhost:9090/-/reload
vim prometheus_rules.yml
groups:
- name: alive
rules:
- record: node:ping:total
expr: up
- name: cpu
rules:
- record: node:cpu_usage:ratio #别的文件使用,直接使用这个
expr: ((100 - (avg by(instance,ip,hostname) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)))
- name: mem
rules:
- record: node:memory_usage:ratio
expr: (100 -(node_memory_MemTotal_bytes -node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes ) / node_memory_MemTotal_bytes * 100 )
- 检查配置
[root@prometheus prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
Checking prometheus_rules.yml
FAILED:
prometheus_rules.yml: groupname: "alive" is repeated in the same file
systemctl restart prometheus
journalctl -u prometheus -fn 200
vim rules/disk_rules.yml
groups:
- name: disk-monitor
rules:
- alert: HostOutOfDiskSpace
expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes{job="node",fstype=~"ext.*|xfs",mountpoint ="/"} < 30 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
for: 1m
labels:
severity: warning
annotations:
summary: Host out of disk space (instance {{ $labels.instance }})
description: "Disk is almost full (< 30% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
vim rules/cpu_rules.yml
groups:
- name: cpu-monitor
rules:
- alert: HostHighCpuLoad
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: Host high CPU load (instance {{ $labels.instance }})
# description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
description: "服务器5分钟内CPU使用率超过80%!(当前值: {{ $value }}%)"
vim rules/alertmanager_rules.yml
groups:
- name: alertmanager-monitor
rules:
- alert: PrometheusNotConnectedToAlertmanager
expr: prometheus_notifications_alertmanagers_discovered < 1
for: 0m
labels:
severity: critical
annotations:
summary: Prometheus not connected to alertmanager (instance {{ $labels.instance }})
description: "Prometheus cannot connect the alertmanager\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
vim rules/memory_rules.yml
groups:
- name: memory-monitor
rules:
- alert: HostOutOfMemory
expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
for: 2m
labels:
severity: warning
annotations:
summary: Host out of memory (instance {{ $labels.instance }})
description: "Node memory is filling up (< 20% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
标签:node,memory,rules,labels,prometheus,Prometheus,规则,告警,yml 来源: https://www.cnblogs.com/xjzyy/p/15589823.html
本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享; 2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关; 3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关; 4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除; 5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。