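The earlier article, "Monitoring a Spring Boot Project with Prometheus in 10 Minutes", showed how to use Prometheus to monitor the default metrics that Spring Boot provides. This article shows how to define custom business metrics, monitor and alert on them with Prometheus, and display them in Grafana.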
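We simulate an accounting system whose main functions are deposit and withdrawal. It defines 5 business metrics: deposit count, deposit amount, withdrawal count, withdrawal amount, and balance.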
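These 5 business metrics use three Prometheus metric types: Counter for the deposit and withdrawal counts, Summary (Micrometer's DistributionSummary) for the deposit and withdrawal amounts, and Gauge for the balance.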
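To make the difference between the three types concrete, here is a minimal plain-Java toy model. It is only a sketch: ToyCounter, ToySummary and ToyGauge are hypothetical names invented for illustration and are not Micrometer or Prometheus classes.

```java
import java.math.BigDecimal;
import java.util.function.Supplier;

// Toy models of the three metric types (illustration only, NOT Micrometer classes).
class MetricTypesSketch {

    // Counter: a value that only goes up, e.g. the number of withdrawals.
    static final class ToyCounter {
        private double count;
        void increment() { count += 1; }
        double count() { return count; }
    }

    // Summary: tracks how many observations were recorded and their running sum.
    static final class ToySummary {
        private long count;
        private double sum;
        void record(double amount) { count++; sum += amount; }
        long count() { return count; }
        double sum() { return sum; }
    }

    // Gauge: stores nothing itself; it samples the current value on demand.
    static final class ToyGauge {
        private final Supplier<Number> source;
        ToyGauge(Supplier<Number> source) { this.source = source; }
        double value() { return source.get().doubleValue(); }
    }

    public static void main(String[] args) {
        BigDecimal[] balance = { new BigDecimal(1000) };
        ToyCounter withdrawals = new ToyCounter();
        ToySummary withdrawAmount = new ToySummary();
        ToyGauge balanceGauge = new ToyGauge(() -> balance[0]);

        // Two withdrawals: 300 and 250.
        for (int amount : new int[] {300, 250}) {
            balance[0] = balance[0].subtract(new BigDecimal(amount));
            withdrawals.increment();
            withdrawAmount.record(amount);
        }

        System.out.println(withdrawals.count());   // 2.0
        System.out.println(withdrawAmount.sum());  // 550.0
        System.out.println(balanceGauge.value());  // 450.0
    }
}
```

The counter and summary accumulate over the life of the process, while the gauge always reflects whatever the balance is at the moment it is read.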
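Finally, we display these metrics in Grafana and send an alert notification when the balance falls below 500. The result looks like this:
[Image]
[Image]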
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
# Endpoints to expose
management.endpoints.web.exposure.include=*
# Application name, shown as a label in Prometheus
management.metrics.tags.application=${spring.application.name}
# Tomcat metrics must be enabled explicitly
server.tomcat.mbeanregistry.enabled=true
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.math.BigDecimal;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
@Slf4j
public class AccountServiceImpl implements IAccountService {

    @Autowired
    private MeterRegistry registry;

    // Deposit count
    private Counter depositCounter;
    // Withdrawal count
    private Counter withdrawCounter;
    // Deposit amount
    private DistributionSummary depositAmountSummary;
    // Withdrawal amount
    private DistributionSummary withdrawAmountSummary;
    // Balance
    private BigDecimal balance = new BigDecimal(1000);

    @PostConstruct
    private void init() {
        depositCounter = registry.counter("deposit_counter", "currency", "btc");
        withdrawCounter = registry.counter("withdraw_counter", "currency", "btc");
        depositAmountSummary = registry.summary("deposit_amount", "currency", "btc");
        withdrawAmountSummary = registry.summary("withdraw_amount", "currency", "btc");
        // The gauge reads the current balance every time the metrics are scraped
        Gauge.builder("balanceGauge", () -> balance)
                .tags("currency", "btc")
                .description("balance")
                .register(registry);
    }

    // Deposit
    @Override
    public void depositOrder(BigDecimal amount) {
        log.info("depositOrder amount:{}", amount);
        try {
            // Increase the balance
            balance = balance.add(amount);
            // Record the deposit count
            depositCounter.increment();
            // Record the deposit amount
            depositAmountSummary.record(amount.doubleValue());
        } catch (Exception e) {
            log.error("depositOrder error", e);
        } finally {
            log.info("depositOrder result:{}", amount);
        }
    }

    // Withdrawal
    @Override
    public void withdrawOrder(BigDecimal amount) {
        log.info("withdrawOrder amount:{}", amount);
        try {
            if (balance.subtract(amount).compareTo(BigDecimal.ZERO) < 0) {
                throw new Exception("Insufficient balance, withdrawal failed");
            }
            // Decrease the balance
            balance = balance.subtract(amount);
            // Record the withdrawal count
            withdrawCounter.increment();
            // Record the withdrawal amount
            withdrawAmountSummary.record(amount.doubleValue());
        } catch (Exception e) {
            log.error("withdrawOrder error", e);
        } finally {
            log.info("withdrawOrder result:{}", amount);
        }
    }
}
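The service keeps the balance in a plain field that request threads update and the gauge reads at scrape time. For a demo that is fine. If concurrent requests matter, one possible approach is to hold the balance in an AtomicReference so the check-and-subtract happens atomically. The sketch below uses only JDK classes. BalanceHolder is a hypothetical name and is not part of the original project.

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not from the original project): a balance that can be
// updated safely from several threads and read by a gauge at any time.
class BalanceHolder {
    private final AtomicReference<BigDecimal> balance;

    BalanceHolder(BigDecimal initial) {
        this.balance = new AtomicReference<>(initial);
    }

    // Value a gauge supplier could return, e.g. () -> holder.current().
    BigDecimal current() {
        return balance.get();
    }

    void deposit(BigDecimal amount) {
        balance.updateAndGet(b -> b.add(amount));
    }

    // Returns false and leaves the balance unchanged when funds are insufficient.
    boolean withdraw(BigDecimal amount) {
        while (true) {
            BigDecimal before = balance.get();
            BigDecimal after = before.subtract(amount);
            if (after.signum() < 0) {
                return false;
            }
            // Retry if another thread changed the balance in the meantime.
            if (balance.compareAndSet(before, after)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        BalanceHolder holder = new BalanceHolder(new BigDecimal(1000));
        System.out.println(holder.withdraw(new BigDecimal(600))); // true
        System.out.println(holder.withdraw(new BigDecimal(500))); // false
        holder.deposit(new BigDecimal(200));
        System.out.println(holder.current());                     // 600
    }
}
```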
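import java.math.BigDecimal;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(ControllerConstants.PATH_PREFIX + "/account")
public class AccountController {

    @Autowired
    IAccountService accountService;

    /**
     * Deposit
     */
    @RequestMapping(value = "/deposit", method = RequestMethod.GET)
    public void deposit(@RequestParam("amount") BigDecimal amount) {
        accountService.depositOrder(amount);
    }

    /**
     * Withdrawal
     */
    @RequestMapping(value = "/withdraw", method = RequestMethod.GET)
    public void withdraw(@RequestParam("amount") BigDecimal amount) {
        accountService.withdrawOrder(amount);
    }
}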
## Deposit count
deposit_counter_total
## Total deposit amount
deposit_amount_sum
## Withdrawal count
withdraw_counter_total
## Total withdrawal amount
withdraw_amount_sum
## Balance
balanceGauge
Configure the scrape target for the business system in prometheus.yml, pulling metrics every 5s. Because the Prometheus server runs in Docker, it reaches the host through host.docker.internal.
  # Business system monitoring
  - job_name: 'SpringBoot'
    # Override the global default scrape interval
    scrape_interval: 5s
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8080']
[Image]
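Alert rule configuration: the container was started with the host's /data/prometheus directory mapped to the container's /prometheus directory, so create a rules folder under /data/prometheus/ on the host and add the alert file business-alert.rules to it. The rule alerts when the balance is below 500.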
groups:
- name: businessAlert
  rules:
  - alert: balanceAlert
    expr: balanceGauge{application="backend"} < 500
    for: 20s
    labels:
      severity: page
      team: g2park
    annotations:
      summary: "{{ $labels.currency }} balance is insufficient "
      description: "{{ $labels.currency }} balance : {{ $value }}"
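The for: 20s clause means the expression has to stay true for 20 seconds before the alert fires. Until then the alert is pending. The following plain-Java sketch models that behaviour in a simplified way. AlertForSketch is a hypothetical class, not Prometheus code. The sample timestamps are made up, and the real server only evaluates rules at its configured evaluation interval.

```java
import java.time.Duration;

// Simplified model of an alert rule with a "for" duration (illustration only).
class AlertForSketch {
    enum State { INACTIVE, PENDING, FIRING }

    private final double threshold;
    private final Duration holdFor;
    private Long activeSinceSeconds; // null while the condition is false

    AlertForSketch(double threshold, Duration holdFor) {
        this.threshold = threshold;
        this.holdFor = holdFor;
    }

    // Evaluate one balance sample taken at the given time (in seconds).
    State evaluate(long nowSeconds, double balance) {
        if (balance >= threshold) {
            activeSinceSeconds = null;       // condition cleared, start over
            return State.INACTIVE;
        }
        if (activeSinceSeconds == null) {
            activeSinceSeconds = nowSeconds; // condition just became true
        }
        long held = nowSeconds - activeSinceSeconds;
        return held >= holdFor.getSeconds() ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        AlertForSketch rule = new AlertForSketch(500, Duration.ofSeconds(20));
        System.out.println(rule.evaluate(0, 1000));  // INACTIVE
        System.out.println(rule.evaluate(5, 400));   // PENDING
        System.out.println(rule.evaluate(15, 400));  // PENDING
        System.out.println(rule.evaluate(25, 400));  // FIRING
        System.out.println(rule.evaluate(30, 600));  // INACTIVE
    }
}
```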
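Start Prometheus and verify: query the scrape targets to confirm that the new job is active.
[Image]
Query the deposit count to confirm it is being collected. Then click Alerts to see that the business alert rule has taken effect.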
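In the /data/prometheus/alertmanager directory, add the notification template notify-template.tmpl. This directory is mapped to Alertmanager's /etc/alertmanager directory. The template has two parts, one for firing alerts and one for resolved alerts. 2006-01-02 15:04:05 is Go's fixed reference-time layout, and adding 28800e9 (nanoseconds, that is, 8 hours) shifts the time to UTC+8, Beijing time.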
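The arithmetic behind 28800e9 can be checked with a short Java sketch: 28800e9 nanoseconds is 28,800 seconds, or 8 hours, and the Go layout 2006-01-02 15:04:05 corresponds to the Java pattern yyyy-MM-dd HH:mm:ss. TimeOffsetSketch and the sample timestamp are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustration of the "+28800e9" trick used in the notification template.
class TimeOffsetSketch {
    // 28800e9 nanoseconds = 28,800 seconds = 8 hours.
    static final Duration OFFSET = Duration.ofNanos(28_800_000_000_000L);

    // Mirrors the template: shift the UTC instant by 8 hours, then format it
    // as if it were still UTC, which yields the Beijing wall-clock time.
    static String formatShifted(Instant startsAtUtc) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(startsAtUtc.plus(OFFSET));
    }

    public static void main(String[] args) {
        System.out.println(OFFSET.toHours());                                     // 8
        System.out.println(formatShifted(Instant.parse("2024-01-01T18:30:00Z"))); // 2024-01-02 02:30:00
    }
}
```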
{{ define "test.html" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{ range .Alerts.Firing }}
<h1 align="left" style="color:red;">Alert</h1>
<pre>
Severity: {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Alert time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
</pre>
{{ end }}
{{ end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{ range .Alerts.Resolved }}
<h1 align="left" style="color:green;">Resolved</h1>
<pre>
Alert name: {{ .Labels.alertname }}<br>
Severity: {{ .Labels.severity }}<br>
Affected host: {{ .Labels.instance }}<br>
Summary: {{ .Annotations.summary }}<br>
Details: {{ .Annotations.description }}<br>
Alert time: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
Resolved time: {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}<br>
</pre>
{{- end }}
{{- end }}
{{- end }}
Change alertmanager.yml to the following, replacing the account details with your own.
global:
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 9238223@qq.com
  smtp_auth_username: 9238223@qq.com
  smtp_auth_identity: 9238223@qq.com
  smtp_auth_password: 123
  smtp_require_tls: false
templates:
  # Add the template by its path
  - '/etc/alertmanager/notify-template.tmpl'
route:
  group_by: ['alertname']
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: default-receiver
    email_configs:
      - to: abc123@foxmail.com
        html: '{{ template "test.html" . }}'
        send_resolved: true
        headers: { Subject: "System monitoring alert{{- if gt (len .Alerts.Resolved) 0 -}} resolved{{ end }}" }
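global: the global section, which holds Alertmanager-wide settings (here, the SMTP account used to send mail).
route: the routing rules for alerts, including how they are grouped and how long to wait between notifications.
receivers: the receivers section, which configures who receives the alert notifications.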
Start Alertmanager and verify
docker start alertmanager
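Open the Status page. If it shows the following result, Alertmanager started successfully.
Alert verification: the default balance is 1000. Call the withdrawal endpoint backend/account/withdraw so that the balance drops below 500 and the alert triggers.
After about 20s, Prometheus fires the alert and pushes it to Alertmanager.
[Image]
Alertmanager then waits the configured 30s (group_wait) before sending the notification.
[Image]
Recovery verification: call the deposit endpoint backend/account/deposit so that the balance rises above 500. After about 6 minutes you will receive the resolved notification. If that is too long, lower the group_wait and group_interval values in alertmanager.yml.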
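Start Grafana, click to add a panel, and create three charts: balance trend, share of withdrawal versus deposit amount, and withdrawal and deposit count trend, as shown below.
[Image]
Balance trend, panel type Stat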
sum(balanceGauge{application="backend"})
[Image]
Share of withdrawal versus deposit amount, panel type Pie chart
withdraw_amount_sum{application="backend"}
deposit_amount_sum{application="backend"}
Withdrawal and deposit count trend, panel type Time series
increase(deposit_counter_total{application="backend"}[5m])
increase(withdraw_counter_total{application="backend"}[5m])
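The increase(...[5m]) queries report how much each counter grew over the last 5 minutes. As a rough mental model, the sketch below sums the growth between consecutive samples in a window and treats a drop as a counter reset, for example after an application restart. This is a simplification: the real PromQL function also extrapolates to the window boundaries, so its results can be fractional. IncreaseSketch and the sample values are hypothetical.

```java
// Simplified model of PromQL increase() over one window (illustration only).
class IncreaseSketch {

    // samples: counter values within the window, in time order.
    static double increase(double[] samples) {
        if (samples.length < 2) {
            return 0.0; // at least two samples are needed to measure growth
        }
        double total = 0.0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // A drop means the counter was reset, so it restarted from zero.
            total += delta >= 0 ? delta : samples[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(increase(new double[] {3, 3, 5, 8})); // 5.0
        System.out.println(increase(new double[] {7, 9, 1, 2})); // 4.0
    }
}
```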
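This article showed how to define custom business metrics in Spring Boot and how to monitor and alert on them. I hope it helps. Note that the example is written this way only to keep it simple and easy to follow. In real use, metrics can be backed by a database or cache. For the balance alert, for example, the gauge can simply call the balance query interface.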