
Monitoring Java Applications with Metrics

2020-02-04 12:57:13  Source: Internet

Tags: metrics, Java, MetricRegistry, Metrics, rate, static, monitoring, new, public


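Concepts

Metrics is a library that gives Java services a toolkit for measuring their own behavior. By embedding Metrics calls in your Java code, you can monitor the key indicators of your business logic with little effort.

The most popular metrics library today is dropwizard/metrics, originally written by Coda Hale. It is widely used in well-known open-source projects such as Hadoop, Kafka, Spark, and JStorm.

Its main advantages:

  • Integrations with Ehcache, Apache HttpClient, JDBI, Jersey, Jetty, Log4J, Logback, the JVM, and more
  • Several metric types: Gauges, Counters, Meters, Histograms, and Timers
  • Several reporters for publishing metrics
    • JMX, console, CSV files, and SLF4J loggers
    • Ganglia and Graphite, for graphical display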

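MetricRegistry

The MetricRegistry class is the core of Metrics: it is the container for all of an application's metrics and the starting point for using the library. The Maven dependencies are listed at the end of this article.

    static final MetricRegistry metrics = new MetricRegistry();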

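Reporter

Once metrics are collected they need to be published somewhere. That is the job of a Reporter.

Console

Prints the metrics directly to the console.

    private static void startReportConsole() {
        ConsoleReporter reporter = ConsoleReporter.forRegistry(metrics)
                .convertRatesTo(TimeUnit.SECONDS)           // report rates as events per second
                .convertDurationsTo(TimeUnit.MILLISECONDS)  // report durations in milliseconds
                .build();
        reporter.start(1, TimeUnit.SECONDS);                // print a report every second
    }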
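
The two convert calls only change the units used in the report. As a rough, stdlib-only illustration of that arithmetic, the sketch below assumes that durations are recorded in nanoseconds and rates in events per second. UnitConversionSketch and its methods are hypothetical names, not part of the Metrics API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the unit arithmetic; not part of the Metrics API.
final class UnitConversionSketch {
    // Convert a duration measured in nanoseconds into the requested unit.
    static double convertDuration(double nanos, TimeUnit unit) {
        return nanos / unit.toNanos(1);
    }

    // Convert a rate in events per second into events per requested unit.
    // Units shorter than one second are not handled by this sketch.
    static double convertRate(double perSecond, TimeUnit unit) {
        return perSecond * unit.toSeconds(1);
    }

    public static void main(String[] args) {
        System.out.println(convertDuration(496_790_000, TimeUnit.MILLISECONDS)); // prints 496.79
        System.out.println(convertRate(2.0, TimeUnit.MINUTES));                  // prints 120.0
    }
}
```

With these settings a latency of 496,790,000 ns is shown as 496.79 milliseconds, and a meter running at 2 events per second would read 120 events per minute if the rate unit were switched to minutes.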

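JMX

Exposes the metrics through JMX; other open-source tools can then forward them to Graphite or similar systems for graphical display. The metrics are visible under the MBeans tab in JConsole.

    private static void startReportJmx() {
        JmxReporter reporterJmx = JmxReporter.forRegistry(metrics).build();
        reporterJmx.start();
    }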

Graphite

Sends the metrics to Graphite, where they can be viewed in Graphite-web.

    private static void startReportGraphite() {
        Graphite graphite = new Graphite(new InetSocketAddress("graphite.xxx.com", 2003));
        GraphiteReporter reporter = GraphiteReporter.forRegistry(metrics)
                .prefixedWith("test.metrics")               // prefix for every metric name sent
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .filter(MetricFilter.ALL)                   // send every registered metric
                .build(graphite);
        reporter.start(1, TimeUnit.MINUTES);                // send a report once a minute
    }

Wrapping the reporters

Call it as MetricCommon.getMetricAndStartReport();

    public class MetricCommon {
        // Named "metrics" so that it matches the registry used by the reporter methods.
        private static final MetricRegistry metrics = new MetricRegistry();

        // Start all three reporters and hand back the shared registry.
        public static MetricRegistry getMetricAndStartReport() {
            startReportConsole();
            startReportJmx();
            startReportGraphite();
            return metrics;
        }

        private static void startReportConsole() {...}
        private static void startReportJmx() {...}
        private static void startReportGraphite() {...}
    }

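Metric types

Metrics provides the following metric types:

  • Gauges: record an instantaneous value, for example the length of a pending-job queue.
  • Histograms: track the distribution of a stream of values: minimum, maximum, mean, median, and percentiles (75th, 90th, 95th, 98th, 99th, and 99.9th).
  • Meters: measure the rate of calls (TPS): the total number of events, the mean rate per second, and the moving average rate over the last 1, 5, and 15 minutes.
  • Timers: for when you need both TPS and the latency distribution; a Timer is built on a Histogram and a Meter.
  • Counters: a counter with inc() and dec() methods, starting at 0.
  • Health Checks: check whether the application, its submodules, or its dependencies are running normally.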

Gauges

The simplest metric type: it just returns a single value. For example, to track the number of tasks in a pending queue:

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.LinkedList;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Random;
    import java.util.concurrent.ConcurrentHashMap;

    public class GaugeTest {
        private static final MetricRegistry registry = MetricCommon.getMetricAndStartReport();
        private static final Random random = new Random();

        @Test
        public void testOneGauge() throws InterruptedException {
            Queue<String> queue = new LinkedList<>();
            // The gauge reads the queue size each time a report is produced.
            registry.register(MetricRegistry.name(GaugeTest.class, "testGauges-queue-size", "size"),
                    (Gauge<Integer>) () -> queue.size());
            while (true) {
                Thread.sleep(1000);
                queue.add("Job-xxx");
            }
        }

        @Test
        public void testMultiGauge() throws InterruptedException {
            Map<Integer, Integer> map = new ConcurrentHashMap<>();
            while (true) {
                int i = random.nextInt(100);
                int j = i % 10;                       // group by last digit
                boolean firstSeen = !map.containsKey(j);
                map.put(j, i);
                if (firstSeen) {
                    // Register one gauge per group, the first time the group appears.
                    registry.register(MetricRegistry.name(GaugeTest.class, "testGauges-number", String.valueOf(j)),
                            (Gauge<Integer>) () -> map.get(j));
                }
                Thread.sleep(1000);
            }
        }
    }

The first test case uses a single gauge to record the length of the queue.

    -- Gauges ----------------------------------------------------------------------
    GaugeTest.testGauges-queue-size.size
                 value = 4

The second test case generates a random number below 100 on each iteration and groups the numbers by their last digit; one gauge per group reports the most recent number in that group.

    -- Gauges ----------------------------------------------------------------------
    GaugeTest.testGauges-number.0
                 value = 60
    GaugeTest.testGauges-number.1
                 value = 1
    GaugeTest.testGauges-number.2
                 value = 82
    GaugeTest.testGauges-number.3
                 value = 23
    GaugeTest.testGauges-number.4
                 value = 74
    GaugeTest.testGauges-number.5
                 value = 25
    GaugeTest.testGauges-number.7
                 value = 17
    GaugeTest.testGauges-number.8
                 value = 78
    GaugeTest.testGauges-number.9
                 value = 69
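
Both test cases rely on the same idea: a gauge is a callback that is stored at registration time and evaluated only when a report is produced. The following stdlib-only sketch models that behavior. GaugeSketch and its methods are hypothetical names for illustration and are not the Metrics API.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Supplier;

// Hypothetical, simplified model of a gauge registry; not the Metrics API.
final class GaugeSketch {
    private final Map<String, Supplier<Integer>> gauges = new LinkedHashMap<>();

    // Store the callback; nothing is evaluated at registration time.
    void register(String name, Supplier<Integer> gauge) {
        gauges.put(name, gauge);
    }

    // Evaluate the callback now, as a reporter would on each reporting tick.
    int read(String name) {
        return gauges.get(name).get();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        GaugeSketch registry = new GaugeSketch();
        registry.register("queue.size", queue::size);

        System.out.println(registry.read("queue.size")); // prints 0
        queue.add("Job-1");
        queue.add("Job-2");
        System.out.println(registry.read("queue.size")); // prints 2
    }
}
```

Because the value is computed on demand, the gauge always reflects the current state of the queue, and the application code that adds jobs never has to touch the metric.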

Histogram

A Histogram tracks the distribution of a stream of values: the minimum, maximum, mean, and median, plus the 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles.

    import com.codahale.metrics.ExponentiallyDecayingReservoir;
    import com.codahale.metrics.Histogram;
    import com.codahale.metrics.MetricRegistry;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.Random;

    public class HistogramTest {
        private static final MetricRegistry registry = MetricCommon.getMetricAndStartReport();
        public static Random random = new Random();

        @Test
        public void test() throws InterruptedException {
            Histogram histogram = new Histogram(new ExponentiallyDecayingReservoir());
            registry.register(MetricRegistry.name(HistogramTest.class, "request", "histogram"), histogram);

            while (true) {
                Thread.sleep(1000);
                histogram.update(random.nextInt(100000));   // uniform value in [0, 100000)
            }
        }
    }

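After running for a long time, the sample approaches the underlying distribution. The values are uniform on [0, 100000), so the 75th percentile should settle near 75000 and the 99.9th percentile near 99900. Because the histogram works from a sample, individual reports fluctuate around these values, as the output shows.

    -- Histograms ------------------------------------------------------------------
    HistogramTest.request.histogram
                 count = 1336
                   min = 97
                   max = 99930
                  mean = 49816.49
                stddev = 29435.27
                median = 49368.00
                  75% <= 75803.00
                  95% <= 95340.00
                  98% <= 98096.00
                  99% <= 98724.00
                99.9% <= 99930.00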
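
To check where the percentiles of such uniform values should land, the distribution can be sampled directly with the standard library. This is a minimal sketch: it uses a plain nearest-rank percentile over the full sample, which is an assumption made for illustration and not the estimator the Metrics reservoir uses. UniformPercentiles is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

// Stdlib-only check of where the percentiles of uniform values in [0, 100000) settle.
final class UniformPercentiles {
    // Nearest-rank percentile computed on a sorted copy; q must be in (0, 1].
    static int percentile(int[] values, double q) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Draw n values the same way the test case does, with a fixed seed for repeatability.
    static int[] sample(int n, long seed) {
        Random random = new Random(seed);
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = random.nextInt(100000);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = sample(200_000, 42L);
        System.out.println("75%   ~ " + percentile(values, 0.75));
        System.out.println("99.9% ~ " + percentile(values, 0.999));
    }
}
```

With a large sample the two printed values should sit close to 75000 and 99900 respectively, while a report based on about a thousand updates, like the one above, can be off by several hundred.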

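Meters

A Meter measures the rate at which events occur, for example TPS. It tracks the rate over the last 1, 5, and 15 minutes as well as over its whole lifetime.

    import com.codahale.metrics.Meter;
    import com.codahale.metrics.MetricRegistry;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.Random;

    public class MetersTest {
        // getMetricAndStartAllReport is not listed in this article; it appears to be a variant
        // of getMetricAndStartReport that takes the Graphite host and the metric prefix.
        MetricRegistry registry = MetricCommon.getMetricAndStartAllReport("nc110x.corp.youdao.com", "test.metrics");
        public static Random random = new Random();

        @Test
        public void testOne() throws InterruptedException {
            Meter meterTps = registry.meter(MetricRegistry.name(MetersTest.class, "request", "tps"));
            while (true) {
                meterTps.mark();                          // record one event
                Thread.sleep(random.nextInt(1000));
            }
        }

        @Test
        public void testMulti() throws InterruptedException {
            while (true) {
                int i = random.nextInt(100);
                int j = i % 10;                           // group by last digit
                // meter() returns the existing meter for this name, or creates it on first use.
                Meter meterTps = registry.meter(MetricRegistry.name(MetersTest.class, "request", "tps", String.valueOf(j)));
                meterTps.mark();
                Thread.sleep(10);
            }
        }
    }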

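Note that registering several meters works differently from registering several gauges or histograms as above. The meter() method is get-or-add: calling it again with the same name returns the existing meter, whereas register() throws an IllegalArgumentException if the name is already taken.

    public Meter meter(String name) {
        return (Meter) this.getOrAdd(name, MetricRegistry.MetricBuilder.METERS);
    }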
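
The difference between get-or-add and strict registration can be modeled with a concurrent map. The sketch below is a simplified, stdlib-only model: GetOrAddSketch is a hypothetical class, a LongAdder stands in for a meter, and none of this is the library's actual implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified model of the two registration styles; not the Metrics API.
final class GetOrAddSketch {
    private final ConcurrentMap<String, LongAdder> meters = new ConcurrentHashMap<>();

    // Get-or-add: repeated calls with the same name return the same instance.
    LongAdder meter(String name) {
        return meters.computeIfAbsent(name, key -> new LongAdder());
    }

    // Strict registration: a duplicate name is treated as an error.
    void register(String name, LongAdder meter) {
        if (meters.putIfAbsent(name, meter) != null) {
            throw new IllegalArgumentException("A metric named " + name + " already exists");
        }
    }

    public static void main(String[] args) {
        GetOrAddSketch registry = new GetOrAddSketch();
        registry.meter("request.tps.6").increment();
        registry.meter("request.tps.6").increment();               // same instance as above
        System.out.println(registry.meter("request.tps.6").sum()); // prints 2
    }
}
```

This is why the multi-meter test can call meter() inside the loop without tracking which names it has already seen, while the multi-gauge test has to guard its register() call.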

Output of the single-meter test case. Each iteration sleeps for a random 0 to 999 ms, about 500 ms on average, so as the count grows every rate converges to about 2 events per second.

    -- Meters ----------------------------------------------------------------------
    MetersTest.request.tps
                 count = 452
             mean rate = 1.99 events/second
         1-minute rate = 2.03 events/second
         5-minute rate = 2.00 events/second
        15-minute rate = 2.00 events/second
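
The figure of roughly 2 events per second follows from the average sleep time, and it can be reproduced without waiting by adding up simulated sleeps. This is a minimal sketch that ignores the time spent outside Thread.sleep; MeanRateSketch is a hypothetical name.

```java
import java.util.Random;

// Stdlib-only estimate of the mean rate produced by the single-meter loop.
final class MeanRateSketch {
    // Simulate `events` iterations of mark() followed by sleep(random.nextInt(1000)),
    // summing the sleep durations instead of actually sleeping.
    static double eventsPerSecond(int events, long seed) {
        Random random = new Random(seed);
        long elapsedMillis = 0;
        for (int i = 0; i < events; i++) {
            elapsedMillis += random.nextInt(1000);
        }
        return events / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.printf("mean rate ~ %.2f events/second%n", eventsPerSecond(100_000, 7L));
    }
}
```

The expected sleep is 499.5 ms, so the rate settles at about 1000 / 499.5, which is just over 2 events per second.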

For the multi-meter test case, the output below shows the three meters for the last digits 6, 7, and 8. Each converges toward 10 events per second: with a 10 ms sleep there are about 100 events per second, spread evenly over the 10 possible last digits.

    MetersTest.request.tps.6
                 count = 905
             mean rate = 9.74 events/second
         1-minute rate = 9.76 events/second
         5-minute rate = 9.94 events/second
        15-minute rate = 9.98 events/second
    MetersTest.request.tps.7
                 count = 935
             mean rate = 10.07 events/second
         1-minute rate = 10.62 events/second
         5-minute rate = 11.82 events/second
        15-minute rate = 12.19 events/second
    MetersTest.request.tps.8
                 count = 937
             mean rate = 10.09 events/second
         1-minute rate = 10.09 events/second
         5-minute rate = 10.31 events/second
        15-minute rate = 10.37 events/second
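
The 1-, 5-, and 15-minute rates in these reports are exponentially weighted moving averages, similar to Unix load averages, so they react gradually to changes in load. The sketch below shows the idea using only the standard library. The 5-second tick and the smoothing factor 1 - exp(-interval / window) are assumptions chosen for illustration, and EwmaSketch is a hypothetical class, not the library's implementation.

```java
// Hypothetical, simplified exponentially weighted moving average of an event rate.
final class EwmaSketch {
    private final double alpha;            // smoothing factor per tick
    private final double intervalSeconds;  // length of one tick
    private double rate;                   // current average, in events per second
    private boolean initialized;

    EwmaSketch(double windowMinutes, double intervalSeconds) {
        this.intervalSeconds = intervalSeconds;
        this.alpha = 1 - Math.exp(-intervalSeconds / 60.0 / windowMinutes);
    }

    // Fold the number of events seen during one tick into the moving average.
    void tick(long eventsInInterval) {
        double instantRate = eventsInInterval / intervalSeconds;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate;            // the first tick seeds the average
            initialized = true;
        }
    }

    double rate() {
        return rate;
    }

    public static void main(String[] args) {
        EwmaSketch oneMinute = new EwmaSketch(1, 5);
        for (int i = 0; i < 12; i++) oneMinute.tick(10);   // one minute at 2 events/second
        for (int i = 0; i < 12; i++) oneMinute.tick(50);   // one minute at 10 events/second
        System.out.printf("1-minute rate after the burst: %.2f events/second%n", oneMinute.rate());
    }
}
```

Under a steady load the average stays at the true rate; after a change it closes about 63% of the gap per window, which is why longer windows take longer to catch up.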

Timer

A Timer is a combination of a Histogram and a Meter: the histogram records how long a piece of code or a call takes, and the meter tracks its TPS.

    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.Random;

    public class TestTimer {
        public static Random random = new Random();
        // getMetricAndStartAllReport is not listed in this article; it appears to be a variant
        // of getMetricAndStartReport that takes the Graphite host and the metric prefix.
        private static final MetricRegistry registry = MetricCommon.getMetricAndStartAllReport("nc110x.corp.youdao.com", "test.metrics");

        @Test
        public void testOneTimer() throws InterruptedException {
            Timer timer = registry.timer(MetricRegistry.name(TestTimer.class, "get-latency"));
            while (true) {
                Timer.Context ctx = timer.time();       // start timing
                Thread.sleep(random.nextInt(1000));
                ctx.stop();                             // record the elapsed time
            }
        }

        @Test
        public void testMultiTimer() throws InterruptedException {
            while (true) {
                int i = random.nextInt(100);
                int j = i % 10;                         // group by last digit
                Timer timer = registry.timer(MetricRegistry.name(TestTimer.class, "get-latency", String.valueOf(j)));
                Timer.Context ctx = timer.time();
                Thread.sleep(random.nextInt(1000));
                ctx.stop();
                Thread.sleep(1000);
            }
        }
    }

Test case 1 uses a single timer; its output follows. The sleeps are uniform between 0 and 999 ms, so the mean and median settle near 500 ms and the rate near 2 calls per second.

    -- Timers ----------------------------------------------------------------------
    com.testmetrics.TestTimer.get-latency
                 count = 657
             mean rate = 2.05 calls/second
         1-minute rate = 1.98 calls/second
         5-minute rate = 2.02 calls/second
        15-minute rate = 2.01 calls/second
                   min = 4.98 milliseconds
                   max = 998.93 milliseconds
                  mean = 496.79 milliseconds
                stddev = 297.46 milliseconds
                median = 501.02 milliseconds
                  75% <= 765.09 milliseconds
                  95% <= 952.03 milliseconds
                  98% <= 974.12 milliseconds
                  99% <= 989.02 milliseconds
                99.9% <= 998.93 milliseconds
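
The report has two halves, a call count with rates and a duration distribution, because a timer records both for every timed call. The stdlib-only sketch below models that pairing. TimerSketch is a hypothetical class with an injectable clock so that it can be tested; it is not the Metrics implementation and keeps every duration rather than a sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of a timer; not the Metrics implementation.
final class TimerSketch {
    private final LongSupplier nanoClock;                         // injectable nanosecond clock
    private final List<Long> durationsNanos = new ArrayList<>();  // histogram side: one entry per call
    private long count;                                           // meter side: number of calls

    TimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Run the task, then record its duration and count the call, even if the task throws.
    void time(Runnable task) {
        long start = nanoClock.getAsLong();
        try {
            task.run();
        } finally {
            durationsNanos.add(nanoClock.getAsLong() - start);
            count++;
        }
    }

    long count() {
        return count;
    }

    double meanMillis() {
        return durationsNanos.stream().mapToLong(Long::longValue).average().orElse(0) / 1_000_000.0;
    }

    // Calls per second over the given elapsed wall-clock time.
    double meanRate(double elapsedSeconds) {
        return count / elapsedSeconds;
    }

    public static void main(String[] args) {
        TimerSketch timer = new TimerSketch(System::nanoTime);
        long begin = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            timer.time(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        double elapsedSeconds = (System.nanoTime() - begin) / 1e9;
        System.out.printf("count = %d, mean = %.1f ms, mean rate = %.1f calls/second%n",
                timer.count(), timer.meanMillis(), timer.meanRate(elapsedSeconds));
    }
}
```

Recording in a finally block mirrors the time() and stop() pairing in the test cases: the call is counted and its duration stored no matter how the timed code exits.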

Counters

A Counter is simply a counter: conceptually, a gauge for an AtomicLong that can be incremented and decremented. The approach below is a more efficient way to track the size of a queue.

    import com.codahale.metrics.Counter;
    import com.codahale.metrics.MetricRegistry;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.Queue;
    import java.util.Random;
    import java.util.concurrent.LinkedBlockingQueue;

    public class CounterTest {
        private static final MetricRegistry registry = MetricCommon.getMetricAndStartReport();
        public static Queue<String> q = new LinkedBlockingQueue<String>();
        public static Counter pendingJobs;
        public static Random random = new Random();

        public static void addJob(String job) {
            q.offer(job);
            pendingJobs.inc();
        }

        public static String takeJob() {
            String job = q.poll();
            if (job != null) {
                pendingJobs.dec();   // only count down when a job was actually removed
            }
            return job;
        }

        @Test
        public void test() throws InterruptedException {
            pendingJobs = registry.counter(MetricRegistry.name(Queue.class, "pending-jobs", "size"));

            int num = 1;
            while (true) {
                Thread.sleep(200);
                if (random.nextDouble() > 0.7) {       // about 30% of iterations take a job
                    String job = takeJob();
                    System.out.println("take job : " + job);
                } else {                               // about 70% of iterations add a job
                    String job = "Job-" + num;
                    addJob(job);
                    System.out.println("add job : " + job);
                }
                num++;
            }
        }
    }

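The number of pending jobs keeps growing. Each iteration either adds one job or takes one job, and the odds of adding versus taking are 7:3, so the queue gains about 0.4 jobs per iteration on average. The variable num only numbers the jobs; it does not change how many are added.

    -- Counters --------------------------------------------------------------------
    java.util.Queue.pending-jobs.size
                 count = 36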
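
To see how fast the count grows, the loop can be simulated without a queue or any sleeping. This is a minimal sketch under two assumptions: each add inserts exactly one job, and a take on an empty queue leaves the count unchanged. PendingJobsSimulation is a hypothetical name.

```java
import java.util.Random;

// Stdlib-only simulation of the add/take loop from the counter test.
final class PendingJobsSimulation {
    // Returns the number of pending jobs after the given number of iterations.
    static long simulate(int iterations, long seed) {
        Random random = new Random(seed);
        long pending = 0;
        for (int i = 0; i < iterations; i++) {
            if (random.nextDouble() > 0.7) {
                if (pending > 0) {
                    pending--;      // take: only possible when a job is waiting
                }
            } else {
                pending++;          // add: exactly one job per iteration
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        long pending = simulate(iterations, 1L);
        System.out.printf("pending after %d iterations: %d (about %.2f per iteration)%n",
                iterations, pending, (double) pending / iterations);
    }
}
```

The per-iteration growth should come out close to 0.7 - 0.3 = 0.4, which at one iteration every 200 ms is about two extra pending jobs per second.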

Health Checks

Metrics provides a separate module, Health Checks, for checking whether the application, its submodules, or its dependencies are running normally. The module is independent of metrics-core; to use it, add the metrics-healthchecks package.

    import com.codahale.metrics.health.HealthCheck;
    import com.codahale.metrics.health.HealthCheckRegistry;
    import org.junit.Test;   // JUnit 4 is assumed here

    import java.util.Map;
    import java.util.Random;

    public class HealthChecksTest extends HealthCheck {

        @Override
        protected Result check() throws Exception {
            Random random = new Random();
            if (random.nextInt(10) != 9) {             // healthy 9 times out of 10
                return Result.healthy();
            } else {
                return Result.unhealthy("oh,unhealthy");
            }
        }

        @Test
        public void test() throws InterruptedException {
            HealthCheckRegistry registry = new HealthCheckRegistry();
            registry.register("check1", new HealthChecksTest());
            registry.register("check2", new HealthChecksTest());
            while (true) {
                // Run every registered check and print its result.
                for (Map.Entry<String, Result> entry : registry.runHealthChecks().entrySet()) {
                    if (entry.getValue().isHealthy()) {
                        System.out.println(entry.getKey() + ": OK, message:" + entry.getValue());
                    } else {
                        System.err.println(entry.getKey() + ": FAIL, error message: " + entry.getValue());
                    }
                }
                Thread.sleep(1000);
            }
        }
    }

Two health checks are registered. Their check() method is overridden to draw a random number from 0 to 9 and report healthy unless the number is 9. The output looks like this:

    check1: OK, message:Result{isHealthy=true}
    check2: FAIL, error message: Result{isHealthy=false, message=oh,unhealthy}
    check1: OK, message:Result{isHealthy=true}
    check2: OK, message:Result{isHealthy=true}
    check1: OK, message:Result{isHealthy=true}
    check2: OK, message:Result{isHealthy=true}
    check1: OK, message:Result{isHealthy=true}
    check2: OK, message:Result{isHealthy=true}
    check1: OK, message:Result{isHealthy=true}

Maven dependencies

  • metrics-core: required
  • metrics-healthchecks: add when using health checks
  • metrics-graphite: add when using Graphite
  • org.slf4j: without it, errors logged by the metrics-graphite package are not visible

    <properties>
        <metrics.version>3.1.0</metrics.version>
        <slf4j.version>1.7.22</slf4j.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.dropwizard.metrics</groupId>
            <artifactId>metrics-core</artifactId>
            <version>${metrics.version}</version>
        </dependency>
        <dependency>
            <groupId>io.dropwizard.metrics</groupId>
            <artifactId>metrics-healthchecks</artifactId>
            <version>${metrics.version}</version>
        </dependency>
        <dependency>
            <groupId>io.dropwizard.metrics</groupId>
            <artifactId>metrics-graphite</artifactId>
            <version>${metrics.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
    </dependencies>

References

http://metrics.dropwizard.io/3.1.0/getting-started/
http://www.cnblogs.com/nexiyi/p/metrics_sample_1.html
http://wuchong.me/blog/2015/08/01/getting-started-with-metrics/

Source: https://www.cnblogs.com/lijianming180/p/12259003.html
