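Monitoring development: writing custom munin plugins to monitor redis and mongodb
Getting the guys and gals on the monitoring team to do anything is really not easy! Who are they, you ask? They are gods. Don't bother them lightly, because asking gets you nowhere anyway.
Last time, Python threads would go to sleep and hang during long-running redis brpop calls, so the notification and alerting platform was taken offline. This time the problem is properly solved, and the platform is going back online. Alerts from both companies are already being thrown at my API.
Access control for zabbix and the cmdb is being reworked right now, so I can't log in for the moment. I wanted the monitoring team to set up redis and mongodb monitoring once that was finished, but they actually asked me to send an email and supply the scripts, the approach and so on... Thinking it over: if I have to go into zabbix and write the scripts myself anyway, I might as well gather all the monitoring on my own platform.
This time I didn't choose my beloved ganglia: too much hassle. I didn't use graphite either. I went with munin, which gives you a working performance monitoring page right after a yum install.
The official redis and mongodb monitoring plugins looked like a lot of trouble, so forget those. I went straight to reading how plugins are written. munin's source is written in perl, and many of the plugins are written in shell.
Writing one is quite simple: you only need to declare how the graph's Y and X axes are displayed, and then echo the value!
Below are the hot-spot usage of tokens in redis, the queue data, and the mongodb count data.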
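To make the "declare the graph, then echo" idea concrete, here is a minimal sketch of the same plugin contract in plain Python, without any helper library. The names `GRAPH_CONFIG` and `plugin_output` are made up for this illustration, and the graph settings are only example values modelled on the shell scripts in this post; treat it as a sketch, not as an official munin API.

```python
#!/usr/bin/env python3
"""Minimal munin plugin skeleton (illustrative sketch only)."""
import sys

# Example graph settings. The keys are the same munin attributes that the
# shell plugins in this post echo; the values are placeholders.
GRAPH_CONFIG = [
    ("graph_title", "redis mail"),
    ("graph_args", "--base 1000 -l 0"),
    ("graph_vlabel", "queue length"),
    ("graph_scale", "no"),
    ("graph_category", "system"),
    ("load.label", "load"),
]


def plugin_output(argv, read_value):
    """Return the lines a plugin should print for the given arguments."""
    if argv and argv[0] == "autoconf":
        return ["yes"]
    if argv and argv[0] == "config":
        # Describe the graph: title, axis label, field legend.
        return ["%s %s" % (key, value) for key, value in GRAPH_CONFIG]
    # No argument (or an unknown one): report the current value.
    return ["load.value %s" % read_value()]


if __name__ == "__main__":
    # Placeholder reader that always reports 0; swap in a real measurement.
    print("\n".join(plugin_output(sys.argv[1:], lambda: 0)))
```

Keeping the measurement behind a `read_value` callable means the config and value branches can be checked without a live redis or mongodb.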
Original: http://rfyiamcool.blog.51cto.com/1030776/1426130
Script location: /etc/munin/plugins
The script for monitoring mongodb:
#!/bin/bash
# xiaorui.cc

if [ "$1" = "autoconf" ]; then
    echo yes
    exit 0
fi

if [ "$1" = "config" ]; then
    echo 'graph_title mongodb count mail'
    echo 'graph_args --base 1000 -l 0'
    echo 'graph_vlabel mail queue'
    echo 'graph_scale no'
    echo 'graph_category system'
    echo 'load.label load'
    echo 'graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").'
    echo 'load.info 5 minute load average'
    exit 0
fi

# Print the field name, then the count (the last line of the mongo shell output).
echo -n "load.value "
mongo reportops --eval "db.reportops_log_mail.count()" | tail -n1
The script for monitoring the redis queue:
#!/bin/bash
# xiaorui.cc

# print_warning and print_critical come from munin's shell helper library.
. "$MUNIN_LIBDIR/plugins/plugin.sh"

if [ "$1" = "config" ]; then
    # The host name this plugin is for. (Can be overridden to have
    # one machine answer for several)

    # The title of the graph
    echo 'graph_title redis mail'
    # Arguments to "rrdtool graph". In this case, tell it that the
    # lower limit of the graph is '0', and that 1k=1000 (not 1024)
    echo 'graph_args --base 1000 -l 0'
    # The Y-axis label
    echo 'graph_vlabel load'
    # We want Cur/Min/Avg/Max unscaled (i.e. 0.42 load instead of
    # 420 milliload)
    echo 'graph_scale no'
    # Graph category. Defaults to 'other'
    echo 'graph_category system'
    # The fields. "label" is used in the legend. "label" is the only
    # required subfield.
    echo 'load.label load'
    # These two read the environment for warning values for the field
    # "load". If "load_warning" or "warning" aren't set in the
    # environment, no warning levels are set. Likewise for "load_critical"
    # and "critical".
    print_warning load
    print_critical load
    # This one is purely to add an explanation to the web page. The first
    # one is for the graph itself, while the second one is for the field
    # "load".
    echo 'graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").'
    echo 'load.info 5 minute load average'

    # Last, if run with the "config"-parameter, quit here (don't
    # display any data)
    exit 0
fi

# If not run with any parameters at all (or only unknown ones), do the
# real work - i.e. display the data. Almost always this will be
# "value" subfield for every data field.
echo -n "load.value "
redis-cli LLEN sendmaillist | cut -d ' ' -f2
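The comments around `print_warning` and `print_critical` describe a lookup order: a field-specific environment variable such as `load_warning` first, then the generic `warning`, and no output at all if neither is set. A rough Python sketch of that lookup follows. It is based only on the behaviour those comments describe, and `threshold_lines` is a hypothetical helper name, not part of munin.

```python
import os


def threshold_lines(field, environ=None):
    """Return config lines such as 'load.warning 10' for thresholds in environ."""
    env = os.environ if environ is None else environ
    lines = []
    for level in ("warning", "critical"):
        # Field-specific setting first (e.g. load_warning), then the generic one.
        value = env.get("%s_%s" % (field, level)) or env.get(level)
        if value:
            lines.append("%s.%s %s" % (field, level, value))
    return lines


if __name__ == "__main__":
    # Prints nothing unless thresholds are present in the environment.
    for line in threshold_lines("load"):
        print(line)
```

Passing the environment in as a dictionary keeps the function easy to try out by hand before wiring it into a plugin's config branch.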
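Once the script is written, run /etc/init.d/munin-node restart and that's it. Wait a little, refresh the page, and the graph shows up.
The key is those last two lines: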
echo -n "load.value "
redis-cli LLEN sendmaillist | cut -d ' ' -f2
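For anyone who would rather produce that value line from Python, here is a hedged sketch. redis-cli may print either a bare number or a decorated form like `(integer) 5` depending on how it is invoked, so the parser accepts both. `parse_llen`, `queue_length` and `value_line` are names invented for this example; only the `redis-cli LLEN sendmaillist` command comes from the script above.

```python
import re
import subprocess


def parse_llen(raw):
    """Extract the integer from LLEN output: either '5' or '(integer) 5'."""
    match = re.search(r"(\d+)\s*$", raw.strip())
    if match is None:
        raise ValueError("unexpected redis-cli output: %r" % raw)
    return int(match.group(1))


def queue_length(key="sendmaillist", run=subprocess.check_output):
    """Ask redis-cli for the list length; `run` is injectable for testing."""
    raw = run(["redis-cli", "LLEN", key])
    if isinstance(raw, bytes):
        raw = raw.decode()
    return parse_llen(raw)


def value_line(field, value):
    """Format the single line munin expects when fetching data."""
    return "%s.value %d" % (field, value)
```

Calling `print(value_line("load", queue_length()))` at the end of a plugin would then stand in for the two shell lines, assuming redis-cli is on the PATH of the user that munin-node runs plugins as.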
Someone has written a Python module for working with munin; anyone interested can give it a try.
https://github.com/samuel/python-munin
#!/usr/bin/env python

import os
from munin import MuninPlugin

class LoadAVGPlugin(MuninPlugin):
    title = "Load average"
    args = "--base 1000 -l 0"
    vlabel = "load"
    scale = False
    category = "system"

    @property
    def fields(self):
        warning = os.environ.get('load_warn', 10)
        critical = os.environ.get('load_crit', 120)
        return [("load", dict(
                label = "load",
                info = 'The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").',
                type = "GAUGE",
                min = "0",
                warning = str(warning),
                critical = str(critical)))]

    def execute(self):
        if os.path.exists("/proc/loadavg"):
            loadavg = open("/proc/loadavg", "r").read().strip().split(' ')
        else:
            from subprocess import Popen, PIPE
            output = Popen(["uptime"], stdout=PIPE).communicate()[0]
            loadavg = output.rsplit(':', 1)[1].strip().split(' ')[:3]
        return dict(load=loadavg[1])

if __name__ == "__main__":
    LoadAVGPlugin().run()
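To sum up: munin really is simple, and that simplicity also means it only suits situations like mine, where an ops developer needs some quick, temporary statistics. I remember using munin before to collect zeromq statistics, and it had no trouble with more than a few dozen machines. Of course that goes without saying: if a few dozen machines were already a problem, the monitoring would be truly awful. The limitations of this tool really are considerable. It is fine for drawing temporary graphs, and that's about it.
This article comes from the blog “峰云,就她了。”; please do not repost!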