
Routine operations on Mesos hosts

Cleaning up logs


Because the Mesos platform's log files are backed up in the logging system, large log files on the host can be cleaned up on a schedule.

find /data/docker/containers -type f -name "*.log" -size +500M   Files under /data/docker/containers that end in .log and are larger than 500M can be cleaned up.

find /data/logbak/passport/10.5.0/stress/base -type f -name "passport.log.*" -size +500M   Files under /data/logbak/passport/10.5.0/stress/base whose names start with passport.log. and that are larger than 500M can be cleaned up.
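Before emptying anything, it can help to preview which files the two find expressions above would match. The following is a minimal sketch rather than part of the original procedure: `list_large_logs` is a hypothetical helper name, and it assumes GNU find (for `-printf`), which is an assumption about the host.

```shell
#!/bin/bash
# list_large_logs DIR PATTERN [SIZE]
# Hypothetical helper: prints "<bytes><TAB><path>" for every regular file under
# DIR whose name matches PATTERN and whose size exceeds SIZE (default +500M),
# largest first. It only lists files; nothing is modified.
# Assumes GNU find, because -printf is not available in every find.
list_large_logs() {
    local dir=$1 pattern=$2 size=${3:-+500M}
    find "$dir" -type f -name "$pattern" -size "$size" -printf '%s\t%p\n' | sort -rn
}

# Example calls, using the directories named in the text:
#   list_large_logs /data/docker/containers "*.log"
#   list_large_logs /data/logbak/passport/10.5.0/stress/base "passport.log.*"
```

Quoting the pattern matters here: an unquoted `passport.log.*` would be expanded by the shell if a matching file happens to exist in the current directory.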

 

Cleanup script (covers /data/docker/containers, /data/logbak/, and /data/log/)

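#!/bin/bash
source /etc/profile &>/dev/null

# Empty every file under $1 whose name matches $2 and whose size exceeds 500M.
# Files are truncated rather than deleted, so processes that still hold them
# open keep writing to a valid file. Paths are NUL-delimited and quoted so that
# names containing spaces are handled correctly.
truncate_large_logs() {
    find "$1" -type f -name "$2" -size +500M -print0 |
    while IFS= read -r -d '' f; do
        : > "$f"
    done
}

truncate_large_logs /data/docker/containers "*.log"
truncate_large_logs /data/logbak/ "*.log.*"
truncate_large_logs /data/log/ "*.log"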



crontab -e

59 23 * * *  bash /root/mesosclean.sh

Runs the script every night at 23:59. It is invoked with bash because the script relies on bash features.



What to do when disk usage reaches 100%



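Note: find the files that take up the most space and determine whether they were produced by an application or by the platform. For files produced by an application, contact the application owner to confirm whether they can be removed. Files produced by the platform under the data directory can be removed directly.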
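One way to find the largest files is to let find print each file's size and sort the result. This is only a sketch under assumptions: `largest_files` is a hypothetical helper name, GNU find is assumed for `-printf`, and `/data` in the example call stands for whichever mount point is actually full.

```shell
#!/bin/bash
# largest_files DIR [N]
# Hypothetical helper: prints the N largest regular files under DIR (default 20)
# as "<bytes><TAB><path>", largest first. -xdev keeps find on the filesystem
# that DIR lives on, so other mounts are not scanned by accident.
# Assumes GNU find (-printf). Permission errors are silenced.
largest_files() {
    local dir=$1 n=${2:-20}
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "$n"
}

# Example calls:
#   df -h /data            # confirm which filesystem is full
#   largest_files /data 10 # then list its ten biggest files
```

The path in each result usually shows whether a file belongs to an application or to the platform, for example container logs under /data/docker/containers.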

Log cleanup script template:

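#!/bin/bash
source /etc/profile &>/dev/null
# Empty (rather than delete) each oversized container log. Adjust the path and
# name pattern as needed. Paths are NUL-delimited and quoted to survive spaces.
find /data/docker/containers -type f -name "*.log" -size +500M -print0 |
while IFS= read -r -d '' f; do
    : > "$f"
done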

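When disk usage has reached 100%, freeing space is not enough: you also need to check whether the slave process is still running, because the slave exits automatically once the disk is full. The process listens on port 5051.

netstat -ntlp   Check whether port 5051 is listening.

systemctl status mesos-slave   Check the slave's running state. Output like the following means it is healthy.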

● mesos-slave.service - Mesos Slave
Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-02-20 10:14:50 CST; 25min ago
Main PID: 12006 (mesos-slave)
Memory: 23.8M
CGroup: /system.slice/mesos-slave.service
├─12006 /usr/sbin/mesos-slave --master=zk://10.117.8.138:2181,10.168.152.122:2181,10.117.8.133:2181/mesos --log_dir=/var/log/mesos --attributes=environment:common;pool:common --containerizers=docker,mesos --executor_registration_timeout=5mins --hostname=10...
├─12024 logger -p user.info -t mesos-slave[12006]
└─12025 logger -p user.err -t mesos-slave[12006]

Feb 20 10:39:26 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:26.238210 12039 status_update_manager.cpp:282] Closing status update streams for framework 4aa27d1e-23e5-4559-ad75-6ff11078384e-0000
Feb 20 10:39:26 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:26.238214 12035 gc.cpp:55] Scheduling '/data/mesos-slave/meta/slaves/a73dd71d-8661-4f8b-8a0b-27ab3b41fd0e-S136/frameworks/4aa27d1e-23e5-4559-ad75-6ff11078384e-0000/executors/qa_base_nebula_c...
Feb 20 10:39:26 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:26.238400 12035 gc.cpp:55] Scheduling '/data/mesos-slave/meta/slaves/a73dd71d-8661-4f8b-8a0b-27ab3b41fd0e-S136/frameworks/4aa27d1e-23e5-4559-ad75-6ff11078384e-0000/executors/qa_base_nebula_c...
Feb 20 10:39:26 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:26.238453 12035 gc.cpp:55] Scheduling '/data/mesos-slave/slaves/a73dd71d-8661-4f8b-8a0b-27ab3b41fd0e-S136/frameworks/4aa27d1e-23e5-4559-ad75-6ff11078384e-0000' for gc 6.999...days in the future
Feb 20 10:39:26 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:26.238502 12035 gc.cpp:55] Scheduling '/data/mesos-slave/meta/slaves/a73dd71d-8661-4f8b-8a0b-27ab3b41fd0e-S136/frameworks/4aa27d1e-23e5-4559-ad75-6ff11078384e-0000' for gc ...days in the future
Feb 20 10:39:50 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:39:50.654115 12026 slave.cpp:4374] Current disk usage 66.34%. Max allowed age: 1.655855054554132days
Feb 20 10:40:04 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:40:04.569077 12036 slave.cpp:4282] Framework 4aa27d1e-23e5-4559-ad75-6ff11078384e-0000 seems to have exited. Ignoring registration timeout for executor 'qa_base_uc_contract_v1...96ee-02422c99a708'
Feb 20 10:40:05 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:40:05.225116 12040 slave.cpp:4282] Framework 4aa27d1e-23e5-4559-ad75-6ff11078384e-0000 seems to have exited. Ignoring registration timeout for executor 'qa_base_nebula_custome...96ee-02422c99a708'
Feb 20 10:40:19 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:40:19.237442 12030 slave.cpp:4282] Framework 4aa27d1e-23e5-4559-ad75-6ff11078384e-0000 seems to have exited. Ignoring registration timeout for executor 'qa_base_nebula_custome...96ee-02422c99a708'
Feb 20 10:40:24 jst-nebula-executor-stage-10 mesos-slave[12025]: I0220 10:40:24.275003 12031 slave.cpp:4282] Framework 4aa27d1e-23e5-4559-ad75-6ff11078384e-0000 seems to have exited. Ignoring registration timeout for executor 'qa_base_uc_contract_v1...96ee-02422c99a708'

systemctl start mesos-slave   Start the slave process.

journalctl -f -u mesos-slave   Check whether the slave started normally. If no "error" message appears, it is running normally.
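The checks above can be combined into a small script that runs after space has been freed. This is a sketch, not the original author's tooling: `port_listening` and `ensure_unit_running` are hypothetical helper names, and only commands already used in this document (`netstat`, `systemctl`) are assumed to be installed.

```shell
#!/bin/bash
# port_listening PORT
# Returns 0 if some TCP socket is listening on PORT, judged by the local
# address column of `netstat -ntl` ending in ":PORT".
port_listening() {
    netstat -ntl 2>/dev/null | awk -v suffix=":$1" '
        $1 ~ /^tcp/ && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }'
}

# ensure_unit_running UNIT
# Starts the systemd unit only if it is not already active.
ensure_unit_running() {
    local unit=$1
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is already running"
    else
        echo "$unit is not running, starting it"
        systemctl start "$unit"
    fi
}

# Example calls, using the unit name and port from the text:
#   ensure_unit_running mesos-slave
#   port_listening 5051 && echo "slave is listening on 5051"
```

After a start, the port may take a few seconds to appear, so a failed `port_listening` check right after `systemctl start` is not conclusive on its own. Following the journal remains the more reliable confirmation.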

 

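Before starting the slave, first remove the tasks that are currently being deployed. The session below shows the slave failing to recover, the running containers being removed, and the slave being restarted after the checkpoint link is deleted.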

 

[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# journalctl -f -u mesos-slave
-- Logs begin at Fri 2017-03-24 10:16:31 CST. --
Apr 17 11:15:15 dev-slave-05 mesos-slave[32081]: I0417 11:15:15.928283 32090 detector.cpp:479] A new leading master (UPID=master@172.18.21.64:5050) is detected
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: Failed to perform recovery: Collect failed: Docker ps batch failed Collect failed: Failed to 'docker -H unix:///var/run/docker.sock inspect mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.c92163d8-89c5-4750-b7f9-7053c68413bf': exit status = exited with status 1 stderr = Error response from daemon: devmapper: Unknown device 2079885cb029c078890ad0369c40e4eda57dec57537979526f285792ee1f4608
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: 
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: To remedy this do as follows:
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: Step 1: rm -f /data/mesos-slave-new2/meta/slaves/latest
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: This ensures slave doesn't recover old live executors.
Apr 17 11:15:16 dev-slave-05 mesos-slave[32081]: Step 2: Restart the slave.
Apr 17 11:15:16 dev-slave-05 systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 17 11:15:16 dev-slave-05 systemd[1]: Unit mesos-slave.service entered failed state.
Apr 17 11:15:16 dev-slave-05 systemd[1]: mesos-slave.service failed.
^C
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9c4ba674bc5b hub-dev.fenxibao.com/loyalty-engine/loyalty2-calc:1.0-1.0.0-d20170221160618 "/bin/sh -c 'java -se" 12 hours ago Up 12 hours mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.da5bd892-a89d-448a-8c3c-c1138e40d87b
f5f01bd0ba30 hub-dev.fenxibao.com/uc/tenant:v2-2.3.0-d20170301144513 "java -Xms512m -Xmx10" 12 hours ago Up 12 hours mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.61493e5e-d7aa-4222-b47d-5e377a10f96a
dee7e32eee0a hub-dev.fenxibao.com/ebm/ea-process:1.0-1.1.0-d20170416081400 "/bin/sh -c 'java -se" 26 hours ago Up 26 hours mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.d689740b-01f5-4b5a-9260-2c1054117a76
0bb19e00004b hub-dev.fenxibao.com/camp/web-node:v1-1.4.0-d20170227104043 "java -Xms750m -Xmx75" 2 days ago Up 2 days mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.229be74b-202b-4550-9ea1-c7b68e5e16cd
def0bcc8d9a5 hub-dev.fenxibao.com/sp/epassport:0.0.1-SNAPSHOT-20170414155528 "java -jar -XX:MaxMet" 2 days ago Up 2 days 0.0.0.0:31252->8080/tcp mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.068d48f8-d160-4db4-b915-fecc67f02a6f
f47708684e8e hub.fenxibao.com/nebula/registrator:v10-dev "/bin/registrator -re" 3 days ago Up 3 days my-registrator
e3925f665573 hub-dev.fenxibao.com/test/tenant:v2-2.3.0-m20170413170137 "java -Xms512m -Xmx10" 3 days ago Up 3 days 0.0.0.0:31797->6300/tcp, 0.0.0.0:31798->8080/tcp mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.65440902-25a8-438e-b82b-6e5d09a9e1bf
b18391ee3ed7 hub-dev.fenxibao.com/sp/passport:9.0-9.2.2-h20170302170256 "java -jar -Xms2560m " 3 days ago Up 3 days 0.0.0.0:31249->8080/tcp mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.e99ffa2a-9596-4ee9-8d5d-11f1686a69ff
9c4232898e69 hub.fenxibao.com/calico/node-libnetwork:v0.9.0 "./start.sh" 5 days ago Up 5 days calico-libnetwork
401910688c83 hub.fenxibao.com/calico/node:v0.22.0 "/sbin/start_runit" 5 days ago Up 5 days calico-node
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps |awk '{print $!}'
awk: cmd. line:1: {print $!}
awk: cmd. line:1: ^ syntax error
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps |awk '{print $1}'
CONTAINER
9c4ba674bc5b
f5f01bd0ba30
dee7e32eee0a
0bb19e00004b
def0bcc8d9a5
f47708684e8e
e3925f665573
b18391ee3ed7
9c4232898e69
401910688c83
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps |awk '{print $1}' |grep -v "CON"
9c4ba674bc5b
f5f01bd0ba30
dee7e32eee0a
0bb19e00004b
def0bcc8d9a5
f47708684e8e
e3925f665573
b18391ee3ed7
9c4232898e69
401910688c83
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# for i in `docker ps |awk '{print $1}' |grep -v "CON"`
> docker rm -f $i
-bash: syntax error near unexpected token `docker'
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# for i in `docker ps |awk '{print $1}' |grep -v "CON"`; docker rm -f $i;done
-bash: syntax error near unexpected token `docker'
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# for i in `docker ps |awk '{print $1}' |grep -v "CON"`
> do
> docker rm -f $i
> done
9c4ba674bc5b
f5f01bd0ba30
dee7e32eee0a
0bb19e00004b
def0bcc8d9a5
f47708684e8e
e3925f665573
b18391ee3ed7
9c4232898e69
401910688c83
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
689c10762169 hub.fenxibao.com/calico/node-libnetwork:v0.9.0 "./start.sh" 6 seconds ago Up 1 seconds calico-libnetwork
354009e615c6 hub.fenxibao.com/calico/node:v0.22.0 "/sbin/start_runit" 16 seconds ago Up 12 seconds calico-node
544fc88ded14 hub.fenxibao.com/nebula/registrator:v10-dev "/bin/registrator -re" 25 seconds ago Up 20 seconds my-registrator
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
689c10762169 hub.fenxibao.com/calico/node-libnetwork:v0.9.0 "./start.sh" 14 seconds ago Up 8 seconds calico-libnetwork
354009e615c6 hub.fenxibao.com/calico/node:v0.22.0 "/sbin/start_runit" 24 seconds ago Up 19 seconds calico-node
544fc88ded14 hub.fenxibao.com/nebula/registrator:v10-dev "/bin/registrator -re" 33 seconds ago Up 27 seconds my-registrator
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# journalctl -f -u mesos-slave
-- Logs begin at Fri 2017-03-24 10:16:31 CST. --
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: I0417 11:17:41.755380 1962 detector.cpp:479] A new leading master (UPID=master@172.18.21.64:5050) is detected
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: Failed to perform recovery: Collect failed: Docker ps batch failed Collect failed: Failed to 'docker -H unix:///var/run/docker.sock inspect mesos-e64cc98e-2c5c-4b9a-a87e-63af4467c447-S32.c92163d8-89c5-4750-b7f9-7053c68413bf': exit status = exited with status 1 stderr = Error response from daemon: devmapper: Unknown device 2079885cb029c078890ad0369c40e4eda57dec57537979526f285792ee1f4608
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: 
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: To remedy this do as follows:
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: Step 1: rm -f /data/mesos-slave-new2/meta/slaves/latest
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: This ensures slave doesn't recover old live executors.
Apr 17 11:17:41 dev-slave-05 mesos-slave[1961]: Step 2: Restart the slave.
Apr 17 11:17:41 dev-slave-05 systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 17 11:17:42 dev-slave-05 systemd[1]: Unit mesos-slave.service entered failed state.
Apr 17 11:17:42 dev-slave-05 systemd[1]: mesos-slave.service failed.
^C
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# rm -f /data/mesos-slave-new2/meta/slaves/latest
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# systemctl restart mesos-slave
[root@dev-slave-05 dev_base_uc_tenant_v2_tenant-default.b8876cd1-1ff9-11e7-bd55-4e9572e6108d]# journalctl -f -u mesos-slave
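The recovery steps from the session above can be sketched as two small functions. This is an illustration under assumptions, not the procedure exactly as it was run: the function names are hypothetical, and the sketch deliberately removes only containers whose name contains `mesos-` (using the `docker ps` name filter), whereas the session removed every running container, including calico-node and my-registrator. The slave work directory is passed as an argument because it differs between hosts (the log above reports /data/mesos-slave-new2).

```shell
#!/bin/bash
# remove_mesos_containers
# Force-removes running containers whose name contains "mesos-", which is how
# the task containers are named in the `docker ps` output above.
remove_mesos_containers() {
    local ids
    ids=$(docker ps -q --filter "name=mesos-")
    if [ -z "$ids" ]; then
        echo "no mesos task containers running"
        return 0
    fi
    # Word splitting is intended: container IDs are hex strings without spaces.
    docker rm -f $ids
}

# reset_slave_checkpoint WORK_DIR
# Follows the remedy printed by the slave itself: delete the meta/slaves/latest
# link so that old executors are not recovered, then restart the service.
reset_slave_checkpoint() {
    local work_dir=$1
    rm -f "$work_dir/meta/slaves/latest"
    systemctl restart mesos-slave
}

# Example calls, with the work directory taken from the log above:
#   remove_mesos_containers
#   reset_slave_checkpoint /data/mesos-slave-new2
#   journalctl -f -u mesos-slave
```

Removing the `latest` link makes the slave register as a new agent, so tasks that were running on the host are rescheduled rather than recovered. That is why the text asks for in-flight tasks to be removed first.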

