
bigdata_hadoop_namenode: analysis and fix for a manual restart error

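Symptom: large parts of the cluster were failing and could not be started through ambari. The services were checked one by one, in the order hdfs -> mapreduce -> yarn -> hive -> others.

Under hdfs, neither the namenode nor the datanodes would start.

The namenode reported the following error: 【namenode.NameNode: Failed to start namenode. java.io.IOException: Gap in tra】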

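Solution:

  step1: Run /usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs namenode in the foreground so that the full error is printed to the console.

  step2: Format the namenode: /usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs namenode -format
         Formatting discards the existing namespace metadata and generates a new clusterID, which is why step3 is needed.

  step3: Compare the clusterID in current/VERSION under the master's namenode directory with the clusterID in current/VERSION under each datanode's data directory (on every machine), then manually edit the datanode copies so they match the namenode.

      clusterID example: CID-e341356d-7657-48eb-b22e-3ab1f6771cd1

      /mnt/hadoop/hdfs/namenode/current/VERSION

      /mnt/hadoop/hdfs/data/current/VERSION

  step4: Manually restart the namenode and the datanodes from ambari.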
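
The clusterID comparison in step3 can be sketched as a small bash helper. This is only an illustration: the function names (get_cluster_id, set_cluster_id, sync_cluster_id) are hypothetical, and it assumes the VERSION file stores the value on a line of the form clusterID=CID-... and that the namenode's VERSION file has been copied to, or is readable from, the datanode host.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for this example.
# Assumes VERSION is a properties-style file with a "clusterID=..." line.

# Print the clusterID stored in a VERSION file.
get_cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Rewrite the clusterID line of a VERSION file, keeping a .bak copy.
set_cluster_id() {
    local file="$1" id="$2"
    cp "$file" "$file.bak"
    sed "s/^clusterID=.*/clusterID=$id/" "$file.bak" > "$file"
}

# sync_cluster_id NAMENODE_VERSION DATANODE_VERSION
# Copies the namenode's clusterID into the datanode file if they differ.
sync_cluster_id() {
    local nn_id dn_id
    nn_id=$(get_cluster_id "$1")
    dn_id=$(get_cluster_id "$2")
    if [ -z "$nn_id" ]; then
        echo "no clusterID found in $1" >&2
        return 1
    fi
    if [ "$nn_id" = "$dn_id" ]; then
        echo "clusterID already matches: $nn_id"
    else
        set_cluster_id "$2" "$nn_id"
        echo "updated $2: $dn_id -> $nn_id"
    fi
}

# Example call with the paths from this post (run on each datanode host):
#   sync_cluster_id /mnt/hadoop/hdfs/namenode/current/VERSION \
#                   /mnt/hadoop/hdfs/data/current/VERSION
```

The helper leaves a VERSION.bak next to the edited file, so the previous clusterID can be restored if the datanode still refuses to register.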

 

----------------divider---------

Common commands and manual restarts

【Set the Hive execution engine】

  set hive.execution.engine=tez;

【Hive debug mode】

   hive --hiveconf hive.root.logger=DEBUG,console

【Kill an application on yarn】

 yarn  application -kill application_1478856791630_0002
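
When several applications have to be killed, the IDs can be pulled out of the yarn application -list output instead of being copied by hand. The following is a rough sketch: extract_app_ids and kill_matching_apps are hypothetical helper names, and the matching is a plain grep on the listing lines, so check the pattern before running it on a busy cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for this example.

# Print every YARN application ID found on stdin, one per line, deduplicated.
extract_app_ids() {
    grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# kill_matching_apps PATTERN
# Kills every RUNNING application whose listing line matches PATTERN.
kill_matching_apps() {
    local pattern="$1"
    yarn application -list -appStates RUNNING 2>/dev/null \
        | grep -- "$pattern" \
        | extract_app_ids \
        | while read -r app_id; do
              yarn application -kill "$app_id"
          done
}

# Example: kill_matching_apps 'my_etl_job'
```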

 

【resourcemanager manual start/stop】

/usr/hdp/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh  stop resourcemanager

/usr/hdp/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh  start resourcemanager

【nodemanager manual start/stop】

/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh  stop nodemanager

/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh  start nodemanager

【mapreduce jobhistory server (historyserver) restart】

/usr/hdp/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh stop historyserver

/usr/hdp/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh start historyserver
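
The stop/start pairs above all follow the same "SCRIPT stop NAME, then SCRIPT start NAME" pattern, so they can be wrapped in one helper. This is a minimal sketch; restart_daemon is a hypothetical name and the default 5-second pause is an arbitrary assumption, not a documented requirement.

```shell
#!/usr/bin/env bash
# Sketch only: restart_daemon is a made-up helper name.

# restart_daemon SCRIPT DAEMON [WAIT_SECONDS]
# Runs "SCRIPT stop DAEMON", waits, then runs "SCRIPT start DAEMON".
restart_daemon() {
    local script="$1" daemon="$2" wait="${3:-5}"
    # A failing stop usually just means the daemon was not running.
    "$script" stop "$daemon" \
        || echo "stop $daemon returned non-zero (daemon may not have been running)" >&2
    sleep "$wait"
    "$script" start "$daemon"
}

# Examples with the paths from this post:
#   restart_daemon /usr/hdp/current/hadoop-yarn-resourcemanager/sbin/yarn-daemon.sh resourcemanager
#   restart_daemon /usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh nodemanager
#   restart_daemon /usr/hdp/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh historyserver
```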

【yarn HA state switching】

yarn rmadmin -getServiceState rm1

yarn rmadmin -transitionToStandby rm1 --forcemanual

yarn rmadmin -transitionToActive rm2 --forcemanual  
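
The three rmadmin commands above can be chained so that the swap only happens when the first ResourceManager really is active. The sketch below is an assumption-laden illustration: swap_rm_roles and the YARN_CMD variable are hypothetical names, it assumes -getServiceState prints the word active for the active node, and --forcemanual may still ask for interactive confirmation.

```shell
#!/usr/bin/env bash
# Sketch only: swap_rm_roles and YARN_CMD are made-up names for this example.
# YARN_CMD lets the yarn binary be overridden, e.g. for a dry run.
YARN_CMD="${YARN_CMD:-yarn}"

# swap_rm_roles FROM_ID TO_ID
# If FROM_ID is active, move it to standby and make TO_ID active.
swap_rm_roles() {
    local from="$1" to="$2" state
    state=$("$YARN_CMD" rmadmin -getServiceState "$from")
    if [ "$state" != "active" ]; then
        echo "$from is '$state', not active; nothing to do" >&2
        return 1
    fi
    "$YARN_CMD" rmadmin -transitionToStandby "$from" --forcemanual &&
        "$YARN_CMD" rmadmin -transitionToActive "$to" --forcemanual
}

# Example: swap_rm_roles rm1 rm2
```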

【zookeeper manual start/stop】

/usr/hdp/current/zookeeper-server/bin/zkServer.sh stop

/usr/hdp/current/zookeeper-server/bin/zkServer.sh start

【namenode manual start/stop】 (the command below runs in the foreground; stop it with Ctrl+C)

/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs namenode

 

【datanode manual start/stop】 (the command below runs in the foreground; stop it with Ctrl+C)

/usr/hdp/current/hadoop-hdfs-datanode/bin/hdfs datanode
