首页 > 代码库 > 20140709 datanode bug
20140709 datanode bug
hadoop2分布式安装后总是报这个bug
2014-07-06 08:22:40,506 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to hadoop3/192.168.56.203:9000java.io.IOException: Cluster IDs not matched: dn cid=CID-dfdd94b3-ce12-4465-ab52-6a02b7e7dba1 but ns cid=CID-48bee819-05bb-4ad4-9f25-1aa903fbfcb7; bpid=BP-500128720-192.168.56.202-1404563273292 at org.apache.hadoop.hdfs.server.datanode.DataNode.setClusterId(DataNode.java:291) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:896) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,507 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to hadoop2/192.168.56.202:9000java.io.IOException: Cluster IDs not matched: dn cid=CID-dfdd94b3-ce12-4465-ab52-6a02b7e7dba1 but ns cid=CID-48bee819-05bb-4ad4-9f25-1aa903fbfcb7; bpid=BP-500128720-192.168.56.202-1404563273292 at org.apache.hadoop.hdfs.server.datanode.DataNode.setClusterId(DataNode.java:291) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:896) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,517 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hadoop2/192.168.56.202:90002014-07-06 08:22:40,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hadoop3/192.168.56.203:90002014-07-06 08:22:40,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NNjava.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,518 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)2014-07-06 08:22:40,518 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NNjava.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:861) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,529 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -562014-07-06 08:22:40,632 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /usr/local/hadoop/tmp/dfs/data/in_use.lock acquired by nodename 2624@hadoop22014-07-06 08:22:40,680 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to hadoop0/192.168.56.200:9000java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-dfdd94b3-ce12-4465-ab52-6a02b7e7dba1; datanode clusterID = CID-48bee819-05bb-4ad4-9f25-1aa903fbfcb7 at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:472) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:225) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,681 INFO org.apache.hadoop.hdfs.server.common.Storage: Data-node version: -55 and name-node layout version: -562014-07-06 08:22:40,689 ERROR org.apache.hadoop.hdfs.server.common.Storage: It appears that another namenode 2624@hadoop2 has already locked the storage directory2014-07-06 08:22:40,691 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /usr/local/hadoop/tmp/dfs/data. The directory is already locked2014-07-06 08:22:40,691 WARN org.apache.hadoop.hdfs.server.common.Storage: Ignoring storage directory /usr/local/hadoop/tmp/dfs/data due to an exceptionjava.io.IOException: Cannot lock storage /usr/local/hadoop/tmp/dfs/data. The directory is already locked at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:674) at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:493) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:186) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,691 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hadoop0/192.168.56.200:90002014-07-06 08:22:40,691 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (Datanode Uuid unassigned) service to hadoop1/192.168.56.201:9000java.io.IOException: All specified directories are not accessible or do not exist. at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:217) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:249) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:929) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:900) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:274) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:815) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,692 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to hadoop1/192.168.56.201:90002014-07-06 08:22:40,818 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NNjava.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.remove(BlockPoolManager.java:91) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:859) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:40,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)2014-07-06 08:22:40,818 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NNjava.lang.Exception: trace at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143) at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:861) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:619) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:837) at java.lang.Thread.run(Thread.java:662)2014-07-06 08:22:42,820 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode2014-07-06 08:22:42,830 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 02014-07-06 08:22:42,843 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
解决方法:
dataNode 无法启动是配置过程中最常见的问题,主要原因是多次format namenode 造成namenode 和datanode的clusterID不一致。建议查看datanode上面的log信息。解决办法:修改每一个datanode上面的CID(位于dfs/data/current/VERSION文件夹中)使两者一致。
另:
1. clusterID不一致,namenode的cid和datanode的cid不一致,导致的原因是对namenode进行format的之后,datanode不会进行format,所以datanode里面的cid还是和format之前namenode的cid一样,解决办法是删除datanode里面的dfs.datanode.data.dir目录和tmp目录,然后再启动start-dfs.sh
2.即使删除iptables之后,仍然报Datanode denied communication with namenode: DatanodeRegistration错误,参考文章http://stackoverflow.com/questions/17082789/cdh4-3exception-from-the-logs-after-start-dfs-sh-datanode-and-namenode-star,可以知道需要把集群里面每个houst对应的ip写入/etc/hosts文件就能解决问题。
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。