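Recovering a crashed NameNode from the SecondaryNameNode
This walkthrough simulates a NameNode crash by deleting everything in the name directory, then restores the NameNode from the SecondaryNameNode's checkpoint.
Environment: OS: CentOS 6.5 x64; Software: Hadoop 1.2.1
1. Go to the name directory and delete its contents.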
[huser@master name]$ pwd
/home/huser/hadoop/tmp/dfs/name
[huser@master name]$ ll
drwxrwxr-x 2 huser huser 4096 4月 16 20:16 current
drwxrwxr-x 2 huser huser 4096 4月 16 17:24 image
-rw-rw-r-- 1 huser huser 0 4月 16 20:10 in_use.lock
drwxrwxr-x 2 huser huser 4096 4月 16 18:55 previous.checkpoint
[huser@master name]$ rm -R *
[huser@master name]$ ls
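When repeating this experiment on a cluster whose data matters, it may be worth copying the name directory aside before wiping it. The sketch below is one way to do that; the function name backup_then_wipe and the backup path are hypothetical, and it assumes GNU find (as shipped with CentOS) for -mindepth and -delete.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): copy a directory aside,
# then delete its contents while keeping the directory itself.
# Usage: backup_then_wipe <dir> <backup_dir>
backup_then_wipe() {
    src=$1
    backup=$2
    # Refuse to run on a missing source or an already existing backup target.
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ ! -e "$backup" ] || { echo "backup target exists: $backup" >&2; return 1; }
    # -R copies recursively; -p preserves mode and timestamps where permitted.
    cp -Rp "$src" "$backup" || return 1
    # Remove everything below $src but not $src itself (GNU find assumed).
    find "$src" -mindepth 1 -delete
}
```

With the paths from this walkthrough, a call could look like backup_then_wipe /home/huser/hadoop/tmp/dfs/name /home/huser/name.bak, where the second path is only an example.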
2. Stop the cluster, then start it again. The NameNode fails to start: it is missing from the jps output.
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/start-all.sh
[huser@master hadoop-1.2.1]$ jps
7160 SecondaryNameNode
7229 JobTracker
7369 Jps
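Spotting a missing daemon by eye works for a short list like the one above; for scripting, the same check can be sketched as a small filter over jps output. The function name daemon_running is hypothetical, and the sketch assumes the "<pid> <name>" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read jps-style lines ("<pid> <name>") on stdin and
# succeed only if a process with exactly the given name is listed.
# Usage: jps | daemon_running NameNode
daemon_running() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit found ? 0 : 1 }'
}
```

Matching on the whole second field means SecondaryNameNode is not mistaken for NameNode. A possible use is: jps | daemon_running NameNode || echo "NameNode is not running".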
3. Stop the cluster and format the NameNode, which recreates the structure of the name directory.
[huser@master hadoop-1.2.1]$ bin/stop-all.sh
[huser@master hadoop-1.2.1]$ bin/hadoop namenode -format
14/04/16 21:17:39 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.1.115
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_51
************************************************************/
Re-format filesystem in /home/huser/hadoop/tmp/dfs/name ? (Y or N) Y
14/04/16 21:17:42 INFO util.GSet: Computing capacity for map BlocksMap
14/04/16 21:17:42 INFO util.GSet: VM type = 64-bit
14/04/16 21:17:42 INFO util.GSet: 2.0% max memory = 1013645312
14/04/16 21:17:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/04/16 21:17:42 INFO util.GSet: recommended=2097152, actual=2097152
14/04/16 21:17:43 INFO namenode.FSNamesystem: fsOwner=huser
14/04/16 21:17:43 INFO namenode.FSNamesystem: supergroup=supergroup
14/04/16 21:17:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/04/16 21:17:43 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/04/16 21:17:43 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/04/16 21:17:43 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/04/16 21:17:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/04/16 21:17:43 INFO common.Storage: Image file /home/huser/hadoop/tmp/dfs/name/current/fsimage of size 111 bytes saved in 0 seconds.
14/04/16 21:17:43 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:43 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:44 INFO common.Storage: Storage directory /home/huser/hadoop/tmp/dfs/name has been successfully formatted.
14/04/16 21:17:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.115
************************************************************/
4. Get the namespaceID from a DataNode.
[huser@master hadoop-1.2.1]$ ssh slave1
[huser@slave1 current]$ pwd
/home/huser/hadoop/tmp/dfs/data/current
[huser@slave1 current]$ ll
-rw-rw-r-- 1 huser huser 49184 4月 16 18:43 blk_-1800088935645150399
-rw-rw-r-- 1 huser huser 395 4月 16 18:43 blk_-1800088935645150399_1013.meta
-rw-rw-r-- 1 huser huser 25 4月 16 18:43 blk_269963827714855400
-rw-rw-r-- 1 huser huser 11 4月 16 18:43 blk_269963827714855400_1014.meta
-rw-rw-r-- 1 huser huser 16353 4月 16 18:43 blk_4611281727215307463
-rw-rw-r-- 1 huser huser 135 4月 16 18:43 blk_4611281727215307463_1015.meta
-rw-rw-r-- 1 huser huser 769 4月 16 19:32 dncp_block_verification.log.curr
-rw-rw-r-- 1 huser huser 158 4月 16 19:51 VERSION
[huser@slave1 current]$ cat VERSION
#Wed Apr 16 19:51:23 CST 2014
namespaceID=589801292
storageID=DS-1065963269-192.168.1.111-50010-1397640950581
cTime=0
storageType=DATA_NODE
layoutVersion=-41
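Reading the value off the cat output is fine for one node. As a sketch, the same lookup can be done non-interactively; the function name get_namespace_id is hypothetical, and the file is assumed to use the key=value layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the namespaceID value from a Hadoop 1.x
# VERSION file. Prints nothing if the key is absent.
# Usage: get_namespace_id <path-to-VERSION>
get_namespace_id() {
    # -n suppresses default output; p prints only lines where the
    # "namespaceID=" prefix was stripped.
    sed -n 's/^namespaceID=//p' "$1"
}
```

On the DataNode above, get_namespace_id /home/huser/hadoop/tmp/dfs/data/current/VERSION would print the ID from that file.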
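5. Edit the namespaceID in the NameNode's VERSION file. Formatting assigned the NameNode a new namespaceID, so set it to the value read from the DataNode.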
[huser@slave1 current]$ exit
logout
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/name/current
[huser@master current]$ vi VERSION
#Wed Apr 16 21:17:43 CST 2014
namespaceID=589801292
cTime=0
storageType=NAME_NODE
layoutVersion=-41
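The edit above was made by hand in vi. A non-interactive alternative might look like the following sketch; set_namespace_id is a hypothetical name, and the function assumes the ID is a plain non-negative integer, as in the files shown here.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the namespaceID line of a VERSION file in place.
# Usage: set_namespace_id <path-to-VERSION> <id>
set_namespace_id() {
    file=$1
    id=$2
    # Accept only non-empty strings of digits (assumption: IDs are non-negative).
    case $id in
        ''|*[!0-9]*) echo "invalid namespaceID: $id" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then overwrite the original's
    # contents so its owner and permissions stay unchanged.
    sed "s/^namespaceID=.*/namespaceID=$id/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For this walkthrough the call would be set_namespace_id /home/huser/hadoop/tmp/dfs/name/current/VERSION 589801292.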
6. Delete the fsimage file that formatting created on the NameNode.
[huser@master current]$ rm fsimage
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
7. Copy the fsimage file from the SecondaryNameNode's directory into the NameNode's directory.
[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/namesecondary/current
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 20:16 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 20:16 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 20:16 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 20:16 VERSION
[huser@master current]$ cp fsimage /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ cd /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 21:37 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
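The listings above confirm the copy only by file size (2259 bytes on both sides). A byte-for-byte comparison is a slightly stronger check; the sketch below wraps cp and cmp in a hypothetical function named copy_verified.

```shell
#!/bin/sh
# Hypothetical helper: copy a file into a directory and confirm the copy
# is byte-for-byte identical to the source.
# Usage: copy_verified <file> <dest_dir>
copy_verified() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "no such directory: $dest_dir" >&2; return 1; }
    cp "$src" "$dest_dir/" || return 1
    # cmp -s is silent and returns 0 only when both files are identical.
    cmp -s "$src" "$dest_dir/$(basename "$src")"
}
```

Run from the namesecondary/current directory, the call for this step would be copy_verified fsimage /home/huser/hadoop/tmp/dfs/name/current.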
8. Restart the cluster (bin/start-all.sh, as in step 2) and check that the NameNode is running again.
[huser@master hadoop-1.2.1]$ jps
7927 SecondaryNameNode
7773 NameNode
8017 JobTracker
8123 Jps