Recovering a crashed NameNode from the Secondary NameNode

This walkthrough simulates a NameNode crash by deleting everything in the name directory, then restores the NameNode from the Secondary NameNode's checkpoint.

Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1

1. Go into the name directory and delete its contents.

[huser@master name]$ pwd
/home/huser/hadoop/tmp/dfs/name

[huser@master name]$ ll
drwxrwxr-x 2 huser huser 4096 4月 16 20:16 current
drwxrwxr-x 2 huser huser 4096 4月 16 17:24 image
-rw-rw-r-- 1 huser huser 0 4月 16 20:10 in_use.lock
drwxrwxr-x 2 huser huser 4096 4月 16 18:55 previous.checkpoint

[huser@master name]$ rm -R *
[huser@master name]$ ls

2. Stop the cluster, then start it again. The NameNode fails to start.

[huser@master hadoop-1.2.1]$ bin/stop-all.sh

[huser@master hadoop-1.2.1]$ bin/start-all.sh 
[huser@master hadoop-1.2.1]$ jps
7160 SecondaryNameNode
7229 JobTracker
7369 Jps
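
The missing daemon can also be detected mechanically instead of by eye. The following is a minimal sketch, assuming the `jps` output format shown above (a PID followed by the class name). `missing_daemons` is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# missing_daemons: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that does not appear in it.
# Hypothetical helper for illustration; not part of Hadoop.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the second column exactly, so that "SecondaryNameNode"
        # is never mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage on the master:
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
```

Fed the `jps` output from this step, it prints `NameNode`. Once all three daemons are running it prints nothing.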

3. Stop the cluster and format the NameNode.

[huser@master hadoop-1.2.1]$ bin/stop-all.sh

[huser@master hadoop-1.2.1]$ bin/hadoop namenode -format
14/04/16 21:17:39 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.1.115
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_51
************************************************************/
Re-format filesystem in /home/huser/hadoop/tmp/dfs/name ? (Y or N) Y
14/04/16 21:17:42 INFO util.GSet: Computing capacity for map BlocksMap
14/04/16 21:17:42 INFO util.GSet: VM type = 64-bit
14/04/16 21:17:42 INFO util.GSet: 2.0% max memory = 1013645312
14/04/16 21:17:42 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/04/16 21:17:42 INFO util.GSet: recommended=2097152, actual=2097152
14/04/16 21:17:43 INFO namenode.FSNamesystem: fsOwner=huser
14/04/16 21:17:43 INFO namenode.FSNamesystem: supergroup=supergroup
14/04/16 21:17:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/04/16 21:17:43 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/04/16 21:17:43 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/04/16 21:17:43 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/04/16 21:17:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/04/16 21:17:43 INFO common.Storage: Image file /home/huser/hadoop/tmp/dfs/name/current/fsimage of size 111 bytes saved in 0 seconds.
14/04/16 21:17:43 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:43 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/huser/hadoop/tmp/dfs/name/current/edits
14/04/16 21:17:44 INFO common.Storage: Storage directory /home/huser/hadoop/tmp/dfs/name has been successfully formatted.
14/04/16 21:17:44 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.115
************************************************************/

4. Read the namespaceID from a DataNode.

[huser@master hadoop-1.2.1]$ ssh slave1

[huser@slave1 current]$ pwd
/home/huser/hadoop/tmp/dfs/data/current

[huser@slave1 current]$ ll
-rw-rw-r-- 1 huser huser 49184 4月 16 18:43 blk_-1800088935645150399
-rw-rw-r-- 1 huser huser 395 4月 16 18:43 blk_-1800088935645150399_1013.meta
-rw-rw-r-- 1 huser huser 25 4月 16 18:43 blk_269963827714855400
-rw-rw-r-- 1 huser huser 11 4月 16 18:43 blk_269963827714855400_1014.meta
-rw-rw-r-- 1 huser huser 16353 4月 16 18:43 blk_4611281727215307463
-rw-rw-r-- 1 huser huser 135 4月 16 18:43 blk_4611281727215307463_1015.meta
-rw-rw-r-- 1 huser huser 769 4月 16 19:32 dncp_block_verification.log.curr
-rw-rw-r-- 1 huser huser 158 4月 16 19:51 VERSION

[huser@slave1 current]$ cat VERSION
#Wed Apr 16 19:51:23 CST 2014
namespaceID=589801292
storageID=DS-1065963269-192.168.1.111-50010-1397640950581
cTime=0
storageType=DATA_NODE
layoutVersion=-41
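
The value can be pulled out of the file without reading it by eye. This is a minimal sketch, assuming the `key=value` layout of the VERSION file shown above. `get_namespace_id` is a hypothetical helper name.

```shell
#!/bin/sh
# get_namespace_id: print the namespaceID value from a Hadoop VERSION file.
# Hypothetical helper for illustration; not part of Hadoop.
get_namespace_id() {
    # Lines look like "namespaceID=589801292"; strip the key, print the value.
    sed -n 's/^namespaceID=//p' "$1"
}

# Usage on a DataNode, with the path from this walkthrough:
#   get_namespace_id /home/huser/hadoop/tmp/dfs/data/current/VERSION
```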

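5. Set the namespaceID in the NameNode's VERSION file to the value read from the DataNode. Formatting gave the NameNode a new namespaceID, and DataNodes refuse to register with a NameNode whose namespaceID differs from their own.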

[huser@slave1 current]$ exit
logout

[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/name/current

[huser@master current]$ vi VERSION
#Wed Apr 16 21:17:43 CST 2014
namespaceID=589801292
cTime=0
storageType=NAME_NODE
layoutVersion=-41
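
The same edit can be made without an interactive editor. This is a minimal sketch, assuming the VERSION layout shown above and a namespaceID made only of digits, as in this walkthrough. `set_namespace_id` is a hypothetical helper name.

```shell
#!/bin/sh
# set_namespace_id: rewrite the namespaceID line of a VERSION file in place,
# leaving every other line untouched.
# Hypothetical helper for illustration; not part of Hadoop.
set_namespace_id() {
    version_file=$1
    new_id=$2
    # Accept only a non-empty run of digits.
    case $new_id in
        ''|*[!0-9]*) echo "invalid namespaceID: $new_id" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Write the edited copy first, then overwrite the original's contents;
    # using cat (rather than mv) keeps the file's owner and mode.
    sed "s/^namespaceID=.*/namespaceID=$new_id/" "$version_file" > "$tmp" &&
        cat "$tmp" > "$version_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage on the master, with the path and value from this walkthrough:
#   set_namespace_id /home/huser/hadoop/tmp/dfs/name/current/VERSION 589801292
```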

6. Delete the fsimage file in the NameNode's directory.

[huser@master current]$ rm fsimage
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION

7. Copy the fsimage file from the Secondary NameNode's checkpoint directory into the NameNode's directory.

[huser@master current]$ pwd
/home/huser/hadoop/tmp/dfs/namesecondary/current
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 20:16 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 20:16 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 20:16 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 20:16 VERSION

[huser@master current]$ cp fsimage /home/huser/hadoop/tmp/dfs/name/current/

[huser@master current]$ cd /home/huser/hadoop/tmp/dfs/name/current/
[huser@master current]$ ll
-rw-rw-r-- 1 huser huser 4 4月 16 21:17 edits
-rw-rw-r-- 1 huser huser 2259 4月 16 21:37 fsimage
-rw-rw-r-- 1 huser huser 8 4月 16 21:17 fstime
-rw-rw-r-- 1 huser huser 100 4月 16 21:30 VERSION
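
The listing above confirms the copy only by size (2259 bytes). A byte-for-byte comparison is a stronger check. This is a minimal sketch, assuming both arguments are `current` directories as in this walkthrough. `restore_fsimage` is a hypothetical helper name.

```shell
#!/bin/sh
# restore_fsimage: copy fsimage from a checkpoint directory into the NameNode
# directory and confirm that the copy is byte-identical to the source.
# Hypothetical helper for illustration; not part of Hadoop.
restore_fsimage() {
    src_dir=$1
    dst_dir=$2
    [ -f "$src_dir/fsimage" ] || { echo "no fsimage in $src_dir" >&2; return 1; }
    cp "$src_dir/fsimage" "$dst_dir/fsimage" || return 1
    # cmp -s is silent and returns 0 only when the two files are identical.
    cmp -s "$src_dir/fsimage" "$dst_dir/fsimage"
}

# Usage on the master, with the paths from this walkthrough:
#   restore_fsimage /home/huser/hadoop/tmp/dfs/namesecondary/current \
#                   /home/huser/hadoop/tmp/dfs/name/current
```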

8. Restart the cluster and check that the NameNode is running.

[huser@master hadoop-1.2.1]$ jps
7927 SecondaryNameNode
7773 NameNode
8017 JobTracker
8123 Jps
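
`jps` shows only that the NameNode process exists. A further check is whether HDFS has left safe mode, which it does once enough DataNode block reports have arrived. This is a minimal sketch, assuming that `dfsadmin -safemode get` prints `Safe mode is OFF` in that case. The `HADOOP_CMD` variable and the helper name `safemode_is_off` are hypothetical.

```shell
#!/bin/sh
# safemode_is_off: succeed only if the NameNode reports that safe mode is off.
# HADOOP_CMD is a hypothetical override; it defaults to bin/hadoop, the
# launcher used throughout this walkthrough.
# Assumption: the command prints "Safe mode is OFF" when safe mode is off.
safemode_is_off() {
    "${HADOOP_CMD:-bin/hadoop}" dfsadmin -safemode get 2>/dev/null |
        grep -q 'Safe mode is OFF'
}

# Usage from the hadoop-1.2.1 directory:
#   safemode_is_off && echo "HDFS is out of safe mode"
```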