The Beauty of Java [From Novice to Expert]: Common Hadoop Commands
Email: xtfggef@gmail.com  Weibo: http://weibo.com/xtfggef
This article covers the hadoop command located under bin. Running hadoop without any arguments prints its usage:
The usage is: hadoop [--config confdir] COMMAND, where COMMAND is one of the subcommands listed below, such as fs, version, jar, and so on.
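To make the bracketed parts concrete, here is a minimal sketch of both forms; the configuration directory /opt/hadoop/conf-prod and the file names are hypothetical:

```bash
# Point the hadoop script at an alternate configuration directory
# (/opt/hadoop/conf-prod is a made-up path -- use your own)
hadoop --config /opt/hadoop/conf-prod fs -ls /

# GENERIC_OPTIONS in action: override one configuration property for a single call
hdfs dfs -D dfs.replication=2 -put localfile /user/hadoop/hadoopfile
```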
User Commands
fs
Current Hadoop versions have retired the standalone fs command in favor of hdfs dfs.
Usage: hdfs dfs [GENERIC_OPTIONS] [COMMAND_OPTIONS]. Anyone familiar with Linux commands will pick this up quickly. For example, to create a directory with mkdir: hdfs dfs -mkdir /test
Explanation: -mkdir /test is the COMMAND_OPTIONS part (the subcommand plus its argument), while GENERIC_OPTIONS are generic Hadoop options such as -conf or -D. Since /test is an absolute path, running the command above creates the test directory under the root path /. Below is my summary of the fs-related commands in Hadoop (also available in the attached Excel file); a short end-to-end example follows the table.
| Command | Description | Example | Return value |
|---|---|---|---|
| appendToFile | Append one or more local files to a destination file | hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile | Returns 0 on success and 1 on error |
| cat | Print file contents | hdfs dfs -cat file:///file3 /user/hadoop/file4 | Returns 0 on success and -1 on error |
| chgrp | Change the group association of files | hdfs dfs -chgrp [-R] GROUP URI [URI ...] | |
| chmod | Change file permissions | hdfs dfs -chmod [-R] <MODE[,MODE]... \| OCTALMODE> URI [URI ...] | |
| chown | Change the owner of files | hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...] | |
| copyFromLocal | Copy from the local file system | | |
| copyToLocal | Copy to the local file system | | |
| count | Count the directories, files and bytes under the given paths; -q and -h change the output | hdfs dfs -count -q hdfs://nn1.example.com/file1 | Returns 0 on success and -1 on error |
| cp | Copy files; supports -f and -p | hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 | Returns 0 on success and -1 on error |
| du | Show the size of the given files or directories | hdfs dfs -du /test/hadoop | Returns 0 on success and -1 on error |
| dus | Deprecated; similar to du | | |
| expunge | Empty the trash | hdfs dfs -expunge | |
| get | Copy files to the local file system | hdfs dfs -get /user/hadoop/file localfile | Returns 0 on success and -1 on error |
| getfacl | Display the access control lists (ACLs) of files and directories | hdfs dfs -getfacl /file; hdfs dfs -getfacl -R /dir | Returns 0 on success and non-zero on error |
| getfattr | Display the extended attributes of a file or directory | hdfs dfs -getfattr -d /file | Returns 0 on success and non-zero on error |
| getmerge | Merge multiple source files into one destination file | hdfs dfs -getmerge <src> <localdst> [addnl] | |
| ls | Same idea as ls on Linux | hdfs dfs -ls /user/hadoop/file1 | Returns 0 on success and -1 on error |
| lsr | Equivalent to ls -R | | |
| mkdir | Create directories; -p creates parent directories as needed | hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2 | Returns 0 on success and -1 on error |
| moveFromLocal | Like put, except the local source file is deleted after the copy | | |
| moveToLocal | Not implemented yet | | |
| mv | Move files | hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2 | Returns 0 on success and -1 on error |
| put | Upload files to the target directory | hdfs dfs -put localfile /user/hadoop/hadoopfile | Returns 0 on success and -1 on error |
| rm | Delete files | hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir | Returns 0 on success and -1 on error |
| rmr | Equivalent to rm -r | | |
| setfacl | Set the access control lists (ACLs) of files and directories | hdfs dfs -setfacl -m user:hadoop:rw- /file | Returns 0 on success and non-zero on error |
| setfattr | Set an extended attribute on a file or directory | hdfs dfs -setfattr -n user.myAttr -v myValue /file | Returns 0 on success and non-zero on error |
| setrep | Change the replication factor of files and directories | hdfs dfs -setrep -w 3 /user/hadoop/dir1 | Returns 0 on success and -1 on error |
| stat | Print statistics about the path | hdfs dfs -stat path | Returns 0 on success and -1 on error |
| tail | Print the last kilobyte of the file | hdfs dfs -tail pathname | Returns 0 on success and -1 on error |
| test | Test the file (e.g. -e: exists) | hdfs dfs -test -e filename | |
| text | Output the file in text format | hdfs dfs -text <src> | |
| touchz | Create a zero-length file | hdfs dfs -touchz pathname | Returns 0 on success and -1 on error |
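To tie several of these together, here is a short end-to-end session; all paths and file names are examples only:

```bash
# Create a working directory, upload a file, inspect it, fetch it back, then clean up
hdfs dfs -mkdir -p /user/hadoop/demo
hdfs dfs -put localfile.txt /user/hadoop/demo/
hdfs dfs -ls /user/hadoop/demo
hdfs dfs -cat /user/hadoop/demo/localfile.txt
hdfs dfs -get /user/hadoop/demo/localfile.txt ./copy.txt
hdfs dfs -rm -r /user/hadoop/demo
```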
version
hadoop version prints the Hadoop version information.
jar
Runs a jar file.
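A typical use is running the MapReduce examples shipped with Hadoop; the jar path and version below are assumptions about your installation:

```bash
# Run the WordCount example (adjust the jar path/version to your HADOOP_HOME)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
  wordcount /input /output
```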
credential
This command is used to manage credentials, keys, and other private information.
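As a hedged sketch of typical use: store a secret in a keystore-backed provider and list the aliases afterwards. The alias mydb.password and the provider path are examples only:

```bash
# Create a credential entry (you will be prompted for the value) and list it
hadoop credential create mydb.password -provider jceks://file/tmp/creds.jceks
hadoop credential list -provider jceks://file/tmp/creds.jceks
```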
archive
1. Creating an archive
A Hadoop archive is a special file format. Each archive corresponds to a file system directory and its name ends with *.har. Every archive contains metadata describing the archive, such as indexes, together with the data itself. Use the following command to create a Hadoop archive: hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir
-archiveName: the name of the archive to create.
-p: the parent directory of the files to archive; it can be an absolute or a relative path.
-r: the replication factor; the default is 10.
The last argument is the output directory. Let's look at an example: pack the config folder under /hadoop into config.har and place it under /archives by running:
hadoop archive -archiveName config.har -p /hadoop -r 1 config /archives
root@master:~/hadoop/bin# ./hadoop archive -archiveName config.har -p /hadoop -r 1 config /archives
15/01/15 14:28:02 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.118:8032
15/01/15 14:28:04 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.118:8032
15/01/15 14:28:04 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.118:8032
15/01/15 14:28:05 INFO mapreduce.JobSubmitter: number of splits:1
15/01/15 14:28:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421303038566_0001
15/01/15 14:28:06 INFO impl.YarnClientImpl: Submitted application application_1421303038566_0001
15/01/15 14:28:06 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1421303038566_0001/
15/01/15 14:28:06 INFO mapreduce.Job: Running job: job_1421303038566_0001
15/01/15 14:28:21 INFO mapreduce.Job: Job job_1421303038566_0001 running in uber mode : false
15/01/15 14:28:21 INFO mapreduce.Job: map 0% reduce 0%
15/01/15 14:28:33 INFO mapreduce.Job: map 100% reduce 0%
15/01/15 14:28:44 INFO mapreduce.Job: map 100% reduce 100%
15/01/15 14:28:44 INFO mapreduce.Job: Job job_1421303038566_0001 completed successfully
15/01/15 14:28:44 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=701
        FILE: Number of bytes written=215089
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=23417
        HDFS: Number of bytes written=23343
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=7
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=9767
        Total time spent by all reduces in occupied slots (ms)=7888
        Total time spent by all map tasks (ms)=9767
        Total time spent by all reduce tasks (ms)=7888
        Total vcore-seconds taken by all map tasks=9767
        Total vcore-seconds taken by all reduce tasks=7888
        Total megabyte-seconds taken by all map tasks=10001408
        Total megabyte-seconds taken by all reduce tasks=8077312
    Map-Reduce Framework
        Map input records=7
        Map output records=7
        Map output bytes=680
        Map output materialized bytes=701
        Input split bytes=116
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=701
        Reduce input records=7
        Reduce output records=0
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=291
        CPU time spent (ms)=2090
        Physical memory (bytes) snapshot=306540544
        Virtual memory (bytes) snapshot=1320914944
        Total committed heap usage (bytes)=127242240
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=632
    File Output Format Counters
        Bytes Written=0
root@master:~/hadoop/bin#
Browsing the directory structure, we can see the generated har package:
Listing it with a command and copying it to the local file system gives the following:
We can see that a har package looks just like a folder; what makes it special is that it contains metadata recording the basic information of the archived files together with the data itself, and it serves a dedicated purpose.
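To see that internal layout for yourself, list the archive as a plain directory. Per the Hadoop Archives format, the metadata lives in index files (_index, _masterindex) and the data in part-* files; the path below is the one created above:

```bash
# List the raw contents of the archive directory (not through the har:// scheme)
hdfs dfs -ls /archives/config.har
```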
2. Looking up files in an archive
We can operate on an archive just like an ordinary file system; the only difference is the lookup URI, which deserves attention:
har://scheme-hostname:port/archivepath/fileinarchive
har:///archivepath/fileinarchive
root@master:~/hadoop/bin# ./hdfs dfs -ls har:///archives/config.har
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-01-15 14:26 har:///archives/config.har/config
3. Unarchiving
Unarchiving is simply a copy out of the archive: use cp for a sequential copy or distcp for a parallel one:
hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
Official documentation
distcp
distcp is a file and directory copy command used to copy data between clusters as well as between nodes within the same cluster; see the official documentation for details.
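A couple of representative invocations, with NameNode addresses and paths made up for illustration:

```bash
# Copy a directory from one cluster to another
hadoop distcp hdfs://nn1.example.com:8020/source/dir hdfs://nn2.example.com:8020/target/dir

# Copy within the same cluster, overwriting files that already exist at the target
hadoop distcp -overwrite /data/src /data/dst
```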
fsck
fsck is a file system checking tool used to detect problems such as missing file blocks.
Usage: hdfs fsck [GENERIC_OPTIONS] <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots]
| COMMAND_OPTION | Description |
|---|---|
| path | Start checking from this path. |
| -move | Move corrupted files to /lost+found. |
| -delete | Delete corrupted files. |
| -files | Print out files being checked. |
| -openforwrite | Print out files opened for write. |
| -includeSnapshots | Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it. |
| -list-corruptfileblocks | Print out a list of missing blocks and the files they belong to. |
| -blocks | Print out the block report. |
| -locations | Print out locations for every block. |
| -racks | Print out network topology for data-node locations. |
fsck is a standalone command, not part of dfs. Let's look at an example: checking the files and directories under /.
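A typical invocation of that check, printing each file with its blocks and their locations:

```bash
# Check everything under /, listing files, blocks and block locations
hdfs fsck / -files -blocks -locations
```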
fetchdt
Fetches a delegation token from the NameNode. I have not needed it yet; I will cover it in a later update.
job
Interacts with MapReduce jobs.
Usage: mapred job | [GENERIC_OPTIONS] | [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]
| COMMAND_OPTION | Description |
|---|---|
| -submit job-file | Submits the job. |
| -status job-id | Prints the map and reduce completion percentage and all job counters. |
| -counter job-id group-name counter-name | Prints the counter value. |
| -kill job-id | Kills the job. |
| -events job-id from-event-# #-of-events | Prints the events' details received by the jobtracker for the given range. |
| -history [all] jobOutputDir | Prints job details, failed and killed tip details. More details about the job, such as successful tasks and task attempts made for each task, can be viewed by specifying the [all] option. |
| -list [all] | Displays jobs which are yet to complete. -list all displays all jobs. |
| -kill-task task-id | Kills the task. Killed tasks are NOT counted against failed attempts. |
| -fail-task task-id | Fails the task. Failed tasks are counted against failed attempts. |
| -set-priority job-id priority | Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW. |
We can check the running status of a job with: mapred job -status <job_id>.
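A few representative calls, using the job id from the archive run earlier in this article:

```bash
# Jobs that have not yet completed
mapred job -list

# Status of one specific job
mapred job -status job_1421303038566_0001

# Kill a running job
mapred job -kill job_1421303038566_0001
```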
pipes
Runs a pipes job; I will cover it in a later update.
queue
Gets information about the queues that jobs interact with. Usage: ./mapred queue -info default
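Two more queue sub-commands that are handy on a default setup; treat this as a sketch rather than an exhaustive list:

```bash
# List all queues and their scheduling information
mapred queue -list

# Show which operations the current user may perform on each queue
mapred queue -showacls
```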
CLASSNAME
hadoop can run any Java class.
classpath
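classpath prints the class path the hadoop scripts assemble, which is also what the CLASSNAME form runs against. A small sketch of both; com.example.MyTool is a hypothetical class name:

```bash
# Print the class path used by the hadoop scripts
hadoop classpath

# Run an arbitrary class on that class path
hadoop com.example.MyTool arg1 arg2
```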
Administration Commands
balancer
Runs the cluster balancing utility, which redistributes blocks so that DataNode utilization evens out. An example run on my test cluster:
root@master:~/hadoop/bin# ./hdfs balancer
15/01/15 16:20:03 INFO balancer.Balancer: namenodes  = [hdfs://master:9000]
15/01/15 16:20:03 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
15/01/15 16:20:06 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.116:50010
15/01/15 16:20:06 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.1.189:50010
15/01/15 16:20:06 INFO balancer.Balancer: 0 over-utilized: []
15/01/15 16:20:06 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Jan 15, 2015 4:20:06 PM           0                  0 B                 0 B               -1 B
Jan 15, 2015 4:20:06 PM  Balancing took 3.215 seconds
root@master:~/hadoop/bin#
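If the cluster were unbalanced, you could tighten the tolerance with -threshold (a percentage of utilization); the value below is only an example:

```bash
# Rebalance until every DataNode is within 5 percentage points of the average utilization
hdfs balancer -threshold 5
```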
datanode
| COMMAND_OPTION | Description |
|---|---|
| -regular | Normal datanode startup (default). |
| -rollback | Roll back the datanode to the previous version. This should be used after stopping the datanode and distributing the old Hadoop version. |
| -rollingupgrade rollback | Roll back a rolling upgrade operation. |
dfsadmin
Mover
namenode
secondarynamenode
hsadmin
historyserver