HDFS

2024-08-25 19:53:05

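An HDFS cluster consists primarily of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.

The HDFS architecture describes the basic interactions among the NameNode, the DataNodes, and clients.
Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.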

Some notable features of Hadoop:
1)Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability for a large set of distributed applications, is an integral part of Hadoop.

2)HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.

3)Hadoop is written in Java and is supported on all major platforms.

4)Hadoop supports shell-like commands to interact with HDFS directly.

5)The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.

6)New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
   a)File permissions and authentication.
   b)Rack awareness: takes a node's physical location into account while scheduling tasks and allocating storage.
   c)Safemode: an administrative mode for maintenance.
   d)fsck: a utility to diagnose the health of the file system and to find missing files or blocks.
   e)fetchdt: a utility to fetch a DelegationToken and store it in a file on the local system.
   f)Balancer: a tool to balance the cluster when the data is unevenly distributed among DataNodes.
   g)Upgrade and rollback: after a software upgrade, it is possible to roll back to the HDFS state before the upgrade in case of unexpected problems.
   h)Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
   i)Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to HDFS. It replaces the role previously filled by the Secondary NameNode, though it is not yet battle-hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
   j)Backup node: an extension to the Checkpoint node. In addition to checkpointing, it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
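Several of the utilities in this list can be tried from the command line. The following is a minimal sketch, assuming the `hdfs` launcher is on the PATH and a cluster is reachable. The function name `hdfs_health_check` is made up for illustration; the commands inside it are the standard Safemode and fsck invocations.

```shell
# Sketch: read-only health checks using utilities named in the feature list.
# hdfs_health_check is a hypothetical helper name, not part of Hadoop.
hdfs_health_check() {
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs command not found; skipping checks"
        return 0
    fi
    hdfs dfsadmin -safemode get   # reports whether Safemode is ON or OFF
    hdfs fsck /                   # checks the namespace for missing or corrupt blocks
}

hdfs_health_check

# The Balancer moves blocks between DataNodes, so it is left commented out here.
# The threshold is a disk-usage percentage; 10 is the documented default.
#   hdfs balancer -threshold 10
```

The helper only reads cluster state, so it should be safe to run repeatedly; the Balancer is a heavier operation and is better started deliberately.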

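Web Interface
Each NameNode and DataNode runs an internal web server.
With the default configuration, the NameNode front page is at http://namenode-name:50070/ (50070 is the default through Hadoop 2.x; Hadoop 3.x changed the default to 9870).
The HDFS file system can also be browsed from there (using "Browse the file system").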
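The same embedded web server can also be queried from scripts. The sketch below builds a WebHDFS directory-listing URL; it assumes WebHDFS is enabled on the cluster and reuses the host name and port from the text above. The helper name `webhdfs_list_url` is hypothetical.

```shell
# Host and port are taken from the text; adjust them for your cluster.
NAMENODE_HTTP="http://namenode-name:50070"

# Build a WebHDFS URL that lists a directory (op=LISTSTATUS).
# webhdfs_list_url is a hypothetical helper name used only for this sketch.
webhdfs_list_url() {
    printf '%s/webhdfs/v1%s?op=LISTSTATUS\n' "$NAMENODE_HTTP" "$1"
}

webhdfs_list_url /user

# Against a live cluster, the URL can be fetched with curl:
#   curl -s "$(webhdfs_list_url /user)"
# The daemons also expose their metrics as JSON under /jmx:
#   curl -s "$NAMENODE_HTTP/jmx"
```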

Shell commands:
bin/hdfs dfs -help # lists the commands supported by the Hadoop shell
bin/hdfs dfs -help command-name # shows detailed help for a specific command
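As an illustration of the shell-like commands, here is a minimal sketch that round-trips a small file through HDFS. It assumes `hdfs` is on the PATH (from the Hadoop install directory, `bin/hdfs` works the same way) and that the user may write to the hypothetical HDFS directory /tmp/hdfs-demo. The function name `hdfs_roundtrip` is also made up for this example.

```shell
# Sketch: copy a local file into HDFS, list it, and read it back.
# /tmp/hdfs-demo is a hypothetical HDFS path chosen for illustration.
hdfs_roundtrip() {
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs command not found; nothing to do"
        return 0
    fi
    local_file=$(mktemp) || return 1
    echo "hello hdfs" > "$local_file"
    hdfs dfs -mkdir -p /tmp/hdfs-demo &&                         # create the target directory
    hdfs dfs -put -f "$local_file" /tmp/hdfs-demo/hello.txt &&   # upload, overwriting if present
    hdfs dfs -ls /tmp/hdfs-demo &&                               # list the directory
    hdfs dfs -cat /tmp/hdfs-demo/hello.txt                       # print the file contents
    status=$?
    rm -f "$local_file"
    return $status
}

hdfs_roundtrip
```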

dfsadmin commands
bin/hdfs dfsadmin -help

bin/hdfs dfsadmin -printTopology # prints the cluster topology (racks and their DataNodes)

Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.

Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer.
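Because Streaming mappers and reducers are ordinary programs that read standard input and write standard output, they can be prototyped locally with a pipe before being submitted to a cluster. The following word-count sketch is illustrative only: the `mapper` and `reducer` function names, the sample input, and the paths in the comments are all assumptions.

```shell
# Mapper: emit one "word<TAB>1" line per word read from stdin.
mapper() {
    tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t" 1 }'
}

# Reducer: sum the counts per word and print "word<TAB>total", sorted by word.
reducer() {
    awk -F '\t' '{ count[$1] += $2 } END { for (w in count) print w "\t" count[w] }' | sort
}

# Local simulation of map -> shuffle/sort -> reduce on a tiny input.
printf 'a b a\nb c\n' | mapper | sort | reducer

# On a cluster, the same logic would be saved as executable scripts
# (hypothetical names mapper.sh and reducer.sh) and submitted roughly like this;
# the jar location and the input/output paths depend on the installation:
#   hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#       -files mapper.sh,reducer.sh \
#       -input /user/me/input -output /user/me/output \
#       -mapper mapper.sh -reducer reducer.sh
```

The `sort` between the two stages stands in for the shuffle phase, which delivers all values for a given key to the reducer together.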

Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non-JNI™ based).
