
Data Models

Data Processing Models

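1. Pipelines: UNIX pipes are the most familiar example. Pipelines help with the reuse of processing primitives; simply chaining existing modules produces a new one.

2. Message queues: Message queues help with the synchronization of processing primitives. The programmer writes each data processing task as a processing primitive in the form of a producer or a consumer, and the system manages when they execute.

3. MapReduce: In the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial, but once an application is written in MapReduce form, scaling it to run on thousands or even tens of thousands of machines in a cluster requires only a configuration change. Its greatest advantage is that data processing scales easily across multiple computing nodes, and it is this simple scalability that has attracted so many programmers to the model.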
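To make the mapper and reducer roles concrete, here is a minimal single-process word-count sketch. It is not Hadoop code: `mapper`, `reducer`, and `run_job` are hypothetical names chosen for illustration, and `run_job` only imitates, in memory, the grouping step that a real framework would perform across machines.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all counts emitted for one word into a single total."""
    return word, sum(counts)


def run_job(lines):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in groups.items())
```

Calling `run_job(["a b a", "b c"])` counts each word across both lines. The point of the split is that `mapper` sees one line at a time and `reducer` sees one key at a time, so neither depends on where the rest of the data lives.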

You’re probably aware of data processing models such as pipelines and message queues. These models provide specific capabilities in developing different aspects of data processing applications. The most familiar pipelines are the Unix pipes. Pipelines can help the reuse of processing primitives; simple chaining of existing modules creates new ones. Message queues can help the synchronization of processing primitives. The programmer writes her data processing task as processing primitives in the form of either a producer or a consumer. The timing of their execution is managed by the system.
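The two models described above can be sketched in a few lines of Python. This is an illustrative sketch only: the stage functions (`strip_blank`, `to_upper`) and the doubling done by `consumer` are made-up placeholders, and the standard library's `queue.Queue` stands in for a real message queue system.

```python
import queue
import threading


def strip_blank(lines):
    """Pipeline stage: drop empty or whitespace-only lines."""
    return (line for line in lines if line.strip())


def to_upper(lines):
    """Pipeline stage: upper-case each line."""
    return (line.upper() for line in lines)


def run_pipeline(lines):
    """Chain existing stages, in the spirit of `cmd1 | cmd2` in a Unix shell."""
    return list(to_upper(strip_blank(lines)))


_DONE = object()  # sentinel telling the consumer that no more items are coming


def producer(items, q):
    """Put every item on the queue, then signal completion."""
    for item in items:
        q.put(item)  # blocks while the queue is full
    q.put(_DONE)


def consumer(q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        results.append(item * 2)  # placeholder for real processing


def run_queue(items):
    """Run one producer and one consumer; the queue handles their timing."""
    q = queue.Queue(maxsize=2)  # small bound so the producer waits for the consumer
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(items, q)
    worker.join()
    return results
```

In `run_pipeline`, a new behavior comes purely from chaining existing stages. In `run_queue`, neither side schedules the other: the blocking `put` and `get` calls decide when each one runs.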

Similarly, MapReduce is also a data processing model. Its greatest advantage is the easy scaling of data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial. But, once you write an application in the MapReduce form, scaling the application to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.

From "Hadoop in Action", Section 1.5, "Understanding MapReduce"
