


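1. Pipelines: Unix pipes are the most common kind of pipeline. Pipelines help the reuse of processing primitives; simply chaining existing modules together creates a new module.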
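As a rough illustration of the chaining idea, here is a minimal Python sketch that imitates a Unix pipeline with generators. The stage names (read_lines, grep, to_upper, run_pipeline) are made up for this example and are not from the book; the point is only that a new module falls out of composing existing ones.

```python
def read_lines(text):
    """Stage 1 (hypothetical): emit the input one line at a time."""
    for line in text.splitlines():
        yield line


def grep(lines, needle):
    """Stage 2 (hypothetical): pass through only lines containing `needle`."""
    for line in lines:
        if needle in line:
            yield line


def to_upper(lines):
    """Stage 3 (hypothetical): upper-case every line it receives."""
    for line in lines:
        yield line.upper()


def run_pipeline(text, needle):
    """A 'new module' built purely by chaining the existing stages,
    loosely like `cat file | grep needle | tr a-z A-Z`."""
    return list(to_upper(grep(read_lines(text), needle)))
```

Each stage knows nothing about its neighbours, so stages can be reordered or reused in other chains without modification.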



You’re probably aware of data processing models such as pipelines and message queues. These models provide specific capabilities in developing different aspects of data processing applications. The most familiar pipelines are the Unix pipes. Pipelines can help the reuse of processing primitives; simple chaining of existing modules creates new ones. Message queues can help the synchronization of processing primitives. The programmer writes her data processing task as processing primitives in the form of either a producer or a consumer. The timing of their execution is managed by the system.
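The producer/consumer split described for message queues can be sketched with Python's standard queue and threading modules. This is only an assumed, single-machine analogy and is not code from the book; the names producer, consumer, run_queue and the _DONE sentinel are hypothetical.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel telling the consumer to stop


def producer(q, items):
    """Producer primitive: put each item on the queue, then signal completion."""
    for item in items:
        q.put(item)  # blocks while the bounded queue is full
    q.put(_DONE)


def consumer(q, results):
    """Consumer primitive: square each item taken from the queue."""
    while True:
        item = q.get()  # blocks until an item is available
        if item is _DONE:
            break
        results.append(item * item)


def run_queue(items):
    """Wire one producer to one consumer through a small bounded queue."""
    q = queue.Queue(maxsize=2)
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, items)
    worker.join()
    return results
```

Neither function decides when it runs relative to the other: the blocking put and get calls on the queue handle the synchronization, which mirrors the idea that timing is managed by the system.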

Similarly, MapReduce is also a data processing model. Its greatest advantage is the easy scaling of data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial. But, once you write an application in the MapReduce form, scaling the application to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.

From Hadoop in Action, Section 1.5, "Understanding MapReduce"
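To make the mapper/reducer decomposition concrete, here is a minimal single-process word count sketch in plain Python. It does not use the Hadoop API and is not taken from the book; mapper, reducer, and map_reduce are hypothetical names, and the in-memory grouping step merely stands in for the shuffle that a real cluster framework would perform across machines.

```python
from collections import defaultdict


def mapper(line):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reducer (hypothetical): sum all counts emitted for one word."""
    return word, sum(counts)


def map_reduce(lines):
    """Run the mappers, group values by key, then run the reducers.
    The grouping dict is an in-memory stand-in for the shuffle phase."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Because each mapper call sees one line and each reducer call sees one key, the calls are independent of one another, which is the property that lets a framework spread them over many machines.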
