Papers on GitHub

Interesting Readings

  • Big Data Benchmark – Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
  • NoSQL Comparison – Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison.

Interesting Papers

2013 – 2014

  • 2014 – Stanford – Mining of Massive Datasets.
  • 2013 – AMPLab – Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
  • 2013 – AMPLab – MLbase: A Distributed Machine-learning System.
  • 2013 – AMPLab – Shark: SQL and Rich Analytics at Scale.
  • 2013 – AMPLab – GraphX: A Resilient Distributed Graph System on Spark.
  • 2013 – Google – HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
  • 2013 – Microsoft – Scalable Progressive Analytics on Big Data in the Cloud.
  • 2013 – Metamarkets – Druid: A Real-time Analytical Data Store.
  • 2013 – Google – Online, Asynchronous Schema Change in F1.
  • 2013 – Google – F1: A Distributed SQL Database That Scales.
  • 2013 – Google – MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
  • 2013 – Facebook – Scuba: Diving into Data at Facebook.
  • 2013 – Facebook – Unicorn: A System for Searching the Social Graph.
  • 2013 – Facebook – Scaling Memcache at Facebook.

2011 – 2012

  • 2012 – Twitter – The Unified Logging Infrastructure for Data Analytics at Twitter.
  • 2012 – AMPLab – Blink and It’s Done: Interactive Queries on Very Large Data.
  • 2012 – AMPLab – Fast and Interactive Analytics over Hadoop Data with Spark.
  • 2012 – AMPLab – Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
  • 2012 – Microsoft – Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
  • 2012 – Microsoft – Paxos Made Parallel.
  • 2012 – AMPLab – BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
  • 2012 – Google – Processing a trillion cells per mouse click.
  • 2012 – Google – Spanner: Google’s Globally-Distributed Database.
  • 2011 – AMPLab – Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
  • 2011 – AMPLab – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
  • 2011 – Google – Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

2001 – 2010

  • 2010 – Facebook – Finding a needle in Haystack: Facebook’s photo storage.
  • 2010 – AMPLab – Spark: Cluster Computing with Working Sets.
  • 2010 – Google – Storage Architecture and Challenges.
  • 2010 – Google – Pregel: A System for Large-Scale Graph Processing.
  • 2010 – Google – Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
  • 2010 – Google – Dremel: Interactive Analysis of Web-Scale Datasets.
  • 2010 – Yahoo – S4: Distributed Stream Computing Platform.
  • 2009 – HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
  • 2008 – AMPLab – Chukwa: A large-scale monitoring system.
  • 2007 – Amazon – Dynamo: Amazon’s Highly Available Key-value Store.
  • 2006 – Google – The Chubby lock service for loosely-coupled distributed systems.
  • 2006 – Google – Bigtable: A Distributed Storage System for Structured Data.
  • 2004 – Google – MapReduce: Simplified Data Processing on Large Clusters.
  • 2003 – Google – The Google File System.
