Analysis of the LRUCache and FastLRUCache Implementations

2024-07-15 21:44:54

1. Analysis of the LRUCache Implementation

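Before looking at LRUCache, a few words about LinkedHashMap. LinkedHashMap extends HashMap and uses a doubly linked list to record the order of the entries in the map. Two orderings are available, access order (LRU) and insertion order, selected through the constructor public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder). As a result, for operations such as get, put, and remove, LinkedHashMap does everything HashMap does and also adjusts the entry-order list. The JDK snippets quoted below come from the pre-Java 8 source.
Take get as an example. With LRU ordering (accessOrder is true), the entry's recordAccess method moves the entry that was just read to the most recently used end of the list. It is relinked just before header, so header.after remains the eldest entry: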

    public V get(Object key) {
        Entry<K,V> e = (Entry<K,V>)getEntry(key);
        if (e == null)
            return null;
        e.recordAccess(this);
        return e.value;
    }
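To make the effect of accessOrder concrete, here is a minimal sketch using only the public LinkedHashMap API. The class and method names (AccessOrderDemo, keysAfterReadingA) are made up for this illustration. It shows that, in access order, the entry just read ends up last in iteration order, with the eldest entry first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderDemo {
    // Inserts a, b, c, reads "a", then returns the keys in iteration order.
    static List<String> keysAfterReadingA(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // only reorders the list when accessOrder is true
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterReadingA(false)); // [a, b, c]
        System.out.println(keysAfterReadingA(true));  // [b, c, a]
    }
}
```

Iteration always starts at the eldest entry, which is the one an eviction policy would pick first.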

For put, LinkedHashMap overrides the addEntry method:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        createEntry(hash, key, value, bucketIndex);

        // Remove eldest entry if instructed, else grow capacity if appropriate
        Entry<K,V> eldest = header.after;
        if (removeEldestEntry(eldest)) {
            removeEntryForKey(eldest.key);
        } else {
            if (size >= threshold)
                resize(2 * table.length);
        }
    }

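addEntry calls boolean removeEldestEntry(Map.Entry<K,V> eldest). The default implementation always returns false, so by default the map has no capacity limit. A subclass of LinkedHashMap can override this method to return true when the current size exceeds a threshold, in which case LinkedHashMap removes the eldest entry from the order list. This is what lets LinkedHashMap serve as a cache: it can hold a bounded number of elements and offers two eviction policies (LRU and FIFO), of which LRU is the more commonly used.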
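The override described above can be sketched in a few lines. BoundedMap and its limit field are hypothetical names chosen for this example and are not taken from Solr. The same class gives LRU or FIFO eviction depending on the accessOrder flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int limit;

    // accessOrder = true gives LRU eviction, false gives FIFO eviction.
    BoundedMap(int limit, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.limit = limit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put after the new entry has been inserted.
        return size() > limit;
    }

    public static void main(String[] args) {
        BoundedMap<String, Integer> lru = new BoundedMap<>(2, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");    // "b" is now the eldest entry
        lru.put("c", 3); // evicts "b"
        System.out.println(lru.keySet()); // [a, c]
    }
}
```

With accessOrder set to false the read of "a" has no effect on the order, so the same sequence evicts "a" instead.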

Solr's LRUCache is built on LinkedHashMap, so its implementation is very simple. The core code is listed here:

    public Object init(final Map args, Object persistence, final CacheRegenerator regenerator) {
      // argument parsing and initialization code omitted

      map = new LinkedHashMap(initialSize, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(final Map.Entry eldest) {
          if (size() > limit) {
            // increment evictions regardless of state.
            // this doesn't need to be synchronized because it will
            // only be called in the context of a higher level synchronized block.
            evictions++;
            stats.evictions.incrementAndGet();
            return true;
          }
          return false;
        }
      };

      if (persistence==null) {
        // must be the first time a cache of this type is being created
        persistence = new CumulativeStats();
      }

      stats = (CumulativeStats)persistence;
      return persistence;
    }

    public Object put(final Object key, final Object value) {
      synchronized (map) {
        if (state == State.LIVE) {
          stats.inserts.incrementAndGet();
        }

        // increment local inserts regardless of state???
        // it does make it more consistent with the current size...
        inserts++;
        return map.put(key,value);
      }
    }

    public Object get(final Object key) {
      synchronized (map) {
        final Object val = map.get(key);
        if (state == State.LIVE) {
          // only increment lookups and hits if we are live.
          lookups++;
          stats.lookups.incrementAndGet();
          if (val!=null) {
            hits++;
            stats.hits.incrementAndGet();
          }
        }
        return val;
      }
    }

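As the code shows, LRUCache simply places a mutex around both reads and writes, so concurrent readers and writers contend for the lock. Caches are typically read far more often than they are written, so the lack of concurrent reads is somewhat unfriendly. That said, compared with the other expensive operations in Solr, LRUCache's serialized reads are usually not the system bottleneck. The advantage of LRUCache is that it reuses LinkedHashMap directly and is simple to implement. The drawback is that LinkedHashMap's get has to update the entry-order list, so the whole operation must be locked.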
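One way to see why even reads need the lock: in an access-ordered LinkedHashMap, get counts as a structural modification. The sketch below (ReadIsWriteDemo is a name invented for this example) shows this in a single thread, where a get during iteration invalidates the iterator. It illustrates the underlying reason and is not a reproduction of a real multi-threaded race.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

class ReadIsWriteDemo {
    // Returns true if calling get() during iteration invalidates the iterator.
    static boolean getBreaksIteration(boolean accessOrder) {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.get("a"); // relinks "a" when accessOrder is true
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(getBreaksIteration(false)); // false
        System.out.println(getBreaksIteration(true));  // true
    }
}
```

Since every get rewrites list pointers, two unsynchronized readers could corrupt the list, which is why LRUCache wraps get in synchronized (map).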

2. Analysis of the FastLRUCache Implementation

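Solr 1.4 introduced FastLRUCache as an alternative implementation. FastLRUCache drops LinkedHashMap in favor of ConcurrentHashMap, which many Java cache implementations now use. ConcurrentHashMap, however, only provides high-performance concurrent access and has no support for evicting data, so eviction is the main thing FastLRUCache has to add. FastLRUCache's access operations are all implemented in ConcurrentLRUCache, so we move straight on to that class.
The access operations of ConcurrentLRUCache are as follows: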

    public V get(final K key) {
      final CacheEntry<K,V> e = map.get(key);
      if (e == null) {
        if (islive) {
          stats.missCounter.incrementAndGet();
        }
        return null;
      }
      if (islive) {
        e.lastAccessed = stats.accessCounter.incrementAndGet();
      }
      return e.value;
    }

    public V remove(final K key) {
      final CacheEntry<K,V> cacheEntry = map.remove(key);
      if (cacheEntry != null) {
        stats.size.decrementAndGet();
        return cacheEntry.value;
      }
      return null;
    }

    public Object put(final K key, final V val) {
      if (val == null) {
        return null;
      }
      final CacheEntry e = new CacheEntry(key, val, stats.accessCounter.incrementAndGet());
      final CacheEntry oldCacheEntry = map.put(key, e);
      int currentSize;
      if (oldCacheEntry == null) {
        currentSize = stats.size.incrementAndGet();
      } else {
        currentSize = stats.size.get();
      }
      if (islive) {
        stats.putCounter.incrementAndGet();
      } else {
        stats.nonLivePutCounter.incrementAndGet();
      }

      // Check if we need to clear out old entries from the cache.
      // isCleaning variable is checked instead of markAndSweepLock.isLocked()
      // for performance because every put invocation will check until
      // the size is back to an acceptable level.
      // There is a race between the check and the call to markAndSweep, but
      // it's unimportant because markAndSweep actually acquires the lock or returns if it can't.
      // Thread safety note: isCleaning read is piggybacked (comes after) other volatile reads
      // in this method.
      if (currentSize > upperWaterMark && !isCleaning) {
        if (newThreadForCleanup) {
          new Thread() {
            @Override
            public void run() {
              markAndSweep();
            }
          }.start();
        } else if (cleanupThread != null){
          cleanupThread.wakeThread();
        } else {
          markAndSweep();
        }
      }
      return oldCacheEntry == null ? null : oldCacheEntry.value;
    }

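All of these operations call straight through to map (a ConcurrentHashMap). In put, when the map size exceeds the upper limit and no other thread is cleaning up (currentSize > upperWaterMark && !isCleaning), markAndSweep is called to evict data. The cleanup can run in three ways: 1) synchronously in the calling thread, 2) asynchronously in a newly started thread, 3) asynchronously in a dedicated cleanup thread that is woken up on demand.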
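The comment in put says the race on isCleaning is harmless because markAndSweep either acquires a lock or returns. A minimal sketch of that guard follows, assuming a ReentrantLock with tryLock. SweepGuard, sweepLock, sweeps, and secondSweeperRuns are names invented for this illustration, and the body of the sweep is a placeholder.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class SweepGuard {
    private final ReentrantLock sweepLock = new ReentrantLock();
    private volatile boolean isCleaning = false;
    private final AtomicInteger sweeps = new AtomicInteger();

    // Returns true if this call performed the sweep, false if another thread held the lock.
    boolean markAndSweep() {
        if (!sweepLock.tryLock()) {
            return false; // someone else is already sweeping
        }
        try {
            isCleaning = true;
            sweeps.incrementAndGet(); // placeholder for the real eviction work
            return true;
        } finally {
            isCleaning = false;
            sweepLock.unlock();
        }
    }

    // Calls markAndSweep from a second thread while this thread holds the lock.
    static boolean secondSweeperRuns() {
        SweepGuard guard = new SweepGuard();
        boolean[] ran = new boolean[1];
        guard.sweepLock.lock();
        try {
            Thread other = new Thread(() -> ran[0] = guard.markAndSweep());
            other.start();
            other.join(); // join makes the write to ran[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            guard.sweepLock.unlock();
        }
        return ran[0];
    }
}
```

Under these assumptions, a put that slips past the isCleaning check while a sweep is running costs only a failed tryLock, whichever of the three cleanup modes is configured.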

The markAndSweep method is quite long, so it is not listed here. Its approach is described below.

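Every element in ConcurrentLRUCache is a CacheEntry with a lastAccessed attribute, a number that records when the entry was last accessed. stats.accessCounter in ConcurrentLRUCache is a global, monotonically increasing integer. Whenever an entry is written with put or read with get, its lastAccessed is set to the newly incremented accessCounter value. Evicting data in ConcurrentLRUCache means evicting the entries with the smaller lastAccessed values. Because ConcurrentLRUCache does not maintain a list of entries sorted by lastAccessed (otherwise it would just be LRUCache), eviction has to traverse all the elements in the map to find suitable entries. Does that call for a sort? It does not need to go that far.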
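The stamping scheme can be sketched independently of Solr. StampedCache, Entry, and stampOf below are hypothetical names for this example, and statistics and the islive flag are left out. The point is that a read only writes one volatile field instead of relinking a shared list.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class StampedCache<K, V> {
    static final class Entry<V> {
        final V value;
        volatile long lastAccessed;

        Entry(V value, long stamp) {
            this.value = value;
            this.lastAccessed = stamp;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong accessCounter = new AtomicLong();

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        e.lastAccessed = accessCounter.incrementAndGet(); // no lock, no list relinking
        return e.value;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, accessCounter.incrementAndGet()));
    }

    // Stamp of the given key, or -1 if absent (exposed only for the demo).
    long stampOf(K key) {
        Entry<V> e = map.get(key);
        return e == null ? -1 : e.lastAccessed;
    }
}
```

After put("a"), put("b"), get("a"), the stamps are 3 for "a" and 2 for "b", so "b" is the older entry even though it was inserted later.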

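Define a few variables. wantToKeep is the number of entries to keep in the map, wantToRemove is the number to remove (wantToRemove = map.size - wantToKeep), and newestEntry is the largest lastAccessed value (initially stats.accessCounter). These three are known at the start. oldestEntry is the smallest lastAccessed value. It is not known up front, but it can be narrowed down by comparison while traversing the entries.

The entries in the map fall into three groups: (a) entries that can immediately be judged evictable, those with lastAccessed < (oldestEntry + wantToRemove); (b) entries that can immediately be judged worth keeping, those with lastAccessed > (newestEntry - wantToKeep); (c) everything else, for which no definite decision can be made yet. Because every access takes a distinct counter value, at most wantToRemove entries can fall in range (a) and at most wantToKeep in range (b), so neither decision can overshoot.

For a single pass over the map, the best case is that evicting the entries in (a) already brings the map down to wantToKeep. The typical example is a cache that only sees get and put operations, so that the lastAccessed values in the map stay contiguous. The worst case is that too few entries, possibly none, satisfy (a). Even so, one pass over the map at least narrows the entries still to be processed down to those in (c). Repeating the process is guaranteed to bring the map down to wantToKeep. One practical point is that wantToKeep is usually close to map.size (for example 0.9 * map.size), so unless there are many remove operations, the cleanup finishes in only a few passes.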
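The three groups (a), (b), and (c) can be illustrated with a single pass over plain stamp values. This is a sketch under assumptions, not Solr's markAndSweep: SweepPass and its fields are invented names, and the keep threshold is taken to be newestEntry - wantToKeep, which is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

class SweepPass {
    final List<Long> evicted = new ArrayList<>();
    final List<Long> kept = new ArrayList<>();
    final List<Long> undecided = new ArrayList<>();

    // One pass over lastAccessed stamps, given bounds on the oldest and newest stamp.
    SweepPass(long[] stamps, long oldestEntry, long newestEntry, int wantToKeep) {
        int wantToRemove = stamps.length - wantToKeep;
        for (long stamp : stamps) {
            if (stamp > newestEntry - wantToKeep) {
                kept.add(stamp);      // (b): within the wantToKeep newest possible stamps
            } else if (stamp < oldestEntry + wantToRemove) {
                evicted.add(stamp);   // (a): within the wantToRemove oldest possible stamps
            } else {
                undecided.add(stamp); // (c): needs another pass
            }
        }
    }

    public static void main(String[] args) {
        long[] stamps = {1, 5, 6, 20, 21, 22, 23, 24, 25, 26};
        SweepPass pass = new SweepPass(stamps, 1, 26, 8);
        System.out.println(pass.evicted);   // [1]
        System.out.println(pass.undecided); // [5, 6]
    }
}
```

With contiguous stamps 1 to 10 and wantToKeep = 8, the same pass evicts exactly stamps 1 and 2 and leaves nothing undecided, which is the best case described above. In the example with gaps, one more entry still has to go, and it must come from the undecided group.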

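That is the basic idea behind eviction in ConcurrentLRUCache. The process runs in three stages. The first stage traverses every entry in the map: entries satisfying (a) are removed, those satisfying (b) are skipped, and those satisfying (c) are set aside in a new map. If map.size is still greater than wantToKeep after this pass, the second stage repeats the process (in the implementation, Solr uses a variable numPasses that appears to be meant as a switch for the number of passes, and it is currently fixed at one). If map.size is still greater than wantToKeep after that, the third stage makes one more pass, this time using a PriorityQueue to extract the N oldest entries that still need to be evicted, which finishes the job. One addition: the wantToKeep mentioned above corresponds to acceptableWaterMark and lowerWaterMark in the code. The cleanup counts as done once the size reaches acceptableWaterMark, but the work itself aims for lowerWaterMark.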
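The last stage, picking the N oldest of the remaining candidates with a PriorityQueue, can be sketched as a bounded max-heap. OldestSelector and oldest are hypothetical names for this example, and it works on bare stamp values rather than on Solr's CacheEntry objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class OldestSelector {
    // Returns the n smallest stamps, using a max-heap that never holds more than n items.
    static List<Long> oldest(long[] stamps, int n) {
        List<Long> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<Long> heap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (long stamp : stamps) {
            if (heap.size() < n) {
                heap.add(stamp);
            } else if (stamp < heap.peek()) {
                heap.poll(); // drop the newest of the current candidates
                heap.add(stamp);
            }
        }
        result.addAll(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(oldest(new long[]{5, 6, 30, 2, 9}, 2)); // [2, 5]
    }
}
```

Each candidate costs at most O(log n) heap work, so when only a handful of entries remain to be evicted, this pass is close to linear in the number of candidates.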

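The time complexity of this algorithm is about 2n + k ln(k) (k is small in most practical cases), which is usually faster than a straightforward heap sort.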

3. Summary

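LRUCache and FastLRUCache take two quite different approaches. What they share is that both rely on an existing Map to hold the data. Where they differ is in how data is evicted. LRUCache (that is, LinkedHashMap) maintains an additional structure and updates it on every access. The advantage is that eviction is O(1), and the drawback is that access operations need a mutex. FastLRUCache is the opposite: it maintains no extra structure and gets concurrent reads from ConcurrentHashMap, but when a put needs to evict data, the eviction is O(n). Because the sweep does not block other readers and writers, it only affects the performance of that one put, and FastLRUCache can also be configured to run the cleanup asynchronously in a separate thread to reduce the impact. Another cache implementation, Ehcache, evicts synchronously, but it limits how much is evicted each time (usually fewer than 5 entries), so performance does not suffer much even in the synchronous case.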

Original: http://www.cnblogs.com/chenying99/archive/2012/08/02/2620703.html
