Apache Mahout Source Code Reading Notes: DataModel and UserBasedRecommender


2024-08-04 02:29:54

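First, let's look at how it is used:

1) Obtain a DataModel

2) Define the similarity model: PearsonCorrelationSimilarity

3) Define the user-neighborhood model: NearestNUserNeighborhood

4) Define the recommender: GenericUserBasedRecommender

5) Request recommendations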

  @Test
  public void testHowMany() throws Exception {
    // getDataModel(...) is a helper inherited from Mahout's test base class TasteTestCase
    DataModel dataModel = getDataModel(
            new long[] {1, 2, 3, 4, 5},
            new Double[][] {
                    {0.1, 0.2},
                    {0.2, 0.3, 0.3, 0.6},
                    {0.4, 0.4, 0.5, 0.9},
                    {0.1, 0.4, 0.5, 0.8, 0.9, 1.0},
                    {0.2, 0.3, 0.6, 0.7, 0.1, 0.2},
            });
    // Used to find the most similar users, i.e. the neighborhood users
    UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dataModel);
    Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
    List<RecommendedItem> fewRecommended = recommender.recommend(1, 2);
    List<RecommendedItem> moreRecommended = recommender.recommend(1, 4);
    // The top-2 list must be a prefix of the top-4 list
    for (int i = 0; i < fewRecommended.size(); i++) {
      assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
    }
    recommender.refresh(null);
    for (int i = 0; i < fewRecommended.size(); i++) {
      assertEquals(fewRecommended.get(i).getItemID(), moreRecommended.get(i).getItemID());
    }
  }

For the similarity computation, see the previous post on PearsonCorrelationSimilarity.
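
As a reminder of what that similarity measures, here is a minimal sketch of a Pearson correlation between two users, computed only over the items both users have rated. It is a simplification, not Mahout's PearsonCorrelationSimilarity: the class PearsonSketch and its map-based input (item ID -> rating) are hypothetical, and any extra options of the real class are left out. It returns NaN when there are fewer than two co-rated items or one side has zero variance, because the correlation is undefined there.

```java
import java.util.Map;

// Hypothetical, simplified sketch (not Mahout's PearsonCorrelationSimilarity):
// Pearson correlation between two users over their co-rated items only.
final class PearsonSketch {
  private PearsonSketch() {}

  // prefsA / prefsB: item ID -> rating. Returns NaN if fewer than two
  // co-rated items exist or either user's co-rated ratings have zero variance.
  static double pearson(Map<Long, Double> prefsA, Map<Long, Double> prefsB) {
    // First pass: count co-rated items and sum each user's ratings on them.
    int n = 0;
    double sumA = 0.0;
    double sumB = 0.0;
    for (Map.Entry<Long, Double> e : prefsA.entrySet()) {
      Double b = prefsB.get(e.getKey());
      if (b != null) {
        n++;
        sumA += e.getValue();
        sumB += b;
      }
    }
    if (n < 2) {
      return Double.NaN;
    }
    double meanA = sumA / n;
    double meanB = sumB / n;
    // Second pass: covariance and variances of the mean-centered ratings.
    double cov = 0.0;
    double varA = 0.0;
    double varB = 0.0;
    for (Map.Entry<Long, Double> e : prefsA.entrySet()) {
      Double b = prefsB.get(e.getKey());
      if (b != null) {
        double da = e.getValue() - meanA;
        double db = b - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
      }
    }
    if (varA == 0.0 || varB == 0.0) {
      return Double.NaN;
    }
    return cov / Math.sqrt(varA * varB);
  }
}
```

With ratings {1: 1, 2: 2, 3: 3} and {1: 2, 2: 4, 3: 6} the result is 1.0 (perfectly correlated); reversing the second user's ratings to {1: 6, 2: 4, 3: 2} gives -1.0.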

NearestNUserNeighborhood fetches the N nearest (most similar) users. How is that put to use? The entry point is recommend() in:
~/mahout-core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericUserBasedRecommender.java

  @Override
  public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
    Preconditions.checkArgument(howMany >= 1, "howMany must be at least 1");
    log.debug("Recommending items for user ID '{}'", userID);
    // Use the similarity model to find the N users most similar to this user
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    if (theNeighborhood.length == 0) {
      return Collections.emptyList();
    }
    // Candidate pool: items rated by the neighborhood users but not yet rated by the current user
    FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);
    // From the pool, keep the howMany items with the highest estimated preference for this user
    TopItems.Estimator<Long> estimator = new Estimator(userID, theNeighborhood);
    List<RecommendedItem> topItems = TopItems
        .getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);
    log.debug("Recommendations are: {}", topItems);
    return topItems;
  }
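
recommend() first asks the neighborhood for the N most similar users and then builds the candidate pool. The sketch below illustrates both steps with plain Java collections, assuming similarities are supplied as a function: a size-bounded min-heap keeps the N best neighbors, and the pool is the union of the neighbors' items minus the current user's items. NeighborhoodSketch, nearestN and candidateItems are hypothetical names; the real NearestNUserNeighborhood and getAllOtherItems work on DataModel and FastIDSet and may have options (such as similarity thresholds) that are not modeled here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Hypothetical, simplified sketch of the first half of recommend(); NOT Mahout's code.
// Plain collections stand in for DataModel / FastIDSet.
final class NeighborhoodSketch {
  private NeighborhoodSketch() {}

  // Step 1: the n users most similar to userID, most similar first.
  // The user itself and NaN similarities are ignored.
  static long[] nearestN(long userID, int n, Set<Long> allUsers,
                         ToDoubleBiFunction<Long, Long> similarity) {
    // Min-heap keyed on similarity: the head is the weakest of the current top n.
    PriorityQueue<Map.Entry<Long, Double>> heap =
        new PriorityQueue<>(Map.Entry.<Long, Double>comparingByValue());
    for (long other : allUsers) {
      if (other == userID) {
        continue;
      }
      double sim = similarity.applyAsDouble(userID, other);
      if (Double.isNaN(sim)) {
        continue;
      }
      heap.add(Map.entry(other, sim));
      if (heap.size() > n) {
        heap.poll(); // evict the least similar user seen so far
      }
    }
    long[] result = new long[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll().getKey(); // heap yields weakest first, so fill from the back
    }
    return result;
  }

  // Step 2: items rated by any neighbor, minus the items userID has already rated.
  static Set<Long> candidateItems(Map<Long, Set<Long>> itemsByUser, long[] neighborhood, long userID) {
    Set<Long> pool = new HashSet<>();
    for (long neighbor : neighborhood) {
      pool.addAll(itemsByUser.getOrDefault(neighbor, Set.of()));
    }
    pool.removeAll(itemsByUser.getOrDefault(userID, Set.of()));
    return pool;
  }
}
```

The bounded heap keeps memory at O(N) regardless of how many users exist, and each user costs O(log N) to process.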

The Estimator is implemented as follows:

  private final class Estimator implements TopItems.Estimator<Long> {
    private final long theUserID;
    private final long[] theNeighborhood;
    Estimator(long theUserID, long[] theNeighborhood) {
      this.theUserID = theUserID;
      this.theNeighborhood = theNeighborhood;
    }
    @Override
    public double estimate(Long itemID) throws TasteException {
      // Delegate to the enclosing recommender's preference estimate
      return doEstimatePreference(theUserID, theNeighborhood, itemID);
    }
  }
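
Estimator is only an adapter: it turns doEstimatePreference into the callback that TopItems.getTopItems calls for each candidate item. A minimal sketch of how a top-N selection can consume such a callback follows; ItemEstimator and TopNSketch are hypothetical stand-ins, and the rescorer argument of the real method is left out. Items whose estimate is NaN are skipped here, which matters because doEstimatePreference returns Float.NaN when it cannot produce an estimate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in for TopItems.Estimator: scores one candidate item.
interface ItemEstimator {
  double estimate(long itemID);
}

// Hypothetical, simplified top-N selection driven by an estimator callback.
final class TopNSketch {
  private TopNSketch() {}

  static List<Long> topItems(int howMany, Iterator<Long> candidates, ItemEstimator estimator) {
    // Min-heap on score: the head is the weakest of the current top howMany.
    PriorityQueue<Map.Entry<Long, Double>> heap =
        new PriorityQueue<>(Map.Entry.<Long, Double>comparingByValue());
    while (candidates.hasNext()) {
      long itemID = candidates.next();
      double score = estimator.estimate(itemID);
      if (Double.isNaN(score)) {
        continue; // no usable estimate for this item
      }
      heap.add(Map.entry(itemID, score));
      if (heap.size() > howMany) {
        heap.poll(); // drop the lowest-scoring item
      }
    }
    List<Long> result = new ArrayList<>(heap.size());
    while (!heap.isEmpty()) {
      result.add(heap.poll().getKey());
    }
    Collections.reverse(result); // highest estimate first
    return result;
  }
}
```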

  protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    // Similarity-weighted average of the neighbors' preferences for this item,
    // used as the current user's estimated preference for it
    if (theNeighborhood.length == 0) {
      return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
      if (userID != theUserID) {
        // See GenericItemBasedRecommender.doEstimatePreference() too
        Float pref = dataModel.getPreferenceValue(userID, itemID);
        if (pref != null) {
          double theSimilarity = similarity.userSimilarity(theUserID, userID);
          if (!Double.isNaN(theSimilarity)) {
            preference += theSimilarity * pref;
            totalSimilarity += theSimilarity;
            count++;
          }
        }
      }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
      return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
      estimate = capper.capEstimate(estimate);
    }
    return estimate;
  }
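
In formula form, the estimate for user u and item i is sum(sim(u, v) * pref(v, i)) / sum(sim(u, v)), taken over the neighbors v that rated i and have a defined similarity, and it is discarded when fewer than two neighbors contribute. The sketch below reproduces that rule with plain maps in place of DataModel and UserSimilarity; EstimateSketch and its parameters are hypothetical, and the capper step is omitted.

```java
import java.util.Map;

// Hypothetical, simplified sketch of doEstimatePreference(); NOT Mahout's code.
// Plain maps replace DataModel and UserSimilarity.
final class EstimateSketch {
  private EstimateSketch() {}

  // prefsForItem: neighbor ID -> that neighbor's rating of the item (missing = not rated)
  // simToUser:    neighbor ID -> similarity between that neighbor and the target user
  static float estimate(long userID, long[] neighborhood,
                        Map<Long, Float> prefsForItem, Map<Long, Double> simToUser) {
    double weightedSum = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long neighbor : neighborhood) {
      if (neighbor == userID) {
        continue; // never use the target user's own rating
      }
      Float pref = prefsForItem.get(neighbor);
      Double sim = simToUser.get(neighbor);
      if (pref == null || sim == null || sim.isNaN()) {
        continue; // neighbor did not rate the item, or similarity is undefined
      }
      weightedSum += sim * pref;
      totalSimilarity += sim;
      count++;
    }
    // As in the Mahout code: no estimate from fewer than two contributing neighbors.
    if (count <= 1) {
      return Float.NaN;
    }
    return (float) (weightedSum / totalSimilarity);
  }
}
```

For example, with neighbor similarities 0.8 and 0.2 and ratings 4.0 and 2.0, the estimate is (0.8 × 4.0 + 0.2 × 2.0) / (0.8 + 0.2) = 3.6. As in the Mahout code, the denominator is the signed sum of similarities, so negative Pearson correlations shrink it rather than add to it.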

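Summary:
1) Compute the N users most similar to the current user (N is the neighborhood size)
2) From those N users, collect the items the current user has not yet rated
3) Estimate the current user's preference for each of those items
4) Recommend the howMany items with the highest estimated preference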
