
Mahout Recommendation 4: Evaluating on the GroupLens Dataset

Using the GroupLens dataset file ua.base

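This is a tab-delimited file whose fields are user ID, item ID, rating (the preference value), and some additional information. Can it be used directly? The earlier examples used CSV and this file is TSV, but yes: FileDataModel reads it as is.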
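To make the file layout concrete, here is a minimal sketch of splitting one such line into its fields. The class `RatingLineSketch`, its `parse` method, and the sample line are hypothetical illustrations, not Mahout code. The sketch assumes the fourth field is a numeric timestamp and uses -1 when it is absent. It is not a description of how FileDataModel parses files internally.

```java
/** Hypothetical helper, not part of Mahout: parses one "user, item, rating[, extra]" line. */
final class RatingLineSketch {

    /** Plain holder for the parsed fields. */
    static final class Rating {
        final long userId;
        final long itemId;
        final float value;
        final long timestamp; // assumption: fourth field is a timestamp; -1 when missing

        Rating(long userId, long itemId, float value, long timestamp) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Splits a line on either a tab or a comma and converts the fields. */
    static Rating parse(String line) {
        String[] fields = line.trim().split("[\t,]");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        long userId = Long.parseLong(fields[0].trim());
        long itemId = Long.parseLong(fields[1].trim());
        float value = Float.parseFloat(fields[2].trim());
        long timestamp = fields.length > 3 ? Long.parseLong(fields[3].trim()) : -1L;
        return new Rating(userId, itemId, value, timestamp);
    }

    public static void main(String[] args) {
        // Illustrative sample line in the tab-delimited layout described above
        Rating r = parse("1\t1\t5\t874965758");
        System.out.println(r.userId + " " + r.itemId + " " + r.value + " " + r.timestamp);
    }
}
```

The same `parse` call also accepts a comma-delimited line such as `"1,101,5.0"`, which is why switching from CSV to TSV needs no change in the calling code in this sketch.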

Run the evaluation program from Mahout Recommendation 2 against this dataset:

package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

public class TestRecommenderEvaluator {

    public static void main(String[] args) throws Exception {
        // Force the same random values on every run so the results are repeatable
        RandomUtils.useTestSeed();
        // Load the data
        // DataModel model = new FileDataModel(new File("data/intro.csv"));
        DataModel model = new FileDataModel(new File("data/ua.base"));
        // Evaluator based on the average absolute difference
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Evaluator based on the root-mean-square difference
        // RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
        // Builder that creates the recommender; same implementation as in the previous example
        RecommenderBuilder builder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // User similarity; several implementations are available
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // User neighborhood
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                // The recommender itself
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Evaluation score (average difference): train on 90% of the data, test on the remaining 10%.
        // Mahout in Action uses 0.7, but that produced NaN here.
        double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
        System.out.println(score);
    }
}

Output:

14/08/04 09:52:38 INFO file.FileDataModel: Creating FileDataModel for file data\ua.base
14/08/04 09:52:38 INFO file.FileDataModel: Reading file info...
14/08/04 09:52:38 INFO file.FileDataModel: Read lines: 90570
14/08/04 09:52:38 INFO model.GenericDataModel: Processed 943 users
14/08/04 09:52:38 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of FileDataModel[dataFile:D:\workspace\zoodemo\data\ua.base]
14/08/04 09:52:38 INFO model.GenericDataModel: Processed 943 users
14/08/04 09:52:38 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 878 users
14/08/04 09:52:38 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 878 tasks in 4 threads
14/08/04 09:52:39 INFO eval.StatsCallable: Average time per recommendation: 39ms
14/08/04 09:52:39 INFO eval.StatsCallable: Approximate memory used: 16MB / 79MB
14/08/04 09:52:39 INFO eval.StatsCallable: Unable to recommend in 114 cases
14/08/04 09:52:43 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.9375000000000002
0.9375000000000002

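This score is now based on roughly 100,000 preference values rather than just a handful (the log shows 90,570 lines read from ua.base).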

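The result is about 0.9. On a rating scale of 1 to 5, that means the estimates are off by nearly a full point on average, which is not very good.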
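To show what a score like this measures, here is a simplified sketch of the two error metrics named in the program: the average absolute difference and the root-mean-square difference between actual and estimated ratings. The class `ErrorMetricsSketch`, its method names, and the three sample ratings are made up for illustration. This is not Mahout's evaluator code, which also handles splitting the data and skipping cases where no estimate is possible.

```java
/** Hypothetical illustration of the two metrics; not Mahout's implementation. */
final class ErrorMetricsSketch {

    /** Mean of |actual - estimated| over all pairs. */
    static double meanAbsoluteDifference(double[] actual, double[] estimated) {
        checkSameLength(actual, estimated);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the mean of (actual - estimated)^2 over all pairs. */
    static double rootMeanSquare(double[] actual, double[] estimated) {
        checkSameLength(actual, estimated);
        double sumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimated[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / actual.length);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Made-up sample: three held-out ratings and the recommender's estimates for them
        double[] actual = {4.0, 3.0, 5.0};
        double[] estimated = {3.0, 3.0, 4.5};
        // Absolute errors are 1.0, 0.0 and 0.5, so the mean is 0.5
        System.out.println(meanAbsoluteDifference(actual, estimated));
        System.out.println(rootMeanSquare(actual, estimated));
    }
}
```

Because the root-mean-square variant squares each error before averaging, it penalizes large misses more heavily than the average absolute difference does, so the two scores are not directly comparable.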

Perhaps the particular Recommender implementation we are using here is not the best choice.