
Study notes on "Programming Collective Intelligence": Euclidean distance & Pearson correlation coefficient

# Nested dictionary: critic name -> {movie title -> rating}
critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                  'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
                     'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
                     'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
}

critics['Lisa Rose']['Lady in the Water']    # look up one rating: 2.5
critics['Toby']['Snakes on a Plane'] = 4.5   # set (or overwrite) one rating
critics['Toby']                              # all of Toby's ratings

1. Euclidean distance


from math import sqrt

def sim_distance(prefs, person1, person2):
    # Items that both people have rated
    shared = [item for item in prefs[person1] if item in prefs[person2]]

    # No ratings in common: similarity is 0
    if len(shared) == 0:
        return 0

    # Sum of the squared rating differences over the shared items
    sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2)
                         for item in shared)

    # Map the distance into (0, 1]; identical ratings give 1
    return 1 / (1 + sqrt(sum_of_squares))

sim_distance(critics, 'Lisa Rose', 'Gene Seymour')   # approximately 0.294
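
As a quick check of the Euclidean score, the arithmetic for Lisa Rose and Gene Seymour can be worked through by hand. The sketch below is only an illustration: it copies the two rating dictionaries from the critics data above, and the names `lisa`, `gene`, and `euclidean_similarity` are introduced here for the example rather than taken from the book.

```python
from math import sqrt

# Ratings copied from the critics data above.
lisa = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
        'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

def euclidean_similarity(a, b):
    """Return 1 / (1 + Euclidean distance) over the items rated in both a and b."""
    shared = [item for item in a if item in b]
    if not shared:
        return 0
    distance = sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    return 1 / (1 + distance)

# Squared differences over the six shared movies:
# 0.25 + 0 + 2.25 + 2.25 + 1.0 + 0 = 5.75, so the distance is sqrt(5.75).
score = euclidean_similarity(lisa, gene)
print(round(score, 4))  # 0.2943
```

The score falls as the distance grows, and adding 1 to the denominator avoids a division by zero when two people give identical ratings.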

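2. Pearson correlation coefficient (corrects for "grade inflation", where one person consistently gives higher scores than another)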


from math import sqrt

def sim_pearson(prefs, p1, p2):
    # Items that both people have rated
    si = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(si)

    # No ratings in common: return 0
    if n == 0:
        return 0

    # Sums of the ratings
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)

    # Sums of the squared ratings
    sum1Sq = sum(pow(prefs[p1][it], 2) for it in si)
    sum2Sq = sum(pow(prefs[p2][it], 2) for it in si)

    # Sum of the products of paired ratings
    pSum = sum(prefs[p1][it] * prefs[p2][it] for it in si)

    # Pearson score: covariance term over the product of the spread terms
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))

    # One person gave the same rating to every shared item: correlation undefined
    if den == 0:
        return 0

    return num / den

sim_pearson(critics, 'Lisa Rose', 'Gene Seymour')   # approximately 0.396


def topMatchs(prefs, person, n=5, similarity=sim_pearson):
    # Score everyone except the person being matched
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]

    # Highest similarity first
    scores.sort(reverse=True)

    return scores[0:n]

topMatchs(critics, 'Toby', n=3)   # the three critics most similar to Toby
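
The "grade inflation" point can be illustrated with a small sketch. The two critics and four film titles below are hypothetical, invented only for this example: `generous` rates every film exactly one point higher than `strict`. The `pearson` helper uses the mean-centered form of the coefficient, which is algebraically equivalent to the sum-based form in `sim_pearson` above; it and `euclidean_similarity` are names assumed here, not the book's.

```python
from math import sqrt

def euclidean_similarity(a, b):
    """1 / (1 + Euclidean distance) over the items rated in both a and b."""
    shared = [item for item in a if item in b]
    if not shared:
        return 0
    return 1 / (1 + sqrt(sum((a[i] - b[i]) ** 2 for i in shared)))

def pearson(a, b):
    """Pearson correlation over the items rated in both a and b."""
    shared = [item for item in a if item in b]
    n = len(shared)
    if n == 0:
        return 0
    mean_a = sum(a[i] for i in shared) / n
    mean_b = sum(b[i] for i in shared) / n
    # Covariance and spread terms, measured from each person's own mean
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    den = sqrt(var_a * var_b)
    if den == 0:
        return 0
    return cov / den

# Hypothetical data: same taste, but one critic scores a full point higher.
strict = {'Film A': 2.0, 'Film B': 3.0, 'Film C': 4.0, 'Film D': 2.5}
generous = {film: rating + 1.0 for film, rating in strict.items()}

print(round(euclidean_similarity(strict, generous), 3))  # 0.333
print(round(pearson(strict, generous), 3))               # 1.0
```

Because Pearson measures each rating relative to that person's own average, the constant one-point offset cancels out, while the Euclidean score penalizes it on every film.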

 
