Spark MLlib Concept 5: Cosine Similarity
Overview:
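Cosine similarity describes how similar two vectors are, expressed as the cosine of the angle between them. When the two vectors point in the same direction (an angle of 0), the cosine is 1, indicating a strong correlation. When they are perpendicular (in linear algebra, two perpendicular directions are independent of each other), the cosine is 0, indicating that they are unrelated.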
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space; it is the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is therefore a judgement of orientation, not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, regardless of their magnitudes. Cosine similarity is particularly used in positive space (vectors with non-negative components), where the outcome is neatly bounded in [0, 1].
Definition
Background:
The cosine of the angle between two vectors can be derived from the Euclidean dot product formula:
$$\mathbf{A} \cdot \mathbf{B} = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos\theta$$
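As a quick numerical sanity check of the identity A · B = ||A|| ||B|| cos θ, the minimal plain-Python sketch below (not Spark code; the helper names dot, norm and polar are hypothetical) builds two 2-D vectors from chosen lengths and angles, so that θ is known exactly, and compares both sides.

```python
import math

def dot(a, b):
    """Euclidean dot product: sum of the component-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean (L2) norm: square root of the dot product of a with itself."""
    return math.sqrt(dot(a, a))

def polar(r, angle):
    """2-D vector of length r at the given angle (radians) from the x-axis."""
    return (r * math.cos(angle), r * math.sin(angle))

# Two 2-D vectors with known lengths (2 and 5) and directions.
alpha, beta = 0.3, 1.4
a = polar(2.0, alpha)
b = polar(5.0, beta)

lhs = dot(a, b)                                    # A . B from the components
rhs = norm(a) * norm(b) * math.cos(beta - alpha)   # ||A|| ||B|| cos(theta), theta = beta - alpha
print(lhs, rhs)                                    # equal up to floating-point rounding
```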
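Given two vectors of attributes, A and B, the cosine similarity cos(θ) is expressed using the dot product and the magnitudes as
$$\cos\theta = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $A_i$ and $B_i$ are the components of A and B.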
The resulting similarity ranges from -1, meaning exactly opposite, to 1, meaning exactly the same, with 0 usually indicating independence and in-between values indicating intermediate similarity or dissimilarity.
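The definition cos(θ) = (A · B) / (||A|| ||B||) translates directly into code. The following is a minimal plain-Python sketch, not the Spark MLlib API (the function name cosine_similarity is hypothetical). It raises an error for a zero vector, since the angle is undefined there, and checks the cases described above: same direction with different lengths, perpendicular, opposite, and vectors with non-negative components.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero vector is undefined, so refuse rather than divide by zero.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

A = [1.0, 2.0, 3.0]
print(cosine_similarity(A, [2.0, 4.0, 6.0]))                # same direction, twice the length
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))            # perpendicular
print(cosine_similarity(A, [-1.0, -2.0, -3.0]))             # diametrically opposed
print(cosine_similarity([1.0, 0.0, 2.0], [0.0, 1.0, 1.0]))  # non-negative components, lands in [0, 1]
```

In Spark MLlib itself, cosine similarities between the columns of a distributed matrix can be computed with RowMatrix.columnSimilarities(); the plain-Python function above only makes the formula concrete.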
Relationship to the Pearson correlation coefficient
If the attribute vectors are normalized by subtracting the vector means (e.g., $\mathbf{A} - \bar{A}$, where $\bar{A}$ is the mean of the components of A), the measure is called centered cosine similarity and is equivalent to the Pearson correlation coefficient.
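To see the equivalence numerically, the minimal plain-Python sketch below centers two vectors, takes their cosine similarity, and compares the result with Pearson's r computed from the raw-sums formula, which is written independently of the cosine code. The function names and the two "rating" vectors are hypothetical illustration data, not taken from Spark.

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity: (A . B) / (||A|| * ||B||)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def center(v):
    """Subtract the vector's mean from every component, i.e. A - mean(A)."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def centered_cosine_similarity(a, b):
    """Cosine similarity of the mean-centered vectors."""
    return cosine_similarity(center(a), center(b))

def pearson(x, y):
    """Pearson's r via the raw-sums formula (no explicit centering)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical ratings given by two users to the same five items.
u = [4.0, 5.0, 1.0, 3.0, 2.0]
v = [5.0, 4.0, 2.0, 3.0, 1.0]

print(cosine_similarity(u, v))           # raw cosine, computed without centering
print(centered_cosine_similarity(u, v))  # centered cosine
print(pearson(u, v))                     # matches the centered cosine up to rounding
```

For these inputs the raw cosine and Pearson's r differ, while the centered cosine and Pearson's r agree, which is exactly the relationship stated above.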
Source: <http://en.wikipedia.org/wiki/Cosine_similarity>