Machine Learning Notes (Washington University) - Clustering Specialization - Week 4
1. Probabilistic clustering model
- Hard assignments (as in k-means) do not tell the full story; a probabilistic model captures the uncertainty of each assignment
- k-means only considers the cluster centers, so it handles overlapping clusters, disparate cluster sizes, and differently shaped clusters poorly
- learns weights on dimensions, and these weights can be cluster-specific
2. Gaussian distribution
A 1-D Gaussian is fully specified by its mean μ and variance σ².
A 2-D Gaussian is fully specified by its mean vector μ and covariance matrix Σ.
Thus our mixture-of-Gaussians model is defined, for each cluster k, by the parameters
{πk, μk, Σk}
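As a concrete illustration, here is a minimal sketch of evaluating such a mixture density; the two-component 2-D mixture below uses made-up parameter values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component mixture in 2-D: weights πk, means μk, covariances Σk
pi = np.array([0.6, 0.4])                                # mixture weights, must sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]        # one mean vector per cluster
Sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]  # one covariance matrix per cluster

def mixture_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(pi)))

print(mixture_density(np.array([1.0, 1.0])))  # density of the mixture at a test point
```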
3. EM (Expectation Maximization)
What if we knew the cluster parameters {πk, μk, Σk}?
Compute responsibilities:
rik = πk N(xi | μk, Σk) / Σj πj N(xi | μj, Σj)
rik is the responsibility cluster k takes for observation i, i.e. the probability of assigning observation i to cluster k given the model parameters and the observed value xi.
πk is the prior probability of being from cluster k.
N(xi | μk, Σk) is the Gaussian density of cluster k evaluated at xi.
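A minimal numpy/scipy sketch of this E-step (a straightforward version that skips the log-sum-exp trick a production implementation would use for numerical stability; function and variable names are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    """Responsibilities r[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    n, K = X.shape[0], len(pi)
    r = np.zeros((n, K))
    for k in range(K):
        r[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    r /= r.sum(axis=1, keepdims=True)  # normalize so each row sums to 1
    return r
```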
What if we knew the cluster soft assignments rik?
Then the maximum-likelihood parameters have closed-form updates: πk = Nk / n, μk = (1/Nk) Σi rik xi, Σk = (1/Nk) Σi rik (xi − μk)(xi − μk)ᵀ, where n is the number of observations and Nk = Σi rik is the effective number of points in cluster k.
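A matching M-step sketch under the same assumptions, taking the data X and a responsibility matrix r as produced by the e_step above:

```python
import numpy as np

def m_step(X, r):
    """Maximum-likelihood updates of {pi_k, mu_k, Sigma_k} given soft assignments r."""
    n, d = X.shape
    Nk = r.sum(axis=0)            # effective number of points per cluster
    pi = Nk / n                   # updated mixture weights
    mu = (r.T @ X) / Nk[:, None]  # responsibility-weighted means
    Sigma = []
    for k in range(r.shape[1]):
        diff = X - mu[k]
        Sigma.append((r[:, k, None] * diff).T @ diff / Nk[k])  # weighted covariance
    return pi, mu, Sigma
```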
The procedure for the iterative algorithm:
1. initialize the parameters {πk, μk, Σk}
2. estimate cluster responsibilities given the current parameter estimates (E-step)
3. maximize the likelihood over the parameters given the soft assignments (M-step)
4. repeat steps 2-3 until convergence
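Putting the pieces together, a minimal loop that reuses the hypothetical e_step and m_step sketched above; a real implementation would track the log-likelihood and stop on convergence rather than after a fixed number of iterations:

```python
def fit_gmm(X, pi, mu, Sigma, n_iter=100):
    """Alternate E- and M-steps starting from the given initialization."""
    for _ in range(n_iter):
        r = e_step(X, pi, mu, Sigma)   # E-step: soft assignments
        pi, mu, Sigma = m_step(X, r)   # M-step: parameter updates
    return pi, mu, Sigma, r
```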
Notes:
EM is a coordinate-ascent algorithm
EM converges to a local mode
There are many ways to initialize the EM algorithm, and the choice matters for both the convergence rate and the quality of the local mode reached (see the sketch after this list):
- randomly choose k centroids
- pick centers sequentially, as in k-means++
- initialize from a k-means solution
- grow the mixture model by splitting until k clusters are formed
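For comparison, scikit-learn's GaussianMixture exposes similar choices through its init_params argument ('kmeans' and 'random' are long-standing options; a 'k-means++' option exists only in newer releases). A toy usage sketch on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(200, 2))  # synthetic 2-D data
# n_init=5 runs EM from 5 initializations and keeps the best local mode
gmm = GaussianMixture(n_components=3, init_params="kmeans", n_init=5).fit(X)
print(gmm.weights_, gmm.means_)
```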
Prevent overfitting:
- Do not let any variance go to zero; add a small amount to the diagonal of the covariance estimate, as sketched below.
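In code this is one extra term in the covariance update; a regularized variant of the update from the M-step sketch above (eps is an illustrative value, mirroring scikit-learn's reg_covar default):

```python
import numpy as np

def regularized_cov(X, r_k, mu_k, eps=1e-6):
    """Covariance update with a small ridge on the diagonal so no variance collapses to zero."""
    diff = X - mu_k
    Nk = r_k.sum()
    return (r_k[:, None] * diff).T @ diff / Nk + eps * np.eye(X.shape[1])
```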