
Machine Learning Notes (Washington University) - Clustering Specialization - Week Five

1. Mixed membership model

This model aims to discover a set of memberships for each example.

In contrast, clustering models aim to discover a single membership.

In clustering:

  • one topic indicator zi per document i
  • all words come from (get scored under) the same topic zi
  • distribution on prevalence of topics in the corpus, π = [π1 ... πk]

In LDA:

  • one topic indicator ziw per word in doc i
  • each word gets scored under its topic ziw
  • distribution on prevalence of topics in the document, πi = [πi1 ... πik]

LDA inputs: set of words per doc for each doc in corpus

LDA outputs: corpus-wide topic vocab distributions, topic assignments per word, topic proportions per doc
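These inputs and outputs can be illustrated with scikit-learn's LatentDirichletAllocation (note that scikit-learn fits LDA by variational Bayes rather than the Gibbs sampling discussed below); the toy corpus and the choice of 2 topics here are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dynamic topic models for text",
    "clustering documents by topic",
    "word counts per document in the corpus",
]

# Input: set of words (bag-of-words counts) per doc, for each doc in the corpus
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Output 1: corpus-wide topic vocab distributions (one row per topic)
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Output 2: topic proportions per doc (one row per document)
doc_topic = lda.transform(X)
```

Each row of both outputs is a probability distribution, so each sums to 1.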

Typically LDA is specified as a Bayesian model, which

  • accounts for uncertainty in parameters when making predictions
  • naturally regularizes parameter estimates, in contrast to MLE

 

2. Gibbs sampling

Iterative random hard assignments

Predictions:

  • make a prediction for each snapshot of randomly assigned variables/parameters
  • average the predictions for the final result
  • or, look at the snapshot of randomly assigned variables/parameters that maximizes the joint model probability
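The two ways of turning snapshots into a final prediction can be sketched with toy numbers (the snapshot values and joint log-probabilities below are made up, not from the course):

```python
import numpy as np

# Three snapshots of randomly assigned parameters, each predicting
# topic proportions for one document over K=2 topics.
snapshots = np.array([
    [0.9, 0.1],
    [0.7, 0.3],
    [0.8, 0.2],
])

# Option 1: average the predictions for the final result
avg_prediction = snapshots.mean(axis=0)          # -> [0.8, 0.2]

# Option 2: keep the snapshot that maximizes the joint model probability
joint_log_prob = np.array([-10.2, -9.1, -9.8])   # hypothetical values
best = snapshots[joint_log_prob.argmax()]        # -> [0.7, 0.3]
```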

Benefits:

  • intuitive updates
  • very straightforward to implement

Procedure:

  • randomly reassign all ziw based on doc topic proportions and topic vocab distributions
  • randomly reassign doc topic proportions based on the assignments ziw in the current doc
  • repeat for all docs
  • randomly reassign topic vocab distributions based on the assignments ziw in the entire corpus
  • repeat steps 1-4 until max iterations reached
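The four steps above can be sketched with NumPy; the corpus, hyperparameters, and sizes here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 2, 5                      # number of topics, vocab size
alpha, gamma = 1.0, 1.0          # Dirichlet smoothing parameters
docs = [[0, 1, 2, 1], [3, 4, 3], [0, 2, 4, 4]]  # word ids per doc

pi = rng.dirichlet([alpha] * K, size=len(docs))   # per-doc topic proportions
B = rng.dirichlet([gamma] * V, size=K)            # topic vocab distributions
z = [rng.integers(K, size=len(d)) for d in docs]  # topic indicator per word

for it in range(50):             # step 5: repeat until max iterations
    for i, doc in enumerate(docs):               # step 3: repeat for all docs
        # Step 1: reassign each z_iw given doc proportions and vocab dists
        for w, word in enumerate(doc):
            p = pi[i] * B[:, word]
            z[i][w] = rng.choice(K, p=p / p.sum())
        # Step 2: reassign doc topic proportions given assignments in this doc
        counts = np.bincount(z[i], minlength=K)
        pi[i] = rng.dirichlet(counts + alpha)
    # Step 4: reassign topic vocab distributions given corpus-wide assignments
    for k in range(K):
        word_counts = np.zeros(V)
        for i, doc in enumerate(docs):
            for w, word in enumerate(doc):
                if z[i][w] == k:
                    word_counts[word] += 1
        B[k] = rng.dirichlet(word_counts + gamma)
```

Each draw is a hard assignment: z holds one sampled topic per word, while pi and B are resampled from their Dirichlet conditionals given those assignments.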

3. Collapsed gibbs sampling

Due to the special structure of the LDA model, we can sample just the indicator variables ziw.

There is no need to sample the other parameters:

  • corpus-wide topic vocab distributions
  • per-doc topic proportions

Procedure:

Randomly reassign ziw based on the current assignments zjv of all other words in the document and corpus.

How much the doc likes each topic, based on the other assignments in the doc:

 

(nik + α) / (Ni − 1 + Kα)

 

nik is the current count of assignments to topic k in doc i

Ni is the number of words in doc i

α is the smoothing parameter from the Bayesian prior, and K is the number of topics

 

How much each topic likes the word "dynamic", based on assignments in the other docs in the corpus:

(mdynamic,k + γ) / (Σw mw,k + Vγ)

mdynamic,k is the corpus-wide count of assignments of the word "dynamic" to topic k

γ is the smoothing parameter

V is the size of the vocab

 

probability ∝ (how much the doc likes the topic) × (how much the topic likes the word), with this product of terms normalized over the k possible topics

Based on these probabilities, draw the new assignment of ziw and increment the counts accordingly.
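The resampling of a single indicator ziw can be sketched as follows; the count arrays match the notation above but hold toy values, and one detail the notes leave implicit is that the word's current assignment is removed from the counts before computing the probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 2, 5                     # number of topics, vocab size
alpha, gamma = 1.0, 1.0         # smoothing parameters

n = np.array([3, 1])            # n_ik: assignments to each topic in doc i
m = np.array([[2, 0, 1, 1, 1],  # m_wk: corpus-wide assignments of each
              [0, 3, 1, 0, 2]]) # word to topic k (rows = topics)
word = 0                        # resampling an occurrence of word id 0

# Remove the word's current assignment from the counts before resampling
cur = 0
n[cur] -= 1
m[cur, word] -= 1

# how much doc likes topic * how much topic likes word, normalized over k
doc_term = n + alpha
word_term = (m[:, word] + gamma) / (m.sum(axis=1) + V * gamma)
p = doc_term * word_term
p /= p.sum()

# Draw the new assignment and increment the counts based on it
new = rng.choice(K, p=p)
n[new] += 1
m[new, word] += 1
```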

What to do with the collapsed samples?

From the best sample of ziw, we can infer:

  • Topics from conditional distribution
  • document embedding
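Both outputs can be recovered as smoothed point estimates from the count matrices of that sample; the counts below are toy values reusing the notation above:

```python
import numpy as np

alpha, gamma = 1.0, 1.0
K, V = 2, 5
m = np.array([[2, 0, 1, 1, 1],  # corpus-wide word-topic counts from best sample
              [0, 3, 1, 0, 2]])
n_i = np.array([3, 1])          # topic counts for one document i

# Topics: smoothed conditional distribution over the vocab, per topic
topic_vocab = (m + gamma) / (m.sum(axis=1, keepdims=True) + V * gamma)

# Document embedding: smoothed topic proportions for doc i
doc_embedding = (n_i + alpha) / (n_i.sum() + K * alpha)
```

Each row of topic_vocab and the doc_embedding vector are probability distributions, so each sums to 1.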

 
