Machine Learning Notes (Washington University) - Clustering Specialization - Week Five
1. Mixed membership model
This model aims to discover a set of memberships for each observation.
In contrast, clustering models aim to discover a single membership for each observation.
In clustering:
- one topic indicator zi per document i
- all words come from (get scored under) the same topic zi
- distribution on the prevalence of topics in the corpus, π = [π1 ... πK]
In LDA:
- one topic indicator ziw per word in doc i
- each word gets scored under its topic ziw
- distribution on the prevalence of topics in each document, πi = [πi1 ... πiK]
LDA inputs: set of words per doc for each doc in corpus
LDA outputs: corpus-wide topic vocab distributions, topic assignments per word, topic proportions per doc
Typically LDA is specified as a Bayesian model:
- accounts for uncertainty in parameters when making predictions
- naturally regularizes parameter estimates, in contrast to MLE
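To make the pieces concrete, here is a minimal sketch of the generative process described above, written in Python with numpy. The sizes, document lengths, and the Dirichlet hyperparameters alpha and gamma are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10              # number of topics, vocabulary size (assumed)
alpha, gamma = 0.1, 0.01  # Dirichlet smoothing hyperparameters (assumed)
doc_lengths = [20, 15, 30]

# corpus-wide topic vocab distributions: one distribution over V words per topic
topics = rng.dirichlet([gamma] * V, size=K)

corpus, proportions, assignments = [], [], []
for N_i in doc_lengths:
    pi_i = rng.dirichlet([alpha] * K)        # per-doc topic proportions pi_i
    z_i = rng.choice(K, size=N_i, p=pi_i)    # one topic indicator z_iw per word
    words = [rng.choice(V, p=topics[z]) for z in z_i]  # each word scored under its topic
    proportions.append(pi_i)
    assignments.append(z_i)
    corpus.append(words)
```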
2. Gibbs sampling
Iterative random hard assignments
Predictions, two options (see the sketch after this list):
- make a prediction for each snapshot of randomly assigned variables/parameters and average the predictions for the final result
- look only at the snapshot of randomly assigned variables/parameters that maximizes the joint model probability
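A minimal sketch of these two options, using toy arrays in place of real Gibbs snapshots (the snapshot values and log-probabilities below are illustrative, not produced by an actual sampler):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: one parameter snapshot per Gibbs iteration, plus the joint
# log-probability the model assigned to that snapshot (both illustrative).
snapshots = rng.dirichlet([1.0, 1.0, 1.0], size=200)  # e.g. topic proportions
log_joint = rng.normal(size=200)

# Option 1: form a prediction from every snapshot and average the results.
averaged = snapshots.mean(axis=0)

# Option 2: keep only the snapshot that maximizes the joint model probability.
best = snapshots[np.argmax(log_joint)]
```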
Benefits:
- intuitive updates
- very straightforward to implement
Procedure (see the sketch after this list):
1. randomly reassign all ziw in a doc based on the doc topic proportions and the topic vocab distributions
2. randomly reassign the doc topic proportions based on the assignments ziw in the current doc
3. repeat steps 1-2 for all docs
4. randomly reassign the topic vocab distributions based on the assignments ziw in the entire corpus
5. repeat steps 1-4 until the max number of iterations is reached
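A minimal sketch of this procedure for LDA, assuming each document is given as a list of integer word ids in [0, V); the hyperparameters alpha and gamma are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_lda(corpus, K, V, alpha=0.1, gamma=0.01, max_iter=50):
    """corpus: list of docs, each a list of word ids in [0, V)."""
    D = len(corpus)
    topics = rng.dirichlet([gamma] * V, size=K)           # topic vocab distributions
    pi = rng.dirichlet([alpha] * K, size=D)               # per-doc topic proportions
    z = [rng.choice(K, size=len(doc)) for doc in corpus]  # topic indicator per word

    for _ in range(max_iter):
        for i, doc in enumerate(corpus):
            # step 1: reassign each ziw given pi_i and the topic vocab distributions
            for pos, w in enumerate(doc):
                p = pi[i] * topics[:, w]
                z[i][pos] = rng.choice(K, p=p / p.sum())
            # step 2: reassign the doc topic proportions given assignments in this doc
            pi[i] = rng.dirichlet(alpha + np.bincount(z[i], minlength=K))
        # step 4: reassign the topic vocab distributions given assignments in the corpus
        counts = np.zeros((K, V))
        for i, doc in enumerate(corpus):
            for pos, w in enumerate(doc):
                counts[z[i][pos], w] += 1
        for k in range(K):
            topics[k] = rng.dirichlet(gamma + counts[k])
    return z, pi, topics
```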
3. Collapsed Gibbs sampling
Based on the special structure of the LDA model, we can sample just the indicator variables ziw.
There is no need to sample the other parameters:
- corpus-wide topic vocab distributions
- per-doc topic proportions
Procedure:
randomly reassign each ziw based on the current assignments zjv of all other words in the document and corpus.
How much doc i likes each topic, based on the other assignments in the doc:
- nik is the number of words currently assigned to topic k in doc i
- Ni is the number of words in doc i
- α is the smoothing parameter from the Bayesian prior
How much each topic likes the word (using the example word "dynamic"), based on the assignments in the other docs in the corpus:
- m_dynamic,k is the number of corpus-wide assignments of the word "dynamic" to topic k
- γ is the smoothing parameter
- V is the size of the vocabulary
probabilities = how much the doc likes the topic * how much the topic likes the word (normalize this product of terms over the K possible topics)
Decrement the counts for the old assignment of ziw, sample a new assignment from these probabilities, and increment the counts based on the new assignment of ziw (see the sketch below).
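A minimal sketch of this collapsed update, with n[i, k] playing the role of nik and m[w, k] the role of m_word,k; the variable names and hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def collapsed_gibbs_lda(corpus, K, V, alpha=0.1, gamma=0.01, max_iter=50):
    """corpus: list of docs, each a list of word ids in [0, V)."""
    D = len(corpus)
    z = [rng.choice(K, size=len(doc)) for doc in corpus]  # topic indicator per word
    n = np.zeros((D, K))  # n[i, k]: words in doc i currently assigned to topic k
    m = np.zeros((V, K))  # m[w, k]: corpus-wide assignments of word w to topic k
    for i, doc in enumerate(corpus):
        for pos, w in enumerate(doc):
            n[i, z[i][pos]] += 1
            m[w, z[i][pos]] += 1

    for _ in range(max_iter):
        for i, doc in enumerate(corpus):
            for pos, w in enumerate(doc):
                # remove the current assignment of ziw from the counts
                k_old = z[i][pos]
                n[i, k_old] -= 1
                m[w, k_old] -= 1
                # how much doc i likes each topic * how much each topic likes word w
                doc_term = (n[i] + alpha) / (len(doc) - 1 + K * alpha)
                word_term = (m[w] + gamma) / (m.sum(axis=0) + V * gamma)
                p = doc_term * word_term
                # normalize over the K possible topics and sample a new assignment
                k_new = rng.choice(K, p=p / p.sum())
                # increment the counts based on the new assignment of ziw
                z[i][pos] = k_new
                n[i, k_new] += 1
                m[w, k_new] += 1
    return z, n, m
```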
What to do with the collapsed samples?
From the best sample of ziw (e.g., the one maximizing the joint model probability), we can infer:
- topic vocab distributions, from the conditional distribution given the assignments
- a document embedding (the per-doc topic proportions), as sketched below
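A minimal sketch of that inference step, reusing the count arrays n and m from the collapsed sampler sketched above; the smoothed estimates follow the same doc-likes-topic and topic-likes-word terms:

```python
def infer_from_counts(n, m, alpha=0.1, gamma=0.01):
    """n, m: numpy count arrays as produced by the collapsed sampler above."""
    V, K = m.shape
    # topic vocab distributions: how much each topic likes each word
    topics = (m + gamma) / (m.sum(axis=0, keepdims=True) + V * gamma)         # (V, K)
    # per-doc topic proportions: a K-dimensional embedding of each document
    doc_embedding = (n + alpha) / (n.sum(axis=1, keepdims=True) + K * alpha)  # (D, K)
    return topics, doc_embedding
```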