Count each word's neighbors over the corpus to form a 2D co-occurrence table; each word then has a corresponding vector, and word relatedness can be derived through vector operations. To solve the issues of this raw count table (high-dimensional and sparse): low-dimensional word vectors. Example: see the sketch below.
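A minimal sketch of this count-based approach, assuming a toy corpus, a context window of 1, and truncated SVD as the dimensionality-reduction step (none of these specifics come from the notes themselves):

```python
import numpy as np

# Hypothetical toy corpus; any tokenized text works here.
corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# 2D co-occurrence table: count each word's neighbors within a window of 1.
window = 1
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Truncated SVD turns the high-dimensional count table into low-dimensional vectors.
U, S, Vt = np.linalg.svd(C)
k = 2
vectors = U[:, :k] * S[:k]

# Word relatedness via a vector operation: cosine similarity.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

print(cosine(vectors[idx["like"]], vectors[idx["enjoy"]]))
```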
Goal: estimate the probability of a word sequence, P(w_1, ..., w_T) = ∏_t P(w_t | w_1, ..., w_{t-1}) (chain rule).
N-Gram language model: the probability is conditioned on a window of the (n-1) previous words and estimated from counts. Issue: some sequences never appear in the training data, so they receive zero probability (data sparsity); see the sketch below.
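A minimal bigram (n = 2) sketch of count-based estimation; the toy corpus is a placeholder. Note how an unseen bigram gets probability 0, which is exactly the sparsity issue above:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

# Bigram model: P(w_t | w_{t-1}) = count(w_{t-1}, w_t) / count(w_{t-1})
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word, prev):
    return bigrams[(prev, word)] / unigrams[prev]

print(p("sat", "cat"))   # seen bigram  -> nonzero probability
print(p("flew", "cat"))  # unseen bigram -> 0, even though it is plausible
```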
Neural Language Modeling (NNLM): estimate probabilities not from counts but from a neural network's prediction. Benefit: the input embeddings of related words are close, so their predicted probabilities are similar, which generalizes to unseen sequences. Issue: the conditioning context is still a fixed window.
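A minimal fixed-window NNLM sketch in PyTorch; the layer sizes and tanh hidden layer are assumptions following the standard Bengio-style design, not necessarily the exact model in the notes:

```python
import torch
import torch.nn as nn

class NNLM(nn.Module):
    """Predict the next word from a fixed window of (n-1) previous words."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128, context=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # related words learn nearby embeddings
        self.fc1 = nn.Linear(context * emb_dim, hidden)
        self.fc2 = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                   # (batch, context)
        e = self.emb(context_ids).flatten(1)          # concatenate the window's embeddings
        return self.fc2(torch.tanh(self.fc1(e)))      # logits over the vocabulary

model = NNLM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (4, 2)))        # batch of 4 two-word contexts
probs = torch.softmax(logits, dim=-1)                 # next-word distribution
```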
Recurrent Neural Network Language Model (RNNLM): Idea: pass information from the previous hidden layer forward so the model can leverage all previous context, not just a fixed window.
Formulation: h_t = σ(U x_t + W h_{t-1}), ŷ_t = softmax(V h_t). Model training: the target output ŷ_t at each step is the observed next word o_t; to train the model, we adjust the parameters {U, V, W}.
The parameters {U, W} are tied together, i.e., the same matrices are reused at every time step (see the sketch below).
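A NumPy sketch of this formulation with tanh as the activation σ (the dimensions and the example word-id sequence are illustrative assumptions); note that the single set {U, W, V} is reused at every step:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10, 8

# One set of parameters, tied across all time steps.
U = rng.normal(scale=0.1, size=(dim, vocab))  # input  -> hidden
W = rng.normal(scale=0.1, size=(dim, dim))    # hidden -> hidden (recurrence)
V = rng.normal(scale=0.1, size=(vocab, dim))  # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnnlm_forward(word_ids):
    """Return the predicted next-word distribution at each time step."""
    h = np.zeros(dim)
    outputs = []
    for w in word_ids:
        x = np.eye(vocab)[w]            # one-hot input x_t
        h = np.tanh(U @ x + W @ h)      # h_t = tanh(U x_t + W h_{t-1})
        outputs.append(softmax(V @ h))  # y_t = softmax(V h_t)
    return outputs

preds = rnnlm_forward([1, 4, 7])        # hypothetical word-id sequence
print(preds[-1].argmax())               # most likely next word after the sequence
```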
Batch Normalization: Idea: make sure features are on the same scale; compute the average and variance of each feature over the batch and use them to normalize the feature.
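A minimal NumPy sketch of that normalization step; ε and the learnable scale/shift (γ, β) follow the standard batch-norm formulation and are not from the notes:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch to zero mean, unit variance."""
    mean = x.mean(axis=0)                    # per-feature average over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta              # learnable scale and shift

x = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])                 # two features on very different scales
print(batch_norm(x))                         # both features now on the same scale
```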