lecture 6

Continuing from lecture 5: tagging as classification, combining search and learning.

Transition probability: P(t_i | t_{i-1}) = count(t_{i-1}, t_i) / count(t_{i-1})
Emission probability: P(w_i | t_i) = count(w_i, t_i) / count(t_i)
- code: Viterbi-tagger.ipynb
- Those probabilities are often very small numbers.
- Reason for dropping words: when comparing long sentences with short ones, the longer sentence's probability ends up far smaller than the short one's simply because it multiplies more factors.
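For reference, a minimal counting sketch — my own illustration, not the contents of Viterbi-tagger.ipynb — showing how these counts turn into probabilities; working in log space is a standard trick I'm assuming here to cope with the very small numbers:

```python
import math
from collections import defaultdict

# Toy tagged corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("the", "DT"), ("plant", "NN"), ("grows", "VBZ")],
    [("they", "PRP"), ("plant", "VB"), ("trees", "NNS")],
]

transition_counts = defaultdict(int)  # count(t_{i-1}, t_i)
emission_counts = defaultdict(int)    # count(w_i, t_i)
tag_counts = defaultdict(int)         # count(t)

for sentence in corpus:
    prev_tag = "<s>"                  # start-of-sentence pseudo-tag
    tag_counts[prev_tag] += 1
    for word, tag in sentence:
        transition_counts[(prev_tag, tag)] += 1
        emission_counts[(word, tag)] += 1
        tag_counts[tag] += 1
        prev_tag = tag

def log_transition(prev_tag, tag):
    # log P(t_i | t_{i-1}) = log( count(t_{i-1}, t_i) / count(t_{i-1}) )
    c = transition_counts[(prev_tag, tag)]
    return math.log(c / tag_counts[prev_tag]) if c > 0 else float("-inf")

def log_emission(word, tag):
    # log P(w_i | t_i) = log( count(w_i, t_i) / count(t_i) )
    c = emission_counts[(word, tag)]
    return math.log(c / tag_counts[tag]) if c > 0 else float("-inf")
```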

Conclusion

- Cons: a word with different meanings can end up under the same tag, and local information doesn't help (e.g., "plant").
All the models discussed so far only use local information, but there are long-range ambiguities.
A pipeline is used to handle the different kinds of ambiguity.

Continuing with the problem discussed last time: maximizing the likelihood...

We observe some words and look for the tag sequence that maximizes the likelihood.
Since we're finding the sequence that maximizes the likelihood, the best sequence ending at any time step must itself have maximum likelihood up to that point (there is only one best path from the start to here), so this is a dynamic-programming (DP) problem.

There is a small code example of the Viterbi algorithm on Wikipedia.
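In the same spirit as the Wikipedia example, a rough Viterbi sketch in log space; it assumes the log_transition / log_emission helpers and the <s> start symbol from the counting sketch above:

```python
def viterbi(words, tags):
    # best[i][t]: score of the best tag sequence for words[:i+1] that ends in tag t
    # back[i][t]: the previous tag on that best path, for backtracking
    best = [{} for _ in words]
    back = [{} for _ in words]

    for t in tags:  # initialization from the start state
        best[0][t] = log_transition("<s>", t) + log_emission(words[0], t)
        back[0][t] = None

    for i in range(1, len(words)):
        for t in tags:
            # DP step: only the single best path into tag t survives
            prev = max(tags, key=lambda p: best[i - 1][p] + log_transition(p, t))
            best[i][t] = (best[i - 1][prev] + log_transition(prev, t)
                          + log_emission(words[i], t))
            back[i][t] = prev

    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# e.g. viterbi(["they", "plant", "trees"], ["DT", "NN", "VB", "VBZ", "PRP", "NNS"])
```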

Complexity Analysis

- num of words: n (the sentence length; runtime grows linearly in n)
- num of tags: T (each position considers all T × T transitions, so Viterbi runs in O(n · T²))
Sometimes changing the tag set helps, since T enters quadratically.

Trigram Tagging

On the transition-probability side this is similar to a trigram language model: the transition now conditions on the previous two tags, P(t_i | t_{i-2}, t_{i-1}); the emission probability stays the same.
There is a tutorial about tagging with Markov models.

Cons: it doesn't generalize well; sparsity makes unseen tag transitions common, and those blow up the model (they get zero probability).
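One standard remedy — not necessarily the one proposed in lecture — is to interpolate the trigram transition with bigram and unigram estimates, so unseen trigrams back off instead of getting zero probability. A rough sketch building on the counting code above; the interpolation weights are illustrative:

```python
# Trigram transition counts, collected analogously to the bigram counts above.
trigram_counts = defaultdict(int)
trigram_context_counts = defaultdict(int)
for sentence in corpus:
    prev2, prev1 = "<s>", "<s>"
    for _, tag in sentence:
        trigram_counts[(prev2, prev1, tag)] += 1
        trigram_context_counts[(prev2, prev1)] += 1
        prev2, prev1 = prev1, tag

total_tags = sum(tag_counts.values())

def smoothed_log_transition(prev2, prev1, t, l3=0.6, l2=0.3, l1=0.1):
    # P(t | prev2, prev1) ~ l3*trigram + l2*bigram + l1*unigram,
    # so an unseen trigram falls back on lower-order estimates.
    # Assumes t is a tag observed in training (unigram term stays > 0).
    tri = trigram_counts[(prev2, prev1, t)] / max(trigram_context_counts[(prev2, prev1)], 1)
    bi = transition_counts[(prev1, t)] / max(tag_counts[prev1], 1)
    uni = tag_counts[t] / total_tags
    return math.log(l3 * tri + l2 * bi + l1 * uni)
```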

Review -- Search

Guidance: if we give the search some guidance, the algorithm improves. E.g., in the path-planning example mentioned in class: score = cost so far + heuristic guess.
See the A* search page on Wikipedia.
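A minimal A* sketch for the path-planning example, just to make "cost so far + heuristic guess" concrete; the grid, unit step costs, and Manhattan heuristic are my own illustrative choices:

```python
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    # Priority queue ordered by f = g + h: g = cost so far, h = heuristic guess.
    frontier = [(heuristic(start, goal), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue  # already reached this node at least as cheaply
        best_g[node] = g
        for nxt in neighbors(node):
            g2 = g + cost(node, nxt)
            heapq.heappush(frontier, (g2 + heuristic(nxt, goal), g2, nxt, path + [nxt]))
    return None

# Example: 4-connected grid, unit costs, Manhattan-distance heuristic (a lower bound).
def grid_neighbors(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

manhattan = lambda p, goal: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
print(a_star((0, 0), (2, 3), grid_neighbors, lambda a, b: 1, manhattan))
```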

Analogy to Sequence Tagging

Transform probabilities into negative log-probabilities, so maximizing likelihood becomes minimizing path cost.
The heuristic should be a lower bound on the remaining cost (an admissible heuristic); then A* still finds the optimal path.
Compare A*, Viterbi, and beam search: Viterbi is exact DP over all tag paths, beam search keeps only the top-k partial paths (approximate but fast), and A* expands the cheapest-looking path first and stays exact when the heuristic is admissible.
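To make the comparison concrete, a rough beam-search tagger in negative-log space: Viterbi (above) keeps exactly one best path per tag and is exact, while this keeps only the k cheapest partial sequences overall; the beam width k and the reuse of log_transition / log_emission are assumptions:

```python
def beam_search_tag(words, tags, k=3):
    # Each hypothesis is (cost, tag_sequence), where cost = negative log-probability,
    # so lower is better and tagging becomes a shortest-path-style search.
    beam = [(0.0, ["<s>"])]
    for word in words:
        candidates = []
        for cost, seq in beam:
            for t in tags:
                step = -(log_transition(seq[-1], t) + log_emission(word, t))
                candidates.append((cost + step, seq + [t]))
        # The approximation: keep only the k cheapest partial sequences.
        # (Viterbi keeps the best path per tag; A* keeps everything but always
        #  expands the cheapest f = cost + heuristic first.)
        beam = sorted(candidates)[:k]
    best_cost, best_seq = beam[0]
    return best_seq[1:]  # drop the <s> start symbol
```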
