Word hashing: letter trigrams

We use a 100K-dimensional vector to represent a query.

But this vector is obviously too large, and even so it cannot contain every possible word.
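To see the problem concretely, here is a minimal Python sketch of a word-level query vector. The four-word `VOCAB` is a hypothetical stand-in for the 100K-word vocabulary, and the function name is made up for illustration. Any word outside the vocabulary simply disappears from the vector.

```python
# Hypothetical tiny vocabulary standing in for the 100K-word one.
VOCAB = ["red", "shirt", "dress", "shoes"]
WORD_INDEX = {w: i for i, w in enumerate(VOCAB)}


def word_vector(query: str) -> list[int]:
    """Bag-of-words count vector over VOCAB; unknown words are silently dropped."""
    vec = [0] * len(VOCAB)
    for word in query.lower().split():
        idx = WORD_INDEX.get(word)
        if idx is not None:
            vec[idx] += 1
    return vec


print(word_vector("red shirt"))   # [1, 1, 0, 0]
print(word_vector("red shirts"))  # [1, 0, 0, 0]  ("shirts" is out of vocabulary)
```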


So we propose a word hashing method instead.

For example, take the word "shirt".

First, we pad the word with a '#' on each side:

 #shirt#
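A minimal Python sketch of this padding step (the function name `pad_word` is hypothetical). The '#' marks let the later trigrams tell a letter at the edge of a word apart from the same letter in the middle: `#sh`, for instance, can only come from a word that starts with "sh".

```python
def pad_word(word: str, boundary: str = "#") -> str:
    """Mark the start and end of the word with a boundary symbol."""
    return f"{boundary}{word}{boundary}"


print(pad_word("shirt"))  # #shirt#
```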

Then, we take every sequence of 3 consecutive letters:

#sh, shi, hir, irt, rt#
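The sliding window above can be sketched in Python as follows. `letter_ngrams` is an illustrative name, not from any library, and it does the '#' padding itself so it can be used on its own.

```python
def letter_ngrams(word: str, n: int = 3) -> list[str]:
    """Pad the word with '#' and return every window of n consecutive characters."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(letter_ngrams("shirt"))  # ['#sh', 'shi', 'hir', 'irt', 'rt#']
```

A word of length L always yields L trigrams, because the two '#' marks add two characters and a window of width 3 fits L + 2 - 3 + 1 = L times.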


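Finally, the word is represented as a vector of letter n-grams (trigrams here, with n = 3). Because there are far fewer distinct letter trigrams than distinct words, this vector is much smaller than the word vector, and a word that was never seen before can still be represented by its trigrams.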
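To turn the trigrams into a vector, one simple approach (assumed here, not necessarily the exact one used in practice) is to give each distinct trigram in a corpus its own dimension and count occurrences. The two-word corpus and all function names below are hypothetical. Note that "shirts", which never appears in the corpus, still gets a non-zero vector that overlaps with "shirt".

```python
def letter_ngrams(word: str, n: int = 3) -> list[str]:
    """Pad the word with '#' and return every window of n consecutive characters."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_trigram_index(words: list[str]) -> dict[str, int]:
    """Assign one dimension to each distinct trigram seen in the (toy) corpus."""
    trigrams = sorted({t for w in words for t in letter_ngrams(w)})
    return {t: i for i, t in enumerate(trigrams)}


def trigram_vector(query: str, index: dict[str, int]) -> list[int]:
    """Count vector over trigram dimensions; trigrams missing from the index are dropped."""
    vec = [0] * len(index)
    for word in query.lower().split():
        for t in letter_ngrams(word):
            i = index.get(t)
            if i is not None:
                vec[i] += 1
    return vec


def dot(a: list[int], b: list[int]) -> int:
    """Number of shared trigram counts between two vectors."""
    return sum(x * y for x, y in zip(a, b))


index = build_trigram_index(["shirt", "dress"])  # toy corpus, an assumption
v_shirt = trigram_vector("shirt", index)
v_shirts = trigram_vector("shirts", index)  # "shirts" never appeared in the corpus
print(len(index))              # 10
print(sum(v_shirts))           # 4
print(dot(v_shirt, v_shirts))  # 4
```

The trade-off is that two different words can map to the same trigram vector (a collision), since the vector only records which trigrams occur, not the full word.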
