A look at the features in Microsoft's learning-to-rank corpus

Source: http://research.microsoft.com/en-us/projects/mslr/feature.aspx

Does Bing really need a vector with this many dimensions to compute the relevance of a single document? Scary...

Search engines really are high tech.

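To make "this many dimensions" concrete: as far as I know, the MSLR-WEB10K/MSLR-WEB30K files store each query-URL pair as one text line in the LETOR format. Each line holds a relevance label, a qid: field, and then id:value pairs whose ids are the feature IDs in the table below. The Python sketch assumes exactly that layout, so check it against the dataset's own download page. The function name parse_letor_line and the sample line are made up for illustration. The sketch turns one line into a dense 136-dimensional vector.

```python
from typing import List, Tuple

NUM_FEATURES = 136  # feature IDs 1..136, as in the feature table


def parse_letor_line(line: str) -> Tuple[int, str, List[float]]:
    """Parse 'label qid:<id> <fid>:<value> ...' into (label, qid, dense vector).

    Assumed LETOR-style layout: ids missing from the line are treated as 0.0,
    and anything after a '#' is ignored as a comment.
    """
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])                     # graded relevance label
    key, _, qid = tokens[1].partition(":")
    if key != "qid":
        raise ValueError(f"expected 'qid:<id>', got {tokens[1]!r}")
    features = [0.0] * NUM_FEATURES
    for tok in tokens[2:]:
        fid, _, value = tok.partition(":")
        features[int(fid) - 1] = float(value)  # feature IDs are 1-based
    return label, qid, features
```

For the invented line "2 qid:10 1:3 11:156 130:42.5", this returns label 2, qid "10" and a 136-element list. Positions 0, 10 and 129 of that list (features 1, 11 and 130) hold 3.0, 156.0 and 42.5, and every other position holds 0.0.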

Feature List of Microsoft Learning to Rank Datasets

ID   Feature description                                  Stream          Comments
1    covered query term number                            body
2                                                         anchor
3                                                         title
4                                                         url
5                                                         whole document
6    covered query term ratio                             body
7                                                         anchor
8                                                         title
9                                                         url
10                                                        whole document
11   stream length                                        body
12                                                        anchor
13                                                        title
14                                                        url
15                                                        whole document
16   IDF (inverse document frequency)                     body
17                                                        anchor
18                                                        title
19                                                        url
20                                                        whole document
21   sum of term frequency                                body
22                                                        anchor
23                                                        title
24                                                        url
25                                                        whole document
26   min of term frequency                                body
27                                                        anchor
28                                                        title
29                                                        url
30                                                        whole document
31   max of term frequency                                body
32                                                        anchor
33                                                        title
34                                                        url
35                                                        whole document
36   mean of term frequency                               body
37                                                        anchor
38                                                        title
39                                                        url
40                                                        whole document
41   variance of term frequency                           body
42                                                        anchor
43                                                        title
44                                                        url
45                                                        whole document
46   sum of stream length normalized term frequency       body
47                                                        anchor
48                                                        title
49                                                        url
50                                                        whole document
51   min of stream length normalized term frequency       body
52                                                        anchor
53                                                        title
54                                                        url
55                                                        whole document
56   max of stream length normalized term frequency       body
57                                                        anchor
58                                                        title
59                                                        url
60                                                        whole document
61   mean of stream length normalized term frequency      body
62                                                        anchor
63                                                        title
64                                                        url
65                                                        whole document
66   variance of stream length normalized term frequency  body
67                                                        anchor
68                                                        title
69                                                        url
70                                                        whole document
71   sum of tf*idf                                        body
72                                                        anchor
73                                                        title
74                                                        url
75                                                        whole document
76   min of tf*idf                                        body
77                                                        anchor
78                                                        title
79                                                        url
80                                                        whole document
81   max of tf*idf                                        body
82                                                        anchor
83                                                        title
84                                                        url
85                                                        whole document
86   mean of tf*idf                                       body
87                                                        anchor
88                                                        title
89                                                        url
90                                                        whole document
91   variance of tf*idf                                   body
92                                                        anchor
93                                                        title
94                                                        url
95                                                        whole document
96   boolean model                                        body
97                                                        anchor
98                                                        title
99                                                        url
100                                                       whole document
101  vector space model                                   body
102                                                       anchor
103                                                       title
104                                                       url
105                                                       whole document
106  BM25                                                 body
107                                                       anchor
108                                                       title
109                                                       url
110                                                       whole document
111  LMIR.ABS                                             body            Language model approach for information retrieval (IR) with absolute discounting smoothing
112                                                       anchor
113                                                       title
114                                                       url
115                                                       whole document
116  LMIR.DIR                                             body            Language model approach for IR with Bayesian smoothing using Dirichlet priors
117                                                       anchor
118                                                       title
119                                                       url
120                                                       whole document
121  LMIR.JM                                              body            Language model approach for IR with Jelinek-Mercer smoothing
122                                                       anchor
123                                                       title
124                                                       url
125                                                       whole document
126  Number of slashes in URL
127  Length of URL
128  Inlink number
129  Outlink number
130  PageRank
131  SiteRank                                                             Site-level PageRank
132  QualityScore                                                         The quality score of a web page, output by a web page quality classifier.
133  QualityScore2                                                        The quality score of a web page, output by a web page quality classifier that measures the badness of a web page.
134  Query-url click count                                                The click count of a query-URL pair at a search engine over a period of time
135  url click count                                                      The click count of a URL aggregated from user browsing data over a period of time
136  url dwell time                                                       The average dwell time of a URL aggregated from user browsing data over a period of time
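Features 1-95 are the same 19 term statistics, each computed on five streams. The page gives only names, not formulas, so the Python sketch below is one plausible reading rather than Microsoft's definition. It assumes idf = log(N / df), treats the "IDF" feature as the sum of the query terms' idf values, and uses population variance. The function name stream_features and its arguments are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance
from typing import Dict, Sequence


def stream_features(query: Sequence[str], stream: Sequence[str],
                    doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """Term-statistics features (the IDs 1-95 pattern) for ONE stream of one document.

    Hypothetical definitions, not taken from the feature table:
    idf = log(num_docs / df), "IDF" = sum of the query terms' idf,
    "variance" = population variance over the query terms.
    """
    if not query:
        raise ValueError("query must contain at least one term")
    length = len(stream)
    counts = Counter(stream)
    tf = [counts[t] for t in query]                        # raw term frequency per query term
    idf = [math.log(num_docs / doc_freq[t]) if doc_freq.get(t) else 0.0
           for t in query]                                 # 0 for terms unseen in the collection
    norm_tf = [f / length if length else 0.0 for f in tf]  # stream-length-normalized tf
    tfidf = [f * w for f, w in zip(tf, idf)]
    covered = sum(1 for f in tf if f > 0)                  # query terms present in the stream

    feats = {
        "covered query term number": float(covered),
        "covered query term ratio": covered / len(query),
        "stream length": float(length),
        "IDF": sum(idf),
    }
    # sum/min/max/mean/variance of each of the three per-term statistics
    for name, values in (("term frequency", tf),
                         ("stream length normalized term frequency", norm_tf),
                         ("tf*idf", tfidf)):
        feats["sum of " + name] = float(sum(values))
        feats["min of " + name] = float(min(values))
        feats["max of " + name] = float(max(values))
        feats["mean of " + name] = float(mean(values))
        feats["variance of " + name] = float(pvariance(values))
    return feats
```

Calling it once per stream (body, anchor, title, url, whole document) yields 19 × 5 = 95 numbers. That matches the count of features 1-95, although the official definitions may differ in detail.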

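Features 106-125 are classic retrieval-model scores, also computed per stream. Below is a Python sketch of Okapi BM25 and of query-likelihood scoring with the three smoothing methods named in the comments column: absolute discounting, Dirichlet prior and Jelinek-Mercer. The formulas are the standard textbook forms. The parameter values (k1 = 1.2, b = 0.75, λ = 0.1, μ = 2000, δ = 0.7), the "+1" idf variant in BM25 and the function names are my own illustrative assumptions, not the settings used to build the dataset.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def bm25(query: Sequence[str], stream: Sequence[str], doc_freq: Dict[str, int],
         num_docs: int, avg_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one stream; k1, b and the idf variant are illustrative."""
    counts = Counter(stream)
    score = 0.0
    for t in query:
        tf, df = counts[t], doc_freq.get(t, 0)
        if tf == 0 or df == 0:
            continue
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)  # "+1" keeps idf positive
        norm = 1.0 - b + b * len(stream) / avg_len                 # document length normalization
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score


def lmir(query: Sequence[str], stream: Sequence[str], coll_prob: Dict[str, float],
         method: str, lam: float = 0.1, mu: float = 2000.0, delta: float = 0.7) -> float:
    """Query log-likelihood log P(q|d) under a smoothed unigram model of the stream.

    method: "JM" (Jelinek-Mercer), "DIR" (Dirichlet prior) or "ABS" (absolute
    discounting). coll_prob[t] is the collection model P(t|C) and must be > 0.
    """
    counts = Counter(stream)
    length = len(stream)
    unique = len(counts)  # number of distinct terms in the stream
    score = 0.0
    for t in query:
        tf, p_c = counts[t], coll_prob[t]
        if method == "JM":
            p_ml = tf / length if length else 0.0
            p = (1.0 - lam) * p_ml + lam * p_c
        elif method == "DIR":
            p = (tf + mu * p_c) / (length + mu)
        elif method == "ABS":
            if length == 0:
                p = p_c  # empty stream: all probability mass comes from the collection
            else:
                p = max(tf - delta, 0.0) / length + delta * unique / length * p_c
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        score += math.log(p)
    return score
```

In absolute discounting, the mass δ subtracted from each seen term (δ·|d|_u/|d| in total) is exactly what gets redistributed through the collection model. That is why the smoothed probabilities still sum to one.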
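Features 128-130 come from the web link graph. The Python sketch below counts inlinks and outlinks from an adjacency list and computes PageRank by power iteration. It uses damping factor 0.85 and lets dangling pages spread their rank uniformly. Both are common conventions that I am assuming; the feature page does not specify them. The graph representation and the function names link_counts and pagerank are hypothetical.

```python
from typing import Dict, List


def link_counts(links: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Inlink and outlink numbers per page from an adjacency list {page: [targets]}."""
    nodes = set(links) | {u for outs in links.values() for u in outs}
    counts = {v: {"inlink": 0, "outlink": len(links.get(v, []))} for v in nodes}
    for outs in links.values():
        for u in outs:
            counts[u]["inlink"] += 1
    return counts


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-12) -> Dict[str, float]:
    """PageRank by power iteration; pages without outlinks spread rank uniformly."""
    nodes = sorted(set(links) | {u for outs in links.values() for u in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n  # teleport + dangling mass
        new = {v: base for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)        # split rank over outlinks
                for u in outs:
                    new[u] += share
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank
```

The table describes SiteRank (131) as site-level PageRank. That presumably means running the same iteration on a graph whose nodes are sites rather than pages, but the page does not say how that graph is built.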