Implementing Document Similarity with the Vector Space Model (C#)

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">Readers are free to wrap this code in their own interface or rewrite it as needed. It is offered only as a starting point.</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">My own wrapper is available at:</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="text-decoration: underline;"><span style="color: #800080;"><a href="http://download.csdn.net/source/1143450">http://download.csdn.net/source/1143450</a></span></span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">An introduction to the vector space model (VSM):</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="color: #0000ff;"><a href="http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx">http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx</a></span><a href="http://blog.csdn.net/Felomeng/archive/2009/03/25/4023944.aspx"></a></span></span></span></p>
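<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">The code below scores two documents by the cosine of the angle between their term-frequency vectors: sim(d<sub>1</sub>, d<sub>2</sub>) = Σ<sub>t</sub> f<sub>1</sub>(t)·f<sub>2</sub>(t) / (√(Σ<sub>t</sub> f<sub>1</sub>(t)<sup>2</sup>) · √(Σ<sub>t</sub> f<sub>2</sub>(t)<sup>2</sup>)). Here f<sub>i</sub>(t) is the number of times term t occurs in document i, and a term missing from a document counts as 0. In the listing, <code>numerator</code>, <code>denominator1</code> and <code>denominator2</code> accumulate these three sums. The following is a minimal standalone sketch of the same formula. It assumes the term counts are already available as a <code>Dictionary&lt;string, int&gt;</code>. The names <code>CosineSketch</code> and <code>Cosine</code> are hypothetical and are not part of the author's code.</span></p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the author's SVMModle class.
public static class CosineSketch
{
    // Cosine similarity of two sparse term-frequency vectors stored as
    // term -> count dictionaries. A term missing from one side counts as 0.
    public static double Cosine(IDictionary<string, int> tf1, IDictionary<string, int> tf2)
    {
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;

        foreach (KeyValuePair<string, int> pair in tf1)
        {
            // Multiply as double so very large counts cannot overflow int.
            norm1 += (double)pair.Value * pair.Value;

            // Only terms present in both documents contribute to the dot product.
            int other;
            if (tf2.TryGetValue(pair.Key, out other))
            {
                dot += (double)pair.Value * other;
            }
        }

        foreach (int count in tf2.Values)
        {
            norm2 += (double)count * count;
        }

        // A zero vector has no direction, so return 0.0.
        // This mirrors the listing's empty-document check.
        if (norm1 == 0.0 || norm2 == 0.0)
        {
            return 0.0;
        }
        return dot / Math.Sqrt(norm1 * norm2);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">Consider the term counts {vector: 2, space: 1, model: 1} and {vector: 1, model: 1, similarity: 1}. Only "vector" and "model" are shared, so the dot product is 2·1 + 1·1 = 3. The squared norms are 6 and 3, so the result is 3/√18 ≈ 0.7071. Unlike the listing, the sketch multiplies the counts as double. This sidesteps int overflow when a single count exceeds 46,340.</span></p>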
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Collections</span>.<span style="color: #010001;">Generic</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Linq</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>.<span style="color: #010001;">RegularExpressions</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">namespace</span><span style=""> <span style="color: #010001;">Felomeng</span>.<span style="color: #010001;">VSMSimilarity</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">class</span> <span style="color: #2b91af;">SVMModle</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Dimensionality-reduction term list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">private</span> <span style="color: #2b91af;">List</span>&lt;<span style="color: blue;">string</span>&gt; <span style="color: #010001;">reducingKeys</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">List</span>&lt;<span style="color: blue;">string</span>&gt;();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Constructor: uses a dimensionality-reduction term list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="reducingKeys"&gt;</span><span style="color: green;" lang="ZH-CN">Dimensionality-reduction term list</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>(<span style="color: #2b91af;">List</span>&lt;<span style="color: blue;">string</span>&gt; <span style="color: #010001;">reducingKeys</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">this</span>.<span style="color: #010001;">reducingKeys</span> = <span style="color: #010001;">reducingKeys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Constructor: no dimensionality-reduction term list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>()</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Computes the cosine similarity of two documents</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="text1"&gt;</span><span style="color: green;" lang="ZH-CN">Document 1, already word-segmented (words are separated by non-Chinese characters)</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="text2"&gt;</span><span style="color: green;" lang="ZH-CN">Document 2, already word-segmented (words are separated by non-Chinese characters)</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;returns&gt;</span><span style="color: green;" lang="ZH-CN">Similarity of the two documents (0.0 if either contains no Chinese characters)</span><span style="color: gray;">&lt;/returns&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text1</span>, <span style="color: blue;">string</span> <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">dictionary1</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">dictionary2</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> &lt; 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> &lt; 1))<span style="color: green;">//<span lang="ZH-CN"> If either document contains no Chinese characters</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>); <span style="color: green;">// After this loop, dictionary2 holds only terms absent from the first document</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>)); <span style="color: green;">// Cosine: dot product divided by the product of the vector lengths</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Computes the cosine similarity of two term-frequency dictionaries</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="text1"&gt;</span><span style="color: green;" lang="ZH-CN">Term-frequency dictionary of the first document</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="text2"&gt;</span><span style="color: green;" lang="ZH-CN">Term-frequency dictionary of the second document</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;returns&gt;</span><span style="color: green;" lang="ZH-CN">Similarity of the two documents (0.0 if either dictionary is empty)</span><span style="color: gray;">&lt;/returns&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">text1</span>, <span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: green;">// Work on copies: keys are removed from dictionary2 below, so the caller's dictionaries stay unchanged</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">dictionary1</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;(<span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">dictionary2</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;(<span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> &lt; 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> &lt; 1))<span style="color: green;">//<span lang="ZH-CN"> If either dictionary is empty (the document contains no Chinese words)</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>); <span style="color: green;">// After this loop, dictionary2 holds only terms absent from the first document</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>)); <span style="color: green;">// Cosine: dot product divided by the product of the vector lengths</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Builds the term-frequency dictionary of a document</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;/summary&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;param name="text"&gt;</span><span style="color: green;" lang="ZH-CN">Pre-segmented document text; words are separated by non-Chinese characters</span><span style="color: gray;">&lt;/param&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;">&lt;returns&gt;</span><span style="color: green;" lang="ZH-CN">Term-frequency dictionary of the document (word → occurrence count)</span><span style="color: gray;">&lt;/returns&gt;</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">GetDictionary</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt; <span style="color: #010001;">dictionary</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span>&lt;<span style="color: blue;">string</span>, <span style="color: blue;">int</span>&gt;();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Regex</span> <span style="color: #010001;">regex</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Regex</span>(<span style="color: #a31515;">@"[\u4e00-\u9fa5]+"</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">MatchCollection</span> <span style="color: #010001;">results</span> = <span style="color: #010001;">regex</span>.<span style="color: #010001;">Matches</span>(<span style="color: #010001;">text</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: #2b91af;">Match</span> <span style="color: #010001;">word</span> <span style="color: blue;">in</span> <span style="color: #010001;">results</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: green;">// TryGetValue leaves temp at 0 for a word not seen yet, so one indexer write covers both cases</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>[<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>] = <span style="color: #010001;">temp</span> + 1;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">dictionary</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">}</span></span></p>
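<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">The last lines of the similarity method compute the cosine of the angle between the two term-frequency vectors. This assumes, as the variable names suggest, that numerator holds the dot product accumulated earlier in the method and that denominator1 is built the same way as denominator2, i.e. as the sum of squared frequencies. Here tf<sub>1</sub>(t) and tf<sub>2</sub>(t) are the counts stored under word t in the two dictionaries (0 when absent). The worked example uses three abstract words a, b, c:</span></p>

```latex
\operatorname{sim}(d_1, d_2)
  = \frac{\sum_{t} \mathrm{tf}_{1}(t)\,\mathrm{tf}_{2}(t)}
         {\sqrt{\sum_{t} \mathrm{tf}_{1}(t)^{2}}\;\sqrt{\sum_{t} \mathrm{tf}_{2}(t)^{2}}}

% Worked example: tf_1 = {a: 2, b: 1}, tf_2 = {a: 1, c: 1}
\operatorname{sim}(d_1, d_2)
  = \frac{2 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2^2 + 1^2}\;\sqrt{1^2 + 1^2}}
  = \frac{2}{\sqrt{10}} \approx 0.632
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">Because term frequencies are never negative, the result lies between 0 (no shared words) and 1 (proportional frequency vectors).</span></p>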
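<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">There is still plenty of room for optimization here, and readers are encouraged to think it over. With suitable optimization, the code could run considerably faster.</span></span></p>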
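<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">As one possible direction for such optimizations, here is a minimal sketch that keeps the same input convention (pre-segmented text, non-Chinese characters as separators). The class FastVsm and its members Similarity and Norm are hypothetical names, not part of the original download. The sketch compiles the regular expression once instead of on every call, and it replaces Remove plus Add with a single indexer write. It also walks only the smaller dictionary when accumulating the dot product, since only shared words contribute. Squares are summed in double, and the method returns 0 rather than NaN when a document has no terms.</span></p>

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical optimized variant of GetDictionary and the similarity method (illustrative names).
public static class FastVsm
{
    // Built once and reused; same pattern as the original: runs of CJK ideographs U+4E00..U+9FA5,
    // so any non-Chinese character acts as a word separator.
    private static readonly Regex ChineseWord =
        new Regex(@"[\u4e00-\u9fa5]+", RegexOptions.Compiled);

    // Term-frequency dictionary: word -> number of occurrences.
    public static Dictionary<string, int> GetDictionary(string text)
    {
        var dictionary = new Dictionary<string, int>();
        foreach (Match word in ChineseWord.Matches(text))
        {
            dictionary.TryGetValue(word.Value, out int count); // count is 0 if the word is new
            dictionary[word.Value] = count + 1;
        }
        return dictionary;
    }

    // Cosine similarity of two term-frequency vectors.
    public static double Similarity(Dictionary<string, int> d1, Dictionary<string, int> d2)
    {
        // Only words present in both documents add to the dot product,
        // so iterate over the smaller dictionary and probe the larger one.
        var small = d1.Count <= d2.Count ? d1 : d2;
        var large = ReferenceEquals(small, d1) ? d2 : d1;

        double numerator = 0;
        foreach (var pair in small)
        {
            if (large.TryGetValue(pair.Key, out int other))
                numerator += (double)pair.Value * other;
        }

        double norm1 = Norm(d1);
        double norm2 = Norm(d2);
        if (norm1 == 0 || norm2 == 0)
            return 0; // a document with no terms: avoid 0/0 = NaN

        return numerator / (norm1 * norm2);
    }

    // Euclidean length of a frequency vector, accumulated in double to avoid int overflow.
    private static double Norm(Dictionary<string, int> d)
    {
        double sum = 0;
        foreach (int count in d.Values)
            sum += (double)count * count;
        return Math.Sqrt(sum);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;">For example, FastVsm.GetDictionary("苹果 苹果 香蕉") yields {苹果: 2, 香蕉: 1}, and comparing it with FastVsm.GetDictionary("苹果,橙子") gives 2/√10 ≈ 0.632. Whether RegexOptions.Compiled pays off depends on how many documents are processed. For a handful of short texts, the one-off compilation cost may outweigh the gain.</span></p>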
