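I hadn't touched programming in almost ten years. Recently our company replaced its servers, and while shuffling machines around I got the urge to write a bit of code again for fun. So I picked up a .NET tutorial, read it for two days, and wrote a practice program. It works in my testing, haha, so I'm posting it here for the experts to critique and catch my mistakes!
The PanGu (盘古) open-source word segmentation component can be downloaded directly from http://pangusegment.codeplex.com/. Put Pangu.dll and Pangu.xml into the bin directory under wwwroot, and don't forget to put the dictionaries into the Dictionaries directory under bin. Pangu.xml must be configured with the location of the Dictionaries directory.
I wrote three files in total for this ASPX page. The first:
File name: Default.aspx, location: wwwroot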
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>
<html>
<head runat="server">
  <title>WWW.RCSKY.NET</title>
</head>
<body>
  <form id="form1" runat="server">
    <div class="align-center">
      <p>Original text: <asp:Label ID="fc_content" runat="server" Text="Text to segment"></asp:Label></p>
      <p>Segmentation result:<br /><asp:Label ID="fc_result" runat="server" Text="Segmentation result"></asp:Label></p>
      <p><asp:DataGrid id="Orign" runat="server" HeaderStyle-BackColor="#aaaadd" AlternatingItemStyle-BackColor="#eeeeee" /></p>
      <br />
    </div>
  </form>
</body>
</html>
The second: default.aspx.cs, location: wwwroot
using System;
using System.Data;
using System.Data.OleDb;
using Rcsky.GetKeyword;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object src, EventArgs e)
    {
        string MyConnString = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" + Server.MapPath("DatabaseDir/data.mdb");
        // Still no friendly validation: int.Parse throws if id is missing or not a number.
        int id = int.Parse(Request.QueryString["id"]);
        string strSel = "select * from db_table where id = ?";
        DataSet ds = new DataSet();
        using (OleDbConnection MyConn = new OleDbConnection(MyConnString))
        using (OleDbDataAdapter MyAdapter = new OleDbDataAdapter(strSel, MyConn))
        {
            // Bind id as a parameter instead of concatenating it into the SQL text.
            MyAdapter.SelectCommand.Parameters.AddWithValue("@id", id);
            // Fill opens and closes the connection by itself.
            MyAdapter.Fill(ds, "TB_content");
        }
        Orign.DataSource = ds;
        Orign.DataMember = "TB_content";
        Orign.DataBind();
        if (ds.Tables[0].Rows.Count > 0)
        {
            DataRow dr = ds.Tables[0].Rows[0];
            // Segment the description field of db_table.
            fc_content.Text = dr["description"].ToString();
            fc_result.Text = Segment.DoSegment(fc_content.Text);
        }
    }
}
The third file: keyword.cs, location: wwwroot/App_Code
using System.Collections.Generic;

namespace Rcsky.GetKeyword
{
    public class Segment
    {
        public static string DoSegment(string keyWord)
        {
            // Separator between output records; assumed to be an HTML line break,
            // since the result is rendered in a Label on the page.
            return DoSegment(keyWord, "<br />");
        }

        public static string DoSegment(string keyWord, string separator)
        {
            PanGu.Segment.Init();
            PanGu.Segment segment = new PanGu.Segment();
            ICollection<PanGu.WordInfo> words = segment.DoSegment(keyWord);
            List<string> records = new List<string>();
            foreach (PanGu.WordInfo wordInfo in words)
            {
                // word ^ rank (weight) ^ frequency ^ word type ^ part of speech
                records.Add(wordInfo.Word + "^" + wordInfo.Rank + "^" + wordInfo.Frequency + "^" + wordInfo.WordType + "^" + wordInfo.Pos);
            }
            return string.Join(separator, records.ToArray());
        }
    }
}
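Run default.aspx and each word comes out in the form word^rank^frequency^word type^part of speech, haha. My program does no validation or checking at all, so readers will have to add that themselves; otherwise a missing id, or a Null description in the table, will probably cause an error.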
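As a rough idea of the checks mentioned above, here is a minimal sketch of two guard helpers. The class name `RequestGuards` and both method names are hypothetical (they are not part of the original program or of PanGu), and the rule that a valid id must be a positive integer is my assumption about the table.

```csharp
using System;

// Hypothetical helpers, not part of the original program.
public static class RequestGuards
{
    // Returns true and sets id only when raw is a positive integer.
    // A missing parameter (null) or text such as "1 or 1=1" is rejected.
    // Assumption: ids in db_table are positive integers.
    public static bool TryGetId(string raw, out int id)
    {
        return int.TryParse(raw, out id) && id > 0;
    }

    // Converts a database value to text, mapping null and DBNull to "".
    public static string TextOrEmpty(object value)
    {
        return (value == null || value is DBNull) ? "" : value.ToString();
    }
}
```

In Page_Load this could be used as `int id; if (!RequestGuards.TryGetId(Request.QueryString["id"], out id)) return;` before running the query, and `RequestGuards.TextOrEmpty(dr["description"])` when reading the field, skipping the segmentation call when the text is empty.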
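That's not the main point, though, haha. Once you have the "word^rank^frequency^word type^part of speech" result, the rest is easy: split it up, then filter it or sort it by rank and frequency as you like, so I won't go into detail. My planned use is to extract an article's keywords automatically and write them into the keywords field of db_table, which should be handy both for search and for outputting to the page for SEO.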
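The split-and-sort step could look roughly like the sketch below. It assumes five `^`-separated fields per record, as produced by the DoSegment listing, and takes the record separator as an argument. `KeywordPicker`, `TopKeywords` and the comma-joined output are hypothetical choices of mine, not something from PanGu or the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original program.
public static class KeywordPicker
{
    private class Candidate
    {
        public string Word;
        public int Rank;
        public double Frequency;
    }

    // Parses "word^rank^frequency^wordType^pos" records and returns the top
    // `count` distinct words, highest rank first, ties broken by frequency.
    public static string TopKeywords(string segmented, string separator, int count)
    {
        if (string.IsNullOrEmpty(segmented)) return "";

        List<Candidate> candidates = new List<Candidate>();
        string[] records = segmented.Split(new string[] { separator }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string record in records)
        {
            string[] fields = record.Split('^');
            int rank;
            double frequency;
            // Skip records that do not have the expected shape.
            if (fields.Length < 5) continue;
            if (!int.TryParse(fields[1], out rank)) continue;
            if (!double.TryParse(fields[2], out frequency)) continue;
            candidates.Add(new Candidate { Word = fields[0].Trim(), Rank = rank, Frequency = frequency });
        }

        string[] top = candidates
            .OrderByDescending(c => c.Rank)
            .ThenByDescending(c => c.Frequency)
            .Select(c => c.Word)
            .Distinct()
            .Take(count)
            .ToArray();
        return string.Join(",", top);
    }
}
```

The returned string is what would go into the keywords field. A word that itself contains `^` would be misparsed, so real input may need a safer delimiter.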
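Once more, to be clear: I wrote this little practice program after only two days of skimming two e-books, 《dotNET入门经典教程:七天学会用.NET绘图》 and 《亲密接触ASP.NET》. Treat it as a reference only and please don't take it and use it as is; D机 accepts no responsibility, haha.
Veterans, if you see problems in this source and have the time, I would humbly welcome your advice. I still know nothing and haven't even found the door yet, haha.
Finally, thanks to the author of the PanGu open-source project, thanks to the authors of the two e-books (I don't know whether the copies I downloaded infringe their copyright; honestly I was just too lazy to go to the bookstore), and thanks to all the veterans who work hard writing articles and publishing source code online to teach newcomers!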