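Bayesian Forum Spam-Post Filtering Demo System, Beta 1

Introduction:

As a forum moderator, one of your duties is to maintain the quality of discussion by deleting ads, low-effort filler posts, spam, and the like.
This system is a demo built to lighten the moderator's workload by identifying spam posts automatically.
Its theoretical basis is the naive Bayes principle.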
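
For reference, the naive Bayes rule for a post made of words w_1, ..., w_n can be written as below. This is the standard textbook form, not necessarily the exact formula the demo uses; the class names "spam" and "ham" (non-spam) and the word symbols are notation assumed here for illustration. The "naive" part is the assumption that words are conditionally independent given the class.

```latex
P(\text{spam} \mid w_1, \dots, w_n)
  = \frac{P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})}
         {P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
          + P(\text{ham}) \prod_{i=1}^{n} P(w_i \mid \text{ham})}
```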

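Usage:
1. First, register an account on 多么乐 and log in to the system.
2. Enter the raw data used to train the system, in two classes: spam posts and non-spam posts.
3. Enter a post to be checked and view the percentage probability that it is spam.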
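
Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the demo system's actual code: the class `NaiveBayesSpamFilter`, the labels `"spam"` and `"ham"`, the regex tokenizer, and the add-one smoothing are all choices made here for the example. Chinese posts would need a word segmenter in place of the simple tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens split by a regex.
    # Chinese text would need a proper word segmenter instead.
    return re.findall(r"\w+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical two-class naive Bayes filter (spam vs. non-spam)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.post_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Step 2: record one training post under "spam" or "ham".
        self.word_counts[label].update(tokenize(text))
        self.post_counts[label] += 1

    def spam_probability(self, text):
        # Step 3: return P(spam | words) as a value between 0 and 1.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_posts = sum(self.post_counts.values())
        words = tokenize(text)
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior, so an untrained class never yields log(0).
            score = math.log((self.post_counts[label] + 1) / (total_posts + 2))
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long posts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap pills discount buy now", "spam")
spam_filter.train("how do I configure the forum search", "ham")
spam_filter.train("thanks for the helpful answer", "ham")

probability = spam_filter.spam_probability("cheap pills")
print(f"Spam probability: {probability:.1%}")
```

The final line prints the score as a percentage, mirroring what step 3 shows to the moderator. With only four training posts the number is illustrative; a real deployment would need far more labeled data in both classes.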

Discussion and help improving this program are welcome.

Microsoft Research Asia - Natural Language Computing Group

Papers
  1. Dependence Language Model for Information Retrieval
    Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu and Guihong Cao, "Dependence language model for information retrieval", In SIGIR-2004. Sheffield, UK, July 25-29, 2004.
  2. A New Approach for English-Chinese Named Entity Alignment
    Dong-Hui Feng, Ya-Juan Lv, Ming Zhou, "A New Approach for English-Chinese Named Entity Alignment", 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, Jul. 2004.
  3. Collocation Translation Acquisition Using Monolingual Corpora
    Ya-Juan Lv, Ming Zhou, "Collocation Translation Acquisition Using Monolingual Corpora", 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, Jul. 2004.
  4. Adaptive Chinese Word Segmentation
    Jianfeng Gao, Andi Wu, Mu Li, Chang-Ning Huang, Hongqiao Li, Xinsong Xia and Haowei Qin, "Adaptive Chinese word segmentation", 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, Jul. 2004.
  5. The Use of SVM for Chinese New Word Identification
    Hongqiao Li, Chang-Ning Huang, Jianfeng Gao and Xiaozhong Fan, "The use of SVM for Chinese new word identification", In IJCNLP-04. Sanya City, Hainan Island, China, March 22-24, 2004.
  6. Capturing Long Distance Dependency for Language Modeling: An Empirical Study
    Jianfeng Gao and Hisami Suzuki, "Capturing long distance dependency for language modeling: an empirical study", In IJCNLP-04. Sanya City, Hainan Island, China, March 22-24, 2004.
  7. Word Translation Disambiguation Using Bilingual Bootstrapping
    Hang Li and Cong Li, "Word Translation Disambiguation Using Bilingual Bootstrapping", Computational Linguistics 30(1), 1-22, 2004.
  8. Text Classification Using Stochastic Keyword Generation
    Cong Li, Ji-Rong Wen, and Hang Li, "Text Classification Using Stochastic Keyword Generation", Proc. of ICML'03, 464-471.
  9. Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm
    Yunbo Cao, Hang Li, and Li Lian, "Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm", Proc. of ACL'03, 327-334.
  10. Improved Source-Channel Models for Chinese Word Segmentation
    Jianfeng Gao, Mu Li and Chang-Ning Huang, "Improved Source-Channel Models for Chinese Word Segmentation", 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan, July 7-12, 2003.
  11. Topic Analysis Using a Finite Mixture Model
    Hang Li and Kenji Yamanishi, "Topic Analysis Using a Finite Mixture Model", Information Processing & Management, 39(4), 521-541, (2003).
  12. Using Bilingual Web Data to Mine and Rank Translations
    Hang Li, Yunbo Cao, and Cong Li, "Using Bilingual Web Data to Mine and Rank Translations", IEEE Intelligent Systems, Vol. 18(4), 54-59, (2003).

