Open-source software and code: a link collection

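Focused crawling / extraction: http://www.scrapy.org
Stream processing tools:
Stream collection and computation (Storm): https://github.com/nathanmarz/storm
Streaming data statistics (count-min sketch): https://sites.google.com/site/countminsketch/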

Chinese text processing:
Simplified/Traditional Chinese conversion, cconv: http://code.google.com/p/cconv/
Chinese character to pinyin conversion, pinyin4j: http://pinyin4j.sourceforge.net/

String matching:
Aho-Corasick Python implementations:
ahocorasick: https://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
acora: http://pypi.python.org/pypi/acora/1.5
esmre: http://code.google.com/p/esmre/
double array trie:
libdatrie: http://linux.thai.net/~thep/datrie/datrie.html
darts:  http://chasen.org/~taku/software/darts/
darts-clone:  http://code.google.com/p/darts-clone
java aho-corasick: https://github.com/robert-bor/aho-corasick
Machine learning:
Classification and ranking toolkit, sofia-ml: http://code.google.com/p/sofia-ml/
SGD implementations of several machine learning algorithms: http://leon.bottou.org/projects/sgd
liblinear: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
lingpipe: http://alias-i.com/lingpipe/
mahout: http://mahout.apache.org/
libfm: http://www.libfm.org/
graphChi: https://code.google.com/p/graphchi/
CRF++: http://code.google.com/p/crfpp/
CRFSuite: http://www.chokkan.org/software/crfsuite/
Wapiti: http://wapiti.limsi.fr/
mloss: https://mloss.org/software/view/332

Natural language processing:
opennlp: http://opennlp.apache.org/
stanford corenlp: http://nlp.stanford.edu/software/corenlp.shtml#Download
srilm (language modeling): http://www.speech.sri.com/projects/srilm/download.html
mallet: http://mallet.cs.umass.edu/
gensim (topic modelling for humans): http://radimrehurek.com/gensim/
TweetNLP: http://www.ark.cs.cmu.edu/TweetNLP/
Java machine learning framework, datumbox: https://github.com/datumbox/datumbox-framework
Corpora:
20Newsgroups: http://people.csail.mit.edu/jrennie/20Newsgroups/
Nanjing University machine learning and data mining group, data and code: http://lamda.nju.edu.cn/CH.Data.ashx
Image processing:
thumbnailator: http://code.google.com/p/thumbnailator/
MNIST: http://yann.lecun.com/exdb/mnist

LaTeX:
LaTeX for blog editing: http://latex.codecogs.com/gif.latex?
Learning LaTeX: http://latex.yo2.cn
LaTeX symbols: http://www.artofproblemsolving.com/Wiki/index.php/LaTeX:Symbols
http://web.ift.uib.no/Teori/KURS/WRK/TeX/symALL.html
LaTeX math: http://en.wikibooks.org/wiki/LaTeX/Mathematics
http://www.artofproblemsolving.com/Wiki/index.php/Math

cache:
simple-spring-memcached: http://code.google.com/p/simple-spring-memcached/wiki/Getting_Started
Bookmarked links:
ML/NLP: http://lxmls.it.pt/2013/
Python data analysis: http://datacommunitydc.org/blog/2013/07/python-for-data-analysis-the-landscape-of-tutorials/?utm_source=rss&utm_medium=rss&utm_campaign=python-for-data-analysis-the-landscape-of-tutorials
Science Machine learning resource: http://m.sciencemag.org/site/feature/data/compsci/machine_learning.xhtml

Open courses and slides:
Programming languages: http://www.codecademy.com/
Data science: https://github.com/bcaffo/courses
CMU advanced machine learning: http://www.cs.cmu.edu/~./epxing/Class/10715/lecture.html
Convex optimization: http://so.v.ifeng.com/video?q=%E5%87%B8%E4%BC%98%E5%8C%96&c=5#_v_mininav_search_pc
Alex Smola's courses: http://alex.smola.org/teaching/
