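The following is a collection of libraries and tools, organized by programming language and application area. It is updated continuously.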
C
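General-Purpose Machine Learning
Recommender - A C library for product recommendations using collaborative filtering.
Computer Vision
CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.
VLFeat - An open-source library of computer vision algorithms, with a Matlab toolbox.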
C++
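Computer Vision
OpenCV - The most widely used computer vision library. It has C++, C, Python, and Java interfaces and supports Windows, Linux, Android, and Mac OS.
DLib - DLib has C++ and Python interfaces for face recognition and object detection.
EBLearn - An object-oriented C++ library that implements various machine learning models.
VIGRA - A cross-platform computer vision and machine learning library that handles data of arbitrary dimensionality, with Python bindings.
General-Purpose Machine Learning
MLPack - A scalable C++ machine learning library.
DLib - Designed to be easy to embed in other systems.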
encog-cpp
shark
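Vowpal Wabbit (VW) - A fast out-of-core learning system.
sofia-ml - A suite of fast incremental algorithms.
Shogun - The Shogun Machine Learning Toolbox
Caffe - A deep learning framework with a clean structure, readable code, and fast execution.
CXXNET - A lightweight framework whose core code is under 1000 lines.
XGBoost - A gradient boosting library optimized for parallel computation.
CUDA - This is a fast C++/CUDA implementation of convolutional [DEEP LEARNING]
Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling
BanditLib - A simple Multi-armed Bandit library.
Timbl - Implements several memory-based learning algorithms, among which IB1-IG (a k-nearest-neighbor classifier) and IGTree (a decision tree) are widely used in NLP.
Natural Language Processing
MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction.
CRF++ - An open-source implementation of conditional random fields, usable for word segmentation, part-of-speech tagging, and similar tasks.
CRFsuite - An implementation of conditional random fields, usable for part-of-speech tagging and similar tasks.
BLLIP Parser - Also known as the Charniak-Johnson parser.
colibri-core - A set of C++ libraries, command-line tools, and Python bindings that efficiently implement n-grams and skipgrams.
ucto - A multilingual tokenizer that supports Unicode-aware regular expressions and the FoLiA format.
libfolia - C++ library for the FoLiA format
MeTA - MeTA: ModErn Text Analysis, for mining data from massive amounts of text.
Machine Translation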
EGYPT (GIZA++)
Moses
pharaoh
SRILM
NiuTrans
jane
SAMT
Speech Recognition
Kaldi - Kaldi is a C++ toolkit released under the Apache License v2.0. It is intended for speech recognition research.
Sequence Analysis
ToPS - This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.
Java
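Natural Language Processing
Cortical.io - Retina: an API that performs complex NLP operations (disambiguation, classification, streaming text filtering, etc.) as quickly and intuitively as the brain.
CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools that can take raw English text as input and give the base forms of words.
Stanford Parser - A parser is a program that works out the grammatical structure of sentences.
Stanford POS Tagger - A part-of-speech tagger.
Stanford Name Entity Recognizer - Stanford NER is a Java implementation of a named entity recognizer.
Stanford Word Segmenter - Tokenization of raw text is a standard preprocessing step for many NLP tasks.
Tregex, Tsurgeon and Semgrex - Tregex (short for "tree regular expressions") is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes.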
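Stanford Phrasal: A Phrase-Based Translation System - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
Stanford English Tokenizer - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".
Stanford Tokens Regex - A framework for defining regular expressions over sequences of tokens.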
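Stanford Temporal Tagger - SUTime is a library for recognizing and normalizing time expressions.
Stanford SPIED - Learns entities from unlabeled text by starting from seed sets and applying patterns iteratively.
Stanford Topic Modeling Toolbox - Topic modeling tools that social scientists use to analyze datasets.
Twitter Text Java - A Java implementation of Twitter's text processing library.
MALLET - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications.
OpenNLP - A machine learning based toolkit for natural language processing.
LingPipe - A toolkit for computational linguistics.
ClearTK - ClearTK provides a framework for developing statistical natural language processing components, built on top of Apache UIMA.
Apache cTAKES - Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source system for extracting information from electronic medical records and clinical text.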
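General-Purpose Machine Learning
aerosolve - A machine learning library designed from the ground up by Airbnb to be easy to use.
Datumbox - A framework for rapid development of machine learning and statistical applications.
ELKI - Data mining software (unsupervised learning: clustering, outlier detection, etc.).
Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation and can also make use of a GPU to further speed up processing. A GUI-based workbench is also provided.
H2O - A machine learning engine that runs on distributed systems such as Hadoop and Spark as well as on a personal computer, with APIs accessible from R, Python, Scala, and REST/JSON.
htm.java - A general machine learning library using Numenta’s Cortical Learning Algorithm.
java-deeplearning - A distributed deep learning platform for Java, Clojure, and Scala.
JAVA-ML - A general machine learning library for Java with a common interface for all algorithms.
JSAT - Numerous machine learning algorithms for classification, regression, and clustering.
Mahout - Distributed machine learning tools.
Meka - An open-source implementation of methods for multi-label classification and evaluation, built as an extension to Weka.
MLlib in Apache Spark - The distributed machine learning library in Spark.
Neuroph - A lightweight Java neural network framework.
ORYX - Lambda Architecture Framework using Apache Spark and Apache Kafka for real-time, large-scale machine learning.
RankLib - A library of learning-to-rank algorithms.
Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
SmileMiner - Statistical Machine Intelligence & Learning Engine
SystemML - A flexible, scalable machine learning language.
WalnutiQ - An object-oriented model of the human brain.
Weka - Weka is a collection of machine learning algorithms for data mining tasks.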
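Speech Recognition
CMU Sphinx - An open-source toolkit for speech recognition; a speech recognition library written entirely in Java.
Data Analysis and Visualization
Hadoop - Hadoop/HDFS
Spark - A fast, general-purpose engine for large-scale data processing.
Impala - Real-time queries for Hadoop.
DataMelt - Mathematics software for numeric computation, statistics, symbolic calculations, data analysis, and data visualization.
Dr. Michael Thomas Flanagan's Java Scientific Library
Deep Learning
Deeplearning4j - Scalable, industrial-strength deep learning that uses parallel GPUs.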
Python
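Computer Vision
Scikit-Image - A collection of algorithms for image processing in Python.
SimpleCV - An open-source computer vision framework that gives access to several high-performance computer vision libraries, such as OpenCV. It runs on Mac, Windows, and Ubuntu Linux.
Vigranumpy - Python bindings for the VIGRA C++ computer vision library.
Natural Language Processing
NLTK - A leading platform for building Python programs to work with human language data.
Pattern - A web mining module for Python, with tools for natural language processing, machine learning, and more.
Quepy - Transforms natural language questions into queries in a database query language.
TextBlob - Provides a consistent API for common natural language processing (NLP) tasks. It is built on NLTK and Pattern and works well with both.
YAlign - A sentence aligner that extracts parallel sentences from comparable corpora.
jieba - A Chinese word segmentation tool.
SnowNLP - A library for processing Chinese text.
loso - A Chinese word segmentation tool.
genius - A Chinese word segmenter based on conditional random fields.
KoNLPy - Korean natural language processing.
nut - A natural language understanding toolkit.
Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
BLLIP Parser - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
PyNLPl - A Python library for natural language processing. It also contains tools for parsing common NLP formats, such as FoLiA, as well as ARPA language models, Moses phrase tables, and GIZA++ alignments.
python-ucto - Python bindings for ucto (a Unicode-aware, rule-based tokenizer).
python-frog - Python bindings for Frog, which provides part-of-speech tagging, lemmatisation, dependency parsing, and NER for Dutch.
python-zpar - Python bindings for ZPar (a statistical part-of-speech tagger, constituency parser, and dependency parser for English).
colibri-core - Python bindings for a C++ library that efficiently extracts n-grams and skipgrams.
spaCy - Industrial-strength NLP with Python and Cython.
PyStanfordDependencies - A Python interface for converting Penn Treebank trees to Stanford dependency trees.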
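General-Purpose Machine Learning
machinelearning - A web interface and programmatic API for support vector machines. The corresponding datasets are stored in a SQL database, and the models generated for prediction are stored in a NoSQL database.
XGBoost - Python bindings for the eXtreme Gradient Boosting (Tree) library.
Featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
scikit-learn - A Python module for machine learning built on SciPy.
metric-learn - A Python module for metric learning.
SimpleAI - Implements many of the artificial intelligence algorithms described in the book "Artificial Intelligence: A Modern Approach". It focuses on providing an easy-to-use, well-documented, and tested library.
astroML - A machine learning and data mining library for astronomy.
graphlab-create - A library built on a disk-backed DataFrame that implements various machine learning models (regression, clustering, recommender systems, graph analytics, etc.).
BigML - A library for communicating with external servers.
pattern - A web mining module.
NuPIC - Numenta Platform for Intelligent Computing.
Pylearn2 - A machine learning library based on Theano.
keras - A neural network library based on Theano.
hebel - A GPU-accelerated deep learning library in Python.
Chainer - A flexible neural network framework.
gensim - An easy-to-use topic modeling tool.
topik - A topic modeling toolkit.
PyBrain - Another Python Machine Learning Library.
Crab - A flexible, fast recommender engine.
python-recsys - A Python library for implementing a recommender system.
Restricted Boltzmann Machines - Restricted Boltzmann machines.
CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree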
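nilearn - A machine learning library for neuroimaging.
Shogun - Shogun Machine Learning Toolbox
Pyevolve - A genetic algorithm framework.
Caffe - A deep learning framework with a clean structure, readable code, and fast execution.
breze - Deep neural networks based on Theano.
pyhsmm - Approximate unsupervised inference in Bayesian hidden Markov models (HMMs) and explicit-duration hidden semi-Markov models (HSMMs), focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
mrjob - Lets Python programs run on Hadoop.
SKLL - A simplified interface to scikit-learn that makes it easy to run experiments.
neurolab - https://github.com/zueve/neurolab
Spearmint - A package for Bayesian optimization. The method is described in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.
Pebl - A Python environment for Bayesian learning.
Theano - An optimizing, array-oriented math compiler that generates GPU code through metaprogramming.
TensorFlow - An open-source software library for numerical computation using data flow graphs.
yahmm - Hidden Markov models, implemented in Cython.
python-timbl - Wraps the full TiMBL C++ programming interface. Timbl is an elaborate k-nearest-neighbor machine learning toolkit.
deap - An evolutionary algorithm framework.
pydeep - Deep learning in Python.
mlxtend - A library of useful tools for data science and machine learning tasks.
neon - A high-performance deep learning framework.
Optunity - A library dedicated to automated hyperparameter optimization, with a simple, lightweight API that facilitates drop-in replacement of grid search.
Annoy - Approximate nearest neighbours implementation
skflow - A simplified interface for TensorFlow, similar to scikit-learn.
TPOT - Automatically creates and optimizes machine learning pipelines using genetic programming. Think of it as your data science assistant, automating much of the tedious work in machine learning.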
Data Analysis and Visualization
SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
NumPy - A fundamental package for scientific computing with Python.
Numba - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
NetworkX - A high-productivity software for complex networks.
Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
Open Mining - Business Intelligence (BI) in Python (Pandas web interface)
PyMC - Markov Chain Monte Carlo sampling toolkit.
zipline - A Pythonic algorithmic trading library.
PyDy - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
SymPy - A Python library for symbolic mathematics.
statsmodels - Statistical modeling and econometrics in Python.
astropy - A community Python library for Astronomy.
matplotlib - A Python 2D plotting library.
bokeh - Interactive Web Plotting for Python.
plotly - Collaborative web plotting for Python and matplotlib.
vincent - A Python to Vega translator.
d3py - A plotting library for Python, based on D3.js.
ggplot - Same API as ggplot2 for R.
ggfortify - Unified interface to ggplot2 popular R packages.
Kartograph.py - Rendering beautiful SVG maps in Python.
pygal - A Python SVG Charts Creator.
PyQtGraph - A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
pycascading
Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
Blaze - NumPy and Pandas interface to Big Data.
emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
windML - A Python Framework for Wind Energy Analysis and Prediction
vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library
cerebro2 - A web-based visualization and debugging platform for NuPIC.
NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool!
SparklingPandas - Pandas on PySpark (POPS)
Seaborn - A python visualization library based on matplotlib
bqplot - An API for plotting in Jupyter (IPython)
Common Lisp
General-Purpose Machine Learning
mgl - Neural networks (Boltzmann machines, feed-forward and recurrent nets), Gaussian Processes
mgl-gpr - Evolutionary algorithms
cl-libsvm - Wrapper for the libsvm support vector machine library
Clojure
Natural Language Processing
Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
Infections-clj - Rails-like inflection library for Clojure and ClojureScript
General-Purpose Machine Learning
Touchstone - Clojure A/B testing library
Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure
Infer - Inference and machine learning in Clojure
Clj-ML - A machine learning library for Clojure built on top of Weka and friends
Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets)
Fungp - A genetic programming library for Clojure
Statistiker - Basic Machine Learning algorithms in Clojure.
clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm
comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm
Data Analysis and Visualization
Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
PigPen - Map-Reduce for Clojure.
Envision - Clojure Data Visualisation library, based on Statistiker and D3
Matlab
Computer Vision
Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.
Shearlets - MATLAB code for shearlet transform
Curvelets - The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
Bandlets - MATLAB code for bandlet transform
mexopencv - Collection and a development kit of MATLAB mex functions for OpenCV library
Natural Language Processing
NLP - An NLP library for Matlab
General-Purpose Machine Learning
t-Distributed Stochastic Neighbor Embedding - t-SNE is a prize-winning technique for dimensionality reduction that is particularly well suited to visualizing high-dimensional data.
Spider - The spider is intended to be a complete object-oriented environment for machine learning in Matlab.
LibSVM - A well-known library for support vector machines.
LibLinear - A Library for Large Linear Classification
Caffe - A deep learning framework with a clean structure, readable code, and fast execution.
Pattern Recognition Toolbox - A complete object-oriented environment for machine learning in Matlab.
Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
Data Analysis and Visualization
matlab_gbl - MatlabBGL is a Matlab package for working with graphs.
gamic - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL's mex functions.
.NET
Computer Vision
OpenCVDotNet - A wrapper for the OpenCV project to be used with .NET applications.
Emgu CV - Cross platform wrapper of OpenCV which can be compiled in Mono to be run on Windows, Linux, Mac OS X, iOS, and Android.
AForge.NET - Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
Accord.NET - Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.
Natural Language Processing
Stanford.NLP for .NET - A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.
General-Purpose Machine Learning
Accord-Framework - A complete framework for machine learning, computer vision, computer audition, signal processing, and statistical applications.
Accord.MachineLearning - Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
DiffSharp - An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
Vulpes - Deep belief and deep learning implementation written in F# that leverages CUDA GPU execution with Alea.cuBase.
Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
Neural Network Designer - DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
Data Analysis and Visualization
numl - numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
Math.NET Numerics - Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and everyday use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
Sho - Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.
Ruby
Natural Language Processing
Treat - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
Ruby Linguistics - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
Stemmer - Expose libstemmer_c to Ruby
Ruby Wordnet - This library is a Ruby interface to WordNet
Raspel - raspell is an interface binding for Ruby
UEA Stemmer - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
Twitter-text-rb - A library that does auto linking and extraction of usernames, lists and hashtags in tweets
General-Purpose Machine Learning
Ruby Machine Learning - Some Machine Learning algorithms, implemented in Ruby
Machine Learning Ruby
jRuby Mahout - JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby.
CardMagic-Classifier - A general classifier module to allow Bayesian and other types of classifications.
Data Analysis and Visualization
rsruby - Ruby - R bridge
data-visualization-ruby - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
ruby-plot - gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files
plot-rb - A plotting library in Ruby built on top of Vega and D3.
scruffy - A beautiful graphing toolkit for Ruby
SciRuby
Glean - A data management tool for humans
Bioruby
Arel
Misc
Big Data For Chimps
Listof - Community based data collection, packed in gem. Get list of pretty much anything (stop words, countries, non words) in txt, json or hash. Demo/Search for a list
R
General-Purpose Machine Learning
ahaz - ahaz: Regularization for semiparametric additive hazards regression
arules - arules: Mining Association Rules and Frequent Itemsets
bigrf - bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets
bigRR - bigRR: Generalized Ridge Regression (with special advantage for p >> n cases)
bmrm - bmrm: Bundle Methods for Regularized Risk Minimization Package
Boruta - Boruta: A wrapper algorithm for all-relevant feature selection
bst - bst: Gradient Boosting
C50 - C50: C5.0 Decision Trees and Rule-Based Models
caret - Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
caretEnsemble - caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models.
Clever Algorithms For Machine Learning
CORElearn - CORElearn: Classification, regression, feature evaluation and ordinal evaluation
CoxBoost - CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks
Cubist - Cubist: Rule- and Instance-Based Regression Modeling
e1071 - e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
earth - earth: Multivariate Adaptive Regression Spline Models
elasticnet - elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA
ElemStatLearn - ElemStatLearn: Data sets, functions and examples from the book: "The Elements of Statistical Learning, Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman
evtree - evtree: Evolutionary Learning of Globally Optimal Trees
fpc - fpc: Flexible procedures for clustering
frbs - frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks
GAMBoost - GAMBoost: Generalized linear and additive models by likelihood based boosting
gamboostLSS - gamboostLSS: Boosting Methods for GAMLSS
gbm - gbm: Generalized Boosted Regression Models
glmnet - glmnet: Lasso and elastic-net regularized generalized linear models
glmpath - glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
GMMBoost - GMMBoost: Likelihood-based Boosting for Generalized mixed models
grplasso - grplasso: Fitting user specified models with Group Lasso penalty
grpreg - grpreg: Regularization paths for regression models with grouped covariates
h2o - A framework for fast, parallel, and distributed machine learning algorithms at scale -- Deeplearning, Random forests, GBM, KMeans, PCA, GLM
hda - hda: Heteroscedastic Discriminant Analysis
Introduction to Statistical Learning
ipred - ipred: Improved Predictors
kernlab - kernlab: Kernel-based Machine Learning Lab
klaR - klaR: Classification and visualization
lars - lars: Least Angle Regression, Lasso and Forward Stagewise
lasso2 - lasso2: L1 constrained estimation aka ‘lasso’
LiblineaR - LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library
LogicReg - LogicReg: Logic Regression
Machine Learning For Hackers
maptree - maptree: Mapping, pruning, and graphing tree models
mboost - mboost: Model-Based Boosting
medley - medley: Blending regression models, using a greedy stepwise approach
mlr - mlr: Machine Learning in R
mvpart - mvpart: Multivariate partitioning
ncvreg - ncvreg: Regularization paths for SCAD- and MCP-penalized regression models
nnet - nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
oblique.tree - oblique.tree: Oblique Trees for Classification Data
pamr - pamr: Pam: prediction analysis for microarrays
party - party: A Laboratory for Recursive Partytioning
partykit - partykit: A Toolkit for Recursive Partytioning
penalized - penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
penalizedLDA - penalizedLDA: Penalized classification using Fisher's linear discriminant
penalizedSVM - penalizedSVM: Feature Selection SVM using penalty functions
quantregForest - quantregForest: Quantile Regression Forests
randomForest - randomForest: Breiman and Cutler's random forests for classification and regression
randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)
rattle - rattle: Graphical user interface for data mining in R
rda - rda: Shrunken Centroids Regularized Discriminant Analysis
rdetools - rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces
REEMtree - REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data
relaxo - relaxo: Relaxed Lasso
rgenoud - rgenoud: R version of GENetic Optimization Using Derivatives
rgp - rgp: R genetic programming framework
Rmalschains - Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
rminer - rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
ROCR - ROCR: Visualizing the performance of scoring classifiers
RoughSets - RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories
rpart - rpart: Recursive Partitioning and Regression Trees
RPMM - RPMM: Recursively Partitioned Mixture Model
RSNNS - RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
RWeka - RWeka: R/Weka interface
RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
sda - sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection
SDDA - SDDA: Stepwise Diagonal Discriminant Analysis
SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
svmpath - svmpath: the SVM Path algorithm
tgp - tgp: Bayesian treed Gaussian process models
tree - tree: Classification and regression trees
varSelRF - varSelRF: Variable selection using random forests
XGBoost.R - R binding for eXtreme Gradient Boosting (Tree) Library
Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with R.
Data Analysis and Visualization
ggplot2 - A data visualization package based on the grammar of graphics.
Scala
Natural Language Processing
ScalaNLP - ScalaNLP is a suite of machine learning and numerical computing libraries.
Breeze - Breeze is a numerical processing library for Scala.
Chalk - Chalk is a natural language processing library.
FACTORIE - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Data Analysis and Visualization
MLlib in Apache Spark - Distributed machine learning library in Spark
Scalding - A Scala API for Cascading
Summing Bird - Streaming MapReduce with Scalding and Storm
Algebird - Abstract Algebra for Scala
xerial - Data management utilities for Scala
simmer - Reduce your data. A unix filter for algebird-powered aggregation.
PredictionIO - PredictionIO, a machine learning server for software developers and data engineers.
BIDMat - CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
Wolfe Declarative Machine Learning
General-Purpose Machine Learning
Conjecture - Scalable Machine Learning in Scalding
brushfire - Distributed decision tree ensemble learning in Scala
ganitha - scalding powered machine learning
adam - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
bioscala - Bioinformatics for the Scala programming language
BIDMach - CPU and GPU-accelerated Machine Learning Library.
Figaro - a Scala library for constructing probabilistic models.
H2O Sparkling Water - H2O and Spark interoperability.