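Recommender - A C library for product recommendations based on collaborative filtering.
CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.
VLFeat - VLFeat is an open-source library of computer vision algorithms, with a MATLAB toolbox.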
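Recommender is described above as using collaborative filtering. As a rough, hypothetical sketch of that idea (not Recommender's own algorithm or API), the Python below predicts a user's rating for an item they have not rated as a similarity-weighted average of their other ratings, using item-item cosine similarity over a tiny made-up ratings table.

```python
import math

# Hypothetical toy data: user -> {item: rating}; absent items are unrated.
RATINGS = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob":   {"a": 4.0, "b": 2.0, "d": 5.0},
    "carol": {"b": 4.0, "c": 5.0, "d": 2.0},
}

def item_vectors(ratings):
    """Transpose user->item ratings into item->{user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity computed over the users who rated both items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    nu = math.sqrt(sum(u[k] ** 2 for k in common))
    nv = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (nu * nv)

def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    items = item_vectors(ratings)
    num = den = 0.0
    for item, r in ratings[user].items():
        sim = cosine(items[target], items[item])
        num += sim * r
        den += abs(sim)
    return num / den if den else None
```

For this toy data, `predict(RATINGS, "alice", "d")` returns roughly 4.09. Note that two items sharing a single co-rater always get similarity 1.0 here; practical systems usually damp such low-support similarities.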
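Computer Vision
OpenCV - The most widely used vision library. It has C++, C, Python and Java interfaces and supports Windows, Linux, Android and Mac OS.
DLib - DLib has C++ and Python interfaces for face recognition and object detection.
EBLearn - Eblearn is an object-oriented C++ library that implements various machine learning models.
VIGRA - VIGRA is a cross-platform computer vision and machine learning library that can handle data of arbitrary dimensionality, with Python bindings.
General-Purpose Machine Learning
MLPack - A scalable C++ machine learning library.
DLib - A suite of ML tools designed to be easy to embed in other applications.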
encog-cpp
shark
Vowpal Wabbit (VW) - A fast out-of-core learning system.
sofia-ml - A suite of fast incremental algorithms.
Shogun - The Shogun Machine Learning Toolbox
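Caffe - A deep learning framework with a clean structure, readable code and good speed.
CXXNET - A lean framework whose core code is under 1,000 lines.
XGBoost - A gradient boosting library optimized for parallel computation.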
CUDA - This is a fast C++/CUDA implementation of convolutional neural networks [DEEP LEARNING]
Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling
BanditLib - A simple Multi-armed Bandit library.
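Timbl - Implements several memory-based learning algorithms, of which IB1-IG (k-nearest-neighbour classification) and IGTree (a decision tree) are widely used in NLP.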
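Timbl's IB1-IG, mentioned above, is k-nearest-neighbour classification with an overlap distance in which each mismatching feature costs its information-gain weight. The following is a simplified Python sketch of that idea with k = 1 and a made-up toy dataset; Timbl itself is far more elaborate, and none of these function names come from its API.

```python
import math
from collections import Counter

# Hypothetical toy training data: (shape, colour) -> fruit.
TRAIN_ROWS = [("round", "red"), ("round", "green"), ("long", "yellow"), ("long", "green")]
TRAIN_LABELS = ["apple", "apple", "banana", "banana"]

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, i):
    """H(class) minus the expected class entropy after splitting on feature i."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[i], []).append(label)
    remainder = sum(len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
    return entropy(labels) - remainder

def ib1_ig_classify(rows, labels, query):
    """1-NN using an IG-weighted overlap distance (a mismatch costs the feature's weight)."""
    weights = [information_gain(rows, labels, i) for i in range(len(query))]

    def distance(row):
        return sum(w for w, a, b in zip(weights, row, query) if a != b)

    best = min(distance(row) for row in rows)
    # Break ties by majority vote among all equally near neighbours.
    nearest = [label for row, label in zip(rows, labels) if distance(row) == best]
    return Counter(nearest).most_common(1)[0][0]
```

With this data the shape feature gets weight 1.0 and the colour feature 0.5, so the query ("long", "red") is classified as "banana" even though it shares its colour with an apple.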
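Natural Language Processing
MIT Information Extraction Toolkit - C, C++ and Python tools for named entity recognition and relation extraction.
CRF++ - An open-source implementation of Conditional Random Fields (CRFs), usable for tasks such as word segmentation and part-of-speech tagging.
CRFsuite - CRFsuite is an implementation of Conditional Random Fields (CRFs), usable for tasks such as part-of-speech tagging.
BLLIP Parser - Also known as the Charniak-Johnson parser.
colibri-core - A C++ library, command-line tools and a Python binding that efficiently extract n-grams and skipgrams.
ucto - A multilingual tokenizer with Unicode-aware regular expressions and support for the FoLiA format.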
libfolia - C++ library for the FoLiA format
MeTA - MeTA: ModErn Text Analysis, for mining data from large volumes of text.
Machine Translation
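colibri-core is described above as extracting n-grams and skipgrams. As a minimal Python illustration of the two notions (not colibri-core's API or data format), the sketch below treats an n-gram as n consecutive tokens and a skipgram as an n-gram with one or more interior tokens replaced by a gap marker; the `{*}` marker is an assumption chosen for readability.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"  # placeholder for a skipped token (assumed marker, for illustration only)

def ngrams(tokens, n):
    """All contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n):
    """n-grams with one or more interior positions replaced by GAP.

    The first and last tokens are always kept, so n must be at least 3
    for any skipgram to exist.
    """
    result = []
    interior = range(1, n - 1)
    for gram in ngrams(tokens, n):
        for k in range(1, n - 1):
            for gaps in combinations(interior, k):
                result.append(tuple(GAP if i in gaps else tok for i, tok in enumerate(gram)))
    return result

tokens = "to be or not to be".split()
bigram_counts = Counter(ngrams(tokens, 2))
```

Here `bigram_counts[("to", "be")]` is 2, and `skipgrams(tokens, 3)` yields four patterns, starting with ("to", "{*}", "or").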
EGYPT (GIZA++)
Moses
pharaoh
SRILM
NiuTrans
jane
SAMT
Speech Recognition
Kaldi - Kaldi is a C++ toolkit released under the Apache License v2.0. Kaldi is intended for speech recognition research.
Sequence Analysis
ToPS - This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.
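Natural Language Processing
Cortical.io - Retina: an API that performs complex NLP operations (disambiguation, classification, streaming text filtering, etc.) as quickly and intuitively as the brain.
CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools that take raw English text as input and give the base forms of words.
Stanford Parser - A parser is a program that works out the grammatical structure of sentences.
Stanford POS Tagger - A part-of-speech tagger.
Stanford Name Entity Recognizer - Stanford NER is a Java implementation of a named entity recognizer.
Stanford Word Segmenter - Tokenization of raw text is a standard preprocessing step for many NLP tasks.
Tregex, Tsurgeon and Semgrex - Tregex is a utility for matching patterns in trees, based on tree relationships and regular-expression matches on nodes (the name is short for “tree regular expressions”).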
Stanford Phrasal - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
Stanford English Tokenizer - A tokenizer divides text into a sequence of tokens, which roughly correspond to “words”.
Stanford Tokens Regex - A framework for defining regular-expression patterns over sequences of tokens.
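Stanford Temporal Tagger - SUTime is a library for recognizing and normalizing time expressions.
Stanford SPIED - Learns entities from unlabeled text, starting from seed sets and iteratively applying patterns.
Stanford Topic Modeling Toolbox - Topic modeling tools that social scientists use to analyze datasets.
Twitter Text Java - A Java implementation of Twitter’s text processing library.
MALLET - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications.
OpenNLP - A machine-learning-based toolkit for natural language processing.
LingPipe - A toolkit for computational linguistics.
ClearTK - ClearTK provides a framework for developing statistical natural language processing components, built on top of Apache UIMA.
Apache cTAKES - Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source system for information extraction from electronic medical records and clinical free text.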
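The tokenizers listed above divide text into a sequence of tokens that roughly correspond to words. The regular-expression sketch below is a deliberately simple Python stand-in, not the rules of any Stanford tool: it keeps runs of word characters together and emits every other non-space character as its own token, so contractions such as "don't" are split naively.

```python
import re

# A run of word characters is one token; any other non-space character is a token by itself.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)
```

For example, `tokenize("Hello, world!")` gives ["Hello", ",", "world", "!"]. Production tokenizers add many special cases (abbreviations, contractions, URLs, numbers) on top of this basic split.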
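General-Purpose Machine Learning
aerosolve - A machine learning library designed from the ground up by Airbnb to be easy to use.
Datumbox - A framework for rapid development of machine learning and statistical applications.
ELKI - A data mining toolkit (unsupervised learning: clustering, outlier detection, etc.).
Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI-based workbench is also provided.
H2O - A machine learning engine that runs on distributed systems such as Hadoop and Spark as well as on personal computers, with APIs callable from R, Python, Scala and REST/JSON.
htm.java - A general-purpose machine learning library using Numenta’s Cortical Learning Algorithm.
java-deeplearning - A distributed deep learning platform for Java, Clojure and Scala.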
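JAVA-ML - A general-purpose machine learning library in Java with a common interface for all algorithms.
JSAT - Many machine learning algorithms for classification, regression, clustering, etc.
Mahout - Distributed machine learning tools.
Meka - An open-source implementation of methods for multi-label classification and evaluation, built as an extension of Weka.
MLlib in Apache Spark - Spark’s distributed machine learning library.
Neuroph - A lightweight Java neural network framework.
ORYX - A Lambda Architecture framework that uses Apache Spark and Apache Kafka for real-time, large-scale machine learning.
RankLib - A library of learning-to-rank algorithms.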
Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
SmileMiner - Statistical Machine Intelligence & Learning Engine
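SystemML - A flexible, scalable machine learning language.
WalnutiQ - An object-oriented model of the human brain.
Weka - Weka is a collection of machine learning algorithms for data mining tasks.
Speech Recognition
CMU Sphinx - An open-source toolkit for speech recognition, implemented purely in Java.
Data Analysis / Data Visualization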
Hadoop - Hadoop/HDFS
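Spark - Spark is a fast, general-purpose engine for large-scale data processing.
Impala - Real-time queries for Hadoop.
DataMelt - Mathematics software for numeric computation, statistics, symbolic calculations, data analysis and data visualization.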
Dr. Michael Thomas Flanagan’s Java Scientific Library
Deep Learning
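Deeplearning4j - Scalable, industrial-grade deep learning that takes advantage of parallel GPUs.
Computer Vision
Scikit-Image - A collection of image processing algorithms in Python.
SimpleCV - An open-source computer vision framework that gives access to several high-powered computer vision libraries, such as OpenCV. It runs on Mac, Windows and Ubuntu Linux.
Vigranumpy - Python bindings for the VIGRA C++ computer vision library.
Natural Language Processing
NLTK - A leading platform for building Python programs that work with human language data.
Pattern - A web mining module for Python. It has tools for natural language processing, machine learning and more.
Quepy - Transforms natural language questions into queries in a database query language.
TextBlob - Provides a consistent API for common natural language processing (NLP) tasks. It builds on NLTK and Pattern and plays nicely with both.
YAlign - A sentence aligner that extracts parallel sentences from comparable corpora.
jieba - A Chinese word segmentation tool.
SnowNLP - A library for processing Chinese text.
loso - A Chinese word segmentation tool.
genius - A Chinese word segmentation tool based on Conditional Random Fields.
KoNLPy - Korean natural language processing.
nut - A natural language understanding toolkit.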
Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
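BLLIP Parser - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
PyNLPl - A Python library for natural language processing. It also contains tools for parsing common NLP formats such as FoLiA, as well as ARPA language models, Moses phrase tables, GIZA++ alignments, etc.
python-ucto - Python binding to ucto (a Unicode-aware, rule-based tokenizer).
python-frog - Python binding to Frog, which does part-of-speech tagging, lemmatisation, dependency parsing and NER for Dutch.
python-zpar - Python bindings for ZPar, a statistical part-of-speech tagger, constituency parser and dependency parser for English.
colibri-core - Python binding to a C++ library for efficiently extracting n-grams and skipgrams.
spaCy - Industrial-strength NLP with Python and Cython.
PyStanfordDependencies - A Python interface for converting Penn Treebank trees to Stanford Dependencies.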
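jieba, loso and genius above all segment Chinese text into words. The Python sketch below uses the simplest dictionary-based technique, forward maximum matching, purely to illustrate the task; it is not the algorithm of any of these libraries (genius, for instance, is described as CRF-based), and the tiny dictionary is hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical toy dictionary.
DICTIONARY = {"机器", "学习", "机器学习", "自然", "语言", "自然语言", "处理"}
```

With this dictionary, "自然语言处理" (natural language processing) is segmented as ["自然语言", "处理"], while characters missing from the dictionary fall back to single-character tokens. Greedy matching cannot resolve genuinely ambiguous segmentations, which is one motivation for statistical approaches such as CRFs.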
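General-Purpose Machine Learning
machine learning - Builds a support vector machine API that is compatible with both a web interface and a programmatic interface. The corresponding datasets are stored in an SQL database, and the models generated for prediction are stored in a NoSQL database.
XGBoost - Python bindings for the eXtreme Gradient Boosting (Tree) library.
Featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn-compatible API.
scikit-learn - A Python module for machine learning built on top of SciPy.
metric-learn - A Python module for metric learning.
SimpleAI - Implements many of the artificial intelligence algorithms described in the book “Artificial Intelligence: A Modern Approach”. It focuses on providing an easy-to-use, well-documented and tested library.
astroML - A machine learning and data mining library for astronomy.
graphlab-create - A library built on a disk-backed DataFrame that implements various machine learning models (regression, clustering, recommender systems, graph analytics, etc.).
BigML - A library for communicating with an external server.
pattern - A web data mining module.
NuPIC - Numenta Platform for Intelligent Computing.
Pylearn2 - A machine learning library based on Theano.
keras - A neural network library based on Theano.
hebel - A GPU-accelerated deep learning library in Python.
Chainer - A flexible neural network framework.
gensim - An easy-to-use topic modeling tool.
topik - A topic modeling toolkit.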
PyBrain - Another Python Machine Learning Library.
Crab - A flexible, fast recommender engine.
python-recsys - Python tools for implementing a recommender system.
Restricted Boltzmann Machines - Restricted Boltzmann machines in Python.
CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree
nilearn - Machine learning for NeuroImaging in Python.
Shogun - Shogun Machine Learning Toolbox
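Pyevolve - A genetic algorithm framework.
Caffe - A deep learning framework with a clean structure, readable code and good speed.
breze - Deep neural networks based on Theano.
pyhsmm - Approximate unsupervised inference for Bayesian hidden Markov models and explicit-duration hidden semi-Markov models, focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
mrjob - Lets Python programs run on Hadoop.
SKLL - A simplified interface to scikit-learn that makes it easy to run experiments.
neurolab - https://github.com/zueve/neurolab
Spearmint - Bayesian optimization. The method is described in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.
Pebl - Python Environment for Bayesian Learning.
Theano - An optimizing mathematical compiler for array-oriented code that generates optimized GPU metaprogramming code.
TensorFlow - An open-source software library for numerical computation using data flow graphs.
yahmm - Hidden Markov models implemented in Cython.
python-timbl - Wraps the full TiMBL C++ programming interface. Timbl is an elaborate k-nearest neighbours machine learning toolkit.
deap - An evolutionary algorithm framework.
pydeep - Deep learning in Python.
mlxtend - A library of tools that are useful for everyday data science and machine learning tasks.
neon - A high-performance deep learning framework.
Optunity - Dedicated to automating the hyperparameter optimization process, with a simple, lightweight API that facilitates drop-in replacement of grid search.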
Annoy - Approximate nearest neighbours implementation
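skflow - A simplified interface to TensorFlow, similar to scikit-learn.
TPOT - Automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your data science assistant, automating most of the tedious work in machine learning.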
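Several libraries above (yahmm, pyhsmm) work with hidden Markov models. One core HMM routine is Viterbi decoding, which finds the most likely hidden state sequence for a non-empty observation sequence by dynamic programming. The sketch below is a plain-Python version working in log space; the model parameters are hypothetical toy values, and it assumes every probability is non-zero because `math.log(0)` raises an error.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an HMM, computed in log space."""
    # best[t][s] = (log-prob of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max((prev[r][0] + math.log(trans_p[r][s]), r) for r in states)
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model.
STATES = ("Healthy", "Fever")
START = {"Healthy": 0.6, "Fever": 0.4}
TRANS = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
```

For the observations ("normal", "cold", "dizzy"), `viterbi(...)` with these parameters returns ["Healthy", "Healthy", "Fever"]. Working with summed log-probabilities avoids the numerical underflow that multiplying many small probabilities would cause on long sequences.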
Data Analysis / Data Visualization
SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
NumPy - A fundamental package for scientific computing with Python.
Numba - Python JIT (just in time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
NetworkX - A high-productivity software for complex networks.
Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
Open Mining - Business Intelligence (BI) in Python (Pandas web interface)
PyMC - Markov Chain Monte Carlo sampling toolkit.
zipline - A Pythonic algorithmic trading library.
PyDy - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
SymPy - A Python library for symbolic mathematics.
statsmodels - Statistical modeling and econometrics in Python.
astropy - A community Python library for Astronomy.
matplotlib - A Python 2D plotting library.
bokeh - Interactive Web Plotting for Python.
plotly - Collaborative web plotting for Python and matplotlib.
vincent - A Python to Vega translator.
d3py - A plotting library for Python, based on D3.js.
ggplot - Same API as ggplot2 for R.
ggfortify - Unified ggplot2 interface to popular R packages.
Kartograph.py - Rendering beautiful SVG maps in Python.
pygal - A Python SVG Charts Creator.
PyQtGraph - A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
pycascading
Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
Blaze - NumPy and Pandas interface to Big Data.
emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
windML - A Python Framework for Wind Energy Analysis and Prediction
vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library
cerebro2 - A web-based visualization and debugging platform for NuPIC.
NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool!
SparklingPandas - Pandas on PySpark (POPS)
Seaborn - A python visualization library based on matplotlib
bqplot - An API for plotting in Jupyter (IPython)
General-Purpose Machine Learning
mgl - Neural networks (boltzmann machines, feed-forward and recurrent nets), Gaussian Processes
mgl-gpr - Evolutionary algorithms
cl-libsvm - Wrapper for the libsvm support vector machine library
Clojure
Natural Language Processing
Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
Infections-clj - Rails-like inflection library for Clojure and ClojureScript
General-Purpose Machine Learning
Touchstone - Clojure A/B testing library
Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure
Infer - Inference and machine learning in Clojure
Clj-ML - A machine learning library for Clojure built on top of Weka and friends
Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets)
Fungp - A genetic programming library for Clojure
Statistiker - Basic Machine Learning algorithms in Clojure.
clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm
comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm
Data Analysis / Data Visualization
Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
PigPen - Map-Reduce for Clojure.
Envision - Clojure Data Visualisation library, based on Statistiker and D3
Computer Vision
Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.
Shearlets - MATLAB code for shearlet transform
Curvelets - The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
Bandlets - MATLAB code for bandlet transform
mexopencv - Collection and a development kit of MATLAB mex functions for OpenCV library
Natural Language Processing
NLP - An NLP library for Matlab
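General-Purpose Machine Learning
t-Distributed Stochastic Neighbor Embedding - t-SNE is an award-winning technique for dimensionality reduction that is particularly well suited to visualizing high-dimensional data.
Spider - The spider is intended to be a complete object-oriented environment for machine learning in MATLAB.
LibSVM - A well-known support vector machine library.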
LibLinear - A Library for Large Linear Classification
Caffe - A deep learning framework with a clean structure, readable code and good speed.
Pattern Recognition Toolbox - A complete object-oriented environment for machine learning in MATLAB.
Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
Data Analysis / Data Visualization
matlab_gbl - MatlabBGL is a Matlab package for working with graphs.
gamic - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL’s mex functions.
Computer Vision
OpenCVDotNet - A wrapper for the OpenCV project to be used with .NET applications.
Emgu CV - Cross-platform wrapper of OpenCV that can be compiled with Mono to run on Windows, Linux, Mac OS X, iOS, and Android.
AForge.NET - Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
Accord.NET - Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.
Natural Language Processing
Stanford.NLP for .NET - A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.
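General-Purpose Machine Learning
Accord-Framework - A complete framework for building machine learning, computer vision, computer audition, signal processing and statistical applications.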
Accord.MachineLearning - Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
DiffSharp - An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
Vulpes - Deep belief and deep learning implementation written in F# that leverages CUDA GPU execution with Alea.cuBase.
Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
Neural Network Designer - DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feedback. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
Data Analysis / Data Visualization
numl - numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
Math.NET Numerics - Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and everyday use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
Sho - Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.
Natural Language Processing
Treat - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
Ruby Linguistics - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
Stemmer - Expose libstemmer_c to Ruby
Ruby Wordnet - This library is a Ruby interface to WordNet
Raspel - raspell is an interface binding for Ruby
UEA Stemmer - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
Twitter-text-rb - A library that does auto linking and extraction of usernames, lists and hashtags in tweets
General-Purpose Machine Learning
Ruby Machine Learning - Some Machine Learning algorithms, implemented in Ruby
Machine Learning Ruby
jRuby Mahout - JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby.
CardMagic-Classifier - A general classifier module to allow Bayesian and other types of classifications.
Data Analysis / Data Visualization
rsruby - Ruby - R bridge
data-visualization-ruby - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
ruby-plot - gnuplot wrapper for ruby, especially for plotting roc curves into svg files
plot-rb - A plotting library in Ruby built on top of Vega and D3.
scruffy - A beautiful graphing toolkit for Ruby
SciRuby
Glean - A data management tool for humans
Bioruby
Arel
Misc
Big Data For Chimps
Listof - Community based data collection, packed in gem. Get list of pretty much anything (stop words, countries, non words) in txt, json or hash. Demo/Search for a list
General-Purpose Machine Learning
ahaz - ahaz: Regularization for semiparametric additive hazards regression
arules - arules: Mining Association Rules and Frequent Itemsets
bigrf - bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets
bigRR - bigRR: Generalized Ridge Regression (with special advantage for p >> n cases)
bmrm - bmrm: Bundle Methods for Regularized Risk Minimization Package
Boruta - Boruta: A wrapper algorithm for all-relevant feature selection
bst - bst: Gradient Boosting
C50 - C50: C5.0 Decision Trees and Rule-Based Models
caret - Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
caretEnsemble - caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models.
Clever Algorithms For Machine Learning
CORElearn - CORElearn: Classification, regression, feature evaluation and ordinal evaluation
CoxBoost - CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks
Cubist - Cubist: Rule- and Instance-Based Regression Modeling
e1071 - e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
earth - earth: Multivariate Adaptive Regression Spline Models
elasticnet - elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA
ElemStatLearn - ElemStatLearn: Data sets, functions and examples from the book: “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” by Trevor Hastie, Robert Tibshirani and Jerome Friedman
evtree - evtree: Evolutionary Learning of Globally Optimal Trees
fpc - fpc: Flexible procedures for clustering
frbs - frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks
GAMBoost - GAMBoost: Generalized linear and additive models by likelihood based boosting
gamboostLSS - gamboostLSS: Boosting Methods for GAMLSS
gbm - gbm: Generalized Boosted Regression Models
glmnet - glmnet: Lasso and elastic-net regularized generalized linear models
glmpath - glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
GMMBoost - GMMBoost: Likelihood-based Boosting for Generalized mixed models
grplasso - grplasso: Fitting user specified models with Group Lasso penalty
grpreg - grpreg: Regularization paths for regression models with grouped covariates
h2o - A framework for fast, parallel, and distributed machine learning algorithms at scale – Deeplearning, Random forests, GBM, KMeans, PCA, GLM
hda - hda: Heteroscedastic Discriminant Analysis
Introduction to Statistical Learning
ipred - ipred: Improved Predictors
kernlab - kernlab: Kernel-based Machine Learning Lab
klaR - klaR: Classification and visualization
lars - lars: Least Angle Regression, Lasso and Forward Stagewise
lasso2 - lasso2: L1 constrained estimation aka ‘lasso’
LiblineaR - LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library
LogicReg - LogicReg: Logic Regression
Machine Learning For Hackers
maptree - maptree: Mapping, pruning, and graphing tree models
mboost - mboost: Model-Based Boosting
medley - medley: Blending regression models, using a greedy stepwise approach
mlr - mlr: Machine Learning in R
mvpart - mvpart: Multivariate partitioning
ncvreg - ncvreg: Regularization paths for SCAD- and MCP-penalized regression models
nnet - nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
oblique.tree - oblique.tree: Oblique Trees for Classification Data
pamr - pamr: Pam: prediction analysis for microarrays
party - party: A Laboratory for Recursive Partytioning
partykit - partykit: A Toolkit for Recursive Partytioning
penalized - penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
penalizedLDA - penalizedLDA: Penalized classification using Fisher’s linear discriminant
penalizedSVM - penalizedSVM: Feature Selection SVM using penalty functions
quantregForest - quantregForest: Quantile Regression Forests
randomForest - randomForest: Breiman and Cutler’s random forests for classification and regression
randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)
rattle - rattle: Graphical user interface for data mining in R
rda - rda: Shrunken Centroids Regularized Discriminant Analysis
rdetools - rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces
REEMtree - REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data
relaxo - relaxo: Relaxed Lasso
rgenoud - rgenoud: R version of GENetic Optimization Using Derivatives
rgp - rgp: R genetic programming framework
Rmalschains - Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
rminer - rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
ROCR - ROCR: Visualizing the performance of scoring classifiers
RoughSets - RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories
rpart - rpart: Recursive Partitioning and Regression Trees
RPMM - RPMM: Recursively Partitioned Mixture Model
RSNNS - RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
RWeka - RWeka: R/Weka interface
RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
sda - sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection
SDDA - SDDA: Stepwise Diagonal Discriminant Analysis
SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
svmpath - svmpath: the SVM Path algorithm
tgp - tgp: Bayesian treed Gaussian process models
tree - tree: Classification and regression trees
varSelRF - varSelRF: Variable selection using random forests
XGBoost.R - R binding for eXtreme Gradient Boosting (Tree) Library
Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.
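Many of the R packages above (glmnet, lars, lasso2, elasticnet, penalized) fit L1-penalized regression. As an illustration of one common fitting technique, cyclic coordinate descent with soft-thresholding, the Python sketch below minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1. It is a simplified sketch, not any package's implementation: it assumes no intercept and no feature standardization, and the toy data is hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X b, and b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back in).
            rho = sum(c * r for c, r in zip(col, residual)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new
    return beta

# Hypothetical toy data where y = 2 * (first column) exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 0.0, 2.0, 4.0]
```

On this toy data, lam = 0 recovers the coefficients [2.0, 0.0], lam = 0.5 shrinks the first coefficient to 5/3 while the second stays exactly 0.0, and lam = 10 sets both to 0.0. The exact zeros produced by soft-thresholding are what make the lasso useful for variable selection.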
Data Analysis / Data Visualization
ggplot2 - A data visualization package based on the grammar of graphics.
Natural Language Processing
ScalaNLP - ScalaNLP is a suite of machine learning and numerical computing libraries.
Breeze - Breeze is a numerical processing library for Scala.
Chalk - Chalk is a natural language processing library.
FACTORIE - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Data Analysis / Data Visualization
MLlib in Apache Spark - Distributed machine learning library in Spark
Scalding - A Scala API for Cascading
Summing Bird - Streaming MapReduce with Scalding and Storm
Algebird - Abstract Algebra for Scala
xerial - Data management utilities for Scala
simmer - Reduce your data. A unix filter for algebird-powered aggregation.
PredictionIO - PredictionIO, a machine learning server for software developers and data engineers.
BIDMat - CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
Wolfe Declarative Machine Learning
General-Purpose Machine Learning
Conjecture - Scalable Machine Learning in Scalding
brushfire - Distributed decision tree ensemble learning in Scala
ganitha - scalding powered machine learning
adam - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
bioscala - Bioinformatics for the Scala programming language
BIDMach - CPU and GPU-accelerated Machine Learning Library.
Figaro - a Scala library for constructing probabilistic models.
H2O Sparkling Water - H2O and Spark interoperability.