For the latest version, see: http://lore.chuci.info/taurenshaman/json/0e06eb9c41cb45489ad651684c1487e4
{
"title": "Project/Library: Mearchine Learning",
"tags": "Mearchine Learning; deep learning; sdk/lib",
"items": [
{
"title": "基于.Net的机器学习与信号处理框架Accord Framework/AForge.net",
"description": "Accord是AForge.net的扩展,是一个基于.Net的机器学习与信号处理框架。它包括了一系列的对图像和音频的机器学习算法,如人脸检测、SIFT拼接等等。同时,Accord支持移动对象的实时跟踪等功能。它提供了一个从神经网络到决策树系统的机器学习库。",
"tags": "机器学习; 信号处理; 框架",
"language": "C#",
"license": "GNU LESSER GENERAL PUBLIC LICENSE Version 2.1",
"url": "https://github.com/accord-net/framework/",
"reference": [
"http://accord-framework.net",
"http://www.aforgenet.com"
]
},
{
"title": "来自Airbnb的开源机器学习软件包Aerosolve",
"description": "A machine learning package built for humans.",
"tags": "机器学习; Airbnb",
"language": "Java; Scala",
"license": "Apache License Version 2.0",
"url": "https://github.com/airbnb/aerosolve",
"reference": [
"http://www.infoq.com/cn/news/2015/06/airbnb-release-aerosolve"
]
},
{
"title": "Amazon机器学习API",
"description": "Amazon机器学习API让用户不需要大量的数据专家就能够实现模型构建、数据清洗和统计分析等工作,简化了预测的实现流程。虽然该API有一些UI界面或者算法上的限制,但是却是用户友好和向导驱动的,它为开发者提供了一些可视化工具,让相关API的使用更直观、也更清晰。Amazon机器学习API支持的用户场景包括:1、通过分析信号水平特征对歌曲进行题材分类。2、通过对智能设备加速传感器捕获的数据以及陀螺仪的信号进行分析识别用户的活动,是上楼、下楼、平躺、坐下还是站立不动。3、通过分析用户行为预测用户是否能够成为付费用户。4、分析网站活动记录,发现系统中的假用户、机器人以及垃圾邮件制造者。",
"tags": "Mearchine Learning API",
"language": null,
"license": null,
"url": "https://aws.amazon.com/cn/machine-learning/",
"reference": null
},
{
"title": "BigML",
"description": "BigML是一个对用户友好、对开发者友好的机器学习API,该项目的动机是让预测分析对用户而言更简单也更容易理解。BigML API提供了3种重要的模式:命令行接口、Web接口和RESTful API,其支持的主要功能包括异常检测、聚类分析、决策树的SunBurst可视化以及文本分析等。借助于BigML,用户能够通过创建一个描述性的模型来理解复杂数据中各个属性和预测属性之间的关系,能够根据过去的样本数据创建预测模型,能够在BigML平台上维护模型并在远程使用。",
"tags": "Mearchine Learning API",
"language": null,
"license": null,
"url": "https://bigml.com/",
"reference": null
},
{
"title": "Blocks",
"description": "Blocks is a framework that helps you build neural network models on top of Theano.",
"tags": "Theano",
"language": "Python",
"license": "MIT",
"url": "https://github.com/mila-udem/blocks",
"reference": null
},
{
"title": "Caffe",
"description": "Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.",
"tags": "",
"language": "C++; Python; CUDA",
"license": "BSD 2-Clause license",
"url": "https://github.com/BVLC/caffe",
"reference": [
"http://caffe.berkeleyvision.org"
]
},
{
"title": "Cassovary",
"description": "Cassovary is a simple \"big graph\" processing library for the JVM. Most JVM-hosted graph libraries are flexible but not space efficient. Cassovary is designed from the ground up to first be able to efficiently handle graphs with billions of nodes and edges. A typical example usage is to do large scale graph mining and analysis of a big network. Cassovary is written in Scala and can be used with any JVM-hosted language. It comes with some common data structures and algorithms.",
"tags": "JVM; big graph processing",
"language": "Scala",
"license": "Apache License Version 2.0",
"url": "https://github.com/twitter/cassovary",
"reference": [
"http://twitter.com/cassovary"
]
},
{
"title": "Chainer",
"description": "A Powerful, Flexible, and Intuitive Framework of Neural Networks. 深度学习的神经网络灵活框架。Chainer 支持各种网络架构,包括 Feed-forward Nets、Convnets、Recurrent Nets 和 Recursive Nets。它也支持 per-batch 的架构。Chainer 支持 CUDA 计算,它在驱动 GPU 时只需要几行代码。它也能通过一些努力,运行在多 GPUs 的架构中。",
"tags": "",
"language": "Python",
"license": "MIT",
"url": "https://github.com/pfnet/chainer",
"reference": [
"http://chainer.org"
]
},
{
"title": "CNTK: Computational Network Toolkit",
"description": "CNTK, the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows to easily realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.",
"tags": "CNTK; DNN; CNN; RNN; LSTM; SGD",
"language": "C++",
"license": null,
"url": "https://github.com/Microsoft/CNTK/",
"reference": [
"http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tutorial-NIPS2015.pdf",
"http://blogs.microsoft.com/next/2015/12/10/microsoft-researchers-win-imagenet-computer-vision-challenge/"
]
},
{
"title": "ConvNet",
"description": "Convolutional Neural Networks for Matlab, including Invariang Backpropagation algorithm (IBP). Has versions for GPU and CPU, written on CUDA, C++ and Matlab. All versions work identically. The GPU version uses kernels from Alex Krizhevsky's library 'cuda-convnet2'.",
"tags": "IBP",
"language": "C++; CUDA; Matlab",
"license": null,
"url": "https://github.com/sdemyanov/ConvNet",
"reference": [
"http://www.demyanov.net/"
]
},
{
"title": "基于JavaScript的在线深度学习库ConvNetJS",
"description": "ConvNetJS是一款基于JavaScript的在线深度学习库,它提供了在线的深度学习训练方式。它能够帮助深度学习的初学者更快、更加直观的理解算法,通过一些简单的Demo给用户最直观的解释。",
"tags": "Deep Learning",
"language": "JavaScript",
"license": "MIT",
"url": "https://github.com/karpathy/convnetjs",
"reference": [
"http://cs.stanford.edu/people/karpathy/convnetjs/"
]
},
{
"title": "基于GPU加速的神经网络应用程序机器学习库CUDA-Convnet",
"description": "CUDA是我们众所周知的GPU加速套件。而CUDA-Convnet是一个基于GPU加速的神经网络应用程序机器学习库。它使用C++编写,并且使用了NVidia的CUDA GPU处理技术。目前,这个项目已经被重组成为CUDA-Convnet2,支持多个GPU和Kepler-generation GPUs. Vuples项目与之类似,使用F#语言编写,并且适用于.Net平台上。",
"tags": "机器学习; 人工神经网络",
"language": "CUDA; C++; F#",
"license": null,
"url": "https://code.google.com/p/CUDA-convnet2/",
"reference": null
},
{
"title": "darch",
"description": "Create deep architectures in the R programming language. darch package can be used for generating neural networks with many layers (deep architectures). Training methods includes a pre training with the contrastive divergence method and a fine tuning with common known training algorithms like backpropagation or conjugate gradient.",
"tags": "",
"language": "R; C++",
"license": "GPLv3 (GNU GENERAL PUBLIC LICENSE Version 3)",
"url": "https://github.com/maddin79/darch",
"reference": [
"http://cran.um.ac.ir/web/packages/darch/index.html"
]
},
{
"title": "Datumbox",
"description": "Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.",
"tags": "Machine Learning",
"language": "Java",
"license": "Apache License Version 2.0",
"url": "https://github.com/datumbox/datumbox-framework",
"reference": [
"http://www.datumbox.com"
]
},
{
"title": "DeepLearning",
"description": "Code to build MLP models for outdoor head orientation tracking",
"tags": "",
"language": "C++; Python",
"license": null,
"url": "https://github.com/vishwa-raman/DeepLearning",
"reference": null
},
{
"title": "DeepLearnToolbox",
"description": "Matlab/Octave toolbox for deep learning. Includes Deep Belief Nets, Stacked Autoencoders, Convolutional Neural Nets, Convolutional Autoencoders and vanilla Neural Nets. Each method has examples to get you started. NO LONGER MAINTAINED.",
"tags": "",
"language": "Matlab",
"license": null,
"url": "https://github.com/rasmusbergpalm/DeepLearnToolbox",
"reference": null
},
{
"title": "deepnet",
"description": "deepnet is a GPU-based python implementation of deep learning algorithms like Feed-forward Neural Nets, Restricted Boltzmann Machines, Deep Belief Nets, Autoencoders, Deep Boltzmann Machines and Convolutional Neural Nets.",
"tags": "cudamat; cuda-convnet",
"language": "Python; C++",
"license": null,
"url": "https://github.com/nitishsrivastava/deepnet",
"reference": null
},
{
"title": "deepnet: deep learning toolkit in R",
"description": "Implement some deep learning architectures and neural network algorithms, including BP,RBM,DBN,Deep autoencoder and so on.",
"tags": "",
"language": null,
"license": "GPLv3 (GNU GENERAL PUBLIC LICENSE Version 3)",
"url": "",
"reference": [
"https://cran.r-project.org/web/packages/deepnet/index.html"
]
},
{
"title": "DeepPy: Deep learning in Python",
"description": "DeepPy is a Pythonic deep learning framework built on top of NumPy (with CUDA acceleration).",
"tags": "",
"language": "Python",
"license": "MIT",
"url": "https://github.com/andersbll/deeppy",
"reference": null
},
{
"title": "NVIDIA DIGITS (The NVIDIA Deep Learning GPU Training System)",
"description": "The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning in the hands of data scientists and researchers. Quickly design the best deep neural network (DNN) for your data using real-time network behavior visualization. Best of all, DIGITS is a complete system so you don’t have to write any code. Get started with DIGITS in under an hour.",
"tags": "",
"language": null,
"license": null,
"url": "",
"reference": [
"https://developer.nvidia.com/digits"
]
},
{
"title": "DL4J/Deeplearning4j",
"description": "Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments, rather than as a research tool.",
"tags": "deep learning",
"language": "Java; Scala; Clojure",
"license": "Apache License Version 2.0",
"url": "https://github.com/deeplearning4j/deeplearning4j",
"reference": [
"http://deeplearning4j.org/"
]
},
{
"title": "DMKT: Distributed Machine Learning Toolkit",
"description": "DMTK由一个服务于分布式机器学习的框架和一组分布式机器学习算法构成,是一个将机器学习算法应用在大数据上的强大工具包。微软亚洲研究院通过Github将分布式机器学习工具包(DMTK)开源。",
"tags": "DMKT framework; big data; big model; flexibility; efficiency; multiverso; LightLDA; Distributed word embedding; Distributed skipgram mixture model",
"language": "C++",
"license": "MIT",
"url": "https://github.com/Microsoft/DMTK",
"reference": [
"http://www.dmtk.io"
]
},
{
"title": "DNNGraph - A deep neural network model generation DSL in Haskell",
"description": "A DSL for deep neural networks, supporting Caffe and Torch.",
"tags": "",
"language": "Haskell; Protocol Buffer",
"license": "BSD License",
"url": "https://github.com/ajtulloch/dnngraph",
"reference": null
},
{
"title": "eblearn",
"description": "eblearn is an open-source C++ library of machine learning by New York University’s machine learning lab, led by Yann LeCun. In particular, implementations of convolutional neural networks with energy-based models along with a GUI, demos and tutorials.",
"tags": "",
"language": "C++",
"license": null,
"url": "http://eblearn.sourceforge.net/",
"reference": null
},
{
"title": "Encog Machine Learning Framework",
"description": "Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algoritms are multi-threaded and scale well to multicore hardware. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train machine learning algorithms.",
"tags": "Machine Learning",
"language": "C; C#; Java; JavaScript",
"license": "Apache License Version 2.0",
"url": "https://github.com/encog",
"reference": [
"http://www.heatonresearch.com/encog/"
]
},
{
"title": "FsLab",
"description": "FsLab is a single package that gives you all you need for doing data science with F#. FsLab includes an explorative data manipulation library, type providers for easy data access, a simple charting library and support for integration with R and numerical computing libraries. All available in a single package and ready to use!",
"tags": "data science tools; XPlot; Deedle",
"language": "F#",
"license": "Apache License Version 2.0",
"url": "https://github.com/fslaborg/FsLab",
"reference": [
"http://fslab.org"
]
},
{
"title": "gensim: Topic Modelling for Humans",
"description": "Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.",
"tags": "NLP; IR",
"language": "Python",
"license": " GNU LGPL license",
"url": "https://github.com/piskvorky/gensim/",
"reference": [
"http://radimrehurek.com/gensim/"
]
},
{
"title": "Go语言的一体化机器学习库:GoLearn",
"description": "Machine Learning for Go",
"tags": "机器学习",
"language": "Go; C++",
"license": "MIT",
"url": "https://github.com/sjwhitworth/golearn",
"reference": null
},
{
"title": "Google预测API",
"description": "Google预测API是一个云端机器学习和模式匹配工具,它能够从BigQuery和Google云存储上读取数据,能够处理销售机会分析、客户情感分析、客户流失分析、垃圾邮件检测、文档分类、购买率预测、推荐和智能路由等用户场景。使用Google预测API的用户不需要人工智能的知识,只需要有一些基础的编程背景即可。Google预测API支持众多的编程语言,比如 .NET、Go、Google Web Toolkit、JavaScript、Objective C、PHP、Python、Ruby和Apps Script,基本覆盖了主流的编程语言。",
"tags": "Mearchine Learning API",
"language": null,
"license": null,
"url": "https://cloud.google.com/prediction/",
"reference": null
},
{
"title": "数据分析平台H2O",
"description": "H2O是0xdata的旗舰产品,是一款核心数据分析平台。它的一部分是由R语言编写的,另一部分是由Java和Python语言编写的。",
"tags": "数据分析",
"language": "R; Java; Python",
"license": null,
"url": "http://0xdata.com/h2o/",
"reference": null
},
{
"title": "Hebel: GPU-Accelerated Deep Learning Library in Python",
"description": "Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.",
"tags": "Chainer",
"language": "Python",
"license": "GPLv2 (GNU General Public License Version 2)",
"url": "https://github.com/hannes-brt/hebel",
"reference": [
"http://hebel.readthedocs.org/"
]
},
{
"title": "htm.java",
"description": "Hierarchical Temporal Memory implementation in Java - an official Community-Driven Java port of the Numenta Platform for Intelligent Computing (NuPIC).",
"tags": "",
"language": "Java",
"license": "GPLv3 (GNU GENERAL PUBLIC LICENSE Version 3)",
"url": "https://github.com/numenta/htm.java",
"reference": null
},
{
"title": "IDLF: The Intel® Deep Learning Framework",
"description": "The Intel® Deep Learning Framework (IDLF) is a SDK library for Deep Neural Networks training and execution. It includes the API that enables building neural network topology as a compute workflow, functions to optimize the graph and execute it on hardware. Our initial focus is neural network driven object classification (ImageNet topology) implemented on CPU (Xeon) and GPU (Gen). The API is designed in the way allowing us to easily support more devices in the future. Our key principle is achieving maximum performance on each supported Intel platform.",
"tags": "Deep Learning; Intel",
"language": "C++",
"license": null,
"url": "https://github.com/01org/idlf",
"reference": [
"https://01.org/zh/intel-deep-learning-framework"
]
},
{
"title": "Java Machine Learning Library",
"description": "The Java Machine Learning Library is a set of reference implementations of machine learning algorithms. These algorithms are well documented, both in the source code as on the documentation site. Besides real machine learning algorithms also a lot of supporting classes are provided: distance measures, evaluation criteria, datasets for validation purposes and some sample code.",
"tags": "machine learning",
"language": "Java",
"license": "GPLv2 (GNU General Public License Version 2)",
"url": "http://java-ml.sourceforge.net/",
"reference": null
},
{
"title": "JSAT: Java Statistical Analysis Tool",
"description": "JSAT is a library for quickly getting started with Machine Learning problems. It is developed in my free time, and made available for use under the GPL 3. Part of the library is for self education, as such - all code is self contained. JSAT has no external dependencies, and is pure Java. I also aim to make the library suitably fast for small to medium size problems. As such, much of the code supports parallel execution.",
"tags": "Machine Learning",
"language": "Java",
"license": "GPLv3 (GNU GENERAL PUBLIC LICENSE Version 3)",
"url": "https://github.com/EdwardRaff/JSAT",
"reference": null
},
{
"title": "Keras: Deep Learning library for Theano and TensorFlow",
"description": "Deep Learning library for Python. Convnets, recurrent neural networks, and more. Runs on Theano and TensorFlow.",
"tags": "",
"language": "Python",
"license": "MIT",
"url": "https://github.com/fchollet/keras",
"reference": [
"http://keras.io/"
]
},
{
"title": "Lasagne",
"description": "Lasagne is a lightweight library to build and train neural networks in Theano.",
"tags": "Theano",
"language": "Python",
"license": "MIT",
"url": "https://github.com/Lasagne/Lasagne",
"reference": [
"http://lasagne.readthedocs.org/"
]
},
{
"title": "The Lemur Project",
"description": "The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine and ClueWeb09 dataset.",
"tags": "BSD License",
"language": "Java",
"license": null,
"url": "http://sourceforge.net/p/lemur/",
"reference": [
"http://www.lemurproject.org/"
]
},
{
"title": "LIBSVM",
"description": "LIBSVM is an integrated software for support vector classification, (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification.",
"tags": "SVM",
"language": "C++; Java",
"license": null,
"url": "https://www.csie.ntu.edu.tw/~cjlin/libsvm/"
},
{
"title": "Lush: Lisp Universal Shell",
"description": "Lush(Lisp Universal Shell) is an object-oriented programming language designed for researchers, experimenters, and engineers interested in large-scale numerical and graphic applications. It comes with rich set of deep learning libraries as a part of machine learning libraries.",
"tags": "",
"language": null,
"license": null,
"url": "http://lush.sourceforge.net/",
"reference": null
},
{
"title": "Magellan",
"description": "Geospatial Analytics Using Spark",
"tags": "Geospatial Analytics; Spark",
"language": "Scala",
"license": "Apache License Version 2.0",
"url": "https://github.com/harsha2010/magellan",
"reference": null
},
{
"title": "Mahout",
"description": "Mahout是一个广为人知的开源项目,它是Apache Software旗下的一个开源项目,提供了众多的机器学习经典算法的实现,旨在帮助开发人员更加方便快捷地创建智能应用程序。Mahout内包含了聚类、分类、推荐等很多经典算法,并且提供了很方便的云服务的接口。",
"tags": "机器学习; 经典算法",
"language": null,
"license": null,
"url": "https://mahout.apache.org/",
"reference": null
},
{
"title": "Mallet",
"description": "MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.",
"tags": "statistical natural language processing; document classification",
"language": "Java",
"license": "Common Public License Version 1.0",
"url": "https://github.com/mimno/Mallet",
"reference": [
"http://mallet.cs.umass.edu"
]
},
{
"title": "MatConvNet: CNNs for MATLAB",
"description": "MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs. Many pre-trained CNNs for image classification, segmentation, face recognition, and text detection are available.",
"tags": "",
"language": "CUDA; Matlab; C++",
"license": null,
"url": "https://github.com/vlfeat/matconvnet",
"reference": [
"http://www.vlfeat.org/matconvnet/"
]
},
{
"title": "MBrace.Core",
"description": "MBrace.Core is a standalone class library that contains the core MBrace programming model, used to author general-purpose, runtime-agnostic distributed computation. It is centered on the concept of cloud workflows, a composable, language-integrated API based on F# computation expressions. It can be used to author specialized cloud libraries like MBrace.Flow. MBrace.Core is a simple programming model for scalable cloud data scripting and programming with F# and C#. With MBrace.Azure, you can script Azure for large-scale compute and data processing, directly from your favourite editor.",
"tags": "big data; compute",
"language": "F#; C#",
"license": "Apache License Version 2.0",
"url": "https://github.com/mbraceproject/MBrace.Core",
"reference": [
"http://mbrace.io"
]
},
{
"title": "Microsoft Azure机器学习API",
"description": "Microsoft Azure机器学习是一个用于处理海量数据并构建预测型应用程序的平台,该平台提供的功能有自然语言处理、推荐引擎、模式识别、计算机视觉以及预测建模等,为了迎合数据科学家的喜好,Microsoft Azure机器学习平台还增加了对Python的支持,用户能够直接将Python代码片段发布成API。借助于Microsoft Azure机器学习API,数据科学家能够更容易地构建预测模型并缩短开发周期,其主要特性包括:1、支持创建自定义的、可配置的R模块,让数据分析师或者数据科学家能够使用自己的R语言代码来执行训练或预测任务。2、支持自定义的Python脚本,这些脚本可以使用SciPy、SciKit-Learn、NumPy以及Pandas等数据科学类库。3、支持PB级的数据训练,支持Spark和Hadoop大数据处理平台。",
"tags": "Mearchine Learning API",
"language": null,
"license": null,
"url": "https://azure.microsoft.com/en-us/services/machine-learning/",
"reference": null
},
{
"title": "Apache的Spark和Hadoop机器学习库:MLlib",
"description": "MLlib是Apache自己的Spark和Hadoop机器学习库,它被设计用于大规模高速度地执行MLlib所包含的大部分常见机器学习算法。MLlib是基于Java开发的项目,同时可以方便地与Python等语言对接。用户可以自己设计针对MLlib编写代码,这是很具有个性化的设计。",
"tags": "机器学习; Spark; Hadoop",
"language": "Java",
"license": null,
"url": "https://spark.apache.org/mllib/",
"reference": null
},
{
"title": "Mocha: Deep Learning framework for Julia",
"description": "Mocha is a Deep Learning framework for Julia, inspired by the C++ framework Caffe. Efficient implementations of general stochastic gradient solvers and common layers in Mocha could be used to train deep / shallow (convolutional) neural networks, with (optional) unsupervised pre-training via (stacked) auto-encoders.",
"tags": "",
"language": "Julia",
"license": "MIT \"Expat\" License",
"url": "https://github.com/pluskid/Mocha.jl",
"reference": [
"http://mochajl.readthedocs.org/"
]
},
{
"title": "MR4C",
"description": "MR4C is an implementation framework that allows you to run native code within the Hadoop execution framework. Pairing the performance and flexibility of natively developed algorithms with the unfettered scalability and throughput inherent in Hadoop, MR4C enables large-scale deployment of advanced data processing applications. 该框架最初由Skybox团队开发,用于卫星图像处理和地理空间数据科学的用例。该团队希望既能利用用C和C++语言开发的图像处理库又能利用适于可扩展数据处理的Hadoop框架的作业跟踪和集群管理能力。在MR4C中,算法存储在原生共享对象中,这些对象通过本地文件或统一资源标识符(URI)访问数据。输入/输出数据集、运行时参数和外部函数库都通过JavaScript对象表示法(JSON)文件进行配置。映射器分裂和资源分配可以用基于Apache YARN(适用于Hadoop v2)的工具配置或在集群层级配置(适用于MapReduce v1(MRv1))。多个算法的工作流可以通过自动生成的配置连接在一起。该框架还支持用Hadoop JobTracker接口浏览日志回调和过程报告。而且还可以用与目标Hadoop集群所用的相同接口在本地机器上对工作流进行测试。",
"tags": "Hadoop",
"language": "Java; C++",
"license": "Apache License Version 2.0",
"url": "https://github.com/google/mr4c",
"reference": null
},
{
"title": "MXNet",
"description": "MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavours of symbolic programming and imperative programming together to maximize the efficiency and your productivity. In its core, a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer is build on top, which makes symbolic execution fast and memory efficient. The library is portable and lightweight, and is ready scales to multiple GPUs, and multiple machines.",
"tags": "",
"language": "C++; Jupyter Notebook; Python; R",
"license": "Apache License Version 2.0",
"url": "https://github.com/dmlc/mxnet",
"reference": [
"http://mxnet.rtfd.org/"
]
},
{
"title": "MXNetJS Deep Learning in Browser",
"description": "MXNetJS is the dmlc/mxnet Javascript package. MXNetJS brings state of art deep learning prediction API to the browser. It is generated with Emscripten and Amalgamation. MXNetJS allows you to run prediction of state-of-art deep learning models in any computational graph, and brings fun of deep learning to client side.",
"tags": "",
"language": "JavaScript",
"license": "Apache License Version 2.0",
"url": "https://github.com/dmlc/mxnet.js/",
"reference": null
},
{
"title": "ND4J: N-Dimensional Arrays for Java",
"description": "ND4J and ND4S are scientific computing libraries for the JVM. They are meant to be used in production environments, which means routines are designed to run fast with minimum RAM requirements.",
"tags": "N-Dimensional Array; scientific computing",
"language": "Java",
"license": "Apache License Version 2.0",
"url": "https://github.com/deeplearning4j/nd4j",
"reference": [
"http://nd4j.org"
]
},
{
"title": "neon: Python based Deep Learning Framework by Nervana",
"description": "neon is Nervana's Python based Deep Learning framework and achieves the fastest performance on many common deep neural networks such as AlexNet, VGG and GoogLeNet.",
"tags": "",
"language": "Python",
"license": "Apache License Version 2.0",
"url": "https://github.com/NervanaSystems/neon",
"reference": [
"http://neon.nervanasys.com/"
]
},
{
"title": "neural-style",
"description": "Torch implementation of neural style algorithm",
"tags": "机器学习; neural style algorithm; Deep Neural Network; DNN",
"language": "Torch; Lua",
"license": "MIT",
"url": "https://github.com/jcjohnson/neural-style",
"reference": [
"http://arxiv.org/abs/1508.06576",
"http://www.infoq.com/cn/news/2015/09/MachineLearning-drawing"
]
},
{
"title": "Nodejs wrapper for Stanford Classifier",
"description": "",
"tags": "",
"language": "JavaScript",
"license": null,
"url": "https://github.com/mbejda/Nodejs-Stanford-Classifier",
"reference": null
},
{
"title": "nolearn",
"description": "nolearn contains a number of wrappers and abstractions around existing neural network libraries, most notably Lasagne, along with a few machine learning utility modules. All code is written to be compatible with scikit-learn.",
"tags": "scikit-learn; Lasagne",
"language": "Python",
"license": "MIT",
"url": "https://github.com/dnouri/nolearn",
"reference": [
"http://pythonhosted.org/nolearn/"
]
},
{
"title": "OpenDeep: a fully modular & extensible deep learning framework in Python",
"description": "OpenDeep is a deep learning framework for Python built from the ground up in Theano with a focus on flexibility and ease of use for both industry data scientists and cutting-edge researchers. OpenDeep is a modular and easily extensible framework for constructing any neural network architecture to solve your problem.",
"tags": "",
"language": "Python",
"license": "Apache License Version 2.0",
"url": "https://github.com/vitruvianscience/OpenDeep",
"reference": [
"http://www.opendeep.org/"
]
},
{
"title": "Oryx",
"description": "Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine learning. It is a framework for building applications, but also includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering. Oryx是由Hadoop设计的机器学习开源项目,由Cloudera Hadoop Distribution的创造者所提供。Oryx能够让机器学习的模型使用在实时的数据流上,如垃圾邮件过滤等。",
"tags": "机器学习; 实时; Apache Spark; Apache Kafka",
"language": "Java",
"license": "Apache License Version 2.0",
"url": "https://github.com/cloudera/oryx",
"reference": null
},
{
"title": "Pylearn2: A machine learning research library based on Theano",
"description": "Pylearn2: A machine learning research library based on Theano.",
"tags": "Theano",
"language": "Python; CUDA; C++",
"license": "3-claused BSD License",
"url": "https://github.com/lisa-lab/pylearn2",
"reference": [
"http://deeplearning.net/software/pylearn2/"
]
},
{
"title": "REINFORCEjs: Reinforcement Learning Agents in Javascript",
"description": "REINFORCEjs is a Reinforcement Learning library that implements several common RL algorithms, all with web demos.",
"tags": "Dynamic Programming; Temporal Difference; Deep Q-Learning; Stochastic/Deterministic Policy Gradients",
"language": "Javascript",
"license": "MIT",
"url": "https://github.com/karpathy/reinforcejs",
"reference": [
"http://cs.stanford.edu/people/karpathy/reinforcejs"
]
},
{
"title": "Apache SAMOA: Scalable Advanced Massive Online Analysis",
"description": "Apache SAMOA is a platform for mining on big data streams. It is a distributed streaming machine learning (ML) framework that contains a programing abstraction for distributed streaming ML algorithms. Apache SAMOA enables development of new ML algorithms without dealing with the complexity of underlying streaming processing engines (SPE, such as Apache Storm and Apache S4). Apache SAMOA also provides extensibility in integrating new SPEs into the framework. These features allow Apache SAMOA users to develop distributed streaming ML algorithms once and to execute the algorithms in multiple SPEs, i.e., code the algorithms once and execute them in multiple SPEs.",
"tags": "big data stream; online analysis; Machine Learning",
"language": "Java",
"license": "Apache License Version 2.0",
"url": "https://github.com/apache/incubator-samoa",
"reference": [
"http://samoa.incubator.apache.org/"
]
},
{
"title": "Scikit-learn",
"description": "Scikit-learn是一个非常强大的Python机器学习工具包。它通过在现有Python的基础上构建了NumPy和Matplotlib,提供了非常便利的数学工具。这个工具包包括了很多简单且高效的工具,很适合用于数据挖掘和数据分析。",
"tags": "机器学习; 数据挖掘; 数据分析; NumPy; Matplotlib",
"language": "Python",
"license": null,
"url": "https://github.com/scikit-learn/scikit-learn",
"reference": [
"http://scikit-learn.org/stable/"
]
},
{
"title": "基于C++的最古老的机器学习开源库Shogun",
"description": "Shogun是一个基于C++的最古老的机器学习开源库,它创建于1999年。作为一个SWIG库,Shogun可以轻松地嵌入Java、Python、C#等主流处理语言中。它的重点在于大尺度上的内核方法,特别是“支持向量机”的学习工具箱。其中,它包括了大量的线性方法,如LDA、LPM、HMM等等。",
"tags": "机器学习",
"language": "C++",
"license": null,
"url": "https://github.com/shogun-toolbox/shogun",
"reference": [
"http://www.shogun-toolbox.org"
]
},
{
"title": "Apache SINGA",
"description": "A General Distributed Deep Learning Platform",
"tags": "",
"language": "C++; Python; CUDA",
"license": "Apache License Version 2.0",
"url": "https://github.com/apache/incubator-singa",
"reference": [
"http://www.comp.nus.edu.sg/~dbsystem/singa/"
]
},
{
"title": "Sparkta",
"description": "Real Time Aggregation based on Spark Streaming",
"tags": "Real Time Aggregation; Spark Streaming",
"language": "Scala",
"license": null,
"url": "https://github.com/Stratio/Sparkta",
"reference": [
"http://www.stratio.com"
]
},
{
"title": "Stanford Classifier",
"description": "A classifier is a machine learning tool that will take data items and place them into one of k classes. A probabilistic classifier, like this one, can also give a probability distribution over the class assignment for a data item. This software is a Java implementation of a maximum entropy classifier. Maximum entropy models are otherwise known as softmax classifiers and are essentially equivalent to multiclass logistic regression models (though parameterized slightly differently, in a way that is advantageous with sparse explanatory feature vectors). In other words, this is the same basic technology that you're usually getting in various of the cloud-based machine learning APIs (Amazon, Google, ...) The classification method is described in: Christopher Manning and Dan Klein. 2003. Optimization, Maxent Models, and Conditional Estimation without Magic. Tutorial at HLT-NAACL 2003 and ACL 2003.",
"tags": "classifier; machine learning",
"language": "Java",
"license": "GPLv2 (GNU General Public License Version 2)",
"url": "http://nlp.stanford.edu/software/classifier.shtml",
"reference": [
"http://nlp.stanford.edu/wiki/Software/Classifier"
]
},
{
"title": "TensorFlow",
"description": "TensorFlow是一个由“Google大脑”团队的研究人员开发的机器学习库。TensorFlow是一个用来编写和执行机器学习算法的工具。计算在数据流图中完成,图中的节点进行数学运算,边界是在各个节点中交换的张量(Tensors--多维数组)。TensorFlow负责在不同的设备、内核以及线程上异步地执行代码。TensorFlow在台式机、服务器或者移动设备的CPU和GPU上运行,也可以使用Docker容器部署到云环境中。",
"tags": "Google; 机器学习",
"language": null,
"license": null,
"url": "https://tensorflow.googlesource.com/tensorflow/",
"reference": [
"http://www.tensorflow.org",
"http://download.tensorflow.org/paper/whitepaper2015.pdf"
]
},
{
"title": "Theano",
"description": "Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.",
"tags": "",
"language": "Python",
"license": null,
"url": "https://github.com/Theano/Theano",
"reference": [
"http://deeplearning.net/software/theano/"
]
},
{
"title": "Theano-Lights",
"description": "Theano-Lights is a research framework based on Theano providing implementation of several recent Deep learning models and a convenient training and test functionality. The models are not hidden and spread out behind layers of abstraction as in most deep learning platforms to enable transparency and flexiblity during learning and research.",
"tags": "",
"language": "Python",
"license": "MIT",
"url": "https://github.com/Ivaylo-Popov/Theano-Lights",
"reference": null
},
{
"title": "Torch7:一个为机器学习算法提供广泛支持的科学计算框架",
"description": "Torch7是一个为机器学习算法提供广泛支持的科学计算框架,其中的神经网络工具包(Package)实现了均方标准差代价函数、非线性激活函数和梯度下降训练神经网络的算法等基础模块,可以方便地配置出目标多层神经网络。Torch长期以来都是很多机器学习和人工智能项目的核心,不仅是学术界,就连谷歌、Twitter和英特尔等企业也都使用这一架构。Facebook开发了一些能够在Torch7上更快速地训练神经网络的模块,推出了一些优化工具,加快了基于Torch的深度学习项目的运行速度,比如,其中一个工具允许开发者使用多个GPU进行参数的并行训练,还有工具可以使卷积神经网络的训练速度提升数十倍以上,而卷积神经网络是很多深度学习系统的核心。另外,Facebook还推出了多款工具,为Torch自带的功能赋予更快的速度,这些工具的速度常常比Torch默认工具快3至10倍。Torch is the main package in Torch7 where data structures for multi-dimensional tensors and mathematical operations over these are defined. Additionally, it provides many utilities for accessing files, serializing objects of arbitrary types and other useful utilities.",
"tags": "机器学习",
"language": "C",
"license": null,
"url": "https://github.com/torch/torch7",
"reference": [
"http://torch.ch"
]
},
{
"title": "wAlnut",
"description": "Object oriented model of partial human brain with 1 theorized common learning algorithm. Work in progress towards a simple strong AI.",
"tags": "human brain; common learning algorithm",
"language": "Java",
"license": "GPLv3 (GNU GENERAL PUBLIC LICENSE Version 3)",
"url": "https://github.com/WalnutiQ/wAlnut",
"reference": null
},
{
"title": "IBM Watson",
"description": "IBM Watson是一个包含听、看、说以及理解等感知功能的扩展工具集,它提供的API超过了25个,涵盖了近50种技术,其中最主要的服务包括:1、机器翻译——帮助翻译不同语言组合中的文本。2、消息共振——找出短语或单词在预定人群中的流行度。3、问答——为主文档来源触发的查询提供直接的答案。4、用户模型——根据给定的文本预测人们的社会特征。",
"tags": "Mearchine Learning API",
"language": null,
"license": null,
"url": "https://developer.ibm.com/watson/",
"reference": null
},
{
"title": "数据挖掘工作平台Weka",
"description": "Weka是使用Java开发的用户数据挖掘的开源项目。Weka作为一个公开的数据挖掘工作平台,集合了大量能够承担数据挖掘人物的机器学习算法,包括了对数据进行预处理、分类、回归、聚类等等。同时,Weka实现了对大数据的可视化,通过Java设计的新式交互界面上,实现人与程序的交互。",
"tags": "机器学习; 用户数据挖掘; 可视化; 交互",
"language": "Java",
"license": null,
"url": "http://www.cs.waikato.ac.nz/ml/weka/",
"reference": null
}
]
}