5个好用的开源数据挖掘软件

5 of the Best Free and Open Source Data Mining Software

The process of extracting patterns from data is called data mining. It is recognized as an essential tool by modern business since it is able to convert data into business intelligence thus giving an informational edge. At present, it is widely used in profiling practices, like surveillance, marketing, scientific discovery, and fraud detection. 
There are four kinds of tasks that are normally involve in Data mining:

  • Classification - the task of generalizing familiar structure to employ to new data
  • Clustering - the task of finding groups and structures in the data that are in some way or another the same, without using noted structures in the data.
  • Association rule learning - Looks for relationships between variables.
  • Regression - Aims to find a function that models the data with the slightest error.

For those of you who are looking for some data mining tools, here are five of the best open-source data mining software that you could get for free:


Orange

http://www.ailab.si/orange 
这里写图片描述 
Orange is a component-based data mining and machine learning software suite that features friendly yet powerful, fast and versatile visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting. It contains complete set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is written in C++ and Python, and its graphical user interface is based on cross-platform Qt framework. 
Orange 是一个基于组件的数据挖掘和机器学习软件套装,它的功能即友好,又很强大,快速而又多功能的可视化编程前端,以便浏览数据分析和可视化,基绑定了Python以进行脚本开发。它包含了完整的一系列的组件以进行数据预处理,并提供了数据帐目,过渡,建模,模式评估和勘探的功能。其由C++和 Python开发,它的图形库是由跨平台的Qt框架开发。


RapidMiner

https://rapidminer.com/ 
这里写图片描述 
这里写图片描述 
RapidMiner, formerly called YALE (Yet Another Learning Environment), is an environment for machine learning and data mining experiments that is utilized for both research and real-world data mining tasks. It enables experiments to be made up of a huge number of arbitrarily nestable operators, which are detailed in XML files and are made with the graphical user interface of RapidMiner. RapidMiner provides more than 500 operators for all main machine learning procedures, and it also combines learning schemes and attribute evaluators of the Weka learning environment. It is available as a stand-alone tool for data analysis and as a data-mining engine that can be integrated into your own products. 
RapidMiner,以前叫 YALE (Yet Another Learning Environment),其是一个给机器学习和数据挖掘和分析的试验环境,同时用于研究了真实世界数据挖掘。它提供的实验由大量的算子组成,而这些算子由详细的XML 文件记录,并被RapidMiner图形化的用户接口表现出来。RapidMiner为主要的机器学习过程提供了超过500算子,并且,其结合了学习方案和Weka学习环境的属性评估器。它是一个独立的工具可以用来做数据分析,同样也是一个数据挖掘引擎可以用来集成到你的产品中。


Weka

http://www.cs.waikato.ac.nz/~ml/weka/ 
这里写图片描述 
Written in Java, Weka (Waikato Environment for Knowledge Analysis) is a well-known suite of machine learning software that supports several typical data mining tasks, particularly data preprocessing, clustering, classification, regression, visualization, and feature selection. Its techniques are based on the hypothesis that the data is available as a single flat file or relation, where each data point is labeled by a fixed number of attributes. Weka provides access to SQL databases utilizing Java Database Connectivity and can process the result returned by a database query. Its main user interface is the Explorer, but the same functionality can be accessed from the command line or through the component-based Knowledge Flow interface. 
由Java开发的 Weka (Waikato Environment for Knowledge Analysis)是一个知名机器学机软件,其支持几种经典的数据挖掘任务,显著的数据预处理,集群,分类,回归,虚拟化,以及功能选择。其技术基于假设数据是以一种单个文件或关联的,在那里,每个数据点都被许多属性标注。 Weka 使用Java的数据库链接能力可以访问SQL数据库,并可以处理一个数据库的查询结果。它主要的用户接品是Explorer,也同样支持相同功能的命令行,或是一种基于组件的知识流接口。


JHepWork

http://jwork.org/jhepwork/ 
这里写图片描述 
Designed for scientists, engineers and students, jHepWork is a free and open-source data-analysis framework that is created as an attempt to make a data-analysis environment using open-source packages with a comprehensible user interface and to create a tool competitive to commercial programs. It is specially made for interactive scientific plots in 2D and 3D and contains numerical scientific libraries implemented in Java for mathematical functions, random numbers, and other data mining algorithms. jHepWork is based on a high-level programming language Jython, but Java coding can also be used to call jHepWork numerical and graphical libraries. 
为科学家、工程师和学生所设计的 jHepWork 是一个免费的开源数据分析框架,其主要是用开源库来创建一个数据分析环境,并提供了丰富的用户接口,以此来和那些收费的的软件竞争。它主要是为了科学计算用的二维和三维的制图,并包含了用Java实现的数学科学库,随机数,和其它的数据挖掘算法。 jHepWork 是基于一个高级的编程语言 Jython,当然,Java代码同样可以用来调用 jHepWork 的数学和图形库。


KNIME

http://www.knime.org/ 
这里写图片描述 
KNIME (Konstanz Information Miner) is a user friendly, intelligible, and comprehensive open-source data integration, processing, analysis, and exploration platform. It gives users the ability to visually create data flows or pipelines, selectively execute some or all analysis steps, and later study the results, models, and interactive views. KNIME is written in Java, and it is based on Eclipse and makes use of its extension method to support plugins thus providing additional functionality. Through plugins, users can add modules for text, image, and time series processing and the integration of various other open source projects, such as R programming language, Weka, the Chemistry Development Kit, and LibSVM. 
NIME (Konstanz Information Miner)是一个用户友好,智能的,并有丰演的开源的数据集成,数据处理,数据分析和数据勘探平台。它给了用户有能力以可视化的方式创建数据流或数据通道,可选择性地运行一些或全部的分析步骤,并以后面研究结果,模型以及可交互的视图。 KNIME 由Java写成,其基于 Eclipse 并通过插件的方式来提供更多的功能。通过以插件的文件,用户可以为文件,图片,和时间序列加入处理模块,并可以集成到其它各种各样的开源项目中,比如:R语言,Weka,

If you know of other free and open-source data mining software, please share them with us via comment.

引用自:http://www.junauza.com/2010/11/free-data-mining-software.html

你可能感兴趣的:(数据挖掘)