Datasets for Data Mining

Reposted from: http://www.statoo.com/en/resources/anthill/Datamining/Data/

22 Links
  • Competitions in Data Mining and Knowledge Discovery
  • Data Library Sources: Downloadable Data by Subject
  • Data Mining - Case Study
  • Datasets for Data Mining and Knowledge Discovery
  • Datasets for Machine Learning, Knowledge Discovery, Data Mining - Machine Learning network Online Information Service
  • David Dowe's data links
  • Delve Datasets - Collections of data for developing, evaluating, and comparing learning methods
  • Digital Chart of the World
  • Directory of /pub/machine-learning-databases
  • FIMI (Frequent Itemset Mining Implementations): software and datasets
  • Kent Ridge Biomedical Data Set Repository
  • NCDC: Online Document Library, Dataset Documentation
  • Reuters-21578 Text Categorization Collection
  • State and County Demographic and Economic Profiles
  • Surveillance, Epidemiology, and End Results
  • The Royal Statistical Society Dataset Website
  • The UCR Time Series Data Mining Archive
  • TheDataWeb - a network of online data libraries
  • UCI KDD Archive
  • UCI Machine Learning Repository
  • UCR Time Series Data Mining Archive
  • WHO Statistical Information System

(2)
Reposted from: http://www.statoo.com/en/resources/anthill/Data_Sets/

32 Links
  • Climate Data Archives
  • Data Centre - www.marine.csiro.au
  • Data Library Sources: Downloadable Data by Subject
  • Data Sets
  • Data Sources
  • Datafiles by Subject
  • Datasets for Data Mining and Knowledge Discovery
  • Datasets from the Book: Statistical Consulting
  • Delve Datasets - Collections of data for developing, evaluating, and comparing learning methods
  • Digital Chart of the World
  • Donnees SMEL
  • Electronic Dataset Service
  • Kent Ridge Biomedical Data Set Repository
  • Martin Bland's Medical Data-sets
  • MIMAS dataset services
  • NCDC: Online Document Library, Dataset Documentation
  • SNZ- Datalab
  • STA114/MTH136 Andrews and Herzberg Data Sets
  • Stat Labs Data Page
  • State and County Demographic and Economic Profiles
  • Statistical Reference Datasets (StRD)
  • StatLib---Datasets Archive
  • STATWEB of the Swiss Federal Statistical Office
  • Surveillance, Epidemiology, and End Results
  • The Data and Story Library
  • The DataLab at UC Irvine
  • The Royal Statistical Society Dataset Website
  • TheDataWeb - a network of online data libraries
  • U.S. Census Data Database Search
  • UCI KDD Archive
  • UCI ML Repository Content Summary
  • WHO Statistical Information System

(3)
  • Gunnar Raetsch's Benchmark Datasets: various benchmark datasets prepared for Matlab (V6 and V7). Includes BreastCancer, Cards, chess, Circle, credit, Heart1, hepatitis, HouseVotes84, Ionosphere, liver, monks3, musk, PimaIndiansDiabetes, promotergene, ringnorm, Sonar, Spirals, threenorm, tictactoe, titanic and twonorm. These are the benchmark data sets used in [RaeOnoMue01] and [MikRaeWesSchMue99]. Well suited to classification tasks. [RaeOnoMue01 Mirror] [MikRaeWesSchMue99 Mirror]
  • Data from "Benchmarking Support Vector Machines" [MeyerLeischHornik02]. Well suited to comparing your classifier or regression algorithm against other algorithms (SVM, KNN, neural nets, bagging, boosting, random forests and others). Includes many data sets, such as liver, hepatitis, credit, monks3, HouseVotes84, Sonar, tictactoe, ringnorm, musk, Spirals, threenorm, Ionosphere, BreastCancer, Circle, titanic, Heart1, chess, PimaIndiansDiabetes, promotergene, twonorm and Cards. The data are stored as R workspace images (.RData files). To extract the training sets as text files, you can use the following R command: for (i in 1:100) { load(sprintf("%i.RData", i)); write.table(train, file = sprintf("%itrain.txt", i)) }
  • UCI Machine Learning Repository - Many useful datasets
  • DMOZ - Data sets for machine learning
  • A dataset for path-finding in images (Field Robotics)
  • LETOR - package of benchmark data sets for LEarning TO Rank
  • KIN40K regression data set
  • Clustering Data Sets (Mammals, Birth/Death Rates, New Haven Schools, Nutrients)
  • UCI and UCI KDD data sets for classification and regression in Weka ARFF format. More ARFF datasets, such as protein and biomedical data, drug design, Reuters-21578 (ModApte split), and various agricultural data sets, can be found here.
  • Clustering data sets
  • Fundamental Clustering Problem Suite (FCPS). Includes clustering problems such as Hepta, Lsun, Tetra, Chainlink, Atom, EngyTime, Target, TwoDiamonds, Wingnut and Golfball.
  • RCV1 Text Categorization Test Collection
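
The R one-liner quoted above for the "Benchmarking Support Vector Machines" data stops with an error at the first missing file and loads every object into the global workspace. A slightly more defensive version might look like the sketch below. It assumes, as the original command implies, that each numbered image contains an object named train. The function name export_train_sets and its arguments are hypothetical and are not part of the data distribution.

```r
# Hypothetical helper (not part of the original data package): export the
# `train` object from each numbered .RData image in `dir` to a text table.
export_train_sets <- function(dir = ".", indices = 1:100) {
  written <- character(0)
  for (i in indices) {
    rdata_path <- file.path(dir, sprintf("%i.RData", i))
    if (!file.exists(rdata_path)) next        # skip images that are not present

    env <- new.env()                          # keep loaded objects out of the global workspace
    load(rdata_path, envir = env)

    # Assumption: the image holds an object called `train`; skip it otherwise.
    if (!exists("train", envir = env, inherits = FALSE)) next

    out_path <- file.path(dir, sprintf("%itrain.txt", i))
    write.table(get("train", envir = env), file = out_path)
    written <- c(written, out_path)
  }
  written                                     # paths of the files actually written
}
```

Calling export_train_sets("path/to/images") returns the paths of the text files it wrote, so it is easy to check how many of the expected images were actually found.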
