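A Comprehensive List of Data Mining Test Datasets
As for source code, many open-source algorithm packages are available online, the best known being Weka and MLC++. Weka is still actively updating its algorithms. Download sites: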
ftp://pami.sjtu.edu.cn
http://www.ics.uci.edu/~mlearn/MLRepository.htm
statlib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
Sample databases:
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
Websites on data mining for funds:
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
Reuters dataset:
http://www.research.att.com/~lewis/reuters21578.html
Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
Text classification & Web:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
Time series data:
http://www.stat.wisc.edu/~reinsel/bjr-data/
Test data for the Apriori algorithm:
http://www.almaden.ibm.com/cs/quest/syndata.html
Data generator links:
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A JAR file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A JAR file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A JAR file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
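A JAR file is an ordinary ZIP archive, so the Weka dataset jars above can be unpacked without Weka itself. The following is a minimal Python sketch, assuming a jar has already been downloaded to a local path and that its datasets are stored as `.arff` files (Weka's native format). The helpers `list_arff_files` and `extract_arff_files` are hypothetical, not part of Weka.

```python
import zipfile
from pathlib import Path


def _arff_names(jar):
    # Keep only ARFF entries; case-insensitive so "Foo.ARFF" also matches.
    return sorted(n for n in jar.namelist() if n.lower().endswith(".arff"))


def list_arff_files(jar_path):
    """Return the sorted names of .arff entries inside a JAR/ZIP archive."""
    with zipfile.ZipFile(jar_path) as jar:
        return _arff_names(jar)


def extract_arff_files(jar_path, out_dir):
    """Copy every .arff entry into out_dir (flattened) and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in _arff_names(jar):
            # Use only the base name: this flattens folders and also keeps
            # entries from escaping out_dir via "../" path components.
            target = out / Path(name).name
            target.write_bytes(jar.read(name))
            written.append(target)
    return written
```

For example, `extract_arff_files("datasets-UCI.jar", "uci_arff")` would unpack the classification datasets into a `uci_arff` directory (both names are illustrative). Because files are flattened into one directory, entries with the same base name in different folders would overwrite each other.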
Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
KDnuggets dataset links (borrowed from elsewhere, but worth passing along):
http://www.kdnuggets.com/datasets/index.html
Provided by someone else:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html
Various datasets are available at:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the one distributed with Rainbow:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
PKDD'99 challenge data: financial data (~17.5 MB zipped, ~67 MB unzipped) and medical data (~2 MB zipped, ~6 MB unzipped):
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Ranking:
【1】MovieLens Data Sets http://www.grouplens.org/node/12
【2】Yahoo! Learning to Rank Challenge http://learningtorankchallenge.yahoo.com/datasets.php
【3】LETOR: Learning to Rank for Information Retrieval http://research.microsoft.com/en-us/um/beijing/projects/letor/default.aspx
【4】http://research.microsoft.com/en-us/projects/mslr/default.aspx
Medical dataset:
【1】http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/DataSets
Video Surveillance:
【1】PETS 2012 Benchmark Data http://pets2012.net/
【2】Surveillance Performance EValuation Initiative (SPEVI) http://www.eecs.qmul.ac.uk/~andrea/spevi.html
【3】Vehicle tracking video datasets http://www.cs.bu.edu/groups/ivc/data/vehicles/
【4】Automobile Data Set http://archive.ics.uci.edu/ml/datasets/Automobile
【5】VIVID Tracking Evaluation Web Site http://vision.cse.psu.edu/data/vividEval/datasets/datasets.html
【6】CLIF datasets
【7】Video Surveillance Online Repository (ViSOR): an integrated framework http://imagelab.ing.unimore.it/visor/video_categories.asp
Miscellaneous:
【1】UC Irvine Machine Learning Repository http://archive.ics.uci.edu/ml/
【2】The CIFAR-10 dataset http://www.cs.utoronto.ca/~kriz/cifar.html
【3】D4.3 Datasets for CANTATA project http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html
【4】Roadside survey of vehicle observations http://data.gov.uk/dataset/roadside-survey-of-vehicles
【5】http://imagelab.ing.unimore.it/visor/video_categories.asp (seems to be a forum-style site)
【6】http://signal.ee.bilkent.edu.tr/VisiFire/index.html (forest fire detection)
【7】http://signal.ee.bilkent.edu.tr/VisiFire/Demo/SmokeClips/Smoke_Far/ (smoke detection)
Vehicle Type Classification
【1】PANMMVRDataset http://pablonegri.free.fr/Downloads/PANMMVRDataset.htm
Action Dataset
【1】Action Dataset http://www.cs.ucf.edu/vision/public_html/data.html
Overhead:
【1】US Highway 101 Dataset http://www.fhwa.dot.gov/publications/research/operations/07030/index.cfm
Face Detection:
[1] Face Recognition Resources on the Internet http://www.ics.uci.edu/~smyth/courses/cs175/
Remote Sensing (not examined closely):
[1] http://www.itc.nl/ISPRS_WGIII4/ISPRSIII_4_Test_results/test_results_a1_detect.html
[2] http://www.itc.nl/ISPRS_WGIII4/tests_datasets.html
[3] ISPRS Test Project on Urban Classification and 3D Building Reconstruction http://www2.isprs.org/commissions/comm3/wg4/tests.html
[4] http://www.astrium-geo.com/en/19-gallery?page=2&search=gallery&type=0&sensor=26&resolution=0&continent=0&application=0&theme=0 (SAR images)
[5] Hyperspectral Remote Sensing Scenes http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes
[6] A Freeware Multispectral Image Data Analysis System https://engineering.purdue.edu/~biehl/MultiSpec/download_win.html
[7] Hyperspectral Images https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html
[8] Data & Products http://glcf.umd.edu/data/
[9] High Resolution Panchromatic Image Data https://www.e-education.psu.edu/natureofgeoinfo/c8_p11.html
[10] Optical Remote Sensing http://www.crisp.nus.edu.sg/~research/tutorial/optical.htm (from research at the National University of Singapore)
[11] A roundup of remote sensing data and resource websites http://www.cnblogs.com/Romi/archive/2012/04/11/2442174.html
[12] A Freeware Multispectral Image Data Analysis System https://engineering.purdue.edu/~biehl/MultiSpec/index.html
Driver Identification in Smart Cars
[1] http://fipa.cs.kit.edu/26.php (looks good, but not examined closely)
Panorama database
[1] SUN360 panorama database http://sun360.csail.mit.edu/
Segment
[1] Contour Detection and Image Segmentation Resources http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
Line:
[2] York Urban Line Segment Database Information http://elderlab.yorku.ca/YorkUrbanDB/
Eye movement:
Actions in the Eye. Human Eye Movement Datasets http://109.101.234.42/datasets.php
Match:
[1] http://lear.inrialpes.fr/people/mikolajczyk/ (the site of the renowned Krystian Mikolajczyk at LEAR, INRIA Rhone-Alpes)
[2] http://www.featurespace.org/ (a general-purpose database with many kinds of images; my impression is that many are for person detection)
[3] Affine Covariant Features http://www.robots.ox.ac.uk/~vgg/research/affine/
[4] http://www.vision.ee.ethz.ch/datasets/index.en.html (general-purpose, from ETH Zurich)
[5] http://vision.middlebury.edu/MRF/results/ (various applications built on Markov random fields)
Tracking:
【1】Visual Tracking via Locality Sensitive Histograms (includes code and data)
Face:
【1】http://vasc.ri.cmu.edu/idb/html/face/ (face database)
Mars images:
【1】http://www.msss.com/mars_images/moc/
【2】http://www.msss.com/msss_images/index.html
Infrared
【1】http://www.imagefusion.org/images/mm-segmentations/mm-segmentations.html
【2】http://www.imagefusion.org/images/mm-segmentations/uncamp/uncamp-i.html
【3】OTCBVS Benchmark Dataset Collection http://www.cse.ohio-state.edu/otcbvs-bench/
Image fusion
[1] image fusion toolbox http://www.metapix.de/download.htm (includes images)
[2] http://www.imagefusion.org/ (all kinds of images, including remote sensing and different seasons)
[3] http://www.geodeva.com/very_high_res_geoeye.html
[4] http://vision.ece.ucsb.edu/registration/satellite/
[5] http://emuch.net/bbs/viewthread.php?tid=4957851
[6] http://www.pxleyes.com/photography-contest/19726/multi-focus.html (many multi-focus photos)
[7] http://www.quxiaobo.org/software/software_FusingImages.html (from a Xiamen University faculty member; includes code and images)
[8] http://vision.ece.ucsb.edu/research.html (covers fusion, segmentation, and registration)
[9] http://www.cleangreengems.com/cowdisley/lessons/depthofield.htm(multi-focus)
[10] http://vision.ece.ucsb.edu/fusion/mosaicframework/focus.shtml (multi-focus)
[11] Artifact-free High Dynamic Range Imaging http://users.soe.ucsc.edu/~orazio/deghost.html
[12] www.data-fusion.org http://www.data-fusion.org/introduction
Near infrared:
[1] LeeKosarajuSankaranarayanan http://white.stanford.edu/teach/index.php/LeeKosarajuSankaranarayanan
[2] Near Infrared http://ivrg.epfl.ch/research/infrared
lidar:
[1] Lidar Dense Ground Truth Datasets http://cvlab.epfl.ch/cms/site/cvlab2/lang/en/data/keypoint
[2] Introduction to LiDAR http://resources.arcgis.com/en/help/main/10.1/index.html#//015w00000041000000
[3] LiDAR / ALSM Data http://lidar.asu.edu/data.html
[4] Mount Saint Helens - Lidar Data https://wagda.lib.washington.edu/data/type/elevation/lidar/st_helens/toutle01.html
[5] LiDAR Elevation Data for Minnesota http://www.mngeo.state.mn.us/chouse/elevation/lidar.html
[6] Puget Sound Lidar Consortium http://pugetsoundlidar.ess.washington.edu/lidardata/index.html
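[7] The magic of LiDAR support in ArcGIS 10.1 (Part 1) http://blog.csdn.net/arcgis_all/article/details/8214587
Compiled from:
lsxpu's column (blog)
The NetEase blog "数据挖掘者" (Data Miner), post "数据挖掘测试数据集大全" (A Comprehensive List of Data Mining Test Datasets). "Data Miner" has made notable contributions to data mining and recommendation algorithms.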