Big Data
Amazon Web Services (AWS) datasets
http://aws.amazon.com/datasets
Airline data (2009 ASA challenge)
http://stat-computing.org/dataexpo/2009/the-data.html
Australian weather
http://www.bom.gov.au/climate/dwo/
Causality Workbench
http://www.causality.inf.ethz.ch/repository.php
Kaggle competition data
https://www.kaggle.com/datasets
KDnuggets competition site
http://www.kdnuggets.com/datasets/
Machine learning dataset repository
http://mldata.org/
Medicare data files
http://go.cms.gov/19xxPN4
Microsoft Research
http://research.microsoft.com/apps/dp/dl/downloads.aspx
Million Song Dataset
http://blog.echonest.com/post/3639160982/million-song-dataset
Additional song datasets
http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets
RDataMining.com: data for the R and Data Mining e-book
http://www.rdatamining.com/data
Revolution Analytics collection
http://www.revolutionanalytics.com/subscriptions/datasets/
Social networking
http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/
53.5 billion clicks
1. http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset
2. http://archive.ics.uci.edu/ml/
3. http://www.ics.uci.edu/~mlearn//MLRepository.htm
Machine learning sample databases
1. http://kdd.ics.uci.edu/
2. http://www.ics.uci.edu/~mlearn/MLRepository.html
Website on data mining for funds
http://www.gotofund.com/index.asp
Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
Cancer genes
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Stanford Large Network Dataset Collection
http://snap.stanford.edu/data/
Microsoft anonymous web data
http://kdd.ics.uci.edu/databases/msweb/msweb.html
MSNBC anonymous web data
http://kdd.ics.uci.edu/databases/msnbc/msnbc.html
Syskill and Webert web data
http://kdd.ics.uci.edu/databases/SyskillWebert/SyskillWebert.html
ImageNet (14 million images)
http://www.image-net.org/
Tiny Images Dataset (80 million 32x32 images)
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
MirFlickr1M (1 million images)
http://press.liacs.nl/mirflickr/
CoPhIR (106 million images)
http://cophir.isti.cnr.it/whatis.html
SBU captioned photo dataset (1 million images)
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
Large-Scale Image Annotation using Visual Synset (ICCV 2011) (200 million images)
http://cpl.cc.gatech.edu/projects/VisualSynset/
NUS-WIDE (270,000 images)
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
SUN dataset (130,000 images)
http://people.csail.mit.edu/jxiao/SUN/
MSRA-MM (1 million images, 23,000 videos)
http://research.microsoft.com/en-us/projects/msrammdata/
TRECVID
http://trecvid.nist.gov/
CMU face images
http://kdd.ics.uci.edu/databases/faces/faces.html
Volcanoes on Venus
http://kdd.ics.uci.edu/databases/volcanoes/volcanoes.html
Yahoo's very large Flickr dataset (100 million images and videos)
http://yahoolabs.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images-for
100+ interesting datasets
http://www.csdn.net/article/2014-06-06/2820111-100-Interesting-Data-Sets-for-Statistics
Image processing: personal homepages, research groups, and public dataset links
http://blog.sciencenet.cn/blog-673472-759786.html
Data360
http://www.data360.org/index.aspx
Datamob.org
http://datamob.org/datasets
Factual
http://www.factual.com/topics/browse
Freebase
http://www.freebase.com/
Google
http://www.google.com/publicdata/directory
Infochimps
http://www.infochimps.com/
Numbrary
http://numbrary.com/
Quora
https://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public
RS Collection 100+
http://rs.io/2014/05/29/list-of-data-sets.html
Sample R data sets
http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
SourceForge Research Data
http://www.nd.edu/~oss/Data/data.html
StatSci.org
http://www.statsci.org/datasets.html
UFO reports
http://www.nuforc.org/webreports.html
WikiLeaks 9/11 pager intercepts
http://911.wikileaks.org/files/index.html
Stats4Stem.org: R datasets
http://www.stats4stem.org/data-sets.html
The Washington Post list
http://www.washingtonpost.com/wp-srv/metro/data/datapost.html
Agricultural experiments
http://www.insider.org/packages/cran/agridat/docs/agridat
Climate data
http://www.cru.uea.ac.uk/cru/data/temperature/#datter
and ftp://ftp.cmdl.noaa.gov/
Gene Expression Omnibus
http://www.ncbi.nlm.nih.gov/geo/
Geo Spatial Data
http://geodacenter.asu.edu/datalist/
Human Microbiome Project
http://www.hmpdacc.org/reference_genomes/reference_genomes.php
MIT Cancer Genomics Data
http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
NASA
http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
NIH Microarray data
ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/(R)
Protein structure
http://www.infobiotic.net/PSPbenchmarks/
Public Gene Data
http://www.pubgene.org/
Stanford Microarray Data
http://smd.stanford.edu/
General Social Survey
http://www3.norc.org/GSS+Website/
ICPSR
http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/index.jsp
Pew Research
http://www.pewinternet.org/datasets/pages/2/
UCLA Social Sciences Archive
http://dataarchives.ss.ucla.edu/Home.DataPortals.html
Upjohn Institute
http://www.upjohn.org/erdc/erdc.html
Time Series
Time Series Data Library
http://robjhyndman.com/TSDL/
Australian Sign Language data
http://kdd.ics.uci.edu/databases/auslan/auslan.html
High-quality Australian Sign Language data
http://kdd.ics.uci.edu/databases/auslan2/auslan.html
EEG data
http://kdd.ics.uci.edu/databases/eeg/eeg.html
Japanese Vowels
http://kdd.ics.uci.edu/databases/JapaneseVowels/JapaneseVowels.html
Pioneer-1 mobile robot data
http://kdd.ics.uci.edu/databases/pioneer/pioneer.html
Pseudo-periodic synthetic time series
http://kdd.ics.uci.edu/databases/synthetic/synthetic.html
Synthetic control chart time series
http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html
Carnegie Mellon University Enron email
http://www.cs.cmu.edu/~enron/
Carnegie Mellon University StatLib
http://lib.stat.cmu.edu/datasets/
KEEL repository
http://sci2s.ugr.es/keel/datasets.php
Carnegie Mellon University JASA data archive
http://lib.stat.cmu.edu/jasadata/
Ohio State University financial data
http://fisher.osu.edu/fin/osudata.htm
UC Berkeley
http://ucdata.berkeley.edu/
UCLA
http://aws.amazon.com/datasets
UC Riverside time series
http://www.cs.ucr.edu/~eamonn/time_series_data/
University of Toronto
http://www.cs.toronto.edu/~delve/data/datasets.html
UCI Knowledge Discovery in Databases (KDD) Archive
http://kdd.ics.uci.edu/
Information and Computer Science (UCI)
http://www.ics.uci.edu/
UC Irvine
https://uci.edu/
Internet-Related Datasets
Dataset for “Statistics and Social Network of YouTube Videos”
http://netsg.cs.sfu.ca/youtubedata/
1998 World Cup Web Site Access Logs
http://ita.ee.lbl.gov/html/contrib/WorldCup.html
(1,352,804,107 requests over the 92 days from 1998/04/26 to 1998/07/26)
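As a quick sanity check on the World Cup log figures above, the 92-day span and the average request rate can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and the request total is simply the one quoted in the entry.

```python
from datetime import date

# Figures quoted in the entry above.
start = date(1998, 4, 26)
end = date(1998, 7, 26)
total_requests = 1_352_804_107

# Both endpoints are counted, so add 1 to the difference in days.
days = (end - start).days + 1

# Average load over the whole collection period.
requests_per_day = total_requests / days
requests_per_second = requests_per_day / 86_400  # seconds in a day

print(days)
print(round(requests_per_day))
print(round(requests_per_second, 1))
```

Counting both end dates gives 92 days, and the total works out to roughly 14.7 million requests per day, or about 170 per second on average.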
Page view statistics for Wikimedia projects
http://dammit.lt/wikistats/
AOL Search Query Logs - RP
http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs
livedoor gourmet
http://blog.livedoor.jp/techblog/archives/65836960.html
Discrete Sequence Data
UNIX user data
http://kdd.ics.uci.edu/databases/UNIX_user_data/UNIX_user_data.html
Entree Chicago recommendation data
http://kdd.ics.uci.edu/databases/entree/entree.html
Census-Income database
http://kdd.ics.uci.edu/databases/census-income/census-income.html
COIL data
http://kdd.ics.uci.edu/databases/coil/coil.html
Corel image features
http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html
Forest CoverType
http://kdd.ics.uci.edu/databases/covertype/covertype.html
Insurance Company Benchmark (COIL 2000)
http://kdd.ics.uci.edu/databases/tic/tic.html
Internet usage data
http://kdd.ics.uci.edu/databases/internet_usage/internet_usage.html
IPUMS census data
http://kdd.ics.uci.edu/databases/ipums/ipums.html
KDD Cup 1998 data
http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
KDD Cup 1999 data
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
1990 US Census data
http://kdd.ics.uci.edu/databases/census1990/USCensus1990.html
E. coli genes
http://kdd.ics.uci.edu/databases/ecoli/ecoli.html
M. tuberculosis genes
http://kdd.ics.uci.edu/databases/tb/tb.html
Movies
http://kdd.ics.uci.edu/databases/movies/movies.html
MovieLens dataset
http://datahub.io/dataset/movielens
El Niño data
http://kdd.ics.uci.edu/databases/el_nino/el_nino.html
20 Newsgroups data
http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html
Reuters-21578 text categorization collection
http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
Reuters Transcribed Subset
http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html
NSF Research Award Abstracts 1990-2003
http://kdd.ics.uci.edu/databases/nsfabs/nsfawards.html
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html