A Collected Summary of Datasets, with Links (Deep Learning)

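These are datasets I have come across before, gathered in one place for convenience. The list is probably incomplete and will be updated over time.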

1. Image Datasets

Dataset Link
MNIST http://yann.lecun.com/exdb/mnist/
CIFAR-100 http://www.cs.utoronto.ca/~kriz/cifar.html
ImageNet http://www.image-net.org/
Caltech 101 http://www.vision.caltech.edu/Image_Datasets/Caltech101/
Caltech 256 http://www.vision.caltech.edu/Image_Datasets/Caltech256/
PASCAL VOC https://pjreddie.com/projects/pascal-voc-dataset-mirror/
COCO http://cocodataset.org/
COIL100 http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php
STL-10 http://www.stanford.edu/~acoates//stl10/
Google Open Images https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html
LabelMe http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php

2. Speech Datasets

Dataset Link
Google AudioSet https://research.google.com/audioset/dataset/index.html
TIMIT http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1
VoxForge http://www.voxforge.org/
2000 HUB5 English https://catalog.ldc.upenn.edu/LDC2002T43
LibriSpeech http://www.openslr.org/12/
VoxCeleb http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
Open SLR https://www.openslr.org/51
CALLHOME American English Speech https://catalog.ldc.upenn.edu/LDC97S42

3. Text Datasets

Dataset Link
English Broadcast News https://catalog.ldc.upenn.edu/LDC97S44
SQuAD https://rajpurkar.github.io/SQuAD-explorer/
Billion Word Dataset http://www.statmt.org/lm-benchmark/
20 Newsgroups http://qwone.com/~jason/20Newsgroups/
Google Books Ngrams https://aws.amazon.com/datasets/google-books-ngrams/
UCI Spambase https://archive.ics.uci.edu/ml/datasets/Spambase
Common Crawl http://commoncrawl.org/the-data/
Yelp Open Dataset https://www.yelp.com/dataset

4. Natural Language Datasets

Dataset Link
Web 1T 5-gram https://catalog.ldc.upenn.edu/LDC2006T13
Blizzard Challenge 2018 https://www.synsig.org/index.php/Blizzard_Challenge_2018
Flickr personal taxonomies https://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html
Multi-Domain Sentiment Dataset http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
Enron Email Dataset https://www.cs.cmu.edu/~./enron/
Blogger Corpus http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
Wikipedia Links Data https://code.google.com/archive/p/wiki-links/downloads
Gutenberg eBooks List http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
SMS Spam Collection http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
UCI’s Spambase data https://archive.ics.uci.edu/ml/datasets/Spambase

5. Geospatial Datasets

Dataset Link
OpenStreetMap https://www.openstreetmap.org
Landsat 8 https://landsat.gsfc.nasa.gov/landsat-8/
NEXRAD https://www.ncdc.noaa.gov/data-access/radar-data/nexrad
ESRI Open Data https://hub.arcgis.com/pages/open-data
USGS EarthExplorer https://earthexplorer.usgs.gov/
OpenTopography https://opentopography.org/
NASA SEDAC https://sedac.ciesin.columbia.edu/
NASA Earth Observations https://neo.sci.gsfc.nasa.gov/
Terra Populus https://terra.ipums.org/

6. Recommender Systems Datasets

Dataset Link
MovieLens https://grouplens.org/datasets/movielens/
Million Song Dataset https://www.kaggle.com/c/msdchallenge
Last.fm https://grouplens.org/datasets/hetrec-2011/
Book-Crossing Dataset http://www2.informatik.uni-freiburg.de/~cziegler/BX/
Jester https://goldberg.berkeley.edu/jester-data/
Netflix Prize https://www.netflixprize.com/
Pinterest Fashion Compatibility http://cseweb.ucsd.edu/~jmcauley/datasets.html#pinterest
Amazon Question and Answer Data http://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_qa
Social Circles Data http://cseweb.ucsd.edu/~jmcauley/datasets.html#socialcircles

7. Economics and Finance Datasets

Dataset Link
Quandl https://www.quandl.com/
World Bank Open Data https://data.worldbank.org/
IMF Data https://www.imf.org/en/Data
Financial Times Market Data https://markets.ft.com/data/
Google Trends https://trends.google.com/trends/?q=google&ctab=0&geo=all&date=all&sort=0
American Economic Association https://www.aeaweb.org/resources/data/us-macro-regional
US Stock Data https://github.com/eliangcs/pystock-data
World Factbook https://www.cia.gov/library/publications/download/
Dow Jones Index Data Set http://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index

8. Autonomous Vehicles Datasets

Dataset Link
BDD100K https://bdd-data.berkeley.edu/
Baidu ApolloScape http://apolloscape.auto/
Comma.ai https://archive.org/details/comma-dataset
Oxford RobotCar https://robotcar-dataset.robots.ox.ac.uk/
Cityscapes Dataset https://www.cityscapes-dataset.com/
CCSAD Dataset http://aplicaciones.cimat.mx/Personal/jbhayet/ccsad-dataset
KUL Belgium Traffic Sign Dataset http://www.vision.ee.ethz.ch/~timofter/traffic_signs/
LISA http://cvrr.ucsd.edu/LISA/datasets.html
Bosch Small Traffic Light https://hci.iwr.uni-heidelberg.de/node/6132
LaRa Traffic Light Recognition http://www.lara.prd.fr/benchmarks/trafficlightsrecognition
WPI Datasets http://computing.wpi.edu/dataset.html
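
Every table in this list has one dataset per row, with the name followed by its link. For anyone who wants to use the list from a script, here is a minimal sketch that splits such rows into name/URL pairs. It assumes the rows have been copied into a string; the `ROWS` sample and the `parse_rows` helper are hypothetical names introduced only for this illustration, and no network access is performed.

```python
from urllib.parse import urlparse

# Hypothetical sample: three rows copied verbatim from the tables in this list.
ROWS = """\
MNIST http://yann.lecun.com/exdb/mnist/
2000 HUB5 English https://catalog.ldc.upenn.edu/LDC2002T43
KUL Belgium Traffic Sign Dataset http://www.vision.ee.ethz.ch/~timofter/traffic_signs/
"""


def parse_rows(text):
    """Split each 'name URL' row at its last space and return {name: url}."""
    datasets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "Dataset Link":
            continue  # skip blank lines and the table header row
        # Dataset names may contain spaces, so split on the last space only.
        name, _, url = line.rpartition(" ")
        if urlparse(url).scheme not in ("http", "https"):
            continue  # last token is not a link (section heading, prose)
        datasets[name] = url
    return datasets


datasets = parse_rows(ROWS)
for name, url in datasets.items():
    # Show each dataset name with the host that serves it.
    print(f"{name}: {urlparse(url).netloc}")
```

Splitting at the last space rather than the first is a deliberate choice, since names such as "2000 HUB5 English" contain spaces while the links are expected not to.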

Reference:
"A review of deep learning with special emphasis on architectures, applications and recent trends"
