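Finance
Transportation
Business
Recommender systems
Healthcare
Image data
General images
Scene images
Web-tagged images
Human silhouette images
Visual text recognition images
Images of a specific class of objects
Material and texture images
Object classification images
Face images
Pose and action images
Fingerprint recognition
Other image data
Video data
General video
Human action video
Object detection video
Dense crowd video
Other video
Audio data
General audio
Speech recognition
Natural language processing
Social data
Processed research and competition data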
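1. Common deep learning datasets
2. [Introduction] In the "big data era", data is king! Both data mining and the currently popular field of deep learning depend on "big data". Large companies usually have their own data, but for startups, university teachers, and students, "Where can I get large datasets open to the public?" is a question that has to be faced.
This article introduces and summarizes the open datasets commonly used in deep learning for vision, drawing on those the author used during graduate study and research and those encountered while reading the literature.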
MNIST
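The "Hello World!" of deep learning, and a must for beginners! MNIST is a database of handwritten digits with 60,000 training samples and 10,000 test samples; each sample image is 28x28 pixels. The dataset is stored in a binary format and cannot be viewed directly as images, but tools that convert it to an image format are easy to find.
LeNet, one of the earliest deep convolutional networks, was designed for this dataset. Almost every mainstream deep learning framework uses MNIST as its first introductory tutorial; TensorFlow's MNIST tutorial is particularly detailed.
Dataset size: ~12MB
Download: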
http://yann.lecun.com/exdb/mnist/index.html
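The binary storage mentioned above is the IDX format. The following is a minimal sketch of decoding an image file, assuming the layout documented on the download page (a big-endian header holding the magic number 2051, the image count, the row count, and the column count, followed by one unsigned byte per pixel). The function name `parse_idx_images` is hypothetical and used only for illustration.

```python
import struct


def parse_idx_images(raw: bytes):
    """Decode an uncompressed IDX3 image file into a list of images.

    Each image is returned as a list of rows, each row a list of 0-255 ints.
    Assumes the big-endian header: magic (2051), count, rows, cols.
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}, expected 2051")
    images = []
    offset = 16  # pixel data starts right after the 16-byte header
    for _ in range(count):
        pixels = raw[offset:offset + rows * cols]
        if len(pixels) != rows * cols:
            raise ValueError("file is shorter than its header claims")
        images.append(
            [list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)]
        )
        offset += rows * cols
    return images
```

The files on the download page are gzip-compressed, so in practice the bytes would probably come from something like `gzip.open(path, "rb").read()` before being passed to this function.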
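ImageNet
MNIST brings beginners into deep learning, while the ImageNet dataset gave the deep learning wave a huge push. The 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" by Krizhevsky, Sutskever, and Hinton brought about a "revolution" in computer vision, and that work was based on the ImageNet dataset.
ImageNet contains more than 14 million images covering more than 20,000 categories; over one million of the images have explicit class labels and annotations of object locations. The details are as follows: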
1)Total number of non-empty synsets: 21841
2)Total number of images: 14,197,122
3)Number of images with bounding box annotations: 1,034,908
4)Number of synsets with SIFT features: 1000
5)Number of images with SIFT features: 1.2 million
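ImageNet is currently one of the most widely used datasets in deep learning for images; most research on image classification, localization, and detection builds on it. The dataset is well documented, maintained by a dedicated team, and convenient to use. It appears very widely in computer vision papers and has almost become the "standard" dataset for benchmarking deep learning algorithms for images.
ImageNet has a world-famous companion competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In the past the winners were usually large organizations such as Google and MSRA; this year (2016), Chinese teams won every task in ILSVRC2016.
ImageNet is an excellent dataset, but annotation errors are inevitable, and erroneous data is corrected or deleted almost every year. It is advisable to download the latest version and keep track of updates.
Dataset size: ~1TB (all data for the ILSVRC2016 competition)
Download: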
http://www.image-net.org/about-stats
COCO
COCO (Common Objects in Context) is a new dataset for image recognition, segmentation, and image semantics. It has the following features:
1)Object segmentation
2)Recognition in Context
3)Multiple objects per image
4)More than 300,000 images
5)More than 2 Million instances
6)80 object categories
7)5 captions per image
8)Keypoints on 100,000 people
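COCO is sponsored by Microsoft. Its annotations include not only class and location information but also textual descriptions of image content. The release of COCO has driven great progress in image segmentation and semantic understanding over the past two or three years, and it has almost become the "standard" dataset for evaluating image semantic understanding algorithms.
Google's open-source image captioning model Show and Tell was tested on this dataset; download it and try it if you are interested.
Dataset size: ~40GB
Download: http://mscoco.org/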
PASCAL VOC
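The PASCAL VOC challenge is a benchmark for visual object classification, recognition, and detection. It provides a standard annotated image dataset and a standard evaluation procedure for measuring the performance of detection and learning algorithms. The PASCAL VOC image set covers 20 categories: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (aeroplane, bicycle, boat, bus, car, motorbike, train); indoor (bottle, chair, dining table, potted plant, sofa, TV). The challenge has not been held since 2012, but the images are of good quality and fully annotated, which makes the dataset well suited to testing algorithm performance.
Dataset size: ~2GB
Download: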
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
CIFAR
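CIFAR-10 contains 10 classes of 32x32 color images, with 50,000 training images and 10,000 test images. CIFAR-100 is similar to CIFAR-10 but has 100 classes of 600 images each, 500 for training and 100 for testing; the 100 classes are grouped into 20 superclasses. Every image has an explicit class label. CIFAR is a very good small-to-medium-sized dataset for testing image classification algorithms.
Dataset size: ~170MB
Download: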
http://www.cs.toronto.edu/~kriz/cifar.html
Open Image
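Progress in machine learning over the past few years has rapidly advanced computer vision; systems can now automatically caption images and generate natural-language responses to shared photos. Much of that progress can be attributed to publicly available datasets such as ImageNet and COCO. Google, as a great company, naturally wanted to make a contribution of its own, hence Open Image.
Open Image is a dataset of ~9 million image URLs, annotated with labels spanning more than 6,000 categories. Its labels cover more real-life entities than the 1,000 ImageNet classes, and it is large enough to train a deep neural network from scratch.
Made by Google, so it must be good! The only shortcoming may be that it provides only image URLs, which is likely less convenient than providing the images directly.
The author has not used this dataset, but Google's releases are generally of dependable quality.
Dataset size: ~1.5GB (not including images)
Download: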
https://github.com/openimages/dataset
Youtube-8M
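YouTube-8M is a video dataset released by Google. The videos come from YouTube: 8 million videos in total, 500,000 hours, 4,800 classes. To ensure the stability and quality of the labeled video collection, Google used only public videos with more than 1,000 views. So that researchers and students with limited computing resources can also use it, Google preprocessed the videos and extracted frame-level features, compressed so that they fit on a single hard disk (less than 1.5TB).
The dataset is downloaded with a provided download script. Because of network conditions in China, the download often drops, but fortunately the script supports resuming, so reconnecting after a while picks it up again. You can write a script that detects an interrupted download, sleeps for a while, and then requests the download again, so that you do not have to watch over it. (As of writing, after on-and-off downloading, the author still has not finished...)
Dataset size: ~1.5TB
Download: https://research.google.com/youtube8m/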
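The "sleep and retry" idea described above could be sketched as follows. This is only an illustration under stated assumptions: the actual download command is not reproduced here, so `command` is a placeholder to be filled in from the dataset page, and the helper name `run_with_retries` and its parameters are hypothetical. It relies on the download script itself resuming where it left off.

```python
import subprocess
import time


def run_with_retries(command, max_attempts=10, wait_seconds=60,
                     run=subprocess.call, sleep=time.sleep):
    """Re-run `command` until it exits with status 0.

    Returns the number of attempts used. `run` and `sleep` are injectable
    so the loop can be exercised without a real download.
    """
    for attempt in range(1, max_attempts + 1):
        if run(command) == 0:
            return attempt  # the command finished successfully
        if attempt < max_attempts:
            sleep(wait_seconds)  # wait before requesting the download again
    raise RuntimeError(f"command still failing after {max_attempts} attempts")
```

A call might look like `run_with_retries(["bash", "download.sh"], wait_seconds=300)`, where `download.sh` stands in for whatever script the dataset page provides.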
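The above are the datasets commonly used by deep learning vision researchers, as summarized from the author's study, research, and reading. The author's knowledge is limited, so omissions and inaccuracies are inevitable; corrections from readers are welcome.
If the datasets above still do not meet your needs, try looking through the ones below.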
1. Deep learning dataset collection site
http://deeplearning.net/datasets/
Collects a large number of deep-learning-related datasets, though not every open dataset can be found there.
2. Tiny Images Dataset
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
Contains 80 million 32x32 images; CIFAR-10 and CIFAR-100 were selected from it.
3. CoPhIR
http://cophir.isti.cnr.it/whatis.html
A very large Flickr dataset released by Yahoo, containing more than 100 million images.
4. MirFlickr1M
http://press.liacs.nl/mirflickr/
A set of 1 million images selected from Flickr.
5. SBU captioned photo dataset
http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/
A subset of Flickr containing 1 million images.
6. NUS-WIDE
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
A set of 270,000 images from Flickr.
7. Large-Scale Image Annotation using Visual Synset (ICCV 2011)
http://cpl.cc.gatech.edu/projects/VisualSynset/
A very large machine-annotated dataset containing 200 million images.
8. SUN dataset
http://people.csail.mit.edu/jxiao/SUN/
A dataset of 130,000 images.
9. MSRA-MM
http://research.microsoft.com/en-us/projects/msrammdata/
Contains 1 million images and 23,000 videos; produced by Microsoft Research Asia, so the quality should be dependable.
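China is a "big data country". In the public sector, data opening has been led by Beijing, Shanghai, and other cities, which have successively released datasets such as traffic and weather data; among companies, Sina Weibo and others have led the way, releasing real, usable data that has greatly helped researchers. In computer vision, however, the openness of datasets in China still lags behind that abroad. The author hopes that companies and organizations in China will open more high-quality datasets, advance research in the relevant industries, raise China's influence in these research fields, and contribute to the progress of science and technology for all humanity.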
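A comprehensive list of common image datasets
1. Sogou Labs dataset:
http://www.sogou.com/labs/dl/p.html
The Internet image library comes from part of the data indexed by Sogou image search. It covers categories including people, animals, buildings, machinery, landscapes, and sports, for a total of 2,836,535 images. For each image, the dataset provides the original image, a thumbnail, the page it appears on, and the related text on that page. Over 200 GB.
2
http://www.imageclef.org/
ImageCLEF aims to provide benchmarks (retrieval, classification, annotation, and so on) for image-related fields, as part of the Cross Language Evaluation Forum (CLEF). It has held a competition every year since 2003.
3
http://staff.science.uva.nl/~xirong/index.php?n=Main.Dataset
Datasets maintained by Xirong Li, PhD, Intelligent Systems Lab Amsterdam; research on video and image retrieval.
4
http://www.svcl.ucsd.edu/projects/crossmodal/
Wikipedia featured articles: contains images (and features) and the corresponding wiki text. See the paper "A New Approach to Cross-Modal Multimedia Retrieval"; there is also the follow-up paper "On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval", for which there is no download link yet.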
5
http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
To our knowledge, this is the largest real-world web image dataset comprising over 269,000 images with over 5,000 user-provided tags, and ground-truth of 81 concepts for the entire dataset. The dataset is much larger than the popularly available Corel and Caltech 101 datasets. Though some datasets comprise over 3 million images, they only have ground-truth for a small fraction of images. Our proposed NUS-WIDE dataset has the ground-truth for the entire dataset.
6.
http://www.cs.washington.edu/research/imagedatabase/
7.
http://lear.inrialpes.fr/~jegou/data.php
Jegou's datasets. Jegou works specifically on CBIR (content-based image retrieval); the images have ground truth but no annotations.
8.
http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/
VGG's Oxford Buildings dataset, also intended specifically for CBIR.
9.
http://acmmm13.org/submissions/call-for-multimedia-grand-challenge-solutions/msr-bing-grand-challenge-on-image-retrieval-scientific-track/
The dataset for the Microsoft Image Grand Challenge on Image Retrieval
The following datasets are collected from CVPapers:
http://www.cvpapers.com/index.html
PASCAL VOC 2009 dataset
Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets
LabelMe dataset
LabelMe is a web-based image annotation tool that allows researchers to label images and share the annotations with the rest of the community. If you use the database, we only ask that you contribute to it, from time to time, by using the labeling tool.
BioID Face Detection Database
1521 images with human faces, recorded under natural conditions, i.e. varying illumination and complex background. The eye positions have been set manually.
CMU/VASC & PIE Face dataset
Yale Face dataset
Caltech
Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds
Caltech 101
Pictures of objects belonging to 101 categories
Caltech 256
Pictures of objects belonging to 256 categories
Daimler Pedestrian Detection Benchmark
15,560 pedestrian and non-pedestrian samples (image cut-outs) and 6744 additional full images not containing pedestrians for bootstrapping. The test set contains more than 21,790 images with 56,492 pedestrian labels (fully visible or partially occluded), captured from a vehicle in urban traffic.
MIT Pedestrian dataset
CVC Pedestrian Datasets
CVC Pedestrian Datasets
CBCL Pedestrian Database
MIT Face dataset
CBCL Face Database
MIT Car dataset
CBCL Car Database
MIT Street dataset
CBCL Street Database
INRIA Person Data Set
A large set of marked up images of standing or walking people
INRIA car dataset
A set of car and non-car images taken in a parking lot nearby INRIA
INRIA horse dataset
A set of horse and non-horse images
H3D Dataset
3D skeletons and segmented regions for 1000 people in images
HRI RoadTraffic dataset
A large-scale vehicle detection dataset
BelgaLogos
10000 images of natural scenes, with 37 different logos, and 2695 logos instances, annotated with a bounding box.
FlickrBelgaLogos
10000 images of natural scenes grabbed on Flickr, with 2695 logos instances cut and pasted from the BelgaLogos dataset.
FlickrLogos-32
The dataset FlickrLogos-32 contains photos depicting logos and is meant for the evaluation of multi-class logo detection/recognition as well as logo retrieval methods on real-world images. It consists of 8240 images downloaded from Flickr.
TME Motorway Dataset
30000+ frames with vehicle rear annotation and classification (car and trucks) on motorway/highway sequences. Annotation semi-automatically generated using laser-scanner data. Distance estimation and consistent target ID over time available.
PHOS (Color Image Database for illumination invariant feature selection)
Phos is a color image database of 15 scenes captured under different illumination conditions. More particularly, every scene of the database contains 15 different images: 9 images captured under various strengths of uniform illumination, and 6 images under different degrees of non-uniform illumination. The images contain objects of different shape, color and texture and can be used for illumination invariant feature detection and selection.
CaliforniaND: An Annotated Dataset For Near-Duplicate Detection In Personal Photo Collections
California-ND contains 701 photos taken directly from a real user's personal photo collection, including many challenging non-identical near-duplicate cases, without the use of artificial image transformations. The dataset is annotated by 10 different subjects, including the photographer, regarding near duplicates.
ETHZ Shape Classes
A dataset for testing object class detection algorithms. It contains 255 test images and features five diverse shape-based classes (apple logos, bottles, giraffes, mugs, and swans).
Flower classification data sets
17 Flower Category Dataset
Animals with attributes
A dataset for Attribute Based Classification. It consists of 30475 images of 50 animals classes with six pre-extracted feature representations for each image.
Stanford Dogs Dataset
Dataset of 20,580 images of 120 dog breeds with bounding-box annotation, for fine-grained image categorization.
Face and Gesture Recognition Working Group FGnet
Face and Gesture Recognition Working Group FGnet
Feret
Face and Gesture Recognition Working Group FGnet
PUT face
9971 images of 100 people
Labeled Faces in the Wild
A database of face photographs designed for studying the problem of unconstrained face recognition
Urban scene recognition
Traffic Lights Recognition, Lara's public benchmarks.
PubFig: Public Figures Face Database
The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects.
YouTube Faces
The data set contains 3,425 videos of 1,595 different people. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames.
MSRC-12: Kinect gesture data set
The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system.
QMUL underGround Re-IDentification (GRID) Dataset
This dataset contains 250 pedestrian image pairs + 775 additional images captured in a busy underground station for the research on person re-identification.
Person identification in TV series
Face tracks, features and shot boundaries from our latest CVPR 2013 paper. It is obtained from 6 episodes of Buffy the Vampire Slayer and 6 episodes of Big Bang Theory.
ChokePoint Dataset
ChokePoint is a video dataset designed for experiments in person identification/verification under real-world surveillance conditions. The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2.
BIWI Walking Pedestrians dataset
Walking pedestrians in busy scenarios from a bird eye view
"Central" Pedestrian Crossing Sequences
Three pedestrian crossing sequences
Pedestrian Mobile Scene Analysis
The set was recorded in Zurich, using a pair of cameras mounted on a mobile platform. It contains 12'298 annotated pedestrians in roughly 2'000 frames.
Head tracking
BMP image sequences.
KIT AIS Dataset
Data sets for tracking vehicles and people in aerial image sequences.
MIT Traffic Data Set
MIT traffic data set is for research on activity analysis and crowded scenes. It includes a traffic video sequence of 90 minutes long. It is recorded by a stationary camera.
Image Segmentation with A Bounding Box Prior dataset
Ground truth database of 50 images with: Data, Segmentation, Labelling - Lasso, Labelling - Rectangle
Motion Segmentation and OBJCUT data
Cows for object segmentation, Five video sequences for motion segmentation
Geometric Context Dataset
Geometric Context Dataset: pixel labels for seven geometric classes for 300 images
Crowd Segmentation Dataset
This dataset contains videos of crowds and other high density moving objects. The videos are collected mainly from the BBC Motion Gallery and Getty Images website. The videos are shared only for the research purposes. Please consult the terms and conditions of use of these videos from the respective websites.
CMU-Cornell iCoseg Dataset
Contains hand-labelled pixel annotations for 38 groups of images, each group containing a common foreground. Approximately 17 images per group, 643 images total.
Segmentation evaluation database
200 gray level images along with ground truth segmentations
The Berkeley Segmentation Dataset and Benchmark
Image segmentation and boundary detection. Grayscale and color segmentations for 300 images, the images are divided into a training set of 200 images, and a test set of 100 images.
Weizmann horses
328 side-view color images of horses that were manually segmented. The images were randomly collected from the WWW.
Saliency-based video segmentation with sequentially updated priors
10 videos as inputs, and segmented image sequences as ground-truth
Wallflower Dataset
For evaluating background modelling algorithms
Foreground/Background Microsoft Cambridge Dataset
Foreground/Background segmentation and Stereo dataset from Microsoft Cambridge
Stuttgart Artificial Background Subtraction Dataset
The SABS (Stuttgart Artificial Background Subtraction) dataset is an artificial dataset for pixel-wise evaluation of background models.
AIM
120 Images / 20 Observers (Neil D. B. Bruce and John K. Tsotsos 2005).
LeMeur
27 Images / 40 Observers (O. Le Meur, P. Le Callet, D. Barba and D. Thoreau 2006).
Kootstra
100 Images / 31 Observers (Kootstra, G., Nederveen, A. and de Boer, B. 2008).
DOVES
101 Images / 29 Observers (van der Linde, I., Rajashekar, U., Bovik, A.C., Cormack, L.K. 2009).
Ehinger
912 Images / 14 Observers (Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba and Aude Oliva 2009).
NUSEF
758 Images / 75 Observers (R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli and T-S. Chua 2010).
JianLi
235 Images / 19 Observers (Jian Li, Martin D. Levine, Xiangjing An and Hangen He 2011).
Extended Complex Scene Saliency Dataset (ECSSD)
ECSSD contains 1000 natural images with complex foreground or background. For each image, the ground truth mask of salient object(s) is provided.
CAVIAR
For the CAVIAR project a number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, entering and exiting shops, fighting and passing out and, last but not least, leaving a package in a public place.
ViSOR
ViSOR contains a large set of multimedia data and the corresponding annotations.
3D Photography Dataset
Multiview stereo data sets: a set of images
Multi-view Visual Geometry group's data set
Dinosaur, Model House, Corridor, Aerial views, Valbonne Church, Raglan Castle, Kapel sequence
Oxford reconstruction data set (building reconstruction)
Oxford colleges
Multi-View Stereo dataset (Vision Middlebury)
Temple, Dino
Multi-View Stereo for Community Photo Collections
Venus de Milo, Duomo in Pisa, Notre Dame de Paris
IS-3D Data
Dataset provided by Center for Machine Perception
CVLab dataset
CVLab dense multi-view stereo image database
3D Objects on Turntable
Objects viewed from 144 calibrated viewpoints under 3 different lighting conditions
Object Recognition in Probabilistic 3D Scenes
Images from 19 sites collected from a helicopter flying around Providence, RI. USA. The imagery contains approximately a full circle around each site.
Multiple cameras fall dataset
24 scenarios recorded with 8 IP video cameras. The first 22 scenarios contain a fall and confounding events; the last 2 contain only confounding events.
UCF Sports Action Dataset
This dataset consists of a set of actions collected from various sports which are typically featured on broadcast television channels such as the BBC and ESPN. The video sequences were obtained from a wide range of stock footage websites including BBC Motion gallery, and GettyImages.
UCF Aerial Action Dataset
This dataset features video sequences that were obtained using an R/C-controlled blimp equipped with an HD camera mounted on a gimbal. The collection represents a diverse pool of actions featured at different heights and aerial viewpoints. Multiple instances of each action were recorded at different flying altitudes which ranged from 400-450 feet and were performed by different actors.
UCF YouTube Action Dataset
It contains 11 action categories collected from YouTube.
Weizmann action recognition
Walk, Run, Jump, Gallop sideways, Bend, One-hand wave, Two-hands wave, Jump in place, Jumping Jack, Skip.
UCF50
UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube.
ASLAN
The Action Similarity Labeling (ASLAN) Challenge.
MSR Action Recognition Datasets
The dataset was captured by a Kinect device. There are 12 dynamic American Sign Language (ASL) gestures, and 10 people. Each person performs each gesture 2-3 times.
KTH Recognition of human actions
Contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes and indoors.
Hollywood-2 Human Actions and Scenes dataset
The Hollywood-2 dataset contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video in total.
Collective Activity Dataset
This dataset contains 5 different collective activities : crossing, walking, waiting, talking, and queueing and 44 short video sequences some of which were recorded by consumer hand-held digital camera with varying view point.
Olympic Sports Dataset
The Olympic Sports Dataset contains YouTube videos of athletes practicing different sports.
SDHA 2010
Surveillance-type videos
VIRAT Video Dataset
The dataset is designed to be realistic, natural and challenging for video surveillance domains in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories than existing action recognition datasets.
HMDB: A Large Video Database for Human Motion Recognition
Collected from various sources, mostly from movies, and a small proportion from public databases, YouTube and Google videos. The dataset contains 6849 clips divided into 51 action categories, each containing a minimum of 101 clips.
Stanford 40 Actions Dataset
Dataset of 9,532 images of humans performing 40 different actions, annotated with bounding-boxes.
50Salads dataset
Fully annotated dataset of RGB-D video data and data from accelerometers attached to kitchen objects capturing 25 people preparing two mixed salads each (4.5h of annotated data). Annotated activities correspond to steps in the recipe and include phase (pre-/ core-/ post) and the ingredient acted upon.
AFEW (Acted Facial Expressions In The Wild)/SFEW (Static Facial Expressions In The Wild)
Dynamic temporal facial expressions data corpus consisting of close to real world environment extracted from movies.
ETHZ CALVIN Dataset
IPM Vision Group Image Stitching datasets
Images and parameters for registration
VIP Laparoscopic / Endoscopic Dataset
Collection of endoscopic and laparoscopic (mono/stereo) videos and images
Zurich Buildings Database
The ZuBuD Image Database contains over 1005 images of Zurich city buildings.
Color Name Data Sets
Mall dataset
The mall dataset was collected from a publicly accessible webcam for crowd counting and activity profiling research.
QMUL Junction Dataset
A busy traffic dataset for research on activity analysis and behaviour understanding.
CVonline datasets
http://homepages.inf.ed.ac.uk/rbf/CVonline/CVentry.htm
Surveillance video datasets
Website:
Datasets are available here.
Dataset:
The BOSS project aims at developing an innovative and bandwidth efficient communication system to transmit large data rate communications between public transport vehicles and the wayside. In particular, the BOSS concepts will be evaluated and demonstrated in the context of railway transport. As a matter of fact, security issues, traditionally covered in stations by means of video-surveillance are clearly lacking on-board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic or maintenance issues are generally handled when the train arrives in stations or during maintenance stops, which prevents proactive actions to be carried out.
The dataset includes 15 sequences shot by 9 cameras and 8 microphones, all synchronized together to give the possibility of 3D video/audio reconstruction.
In these datasets, we can find the following events:
- Cell phone theft (in Spanish language).
- Check out - a passenger checking out another man's wife, then fighting (in French language).
- Disease - a series of 3 passengers fainting, alone in the coach (both in French and Spanish).
- Disease in public (both in French and Spanish).
- Harass - 3 sequences in which a man harasses a woman. In "Harass2", there are other passengers in the coach.
- Newspaper - two sequences (one in French, one in Spanish) in which a passenger harasses another passenger for his newspaper, and end up assaulting him.
- Panic (in French language) - a passenger notices a fire in the next coach, and everybody runs out of the train.
- Two more sequences are provided, containing no incidents whatsoever. They were shot to assess the robustness of incident detection software to false alarms.
- Other sequences are provided, which are not acted incidents but were used for specific incident detection tasks.
Metadata:
Events generated by the BOSS processing are given for some sequences, in a file called "nameofthesequence.xml", in the same directory as the data set of the sequence itself. The format and types of the events are described in a PDF file.
Contextual info:
All the sequences were shot in a Madrid suburban train kindly lent by RENFE who are gratefully acknowledged.
In order to allow as much flexibility as possible, all the video files are uncalibrated, the calibration files are provided along with each sequence and the description of how to use them is given in calibTutorial.pdf . An associated Matlab library is provided in BOSScalibTutorial.zip.
Comments:
Copyrights:
The sequences are provided free of charge for academic research. For any other use, please ask the contact person. Should you care to publish these sequences or results obtained using them, please indicate their origin as "BOSS project", and mention the address of the project: http://www.celtic-boss.org.
You are welcome to provide a link to the location of the sequences, but copying them to another web site is subject to prior consent of the contact person.
Contact:
Website:
Datasets are available here:
http://www.emav09.org/
The objective of the EMAV 2009 (European Micro Aerial Vehicle Conference and Flight Competition) conference is to provide an effective and established forum for discussion and dissemination of original and recent advances in MAV technology. The conference program will consist of a theoretical part and a flight competition. We aim for submission of papers that address novel, challenging and innovative ideas, concepts or systems. We particularly encourage papers that go beyond MAV hardware, and address issues such as the collaboration of multiple MAVs, applications of computer vision, and non-GPS based navigation.
Dataset:
For computer vision researchers an image set is published. The set consists of photos taken with various MAV platforms at different locations. The photos are always stills from movies made by the platform. For this EMAV, there is no explicit assignment or competition linked to this data set. However, possible tasks with the data set are: segmentation of the images in meaningful entities, specific object recognition (cars / roads), construction of image mosaics on the basis of the films, etc.
Metadata:
Contextual info:
Comments:
Copyrights:
Contact:
info [-at-] emav2009.org
Website:
Datasets are available here:
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
Dataset:
The Caltech Pedestrian Dataset consists of approximately 10 hours of 640x480 30Hz video taken from a vehicle driving through regular traffic in an urban environment. About 250,000 frames (in 137 approximately minute long segments) with a total of 350,000 bounding boxes and 2300 unique pedestrians were annotated.
Metadata:
The annotation includes temporal correspondence between bounding boxes and detailed occlusion labels. More information can be found in our CVPR09 paper.
Associated Matlab code is available. The annotations use a custom "video bounding box" (vbb) file format. The code also contains utilities to view seq files with the annotations overlaid, evaluation routines used to generate all the ROC plots in the paper, and also the vbb labeling tool used to create the dataset (a slightly outdated video tutorial of the labeler is also available).
Contextual info:
Comments:
Copyrights:
Contact:
pdollar[at]caltech.edu
Website:
Datasets are available here (registration is needed):
http://ngsim.fhwa.dot.gov/modules.php?op=modload&name=News&file=article&sid=4
Dataset:
Detailed vehicle trajectory data on parts of highways
Metadata:
Contextual info:
Comments:
Copyrights:
Need to register before using the NGSIM Data Sets.
Contact:
Website:
Datasets are available here (registration is needed)
http://corpus.amiproject.org/amicorpus/download/download
Dataset:
This dataset consists of meeting room scenarios, with two people sitting around meeting tables.
Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains.
Metadata:
Annotations are available for many different phenomena (dialog acts, head movement etc. ).
See here for more information.
Contextual info:
Comments:
Copyrights:
Contact:
Website:
http://www.fp6-moryne.org/
MORYNE aims at contributing to greater transport efficiency, increased transport safety and more environmental friendly transport by improving traffic management in an urban and sub-urban area.
Dataset:
There are sequences from both demonstration busses of the MORYNE project.
Filenames explicitly provide the date and time of acquisition.
Metadata:
Ground truth is provided in XML format as follows:
<event>
  <time>2008-01-18T10:05:10.747209</time>
  <name>ODOINFO</name>
  <parameters>
    <sender>OBU</sender>
    <target>MVS</target>
    <starttime>2008-01-18T10:05:10.747209</starttime>
    <stoptime>2008-01-18T10:05:11.784436</stoptime>
    <distance>9.216714</distance>
  </parameters>
</event>
This file gives the distance covered by the bus during the interval starttime - stoptime.
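As a rough illustration of how such an event could be used, the sketch below parses one event and derives the interval length and an average speed. It assumes the file contains well-formed XML (tags without inner spaces and a closing name tag); the unit of the distance field is not stated in the description, so the speed is simply "distance units per second". The function name `odometry_summary` is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Sample event, written as well-formed XML (an assumption about the real files).
SAMPLE_EVENT = """
<event>
  <time>2008-01-18T10:05:10.747209</time>
  <name>ODOINFO</name>
  <parameters>
    <sender>OBU</sender>
    <target>MVS</target>
    <starttime>2008-01-18T10:05:10.747209</starttime>
    <stoptime>2008-01-18T10:05:11.784436</stoptime>
    <distance>9.216714</distance>
  </parameters>
</event>
"""


def odometry_summary(event_xml: str):
    """Return (distance, seconds, distance / seconds) for one ODOINFO event."""
    event = ET.fromstring(event_xml.strip())
    params = event.find("parameters")
    start = datetime.fromisoformat(params.findtext("starttime").strip())
    stop = datetime.fromisoformat(params.findtext("stoptime").strip())
    distance = float(params.findtext("distance"))
    seconds = (stop - start).total_seconds()
    return distance, seconds, distance / seconds


if __name__ == "__main__":
    print(odometry_summary(SAMPLE_EVENT))
```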
Contextual info:
.idx files
----------
.idx files contain the date and time for each frame in the sequence. The structure of this file is:
- header of 12 bytes
- For each frame, a structure of 24 bytes
The structure contains:
- unsigned 32 bits integer: seconds since Epoch
- unsigned 32 bits integer: microseconds in the second
- unsigned 64 bits integer: offset in bytes in the .avi file
- unsigned 32 bits integer: frame number starting with 0
- unsigned 32 bits integer: frame type as defined by libavcodec (may be useless)
All integers are encoded in little endian.
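The record layout above maps directly onto a fixed-size binary structure. The following is a minimal sketch of reading it, assuming the five fields are stored in the order listed with no padding (4 + 4 + 8 + 4 + 4 = 24 bytes) and that the 12-byte header can simply be skipped, since its contents are not described. The names `parse_idx` and `FRAME_RECORD` are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

HEADER_SIZE = 12
# Little-endian: u32 seconds, u32 microseconds, u64 offset, u32 frame, u32 type.
FRAME_RECORD = struct.Struct("<IIQII")


def parse_idx(raw: bytes):
    """Decode the per-frame records of a .idx file into a list of dicts."""
    body = raw[HEADER_SIZE:]
    # Ignore any trailing partial record so iter_unpack gets whole records only.
    usable = len(body) - len(body) % FRAME_RECORD.size
    frames = []
    for seconds, micros, offset, number, frame_type in FRAME_RECORD.iter_unpack(body[:usable]):
        # Interpreting the epoch time as UTC is an assumption.
        timestamp = (datetime.fromtimestamp(seconds, tz=timezone.utc)
                     + timedelta(microseconds=micros))
        frames.append({
            "timestamp": timestamp,
            "offset": offset,        # byte offset of the frame in the .avi file
            "frame": number,         # frame number, starting at 0
            "frame_type": frame_type,
        })
    return frames
```

With the offsets in hand, a reader could seek straight to a given frame in the .avi file instead of decoding the sequence from the start.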
Comments:
The material for camera calibration and bus speed/context metadata will be added as soon as possible.
Copyrights:
This folder contains a list of test sequences which have been recorded for the MORYNE project (http://www.fp6-moryne.org).
They can be used for non-commercial purposes only, if a reference to the MORYNE project is associated with their use (e.g. in publications, video demonstrations...).
Contact:
christophe.parisot(at)multitel.be
Website:
Datasets are available here:
http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/CROWDS/index.html
Dataset:
Data for the real scene:
These are the smoothed flow sequences for the Waverley train station scene. There are 4 numbered files: file (002) is used for testing and the remaining files are used for training.
Data for the simulated scene
These are the smoothed flow sequences for the train station simulation. There are 30 files divided in the groups below. Use from frame 1100 to 4000. The emergency is at frame 2000.
Group 1: Normal - Training
Group 2: Normal - Testing
Group 3: Emergency - Blocked exit at the bottom of the scene.
Metadata:
No ground truth available
Contextual info:
Comments:
Copyrights:
Free download from website.
Contact:
Dimitrios Makris, [email protected]
Website:
http://www.multitel.be/~va/cantata/LeftObject/
Dataset:
A number of video clips were recorded acting out the scenario of interest: left objects. 31 sequences of two minutes have been recorded, showing different left-object scenarios (1 or more objects, person staying close to the left object, etc).
The 31 scenarios have been recorded using 2 different cameras (not synchronised), with two different views:
- a Panasonic camera - miniDV, model NV-DS28EG (camera1)
- a Sony camera - miniDV, model DSR-PD170P (camera2)
The videos have the following characteristics:
- A resolution of 720x576 pixels
- 25 frames per second
- A compression using MPEG4
- The file sizes are 75 MB for camera1 and 65 MB for camera2.
Metadata:
All the sequences are annotated in XML format. Each sequence is associated with an annotation file of the same name ending in .gt.xml.
For each left object, we can find in the xml:
- the exact time of the detection
- the position of the object in the image
Contextual info:
Comments:
In each sequence, nothing happens before 30 seconds or after 1m45s.
Copyrights:
Free download from website. If you publish results using the data, please acknowledge the data as coming from the CANTATA project, found at URL: http://www.hitech-projects.com/euprojects/cantata/. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND
Contact:
Website:
Datasets are available here:
http://imagelab.ing.unimore.it/visor/
Dataset:
4 types of video clips. These sequences constitute a representative panel of different video surveillance areas.
They merge indoor and outdoor scenes, such as Indoor Domotic Unimore D.I.I. setup.
Metadata:
Object Detection and Tracking.
Contextual info:
Comments:
Mostly simple videos.
Copyrights:
Free download
Contact:
Website:
Sequences are available here:
http://i21www.ira.uka.de/image_sequences/
Dataset:
Metadata:
Camera projection data in the file proj.dat which uses the following format:
    tx ty tz       # Translation vector Global <---> Camera Coordinates
    r11 r12 r13    #
    r21 r22 r23    # > 3x3 Rotation Matrix Global <---> Camera
    r31 r32 r33    # /
    fx             # Focal length x-direction (pixels)
    fy             # Focal length y-direction (pixels, usually 4/3 * fx)
    x0             # Image Center X (pixels)
    y0             # Image Center Y (pixels)
    1              # Sharp shadows visible (1=true, 0=false)
    phi            # Azimuth angle for shadow
    theta          # Polar angle for shadow
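The proj.dat layout described above could be read with a small parser along these lines. This is a minimal sketch, not the dataset's own tooling: it assumes the values appear in the listed order with everything after "#" being a comment, and the class name, function name and sample numbers are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Projection:
    translation: List[float]      # tx, ty, tz
    rotation: List[List[float]]   # 3x3 matrix, row by row
    fx: float
    fy: float
    x0: float
    y0: float
    sharp_shadows: bool
    phi: float
    theta: float


def parse_proj(text: str) -> Projection:
    """Parse proj.dat content, ignoring everything after '#' on each line."""
    values = []
    for line in text.splitlines():
        data = line.split("#", 1)[0]
        values.extend(float(tok) for tok in data.split())
    if len(values) != 19:
        raise ValueError(f"expected 19 numbers, found {len(values)}")
    return Projection(
        translation=values[0:3],
        rotation=[values[3:6], values[6:9], values[9:12]],
        fx=values[12],
        fy=values[13],
        x0=values[14],
        y0=values[15],
        sharp_shadows=values[16] == 1.0,
        phi=values[17],
        theta=values[18],
    )


# Made-up sample values, only to exercise the parser.
SAMPLE = """\
0.0 0.0 12.0   # Translation vector
1 0 0          #
0 1 0          # > 3x3 Rotation Matrix
0 0 1          # /
800            # Focal length x-direction (pixels)
1066.7         # Focal length y-direction (pixels)
384            # Image Center X (pixels)
288            # Image Center Y (pixels)
1              # Sharp shadows visible
0.5            # Azimuth angle for shadow
1.2            # Polar angle for shadow
"""
proj = parse_proj(SAMPLE)
print(proj.fx, proj.fy, proj.sharp_shadows)
```

Collecting all numeric tokens before splitting them by position keeps the parser indifferent to how the values are spread across lines.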
Contextual info:
Different contexts: snow, fog, etc.
Comments:
Copyrights:
license (no), cost (free)
Contact:
Sabri Boughorbel (mailto:[email protected])
Website:
No
Dataset:
Traffic jam.
Metadata:
Contextual info:
Camera height 12m, Camera: inch sensor, 4 mm lens.
Comments:
Period of road markings is 12m (9+3).
Copyrights:
License (no), cost (free): When dataset is used refer and give credit to Traficon N.V. as follows: " www.traficon.com".
Contact:
Wouter Favoreel, [email protected]
Website:
Datasets are available here:
http://www.multitel.be/~va/candela/
Dataset:
Two different scenarios have been realized during the CANDELA project: "Indoor abandoned object" and "road intersection".
o Scenario 1: Abandoned object. The detection of abandoned objects is more or less the detection of idle (stationary or non-moving) objects that remain stationary over a certain period of time. The period of time is adjustable. In several types of scenes, idle objects should be detected. In a parking lot e.g., an idle object can be a parked car or a left suitcase. For this scenario we are not looking at the object types "person" or "car", but at unidentified objects, called "unknown objects". An unknown object is any object that is not a person or a vehicle. In general, unknown objects cannot move. What should be detected? Whenever an unknown object appears in the scene and remains stationary for some amount of time, an alarm needs to be generated. This alarm must remain active as long as the unknown object remains stationary.
o Scenario 2: Persons are allowed to cross the street at zebra crossings, a crossing controlled with lights. Alarms should be generated when persons are not allowed to be on the crossing, or when dangerous scenarios occur (cars driving when people crossing). Since the external signal from the traffic light is not available (when the crossing is regulated by traffic lights), detection needs to be done automatically. Detection of persons on the crossing itself is pretty easy, but alarms should only be given when persons are on the crossing, and cars are driving.
Metadata:
Detailed information about data and metadata can be found here:
http://www.hitech-projects.com/euprojects/candela/pr/scenario_description_document_v06.pdf
Contextual info:
Comments:
Copyrights:
Public domain
Contact:
Xavier Desurmont, [email protected]
Website:
Datasets are available here:
http://development.objectvideo.com/
Dataset:
The ObjectVideo Virtual Video provides the ability to generate virtual video sequences. These video sequences can then be used to test VCA algorithms.
Metadata:
The automatically generated ground truth is generated in a proprietary binary format. The format is open, and a conversion program can be created to convert metadata to any format. A simple bounding box scheme is available; for more powerful validation a "blob" video can be created.
Contextual info:
Virtual environment, the user can make his own environment from the internet. Several camera settings can be changed to simulate real-world cameras more closely.
Comments:
This is not a dataset as such, but with these tools very powerful and tailored test videos can be created.
Copyrights:
The ObjectVideo Virtual Video Tool is provided free for non-commercial use, for your own research and development purposes. If you publish or distribute images, videos or derivative results based on this software, you must acknowledge ObjectVideo by including "ObjectVideo Virtual Video Tool".
To use the ObjectVideo Virtual Video tool a licence for the commercial game Half-Life 2 is needed (www.steampowered.com).
Contact:
Rick Koeleman, VDG-Security bv. [email protected]
Website:
http://domino.research.ibm.com/comm/research_projects.nsf/pages/s3.performanceevaluation.html
Dataset:
4 outdoor (from PETS2001) of people and vehicles and 11 indoor clips of people.
Metadata:
Motion detection and motion tracking
Contextual info:
Comments:
Copyrights:
Free download from website
Contact:
Dimitrios Makris, [email protected]
Website:
http://www.spevi.org
Dataset:
This is a dataset for multiple people/faces visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences. Total number of images: 2769, DivX 6 compression, 640 x 480 pixels, 25 Hz.
Sensor details
- video camera: JVC GR-20EK
Metadata:
Contextual info:
Comments:
Copyrights:
Requested citation acknowledgment: E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro. "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007
Contact:
Xavier Desurmont, [email protected]
Website:
http://www.spevi.org
Dataset:
This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand held camera (JVC GR-20EK). In motinas_toni the target moves under a constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under a fairly constant illumination. Total number of images: 3018, DivX 6 compression, 640 x 480 pixels and 25 Hz (motinas_toni, motinas_toni_change_ill, motinas_nikola_dark), 320 x 240 pixels and 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning).
Metadata:
The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files each line of text describes the objects' position and size in a frame. The syntax of a line is the following: frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
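The line syntax above can be turned into a small parser, sketched here under a few assumptions that the description does not confirm: fields are whitespace-separated, object names contain no spaces, and x, y denote the centre of the object (suggested by the half_width and half_height fields). The function names and the sample line are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_gt_line(line: str) -> Tuple[int, List[Dict]]:
    """Parse 'frame number_of_objects obj_name x y half_width half_height angle ...'."""
    tokens = line.split()
    frame, count = int(tokens[0]), int(tokens[1])
    fields = tokens[2:]
    if len(fields) != 6 * count:
        raise ValueError(f"frame {frame}: expected {6 * count} fields, got {len(fields)}")
    objects = []
    for i in range(count):
        name, x, y, hw, hh, angle = fields[6 * i: 6 * i + 6]
        objects.append({
            "name": name,
            "x": float(x),
            "y": float(y),
            "half_width": float(hw),
            "half_height": float(hh),
            "angle": float(angle),
        })
    return frame, objects


def to_box(obj: Dict) -> Tuple[float, float, float, float]:
    """Axis-aligned (left, top, right, bottom) box; ignores the angle, assumes x, y is the centre."""
    return (
        obj["x"] - obj["half_width"],
        obj["y"] - obj["half_height"],
        obj["x"] + obj["half_width"],
        obj["y"] + obj["half_height"],
    )


# Invented example line with two objects.
frame, objects = parse_gt_line("12 2 toni 100 80 20 30 0 nikola 300 90 15 25 10")
print(frame, [o["name"] for o in objects], to_box(objects[0]))
```

Checking the field count against number_of_objects catches truncated lines early instead of silently misaligning the remaining objects.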
Contextual info:
Comments:
Copyrights:
Requested citation acknowledgment E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.
Contact:
Xavier Desurmont, [email protected]
Website:
http://www.spevi.org
Dataset:
This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations. The camera is placed in the centre of a bar that supports two microphones. Total number of images: 3271, Format of images: 8-bit color AVI 360 x 288 pixels 25 fps, audio sampling rate: 44.1 kHz.
Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera
Metadata:
The ground truth data are provided together with the sequences in the corresponding .zip file, as list of XML files representing the positions of the objects in the field of view.
Contextual info:
Comments:
Copyrights:
Requested citation acknowledgment Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)
Contact:
Xavier Desurmont, [email protected]
Website:
Datasets are available here: (registration is needed)
http://www-sop.inria.fr/orion/ETISEO/
Dataset:
86 video clips. These sequences constitute a representative panel of different video surveillance areas.
They merge indoor and outdoor scenes, corridors, streets, building entries, subway station... They also mix different types of sensors and complexity levels.
Metadata:
5 different levels: Object Detection, Object Localization, Object Tracking, Object Classification.
Contextual info:
Zone of interest, calibration matrix
Comments:
Copyrights:
Free download but registration and user agreement is required.
Contact:
Website:
These datasets have been realized during the SELCAT project.
http://www.levelcrossing.net/
Datasets are available here:
http://www.multitel.be/~va/selcat
Dataset:
These datasets are composed of 24 hours of real sequences, showing a level crossing (LC) where some vehicles stop due to its particular configuration: on the right side of the LC there is an avenue, parallel to the LC, so a traffic light is located just after the LC. Consequently, vehicles sometimes stop on the LC because of this traffic light. The total amount of data is about 7 gigabytes.
Metadata:
For each video file, there is a corresponding ground truth file in XML that gives the timestamps of "stopped vehicles" events.
Contextual info:
Environment conditions (calibration, scene...)
Comments:
Copyrights:
Licence, Cost, etc.
Contact:
Caroline Machy, [email protected]
Website:
http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/INTERACTIONS/
Dataset:
The dataset comprises two views of various scenarios of people acting out various interactions. Ten basic scenarios were acted out. These were called InGroup (IG), Approach (A), WalkTogether (WT), Split (S), Ignore (I), Following (FO), Chase (C), Fight (FI), RunTogether (RT), and Meet (M). The data is captured at 25 frames per second. The resolution is 640x480. The videos are available either as AVIs or as a numbered set of JPEG single image files.
Metadata:
Tracking, Event detection.
Contextual info:
3D coordinates of points for calibration purposes provided.
Comments:
The site will be updated when more of the ground truth becomes available.
Copyrights:
Free download from website.
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here:
http://www.pets2007.net/
Dataset:
The datasets are multisensor sequences containing the following 3 scenarios, with increasing scene complexity: 1. loitering, 2. attended luggage removal (theft), 3. unattended luggage.
Metadata:
Event Detection
Contextual info:
Calibration provided
Comments:
Free download from website. The UK Information Commissioner has agreed that the PETS 2007 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright UK EPSRC REASON Project consortium and permission is hereby granted for free download for the purposes of the PETS 2007 workshop.
Copyrights:
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here:
http://www.pets2006.net/
Dataset:
Surveillance of public spaces, detection of left luggage events. Scenarios of increasing complexity, captured using multiple sensors.
Metadata:
All scenarios come with two XML files. The first of these files contains camera calibration parameters, these are given in the sub-directory 'calibration'. See the previous section (Calibration Data) for information on this XML file format. The second XML file (given in the sub-directory 'xml') contains both configuration and ground-truth information.
Contextual info:
Calibration provided.
Comments:
Copyrights:
Free download from website. The UK Information Commissioner has agreed that the PETS 2006 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright ISCAPS consortium and permission is hereby granted for free download for the purposes of the PETS 2006 workshop.
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here: (registration is needed)
http://www.vast.uccs.edu/~tboult/PETS05/
Dataset:
Challenging detection/tracking scenes on water.
Metadata:
Object Detection/Tracking.
Contextual info:
Comments:
Copyrights:
Free download from website, but registration is required.
Contact:
Dimitrios Makris, [email protected]
Website:
http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/
or http://www-prima.inrialpes.fr/PETS04/caviar_data.html
Dataset:
A number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, fighting and passing out and last, but not least, leaving a package in a public place. All video clips were filmed with a wide angle camera lens. The resolution is half-resolution PAL standard (384 x 288 pixels, 25 frames per second) and compressed using MPEG2. The file sizes are mostly between 6 and 12 MB, a few up to 21 MB.
Metadata:
Person/Group Tracking, Person/Group Activity Recognition, Scenario/Situation Recognition
Contextual info:
3D coordinates of points for calibration purposes provided.
Comments:
Copyrights:
Free download from website. If you publish results using the data, please acknowledge the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at URL: http://www.dai.ed.ac.uk/homes/rbf/CAVIAR/
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here:
http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html
Dataset:
Indoor people tracking (and counting). Two training and four testing sequences consist of people moving in front of a shop window. Sequences are provided as both MPEG movie format and as individual JPEG images.
Metadata:
People tracking, counting and activity recognition.
Contextual info:
No calibration provided
Comments:
Copyrights:
Free download from website
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here:
http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html
http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset
Dataset:
Outdoor people and vehicle tracking (two synchronised views; includes omnidirectional and moving camera). PETS'2001 consists of five separate sets of training and test sequences, i.e. each set consists of one training sequence and one test sequence. All the datasets are multi-view (2 cameras) and are significantly more challenging than for PETS'2000 in terms of significant lighting variation, occlusion, scene activity and use of multi-view data.
Metadata:
Tracking information on image plane and ground plane can be found at:
http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/
Contextual info:
Camera Calibration provided
Comments:
Copyrights:
Free download from website
Contact:
Dimitrios Makris, [email protected]
Website:
ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/
Dataset:
Outdoor people and vehicle tracking (single camera).
Two sequences:
a) Training sequence of 3672 frames at 25 Hz (146.88 secs).
b) Test sequence of 1452 frames (58.08 secs).
The sequences are available in 2 formats:
a) QuickTime movie format with Motion JpegA compression (training.mov and test.mov).
b) Individual Jpeg files (training_images/*.jpg and test_9images/*.jpeg).
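The durations quoted for the two sequences follow directly from the frame counts. A quick check, assuming the test sequence is also recorded at 25 Hz (the text states the rate only for the training sequence, but the quoted 58.08 seconds is consistent with it); the function name is illustrative only.

```python
def duration_seconds(frames, fps=25.0):
    """Length of a sequence in seconds, given its frame count and frame rate."""
    return frames / fps


training = duration_seconds(3672)  # training sequence
test = duration_seconds(1452)      # test sequence, 25 Hz assumed
print(f"{training:.2f} s, {test:.2f} s")
```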
Metadata:
No Ground Truth provided.
Contextual info:
Camera Calibration provided.
Comments:
Copyrights:
Free download
Contact:
Dimitrios Makris, [email protected]
Website:
http://www.cvg.rdg.ac.uk/slides/pets.html
Dataset:
Each year PETS runs an evaluation framework on specific datasets with specific objective. 2000: 2001.... (more on duration and theme)
Metadata:
Ground truth depends on the theme of each year's workshop.
Contextual info:
Comments:
Copyrights:
Free download from website
Contact:
Dimitrios Makris, [email protected]
Website:
http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imaging-technology/video-based-detection-systems/i-lids/
Dataset:
4 scenarios (Parked Vehicle, Abandoned Package, Doorway Surveillance and Sterile Zone) x 2 datasets (training, testing) each. Each dataset contains about 24 hours of footage in a few different scenes.
Metadata:
Event-based Ground truth.
Contextual info:
Images of a pedestrian model in different positions are given for calibration purposes
Comments:
7 free clips for 2 scenarios (Parked Vehicle, Abandoned Package) are available from: http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_d.html
Copyrights:
A user agreement and a payment (£500-£650 per dataset) are required to obtain each dataset. Datasets are provided on hard disks.
Contact:
Dimitrios Makris, [email protected]
Website:
Datasets are available here:
http://marathon.csee.usf.edu/Mammography/Database.html
Dataset:
The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The database contains approximately 2620 cases available in 43 volumes (healthy and diseased).
Metadata:
Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions.
Contextual info:
Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution, ...). A case consists of between 6 and 10 files. These are an "ics" file, an overview "16-bit PGM" file, four image files that are compressed with lossless JPEG encoding and zero to four overlay files. Normal cases will not have any overlay files.
Comments:
Copyrights:
If you use data from DDSM in publications:
Please credit the DDSM project as the source of the data, and reference: "The Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001. ISBN 1-930524-00-5". "Current status of the Digital Database for Screening Mammography, Michael Heath, Kevin Bowyer, Daniel Kopans, W. Philip Kegelmeyer, Richard Moore, Kyong Chang, and S. MunishKumaran, in Digital Mammography, 457-460, Kluwer Academic Publishers, 1998; Proceedings of the Fourth International Workshop on Digital Mammography". Also, please send a copy of your publication to Professor Kevin Bowyer / Computer Science and Engineering / University of Notre Dame / Notre Dame, Indiana 46530.
Contact:
Cedric Marchessoux, [email protected]
Website:
Datasets are available here:
http://www9.informatik.uni-erlangen.de/External/vollib/
Dataset:
Name of the set, Anatomy, resolution, number of bits
Metadata:
Contextual info:
Environment conditions (calibration, scene...): scanning parameters
Comments:
Mainly CT, PET, MRI. Additional comments are available. Not all of the datasets are medical content; for example, there is a scan of a bonsai. The raw data can be extracted easily using the PVM tools distributed with the V^3 volume rendering package available at http://www.stereofx.org/
Copyrights:
Commercial use is prohibited and no warranty whatsoever is expressed, credit should be given to the group who created the dataset.
Contact:
Stefan Roettger ([email protected]) or Cedric Marchessoux ([email protected])
Website:
http://pubimage.hcuge.ch:8080
http://pubimage.hcuge.ch/
Dataset:
DICOM sample image sets with alias name, the modality, the file size with a short description.
Metadata:
Contextual info:
Environment conditions (calibration, scene...)
Comments:
Mainly CT and MRI, more than 10 GB of data.
Copyrights:
Click on the thumbnail images to download the full set of corresponding DICOM images
Contact:
Cedric Marchessoux ([email protected])
Website:
Datasets are available here:
http://www.MyPACS.net
Dataset:
MyPACS.net is still free, and it now has over 16,500 teaching files contributed by 14,000 registered users. With 75,000 key images categorized by anatomy and pathology, you can quickly find examples of any disease. The web-based viewer has been improved with more PACS-like features, and it still works instantly in your browser, requiring nothing to download.
The datasets contain:
1. Cranium and Contents (1205)
2. Face and Neck (398)
3. Spine and Peripheral Nervous System (504)
4. Skeletal System (3433)
5. Heart (160)
6. Chest (894)
7. Gastrointestinal (1271)
8. Genitourinary (800)
9. Vascular/Lymphatic (416)
10. Breast (62)
11. Other (458)
Metadata:
Description of the pathology by medical doctors.
Contextual info:
Environment conditions (calibration, scene...): Medical modality described: Brand and acquisition conditions
Comments:
Copyrights:
MyPACS.net is still free, you need to be registered.
Contact:
Cedric Marchessoux ([email protected])
Website:
Datasets are available here:
https://imaging.nci.nih.gov/ncia/
Dataset:
Description of Dataset (Content, size, etc): CT scans with xml files for the ground truth, and also other modalities.
Metadata:
Groundtruth stored in xml
Contextual info:
Environment conditions (calibration, scene...): X-ray scanner system: Brand and acquisition conditions
Comments:
Copyrights:
The user should ask for a login. You may browse, download, and use the data for non-commercial, scientific and educational purposes. However, you may encounter documents or portions of documents contributed by private institutions or organizations. Other parties may retain all rights to publish or produce these documents. Commercial use of the documents on this site may be protected under United States and foreign copyright laws. In addition, some of the data may be the subject of patent applications or issued patents, and you may need to seek a license for its commercial use. NCI does not warrant or assume any legal liability or responsibility for the accuracy, completeness or usefulness of any information in this archive.
Contact:
Cedric Marchessoux ([email protected])
Website:
No official website, via Elizabeth Krupinski ([email protected])
Dataset:
Real masses, micro calcifications, backgrounds, conventional x-ray mammography, bmp images with resolution of 256x256.
Metadata:
None. Signals can be extracted by subtraction between backgrounds alone and background+signals at 100% density.
Contextual info:
Environment conditions (calibration, scene...): X-ray system
Comments:
See examples:
1. Backgrounds,
2. Signals: masses
3. Signals: micro calcifications
Copyrights:
Via Elizabeth Krupinski ([email protected]). Free, but credit should be given to them in any publication.
Contact:
Elizabeth Krupinski ([email protected]) or Cedric Marchessoux ([email protected])
Website:
Datasets are available here:
http://www.jsrt.or.jp/web_data/english03.html
Dataset:
Around 5 datasets of 250 images, x-ray chest healthy and diseased with nodules. 2048x2048, white is zero, big endian.
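The note "2048x2048, white is zero, big endian" suggests headerless raw files. A minimal sketch of decoding one follows, under assumptions not stated above: each pixel is a 16-bit unsigned integer and the usable range is 12 bits (maximum 4095). The function name and parameters are hypothetical.

```python
import struct


def read_raw_image(data: bytes, width=2048, height=2048, max_value=4095):
    """Decode big-endian 16-bit pixels and invert them so larger means brighter.

    The 16-bit pixel size and the 12-bit maximum are assumptions.
    """
    count = width * height
    if len(data) != 2 * count:
        raise ValueError(f"expected {2 * count} bytes, got {len(data)}")
    values = struct.unpack(f">{count}H", data)  # '>' selects big-endian byte order
    # "White is zero": flip the scale so that 0 becomes the brightest value.
    return [
        [max_value - v for v in values[r * width:(r + 1) * width]]
        for r in range(height)
    ]


# Tiny 2x2 example instead of a real file: raw values 0, 4095, 1, 256.
sample = bytes([0x00, 0x00, 0x0F, 0xFF, 0x00, 0x01, 0x01, 0x00])
rows = read_raw_image(sample, width=2, height=2)
print(rows)
```

For a real file, the bytes would come from `open(path, "rb").read()`, with `path` pointing at one of the delivered images.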
Metadata:
Per image, clinical metadata in a txt file with patient information (age, sex), and images in itf with nodule, cancer, infection position.
Contextual info:
Environment conditions (calibration, scene...): X-ray system
Comments:
The dataset should be ordered by email with a Visa card number. The dataset is delivered by post after one week. The price per dataset is more than reasonable.
Copyrights:
For publication credit should be given by citing in references the following article:
o J. Shiraishi et al. Development of a Digital Image Database for Chest Radiographs with and without a Lung Nodule: Receiver Operating Characteristic Analysis of Radiologists, Detection of Pulmonary Nodules. AJR, 174(1):71-74, 2000.
Contact:
Cedric Marchessoux ([email protected])
Website:
Dataset can be found here: http://vision.middlebury.edu/flow/data/
Dataset:
Datasets are here composed of sets of images to evaluate optical flow.
Sets can be made of 2 or 8 images for the evaluation in color or graylevel format.
Metadata:
GT is not provided for all datasets
Contextual info:
Flow accuracy and interpolation evaluation
We report two measures of flow accuracy (angular and end-point error) and two measures of interpolation quality. For each of the 4 measures we report 8 error metrics, resulting in a total of 32 tables. Links to the 4 measures are included below, but the tables are also linked among each other. At this point we do not identify a "default" measure or metric, and thus we do not provide an overall ranking of methods.
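The two flow accuracy measures can be sketched for a single pixel as follows, using the usual definitions: end-point error is the Euclidean distance between estimated and ground-truth flow vectors, and angular error is the angle between the vectors (u, v, 1) and (u_gt, v_gt, 1). These definitions are assumed rather than quoted from the page, and the function names are illustrative.

```python
import math


def endpoint_error(u, v, u_gt, v_gt):
    """Euclidean distance between the estimated and ground-truth flow vectors."""
    return math.hypot(u - u_gt, v - v_gt)


def angular_error(u, v, u_gt, v_gt):
    """Angle in degrees between (u, v, 1) and (u_gt, v_gt, 1)."""
    dot = u * u_gt + v * v_gt + 1.0
    norm = math.sqrt(u * u + v * v + 1.0) * math.sqrt(u_gt * u_gt + v_gt * v_gt + 1.0)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


print(endpoint_error(3.0, 4.0, 0.0, 0.0))
print(angular_error(1.0, 0.0, 0.0, 0.0))

# 4 measures x 8 error metrics gives the 32 tables mentioned above.
table_count = 4 * 8
```

Appending 1 as a third component keeps the angular error defined even where the flow is zero.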
Comments:
The ground-truth flow is provided in a .flo format. Information and C++ code is provided in flow-code.zip, which contains the file README.txt. A Matlab version is also available in flow-code-matlab.zip.
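For readers who prefer not to use the provided C++ or Matlab code, a .flo file can be decoded in a few lines. This sketch assumes the commonly documented layout: a little-endian float tag equal to 202021.25, then 32-bit integer width and height, then interleaved u, v floats in row order. README.txt in flow-code.zip remains the authority on the format, and the function name here is hypothetical.

```python
import struct

FLO_TAG = 202021.25  # sanity-check value expected at the start of the file


def parse_flo(data: bytes):
    """Return flow as rows of (u, v) tuples, decoded from .flo file bytes."""
    tag, width, height = struct.unpack_from("<fii", data, 0)
    if tag != FLO_TAG:
        raise ValueError("missing .flo tag; wrong format or byte order")
    count = 2 * width * height
    values = struct.unpack_from(f"<{count}f", data, 12)
    flow = []
    for row in range(height):
        start = 2 * row * width
        flow.append(
            [(values[start + 2 * c], values[start + 2 * c + 1]) for c in range(width)]
        )
    return flow


# Build a tiny 2x1 flow field in memory instead of reading a real file.
sample = struct.pack("<fii", FLO_TAG, 2, 1) + struct.pack("<4f", 1.0, -2.0, 0.5, 0.25)
flow = parse_flo(sample)
print(flow)
```

With a file on disk, the same function would be called as `parse_flo(open(path, "rb").read())`.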
Copyrights:
Thanks to Brad Hiebert-Treuer and Alan Lim, who spent countless hours creating the hidden texture datasets.
Contact:
Website:
Sequences are available here: http://www.apidis.org/Public/
This page gives access to the first acquisition campaign of basketball data during the APIDIS European project.
Dataset:
The dataset is composed of a basketball game.
Note: Due to bandwidth limitations, only a part of the basketball game is available from this web site. Please contact us (bottom of this page) for more data.
Metadata:
Contextual info:
All cameras are Arecont Vision AV2100M IP cameras. The datasheets can be downloaded from the manufacturer's site.
Lenses: The fish-eye lenses used for the top view cameras are Fujinon FE185C086HA-1 lenses.
Comments:
Copyrights:
This dataset is available for non-commercial research in video signal processing only. We kindly ask you to mention the APIDIS project when using this dataset (in publications, video demonstrations...).
Contact:
christophe.devleeschouwer(at)uclouvain.be or Damien.Delannay(at)uclouvain.be
Website:
Datasets are available here:
http://freesound.iua.upf.edu/
Dataset:
The Freesound Project is a collaborative database of Creative Commons licensed sounds. Freesound focusses only on sound, not songs.
Metadata:
Contextual info:
Comments:
Copyrights:
Creative Commons
Contact:
Website:
Datasets are available here:
http://www.music-ir.org/evaluation/
Dataset:
The objective of the International Music Information Retrieval Systems Evaluation Laboratory project (IMIRSEL) is the establishment of the necessary resources for the scientifically valid development and evaluation of emerging Music Information Retrieval (MIR) and Music Digital Library (MDL) techniques and technologies.
Metadata:
Contextual info:
Comments:
Copyrights:
Available on request
Contact:
Website:
Datasets are available here:
http://www.publicdomaintorrents.com/ (BitTorrent link)
Dataset:
10 movies (from 1930-1950, some more recent), most are in color
Metadata:
The databases can be shared and are available on the internet. No annotation or ground-truth is currently available. It will be added when available.
Contextual info:
Comments:
Copyrights:
All are now in the public domain.
Contact:
Sabri Boughorbel
Website:
none
Dataset:
Metadata:
We can provide metadata such as shot and scene cuts, face and eye position, identity, etc.
Contextual info:
Comments:
Copyrights:
Contact:
Sabri Boughorbel
Website:
Datasets are available here:
http://staff.aist.go.jp/m.goto/RWC-MDB/
Dataset:
The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) that is available to researchers as a common foundation for research.
Metadata:
MIDI files, genre, lyrics
Contextual info:
Comments:
Copyrights:
Users who have submitted the Pledge and received authorization may freely use the database for research purposes without facing the usual copyright restrictions, but all of the copyrights and neighboring rights connected with this database belong to the National Institute of Advanced Industrial Science and Technology and are managed by the RWC Music Database Administrator. Persons or organizations that have not submitted a Pledge and that have not received authorization may not use the database.
Contact:
Website:
Datasets are available here:
http://vision.fe.uni-lj.si/cvbase06/downloads.html
Dataset:
Video data (.avi, DivX compressed). Dataset includes three types of sports: European (team) handball (3 synchronized videos, 10 min, 25 FPS, 384x288, DivX 5 AVI), Squash (2 videos from 2 separate matches, 25 FPS, 384x288, DivX AVI), Basketball (videos only, 2 synchronized overhead videos in 2 quality modes: 368x288, 25 FPS, 5 minutes each and 720x576, 25 FPS, 2 minutes each).
Metadata:
Annotations (individual player actions, group activity). Suitable for use as a gold standard. Trajectories (player positions in court and camera coordinate systems). These are not intended to be used as a gold standard, since their accuracy is not particularly high.
Contextual info:
Comments:
Copyrights:
Nothing defined on the website.
Contact:
Xavier Desurmont, [email protected]
Website:
Datasets are available here:
ftp://ftp.cs.rdg.ac.uk/pub/VS-PETS/
Dataset:
Outdoor people tracking - football data (three synchronised views). The dataset consists of football players moving around a pitch.
Metadata:
Tracking information on image plane for camera 3 can be downloaded. An AVI file of the ground truth for camera view 3 is also available.
Contextual info:
Comments:
Copyrights:
Free download from website
Contact:
Dimitrios Makris, [email protected]
Website:
http://www.multitel.be/trictrac/?mod=3
Dataset:
HD progressive images in JPEG for a synthetic soccer video sequence.
Metadata:
XML (2D and 3D positions of objects and camera)
Contextual info:
no
Comments:
The dataset is fully described in "TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth", X. Desurmont, J-B. Hayet, J-F. Delaigle, J. Piater, B. Macq, Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE), 2006.
Copyrights:
All data is publicly available and downloadable. If you publish results using the data, please acknowledge the data as coming from the TRICTRAC project, found at URL: http://www.multitel.be/trictrac. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND.
Contact:
Xavier Desurmont, [email protected]
Website:
The datasets are available here:
http://www.cvg.rdg.ac.uk/PETS2009/
Dataset:
PETS 2009: Eleventh IEEE International Workshop on Performance Evaluation of Tracking and Surveillance
One-day workshop organised in association with CVPR 2009, supported by the EU project SUBITO.
The datasets for PETS 2009 consider crowd image analysis and include crowd count and density estimation, tracking of individual(s) within a crowd, and detection of separate flows and specific crowd events.
The dataset is organised as follows:
Metadata:
Contextual info:
Comments:
Copyrights:
Please e-mail [email protected] if you require assistance obtaining these datasets for the workshop.
Contact:
Website:
Datasets are available here:
http://media.ee.ntu.edu.tw/Archer_contest/
Dataset:
Three different contexts of walking persons.
Metadata:
Segmentation of person is provided.
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://gavab.escet.urjc.es/recursos_en.html
Dataset:
GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female), with 9 images for each person. All of the individuals are Caucasian and aged between 18 and 40. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to pose and facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (±30º, looking up and looking down respectively) with neutral expression, 2 y-rotated views (±90º, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).
Metadata:
Contextual info:
Comments:
Copyrights:
Publications that use this data must reference the following work: A.B. Moreno and A. Sanchez. GavabDB: A 3D Face Database. In C. Garcia et al. (eds.), Proc. 2nd COST Workshop on Biometrics on the Internet: Fundamentals, Advances and Applications, Ed. Univ. Vigo, pp. 77-82, 2004.
Contact:
Website:
Datasets are available here:
http://www.sic.rma.ac.be/~beumier/DB/3d_rma.html
Dataset:
120 persons were asked to pose twice in front of the system: in Nov 97 (session1) and in January 98 (session2). For each session, 3 shots were recorded with different (but limited) orientations of the head: straight forward / left or right / upward or downward.
Among the 120 people, two thirds consist of students from the same ethnic origins and with nearly the same age. The last third consists of people of the academy, all aged between 20 and 60.
Different problems encountered in the cooperative scenario were taken into account. People sometimes wore their spectacles, sometimes didn't. Beards and moustaches were represented. Some people smiled in some shots. Small up/down and left/right rotations of the head were requested. We regret that only a few (14) women were available.
Metadata:
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html
Dataset:
Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low quality video.
Metadata:
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://www.nada.kth.se/cvap/actions/
Dataset:
The current video database contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3) and indoors (s4). Currently the database contains 2391 sequences. All sequences were taken over homogeneous backgrounds with a static camera at a 25 fps frame rate. The sequences were downsampled to a spatial resolution of 160x120 pixels and have a length of four seconds on average.
Metadata:
Contextual info:
Comments:
Copyrights:
Contact:
laptev(at)nada.kth.se
Website:
Datasets are available here:
http://architecture.mit.edu/house_n/data/PlaceLab/PLIA2.htm
Dataset:
The researcher was asked to perform a set of common household activities during the four-hour period using a set of instructions. Activities included the following: preparing a recipe, doing a load of dishes, cleaning the kitchen, doing laundry, making the bed, and light cleaning around the apartment. The volunteer determined the sequence, pace, and concurrency of these activities and also integrated additional household tasks. Our intent was to have a short test dataset of a manageable size that could be easily placed on the web without concerns about anonymity. We wanted this test dataset, however, to show a variety of activity types and activate as many sensors as possible, but in a natural way. In addition to the activities above, the researcher searches for items, uses appliances, talks on the phone, answers email, and performs other everyday tasks. The researcher wore five mobile accelerometers (one on each limb and one on the hip) and a Polar M32 wireless heart rate monitor. The researcher carried an SMT 5600 mobile phone that ran experience sampling software that beeped and presented a set of questions about her activities.
Metadata:
The dataset includes four hours of partially (and soon to be fully) annotated video. The annotation was done using custom annotation software written by Randy Rockinson and Leevar Williams of MIT House_n. This software (called HandLense) is available for researchers to use to study this dataset. [Overview of HandLense and executable]
The annotations include descriptors for body posture, type of activity, location, and social context.
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://dipersec.king.ac.uk/MuHAVi-MAS/
Dataset:
We have collected a large body of human action video (MuHAVi) data using 8 cameras. There are 17 action classes performed by 14 actors. So far we have processed videos corresponding to 7 actors in order to split the actions and provide the JPG image frames. However, we have included some image frames before and after the actual action, for the purpose of background subtraction, tracking, etc. The longest pre-action frames correspond to the actor called Person1. Each actor performs each action several times in the action zone, which is marked with white tape on the scene floor. As the actors were amateurs, the leader had to interrupt them in some cases and ask them to redo the action for consistency. We have used 8 CCTV Schwan cameras located at 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. We are working on improving the synchronisation between the images corresponding to different cameras.
Metadata:
Calibration information may be included here in the future. Meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://dipersec.king.ac.uk/VIHASI/
Dataset:
This dataset provides a large body of synthetic video data generated for the purpose of evaluating different silhouette-based algorithms for human action recognition. The data consist of 20 action classes, 9 actors and up to 40 synchronised perspective camera views. It is well known that for action recognition algorithms based purely on human body masks, where other image properties such as colour and intensity are not used, it is important to obtain accurate silhouette data from video frames. This problem is not usually considered part of action recognition, but a lower level problem in motion tracking and change detection. Hence for researchers working on the recognition side, access to reliable Virtual Human Action Silhouette (ViHASi) data seems to be both a necessity and a relief. Such data allow comprehensive experimentation and evaluation of the methods under study, which might even lead to their improvement.
Metadata:
Contextual info:
Comments:
Copyrights:
Contact:
Website:
Datasets are available here:
http://www.gavrila.net/Computer_Vision/Research/Pedestrian_Detection/DC_Pedestrian_Class__Benchmark/dc_pedestrian_class__benchmark.html
Dataset:
The dataset contains a collection of pedestrian and non-pedestrian images. It is made available for download on this site for benchmarking purposes, in order to advance research on pedestrian classification.
The dataset consists of two parts:
Metadata:
Contextual info:
Comments:
Copyrights:
This dataset is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given.
Contact:
gavrila(at)science.uva.nl
Website:
Datasets are available here:
http://www.metaverselab.org/datasets/terrascope/
Dataset:
The dataset consists of nine different cameras, deployed over several different rooms and a hallway in a "laboratory/office" setting. Several different scenarios were collected from the cameras. A two minute sequence was captured of researchers/staff/visitors going about their daily activities. In addition, three different scenarios were scripted so that particular behaviors were exhibited in the data.
During data collection, all cameras wrote raw (uncompressed) data at a resolution of 640x480. All machine clocks were synchronized via NTP. In addition to each frame, a timestamp was recorded so that frames can be associated with one another across cameras.
Selected Ground Truth (102 MB) - frames with hand-marked labels of individuals and objects
Scenario 1 (11.8 GB) - “Group Meeting”
Scenario 2 (11.2 GB) - “Group Exit and Intruder”
Scenario 3 (17.4 GB) - “Suspicious Behavior/Theft”
Unscripted Activities (59.6 GB) - natural behavior and activities
Subject Face/Gait Database (101 MB) - face pictures and video of subjects walking in front of the camera
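The per-frame timestamps mentioned above are what make cross-camera association possible. The following is only a minimal sketch of one way to do it (nearest-timestamp matching), not the procedure used by the dataset authors. The function names, the `max_gap` tolerance, and the sample timestamps are all hypothetical, and the actual timestamp file format of the dataset is not described here.

```python
from bisect import bisect_left

def nearest_frame(timestamps, t):
    """Return the index of the timestamp closest to t in a sorted, non-empty list."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if (after - t) < (t - before) else i - 1

def associate(reference, other, max_gap):
    """Pair each reference frame with the closest frame of another camera.

    Returns (ref_index, other_index) pairs. Frames whose nearest neighbour
    is more than max_gap seconds away are left unpaired.
    """
    if not other:
        return []
    pairs = []
    for ref_idx, t in enumerate(reference):
        j = nearest_frame(other, t)
        if abs(other[j] - t) <= max_gap:
            pairs.append((ref_idx, j))
    return pairs

# Hypothetical per-frame timestamps (seconds) for two cameras.
cam_a = [0.00, 0.10, 0.20, 0.30]
cam_b = [0.02, 0.11, 0.29, 0.55]
pairs = associate(cam_a, cam_b, max_gap=0.05)
print(pairs)
```

With these sample values the third frame of `cam_a` stays unpaired, because no frame of `cam_b` falls within the tolerance.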
Metadata:
Extensive ground truth is also provided. Entrance and exit times for individuals in each camera, foreground segmentation, and activity labeling are all part of the dataset.
Contextual info:
Comments:
Copyrights:
Public datasets
Contact:
Website:
Datasets are available here:
http://www.cse.ohio-state.edu/otcbvs-bench/
Dataset:
This is a publicly available benchmark dataset for testing and evaluating novel and state-of-the-art computer vision algorithms. Several researchers and students have requested a benchmark of non-visible (e.g., infrared) images and videos. The benchmark contains videos and images recorded in and beyond the visible spectrum and is available for free to all researchers in the international computer vision communities. Also it will allow a large spectrum of IEEE and SPIE vision conference and workshop participants to explore the benefits of the non-visible spectrum in real-world applications, contribute to the OTCBVS workshop series, and boost this research field significantly.
There are 7 datasets:
1) Dataset 01: OSU Thermal Pedestrian Database
2) Dataset 02: IRIS Thermal/Visible Face Database
3) Dataset 03: OSU Color-Thermal Database
4) Dataset 04: Terravic Facial IR Database
5) Dataset 05: Terravic Motion IR Database
6) Dataset 06: Terravic Weapon IR Database
7) Dataset 07: CBSR NIR Face Dataset
Metadata:
Contextual info:
Comments:
Copyrights:
Register (name, institution, email) to download the datasets.
Contact:
Website:
Datasets are available here:
http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html
http://www.multitel.be/~va/cantata/EyesAndFaces/index.html
Dataset:
Eye ground truth, in Viper format, for the YaleB face database, which contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions), plus 650 Viper files. The ground truth was developed in the context of the CANTATA project, by BARCO.
Metadata:
All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.xml” annotation file with the same name, containing the iris positions. The position corresponds to crosses. The path of the bmp image should be changed in the viper file.
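Because each image and its annotation share a file name, the pairs can be collected with a short script. This is a minimal sketch assuming all files sit in one flat directory. The function name `pair_images_with_annotations` is hypothetical, and the demo uses empty placeholder files rather than real data.

```python
import tempfile
from pathlib import Path

def pair_images_with_annotations(root, suffix=".xml"):
    """Map each .bmp under root to the annotation file sharing its name.

    Returns (pairs, missing): pairs is a list of (image, annotation) paths,
    missing lists images that have no annotation file next to them.
    """
    pairs, missing = [], []
    for image in sorted(Path(root).glob("*.bmp")):
        annotation = image.with_name(image.stem + suffix)
        if annotation.exists():
            pairs.append((image, annotation))
        else:
            missing.append(image)
    return pairs, missing

# Demo on placeholder files: a.bmp has an annotation, b.bmp does not.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.bmp", "a.xml", "b.bmp"):
        (Path(tmp) / name).touch()
    pairs, missing = pair_images_with_annotations(tmp)
    paired_names = [(img.name, ann.name) for img, ann in pairs]
    missing_names = [img.name for img in missing]

print(paired_names, missing_names)
```

Passing `suffix=".grid.xml"` would match the naming used by the text-image datasets listed further down.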
Contextual info:
For every subject in a particular pose, an image with ambient (background) illumination was also captured. Hence, the total number of images is in fact 5760+90=5850. The total size of the compressed database is about 1GB.
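The image counts quoted above follow directly from the acquisition setup. As a quick check, using only the numbers given in this entry:

```python
subjects, poses, illuminations = 10, 9, 64

# One image per subject, pose and single light source direction.
single_source = subjects * poses * illuminations
# One extra ambient-illumination image per subject and pose.
ambient = subjects * poses
total = single_source + ambient

print(single_source, ambient, total)  # 5760 90 5850
```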
Comments:
The dataset already exists without the ground truth in Viper format. The ground truth was either generated or converted in Viper format in the context of Cantata project. The metadata were generated by Arnaud Joubel.
Copyrights:
Dataset YaleB: You are free to use the Yale Face Database B for research purposes. If experimental results are obtained that use images from within the database, all publications of these results should acknowledge the use of the "Yale Face Database B" and reference "Georghiades, A.S. and Belhumeur, P.N. and Kriegman, D.J. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", IEEE Trans. Pattern Anal. Mach. Intelligence, 2001, 23, number, 643-660.
Ground truth in Viper: Requested citation acknowledgment about the ground truth:
Courtesy of ITEA2 funded Cantata project
Contact:
Quentin Besnehard, [email protected] or Cedric Marchessoux, [email protected]
Website:
Datasets are available here:
http://www.multitel.be/~va/cantata/AntiAliased/index.html
Dataset:
Set of bitmap images containing anti-aliased text, developed by BARCO in the context of the CANTATA project. The archive contains 2400 images.
Metadata:
All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.grid.xml” annotation file with the same name. The annotation takes the form of a grid of 32x32 pixels bounding boxes. The path of the bmp image should be changed in the viper file if you want to open it in viper-gt.
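To make the grid annotation concrete, the sketch below enumerates the 32x32 cells that would tile an image of a given size. It only illustrates the geometry: the `(x, y, w, h)` tuple layout, the clipping of edge cells, and the function name `grid_boxes` are assumptions, and how the Viper files actually encode each cell is not described here.

```python
def grid_boxes(width, height, cell=32):
    """Return (x, y, w, h) boxes tiling a width x height image with square cells.

    Cells on the right and bottom edges are clipped to the image bounds
    (an assumption; the real annotation files may handle edges differently).
    """
    boxes = []
    for y in range(0, height, cell):
        for x in range(0, width, cell):
            boxes.append((x, y, min(cell, width - x), min(cell, height - y)))
    return boxes

boxes = grid_boxes(100, 64)
print(len(boxes))   # 8 cells: 4 columns x 2 rows
print(boxes[3])     # (96, 0, 4, 32), the clipped cell at the right edge
```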
Contextual info:
The text is represented in different colors: black on white, white on black, random dark color on white, white on random dark color, black on random light color, random light color on white, random dark color on random light color and, finally, random light color on random dark color.
Comments:
The dataset and the ground truth were generated by Quentin Besnehard and Arnaud Joubel. To obtain the complete dataset, send an e-mail to the contact person.
Copyrights:
The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry.
Requested citation acknowledgment about the dataset and the ground truth : Courtesy of ITEA2 funded Cantata project.
Contact:
Quentin Besnehard, [email protected] or Cedric Marchessoux, [email protected]
Website:
Datasets are available here:
http://www.multitel.be/~va/cantata/Aliased
Dataset:
Set of bitmap images containing aliased text (2 colors), developed by BARCO in the context of the CANTATA project. The archive contains 1250 images.
Metadata:
All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.grid.xml” annotation file with the same name. The annotation takes the form of a grid of 32x32 pixels bounding boxes. The path of the bmp image should be changed in the viper file if you want to open it in viper-gt.
Contextual info:
The text is represented in different colors: black on white, white on black, random dark color on white, white on random dark color, black on random light color, random light color on white, random dark color on random light color and, finally, random light color on random dark color. Fonts used (from 7 to 42 points):
Helvetica
Optima
AvantGarde
Times
Palatino
Courier
Century
Comments:
The dataset and the ground truth were generated by Quentin Besnehard and Cédric Marchessoux.
Copyrights:
The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry. Requested citation acknowledgment about the data set and the ground truth: Courtesy of ITEA2 funded Cantata project
Contact:
Quentin Besnehard, [email protected]; Cédric Marchessoux, [email protected]
Website:
Datasets are available here:
http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html
Dataset:
Smart meeting, that includes facial expressions, gaze and gesture/action. The environment consists of three cameras: one mounted on each of two opposing walls, and an omnidirectional camera positioned at the centre of the room. The dataset consists of four scenarios.
Metadata:
a) Eye positions of people in Scenarios A, B and D. (every 10th frame is annotated).
b) Facial expression and gaze estimation for Scenarios A and D, Cameras 1-2.
c) Gesture/action annotations for Scenarios B and D, Cameras 1-2.
Contextual info:
Camera Calibration provided.
Comments:
Copyrights:
Free download
Contact:
Dimitrios Makris, [email protected]
Datasets are available here:
http://gdcm.sourceforge.net/wiki/index.php/Sample_DataSet#DataSet
This website contains multiple links to medical datasets.
The TRECVID conference series is sponsored by the National Institute of Standards and Technology (NIST) with additional support from other U.S. government agencies. The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. In 2001 and 2002 the TREC series sponsored a video "track" devoted to research in automatic segmentation, indexing, and content-based retrieval of digital video. Beginning in 2003, this track became an independent evaluation (TRECVID) with a 2-day workshop taking place just before TREC.
Datasets are described here.
Datasets are available here:
http://www.cs.bu.edu/groups/ivc/data.php
It contains various datasets like:
www.hl2mods.co.uk
More mods for the game engine.
A mod created by students in Toronto. It is a complete game, but maps can be used with the OVVV.
www.torontoconflict.com
The USC-SIPI image database is a collection of digitized images. It is maintained primarily to support research in image processing, image analysis, and machine vision. The first edition of the USC-SIPI image database was distributed in 1977 and many new images have been added since then.
The database is divided into volumes based on the basic character of the pictures. Images in each volume are of various sizes such as 256x256 pixels, 512x512 pixels, or 1024x1024 pixels. All images are 8 bits/pixel for black and white images, 24 bits/pixel for color images. The following volumes are currently available:
Textures | Brodatz textures, texture mosaics, etc.
Aerials | High altitude aerial images
Miscellaneous | Lena, the mandrill, and other favorites
Sequences | Moving head, fly-overs, moving vehicle
http://sipi.usc.edu/database
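The image sizes and bit depths listed for the USC-SIPI database give a rough idea of per-image storage. The sketch below computes uncompressed pixel-data sizes only. It ignores any file header or compression, so actual file sizes may differ, and the helper name `raw_size_bytes` is hypothetical.

```python
def raw_size_bytes(side, bits_per_pixel):
    """Uncompressed pixel-data size in bytes of a square image."""
    return side * side * bits_per_pixel // 8

for side in (256, 512, 1024):
    gray = raw_size_bytes(side, 8)     # black and white: 8 bits/pixel
    color = raw_size_bytes(side, 24)   # color: 24 bits/pixel
    print(f"{side}x{side}: {gray} bytes grayscale, {color} bytes color")
```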
http://www-2.cs.cmu.edu/~cil/v-images.html
http://homepages.inf.ed.ac.uk/rbf/CVonline/
Chinese domestic data: link: http://pan.baidu.com/s/1i5nyjBn password: 26bm
Fun datasets: link: http://pan.baidu.com/s/1bSDIEi password: 25zr
Microsoft data: link: http://pan.baidu.com/s/1bpmo6uV password: 286q
Weibo dataset: link: http://pan.baidu.com/s/1jHCOwCI password: x58f
Remote sensing image library: link: http://pan.baidu.com/s/1dF63kDr password: 7tnh
Stock data, 1990-2016: link: http://pan.baidu.com/s/1i44IQ3N password: o9hj
Phone numbers, e-mail addresses and founding dates of major companies: link: http://pan.baidu.com/s/1i5PXPCp password: m4mo
Economic census, 1998-2009: link: http://pan.baidu.com/s/1o8wbzsu password: a093
Asset data by country and industry: link: http://pan.baidu.com/s/1jI19qmi password: on7y
Statistical yearbooks, 1953-2013: link: http://pan.baidu.com/s/1mh5sHuC password: 7ije
2015 national population census: link: http://pan.baidu.com/s/1i5mIj6t password: yad1
Facebook big data: link: http://pan.baidu.com/s/1jHRb3Wq password: aezb
taiwind data: link: http://pan.baidu.com/s/1kV8YKXh password: 984g
Global social media: link: http://pan.baidu.com/s/1qXXAQvU password: c8qc
JD.com 2015 self-operated store data: link: http://pan.baidu.com/s/1i56uYFz password: oj4v
Wikipedia data: link: http://pan.baidu.com/s/1c2gMLUw password: 4f3b
Kaggle competition data: link: http://pan.baidu.com/s/1pLDAx6N password: i10y
Biological data: link: http://pan.baidu.com/s/1pLLHQwr password: zfjs
NASA data: link: http://pan.baidu.com/s/1i50pw49 password: aawf
Genomic data: link: http://pan.baidu.com/s/1pLTPwtP password: vgs8
News data: link: http://pan.baidu.com/s/1hsHSyzE password: pey9
ImageNet data: link: http://pan.baidu.com/s/1c243tks password: mk1k
Baidu data: link: http://pan.baidu.com/s/1hsr4ayg password: k76p
Image data: link: http://pan.baidu.com/s/1jHW1kAa password: qztt
Google data: link: http://pan.baidu.com/s/1bpsugGn password: 8bt4
Classification practice data: link: http://pan.baidu.com/s/1pLuD3wJ password: 4pxf
Major football leagues and World Cup data: link: http://pan.baidu.com/s/1jIO9TR4 password: 1v1q
Autonomous driving data: link: http://pan.baidu.com/s/1miFcv5e password: y7uj