A Comprehensive List of Computer Vision Datasets - Part 1

Reposted from http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm

Index by Topic

  1. Action Databases
  2. Agriculture
  3. Attribute recognition
  4. Autonomous Driving
  5. Biological/Medical
  6. Camera calibration
  7. Face and Eye/Iris Databases
  8. Fingerprints
  9. General Images
  10. General RGBD and depth datasets
  11. General Videos
  12. Hand, Hand Grasp, Hand Action and Gesture Databases
  13. Image, Video and Shape Database Retrieval
  14. Object Databases
  15. People (static and dynamic), human body pose
  16. People Detection and Tracking Databases (See also Surveillance)
  17. Remote Sensing
  18. Robotics
  19. Scenes or Places, Scene Segmentation or Classification
  20. Segmentation
  21. Simultaneous Localization and Mapping
  22. Surveillance and Tracking (See also People)
  23. Textures
  24. Urban Datasets
  25. Vision and Natural Language
  26. Other Collection Pages
  27. Miscellaneous Topics

Other helpful sites are:

  1. Academic Torrents - computer vision - a set of 30+ large datasets available in BitTorrent form
  2. Machine learning datasets - see CV tab
  3. YACVID - a tagged index to some computer vision datasets

Action Databases

See also: the action recognition dataset summary with league tables (Gall, Kuehne, Bhattarai).

  1. 20bn-Something-Something - densely-labeled video clips that show humans performing predefined basic actions with everyday objects (Twenty Billion Neurons GmbH) [Before 28/12/19]
  2. 3D online action dataset - There are seven action categories (Microsoft and Nanyang Technological University) [Before 28/12/19]
  3. 50 Salads - fully annotated 4.5 hour dataset of RGB-D video + accelerometer data, capturing 25 people preparing two mixed salads each (Dundee University, Sebastian Stein) [Before 28/12/19]
  4. A first-person vision dataset of office activities (FPVO) - FPVO contains first-person video segments of office activities collected using 12 participants. (G. Abebe, A. Catala, A. Cavallaro) [Before 28/12/19]
  5. ActivityNet - A Large-Scale Video Benchmark for Human Activity Understanding (200 classes, 100 videos per class, 648 video hours) (Heilbron, Escorcia, Ghanem and Niebles) [Before 28/12/19]
  6. Action Detection in Videos - MERL Shopping Dataset consists of 106 videos, each of which is a sequence about 2 minutes long (Michael Jones, Tim Marks) [Before 28/12/19]
  7. Actor and Action Dataset - 3782 videos, seven classes of actors performing eight different actions (Xu, Hsieh, Xiong, Corso) [Before 28/12/19]
  8. An analyzed collation of various labeled video datasets for action recognition (Kevin Murphy) [Before 28/12/19]
  9. AQA-7 - Dataset for assessing the quality of 7 different actions. It contains 1106 action samples and AQA scores. (Parmar, Morris) [29/12/19]
  10. ASLAN Action similarity labeling challenge database (Orit Kliper-Gross) [Before 28/12/19]
  11. Attribute Learning for Understanding Unstructured Social Activity - Database of videos containing 10 categories of unstructured social events to recognise, also annotated with 69 attributes. (Y. Fu Fudan/QMUL, T. Hospedales Edinburgh/QMUL) [Before 28/12/19]
  12. Audio-Visual Event (AVE) dataset - the AVE dataset contains 4143 YouTube videos covering 28 event categories; videos are temporally labeled with audio-visual event boundaries. (Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, and Chenliang Xu) [Before 28/12/19]
  13. AVA: A Video Dataset of Atomic Visual Action - 80 atomic visual actions in 430 15-minute movie clips. (Google Machine Perception Research Group) [Before 28/12/19]
  14. BBDB - Baseball Database (BBDB) is a large-scale baseball video dataset that contains 4200 hours of full baseball game videos with 400,000 temporally annotated activity segments. (Shim, Minho, Young Hwi, Kyungmin, Kim, Seon Joo) [Before 28/12/19]
  15. BEHAVE Interacting Person Video Data with markup (Scott Blunsden, Bob Fisher, Aroosha Laghaee) [Before 28/12/19]
  16. BU-action Datasets - Three image action datasets (BU101, BU101-unfiltered, BU203-unfiltered) that have 1:1 correspondence with classes of the video datasets UCF101 and ActivityNet. (S. Ma, S. A. Bargal, J. Zhang, L. Sigal, S. Sclaroff.) [Before 28/12/19]
  17. Berkeley MHAD: A Comprehensive Multimodal Human Action Database (Ferda Ofli) [Before 28/12/19]
  18. Berkeley Multimodal Human Action Database - five different modalities to expand the fields of application (University of California at Berkeley and Johns Hopkins University) [Before 28/12/19]
  19. Breakfast dataset - 1712 video clips showing 10 kitchen activities, hand-segmented into 48 atomic action classes. (H. Kuehne, A. B. Arslan and T. Serre) [Before 28/12/19]
  20. Bristol Egocentric Object Interactions Dataset - Contains videos shot from a first-person (egocentric) point of view of 3-5 users performing tasks in six different locations (Dima Damen, Teesid Leelaswassuk and Walterio Mayol-Cuevas, Bristol University) [Before 28/12/19]
  21. Brown Breakfast Actions Dataset - 70 hours, 4 million frames of 10 different breakfast preparation activities (Kuehne, Arslan and Serre) [Before 28/12/19]
  22. CAD-120 dataset - focuses on high level activities and object interactions (Cornell University) [Before 28/12/19]
  23. CAD-60 dataset - The CAD-60 and CAD-120 data sets comprise RGB-D video sequences of humans performing activities (Cornell University) [Before 28/12/19]
  24. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning - A synthetic video understanding benchmark, with tasks that by-design require temporal reasoning to be solved (Girdhar, Ramanan) [29/12/19]
  25. CVBASE06: annotated sports videos (Janez Pers) [Before 28/12/19]
  26. Charades Dataset - 10,000 videos from 267 volunteers, each annotated with multiple activities, captions, objects, and temporal localizations. (Sigurdsson, Varol, Wang, Laptev, Farhadi, Gupta) [Before 28/12/19]
  27. Composable activities dataset - Different combinations of 26 atomic actions formed 16 activity classes which were performed by 14 subjects and annotations were provided (Pontificia Universidad Catolica de Chile and Universidad del Norte) [Before 28/12/19]
  28. Continuous Multimodal Multi-view Dataset of Human Fall - The dataset consists of both normal daily activities and simulated falls for evaluating human fall detection. (Thanh-Hai Tran) [Before 28/12/19]
  29. Cornell Activity Datasets CAD 60, CAD 120 (Cornell Robot Learning Lab) [Before 28/12/19]
  30. DMLSmartActions dataset - Sixteen subjects performed 12 different actions in a natural manner. (University of British Columbia) [Before 28/12/19]
  31. DemCare dataset - DemCare dataset consists of a diverse set of data collected from different sensors and is useful for human activity recognition from wearable/depth and static IP cameras, speech recognition for Alzheimer's disease detection, and physiological data for gait analysis and abnormality detection. (K. Avgerinakis, A. Karakostas, S. Vrochidis, I. Kompatsiaris) [Before 28/12/19]
  32. Depth-included Human Action video dataset - It contains 23 different actions (CITI in Academia Sinica) [Before 28/12/19]
  33. DogCentric Activity Dataset - first-person videos taken from a camera mounted on top of a *dog* (Michael Ryoo) [Before 28/12/19]
  34. Edinburgh ceilidh overhead video data - 16 ground-truthed dances viewed from overhead, where the 10 dancers follow a structured dance pattern (2 different dances). The dataset is useful for highly structured behavior understanding (Aizeboje, Fisher) [Before 28/12/19]
  35. EPIC-KITCHENS - egocentric video recorded by 32 participants in their native kitchen environments, non-scripted daily activities, 11.5M frames, 39.6K frame-level action segments and 454.2K object bounding boxes (Damen, Doughty, Fidler, et al) [Before 28/12/19]
  36. EPFL crepe cooking videos - 6 types of structured cooking activity (12) videos in 1920x1080 resolution (Lee, Ognibene, Chang, Kim and Demiris) [Before 28/12/19]
  37. ETS Hockey Game Event Data Set - This data set contains footage of two hockey games captured using fixed cameras. (M.-A. Carbonneau, A. J. Raymond, E. Granger, and G. Gagnon) [Before 28/12/19]
  38. The Falling Detection dataset - Six subjects in two settings performed a series of actions continuously (University of Texas) [Before 28/12/19]
  39. FCVID: Fudan-Columbia Video Dataset - 91,223 Web videos annotated manually according to 239 categories (Jiang, Wu, Wang, Xue, Chang) [Before 28/12/19]
  40. G3D - synchronised video, depth and skeleton data for 20 gaming actions captured with Microsoft Kinect (Victoria Bloom) [Before 28/12/19]
  41. G3Di - This dataset contains 12 subjects split into 6 pairs (Kingston University) [Before 28/12/19]
  42. Gaming 3D dataset - real-time action recognition in gaming scenario (Kingston University) [Before 28/12/19]
  43. Georgia Tech Egocentric Activities - Gaze(+) - videos showing what people look at, together with their gaze locations (Fathi, Li, Rehg) [Before 28/12/19]
  44. HMDB: A Large Human Motion Database (Serre Lab) [Before 28/12/19]
  45. Hollywood 3D dataset - 650 3D video clips, across 14 action classes (Hadfield and Bowden) [Before 28/12/19]
  46. Human Actions and Scenes Dataset (Marcin Marszalek, Ivan Laptev, Cordelia Schmid) [Before 28/12/19]
  47. Human Searches - Search sequences of human annotators who were tasked to spot actions in the AVA and THUMOS14 datasets. (Alwassel, H., Caba Heilbron, F., Ghanem, B.) [Before 28/12/19]
  48. Hollywood Extended - 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies. (Bojanowski, Lajugie, Bach, Laptev, Ponce, Schmid, and Sivic) [Before 28/12/19]
  49. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion (Brown University) [Before 28/12/19]
  50. I-LIDS video event image dataset (Imagery library for intelligent detection systems) (Paul Hosner) [Before 28/12/19]
  51. I3DPost Multi-View Human Action Datasets (Hansung Kim) [Before 28/12/19]
  52. IAS-lab Action dataset - contains a sufficient variety of actions and of people performing the actions (IAS Lab at the University of Padua) [Before 28/12/19]
  53. ICS-FORTH MHAD101 Action Co-segmentation - 101 pairs of long-term action sequences that share one or multiple common actions to be co-segmented, contains both 3d skeletal and video related frame-based features (University of Crete and FORTH-ICS, K. Papoutsakis) [Before 28/12/19]
  54. IIIT Extreme Sports - 160 first person (egocentric) sport videos from YouTube with frame-level annotations of 18 action classes. (Suriya Singh, Chetan Arora, and C. V. Jawahar) [Before 28/12/19]
  55. INRIA Xmas Motion Acquisition Sequences (IXMAS) (INRIA) [Before 28/12/19]
  56. InfAR Dataset - Infrared Action Recognition at Different Times (Neurocomputing) (Chenqiang Gao, Yinhe Du, Jiang Liu, Jing Lv, Luyu Yang, Deyu Meng, Alexander G. Hauptmann) [Before 28/12/19]
  57. JHMDB: Joints for the HMDB dataset (J-HMDB) based on 928 clips from HMDB51 comprising 21 action categories (Jhuang, Gall, Zuffi, Schmid and Black) [Before 28/12/19]
  58. JPL First-Person Interaction dataset - 7 types of human activity videos taken from a first-person viewpoint (Michael S. Ryoo, JPL) [Before 28/12/19]
  59. Jena Action Recognition Dataset - Aibo dog actions (Korner and Denzler) [Before 28/12/19]
  60. K3Da - Kinect 3D Active dataset - K3Da (Kinect 3D active) is a realistic clinically relevant human action dataset containing skeleton, depth data and associated participant information (D. Leightley, M. H. Yap, J. Coulson, Y. Barnouin and J. S. McPhee) [Before 28/12/19]
  61. Kinetics Human Action Video Dataset - 300,000 video clips, 400 human action classes, 10 second clips, single action per clip (Kay, Carreira, et al) [Before 28/12/19]
  62. KIT Robo-Kitchen Activity Data Set - 540 clips of 17 people performing 12 complex kitchen activities. (L. Rybok, S. Friedberger, U. D. Hanebeck, R. Stiefelhagen) [Before 28/12/19]
  63. KTH human action recognition database (KTH CVAP lab) [Before 28/12/19]
  64. Karlsruhe Motion, Intention, and Activity Data set (MINTA) - 7 types of activities of daily living including fully motion primitive segments. (D. Gehrig, P. Krauthausen, L. Rybok, H. Kuehne, U. D. Hanebeck, T. Schultz, R. Stiefelhagen) [Before 28/12/19]
  65. Leeds Activity Dataset--Breakfast (LAD--Breakfast) - It is composed of 15 annotated videos, representing five different people having breakfast or another simple meal. (John Folkesson et al.) [Before 28/12/19]
  66. LIRIS Human Activities Dataset - contains (gray/rgb/depth) videos showing people performing various activities (Christian Wolf, et al, French National Center for Scientific Research) [Before 28/12/19]
  67. MEXaction2 action detection and localization dataset - To support the development and evaluation of methods for 'spotting' instances of short actions in a relatively large video database: 77 hours, 117 videos (Michel Crucianu and Jenny Benois-Pineau) [Before 28/12/19]
  68. MLB-YouTube - Dataset for activity recognition in baseball videos (AJ Piergiovanni, Michael Ryoo) [Before 28/12/19]
  69. Moments in Time Dataset - 1M 3-second videos annotated with action type, the largest dataset of its kind for action recognition and understanding in video. (Monfort, Oliva, et al.) [Before 28/12/19]
  70. MPII Cooking Activities Dataset for fine-grained cooking activity recognition, which also includes the continuous pose estimation challenge (Rohrbach, Amin, Andriluka and Schiele) [Before 28/12/19]
  71. MPII Cooking 2 Dataset - A large dataset of fine-grained cooking activities, an extension of the MPII Cooking Activities Dataset. (Rohrbach, Rohrbach, Regneri, Amin, Andriluka, Pinkal, Schiele) [Before 28/12/19]
  72. MSR-Action3D - benchmark RGB-D action dataset (Microsoft Research Redmond and University of Wollongong) [Before 28/12/19]
  73. MSRActionPair dataset - Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences (University of Central Florida and Microsoft) [Before 28/12/19]
  74. MSRC-12 Kinect gesture data set - 594 sequences and 719,359 frames from people performing 12 gestures (Microsoft Research Cambridge) [Before 28/12/19]
  75. MSRC-12 dataset - sequences of human movements, represented as body-part locations, and the associated gesture (Microsoft Research Cambridge and University of Cambridge) [Before 28/12/19]
  76. MSRDailyActivity3D Dataset - There are 16 activities (Microsoft and the Northwestern University) [Before 28/12/19]
  77. ManiAc RGB-D action dataset: different manipulation actions, 15 different versions, 30 different objects manipulated, 20 long and complex chained manipulation sequences (Eren Aksoy) [Before 28/12/19]
  78. Mivia dataset - It consists of 7 high-level actions performed by 14 subjects. (Mivia Lab at the University of Salerno) [Before 28/12/19]
  79. MTL-AQA - Multitask learning dataset for assessing quality of Olympic Diving. More than 1500 samples. It contains videos of action samples, fine-grained action class, expert commentary (AQA-oriented captions), AQA scores from judges. Videos from multiple views included wherever available. Can be used for captioning, and fine-grained action recognition, apart from AQA. (Parmar, Morris) [29/12/19]
  80. MuHAVi - Multicamera Human Action Video Data (Hossein Ragheb) [Before 28/12/19]
  81. Multi-modal action detection (MAD) Dataset - It contains 35 sequential actions performed by 20 subjects. (Carnegie Mellon University) [Before 28/12/19]
  82. Multiview 3D Event dataset - This dataset includes 8 categories of events performed by 8 subjects (University of California at Los Angeles) [Before 28/12/19]
  83. Nagoya University Extremely Low-resolution FIR Image Action Dataset - Action recognition dataset captured by a 16x16 low-resolution FIR sensor. (Nagoya University) [Before 28/12/19]
  84. NTU RGB+D Action Recognition Dataset - NTU RGB+D is a large scale dataset for human action recognition (Amir Shahroudy) [Before 28/12/19]
  85. Northwestern-UCLA Multiview Action 3D - There are 10 action categories (Northwestern University and University of California at Los Angeles) [Before 28/12/19]
  86. Office Activity Dataset - It consists of skeleton data acquired by Kinect 2.0 from different subjects performing common office activities. (A. Franco, A. Magnani, D. Maiop) [Before 28/12/19]
  87. Oxford TV based human interactions (Oxford Visual Geometry Group) [Before 28/12/19]
  88. PA-HMDB51 - human action video (592) dataset with potential privacy leak attributes annotated: skin color, gender, face, nudity, and relationship (Wang, Wu, Wang, Wang, Jin) [Before 28/12/19]
  89. Parliament - The Parliament dataset is a collection of 228 video sequences, depicting political speeches in the Greek parliament. (Michalis Vrigkas, Christophoros Nikou, Ioannis A. Kakadiaris) [Before 28/12/19]
  90. Procedural Human Action Videos - This dataset contains about 40,000 videos for human action recognition, generated using a 3D game engine. The dataset contains about 6 million frames which can be used to train and evaluate models not only for action recognition but also for depth map estimation, optical flow, instance segmentation, semantic segmentation, 3D and 2D pose estimation, and attribute learning. (Cesar Roberto de Souza) [Before 28/12/19]
  91. RGB-D activity dataset - Each video in the dataset contains 2-7 actions involving interaction with different objects. (Cornell University and Stanford University) [Before 28/12/19]
  92. RGBD-Action-Completion-2016 - This dataset includes 414 complete/incomplete object interaction sequences, spanning six actions and presenting RGB, depth and skeleton data. (Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen) [Before 28/12/19]
  93. RGB-D-based Action Recognition Datasets - Paper that includes the list and links of different rgb-d action recognition datasets. (Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang) [Before 28/12/19]
  94. RGBD-SAR Dataset - RGBD-SAR Dataset (University of Electronic Science and Technology of China and Microsoft) [Before 28/12/19]
  95. Rochester Activities of Daily Living Dataset (Ross Messing) [Before 28/12/19]
  96. SBU Kinect Interaction Dataset - It contains eight types of interactions (Stony Brook University) [Before 28/12/19]
  97. SBU-Kinect-Interaction dataset v2.0 - It comprises RGB-D video sequences of humans performing interaction activities (Kiwon Yun et al.) [Before 28/12/19]
  98. SDHA Semantic Description of Human Activities 2010 contest - Human Interactions (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury) [Before 28/12/19]
  99. SDHA Semantic Description of Human Activities 2010 contest - aerial views (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury) [Before 28/12/19]
  100. SFU Volleyball Group Activity Recognition - a dataset with two levels of annotation (9 player actions and 8 scene activities) for volleyball videos. (M. Ibrahim, S. Muralidharan, Z. Deng, A. Vahdat, and G. Mori / Simon Fraser University) [Before 28/12/19]
  101. SYSU 3D Human-Object Interaction Dataset - Forty subjects perform 12 distinct activities (Sun Yat-sen University) [Before 28/12/19]
  102. ShakeFive Dataset - contains only two actions, namely hand shake and high five. (Universiteit Utrecht) [Before 28/12/19]
  103. ShakeFive2 - A dyadic human interaction dataset with limb level annotations on 8 classes in 153 HD videos (Coert van Gemeren, Ronald Poppe, Remco Veltkamp) [Before 28/12/19]
  104. SoccerNet - Scalable dataset for action spotting in soccer videos: 500 soccer games fully annotated with main actions (goal, cards, subs) and more than 13K soccer games annotated with 500K commentaries for event captioning and game summarization. (Silvio Giancola, Mohieddine Amine, Tarek Dghaily, Bernard Ghanem) [Before 28/12/19]
  105. Sports Videos in the Wild (SVW) - SVW is comprised of 4200 videos captured solely with smartphones by users of Coach Eye smartphone app, a leading app for sports training developed by TechSmith corporation. (Seyed Morteza Safdarnejad, Xiaoming Liu) [Before 28/12/19]
  106. Stanford Sport Events dataset (Jia Li) [Before 28/12/19]
  107. THU-READ(Tsinghua University RGB-D Egocentric Action Dataset) - THU-READ is a large-scale dataset for action recognition in RGBD videos with pixel-layer hand annotation. (Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou) [Before 28/12/19]
  108. THUMOS - Action Recognition in Temporally Untrimmed Videos! - 430 hours of video data and 45 million frames (Gorban, Idrees, Jiang, Zamir, Laptev, Shah, Sukthankar) [Before 28/12/19]
  109. Toyota Smarthome dataset - Dataset for Real-world activities of Daily Living (Toyota Motors Europe & INRIA Sophia Antipolis) [30/12/19]
  110. TUM Kitchen Data Set of Everyday Manipulation Activities (Moritz Tenorth, Jan Bandouch) [Before 28/12/19]
  111. TV Human Interaction Dataset (Alonso Patron-Perez) [Before 28/12/19]
  112. The TJU dataset - contains 22 actions performed by 20 subjects in two different environments; a total of 1760 sequences. (Tianjin University) [Before 28/12/19]
  113. UCF-iPhone Data Set - 9 Aerobic actions were recorded from (6-9) subjects using the Inertial Measurement Unit (IMU) on an Apple iPhone 4 smartphone. (Corey McCall, Kishore Reddy and Mubarak Shah) [Before 28/12/19]
  114. UCI Human Activity Recognition Using Smartphones Data Set - recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors (Anguita, Ghio, Oneto, Parra, Reyes-Ortiz) [Before 28/12/19]
  115. UNLV Dive & Gymvault - Dataset for assessing quality of Olympic Diving and Olympic Gymnastic Vault. It consists of videos of action samples and corresponding action quality scores. (Parmar, Morris) [29/12/19]
  116. The UPCV action dataset - The dataset consists of 10 actions performed by 20 subjects twice. (University of Patras) [Before 28/12/19]
  117. UC-3D Motion Database - Available data types encompass high resolution Motion Capture, acquired with MVN Suit from Xsens and Microsoft Kinect RGB and depth images. (Institute of Systems and Robotics, Coimbra, Portugal) [Before 28/12/19]
  118. UCF 101 action dataset - 101 action classes, over 13k clips and 27 hours of video data; see the loader sketch after this list (Univ of Central Florida) [Before 28/12/19]
  119. UCF-Crime Dataset: Real-world Anomaly Detection in Surveillance Videos - A large-scale dataset for real-world anomaly detection in surveillance videos. It consists of 1900 long and untrimmed real-world surveillance videos (of 128 hours), with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. (Center for Research in Computer Vision, University of Central Florida) [Before 28/12/19]
  120. UCFKinect - The dataset is composed of 16 actions (University of Central Florida Orlando) [Before 28/12/19]
  121. UCLA Human-Human-Object Interaction (HHOI) Dataset Vn1 - Human interactions in RGB-D videos (Shu, Ryoo, and Zhu) [Before 28/12/19]
  122. UCLA Human-Human-Object Interaction (HHOI) Dataset Vn2 - Human interactions in RGB-D videos (version 2) (Shu, Gao, Ryoo, and Zhu) [Before 28/12/19]
  123. UCR Videoweb Multi-camera Wide-Area Activities Dataset (Amit K. Roy-Chowdhury) [Before 28/12/19]
  124. UTD-MHAD - Eight subjects performed 27 actions four times. (University of Texas at Dallas) [Before 28/12/19]
  125. UTKinect dataset - Ten types of human actions were performed twice by 10 subjects (University of Texas) [Before 28/12/19]
  126. UWA3D Multiview Activity Dataset - Thirty activities were performed by 10 individuals (University of Western Australia) [Before 28/12/19]
  127. Univ of Central Florida - 50 Action Category Recognition in Realistic Videos (3 GB) (Kishore Reddy) [Before 28/12/19]
  128. Univ of Central Florida - ARG Aerial camera, Rooftop camera and Ground camera (UCF Computer Vision Lab) [Before 28/12/19]
  129. Univ of Central Florida - Feature Films Action Dataset (Univ of Central Florida) [Before 28/12/19]
  130. Univ of Central Florida - Sports Action Dataset (Univ of Central Florida) [Before 28/12/19]
  131. Univ of Central Florida - YouTube Action Dataset (sports) (Univ of Central Florida) [Before 28/12/19]
  132. Unsegmented Sports News Videos - Database of 74 sports news videos tagged with 10 categories of sports. Designed to test multi-label video tagging. (T. Hospedales, Edinburgh/QMUL) [Before 28/12/19]
  133. Utrecht Multi-Person Motion Benchmark (UMPM) - a collection of video recordings of people together with a ground truth based on motion capture data. (N.P. van der Aa, X. Luo, G.J. Giezeman, R.T. Tan, R.C. Veltkamp.) [Before 28/12/19]
  134. VIRAT Video Dataset - event recognition from two broad categories of activities (single-object and two-objects) which involve both human and vehicles. (Sangmin Oh et al) [Before 28/12/19]
  135. Verona Social interaction dataset (Marco Cristani) [Before 28/12/19]
  136. ViHASi: Virtual Human Action Silhouette Data (userID: VIHASI password: virtual$virtual) (Hossein Ragheb, Kingston University) [Before 28/12/19]
  137. Videoweb (multicamera) Activities Dataset (B. Bhanu, G. Denina, C. Ding, A. Ivers, A. Kamal, C. Ravishankar, A. Roy-Chowdhury, B. Varda) [Before 28/12/19]
  138. WVU Multi-view action recognition dataset (Univ. of West Virginia) [Before 28/12/19]
  139. WorkoutSU-10 Kinect dataset for exercise actions (Ceyhun Akgul) [Before 28/12/19]
  140. WorkoutSU-10 dataset - contains exercise actions selected by professional trainers for therapeutic purposes. (Sabanci University) [Before 28/12/19]
  141. Wrist-mounted camera video dataset - object manipulation (Ohnishi, Kanehira, Kanezaki, Harada) [Before 28/12/19]
  142. YouCook - 88 open-source YouTube cooking videos with annotations (Jason Corso) [Before 28/12/19]
  143. YouTube-8M Dataset - A Large and Diverse Labeled Video Dataset for Video Understanding Research (Google Inc.) [Before 28/12/19]
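
Most of the clip-level datasets above (UCF101, HMDB, Kinetics and similar) are distributed as short video files grouped into one directory per action class. As a quick illustration of how such a layout is typically indexed into (clip path, label) pairs, here is a minimal Python sketch; the root directory name and the .avi extension are assumptions for illustration, not part of any dataset's official tooling.

    from pathlib import Path

    def index_action_clips(root, ext="*.avi"):
        """Index a UCF101-style layout: <root>/<class_name>/<clip>.avi.
        Returns (clip_path, class_index) pairs plus the ordered class list.
        Hypothetical layout -- check each dataset's own documentation."""
        classes = sorted(d.name for d in Path(root).iterdir() if d.is_dir())
        class_to_idx = {c: i for i, c in enumerate(classes)}
        samples = [(str(clip), class_to_idx[c])
                   for c in classes
                   for clip in sorted((Path(root) / c).glob(ext))]
        return samples, classes

    if __name__ == "__main__":
        samples, classes = index_action_clips("UCF-101")  # assumed local path
        print(len(classes), "classes,", len(samples), "clips")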

Agriculture

  1. Aberystwyth Leaf Evaluation Dataset - Timelapse plant images with hand marked up leaf-level segmentations for some time steps, and biological data from plant sacrifice. (Bell, Jonathan; Dee, Hannah M.) [Before 28/12/19]
  2. Fieldsafe - A multi-modal dataset for obstacle detection in agriculture. (Aarhus University) [Before 28/12/19]
  3. KOMATSUNA dataset - The dataset is designed for instance segmentation, tracking and reconstruction for leaves using both sequential multi-view RGB images and depth images. (Hideaki Uchiyama, Kyushu University) [Before 28/12/19]
  4. Leaf counting dataset - Dataset for estimating the growth stage of small plants. (Aarhus University) [Before 28/12/19]
  5. Leaf Segmentation Challenge - Tobacco and arabidopsis plant images (Hanno Scharr, Massimo Minervini, Andreas Fischbach, Sotirios A. Tsaftaris) [Before 28/12/19]
  6. Multi-species fruit flower detection - This dataset consists of four sets of flower images, from three different tree species: apple, peach, and pear, and accompanying ground truth images. (Philipe A. Dias, Amy Tabb, Henry Medeiros) [Before 28/12/19]
  7. Plant Phenotyping Datasets - plant data suitable for plant and leaf detection, segmentation, tracking, and species recognition (M. Minervini, A. Fischbach, H. Scharr, S. A. Tsaftaris) [Before 28/12/19]
  8. Plant seedlings dataset - High-resolution images of 12 weed species. (Aarhus University) [Before 28/12/19]

Attribute recognition

  1. Attribute Learning for Understanding Unstructured Social Activity - Database of videos containing 10 categories of unstructured social events to recognise, also annotated with 69 attributes. (Y. Fu Fudan/QMUL, T. Hospedales Edinburgh/QMUL) [Before 28/12/19]
  2. Animals with Attributes 2 - 37322 (freely licensed) images of 50 animal classes with 85 per-class binary attributes; see the attribute-prediction sketch after this list. (Christoph H. Lampert, IST Austria) [Before 28/12/19]
  3. Birds - This database contains 600 images (100 samples each) of six different classes of birds. (Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce) [Before 28/12/19]
  4. Butterflies - This database contains 619 images of seven different classes of butterflies. (Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce) [Before 28/12/19]
  5. CAER (Context-Aware Emotion Recognition) - Large scale image and video dataset for emotion recognition, and facial expression recognition (Lee, Kim, Kim, Park, and Sohn) [29/12/19]
  6. CALVIN research group datasets - object detection with eye tracking, imagenet bounding boxes, synchronised activities, stickman and body poses, youtube objects, faces, horses, toys, visual attributes, shape classes (CALVIN group) [Before 28/12/19]
  7. CelebA - Large-scale CelebFaces Attributes Dataset (Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang) [Before 28/12/19]
  8. DukeMTMC-attribute - 23 pedestrian attributes for DukeMTMC-reID (Lin, Zheng, Zheng, Wu and Yang) [Before 28/12/19]
  9. EMOTIC (EMOTIons in Context) - Images of people (34357) embedded in their natural environments, annotated with 2 distinct emotion representations. (Ronak Kosti, Agata Lapedriza, Jose Alvarez, Adria Recasens) [Before 28/12/19]
  10. HAT database of 27 human attributes (Gaurav Sharma, Frederic Jurie) [Before 28/12/19]
  11. LFW-10 dataset for learning relative attributes - A dataset of 10,000 pairs of face images with instance-level annotations for 10 attributes. (CVIT, IIIT Hyderabad) [Before 28/12/19]
  12. Market-1501-attribute - 27 visual attributes for 1501 shoppers. (Lin, Zheng, Zheng, Wu and Yang) [Before 28/12/19]
  13. Multi-Class Weather Dataset - Our multi-class benchmark dataset contains 65,000 images from 6 common categories for sunny, cloudy, rainy, snowy, haze and thunder weather. This dataset benefits weather classification and attribute recognition. (Di Lin) [Before 28/12/19]
  14. Person Recognition in Personal Photo Collections - we introduced three harder splits for evaluation and long-term attribute annotations and per-photo timestamp metadata. (Oh, Seong Joon and Benenson, Rodrigo and Fritz, Mario and Schiele, Bernt) [Before 28/12/19]
  15. UT-Zappos50K Shoes - Large scale shoe dataset consisting of 50,000 catalog images and over 50,000 pairwise relative attribute labels on 11 fine-grained attributes (Aron Yu, Mark Stephenson, Kristen Grauman, UT Austin) [Before 28/12/19]
  16. Visual Attributes Dataset - visual attribute annotations for over 500 object classes (animate and inanimate) which are all represented in ImageNet. Each object class is annotated with visual attributes based on a taxonomy of 636 attributes (e.g., has fur, made of metal, is round). [Before 28/12/19]
  17. The Visual Privacy (VISPR) Dataset - Privacy Multilabel Dataset (22k images, 68 privacy attributes) (Orekondy, Schiele, Fritz) [Before 28/12/19]
  18. WIDER Attribute Dataset - WIDER Attribute is a large-scale human attribute dataset, with 13789 images belonging to 30 scene categories, and 57524 human bounding boxes each annotated with 14 binary attributes. (Li, Yining and Huang, Chen and Loy, Chen Change and Tang, Xiaoou) [Before 28/12/19]
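
Several datasets above (e.g., Animals with Attributes 2, with 85 per-class binary attributes for 50 classes) follow the standard attribute-based recognition setup: each class is described by a fixed attribute vector, and an image is assigned to the class whose vector best matches the image's predicted attribute scores. Below is a minimal NumPy sketch of that nearest-prototype rule; the random matrices are stand-ins for a real class/attribute table and real classifier outputs.

    import numpy as np

    # Shapes follow the Animals-with-Attributes-2 setup described above:
    # 50 classes, each annotated with an 85-dimensional binary attribute vector.
    num_classes, num_attrs = 50, 85
    rng = np.random.default_rng(0)
    class_attributes = rng.integers(0, 2, size=(num_classes, num_attrs)).astype(float)

    def attribute_nearest_class(attr_scores, class_attributes):
        """Assign each image to the class whose attribute vector is closest
        to the image's predicted attribute scores (simplified zero-shot rule)."""
        # attr_scores: (n_images, num_attrs) predicted attribute probabilities
        dists = np.linalg.norm(attr_scores[:, None, :] - class_attributes[None, :, :], axis=2)
        return dists.argmin(axis=1)

    fake_scores = rng.random((4, num_attrs))  # stand-in for a classifier's outputs
    print(attribute_nearest_class(fake_scores, class_attributes))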

Autonomous Driving

  1. AMUSE - The automotive multi-sensor (AMUSE) dataset taken in real traffic scenes during multiple test drives. (Philipp Koschorrek et al.) [Before 28/12/19]
  2. ApolloScape - high resolution cameras and a Riegl acquisition system. Our dataset is collected in different cities under various traffic conditions. 74555 video frames and their pixel-level and instance-level annotations (Peking University / Baidu) [18/1/20]
  3. Argoverse - Two public datasets supported by highly detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them; more than 300,000 curated scenarios, 3D tracking annotations for 113 scenes and 324,557 interesting vehicle trajectories for motion forecasting (Chang, Lambert, Sangkloy, Singh, Bak, Hartnett, Wang, Carr, Lucey, Ramanan, Hays) [18/1/20]
  4. Autonomous Driving - Semantic segmentation, pedestrian detection, virtual-world data, far infrared, stereo, driver monitoring. (CVC research center and the UAB and UPC universities) [Before 28/12/19]
  5. Bosch Small Traffic Lights Dataset (BSTLD) - A dataset for traffic light detection, tracking, and classification. [Before 28/12/19]
  6. DrivingStereo - A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios. 180k stereo images covering a diverse set of driving scenarios (Yang, Song, Huang, Deng, Shi, Zhou) [Before 28/12/19]
  7. Boxy vehicle detection dataset - A vehicle detection dataset with 1.99 million annotated vehicles in 200,000 images. It contains AABB and keypoint labels. [Before 28/12/19]
  8. CASR: Cyclist Arm Sign Recognition - Small clips of ~10 seconds showing cyclists performing arm signs. The videos are acquired with a consumer-grade camera. There are 219 arm sign actions annotated. (Zhijie Fang, Antonio M. Lopez) [13/1/20]
  9. Ford Campus Vision and Lidar Data Set - time-registered data from professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measuring Unit (IMU), Velodyne 3D-lidar scanner, two push-broom forward looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system (Pandey, McBride, Eustice) [Before 28/12/19]
  10. FRIDA (Foggy Road Image DAtabase) Image Database - images for performance evaluation of visibility and contrast restoration algorithms. FRIDA: 90 synthetic images of 18 urban road scenes. FRIDA2: 330 synthetic images of 66 diverse road scenes, with viewpoint close to that of the vehicle's driver. (Tarel, Cord, Halmaoui, Gruyer, Hautiere) [Before 28/12/19]
  11. H3D - Honda Research 3D dataset - 360 degree LiDAR dataset (dense pointcloud from Velodyne-64), 160 crowded and highly interactive traffic scenes, 1,071,302 3D bounding box labels, 8 common classes of traffic participants (Patil, Malla, Gang, Chen) [18/1/20]
  12. House3D - House3D is a virtual 3D environment which consists of thousands of indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. It consists of over 45k indoor 3D scenes, ranging from studios to two-storied houses with swimming pools and fitness rooms. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views. The renderer runs at thousands of frames per second, making it suitable for large-scale RL training. (Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian, Facebook Research) [Before 28/12/19]
  13. India Driving Dataset (IDD) - unstructured driving conditions from India with 50,000 frames (10,000 semantic, and 40,000 coarse annotations) for training autonomous cars to see using object detection, scene-level and instance-level semantic segmentation (CVIT, IIIT Hyderabad and Intel) [Before 28/12/19]
  14. Joint Attention in Autonomous Driving (JAAD) - The dataset includes instances of pedestrians and cars intended primarily for the purpose of behavioural studies and detection in the context of autonomous driving. (Iuliia Kotseruba, Amir Rasouli and John K. Tsotsos) [Before 28/12/19]
  15. LISA Vehicle Detection Dataset - colour first person driving video under various lighting and traffic conditions (Sivaraman, Trivedi) [Before 28/12/19]
  16. LLAMAS Unsupervised dataset - A lane marker detection and segmentation dataset of 100,000 images with 3d lines, pixel level dashed markers, and curves for individual lines. [Before 28/12/19]
  17. Lost and Found Dataset - The Lost and Found Dataset addresses the problem of detecting unexpected small road hazards (often caused by lost cargo) for autonomous driving applications. (Sebastian Ramos, Peter Pinggera, Stefan Gehrig, Uwe Franke, Rudolf Mester, Carsten Rother) [Before 28/12/19]
  18. Multi Vehicle Stereo Event Camera Dataset - Multiple sequences containing a stereo pair of DAVIS 346b event cameras with ground truth poses, depth maps and optical flow. (Alex Zihao Zhu, Dinesh Thakur, Tolga Ozaslan, Bernd Pfrommer, Vijay Kumar, Kostas Daniilidis) [Before 28/12/19]
  19. nuTonomy scenes dataset (nuScenes) - The nuScenes dataset is a large-scale autonomous driving dataset. It features: Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS), 1000 scenes of 20s each, 1,440,000 camera images, 400,000 lidar sweeps, two diverse cities: Boston and Singapore, left versus right hand traffic, detailed map information, manual annotations for 25 object classes, 1.1M 3D bounding boxes annotated at 2Hz, attributes such as visibility, activity and pose; a 3D-box sketch follows this list. (Caesar et al) [Before 28/12/19]
  20. RESIDE (Realistic Single Image DEhazing) - The current largest-scale benchmark consisting of both synthetic and real-world hazy images, for image dehazing research. RESIDE highlights diverse data sources and image contents, and serves various training or evaluation purposes. (Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, Zhangyang Wang) [Before 28/12/19]
  21. semanticKITTI - A Dataset for Semantic Scene Understanding using LiDAR Sequences (Behley, Garbade, Milioto, Quenzel, Behnke, Stachniss, Gall) [18/1/20]
  22. SYNTHetic collection of Imagery and Annotations - created for the purpose of aiding semantic segmentation and related scene understanding problems in the context of driving scenarios. (Computer Vision Center, UAB) [Before 28/12/19]
  23. SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
  24. TRoM: Tsinghua Road Markings - This is a dataset which contributes to the area of road marking segmentation for Automated Driving and ADAS. (Xiaolong Liu, Zhidong Deng, Lele Cao, Hongchao Lu) [Before 28/12/19]
  25. TUM City Campus - Urban point clouds taken by Mobile Laser Scanning (MLS) for classification, object extraction and change detection (Stilla, Hebel, Xu, Gehrung) [3/1/20]
  26. University of Michigan North Campus Long-Term Vision and LIDAR Dataset - 27 sessions spaced approximately biweekly over the course of 15 months, indoors and outdoors, varying trajectories, different times of the day across all four seasons. Includes: moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoint, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction. Includes ground-truth pose. (Carlevaris-Bianco, Ushani, Eustice) [Before 28/12/19]
  27. UZH-FPV Drone Racing Dataset - for visual inertial odometry and SLAM. 28 real-world first-person view sequences both indoors and outdoors, containing images, IMU, and events and ground truth (Delmerico, Cieslewski, Rebecq, Faessler, Scaramuzza) [Before 28/12/19]
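
Several of the driving datasets above annotate traffic participants with 3D bounding boxes, i.e. a metric centre, size, and heading angle (for example the 1.1M boxes in nuScenes or the 1,071,302 labels in H3D). The following minimal Python sketch shows one common in-memory representation of such a record and how it expands into its eight corner points; the field names are illustrative assumptions, not any dataset's actual schema.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Box3D:
        """Illustrative 3D box record (not a specific dataset's schema)."""
        center: np.ndarray  # (x, y, z) in metres
        size: np.ndarray    # (length, width, height) in metres
        yaw: float          # heading about the vertical axis, in radians
        category: str

        def corners(self):
            """Return the 8 box corners as an (8, 3) array in world coordinates."""
            l, w, h = self.size
            x = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * l / 2
            y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
            z = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * h / 2
            pts = np.stack([x, y, z], axis=1)
            c, s = np.cos(self.yaw), np.sin(self.yaw)
            rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
            return pts @ rot.T + self.center

    box = Box3D(np.array([10.0, 2.0, 0.9]), np.array([4.5, 1.8, 1.6]), 0.3, "car")
    print(box.corners().round(2))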

Biological/Medical

  1. 2008 MICCAI MS Lesion Segmentation Challenge (National Institutes of Health Blueprint for Neuroscience Research) [Before 28/12/19]
  2. ASU DR-AutoCC Data - a Multiple-Instance Learning feature space for a diabetic retinopathy classification dataset (Ragav Venkatesan, Parag Chandakkar, Baoxin Li - Arizona State University) [Before 28/12/19]
  3. Aberystwyth Leaf Evaluation Dataset - Timelapse plant images with hand marked up leaf-level segmentations for some time steps, and biological data from plant sacrifice. (Bell, Jonathan; Dee, Hannah M.) [Before 28/12/19]
  4. ADP: Atlas of Digital Pathology - 17,668 histological patch images extracted from 100 slides annotated with up to 57 hierarchical tissue types (HTTs) from different organs - the aim is to provide training data for supervised multi-label learning of tissue types in a digitized whole slide image (Hosseini, Chan, Tse, Tang, Deng, Norouzi, Rowsell, Plataniotis, Damaskinos) [14/1/20]
  5. Annotated Spine CT Database for Benchmarking of Vertebrae Localization, 125 patients, 242 scans (Ben Glocker) [Before 28/12/19]
  6. BRATS - the identification and segmentation of tumor structures in multiparametric magnetic resonance images of the brain (TU Munchen et al.) [Before 28/12/19]
  7. Breast Ultrasound Dataset B - 2D Breast Ultrasound Images with 53 malignant lesions and 110 benign lesions. (UDIAT Diagnostic Centre, M.H. Yap, R. Marti) [Before 28/12/19]
  8. Calgary-Campinas Public Brain MR Dataset: T1-weighted brain MRI volumes acquired in 359 subjects on scanners from three different vendors (GE, Philips, and Siemens) and at two magnetic field strengths (1.5 T and 3 T). The scans correspond to older adult subjects. (Souza, Roberto, Oeslle Lucena, Julia Garrafa, David Gobbi, Marina Saluzzi, Simone Appenzeller, Leticia Rittner, Richard Frayne, and Roberto Lotufo) [Before 28/12/19]
  9. CAMEL colorectal adenoma dataset - image-level labels for weakly supervised learning containing 177 whole slide images (156 contain adenoma) gathered and labeled by pathologists (Song and Wang) [29/12/19]
  10. CheXpert - a large dataset of chest X-rays and competition for automated chest x-ray interpretation, which features uncertainty labels and radiologist-labeled reference standard evaluation sets (Irvin, Rajpurkar et al) [Before 28/12/19]
  11. Cholec80: 80 gallbladder laparoscopic videos annotated with phase and tool information. (Andru Putra Twinanda) [Before 28/12/19]
  12. CRCHistoPhenotypes - Labeled Cell Nuclei Data - colorectal cancer histology images consisting of nearly 30,000 dotted nuclei with over 22,000 labeled with the cell type (Rajpoot + Sirinukunwattana) [Before 28/12/19]
  13. Cavy Action Dataset - 16 sequences with 640 x 480 resolutions recorded at 7.5 frames per second (fps) with approximately 31621506 frames in total (272 GB) of interacting cavies (guinea pig) (Al-Raziqi and Denzler) [Before 28/12/19]
  14. Cell Tracking Challenge Datasets - 2D/3D time-lapse video sequences with ground truth (Ma et al., Bioinformatics 30:1609-1617, 2014) [Before 28/12/19]
  15. Computed Tomography Emphysema Database (Lauge Sorensen) [Before 28/12/19]
  16. COPD Machine Learning Dataset - A collection of feature datasets derived from lung computed tomography (CT) images, which can be used in diagnosis of chronic obstructive pulmonary disease (COPD). The images in this database are weakly labeled, i.e. per image, a diagnosis (COPD or no COPD) is given, but it is not known which parts of the lungs are affected. Furthermore, the images were acquired at different sites and with different scanners. These problems are related to two learning scenarios in machine learning, namely multiple instance learning or weakly supervised learning, and transfer learning or domain adaptation. (Veronika Cheplygina, Isabel Pino Pena, Jesper Holst Pedersen, David A. Lynch, Lauge S., Marleen de Bruijne) [Before 28/12/19]
  17. CREMI: MICCAI 2016 Challenge - 6 volumes of electron microscopy of neural tissue, neuron and synapse segmentation, synaptic partner annotation. (Jan Funke, Stephan Saalfeld, Srini Turaga, Davi Bock, Eric Perlman) [Before 28/12/19]
  18. CRIM13 Caltech Resident-Intruder Mouse dataset - 237 10 minute videos (25 fps) annotated with actions (13 classes) (Burgos-Artizzu, Dollar, Lin, Anderson and Perona) [Before 28/12/19]
  19. CVC colon DB - annotated video sequences of colonoscopy video. It contains 15 short colonoscopy sequences, coming from 15 different studies. In each sequence one polyp is shown. (Bernal, Sanchez, Vilarino) [Before 28/12/19]
  20. DIADEM: Digital Reconstruction of Axonal and Dendritic Morphology Competition (Allen Institute for Brain Science et al) [Before 28/12/19]
  21. DIARETDB1 - Standard Diabetic Retinopathy Database (Lappeenranta Univ of Technology) [Before 28/12/19]
  22. DRIVE: Digital Retinal Images for Vessel Extraction (Univ of Utrecht) [Before 28/12/19]
  23. DeformIt 2.0 - Image Data Augmentation Tool: Simulate novel images with ground truth segmentations from a single image-segmentation pair (Brian Booth and Ghassan Hamarneh) [Before 28/12/19]
  24. Deformable Image Registration Lab dataset - for objective and rigorous evaluation of deformable image registration (DIR) spatial accuracy performance. (Richard Castillo et al.) [Before 28/12/19]
  25. DERMOFIT Skin Cancer Dataset - 1300 lesions from 10 classes captured under identical controlled conditions. Lesion segmentation masks are included (Fisher, Rees, Aldridge, Ballerini, et al) [Before 28/12/19]
  26. Dermoscopy images (Eric Ehrsam) [Before 28/12/19]
  27. EATMINT (Emotional Awareness Tools for Mediated INTeraction) database - The EATMINT database contains multi-modal and multi-user recordings of affect and social behaviors in a collaborative setting. (Guillaume Chanel, Gaelle Molinari, Thierry Pun, Mireille Betrancourt) [Before 28/12/19]
  28. EPT29 - This database contains 4842 images of 1613 specimens of 29 taxa of EPTs (Tom et al.) [Before 28/12/19]
  29. EyePACS - retinal image database is comprised of over 3 million retinal images of diverse populations with various degrees of diabetic retinopathy (EyePACS) [Before 28/12/19]
  30. FIRE Fundus Image Registration Dataset - 134 retinal image pairs and ground truth for registration. (FORTH-ICS) [Before 28/12/19]
  31. FMD - Fluorescence Microscopy Denoising dataset - 12,000 real fluorescence microscopy images (Zhang, Zhu, Nichols, Wang, Zhang, Smith, Howard) [Before 28/12/19]
  32. FocusPath - Focus Quality Assessment for Digital Pathology (Microscopy) Images. 864 image patches are naturally blurred by 16 levels of out-of-focus lens provided with GT scores of focus levels. (Hosseini, Zhang, Plataniotis) [Before 28/12/19]
  33. Histology Image Collection Library (HICL) - The HICL is a compilation of 3870 histopathological images (so far) from various diseases, such as brain cancer, breast cancer and HPV (Human Papilloma Virus)-Cervical cancer. (Medical Image and Signal Processing (MEDISP) Lab., Department of Biomedical Engineering, School of Engineering, University of West Attica) [Before 28/12/19]
  34. Honeybee segmentation dataset - It is a dataset containing positions and orientation angles of hundreds of bees on a 2D surface of honey comb. (Bozek K, Hebert L, Mikheyev AS, Stephens GJ) [Before 28/12/19]
  35. IIT MBADA mice - Mice behavioral data. FLIR A315, spatial resolution of 320x240 px at 30fps, 50x50cm open arena, two experts for three different mice pairs, mice identities. (Italian Inst. of Technology, PAVIS lab) [Before 28/12/19]
  36. Indian Diabetic Retinopathy Image Dataset - This dataset consists of retinal fundus images annotated at pixel-level for lesions associated with Diabetic Retinopathy. Also, it provides the disease severity of diabetic retinopathy and diabetic macular edema. This dataset is useful for development and evaluation of image analysis algorithms for early detection of diabetic retinopathy. (Prasanna Porwal, Samiksha Pachade, Ravi Kamble, Manesh Kokare, Girish Deshmukh, Vivek Sahasrabuddhe, Fabrice Meriaudeau) [Before 28/12/19]
  37. IRMA (Image Retrieval in Medical Applications) - This collection compiles anonymous radiographs (Deserno TM, Ott B) [Before 28/12/19]
  38. IVDM3Seg - 24 3D multi-modality MRI data sets of at least 7 IVDs of the lower spine, collected from 12 subjects in two different stages (Zheng, Li, Belavy) [Before 28/12/19]
  39. JIGSAWS - JHU-ISI Surgical Gesture and Skill Assessment Working Set - a surgical activity dataset for human motion modeling, captured using the da Vinci Surgical System from eight surgeons with different levels of skill performing five repetitions of three elementary surgical tasks. It contains kinematic and video data, plus manual annotations. (Carol Reiley and Balazs Vagvolgyi) [Before 28/12/19]
  40. KID - A capsule endoscopy database for medical decision support (Anastasios Koulaouzidis and Dimitris Iakovidis) [Before 28/12/19]
  41. Leaf Segmentation Challenge - Tobacco and arabidopsis plant images (Hanno Scharr, Massimo Minervini, Andreas Fischbach, Sotirios A. Tsaftaris) [Before 28/12/19]
  42. LIDC-IDRI - Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. [Before 28/12/19]
  43. LITS Liver Tumor Segmentation - 130 3D CT scans with segmentations of the liver and liver tumor. Public benchmark with leaderboard at Codalab.org; see the Dice-overlap sketch after this list. (Patrick Christ) [Before 28/12/19]
  44. Mammographic Image Analysis Homepage - a collection of databases links [Before 28/12/19]
  45. Medical image database - Database of ultrasound images of breast abnormalities with the ground truth. (Prof. Stanislav Makhanov, biomedsiit.com) [Before 28/12/19]
  46. MiniMammographic Database (Mammographic Image Analysis Society) [Before 28/12/19]
  47. MIT CBCL Automated Mouse Behavior Recognition datasets (Nicholas Edelman) [Before 28/12/19]
  48. Moth fine-grained recognition - 675 similar classes, 5344 images (Erik Rodner et al) [Before 28/12/19]
  49. Mouse Embryo Tracking Database - cell division event detection (Marcelo Cicconet, Kris Gunsalus) [Before 28/12/19]
  50. MUCIC: Masaryk University Cell Image Collection - 2D/3D synthetic images of cells/tissues for benchmarking (Masaryk University) [Before 28/12/19]
  51. NIH Chest X-ray Dataset - 112,120 X-ray images with disease labels from 30,805 unique patients. (NIH) [Before 28/12/19]
  52. OASIS - Open Access Series of Imaging Studies - 500+ MRI data sets of the brain (Washington University, Harvard University, Biomedical Informatics Research Network) [Before 28/12/19]
  53. Plant Phenotyping Datasets - plant data suitable for plant and leaf detection, segmentation, tracking, and species recognition (M. Minervini, A. Fischbach, H. Scharr, S. A. Tsaftaris) [Before 28/12/19]
  54. RatSI: Rat Social Interaction Dataset - 9 fully annotated (11 class) videos (15 minute, 25 FPS) of two rats interacting socially in a cage (Malte Lorbach, Noldus Information Technology) [Before 28/12/19]
  55. Retinal fundus images - Ground truth of vascular bifurcations and crossovers (Univ of Groningen) [Before 28/12/19]
  56. SCORHE - 1, 2 and 3 mouse behavior videos, 9 behaviors, (Ghadi H. Salem, et al, NIH) [Before 28/12/19]
  57. SLP (Simultaneously-collected multimodal Lying Pose) - large scale dataset on in-bed poses includes: 2 Data Collection Settings: (a) Hospital setting: 7 participants, and (b) Home setting: 102 participants (29 females, age range: 20-40). 4 Imaging Modalities: RGB (regular webcam), IR (FLIR LWIR camera), DEPTH (Kinect v2) and Pressure Map (Tekscan Pressure Sensing Map). 3 Cover Conditions: uncover, bed sheet, and blanket. Fully labeled poses with 14 joints. (Ostadabbas and Liu) [2/1/20]
  58. SNEMI3D - 3D Segmentation of neurites in EM images [Before 28/12/19]
  59. STructured Analysis of the Retina - 400+ retinal images, with ground truth segmentations and medical annotations [Before 28/12/19]
  60. Spine and Cardiac data (Digital Imaging Group of London Ontario, Shuo Li) [Before 28/12/19]
  61. Stonefly9 - This database contains 3826 images of 773 specimens of 9 taxa of Stoneflies (Tom et al.) [Before 28/12/19]
  62. Synthetic Migrating Cells - Six artificial migrating cells (neutrophils) over 98 time frames, various levels of Gaussian/Poisson noise and different paths characteristics with ground truth. (Dr Constantino Carlos Reyes-Aldasoro et al.) [Before 28/12/19]
  63. UBFC-RPPG Dataset - remote photoplethysmography (rPPG) video data and ground truth acquired with a CMS50E transmissive pulse oximeter (Bobbia, Macwan, Benezeth, Mansouri, Dubois) [Before 28/12/19]
  64. Uni Bremen Open, Abdominal Surgery RGB Dataset - Recording of a complete, open, abdominal surgery using a Kinect v2 that was mounted directly above the patient looking down at patient and staff. (Joern Teuber, Gabriel Zachmann, University of Bremen) [Before 28/12/19]
  65. Univ of Central Florida - DDSM: Digital Database for Screening Mammography (Univ of Central Florida) [Before 28/12/19]
  66. VascuSynth - 120 3D vascular tree like structures with ground truth (Mengliu Zhao, Ghassan Hamarneh) [Before 28/12/19]
  67. VascuSynth - Vascular Synthesizer generates vascular trees in 3D volumes. (Ghassan Hamarneh, Preet Jassi, Mengliu Zhao) [Before 28/12/19]
  68. York Cardiac MRI dataset (Alexander Andreopoulos) [Before 28/12/19]
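
A note on evaluation: many of the segmentation challenges listed above (brain tumours, liver lesions, MS lesions, retinal vessels) report the Dice overlap between a predicted binary mask and the reference annotation, Dice = 2|A∩B| / (|A| + |B|). A minimal NumPy sketch, with toy masks standing in for real predictions:

    import numpy as np

    def dice_coefficient(pred, target, eps=1e-7):
        """Dice overlap between two binary masks of the same shape."""
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        intersection = np.logical_and(pred, target).sum()
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    # Toy example: two partially overlapping square masks.
    a = np.zeros((64, 64)); a[10:40, 10:40] = 1
    b = np.zeros((64, 64)); b[20:50, 20:50] = 1
    print(round(dice_coefficient(a, b), 3))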

Camera calibration

  1. Catadioptric camera calibration images (Yalin Bastanlar) - a generic calibration sketch follows this list [Before 28/12/19]
  2. GoPro-Gyro Dataset - This dataset consists of a number of wide-angle rolling shutter video sequences with corresponding gyroscope measurements (Hannes et al.) [Before 28/12/19]
  3. LO-RANSAC - LO-RANSAC library for estimation of homography and epipolar geometry (K. Lebeda, J. Matas and O. Chum) [Before 28/12/19]
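
For context, calibration image sets are normally processed by detecting a planar target in every image and then fitting the camera parameters to those detections. The sketch below is a generic OpenCV checkerboard/pinhole workflow, shown only as an illustration; it does not implement the catadioptric or rolling-shutter models the datasets above actually target, and the checkerboard size and image path are assumptions.

    import glob
    import cv2
    import numpy as np

    # Assumed target: a checkerboard with 9x6 inner corners and unit square size.
    pattern = (9, 6)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

    obj_points, img_points, image_size = [], [], None
    for path in glob.glob("calib_images/*.jpg"):  # assumed local path
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            image_size = gray.shape[::-1]

    if image_size is not None:
        rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
            obj_points, img_points, image_size, None, None)
        print("RMS reprojection error:", rms)
        print("Camera matrix:\n", K)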

Face and Eye/Iris Databases

  1. 2D-3D face dataset - This dataset includes pairs of 2D face image and its corresponding 3D face geometry model with geometry details. (Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng) [Before 28/12/19]
  2. 300 Videos in the Wild (300-VW) - 68 Facial Landmark Tracking (Chrysos, Antonakos, Zafeiriou, Snape, Shen, Kossaifi, Tzimiropoulos, Pantic) [Before 28/12/19]
  3. 300W-Style - enhanced version of 300W by applying three style changes to the original images. It is used to facilitate the analysis of the facial landmark detection problem. (Xuanyi Dong) [29/12/19]
  4. 3D Mask Attack Database (3DMAD) - 76500 frames of 17 persons using Kinect RGBD with eye positions (Sebastien Marcel) [Before 28/12/19]
  5. 3D facial expression - Binghamton University 3D Static and Dynamic Facial Expression Databases (Lijun Yin, Jeff Cohn, and teammates) [Before 28/12/19]
  6. AFLW-Style - enhanced version of AFLW by applying three style changes to the original images. It is used to facilitate the analysis of the facial landmark detection problem. (Xuanyi Dong) [29/12/19]
  7. AginG Faces in the Wild v2 (AGFW-v2) - AGFW-v2 consists of 36,299 facial images divided into 11 age groups with a span of five years between groups. On average, there are 3,300 images per group. Facial images in AGFW-v2 are not public figures and are less likely to have significant make-up or facial modifications, helping embed accurate aging effects during the learning process. (Chi Nhan Duong, Khoa Luu, Kha Gia Quach, Tien D. Bui) [Before 28/12/19]
  8. Audio-visual database for face and speaker recognition (Mobile Biometry MOBIO http://www.mobioproject.org/) [Before 28/12/19]
  9. Audiovisual Lombard grid speech corpus - a bi-view audiovisual Lombard speech corpus which can be used to support joint computational-behavioral studies in speech perception (Alghamdi, Maddock, Marxer, Barker and Brown) [31/12/19]
  10. BANCA face and voice database (Univ of Surrey) [Before 28/12/19]
  11. Binghamton Univ 3D static and dynamic facial expression database (Lijun Yin, Peter Gerhardstein and teammates) [Before 28/12/19]
  12. Binghamton-Pittsburgh 4D Spontaneous Facial Expression Database - consists of 2D spontaneous facial expression videos and FACS codes. (Lijun Yin et al.) [Before 28/12/19]
  13. BioID face database (BioID group) [Before 28/12/19]
  14. BioVid Heat Pain Database - This video (and biomedical signal) dataset contains facial and psychophysiological reactions of 87 study participants who were subjected to experimentally induced heat pain. (University of Magdeburg (Neuro-Information Technology group) and University of Ulm (Emotion Lab)) [Before 28/12/19]
  15. Biometric databases - biometric databases related to iris recognition (Adam Czajka) [Before 28/12/19]
  16. Biwi 3D Audiovisual Corpus of Affective Communication - 1000 high quality, dynamic 3D scans of faces, recorded while pronouncing a set of English sentences. [Before 28/12/19]
  17. Bosphorus 3D/2D Database of FACS annotated facial expressions, of head poses and of face occlusions (Bogazici University) [Before 28/12/19]
  18. CAER (Context-Aware Emotion Recognition) - Large scale image and video dataset for emotion recognition, and facial expression recognition (Lee, Kim, Kim, Park, and Sohn) [29/12/19]
  19. Caricature/Photomates dataset - a dataset with frontal faces and corresponding Caricature line drawings (Tayfun Akgul) [Before 28/12/19]
  20. CASIA-IrisV3 (Chinese Academy of Sciences, T. N. Tan, Z. Sun) [Before 28/12/19]
  21. CASIR Gaze Estimation Database - RGB and depth images (from Kinect V1.0) and ground truth values of facial features corresponding to experiments for gaze estimation benchmarking (Filipe Ferreira et al.) [Before 28/12/19]
  22. Celeb-DF - A new large-scale and challenging DeepFake video dataset, Celeb-DF, for the development and evaluation of DeepFake detection algorithms (Li, Yang, Sun, Qi and Lyu) [30/12/19]
  23. CMU Facial Expression Database (CMU/MIT) [Before 28/12/19]
  24. The CMU Multi-PIE Face Database - more than 750,000 images of 337 people recorded in up to four sessions over the span of five months. (Jeff Cohn et al.) [Before 28/12/19]
  25. CMU Pose, Illumination, and Expression (PIE) Database (Simon Baker) [Before 28/12/19]
  26. CMU/MIT Frontal Faces (CMU/MIT) [Before 28/12/19]
  27. CMU/MIT Frontal Faces (CMU/MIT) [Before 28/12/19]
  28. CoMA 3D face dataset - 20,466 meshes (3D head scans and registrations in FLAME topology) of extreme facial expressions captured from 12 different subjects (Ranjan, Bolkart, Sanyal, Black) [Before 28/12/19]
  29. CSSE Frontal intensity and range images of faces (Ajmal Mian) [Before 28/12/19]
  30. CelebA - Large-scale CelebFaces Attributes Dataset(Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang) [Before 28/12/19]
  31. Celebrities in Frontal-Profile in the Wild - 500+ images of celebrities in frontal and profile views (Sengupta, Cheng, Castillo, Patel, Chellappa, Jacobs) [Before 28/12/19]
  32. Cohn-Kanade AU-Coded Expression Database - 500+ expression sequences of 100+ subjects, coded by activated Action Units (Affect Analysis Group, Univ. of Pittsburgh) [Before 28/12/19]
  33. Cohn-Kanade AU-Coded Expression Database - for research in automatic facial image analysis and synthesis and for perceptual studies (Jeff Cohn et al.) [Before 28/12/19]
  34. Columbia Gaze Data Set - 5,880 images of 56 people over 5 head poses and 21 gaze directions (Brian A. Smith, Qi Yin, Steven K. Feiner, Shree K. Nayar) [Before 28/12/19]
  35. Computer Vision Laboratory Face Database (CVL Face Database) - Database contains 798 images of 114 persons, with 7 images per person and is freely available for research purposes. (Peter Peer etc.) [Before 28/12/19]
  36. Deep future gaze - This dataset consists of 57 sequences on search and retrieval tasks performed by 55 subjects. Each video clip lasts around 15 minutes, at 10 fps with a frame resolution of 480 by 640. Each subject is asked to search for a list of 22 items (including lanyard, laptop) and move them to the packing location (dining table). (National University of Singapore, Institute for Infocomm Research) [Before 28/12/19]
  37. DISFA+: Extended Denver Intensity of Spontaneous Facial Action Database - an extension of DISFA (M.H. Mahoor) [Before 28/12/19]
  38. DISFA: Denver Intensity of Spontaneous Facial Action Database - a non-posed facial expression database for those who are interested in developing computer algorithms for automatic action unit detection and their intensities described by FACS. (M.H. Mahoor) [Before 28/12/19]
  39. DHF1K - 1000 elaborately selected video sequences with fixation annotations from 17 viewers. (Prof. Jianbing Shen) [Before 28/12/19]
  40. EURECOM Facial Cosmetics Database - 389 images, 50 persons with/without make-up, annotations about the amount and location of applied makeup. (Jean-Luc DUGELAY et al) [Before 28/12/19]
  41. EURECOM Kinect Face Database - 52 people, 2 sessions, 9 variations, 6 facial landmarks. (Jean-Luc DUGELAY et al) [Before 28/12/19]
  42. EYEDIAP dataset - The EYEDIAP dataset was designed to train and evaluate gaze estimation algorithms from RGB and RGB-D data. It contains a diversity of participants, head poses, gaze targets and sensing conditions. (Kenneth Funes and Jean-Marc Odobez) [Before 28/12/19]
  43. Face2BMI Dataset - The Face2BMI dataset contains 2103 pairs of faces, with corresponding gender, height and previous and current body weights, which allows for training computer vision models that can predict body-mass index (BMI) from profile pictures. (Enes Kocabey, Ferda Ofli, Yusuf Aytar, Javier Marin, Antonio Torralba, Ingmar Weber) [Before 28/12/19]
  44. FDDB: Face Detection Data set and Benchmark - studying unconstrained face detection; see the detection-matching sketch after this list (University of Massachusetts Computer Vision Laboratory) [Before 28/12/19]
  45. FDDB-360 - face detection in 360 degree fisheye images (Fu, Alvar, Bajic, and Vaughan) [29/12/19]
  46. FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network) [Before 28/12/19]
  47. Face Recognition Grand Challenge datasets (FRVT - Face Recognition Vendor Test) [Before 28/12/19]
  48. FMTV - Laval Face Motion and Time-Lapse Video Database. 238 thermal/video subjects with a wide range of poses and facial expressions acquired over 4 years (Ghiass, Bendada, Maldague) [Before 28/12/19]
  49. Face Super-Resolution Dataset - Ground truth HR-LR face images captured with a dual-camera setup (Chengchao Qu etc.) [Before 28/12/19]
  50. FaceScrub - A Dataset With Over 100,000 Face Images of 530 People (50:50 male and female) (H.-W. Ng, S. Winkler) [Before 28/12/19]
  51. FaceTracer Database - 15,000 faces (Neeraj Kumar, P. N. Belhumeur, and S. K. Nayar) [Before 28/12/19]
  52. Facial Expression Dataset - This dataset consists of 242 facial videos (168,359 frames) recorded in real world conditions. (Daniel McDuff et al.) [Before 28/12/19]
  53. Florence 2D/3D Hybrid Face Dataset - bridges the gap between 2D, appearance-based recognition techniques, and fully 3D approaches (Bagdanov, Del Bimbo, and Masi) [Before 28/12/19]
  54. Facial Recognition Technology (FERET) Database (USA National Institute of Standards and Technology) [Before 28/12/19]
  55. Gi4E Database - eye-tracking database with 1300+ images acquired with a standard webcam, corresponding to different subjects gazing at different points on a screen, including ground-truth 2D iris and corner points (Villanueva, Ponz, Sesma-Sanchez, Mikel Porta, and Cabeza) [Before 28/12/19]
  56. Google Facial Expression Comparison dataset - a large-scale facial expression dataset consisting of face image triplets along with human annotations that specify which two faces in each triplet form the most similar pair in terms of facial expression, which is different from datasets that focus mainly on discrete emotion classification or action unit detection (Vemulapalli, Agarwala) [Before 28/12/19]
  57. Hannah and her sisters database - a dense audio-visual person-oriented ground-truth annotation of faces, speech segments, shot boundaries (Patrick Perez, Technicolor) [Before 28/12/19]
  58. Headspace dataset - The Headspace dataset is a set of 3D images of the full human head, consisting of 1519 subjects wearing tight fitting latex caps to reduce the effect of hairstyles. (Christian Duncan, Rachel Armstrong, Alder Hey Craniofacial Unit, Liverpool, UK) [Before 28/12/19]
  59. Hong Kong Face Sketch Database [Before 28/12/19]
  60. IDIAP Head Pose Database (IHPD) - The dataset contains a set of meeting videos along with the head groundtruth of individual participants (around 128min)(Sileye Ba and Jean-Marc Odobez) [Before 28/12/19]
  61. IARPA Janus Benchmark datasets - IJB-A, IJB-B, IJB-C, FRVT (NIST) [Before 28/12/19]
  62. IMDB-WIKI - 500k+ face images with age and gender labels (Rasmus Rothe, Radu Timofte, Luc Van Gool ) [Before 28/12/19]
  63. Indian Movie Face database (IMFDB) - a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos (Vijay Kumar and C V Jawahar) [Before 28/12/19]
  64. Iranian Face Database - IFDB is the first image database in the Middle East, containing color facial images annotated with age, pose, and expression, with subjects ranging from 2 to 85 years old. (Mohammad Mahdi Dehshibi) [Before 28/12/19]
  65. Japanese Female Facial Expression (JAFFE) Database (Michael J. Lyons) [Before 28/12/19]
  66. LIRIS Children Spontaneous Facial Expression Video Database - spontaneous / natural facial expressions of 12 children in diverse settings with variable video recording scenarios showing six universal or prototypic emotional expressions (happiness, sadness, anger, surprise, disgust and fear). Children are recorded in a constraint-free environment (no restriction on head movement, no restriction on hand movement, free sitting setting, no restriction of any sort) while they watched specially built / selected stimuli. This constraint-free environment allowed the recording of spontaneous / natural expressions of children as they occur. The database has been validated by 22 human raters. (Khan, Crenn, Meyer, Bouakaz) [29/12/19]
  67. LFW: Labeled Faces in the Wild - unconstrained face recognition [Before 28/12/19]
  68. LS3D-W - a large-scale 3D face alignment dataset annotated with 68 points containing faces captured in an "in-the-wild" setting. (Adrian Bulat, Georgios Tzimiropoulos) [Before 28/12/19]
  69. MAFA: MAsked FAces - 30,811 images with 35,806 labeled MAsked FAces, six main attributes of each masked face. (Shiming Ge, Jia Li, Qiting Ye, Zhao Luo) [Before 28/12/19]
  70. Makeup Induced Face Spoofing (MIFS) - 107 makeup-transformations attempting to spoof a target identity. Also other datasets. (Antitza Dantcheva) [Before 28/12/19]
  71. Mexculture142 - Mexican Cultural heritage objects and eye-tracker gaze fixations (Montoya Obeso, Benois-Pineau, Garcia-Vazquez, Ramirez Acosta) [Before 28/12/19]
  72. MIT CBCL Face Recognition Database (Center for Biological and Computational Learning) [Before 28/12/19]
  73. MIT Collation of Face Databases (Ethan Meyers) [Before 28/12/19]
  74. MIT eye tracking database (1003 images) (Judd et al) [Before 28/12/19]
  75. MMI Facial Expression Database - 2900 videos and high-resolution still images of 75 subjects, annotated for FACS AUs. [Before 28/12/19]
  76. MORPH (Craniofacial Longitudinal Morphological Face Database) (University of North Carolina Wilmington) [Before 28/12/19]
  77. MPIIGaze dataset - 213,659 samples with eye images and gaze targets under different illumination conditions and natural head movement, collected from 15 participants using their laptops during daily use. (Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling) [Before 28/12/19]
  78. Manchester Annotated Talking Face Video Dataset (Timothy Cootes) [Before 28/12/19]
  79. MegaFace - 1 million faces in bounding boxes (Kemelmacher-Shlizerman, Seitz, Nech, Miller, Brossard) [Before 28/12/19]
  80. Music video dataset - 8 music videos from YouTube for developing multi-face tracking algorithms in unconstrained environments (Shun Zhang, Jia-Bin Huang, Ming-Hsuan Yang) [Before 28/12/19]
  81. NIST Face Recognition Grand Challenge (FRGC) (NIST) [Before 28/12/19]
  82. NIST mugshot identification database (USA National Institute of Standards and Technology) [Before 28/12/19]
  83. NRC-IIT Facial Video Database - this database contains pairs of short video clips each showing a face of a computer user sitting in front of the monitor exhibiting a wide range of facial expressions and orientations (Dmitry Gorodnichy) [Before 28/12/19]
  84. Notre Dame Iris Image Dataset (Patrick J. Flynn) [Before 28/12/19]
  85. Notre Dame face, IR face, 3D face, expression, crowd, and eye biometric datasets (Notre Dame) [Before 28/12/19]
  86. ORL face database: 40 people with 10 views (ATT Cambridge Labs) [Before 28/12/19]
  87. OUI-Adience Faces - unfiltered faces for gender and age classification plus 3D faces (OUI) [Before 28/12/19]
  88. Oxford: faces, flowers, multi-view, buildings, object categories, motion segmentation, affine covariant regions, misc (Oxford Visual Geometry Group) [Before 28/12/19]
  89. Pandora - POSEidon: Face-from-Depth for Driver Pose (Borghi, Venturelli, Vezzani, Cucchiara) [Before 28/12/19]
  90. PubFig: Public Figures Face Database (Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar) [Before 28/12/19]
  91. QMUL-SurvFace - A large-scale face recognition benchmark dedicated for real-world surveillance face analysis and matching. (QMUL Computer Vision Group) [Before 28/12/19]
  92. Re-labeled Faces in the Wild - original images, but aligned using "deep funneling" method. (University of Massachusetts, Amherst) [Before 28/12/19]
  93. RT-GENE: Real-Time Eye Gaze Estimation in Natural Environments 122,531 images with the subjects' ground truth eye gaze and head pose labels under free-viewing conditions and large camera-subject distances (Fischer, Chang, Demiris, Imperial College London) [Before 28/12/19]
  94. S3DFM - Edinburgh Speech-driven 3D Facial Motion Database. 77 people with 10 repetitions of speaking a passphrase: 1 second of 500 frames per second 600x600 pixels of {IR intensity video, registered depth images} plus synchronized 44.1 kHz audio. There are an additional 26 people (10 repetitions) moving their heads while speaking (Zhang, Fisher) [Before 28/12/19]
  95. Salient features in gaze-aligned recordings of human visual input - TB of human gaze-contingent data "in the wild" (Frank Schumann etc.) [Before 28/12/19]
  96. SAMM Dataset of Micro-Facial Movements - The dataset contains 159 spontaneous micro-facial movements obtained from 32 participants from 13 different ethnicities. (A.Davison, C.Lansley, N.Costen, K.Tan, M.H.Yap) [Before 28/12/19]
  97. SCface - Surveillance Cameras Face Database (Mislav Grgic, Kresimir Delac, Sonja Grgic, Bozidar Klimpak) [Before 28/12/19]
  98. SiblingsDB - The SiblingsDB contains two datasets depicting images of individuals related by sibling relationships. (Politecnico di Torino/Computer Graphics & Vision Group) [Before 28/12/19]
  99. SoF dataset - 42,592 face images with glasses under different illumination conditions; provided with face region, facial landmarks, facial expression, subject ID, gender, and age information (Afifi, Abdelhamed) [29/12/19]
  100. Solving the Robot-World Hand-Eye(s) Calibration Problem with Iterative Methods - These datasets were generated for calibrating robot-camera systems. (Amy Tabb) [Before 28/12/19]
  101. Spontaneous Emotion Multimodal Database (SEM-db) - non-posed reactions to visual stimulus data recorded with HD RGB, depth and IR frames of the face, EEG signal and eye gaze data (Fernandez Montenegro, Gkelias, Argyriou) [Before 28/12/19]
  102. The UNBC-McMaster Shoulder Pain Expression Archive Database - the "Painful Data" archive (Lucey et al.) [Before 28/12/19]
  103. VOCASET - 4D face dataset with about 29 minutes of 3D head scans captured at 60 fps and synchronized audio from 12 speakers (Cudeiro, Bolkart, Laidlaw, Ranjan, Black) [Before 28/12/19]
  104. Trondheim Kinect RGB-D Person Re-identification Dataset (Igor Barros Barbosa) [Before 28/12/19]
  105. UB KinFace Database - University of Buffalo kinship verification and recognition database [Before 28/12/19]
  106. UBIRIS: Noisy Visible Wavelength Iris Image Databases (University of Beira) [Before 28/12/19]
  107. UMDFaces - About 3.7 million annotated video frames from 22,000 videos and 370,000 annotated still images. (Ankan Bansal et al.) [Before 28/12/19]
  108. UPNA Head Pose Database - head pose database, with 120 webcam videos containing guided-movement sequences and free-movement sequences, including ground-truth head pose and automatically annotated 2D facial points. (Ariz, Bengoechea, Villanueva, Cabeza) [Before 28/12/19]
  109. UPNA Synthetic Head Pose Database - a synthetic replica of the UPNA Head Pose Database, with 120 videos with their 2D ground truth landmarks projections, their corresponding head pose ground truth, 3D head models and camera parameters. (Larumbe, Segura, Ariz, Bengoechea, Villanueva, Cabeza) [Before 28/12/19]
  110. UTIRIS cross-spectral iris image databank (Mahdi Hosseini) [Before 28/12/19]
  111. UvA-NEMO Smile Database - 1240 smile videos (597 spontaneous and 643 posed) from 400 subjects, including age, gender, and kinship annotations (Gevers, Dibeklioglu, Salah) [Before 28/12/19]
  112. VGGFace2 - VGGFace2 is a large-scale face recognition dataset covering large variations in pose, age, illumination, ethnicity and profession. (Oxford Visual Geometry Group) [Before 28/12/19]
  113. VIPSL Database - VIPSL Database is for research on face sketch-photo synthesis and recognition, including 200 subjects (1 photo and 5 sketches per subject). (Nannan Wang) [Before 28/12/19]
  114. Visual Search Zero Shot Database - Collection of human eyetracking data in three increasingly complex visual search tasks: object arrays, natural images and Waldo images. (Kreiman lab) [Before 28/12/19]
  115. VT-KFER: A Kinect-based RGBD+Time Dataset for Spontaneous and Non-Spontaneous Facial Expression Recognition - 32 subjects, 1,956 sequences of RGBD, six facial expressions in 3 poses (Aly, Trubanova, Abbott, White, and Youssef) [Before 28/12/19]
  116. Washington Facial Expression Database (FERG-DB) - a database of 6 stylized (Maya) characters with 7 annotated facial expressions (Deepali Aneja, Alex Colburn, Gary Faigin, Linda Shapiro, and Barbara Mones) [Before 28/12/19]
  117. WebCaricature Dataset - The WebCaricature dataset is a large photograph-caricature dataset consisting of 6042 caricatures and 5974 photographs from 252 persons collected from the web. (Jing Huo, Wenbin Li, Yinghuan Shi, Yang Gao and Hujun Yin) [Before 28/12/19]
  118. WIDER FACE: A Face Detection Benchmark - 32,203 images with 393,703 labeled faces, 61 event classes (Shuo Yang, Ping Luo, Chen Change Loy, Xiaoou Tang) [Before 28/12/19]
  119. Wider-360 - Datasets for face and object detection in fisheye images (Fu, Bajic, and Vaughan) [29/12/19]
  120. XM2VTS Face video sequences (295): The extended M2VTS Database (XM2VTS) - (Surrey University) [Before 28/12/19]
  121. Yale Face Database - 11 expressions of 10 people (A. Georghiades) [Before 28/12/19]
  122. Yale Face Database B - 576 viewing conditions of 10 people (A. Georghiades) [Before 28/12/19]
  123. York 3D Ear Dataset - The York 3D Ear Dataset is a set of 500 3D ear images, synthesized from detailed 2D landmarking, and available in both Matlab format (.mat) and PLY format (.ply). (Nick Pears, Hang Dai, Will Smith, University of York) [Before 28/12/19]
  124. York Univ Eye Tracking Dataset (120 images) (Neil Bruce) [Before 28/12/19]
  125. YouTube Faces DB - 3,425 videos of 1,595 different people. (Wolf, Hassner, Maoz) [Before 28/12/19]
  126. Zurich Natural Image - the image material used for creating natural stimuli in a series of eye-tracking studies (Frey et al.) [Before 28/12/19]
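
Many of the face detection entries above (for example FDDB and WIDER FACE) are evaluated by matching predicted face boxes against ground-truth annotations. The sketch below is a minimal, illustrative version of that matching step in Python, assuming axis-aligned [x, y, width, height] boxes and the commonly used IoU threshold of 0.5; the function names are placeholders, and the official protocols of the individual benchmarks (e.g., FDDB's ellipse-based continuous scoring) differ in the details.

```python
# Minimal sketch of IoU-based detection matching (not any benchmark's official tool).
# Assumptions: boxes are [x, y, width, height] in pixels; IoU >= 0.5 counts as a match.

def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def count_true_positives(detections, ground_truth, thresh=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    used = set()
    tp = 0
    for det in detections:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in used:
                continue
            score = iou(det, gt)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_iou >= thresh:
            used.add(best_j)
            tp += 1
    return tp
```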

Fingerprints

  1. FVC fingerprint verification competition 2002 dataset (University of Bologna) [Before 28/12/19]
  2. FVC fingerprint verification competition 2004 dataset (University of Bologna) [Before 28/12/19]
  3. Fingerprint Manual Minutiae Marker (FM3) Databases (Mehmet Kayaoglu, Berkay Topcu and Umut Uludag) [Before 28/12/19]
  4. NIST fingerprint databases (USA National Institute of Standards and Technology) [Before 28/12/19]
  5. SPD2010 Fingerprint Singular Points Detection Competition (SPD 2010 committee) [Before 28/12/19]

General Images

  1. A Dataset for Real Low-Light Image Noise Reduction - It contains pixel and intensity aligned pairs of images corrupted by low-light camera noise and their low-noise counterparts. (J. Anaya, A. Barbu) [Before 28/12/19]
  2. A database of paintings related to Vincent van Gogh - This is the dataset VGDB-2016 built for the paper "From Impressionism to Expressionism: Automatically Identifying Van Gogh's Paintings" (Guilherme Folego and Otavio Gomes and Anderson Rocha) [Before 28/12/19]
  3. AMOS: Archive of Many Outdoor Scenes (20+m) (Nathan Jacobs) [Before 28/12/19]
  4. Aerial images - Building detection from aerial images using invariant color features and shadow information. (Beril Sirmacek) [Before 28/12/19]
  5. Approximated overlap error dataset - Image pairs with sparse sets of ground-truth matches for evaluating local image descriptors (Fabio Bellavia) [Before 28/12/19]
  6. AutoDA (Automatic Dataset Augmentation) - An automatically constructed image dataset including 12.5 million images with relevant textual information for the 1000 categories of ILSVRC2012 (Bai, Yang, Ma, Zhao) [Before 28/12/19]
  7. BGU Hyperspectral Image Database of Natural Scenes (Ohad Ben-Shahar and Boaz Arad) [Before 28/12/19]
  8. Brown Univ Large Binary Image Database (Ben Kimia) [Before 28/12/19]
  9. Butterfly-200 - Butterfly-200 is an image dataset for fine-grained image classification, which contains 25,279 images and covers four category levels: 200 species, 116 genera, 23 subfamilies, and 5 families. (Tianshui Chen) [Before 28/12/19]
  10. CIFAR-10 classes with different WB settings - 15,098 rendered images that reflect real in-camera white-balance settings (Afifi, Brown) [29/12/19]
  11. CMP Facade Database - Includes 606 rectified images of facades from various places with 12 architectural classes annotated. (Radim Tylecek) [Before 28/12/19]
  12. Caltech-UCSD Birds-200-2011 (Catherine Wah) [Before 28/12/19]
  13. Color correction dataset - Homography-based registered images for evaluating color correction algorithms for image stitching. (Fabio Bellavia) [Before 28/12/19]
  14. Columbia Multispectral Image Database (F. Yasuma, T. Mitsunaga, D. Iso, and S.K. Nayar) [Before 28/12/19]
  15. DAQUAR (Visual Turing Challenge) - A dataset containing questions and answers about real-world indoor scenes. (Mateusz Malinowski, Mario Fritz) [Before 28/12/19]
  16. Darmstadt Noise Dataset - 50 pairs of real noisy images and corresponding ground truth images (RAW and sRGB); see the PSNR sketch after this list (Tobias Plötz and Stefan Roth) [Before 28/12/19]
  17. Dataset of American Movie Trailers 2010-2014 - Contains links to 474 Hollywood movie trailers along with associated metadata (genre, budget, runtime, release, MPAA rating, screens released, sequel indicator) (USC Signal Analysis and Interpretation Lab) [Before 28/12/19]
  18. DIML Multimodal Benchmark - To evaluate matching performance under photometric and geometric variations, 100 images of 1200 x 800 size. (Yonsei University) [Before 28/12/19]
  19. DSLR Photo Enhancement Dataset (DPED) - 22K photos taken synchronously in the wild by three smartphones and one DSLR camera, useful for comparing high-quality images inferred from multiple low-quality images (Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool). [Before 28/12/19]
  20. Flickr-style - 80K Flickr photographs annotated with 20 curated style labels, and 85K paintings annotated with 25 style/genre labels (Sergey Karayev) [Before 28/12/19]
  21. Flickr1024: A Dataset for Stereo Image Super-resolution - 1024 high-quality images pairs and covers diverse senarios (Wang, Wang, Yang, An, Guo) [Before 28/12/19]
  22. Forth Multispectral Imaging Datasets - images from 23 spectral bands each from 5 paintings. Images are annotated with ground truth data. (Karamaoynas Polykarpos et al) [Before 28/12/19]
  23. General 100 Dataset - General-100 dataset contains 100 bmp-format images (with no compression), which are well-suited for super-resolution training (Chao Dong, Chen Change Loy, Xiaoou Tang) [Before 28/12/19]
  24. GOPRO dataset - Blurred image dataset with sharp image ground truth (Nah, Kim, and Lee) [Before 28/12/19]
  25. HIPR2 Image Catalogue of different types of images (Bob Fisher et al) [Before 28/12/19]
  26. HPatches - A benchmark and evaluation of handcrafted and learned local descriptors (Balntas, Lenc, Vedaldi, Mikolajczyk) [Before 28/12/19]
  27. Hyperspectral images for spatial distributions of local illumination in natural scenes - Thirty calibrated hyperspectral radiance images of natural scenes with probe spheres embedded for local illumination estimation. (Nascimento, Amano & Foster) [Before 28/12/19]
  28. Hyperspectral images of natural scenes - 2002 (David H. Foster) [Before 28/12/19]
  29. Hyperspectral images of natural scenes - 2004 (David H. Foster) [Before 28/12/19]
  30. ISPRS multi-platform photogrammetry dataset - 1: Nadir and oblique aerial images plus 2: Combined UAV and terrestrial images (Francesco Nex and Markus Gerke) [Before 28/12/19]
  31. Image & Video Quality Assessment at LIVE - used to develop picture quality algorithms (the University of Texas at Austin) [Before 28/12/19]
  32. ImageNet Large Scale Visual Recognition Challenges - Currently 200 object classes and 500+K images (Alex Berg, Jia Deng, Fei-Fei Li and others) [Before 28/12/19]
  33. ImageNet Linguistically organised (WordNet) Hierarchical Image Database - 10E7 images, 15K categories (Li Fei-Fei, Jia Deng, Hao Su, Kai Li) [Before 28/12/19]
  34. Improved 3D Sparse Maps for High-performance Structure from Motion with Low-cost Omnidirectional Robots - Evaluation Dataset - Data set used in research paper doi:10.1109/ICIP.2015.7351744 (Breckon, Toby P., Cavestany, Pedro) [Before 28/12/19]
  35. Konstanz visual quality databases - Large-scale image and video databases for the development and evaluation of visual quality assessment algorithms. (MMSP group, University of Konstanz) [Before 28/12/19]
  36. Kodak McMaster demosaic dataset - (Zhang, Wu, Buades, Li) [Before 28/12/19]
  37. LabelMeFacade Database - 945 labeled building images (Erik Rodner et al) [Before 28/12/19]
  38. Local illumination hyperspectral radiance images - Thirty hyperspectral radiance images of natural scenes with embedded probe spheres for local illumination estimates (Sérgio M. C. Nascimento, Kinjiro Amano, David H. Foster) [Before 28/12/19]
  39. McGill Calibrated Colour Image Database (Adriana Olmos and Fred Kingdom) [Before 28/12/19]
  40. Multiply Distorted Image Database - a database for evaluating the results of image quality assessment metrics on multiply distorted images. (Fei Zhou) [Before 28/12/19]
  41. NAS-Bench-102 - An algorithm-agnostic nas benchmark with detailed information (training/validation/test loss/accuracy etc) of 15,625 architectures on three datasets. (Xuanyi Dong) [29/12/19]
  42. NPRgeneral - A standardized collection of images for evaluating image stylization algorithms. (David Mould, Paul Rosin) [Before 28/12/19]
  43. nuTonomy scenes dataset (nuScenes) - The nuScenes dataset is a large-scale autonomous driving dataset. It features: Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS), 1000 scenes of 20s each, 1,440,000 camera images, 400,000 lidar sweeps, two diverse cities: Boston and Singapore, left versus right hand traffic, detailed map information, manual annotations for 25 object classes, 1.1M 3D bounding boxes annotated at 2Hz, attributes such as visibility, activity and pose. (Caesar et al) [Before 28/12/19]
  44. NYU Symmetry Database - 176 single-symmetry and 63 multiple-symmetry images (Marcelo Cicconet and Davi Geiger) [Before 28/12/19]
  45. OceanDark dataset - 100 low-lighting underwater images from underwater sites in the Northeast Pacific Ocean. 1400x1000 pixels, varying lighting and recording conditions (Ocean Networks Canada) [Before 28/12/19]
  46. OTCBVS Thermal Imagery Benchmark Dataset Collection (Ohio State Team) [Before 28/12/19]
  47. PAnorama Sparsely STructured Areas Datasets - the PASSTA datasets used for evaluation of the image alignment (Andreas Robinson) [Before 28/12/19]
  48. QMUL-OpenLogo - A logo detection benchmark for testing the model generalisation capability in detecting a variety of logo objects in natural scenes with the majority logo classes unlabelled. (QMUL Computer Vision Group) [Before 28/12/19]
  49. RESIDE (Realistic Single Image DEhazing) - The current largest-scale benchmark consisting of both synthetic and real-world hazy images, for image dehazing research. RESIDE highlights diverse data sources and image contents, and serves various training or evaluation purposes. (Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, Zhangyang Wang) [Before 28/12/19]
  50. Rijksmuseum Challenge 2014 - It consist of 100K art objects from the rijksmuseum and comes with an extensive xml files describing each object. (Thomas Mensink and Jan van Gemert) [Before 28/12/19]
  51. See in the Dark - 77 Gb of dark images (Chen, Chen, Xu, and Koltun) [Before 28/12/19]
  52. Smartphone Image Denoising Dataset (SIDD) - The Smartphone Image Denoising Dataset (SIDD) consists of about 30,000 noisy images with corresponding high-quality ground truth in both raw-RGB and sRGB spaces obtained from 10 scenes with different lighting conditions using five representative smartphone cameras. (Abdelrahman Abdelhamed, Stephen Lin, Michael S. Brown) [Before 28/12/19]
  53. Rendered WB dataset - 100,000+ rendered sRGB images with different white balance (WB) settings (Afifi, Price, Cohen, Brown) [29/12/19]
  54. Stanford Street View Image, Pose, and 3D Cities Dataset - a large scale dataset of street view images (25 million images and 118 matching image pairs) with their relative camera pose, 3D models of cities, and 3D metadata of images. (Zamir, Wekel, Agrawal, Malik, Savarese) [Before 28/12/19]
  55. TESTIMAGES - Huge and free collection of sample images designed for analysis and quality assessment of different kinds of displays (i.e. monitors, televisions and digital cinema projectors) and image processing techniques. (Nicola Asuni) [Before 28/12/19]
  56. Time-Lapse Hyperspectral Radiance Images of Natural Scenes - Four time-lapse sequences of 7-9 calibrated hyperspectral radiance images of natural scenes taken over the day. (Foster, D.H., Amano, K., & Nascimento, S.M.C.) [Before 28/12/19]
  57. Time-lapse hyperspectral radiance images - Four time-lapse sequences of 7-9 calibrated hyperspectral images of natural scenes, spectra at 10-nm intervals (David H. Foster, Kinjiro Amano, Sérgio M. C. Nascimento) [Before 28/12/19]
  58. Tiny Images Dataset - 79 million 32x32 color images (Fergus, Torralba, Freeman) [Before 28/12/19]
  59. TURBID Dataset - five different subsets of degraded images with their respective ground truth. Subsets Milk and DeepBlue have 20 images each and the subset Chlorophyll has 42 images (Amanda Duarte) [Before 28/12/19]
  60. UT Snap Angle 360° Dataset - A list of 360° videos of four activities (disney, parade, ski, concert) from youtube (Kristen Grauman, UT Austin) [Before 28/12/19]
  61. UT Snap Point Dataset - Human judgement on snap point quality of a subset of frames from UT Egocentric dataset and a newly collected mobile robot dataset (frames are also included) (Bo Xiong, Kristen Grauman, UT Austin) [Before 28/12/19]
  62. Visual Dialog - 120k human-human dialogs on COCO images, 10 rounds of QA per dialog (Das, Kottur, Gupta, Singh, Yadav, Moura, Parikh, Batra) [Before 28/12/19]
  63. Visual Question Answering - 254K images, 764K questions, ground truth (Agrawal, Lu, Antol, Mitchell, Zitnick, Batra, Parikh) [Before 28/12/19]
  64. Visual Question Generation - 15k images (including both object-centric and event-centric images), 75k natural questions asked about the images which can evoke further conversation (Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende) [Before 28/12/19]
  65. VQA Human Attention - 60k human attention maps for visual question answering i.e. where humans choose to look to answer questions about images (Das, Agrawal, Zitnick, Parikh, Batra) [Before 28/12/19]
  66. Wild Web tampered image dataset - A large collection of tampered images from Web and social media sources, including ground-truth annotation masks for tampering localization (Markos Zampoglou, Symeon Papadopoulos) [Before 28/12/19]
  67. YFCC100M: The New Data in Multimedia Research - This publicly available curated dataset of 100 million photos and videos is free and legal for all. (Bart Thomee, Yahoo Labs and Flickr in San Francisco,etc.) [Before 28/12/19]
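
Several of the restoration-oriented entries above (the low-light noise set, the Darmstadt Noise Dataset, SIDD, GOPRO and similar) pair degraded images with clean references, and results on such pairs are most often reported as PSNR. The snippet below is a minimal sketch of that computation, assuming 8-bit images loaded as NumPy arrays; it is illustrative and not the official evaluation code of any listed dataset.

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images of identical shape.

    Assumes 8-bit data by default (max_val = 255); pass max_val=1.0 for
    images already scaled to [0, 1].
    """
    reference = reference.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```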

General RGBD and Depth Datasets

Note: there are 3D datasets elsewhere as well, e.g. in Objects, Scenes, and Actions.

See also: List of RGBD datasets.

  1. 3D60: 3D Vision Indoor Spherical Panoramas - A multimodal dataset of 360 spherical panoramas containing paired color images, depth and normal maps, as well as vertical and horizontal stereo pairs (with their assorted depth and normal maps as well) that can be used to train or evaluate a variety of 3D vision tasks. (Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Petros Daras) [Before 28/12/19]
  2. 3D-Printed RGB-D Object Dataset - 5 objects with groundtruth CAD models and camera trajectories, recorded with various quality RGB-D sensors. (Siemens & TUM) [Before 28/12/19]
  3. 3DCOMET - 3DCOMET is a dataset for testing 3D data compression methods. (Miguel Cazorla, Javier Navarrete,Vicente Morell, Miguel Cazorla, Diego Viejo, Jose Garcia-Rodriguez, Sergio Orts.) [Before 28/12/19]
  4. 3D articulated body - 3D reconstruction of an articulated body with rotation and translation. Single camera, varying focal length. Every scene may have an articulated body moving. There are four kinds of data sets included. A sample reconstruction result is included which uses only four images of the scene. (Prof Jihun Park) [Before 28/12/19]
  5. A Dataset for Non-Rigid Reconstruction from RGB-D Data - Eight scenes for reconstructing non-rigid geometry from RGB-D data, each containing several hundred frames along with our results. (Matthias Innmann, Michael Zollhoefer, Matthias Niessner, Christian Theobalt, Marc Stamminger) [Before 28/12/19]
  6. A Large Dataset of Object Scans - 392 objects in 9 classes, hundreds of frames each (Choi, Zhou, Miller, Koltun) [Before 28/12/19]
  7. Articulated Object Challenge - 4 articulated objects consisting of rigid parts connected by 1D revolute and prismatic joints, 7000+ RGBD images with annotations for 6D pose estimation (Frank Michel, Alexander Krull, Eric Brachmann, Michael Y. Yang, Stefan Gumhold, Carsten Rother) [Before 28/12/19]
  8. BigBIRD - 100 objects, with 600 3D point clouds and 600 high-resolution color images per object, spanning all views (Singh, Sha, Narayan, Achim, Abbeel) [Before 28/12/19]
  9. CAESAR Civilian American and European Surface Anthropometry Resource Project - 4000 3D human body scans (SAE International) [Before 28/12/19]
  10. CIN 2D+3D object classification dataset - segmented color and depth images of objects from 18 categories of common household and office objects (Bjorn Browatzki et al) [Before 28/12/19]
  11. CoRBS - an RGB-D SLAM benchmark, providing the combination of real depth and color data together with a ground truth trajectory of the camera and a ground truth 3D model of the scene (Oliver Wasenmuller) [Before 28/12/19]
  12. CSIRO synthetic deforming people - synthetic RGBD dataset for evaluating non-rigid 3D reconstruction: 2 subjects and 4 camera trajectories (Elanattil and Moghadam) [Before 28/12/19]
  13. CTU Garment Folding Photo Dataset - Color and depth images from various stages of garment folding. (Sushkov R., Melkumov I., Smutný V. (Czech Technical University in Prague)) [Before 28/12/19]
  14. CTU Garment Sorting Dataset - Dataset of garment images, detailed stereo images, depth images and weights. (Petrik V., Wagner L. (Czech Technical University in Prague)) [Before 28/12/19]
  15. Clothing part dataset - The clothing part dataset consists of image and depth scans, acquired with a Kinect, of garments laying on a table, with over a thousand part annotations (collar, cuffs, hood, etc) using polygonal masks. (Arnau Ramisa, Guillem Alenyà, Francesc Moreno-Noguer and Carme Torras) [Before 28/12/19]
  16. Cornell-RGBD-Dataset - Office Scenes (Hema Koppula) [Before 28/12/19]
  17. CVSSP Dynamic RGBD Modelling 2015 - This dataset contains eight RGBD sequences of general dynamic scenes captured using the Kinect V1/V2 as well as two synthetic sequences. (Charles Malleson, CVSSP, University of Surrey) [Before 28/12/19]
  18. Deformable 3D Reconstruction Dataset - two single-stream RGB-D sequences of dynamically moving mechanical toys together with ground-truth 3D models in the canonical rest pose. (Siemens, TUM) [Before 28/12/19]
  19. Delft Windmill Interior and Exterior Laser Scanning Point Clouds (Beril Sirmacek) [Before 28/12/19]
  20. Diabetes60 - RGB-D images of 60 western dishes, home made. Data was recorded using a Microsoft Kinect V2. (Patrick Christ and Sebastian Schlecht) [Before 28/12/19]
  21. ETH3D - Benchmark for multi-view stereo and 3D reconstruction, covering a variety of indoor and outdoor scenes, with ground truth acquired by a high-precision laser scanner. (Thomas Schöps, Johannes L. Schönberger, Silvano Galliani, Torsten Sattler, Konrad Schindler, Marc Pollefeys, Andreas Geiger) [Before 28/12/19]
  22. EURECOM Kinect Face Database - 52 people, 2 sessions, 9 variations, 6 facial landmarks. (Jean-Luc DUGELAY et al) [Before 28/12/19]
  23. G4S meta rooms - RGB-D data 150 sweeps with 18 images per sweep. (John Folkesson et al.) [Before 28/12/19]
  24. Georgiatech-Metz Symphony Lake Dataset - 5 million RGBD outdoor images over 4 years from 121 surveys of a lakeshore. (Griffith and Pradalier) [Before 28/12/19]
  25. Goldfinch: GOogLe image-search Dataset for FINe grained CHallenges - a large-scale dataset for fine-grained bird (11K species), butterfly (14K species), aircraft (409 types), and dog (515 breeds) recognition. (Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei) [Before 28/12/19]
  26. Headspace dataset - The Headspace dataset is a set of 3D images of the full human head, consisting of 1519 subjects wearing tight fitting latex caps to reduce the effect of hairstyles. (Christian Duncan, Rachel Armstrong, Alder Hey Craniofacial Unit, Liverpool, UK) [Before 28/12/19]
  27. House3D - House3D is a virtual 3D environment which consists of thousands of indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. It consists of over 45k indoor 3D scenes, ranging from studios to two-storied houses with swimming pools and fitness rooms. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views. The renderer runs at thousands of frames per second, making it suitable for large-scale RL training. (Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian, Facebook Research) [Before 28/12/19]
  28. IMPART multi-view/multi-modal 2D+3D film production dataset - LIDAR, video, 3D models, spherical camera, RGBD, stereo, action, facial expressions, etc. (Univ. of Surrey) [Before 28/12/19]
  29. Industrial 3D Object Detection Dataset (MVTec ITODD) - depth and gray value data of 28 objects in 3500 labeled scenes for 3D object detection and pose estimation with a strong focus on industrial settings and applications (MVTec Software GmbH, Munich) [Before 28/12/19]
  30. Kinect v2 Dataset - Efficient Multi-Frequency Phase Unwrapping using Kernel Density Estimation (Felix etc.) [Before 28/12/19]
  31. KOMATSUNA dataset - The dataset is designed for instance segmentation, tracking and reconstruction of leaves, using both sequential multi-view RGB images and depth images. (Hideaki Uchiyama, Kyushu University) [Before 28/12/19]
  32. Make3D Laser+Image data - about 1000 RGB outdoor images with aligned laser depth images (Saxena, Chung, Ng, Sun) [Before 28/12/19]
  33. McGill-Reparti Artificial Perception Database - RGBD data from four cameras and unfiltered Vicon skeletal data of two human subjects performing simulated assembly tasks on a car door (Andrew Phan, Olivier St-Martin Cormier, Denis Ouellet, Frank P. Ferrie). [Before 28/12/19]
  34. Meta rooms - RGB-D data comprised of 28 aligned depth camera images collected by having a robot go to a specific place and do a 360-degree pan with various tilts. (John Folkesson et al.) [Before 28/12/19]
  35. METU Multi-Modal Stereo Datasets ("Benchmark Datasets for Multi-Modal Stereo-Vision") - The METU Multi-Modal Stereo Datasets include benchmark datasets for multi-modal stereo vision, composed of two datasets: (1) synthetically altered stereo image pairs from the Middlebury Stereo Evaluation Dataset and (2) visible-infrared image pairs captured from a Kinect device. (Dr. Mustafa Yaman, Dr. Sinan Kalkan) [Before 28/12/19]
  36. MHT RGB-D - collected by a robot every 5 min over 16 days by the University of Lincoln. (John Folkesson et al.) [Before 28/12/19]
  37. Moving INfants In RGB-D (MINI-RGBD) - A synthetic, realistic RGB-D data set for infant pose estimation containing 12 sequences of moving infants with ground truth joint positions. (N. Hesse, C. Bodensteiner, M. Arens, U. G. Hofmann, R. Weinberger, A. S. Schroeder) [Before 28/12/19]
  38. Multi-sensor 3D Object Dataset for Object Recognition with Full Pose Estimation (Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, et al.) [Before 28/12/19]
  39. NTU RGB+D Action Recognition Dataset - NTU RGB+D is a large scale dataset for human action recognition(Amir Shahroudy) [Before 28/12/19]
  40. nuTonomy scenes dataset (nuScenes) - The nuScenes dataset is a large-scale autonomous driving dataset. It features: Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS), 1000 scenes of 20s each, 1,440,000 camera images, 400,000 lidar sweeps, two diverse cities: Boston and Singapore, left versus right hand traffic, detailed map information, manual annotations for 25 object classes, 1.1M 3D bounding boxes annotated at 2Hz, attributes such as visibility, activity and pose. (Caesar et al) [Before 28/12/19]
  41. NYU Depth Dataset V2 - Indoor Segmentation and Support Inference from RGBD Images [Before 28/12/19]
  42. Oakland 3-D Point Cloud Dataset (Nicolas Vandapel) [Before 28/12/19]
  43. Pacman project - Synthetic RGB-D images of 400 objects from 20 classes. Generated from 3D mesh models (Vladislav Kramarev, Umit Rusen Aktas, Jeremy L. Wyatt.) [Before 28/12/19]
  44. Procedural Human Action Videos - This dataset contains about 40,000 videos for human action recognition that were generated using a 3D game engine. The dataset contains about 6 million frames which can be used to train and evaluate models not only for action recognition but also for depth map estimation, optical flow, instance segmentation, semantic segmentation, 3D and 2D pose estimation, and attribute learning. (Cesar Roberto de Souza) [Before 28/12/19]
  45. RGB-D-based Action Recognition Datasets - Paper that includes the list and links of different rgb-d action recognition datasets. (Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang) [Before 28/12/19]
  46. RGB-D Part Affordance Dataset - RGB-D images and ground-truth affordance labels for 105 kitchen, workshop and garden tools, and 3 cluttered scenes (Myers, Teo, Fermuller, Aloimonos) [Before 28/12/19]
  47. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes - ScanNet is a dataset of richly-annotated RGB-D scans of real-world environments containing 2.5M RGB-D images in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations. (Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, Matthias Niessner) [Before 28/12/19]
  48. SceneNN: A Scene Meshes Dataset with aNNotations - RGB-D scene dataset with 100+ indoor scenes, labeled triangular mesh, voxel and pixel. (Hua, Pham, Nguyen, Tran, Yu, and Yeung) [Before 28/12/19]
  49. Semantic-8: 3D point cloud classification with 8 classes (ETH Zurich) [Before 28/12/19]
  50. Small office data sets - Kinect depth images every 5 seconds beginning in April 2014 and on-going. (John Folkesson et al.) [Before 28/12/19]
  51. Stereo and ToF dataset with ground truth - The dataset contains 5 different scenes acquired with a Time-of-flight sensor and a stereo setup. Ground truth information is also provided. (Carlo Dal Mutto, Pietro Zanuttigh, Guido M. Cortelazzo) [Before 28/12/19]
  52. SYNTHIA - Large set (~half million) of virtual-world images for training autonomous cars to see. (ADAS Group at Computer Vision Center) [Before 28/12/19]
  53. Taskonomy - Over 4.5 million real images each with ground truth for 25 semantic, 2D, and 3D tasks. (Zamir, Sax, Shen, Guibas, Malik, Savarese) [Before 28/12/19]
  54. TAU Agent Dataset - a high-resolution RGB-D dataset, created using Blender. Contains 530 high-resolution RGB images with corresponding pixel-wise ground truth depth maps (Haim, Elmalem, Giryes, Bronstein, and Marom) [30/12/19]
  55. THU-READ (Tsinghua University RGB-D Egocentric Action Dataset) - THU-READ is a large-scale dataset for action recognition in RGBD videos with pixel-level hand annotation. (Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou) [Before 28/12/19]
  56. TUM RGB-D Benchmark - Dataset and benchmark for the evaluation of RGB-D visual odometry and SLAM algorithms; see the depth back-projection sketch after this list (Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard and Daniel Cremers) [Before 28/12/19]
  57. UC-3D Motion Database - Available data types encompass high resolution Motion Capture, acquired with MVN Suit from Xsens and Microsoft Kinect RGB and depth images. (Institute of Systems and Robotics, Coimbra, Portugal) [Before 28/12/19]
  58. Uni Bremen Open, Abdominal Surgery RGB Dataset - Recording of a complete, open, abdominal surgery using a Kinect v2 that was mounted directly above the patient looking down at patient and staff. (Joern Teuber, Gabriel Zachmann, University of Bremen) [Before 28/12/19]
  59. USF Range Image Database - 400+ laser range finder and structured light camera images, many with ground truth segmentations (Adam et al.) [Before 28/12/19]
  60. Washington RGB-D Object Dataset - 300 common household objects and 14 scenes. (University of Washington and Intel Labs Seattle) [Before 28/12/19]
  61. Witham Wharf - For RGB-D of eight locations collect by robot every 10 min over ~10 days by the University of Lincoln. (John Folkesson et al.) [Before 28/12/19]
  62. York 3D Ear Dataset - The York 3D Ear Dataset is a set of 500 3D ear images, synthesized from detailed 2D landmarking, and available in both Matlab format (.mat) and PLY format (.ply). (Nick Pears, Hang Dai, Will Smith, University of York) [Before 28/12/19]
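
Most of the RGB-D entries above (for example the TUM RGB-D Benchmark, NYU Depth V2 and ScanNet) store depth as a scaled integer image together with pinhole camera intrinsics, and a common first step is back-projecting a depth map into a 3D point cloud. The sketch below illustrates that step under those assumptions; the 5000-units-per-metre scale follows the TUM RGB-D convention, other datasets often use 1000 (millimetres), and the function name and defaults are placeholders rather than any dataset's official tooling.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=5000.0):
    """Back-project an H x W depth image into an N x 3 point cloud (camera frame).

    depth       -- integer depth image (e.g., 16-bit PNG as stored by TUM RGB-D)
    fx, fy      -- focal lengths in pixels
    cx, cy      -- principal point in pixels
    depth_scale -- depth units per metre (5000 for TUM RGB-D; often 1000 elsewhere)
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float64) / depth_scale          # depth in metres
    valid = z > 0                                       # zero marks missing depth
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```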

General Videos

  1. AlignMNIST - An artificially extended version of the MNIST handwritten digit dataset. (Søren Hauberg) [Before 28/12/19]
  2. Audio-Visual Event (AVE) dataset- AVE dataset contains 4143 YouTube videos covering 28 event categories and videos in AVE dataset are temporally labeled with audio-visual event boundaries. (Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, and Chenliang Xu) [Before 28/12/19]
  3. Dataset of Multimodal Semantic Egocentric Video (DoMSEV) - Labeled 80-hour Dataset of Multimodal Semantic Egocentric Videos (DoMSEV) covering a wide range of activities, scenarios, recorders, illumination and weather conditions. (UFMG, Michel Silva, Washington Ramos, João Ferreira, Felipe Chamone, Mario Campos, Erickson R. Nascimento) [Before 28/12/19]
  4. DAVIS: Video Object Segmentation dataset 2016 - A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation (F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung) [Before 28/12/19]
  5. DAVIS: Video Object Segmentation dataset 2017 - The 2017 DAVIS Challenge on Video Object Segmentation (J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbelaez, A. Sorkine-Hornung, and L. Van Gool) [Before 28/12/19]
  6. EGO-CH - a large egocentric video dataset acquired by real visitors in two different cultural sites. The dataset includes more than 27 hours of video acquired by 70 different subjects. The overall dataset includes labels for 26 environments and over 200 Points of Interest (POIs). (Giovanni Maria Farinella) [31/12/19]
  7. FAIR-Play - 1,871 video clips (~5 hrs) and their corresponding binaural audio clips recorded in a music room (Gao and Grauman) [29/12/19]
  8. GoPro-Gyro Dataset - egocentric videos (Linkoping Computer Vision Laboratory) [Before 28/12/19]
  9. Image & Video Quality Assessment at LIVE - used to develop picture quality algorithms (the University of Texas at Austin) [Before 28/12/19]
  10. Large scale YouTube video dataset - 156,823 videos (2,907,447 keyframes) crawled from YouTube videos (Yi Yang) [Before 28/12/19]
  11. Movie Memorability Dataset - memorable movie clips and ground truth of detail memorability, 660 short movie excerpts extracted from 100 Hollywood-like movies (Cohendet, Yadati, Duong and Demarty) [Before 28/12/19]
  12. MovieQA - teach machines to understand stories by answering questions about them. 15000 multiple choice QAs, 400+ movies. (M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler) [Before 28/12/19]
  13. Multispectral visible-NIR video sequences - Annotated multispectral video, visible + NIR (LE2I, Université de Bourgogne) [Before 28/12/19]
  14. Moments in Time Dataset - 1M 3-second videos annotated with action type, the largest dataset of its kind for action recognition and understanding in video. (Monfort, Oliva, et al.) [Before 28/12/19]
  15. Near duplicate video retrieval dataset - This database consists of 156,823 videos sequences (2,907,447 keyframes), which were crawled from YouTube during the period of July 2010 to September 2010. (Jingkuan Song, Yi Yang, Zi Huang, Heng Tao Shen, Richang Hong) [Before 28/12/19]
  16. PHD2: Personalized Highlight Detection Dataset - PHD2 is a dataset with personalized highlight information, which allows to train highlight detection models that use information about the user, when making predictions. (Ana Garcia del Molino, Michael Gygli) [Before 28/12/19]
  17. Sports-1M - Dataset for sports video classification containing 487 classes and 1.2M videos; see the frame-sampling sketch after this list (Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar and Li Fei-Fei) [Before 28/12/19]
  18. nuTonomy scenes dataset (nuScenes) - The nuScenes dataset is a large-scale autonomous driving dataset. It features: Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS), 1000 scenes of 20s each, 1,440,000 camera images, 400,000 lidar sweeps, two diverse cities: Boston and Singapore, left versus right hand traffic, detailed map information, manual annotations for 25 object classes, 1.1M 3D bounding boxes annotated at 2Hz, attributes such as visibility, activity and pose. (Caesar et al) [Before 28/12/19]
  19. REDS (REalistic and Dynamic Scenes) - high-quality realistic blurry video dataset with reference sharp frames (improved version of GOPRO) (Nah, Baik, Hong, Moon, Son, Timofte and Lee) [4/1/2020]
  20. Video Sequences - used for research on Euclidean upgrades based on minimal assumptions about the camera (Kenton McHenry) [Before 28/12/19]
  21. Video Stacking Dataset - A Virtual Tripod for Hand-held Video Stacking on Smartphones (Erik Ringaby etc.) [Before 28/12/19]
  22. VideoMem Dataset - The VideoMem or Video Memorability Database is a collection of sound-less video excerpts and their corresponding ground-truth memorability files. The memorability scores are computed based on the measurement of short-term and long-term memory performances when recognizing small video excerpts a few minutes after viewing them for the short-term case, and 24 to 72 hours later, for the long-term case. It is accompanied with video features extracted from the video excerpts. It is intended to be used for understanding the memorability of videos and for assessing the quality of methods for predicting the memorability of multimedia content. (Cohendet, Demarty, Duong and Engilberge) [6/1/20]
  23. YFCC100M videos - A benchmark on the video subset of YFCC100M which includes the videos, the video content features and the API to a state-of-the-art video content engine. (Lu Jiang) [Before 28/12/19]
  24. YFCC100M: The New Data in Multimedia Research - This publicly available curated dataset of 100 million photos and videos is free and legal for all. (Bart Thomee, Yahoo Labs and Flickr in San Francisco,etc.) [Before 28/12/19]
  25. YouTube-BoundingBoxes - 5.6 million accurate human-annotated BB from 23 object classes tracked across frames, from 240,000 YouTube videos, with a strong focus on the person class (1.3 million boxes) (Real, Shlens, Pan, Mazzocchi, Vanhoucke, Khan, Kakarla et al) [Before 28/12/19]
  26. YouTube-8M - Dataset for video classification in the wild, containing pre-extracted frame level features from 8M videos, and 4800 classes. (Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev,George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan) [Before 28/12/19]
  27. YUP++ / Dynamic Scenes dataset - 20 outdoor scene classes, each with 60 colour videos (each 5 seconds, 480 pixels wide, 24-30 fps) from 60 different scenes. Half of the videos are with a static camera and half with a moving camera (Feichtenhofer, Pinz, Wildes) [Before 28/12/19]
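
For the large video classification entries above (for example Sports-1M and the Moments in Time Dataset), a typical preprocessing step is to sample a fixed number of frames per clip before feature extraction or training. The sketch below shows one way to do this with OpenCV; the function name and the choice of 16 frames are illustrative assumptions, not part of any listed dataset's pipeline.

```python
import cv2
import numpy as np

def sample_frames(video_path, num_frames=16):
    """Uniformly sample num_frames RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR
    cap.release()
    return np.stack(frames) if frames else np.empty((0,))
```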
