NTU RGB+D Datasets

Dataset Size

The dataset consists of 56,880 action samples. Each sample is provided in four data modalities, listed here with their storage sizes:

  • RGB videos: 136 GB
  • Depth map sequences:
    • Masked depth maps: 83 GB
    • Full depth maps: 886 GB
  • 3D skeletal data: 5.8 GB
  • Infrared (IR) videos: 221 GB
  • Total: 1.3 TB
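As a quick sanity check, the per-modality sizes listed above can be summed. Note that the total counts both the masked and the full depth maps. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB); the dictionary keys are illustrative names, not dataset identifiers.

```python
# Per-modality sizes in GB, copied from the list above.
# Assumption: decimal units, so 1 TB = 1000 GB.
SIZES_GB = {
    "rgb_videos": 136.0,
    "masked_depth_maps": 83.0,
    "full_depth_maps": 886.0,
    "skeletons_3d": 5.8,
    "ir_videos": 221.0,
}

def total_size_tb(sizes_gb: dict[str, float]) -> float:
    """Sum the per-modality sizes and convert GB to TB."""
    return sum(sizes_gb.values()) / 1000.0

print(f"{sum(SIZES_GB.values()):.1f} GB ~ {total_size_tb(SIZES_GB):.1f} TB")
# 1331.8 GB ~ 1.3 TB
```

The full depth maps make up about two thirds of the total. Leaving them out, for example when the masked depth maps are enough, brings the remaining modalities to roughly 446 GB.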

RGB videos have a resolution of 1920×1080. Depth maps and IR videos are both 512×424. The 3D skeletal data gives the three-dimensional locations of 25 major body joints in every frame.
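The per-frame layout described above can be sketched with plain Python containers: a skeleton sequence for one body is T frames × 25 joints × 3 coordinates. This is a hypothetical in-memory representation. The names `Joint`, `SkeletonFrame`, `make_empty_sequence` and `validate_sequence` are illustrative only. The sketch does not read the dataset's own files and assumes a single body per sequence for simplicity.

```python
from typing import List, Tuple

# Frame resolutions stated in the dataset description, as (width, height).
RGB_RESOLUTION = (1920, 1080)
DEPTH_RESOLUTION = (512, 424)
IR_RESOLUTION = DEPTH_RESOLUTION  # IR videos share the depth resolution
NUM_JOINTS = 25                   # major body joints per skeleton

# Hypothetical types: one joint is an (x, y, z) location in 3D space.
Joint = Tuple[float, float, float]
SkeletonFrame = List[Joint]       # expected to hold exactly NUM_JOINTS joints

def make_empty_sequence(num_frames: int) -> List[SkeletonFrame]:
    """Build a T x 25 x 3 sequence of zeroed joints for a single body."""
    return [[(0.0, 0.0, 0.0) for _ in range(NUM_JOINTS)] for _ in range(num_frames)]

def validate_sequence(seq: List[SkeletonFrame]) -> None:
    """Raise ValueError unless every frame holds 25 joints with 3 coordinates each."""
    for t, frame in enumerate(seq):
        if len(frame) != NUM_JOINTS:
            raise ValueError(f"frame {t}: expected {NUM_JOINTS} joints, got {len(frame)}")
        for j, joint in enumerate(frame):
            if len(joint) != 3:
                raise ValueError(f"frame {t}, joint {j}: expected (x, y, z)")

seq = make_empty_sequence(100)
validate_sequence(seq)
print(len(seq), len(seq[0]), len(seq[0][0]))  # 100 25 3
```

In practice a numeric array of shape (T, 25, 3) is the more common choice; plain lists are used here only to keep the sketch dependency-free.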

File Format

Each file and folder name in the dataset follows the pattern SsssCcccPpppRrrrAaaa (e.g. S001C002P003R002A013), where each lowercase group stands for a three-digit number:

  • sss is the setup number (setups differ in camera height and distance);
  • ccc is the camera ID: cameras 1, 2 and 3 correspond to the -45°, 0° and 45° views, and cameras 2 and 3 capture the front and side views;
  • ppp is the performer ID;
  • rrr is the replication number (1 or 2), since each performer carries out every action twice;
  • and aaa is the action class label, covering 60 action classes: 40 daily actions, 9 health-related actions and 11 mutual actions.
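The naming scheme above can be decoded with a regular expression. This is a minimal sketch. `parse_sample_name`, `SampleInfo` and `action_category` are hypothetical helpers, not part of any official toolkit. `action_category` additionally assumes that the 60 labels are numbered in the order daily (1 to 40), health-related (41 to 49), then mutual (50 to 60), matching the 40/9/11 split. Using `re.search` rather than a full match lets the parser accept names that carry a suffix, an extension, or a directory prefix.

```python
import re
from typing import NamedTuple

# S<setup>C<camera>P<performer>R<replication>A<action>, three digits each.
NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

class SampleInfo(NamedTuple):
    setup: int
    camera: int
    performer: int
    replication: int
    action: int

def parse_sample_name(name: str) -> SampleInfo:
    """Extract the five numeric fields from a file or folder name."""
    match = NAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"not an NTU RGB+D sample name: {name!r}")
    return SampleInfo(*(int(g) for g in match.groups()))

def action_category(action: int) -> str:
    """Map an action label to its group.

    Assumption: labels 1-40 are daily actions, 41-49 health-related
    actions and 50-60 mutual actions.
    """
    if 1 <= action <= 40:
        return "daily"
    if 41 <= action <= 49:
        return "health-related"
    if 50 <= action <= 60:
        return "mutual"
    raise ValueError(f"action label out of range 1-60: {action}")

info = parse_sample_name("S001C002P003R002A013")
print(info)
# SampleInfo(setup=1, camera=2, performer=3, replication=2, action=13)
print(action_category(info.action))  # daily
```

Returning a `NamedTuple` keeps the fields addressable by name (`info.camera`) while still comparing and unpacking like a plain tuple. This makes it straightforward to filter samples by camera, performer or setup.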
