哎呦茶叶蛋

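Introduction to PyTorch's Built-in Datasets

Pytorch01: Introduction to Datasets

Reference: https://pytorch.org/docs/stable/torchvision/datasets.html

This article is mainly a translation of the official PyTorch documentation on image datasets, together with my own organization and summary. If you spot any mistakes, please point them out! Please credit the source when reposting!

It mainly covers torchvision.datasets from the Libraries section.

Pytorch01: Introduction to Datasets

Overview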

EMNIST

MNIST

QMNIST

USPS

SVHN

KMNIST

Omniglot

Fashion-MNIST

CIFAR

LSUN

STL10

CelebA

Places365

Cityscapes

SBD

Flickr

HMDB51

Kinetics-400

UCF101

PhotoTour

SBU

ImageNet

VOC

COCO

FakeData

DatasetFolder

ImageFolder

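Overview

For image classification:

Handwritten character recognition: EMNIST, MNIST, QMNIST, USPS, SVHN, KMNIST, Omniglot

Object classification: Fashion-MNIST, CIFAR, LSUN, STL-10, ImageNet

Face recognition: CelebA

Scene classification: LSUN, Places365

For object detection: SVHN, VOCDetection, COCODetection

For semantic/instance segmentation:

Semantic segmentation: Cityscapes, VOCSegmentation

Semantic boundaries: SBD

For image captioning: Flickr, COCOCaption, SBU (torchvision.datasets.SBU loads the SBU Captioned Photo Dataset)

For video classification: HMDB51, Kinetics-400, UCF101

For 3D reconstruction: PhotoTour

For shadow detection: the SBU Shadow dataset linked in the SBU section (a different dataset from the one torchvision.datasets.SBU loads)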

EMNIST

torchvision.datasets.EMNIST(root: str, split: str, **kwargs: Any)

Parameters:
root (string) – Root directory of dataset where EMNIST/processed/training.pt and EMNIST/processed/test.pt exist.

split (string) – The dataset has 6 different splits: byclass, bymerge, balanced, letters, digits and mnist. This argument specifies which one to use.

train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Download:

https://www.nist.gov/itl/products-and-services/emnist-dataset

Authors:

Gregory Cohen, Saeed Afshar, Jonathan Tapson, Andre van Schaik

The MARCS Institute for Brain, Behaviour and Development, Western Sydney University

Citation:

Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. Retrieved from http://arxiv.org/abs/1702.05373

Introduction:

EMNIST is derived from NIST Special Database 19 and contains digits as well as uppercase and lowercase letters. It is 1.65 GB in size and is divided into 6 splits (the per-split distribution figures from the original post are not reproduced here):

By Class and By Merge

Balanced

Letters

Digits and MNIST

MNIST

torchvision.datasets.MNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.

train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

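Download:

http://yann.lecun.com/exdb/mnist/

Authors:

Yann LeCun, Courant Institute, NYU

Corinna Cortes, Google Labs, New York

Christopher J.C. Burges, Microsoft Research, Redmond

Introduction:

Origins: MNIST was built from NIST's Special Database 3 (SD-3) and Special Database 1 (SD-1).

In NIST, SD-3 was written by Census Bureau employees and served as the training set, while SD-1 was written by high-school students and served as the test set; SD-3 is cleaner and easier to recognize than SD-1. SD-1 contains 58,527 handwritten digit images from 500 writers. The data in SD-3 is stored sequentially, with the digits written by the same person kept together, whereas the data in SD-1 is scrambled but includes the writer ID for each image.
The digits written by the first 250 writers of SD-1 (about 30,000 images) were therefore placed in the MNIST training set, which was topped up with SD-3 samples to 60,000 images. The digits written by the remaining 250 writers of SD-1 (about 30,000 images) were placed in the MNIST test set, which was likewise topped up with SD-3 samples to 60,000 images. However, only the 60,000-image training set and a 10,000-image test set (a subset of the 60,000 test images) are available for download.

Today: the MNIST training set has 60,000 images, 30,000 each from SD-3 and SD-1; the test set has 10,000 images, 5,000 each from SD-3 and SD-1. The 60,000 training images come from approximately 250 writers, and the writers of the training and test sets are disjoint.

Handwritten digit recognition. Each sample is a 28*28 grayscale image (the original NIST images were binary; the grey levels come from anti-aliasing during size normalization), with digits size-normalized and their center of mass placed at the center of the image.
The training set has 60k images and the test set 10k, 70k in total. There are 10 digit classes of roughly, but not exactly, equal size.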
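As a rough illustration of how the constructor above is typically called, the sketch below wraps the datasets in this article whose signatures take a train flag. The helper build_dataset, the tuple TRAIN_FLAG_DATASETS, and the ./data root are assumptions made for this example and are not part of torchvision. torchvision is imported lazily, so the name check works even where the library is not installed.

```python
from typing import Callable, Optional

# Datasets in this article whose constructors accept a `train` flag
# (a hypothetical grouping for this example, based on the signatures listed).
TRAIN_FLAG_DATASETS = ("MNIST", "KMNIST", "FashionMNIST", "USPS", "CIFAR10", "CIFAR100")


def build_dataset(name: str, root: str = "./data", train: bool = True,
                  transform: Optional[Callable] = None, download: bool = True):
    """Instantiate a torchvision dataset that follows the (root, train, ...) pattern."""
    if name not in TRAIN_FLAG_DATASETS:
        raise ValueError(f"{name!r} does not take a `train` flag in this sketch")
    # Imported here so the check above runs without torchvision installed.
    import torchvision.datasets as datasets
    dataset_cls = getattr(datasets, name)
    return dataset_cls(root=root, train=train, transform=transform, download=download)


# Example (downloads MNIST into ./data on first use):
#   train_set = build_dataset("MNIST", train=True)
#   image, label = train_set[0]  # a PIL image and an int class index
```

Datasets that select their subset with split (for example SVHN or STL10) are deliberately rejected here, because passing train to them would fail.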

QMNIST

torchvision.datasets.QMNIST(root: str, what: Optional[str] = None, compat: bool = True, train: bool = True, **kwargs: Any)

Parameters：
root (string) – Root directory of dataset whose 'processed' subdir contains torch binary files with the datasets.

what (string,optional) – Can be ‘train’, ‘test’, ‘test10k’, ‘test50k’, or ‘nist’ for respectively the mnist compatible training set, the 60k qmnist testing set, the 10k qmnist examples that match the mnist testing set, the 50k remaining qmnist testing examples, or all the nist digits. The default is to select ‘train’ or ‘test’ according to the compatibility argument ‘train’.

compat (bool,optional) – A boolean that says whether the target for each example is class number (for compatibility with the MNIST dataloader) or a torch vector containing the full qmnist information. Default=True.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

train (bool,optional,compatibility) – When argument ‘what’ is not specified, this boolean decides whether to load the training set or the testing set. Default: True.
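The interaction between what and train described above can be sketched in a few lines. This is a minimal illustration of the documented behavior, not torchvision's own code; the names QMNIST_SUBSETS and resolve_what are hypothetical.

```python
# Subset names accepted by `what`, with the descriptions given in the parameter list.
QMNIST_SUBSETS = {
    "train": "MNIST-compatible training set",
    "test": "60k QMNIST testing set",
    "test10k": "10k QMNIST examples that match the MNIST testing set",
    "test50k": "50k remaining QMNIST testing examples",
    "nist": "all the NIST digits",
}


def resolve_what(what=None, train=True):
    """Return the subset name that would be loaded for the given arguments."""
    if what is None:
        # `train` only matters when `what` is not specified.
        what = "train" if train else "test"
    if what not in QMNIST_SUBSETS:
        raise ValueError(f"unknown subset: {what!r}")
    return what


print(resolve_what())                       # train
print(resolve_what(train=False))            # test
print(resolve_what("test10k", train=True))  # test10k
```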

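Download:

https://github.com/facebookresearch/qmnist

Authors:

Facebook AI Research

New York University

Citation:

Yadav, Chhavi, and Léon Bottou. "Cold Case: The Lost MNIST Digits." Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019.

Introduction:

The complete MNIST test set (60,000 images) can no longer be found, and the exact procedure used to build MNIST has also been lost.

QMNIST was therefore proposed: it regenerates the MNIST data from its source, NIST SD-19, approximating the original MNIST preprocessing as closely as possible.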

USPS

torchvision.datasets.USPS(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset to store USPS data files.

train (bool, optional) – If True, creates dataset from usps.bz2, otherwise from usps.t.bz2.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.


Data Structure：
USPS Dataset. The data-format is : [label [index:value ]*256 n] * num_lines, where label lies in [1, 10]. The value for each pixel lies in [-1, 1]. Here we transform the label into [0, 9] and make pixel values in [0, 255].
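To make the data format above concrete, here is a minimal pure-Python sketch that parses one line of the form label index:value ... and applies the two conversions mentioned (label to [0, 9], pixel values to [0, 255]). The function name parse_usps_line and the synthetic sample line are assumptions for illustration; the exact rounding used by the library may differ.

```python
def parse_usps_line(line):
    """Parse one 'label index:value ...' line into a 16x16 image and a 0-based label."""
    fields = line.split()
    label = int(fields[0]) - 1  # labels in [1, 10] become [0, 9]
    values = [float(field.split(":")[-1]) for field in fields[1:]]
    if len(values) != 256:
        raise ValueError(f"expected 256 pixel values, got {len(values)}")
    # Map each value from [-1, 1] to [0, 255], truncating to an integer.
    pixels = [int((v + 1) / 2 * 255) for v in values]
    image = [pixels[row * 16:(row + 1) * 16] for row in range(16)]
    return image, label


# Synthetic line: label 1, pixel values alternating between -1.0 and 1.0.
entries = []
for index in range(1, 257):
    value = -1.0 if index % 2 else 1.0
    entries.append(f"{index}:{value}")
sample_line = "1 " + " ".join(entries)

image, label = parse_usps_line(sample_line)
print(label, image[0][:4])  # 0 [0, 255, 0, 255]
```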

Download:

https://www.kaggle.com/bistaumanga/usps-dataset

Introduction:

Similar to MNIST: 10 digit classes, 7,291 training images and 2,007 test images, 9,298 in total. The images are 16*16 grayscale.

SVHN

torchvision.datasets.SVHN(root: str, split: str = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where directory SVHN exists.

split (string) – One of {‘train’, ‘test’, ‘extra’}. Accordingly dataset is selected. ‘extra’ is Extra training set.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

Data Structure
SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions which expect the class labels to be in the range [0, C-1]
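The label remapping described in the note can be sketched as follows. The helper name remap_svhn_labels and the sample labels are made up for illustration; the dataset class applies the equivalent step for you when loading.

```python
def remap_svhn_labels(labels):
    """Map the original SVHN label 10 (digit 0) to class index 0; leave 1-9 unchanged."""
    return [0 if label == 10 else label for label in labels]


raw_labels = [10, 1, 9, 10, 5]        # labels as stored in the original files
print(remap_svhn_labels(raw_labels))  # [0, 1, 9, 0, 5]
```

After remapping, every label lies in [0, C-1] with C = 10, which is the range that classification losses such as cross-entropy expect.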

Download:

http://ufldl.stanford.edu/housenumbers/

Introduction:

The Street View House Numbers (SVHN) dataset comes in two formats: original street-view images with bounding-box annotations, and MNIST-like images of single digits.

KMNIST

torchvision.datasets.KMNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where KMNIST/processed/training.pt and KMNIST/processed/test.pt exist.

train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Download:

http://codh.rois.ac.jp/kmnist/index.html.en

Authors:

"KMNIST Dataset" (created by CODH), adapted from "Kuzushiji Dataset" (created by NIJL and others), doi:10.20676/00000341

Citation:

Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, David Ha, "Deep Learning for Classical Japanese Literature", arXiv:1812.01718.

Introduction:

Kuzushiji-MNIST

28*28 grayscale images, 70k in total, with the same structure as MNIST. It covers 10 Hiragana characters. 21 MB + 35 KB.

Kuzushiji-49

28*28 grayscale images, 270,912 in total, covering 48 Hiragana characters and one Hiragana iteration mark; the classes are imbalanced. 74 MB + 250 KB.

Kuzushiji-Kanji

64*64 grayscale images, 140,426 in total, covering 3,832 Kanji characters; the classes are imbalanced. 310 MB.

Omniglot

torchvision.datasets.Omniglot(root: str, background: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
background (bool, optional) – If True, creates dataset from the “background” set, otherwise creates from the “evaluation” set. This terminology is defined by the authors.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset zip files from the internet and puts it in root directory. If the zip files are already downloaded, they are not downloaded again.

root (string) – Root directory of dataset where directory omniglot-py exists.

Download:

https://www.kaggle.com/watesoyan/omniglot

Introduction:

It contains 1,623 handwritten characters from 50 alphabets, each drawn by 20 different people. The dataset is about 8 MB.

Fashion-MNIST

torchvision.datasets.FashionMNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where FashionMNIST/processed/training.pt and FashionMNIST/processed/test.pt exist.

train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

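Download:

https://www.kaggle.com/zalando-research/fashionmnist

Authors:

Zalando Research

Introduction:

Fashion-MNIST was proposed as a replacement for MNIST as a benchmark. It has 10 classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot. It is split the same way as MNIST, with 60k training images and 10k test images, 70k in total, about 200 MB.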

CIFAR

torchvision.datasets.CIFAR10(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

torchvision.datasets.CIFAR100(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where directory cifar-10-batches-py exists or will be saved to if download is set to True.

train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

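Download:

https://www.cs.toronto.edu/~kriz/cifar.html

Authors:

Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton

Citation:

Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.

Introduction:

CIFAR10 and CIFAR100 are subsets of the 80 million tiny images dataset.

CIFAR10 contains 60k 32*32 color images in 10 classes, 6k images per class. 50k are used for training, split into 5 batches, and 10k for testing, in 1 batch. The test batch contains 1k randomly sampled images from each of the 10 classes. Within a single training batch, the number of images per class is not necessarily equal.

CIFAR100 contains 60k 32*32 color images in 100 classes, 600 images per class (500 for training and 100 for testing). The 100 fine classes are grouped into 20 superclasses, and each image carries two labels: a "fine label" for its class and a "coarse label" for its superclass.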

LSUN

torchvision.datasets.LSUN(root: str, classes: Union[str, List[str]] = 'train', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)

Parameters：
root (string) – Root directory for the database files.

classes (string or list) – One of {‘train’, ‘val’, ‘test’} or a list of categories to load. e,g. [‘bedroom_train’, ‘church_outdoor_train’].

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
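The classes argument accepts either a split name or an explicit list. The sketch below shows one plausible way a split name expands into per-category database names such as bedroom_train. The category list is an assumption based on LSUN's ten scene categories, and the function expand_classes is hypothetical rather than the library's actual code.

```python
# Assumed list of the ten LSUN scene categories (only bedroom and
# church_outdoor appear in the parameter description above).
LSUN_CATEGORIES = [
    "bedroom", "bridge", "church_outdoor", "classroom", "conference_room",
    "dining_room", "kitchen", "living_room", "restaurant", "tower",
]


def expand_classes(classes):
    """Turn the `classes` argument into an explicit list of database names."""
    if isinstance(classes, str):
        if classes not in ("train", "val", "test"):
            raise ValueError(f"unknown split: {classes!r}")
        if classes == "test":
            # Assumption: the test set is a single database, not one per category.
            return ["test"]
        return [f"{category}_{classes}" for category in LSUN_CATEGORIES]
    return list(classes)  # an explicit list is used as given


print(expand_classes("train")[:2])  # ['bedroom_train', 'bridge_train']
print(expand_classes(["bedroom_train", "church_outdoor_train"]))
```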

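Download:

https://www.yf.io/p/lsun

Citation:

Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser and Jianxiong Xiao

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

arXiv:1506.03365 [cs.CV], 10 Jun 2015

Introduction:

A new dataset produced by using deep networks to assist human annotators in labeling a large-scale image collection.

It contains around one million labeled images for each of 10 scene categories and 20 object categories.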

STL10

torchvision.datasets.STL10(root: str, split: str = 'train', folds: Optional[int] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory of dataset where directory stl10_binary exists.

split (string) – One of {‘train’, ‘test’, ‘unlabeled’, ‘train+unlabeled’}. Accordingly dataset is selected.

folds (int, optional) – One of {0-9} or None. For training, loads one of the 10 pre-defined folds of 1k samples for the standard evaluation procedure. If no value is passed, loads the 5k samples.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

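Download:

https://cs.stanford.edu/~acoates/stl10/

Citation:

Adam Coates, Honglak Lee, Andrew Y. Ng. An Analysis of Single Layer Networks in Unsupervised Feature Learning. AISTATS, 2011.

Introduction:

Designed for unsupervised feature learning. The images are drawn from ImageNet: 96*96 color images in 10 classes, with 500 training images and 800 test images per class, plus 100k unlabeled images. Compared with CIFAR10, each class has far fewer labeled training examples.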

CelebA

torchvision.datasets.CelebA(root: str, split: str = 'train', target_type: Union[List[str], str] = 'attr', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory where images are downloaded to.

split (string) – One of {‘train’, ‘valid’, ‘test’, ‘all’}. Accordingly dataset is selected.

target_type (string or list, optional) – Type of target to use, attr, identity, bbox, or landmarks. Can also be a list to output a tuple with all specified target types. The targets represent:

attr (np.array shape=(40,) dtype=int): binary (0, 1) labels for attributes

identity (int): label for each person (data points with the same identity are the same person)

bbox (np.array shape=(4,) dtype=int): bounding box (x, y, width, height)

landmarks (np.array shape=(10,) dtype=int): landmark points (lefteye_x, lefteye_y, righteye_x, righteye_y, nose_x, nose_y, leftmouth_x, leftmouth_y, rightmouth_x, rightmouth_y)

Defaults to attr. If empty, None will be returned as target.
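The flat bbox and landmarks targets are easier to work with once unpacked. The sketch below does this for plain Python sequences, following the element order listed above; the helper names and the sample numbers are invented for illustration.

```python
# Landmark names in the order given by the target_type description.
LANDMARK_NAMES = ("lefteye", "righteye", "nose", "leftmouth", "rightmouth")


def unpack_landmarks(landmarks):
    """Convert a flat 10-element landmarks target into {name: (x, y)}."""
    if len(landmarks) != 10:
        raise ValueError("landmarks must have exactly 10 elements")
    return {
        name: (landmarks[2 * i], landmarks[2 * i + 1])
        for i, name in enumerate(LANDMARK_NAMES)
    }


def bbox_to_corners(bbox):
    """Convert (x, y, width, height) into (left, top, right, bottom)."""
    x, y, width, height = bbox
    return x, y, x + width, y + height


# Made-up values, not taken from the dataset.
points = unpack_landmarks([69, 109, 106, 113, 77, 142, 73, 152, 108, 154])
print(points["nose"])                      # (77, 142)
print(bbox_to_corners((95, 71, 226, 313))) # (95, 71, 321, 384)
```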

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.ToTensor

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

Download:

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

Authors:

Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Multimedia Laboratory, The Chinese University of Hong Kong

Citation:

Liu, Ziwei, et al. "Deep learning face attributes in the wild." Proceedings of the IEEE international conference on computer vision. 2015.

Introduction:

It contains more than 200k celebrity images, each annotated with 40 attribute labels.

The images cover large pose variations and background clutter.

CelebA offers large diversity, large quantity, and rich annotations.

It includes 10,177 identities, 202,599 face images, and 5 landmark locations per image.

Applications:

face attribute recognition

face detection

landmark (or facial part) localization

face editing & synthesis.

Places365

torchvision.datasets.Places365(root: str, split: str = 'train-standard', small: bool = False, download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = <function default_loader>)

Parameters：
root (string) – Root directory of the Places365 dataset.

split (string, optional) – The dataset split. Can be one of train-standard (default), train-challenge, val.

small (bool, optional) – If True, uses the small images, i. e. resized to 256 x 256 pixels, instead of the high resolution ones.

download (bool, optional) – If True, downloads the dataset components and places them in root. Already downloaded archives are not downloaded again.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

loader – A function to load an image given its path.

Raises：
RuntimeError – If download is False and the meta files, i. e. the devkit, are not present or corrupted.

RuntimeError – If download is True and the image archive is already extracted.

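Download:

http://places2.csail.mit.edu/challenge.html

Citation:

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba and Aude Oliva

Places: A 10 million Image Database for Scene Recognition.

arXiv:1610.02055

Introduction:

The full dataset contains more than 10 million images in more than 400 scene categories.

Challenge subset: 8 million training images, 36k validation images, and 328k test images, covering 365 scene categories.

The number of training samples differs from category to category.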

Cityscapes

torchvision.datasets.Cityscapes(root: str, split: str = 'train', mode: str = 'fine', target_type: Union[List[str], str] = 'instance', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory of dataset where directory leftImg8bit and gtFine or gtCoarse are located.

split (string, optional) – The image split to use, train, test or val if mode=”fine” otherwise train, train_extra or val

mode (string, optional) – The quality mode to use, fine or coarse

target_type (string or list, optional) – Type of target to use, instance, semantic, polygon or color. Can also be a list to output a tuple with all specified target types.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

Examples：
from torchvision.datasets import Cityscapes

# Get semantic segmentation target
dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type='semantic')

img, smnt = dataset[0]

# Get multiple targets
dataset = Cityscapes('./data/cityscapes', split='train', mode='fine',
                     target_type=['instance', 'color', 'polygon'])

img, (inst, col, poly) = dataset[0]

# Validate on the "coarse" set
dataset = Cityscapes('./data/cityscapes', split='val', mode='coarse',
                     target_type='semantic')

img, smnt = dataset[0]

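Download:

https://www.cityscapes-dataset.com/dataset-overview/

Introduction:

The Cityscapes dataset focuses on semantic understanding of urban street scenes.

It provides semantic segmentation, plus instance segmentation for vehicles and people, with 30 classes in total.

It covers 50 cities, several seasons (spring, summer, fall), different times of day, and varied weather conditions.

It includes 5k finely annotated images and 20k coarsely annotated images.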

SBD

torchvision.datasets.SBDataset(root: str, image_set: str = 'train', mode: str = 'boundaries', download: bool = False, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory of the Semantic Boundaries Dataset

image_set (string, optional) – Select the image_set to use, train, val or train_noval. Image set train_noval excludes VOC 2012 val images.

mode (string, optional) – Select target type. Possible values ‘boundaries’ or ‘segmentation’. In case of ‘boundaries’, the target is an array of shape [num_classes, H, W], where num_classes=20.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version. Input sample is PIL image and target is a numpy array if mode=’boundaries’ or PIL image if mode=’segmentation’.

NOTE：
Please note that the train and val splits included with this dataset are different from the splits in the PASCAL VOC dataset. In particular some “train” images might be part of VOC2012 val. If you are interested in testing on VOC 2012 val, then use image_set=’train_noval’, which excludes all val images.

Download:

http://home.bharathh.info/pubs/codes/SBD/download.html

Authors:

Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, and Jitendra Malik

Citation:

Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji and Jitendra Malik. Semantic contours from inverse detectors. In International Conference on Computer Vision, 2011.

Introduction:

11,355 images from the PASCAL VOC 2011 dataset, annotated with category-level and instance-level segmentations. The categories are the 20 classes of VOC2011.

Flickr

torchvision.datasets.Flickr8k(root: str, ann_file: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)

torchvision.datasets.Flickr30k(root: str, ann_file: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)

Parameters：
root (string) – Root directory where images are downloaded to.

ann_file (string) – Path to annotation file.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.ToTensor

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

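Download:

https://www.kaggle.com/hsankesara/flickr-image-dataset

Introduction:

Flickr30k is a benchmark for sentence-based image description.

The Flickr30k Entities annotations augment its 158k captions with 244k coreference chains and 276k bounding boxes.

The baseline for that task combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects.

Flickr8k contains 8,092 images, each with 5 captions.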

HMDB51

torchvision.datasets.HMDB51(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)

Parameters：
root (string) – Root directory of the HMDB51 Dataset.

annotation_path (str) – Path to the folder containing the split files.

frames_per_clip (int) – Number of frames in a clip.

step_between_clips (int) – Number of frames between each clip.

fold (int, optional) – Which fold to use. Should be between 1 and 3.

train (bool, optional) – If True, creates a dataset from the train split, otherwise from the test split.

transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Introduction：
HMDB51 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.
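The clip arithmetic in the example above can be checked with a short sketch. It assumes no frame-rate resampling and simply counts how many complete clips fit in each video; the function name num_clips is made up, and this is not the VideoClips implementation itself.

```python
def num_clips(num_frames, frames_per_clip, step_between_clips):
    """Count complete fixed-size clips in one video; partial clips are dropped."""
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


video_lengths = [10, 15]  # the two videos from the example above
per_video = [num_clips(n, frames_per_clip=5, step_between_clips=5) for n in video_lengths]
total = sum(per_video)
print(per_video, total)  # [2, 3] 5
```

With step_between_clips smaller than frames_per_clip the clips overlap, so the same videos yield more dataset elements: with a step of 1 the 10-frame video alone contributes 6 clips.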

Download:

https://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/#Downloads

Citation:

The benchmark and database are described in the following article. We request that authors cite this paper in publications describing work carried out with this system and/or the video database.

H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A Large Video Database for Human Motion Recognition. ICCV, 2011.

The first benchmark STIP features are described in the following paper and we request the authors cite this paper if they use STIP features.

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning Realistic Human Actions From Movies. CVPR, 2008.

The second benchmark C2 features are described in the following paper and we request the authors cite this paper if they use C2 codes.

H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A Biologically Inspired System for Action Recognition. ICCV, 2007.

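Introduction:

HMDB51 is about 2 GB, with roughly 7k clips in 51 action classes, each containing at least 101 clips.

The actions fall into five types:

1. General facial actions: smile, laugh, chew, talk.

2. Facial actions with object manipulation: smoke, eat, drink.

3. General body movements: cartwheel, clap hands, climb, climb stairs, dive, fall on the floor, backhand flip, handstand, jump, pull up, push up, run, sit down, sit up, somersault, stand up, turn, walk, wave.

4. Body movements with object interaction: brush hair, catch, draw sword, dribble, golf, hit something, kick ball, pick, pour, push something, ride bike, ride horse, shoot ball, shoot bow, shoot gun, swing baseball bat, sword exercise, throw.

5. Body movements for human interaction: fencing, hug, kick someone, kiss, punch, shake hands, sword fight.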

Kinetics-400

torchvision.datasets.Kinetics400(root, frames_per_clip, step_between_clips=1, frame_rate=None, extensions=('avi', ), transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0, _audio_channels=0)

Parameters：
root (string) – Root directory of the Kinetics-400 Dataset.

frames_per_clip (int) – number of frames in a clip

step_between_clips (int) – number of frames between each clip

transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Introduction：
Kinetics-400 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.

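Download:

https://deepmind.com/research/open-source/kinetics

Authors:

DeepMind

Introduction:

URLs for about 650k video clips, covering 400/600/700 human action classes depending on the dataset version, with at least 400/600/700 clips per class.

Each clip carries a single action label and lasts about 10 seconds.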

UCF101

torchvision.datasets.UCF101(root, annotation_path, frames_per_clip, step_between_clips=1, frame_rate=None, fold=1, train=True, transform=None, _precomputed_metadata=None, num_workers=1, _video_width=0, _video_height=0, _video_min_dimension=0, _audio_samples=0)

Parameters：
root (string) – Root directory of the UCF101 Dataset.

annotation_path (str) – path to the folder containing the split files

frames_per_clip (int) – number of frames in a clip.

step_between_clips (int, optional) – number of frames between each clip.

fold (int, optional) – which fold to use. Should be between 1 and 3.

train (bool, optional) – if True, creates a dataset from the train split, otherwise from the test split.

transform (callable, optional) – A function/transform that takes in a TxHxWxC video and returns a transformed version.

Introduction：
UCF101 is an action recognition video dataset. This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video might be present.

Internally, it uses a VideoClips object to handle clip creation.

Download:

https://www.crcv.ucf.edu/data/UCF101.php

Authors:

University of Central Florida, Center for Research in Computer Vision

Citation:

Khurram Soomro, Amir Roshan Zamir and Mubarak Shah, UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild, CRCV-TR-12-01, November, 2012.

Introduction:

Video action recognition: 13,320 clips in 101 action classes. The clips of each class are divided into 25 groups, each containing 4 to 7 clips.

PhotoTour

torchvision.datasets.PhotoTour(root: str, name: str, train: bool = True, transform: Optional[Callable] = None, download: bool = False)

Parameters：
root (string) – Root directory where images are.

name (string) – Name of the dataset to load.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

Download:

http://phototour.cs.washington.edu/datasets/

Citation:

Noah Snavely, Steven M. Seitz, Richard Szeliski. Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics (SIGGRAPH Proceedings), 25(3), 2006, 835-846.

Introduction:

For 3D reconstruction. The dataset consists of a set of Flickr images together with their reconstruction data.

It includes a reconstruction of Notre Dame Cathedral in Paris from 715 images.

SBU

torchvision.datasets.SBU(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)

Parameters：
root (string) – Root directory of dataset where tarball SBUCaptionedPhotoDataset.tar.gz exists.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

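Download:

https://www3.cs.stonybrook.edu/~cvl/projects/shadow_noisy_label/index.html

Authors:

Tomas F. Yago Vicente, Le Hou, Chen-Ping Yu, Minh Hoai, and Dimitris Samaras

Citation:

Large-scale Training of Shadow Detectors with Noisily-Annotated Shadow Examples, Vicente, T.F.Y., Hou, L., Yu, C.-P., Hoai, M., Samaras, D., Proceedings of European Conference on Computer Vision (ECCV), 2016.

Introduction:

Shadow detection. Note that the link, authors, and citation above describe the SBU Shadow dataset, whereas torchvision.datasets.SBU loads the SBU Captioned Photo Dataset (images paired with captions), as the tarball name SBUCaptionedPhotoDataset.tar.gz in the parameters indicates.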

ImageNet

torchvision.datasets.ImageNet(root: str, split: str = 'train', download: Optional[str] = None, **kwargs: Any)

Parameters：
root (string) – Root directory of the ImageNet Dataset.

split (string, optional) – The dataset split, supports train, or val.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

loader – A function to load an image given its path.

Introduction：
ImageNet 2012 Classification Dataset.

下载地址：

http://image-net.org/index

引用：

Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

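Overview:

ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 2010-2017.

Each year's challenge generally covers image classification, object localization, and object detection. PyTorch (torchvision) wraps only the 2012 image classification dataset.

There are too many details to cover here; see the official site.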

VOC

torchvision.datasets.VOCSegmentation(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory of the VOC Dataset.

year (string, optional) – The dataset year, supports years 2007 to 2012.

image_set (string, optional) – Select the image_set to use, train, trainval or val

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

torchvision.datasets.VOCDetection(root: str, year: str = '2012', image_set: str = 'train', download: bool = False, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory of the VOC Dataset.

year (string, optional) – The dataset year, supports years 2007 to 2012.

image_set (string, optional) – Select the image_set to use, train, trainval or val

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

Download:

http://host.robots.ox.ac.uk/pascal/VOC/

Authors:

Mark Everingham (University of Leeds)

Luc van Gool (ETHZ, Zurich)

Chris Williams (University of Edinburgh)

John Winn (Microsoft Research Cambridge)

Andrew Zisserman (University of Oxford)

Overview:

The PASCAL VOC Challenge (2005-2012), covering classification, detection, and segmentation.
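As a rough illustration of what a target_transform for VOCDetection might look like, the sketch below flattens one annotation into a list of boxes. It assumes the target is a nested dict mirroring the VOC XML (annotation → object → bndbox, with all values as strings); the helper name voc_boxes and the example_target dict are hypothetical stand-ins written by hand, not output from the library.

```python
from typing import Any, Dict, List, Tuple

Box = Tuple[str, int, int, int, int]


def voc_boxes(target: Dict[str, Any]) -> List[Box]:
    """Flatten a VOC-style annotation dict into (name, xmin, ymin, xmax, ymax) tuples."""
    objects = target["annotation"].get("object", [])
    # Assumption: an image with a single object may appear as a dict rather than a list.
    if isinstance(objects, dict):
        objects = [objects]
    boxes: List[Box] = []
    for obj in objects:
        bb = obj["bndbox"]
        boxes.append(
            (obj["name"], int(bb["xmin"]), int(bb["ymin"]), int(bb["xmax"]), int(bb["ymax"]))
        )
    return boxes


# Hand-written stand-in for one detection target (assumed layout).
example_target = {
    "annotation": {
        "filename": "000001.jpg",
        "object": [
            {"name": "dog", "bndbox": {"xmin": "48", "ymin": "240", "xmax": "195", "ymax": "371"}},
            {"name": "person", "bndbox": {"xmin": "8", "ymin": "12", "xmax": "352", "ymax": "498"}},
        ],
    }
}

print(voc_boxes(example_target))
```

If the assumed layout holds for your torchvision version, a function like this could be passed as target_transform=voc_boxes when constructing VOCDetection.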

COCO

torchvision.datasets.CocoCaptions(root: str, annFile: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory where images are downloaded to.

annFile (string) – Path to json annotation file.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.ToTensor

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

torchvision.datasets.CocoDetection(root: str, annFile: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, transforms: Optional[Callable] = None)

Parameters：
root (string) – Root directory where images are downloaded to.

annFile (string) – Path to json annotation file.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.ToTensor

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

transforms (callable, optional) – A function/transform that takes input sample and its target as entry and returns a transformed version.

Download:

https://cocodataset.org/#home

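Overview:

COCO is a large-scale dataset covering object detection, image segmentation, image captioning, and more. See the official site for details.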
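To make the role of target_transform more concrete for CocoDetection, here is a minimal sketch that converts COCO-style boxes from [x, y, width, height] to [xmin, ymin, xmax, ymax]. It assumes each target is a list of annotation dicts carrying "bbox" and "category_id" keys, as in the COCO annotation JSON; the function name coco_to_xyxy and the sample annotations are made up for illustration.

```python
from typing import Any, Dict, List


def coco_to_xyxy(annotations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert COCO [x, y, w, h] boxes to corner format, keeping the category id."""
    converted = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]  # COCO stores the top-left corner plus width and height
        converted.append({"category_id": ann["category_id"], "box": [x, y, x + w, y + h]})
    return converted


# Hand-written annotations standing in for one image's target (assumed layout).
example_annotations = [
    {"category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0]},
    {"category_id": 1, "bbox": [0.0, 0.0, 5.0, 5.0]},
]

converted = coco_to_xyxy(example_annotations)
print(converted)
```

Under those assumptions, the function could be supplied as target_transform=coco_to_xyxy. Note that CocoDetection and CocoCaptions also need the pycocotools package to read annFile.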

FakeData

torchvision.datasets.FakeData(size: int = 1000, image_size: Tuple[int, int, int] = (3, 224, 224), num_classes: int = 10, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, random_offset: int = 0)

Parameters：
size (int, optional) – Size of the dataset. Default: 1000 images

image_size (tuple, optional) – Size of the returned images. Default: (3, 224, 224)

num_classes (int, optional) – Number of classes in the dataset. Default: 10

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

random_offset (int) – Offsets the index-based random seed used to generate each image. Default: 0

Introduction：
A fake dataset that returns randomly generated images and returns them as PIL images

This class generates random images, which is handy for testing a pipeline without downloading real data.

DatasetFolder

torchvision.datasets.DatasetFolder(root: str, loader: Callable[[str], Any], extensions: Optional[Tuple[str, ...]] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, is_valid_file: Optional[Callable[[str], bool]] = None)

Parameters：
root (string) – Root directory path.

loader (callable) – A function to load a sample given its path.

extensions (tuple[string]) – A list of allowed extensions. extensions and is_valid_file should not both be passed.

transform (callable, optional) – A function/transform that takes in a sample and returns a transformed version. E.g, transforms.RandomCrop for images.

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

is_valid_file – A function that takes the path of a file and checks whether it is a valid file (used to check for corrupt files). extensions and is_valid_file should not both be passed.

Introduction：
A generic data loader where the samples are arranged in this way:

root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext

root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext

This class is a generic data loader. It can load many kinds of data, such as images or text.

ImageFolder

torchvision.datasets.ImageFolder(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, loader: Callable[[str], Any] = default_loader, is_valid_file: Optional[Callable[[str], bool]] = None)

Parameters：
root (string) – Root directory path.

transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

loader (callable, optional) – A function to load an image given its path.

is_valid_file – A function that takes the path of an image file and checks whether it is a valid file (used to check for corrupt files)

Introduction：
A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

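This class is a generic image loader. It can load image data in the common formats, as long as the files are arranged one folder per class.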
