retinanet-examples is NVIDIA's reference project for object detection, optimized for end-to-end GPU processing.
The project recommends using the PyTorch NGC docker container:
nvidia-docker run --rm --ipc=host -it nvcr.io/nvidia/pytorch:19.05-py3
Note, however, that newer releases only support GPUs with compute capability 6.0 or higher.
parse parses the command-line arguments.
load_model creates the model and loads its parameters.
Module.share_memory is not listed in the PyTorch documentation.
worker is the function that does the actual work.
torch.multiprocessing.spawn spawns nprocs processes that run fn with args. If one of the processes exits with a non-zero exit status, the remaining processes are killed and an exception is raised with the cause of termination. In the case an exception is caught in the child process, it is forwarded and its traceback is included in the exception raised in the parent process.
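A minimal sketch of the calling convention (the function and argument below are illustrative, not part of the repo): fn receives the process index as its first argument, followed by the entries of args.

import torch.multiprocessing as mp

def fn(rank, message):
    print('worker {} got: {}'.format(rank, message))

if __name__ == '__main__':
    mp.spawn(fn, args=('hello',), nprocs=2)  # runs fn(0, 'hello') and fn(1, 'hello')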
One process is created per device to run worker.
'Entry point for the retinanet command'
args = parse(args or sys.argv[1:])
model, state = load_model(args, verbose=True)
if model: model.share_memory()
world = torch.cuda.device_count()
if args.command == 'export' or world <= 1:
worker(0, args, 1, model, state)
else:
torch.multiprocessing.spawn(worker, args=(args, world, model, state), nprocs=world)
ArgumentParser.add_subparsers
Many programs split their functionality into a number of sub-commands; for example, the svn program can invoke sub-commands like svn checkout, svn update, and svn commit. Splitting up functionality this way can be a particularly good idea when a program performs several different functions that require different kinds of command-line arguments. ArgumentParser supports the creation of such sub-commands with the add_subparsers() method. The add_subparsers() method is normally called with no arguments and returns a special action object. This object has a method, add_parser(), which takes a command name and any ArgumentParser constructor arguments, and returns an ArgumentParser object that can be modified as usual.
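A minimal sketch of this pattern, using the svn-style sub-commands mentioned above (illustrative only, not part of the repo):

import argparse

parser = argparse.ArgumentParser(prog='svn')
subparsers = parser.add_subparsers(dest='command')
checkout = subparsers.add_parser('checkout')
checkout.add_argument('url')
subparsers.add_parser('update')
commit = subparsers.add_parser('commit')
commit.add_argument('-m', '--message')

args = parser.parse_args(['checkout', 'https://example.com/repo'])
print(args.command, args.url)  # checkout https://example.com/repo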
Parameter descriptions:
title: title for the sub-parser group in help output; by default "subcommands" if a description is provided, otherwise the title of the positional arguments group.
description: description for the sub-parser group in help output, by default None.
prog: usage information that will be displayed with sub-command help; by default the name of the program and any positional arguments before the subparser argument.
parser_class: class used to create sub-parser instances, by default the class of the current parser (e.g. ArgumentParser).
required: whether or not a subcommand must be provided, by default False.
dest: name of the attribute under which the sub-command name will be stored, by default None.
help: help for the sub-parser group in help output, by default None.
metavar: string presenting the available sub-commands in help; by default it is None and presents the sub-commands in the form {cmd1, cmd2, …}.
parser = argparse.ArgumentParser(description='RetinaNet Detection Utility.')
parser.add_argument('--master', metavar='address:port', type=str, help='Adress and port of the master worker', default='127.0.0.1:29500')
subparsers = parser.add_subparsers(help='sub-command', dest='command')
subparsers.required = True
devcount = max(1, torch.cuda.device_count())
Training arguments.
batch defaults to two images per device.
parser_train = subparsers.add_parser('train', help='train a network')
parser_train.add_argument('model', type=str, help='path to output model or checkpoint to resume from')
parser_train.add_argument('--annotations', metavar='path', type=str, help='path to COCO style annotations', required=True)
parser_train.add_argument('--images', metavar='path', type=str, help='path to images', default='.')
parser_train.add_argument('--backbone', action='store', type=str, nargs='+', help='backbone model (or list of)', default=['ResNet50FPN'])
parser_train.add_argument('--classes', metavar='num', type=int, help='number of classes', default=80)
parser_train.add_argument('--batch', metavar='size', type=int, help='batch size', default=2*devcount)
parser_train.add_argument('--resize', metavar='scale', type=int, help='resize to given size', default=800)
parser_train.add_argument('--max-size', metavar='max', type=int, help='maximum resizing size', default=1333)
parser_train.add_argument('--jitter', metavar='min max', type=int, nargs=2, help='jitter size within range', default=[640, 1024])
parser_train.add_argument('--iters', metavar='number', type=int, help='number of iterations to train for', default=90000)
parser_train.add_argument('--milestones', action='store', type=int, nargs='*', help='list of iteration indices where learning rate decays', default=[60000, 80000])
parser_train.add_argument('--schedule', metavar='scale', type=float, help='scale schedule (affecting iters and milestones)', default=1)
parser_train.add_argument('--full-precision', help='train in full precision', action='store_true')
parser_train.add_argument('--lr', metavar='value', help='learning rate', type=float, default=0.01)
parser_train.add_argument('--warmup', metavar='iterations', help='number of warmup iterations', type=int, default=1000)
parser_train.add_argument('--gamma', metavar='value', type=float, help='multiplicative factor of learning rate decay', default=0.1)
parser_train.add_argument('--override', help='override model', action='store_true')
parser_train.add_argument('--val-annotations', metavar='path', type=str, help='path to COCO style validation annotations')
parser_train.add_argument('--val-images', metavar='path', type=str, help='path to validation images')
parser_train.add_argument('--post-metrics', metavar='url', type=str, help='post metrics to specified url')
parser_train.add_argument('--fine-tune', metavar='path', type=str, help='fine tune a pretrained model')
parser_train.add_argument('--logdir', metavar='logdir', type=str, help='directory where to write logs')
parser_train.add_argument('--val-iters', metavar='number', type=int, help='number of iterations between each validation', default=8000)
parser_train.add_argument('--with-dali', help='use dali for data loading', action='store_true')
Inference arguments.
parser_infer = subparsers.add_parser('infer', help='run inference')
parser_infer.add_argument('model', type=str, help='path to model')
parser_infer.add_argument('--images', metavar='path', type=str, help='path to images', default='.')
parser_infer.add_argument('--annotations', metavar='annotations', type=str, help='evaluate using provided annotations')
parser_infer.add_argument('--output', metavar='file', type=str, help='save detections to specified JSON file', default='detections.json')
parser_infer.add_argument('--batch', metavar='size', type=int, help='batch size', default=2*devcount)
parser_infer.add_argument('--resize', metavar='scale', type=int, help='resize to given size', default=800)
parser_infer.add_argument('--max-size', metavar='max', type=int, help='maximum resizing size', default=1333)
parser_infer.add_argument('--with-dali', help='use dali for data loading', action='store_true')
parser_infer.add_argument('--full-precision', help='inference in full precision', action='store_true')
Model export arguments.
parser_export = subparsers.add_parser('export', help='export a model into a TensorRT engine')
parser_export.add_argument('model', type=str, help='path to model')
parser_export.add_argument('export', type=str, help='path to exported output')
parser_export.add_argument('--size', metavar='height width', type=int, nargs='+', help='input size (square) or sizes (h w) to use when generating TensorRT engine', default=[1280])
parser_export.add_argument('--batch', metavar='size', type=int, help='max batch size to use for TensorRT engine', default=2)
parser_export.add_argument('--full-precision', help='export in full instead of half precision', action='store_true')
parser_export.add_argument('--int8', help='calibrate model and export in int8 precision', action='store_true')
parser_export.add_argument('--opset', metavar='version', type=int, help='ONNX opset version')
parser_export.add_argument('--calibration-batches', metavar='size', type=int, help='number of batches to use for int8 calibration', default=10)
parser_export.add_argument('--calibration-images', metavar='path', type=str, help='path to calibration images to use for int8 calibration', default="")
parser_export.add_argument('--calibration-table', metavar='path', type=str, help='path of existing calibration table to load from, or name of new calibration table', default="")
parser_export.add_argument('--verbose', help='enable verbose logging', action='store_true')
return parser.parse_args(args)
Check that a model file was specified.
if args.command != 'train' and not os.path.isfile(args.model):
raise RuntimeError('Model file {} does not exist!'.format(args.model))
Parse the model file extension.
model = None
state = {}
_, ext = os.path.splitext(args.model)
In training mode, if no model file exists (or --override is given), a Model instance is created; Model.initialize then loads pre-trained weights or initializes the parameters.
Model.load first creates the model from the backbone(s), then loads the parameters and collects the training state variables.
if args.command == 'train' and (not os.path.exists(args.model) or args.override):
if verbose: print('Initializing model...')
model = Model(args.backbone, args.classes)
model.initialize(args.fine_tune)
if verbose: print(model)
elif ext == '.pth' or ext == '.torch':
if verbose: print('Loading model from {}...'.format(os.path.basename(args.model)))
model, state = Model.load(args.model)
if verbose: print(model)
elif args.command == 'infer' and ext in ['.engine', '.plan']:
model = None
else:
raise RuntimeError('Invalid model format "{}"!'.format(ext))
state['path'] = args.model
return model, state
The worker function carries out three kinds of tasks: training, inference, and model export.
os.environ is a process parameter: a mapping object representing the string environment. For example, environ['HOME'] is the pathname of your home directory (on some platforms), equivalent to getenv("HOME") in C. This mapping is captured the first time the os module is imported, typically during Python startup as part of processing site.py. Changes to the environment made after this time are not reflected in os.environ, except for changes made by modifying os.environ directly.
torch.cuda.set_device sets the current device.
torch.distributed.init_process_group initializes the default distributed process group, which also initializes the distributed package. There are two main ways to initialize a process group: specify store, rank, and world_size explicitly; or specify init_method (a URL string) indicating where/how to discover peers, optionally together with rank and world_size, or with all required parameters encoded in the URL so they can be omitted. If neither is specified, init_method is assumed to be "env://".
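A minimal single-process sketch of the "env://" style that worker() uses below (the address and port values are placeholders):

import os
import torch.distributed as dist

os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
os.environ.setdefault('RANK', '0')
os.environ.setdefault('WORLD_SIZE', '1')
dist.init_process_group(backend='gloo', init_method='env://')  # worker() uses 'nccl' on GPUs
dist.destroy_process_group()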
'Per-device distributed worker'
if torch.cuda.is_available():
os.environ.update({
'MASTER_PORT': args.master.split(':')[-1],
'MASTER_ADDR': ':'.join(args.master.split(':')[:-1]),
'WORLD_SIZE': str(world),
'RANK': str(rank),
'CUDA_DEVICE': str(rank)
})
torch.cuda.set_device(rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')
if args.batch % world != 0:
raise RuntimeError('Batch size should be a multiple of the number of GPUs')
The train function takes a particularly large number of arguments.
if args.command == 'train':
train.train(model, state, args.images, args.annotations,
args.val_images or args.images, args.val_annotations, args.resize, args.max_size, args.jitter,
args.batch, int(args.iters * args.schedule), args.val_iters, not args.full_precision, args.lr,
args.warmup, [int(m * args.schedule) for m in args.milestones], args.gamma,
is_master=(rank == 0), world=world, use_dali=args.with_dali,
metrics_url=args.post_metrics, logdir=args.logdir, verbose=(rank == 0))
At inference time, Engine::_load is called to load the model. The Engine class wraps a TensorRT CUDA engine; the PYBIND11_MODULE macro in extensions.cpp exposes the C++ symbols to Python.
infer runs the inference.
elif args.command == 'infer':
if model is None:
if rank == 0: print('Loading CUDA engine from {}...'.format(os.path.basename(args.model)))
model = Engine.load(args.model)
infer.infer(model, args.images, args.output, args.resize, args.max_size, args.batch,
annotations=args.annotations, mixed_precision=not args.full_precision,
is_master=(rank == 0), world=world, use_dali=args.with_dali, verbose=(rank == 0))
The calibration image list is collected from the given path and shuffled.
Model.export performs the calibration and export.
elif args.command == 'export':
onnx_only = args.export.split('.')[-1] == 'onnx'
input_size = args.size * 2 if len(args.size) == 1 else args.size
calibration_files = []
if args.int8:
# Get list of images to use for calibration
if os.path.isdir(args.calibration_images):
import glob
file_extensions = ['.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG']
for ex in file_extensions:
calibration_files += glob.glob("{}/*{}".format(args.calibration_images, ex), recursive=True)
# Only need enough images for specified num of calibration batches
if len(calibration_files) >= args.calibration_batches * args.batch:
calibration_files = calibration_files[:(args.calibration_batches * args.batch)]
else:
print('Only found enough images for {} batches. Continuing anyway...'.format(len(calibration_files) // args.batch))
random.shuffle(calibration_files)
precision = "FP32"
if args.int8:
precision = "INT8"
elif not args.full_precision:
precision = "FP16"
exported = model.export(input_size, args.batch, precision, calibration_files, args.calibration_table, args.verbose, onnx_only=onnx_only, opset=args.opset)
if onnx_only:
with open(args.export, 'wb') as out:
out.write(exported)
else:
exported.save(args.export)
A reference to the original model is kept in nn_model.
convert_fixedbn_model replaces the torch.nn.BatchNorm2d modules in the model with FixedBatchNorm2d.
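The implementation of convert_fixedbn_model is not reproduced in this walkthrough; a plausible sketch of the idea, swapping each (affine) BatchNorm2d for a module that applies its statistics as a frozen affine transform, might look like this (FrozenBatchNorm2d and convert_frozen_bn below are hypothetical names, not the repo's):

import torch.nn as nn

class FrozenBatchNorm2d(nn.Module):
    'Apply stored BatchNorm statistics as a fixed affine transform (no running-stat updates)'
    def __init__(self, bn):
        super().__init__()
        self.register_buffer('weight', bn.weight.data.clone())        # assumes affine=True
        self.register_buffer('bias', bn.bias.data.clone())
        self.register_buffer('running_mean', bn.running_mean.clone())
        self.register_buffer('running_var', bn.running_var.clone())
        self.eps = bn.eps

    def forward(self, x):
        scale = self.weight * (self.running_var + self.eps).rsqrt()
        shift = self.bias - self.running_mean * scale
        return x * scale.view(1, -1, 1, 1) + shift.view(1, -1, 1, 1)

def convert_frozen_bn(module):
    'Recursively replace nn.BatchNorm2d children with FrozenBatchNorm2d'
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, FrozenBatchNorm2d(child))
        else:
            convert_frozen_bn(child)
    return module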
'Train the model on the given dataset'
# Prepare model
nn_model = model
stride = model.stride
model = convert_fixedbn_model(model)
if torch.cuda.is_available():
model = model.cuda()
apex.amp.initialize initializes the model, the optimizer, and the Torch tensor and functional namespaces according to the chosen opt_level and the overridden properties, if any. It should be called after the model and optimizer have been constructed, but before the model is sent to the torch.nn.parallel.DistributedDataParallel wrapper (see Distributed training in the ImageNet example). Currently, apex.amp.initialize should only be called once, although it can handle any number of models and optimizers (see the corresponding Advanced Amp Usage topic); contact NVIDIA if you believe your use case requires calling it multiple times. Any property keyword argument that is not None is interpreted as a manual override. To avoid having to rewrite anything else in the script, name the returned model/optimizer to replace the passed model/optimizer.
apex.parallel.DistributedDataParallel is a module wrapper that enables easy multiprocess distributed data-parallel training, similar to torch.nn.parallel.DistributedDataParallel. Parameters are broadcast across participating processes on initialization, and gradients are all-reduced and averaged across processes during backward().
DistributedDataParallel is optimized for use with NCCL. It overlaps communication with computation during backward() and buckets smaller gradient transfers to reduce the total number of transfers required, improving performance. This is similar to the optimizations done in NVCaffe.
# Setup optimizer and schedule
optimizer = SGD(model.parameters(), lr=lr, weight_decay=0.0001, momentum=0.9)
model, optimizer = amp.initialize(model, optimizer,
opt_level = 'O2' if mixed_precision else 'O0',
keep_batchnorm_fp32 = True,
loss_scale = 128.0,
verbosity = is_master)
if world > 1:
model = DistributedDataParallel(model)
model.train()
if 'optimizer' in state:
optimizer.load_state_dict(state['optimizer'])
def schedule(train_iter):
if warmup and train_iter <= warmup:
return 0.9 * train_iter / warmup + 0.1
return gamma ** len([m for m in milestones if m <= train_iter])
scheduler = LambdaLR(optimizer, schedule)
DaliDataIterator loads data using NVIDIA DALI.
# Prepare dataset
if verbose: print('Preparing dataset...')
data_iterator = (DaliDataIterator if use_dali else DataIterator)(
path, jitter, max_size, batch_size, stride,
world, annotations, training=True)
if verbose: print(data_iterator)
if verbose:
print(' device: {} {}'.format(
world, 'cpu' if not torch.cuda.is_available() else 'gpu' if world == 1 else 'gpus'))
print(' batch: {}, precision: {}'.format(batch_size, 'mixed' if mixed_precision else 'full'))
print('Training model for {} iterations...'.format(iterations))
# Create TensorBoard writer
if logdir is not None:
from tensorboardX import SummaryWriter
if is_master and verbose:
print('Writing TensorBoard logs to: {}'.format(logdir))
writer = SummaryWriter(logdir=logdir)
Profiler records timings for analysis.
data is deleted manually once the forward pass has finished.
apex.amp.scale_loss: on entering the context manager, it creates scaled_loss = (loss.float()) * current_loss_scale and yields scaled_loss, so that the user can call scaled_loss.backward():
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
On context-manager exit (if delay_unscale=False), the gradients are checked for infs/NaNs and unscaled, so that optimizer.step() can be called.
profiler = Profiler(['train', 'fw', 'bw'])
iteration = state.get('iteration', 0)
while iteration < iterations:
cls_losses, box_losses = [], []
for i, (data, target) in enumerate(data_iterator):
scheduler.step(iteration)
# Forward pass
profiler.start('fw')
optimizer.zero_grad()
cls_loss, box_loss = model([data, target])
del data
profiler.stop('fw')
# Backward pass
profiler.start('bw')
with amp.scale_loss(cls_loss + box_loss, optimizer) as scaled_loss:
scaled_loss.backward()
optimizer.step()
# Reduce all losses
cls_loss, box_loss = cls_loss.mean().clone(), box_loss.mean().clone()
if world > 1:
torch.distributed.all_reduce(cls_loss)
torch.distributed.all_reduce(box_loss)
cls_loss /= world
box_loss /= world
if is_master:
cls_losses.append(cls_loss)
box_losses.append(box_loss)
if is_master and not isfinite(cls_loss + box_loss):
raise RuntimeError('Loss is diverging!\n{}'.format(
'Try lowering the learning rate.'))
del cls_loss, box_loss
profiler.stop('bw')
iteration += 1
The master node prints and logs the information.
post_metrics posts the metrics using requests.
ignore_sigint ignores the interrupt signal (used while the checkpoint is saved).
infer runs inference with the model.
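ignore_sigint is not reproduced in this walkthrough; a plausible sketch, assuming it simply masks SIGINT so that a checkpoint write cannot be interrupted halfway, is:

import signal
from contextlib import contextmanager

@contextmanager
def ignore_sigint():
    'Temporarily ignore Ctrl-C, restoring the previous handler afterwards'
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)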
profiler.bump('train')
if is_master and (profiler.totals['train'] > 60 or iteration == iterations):
focal_loss = torch.stack(list(cls_losses)).mean().item()
box_loss = torch.stack(list(box_losses)).mean().item()
learning_rate = optimizer.param_groups[0]['lr']
if verbose:
msg = '[{:{len}}/{}]'.format(iteration, iterations, len=len(str(iterations)))
msg += ' focal loss: {:.3f}'.format(focal_loss)
msg += ', box loss: {:.3f}'.format(box_loss)
msg += ', {:.3f}s/{}-batch'.format(profiler.means['train'], batch_size)
msg += ' (fw: {:.3f}s, bw: {:.3f}s)'.format(profiler.means['fw'], profiler.means['bw'])
msg += ', {:.1f} im/s'.format(batch_size / profiler.means['train'])
msg += ', lr: {:.2g}'.format(learning_rate)
print(msg, flush=True)
if logdir is not None:
writer.add_scalar('focal_loss', focal_loss, iteration)
writer.add_scalar('box_loss', box_loss, iteration)
writer.add_scalar('learning_rate', learning_rate, iteration)
del box_loss, focal_loss
if metrics_url:
post_metrics(metrics_url, {
'focal loss': mean(cls_losses),
'box loss': mean(box_losses),
'im_s': batch_size / profiler.means['train'],
'lr': learning_rate
})
# Save model weights
state.update({
'iteration': iteration,
'optimizer': optimizer.state_dict(),
'scheduler': scheduler.state_dict(),
})
with ignore_sigint():
nn_model.save(state)
profiler.reset()
del cls_losses[:], box_losses[:]
if val_annotations and (iteration == iterations or iteration % val_iterations == 0):
infer(model, val_path, None, resize, max_size, batch_size, annotations=val_annotations,
mixed_precision=mixed_precision, is_master=is_master, world=world, use_dali=use_dali, is_validation=True, verbose=False)
model.train()
if iteration == iterations:
break
if logdir is not None:
writer.close()
The execution backend is chosen according to the model type.
'Run inference on images from path'
backend = 'pytorch' if isinstance(model, Model) or isinstance(model, DDP) else 'tensorrt'
stride = model.module.stride if isinstance(model, DDP) else model.stride
tempfile.mktemp is deprecated. tempfile.mkstemp creates a temporary file in the most secure manner possible: there are no race conditions in the file's creation, assuming the platform properly implements the os.O_EXCL flag for os.open(). The file is readable and writable only by the creating user ID; if the platform uses permission bits to indicate whether a file is executable, the file is executable by no one; and the file descriptor is not inherited by child processes.
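Since mktemp is deprecated, an equivalent of the snippet below written with mkstemp (a sketch, not the repo's code) would be:

import json
import os
import tempfile

images = []  # the image list built as in the code below
fd, annotations = tempfile.mkstemp(suffix='.json')  # returns an open descriptor plus the path
with os.fdopen(fd, 'w') as f:
    json.dump({'images': images}, f)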
# Create annotations if none was provided
if not annotations:
annotations = tempfile.mktemp('.json')
images = [{ 'id': i, 'file_name': f} for i, f in enumerate(os.listdir(path))]
json.dump({ 'images': images }, open(annotations, 'w'))
# TensorRT only supports fixed input sizes, so override input size accordingly
if backend == 'tensorrt': max_size = max(model.input_size)
The data is loaded with either DaliDataIterator or DataIterator.
# Prepare dataset
if verbose: print('Preparing dataset...')
data_iterator = (DaliDataIterator if use_dali else DataIterator)(
path, resize, max_size, batch_size, stride,
world, annotations, training=False)
if verbose: print(data_iterator)
Determine whether this is standalone inference or validation during training.
# Prepare model
if backend == 'pytorch':
# If we are doing validation during training,
# no need to register model with AMP again
if not is_validation:
if torch.cuda.is_available(): model = model.cuda()
model = amp.initialize(model, None,
opt_level = 'O2' if mixed_precision else 'O0',
keep_batchnorm_fp32 = True,
verbosity = 0)
model.eval()
if verbose:
print(' backend: {}'.format(backend))
print(' device: {} {}'.format(
world, 'cpu' if not torch.cuda.is_available() else 'gpu' if world == 1 else 'gpus'))
print(' batch: {}, precision: {}'.format(batch_size,
'unknown' if backend == 'tensorrt' else 'mixed' if mixed_precision else 'full'))
print('Running inference...')
The network outputs are accumulated in a list.
results = []
profiler = Profiler(['infer', 'fw'])
with torch.no_grad():
for i, (data, ids, ratios) in enumerate(data_iterator):
# Forward pass
profiler.start('fw')
scores, boxes, classes = model(data)
profiler.stop('fw')
results.append([scores, boxes, classes, ids, ratios])
profiler.bump('infer')
if verbose and (profiler.totals['infer'] > 60 or i == len(data_iterator) - 1):
size = len(data_iterator.ids)
msg = '[{:{len}}/{}]'.format(min((i + 1) * batch_size,
size), size, len=len(str(size)))
msg += ' {:.3f}s/{}-batch'.format(profiler.means['infer'], batch_size)
msg += ' (fw: {:.3f}s)'.format(profiler.means['fw'])
msg += ', {:.1f} im/s'.format(batch_size / profiler.means['infer'])
print(msg, flush=True)
profiler.reset()
torch.distributed.all_gather gathers tensors from the whole group into a list.
# Gather results from all devices
if verbose: print('Gathering results...')
results = [torch.cat(r, dim=0) for r in zip(*results)]
if world > 1:
for r, result in enumerate(results):
all_result = [torch.ones_like(result, device=result.device) for _ in range(world)]
torch.distributed.all_gather(list(all_result), result)
results[r] = torch.cat(all_result, dim=0)
The master node copies the results back to the CPU, then collects them and runs the evaluation.
pycocotools.coco.COCO.getCatIds returns the list of category ids.
if is_master:
# Copy buffers back to host
results = [r.cpu() for r in results]
# Collect detections
detections = []
processed_ids = set()
for scores, boxes, classes, image_id, ratios in zip(*results):
image_id = image_id.item()
if image_id in processed_ids:
continue
processed_ids.add(image_id)
keep = (scores > 0).nonzero()
scores = scores[keep].view(-1)
boxes = boxes[keep, :].view(-1, 4) / ratios
classes = classes[keep].view(-1).int()
for score, box, cat in zip(scores, boxes, classes):
x1, y1, x2, y2 = box.data.tolist()
cat = cat.item()
if 'annotations' in data_iterator.coco.dataset:
cat = data_iterator.coco.getCatIds()[cat]
detections.append({
'image_id': image_id,
'score': score.item(),
'bbox': [x1, y1, x2 - x1 + 1, y2 - y1 + 1],
'category_id': cat
})
COCO.loadRes loads a result file and returns a result API object.
A COCOeval object is instantiated to evaluate the results.
if detections:
# Save detections
if detections_file and verbose: print('Writing {}...'.format(detections_file))
detections = { 'annotations': detections }
detections['images'] = data_iterator.coco.dataset['images']
if 'categories' in data_iterator.coco.dataset:
detections['categories'] = [data_iterator.coco.dataset['categories']]
if detections_file:
json.dump(detections, open(detections_file, 'w'), indent=4)
# Evaluate model on dataset
if 'annotations' in data_iterator.coco.dataset:
if verbose: print('Evaluating model...')
with redirect_stdout(None):
coco_pred = data_iterator.coco.loadRes(detections['annotations'])
coco_eval = COCOeval(data_iterator.coco, coco_pred, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
else:
print('No detections!')
getattr returns the value of the named attribute of object. name must be a string. If the string is the name of one of the object's attributes, the result is the value of that attribute; for example, getattr(x, 'foobar') is equivalent to x.foobar. If the named attribute does not exist, default is returned if provided, otherwise AttributeError is raised.
torch.nn.ModuleDict holds submodules in a dictionary. ModuleDict can be indexed like a regular Python dictionary, but the modules it contains are properly registered and visible to all Module methods. ModuleDict is an ordered dictionary that respects the order of the OrderedDict or of another ModuleDict passed to update(). Note that update() with other unordered mapping types (e.g. Python's plain dict) does not preserve the order of the merged mapping.
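A tiny example of why ModuleDict matters here (a plain dict would hide the convolutions from parameters(), .cuda(), and so on):

import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        # registered as submodules: both convs show up in self.parameters()
        self.heads = nn.ModuleDict({'cls': nn.Conv2d(8, 4, 1), 'box': nn.Conv2d(8, 4, 1)})

    def forward(self, x):
        return self.heads['cls'](x), self.heads['box'](x)

print(sum(p.numel() for p in Tiny().parameters()))  # counts the parameters of both heads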
'RetinaNet - https://arxiv.org/abs/1708.02002'
def __init__(self, backbones='ResNet50FPN', classes=80, config={}):
super().__init__()
if not isinstance(backbones, list):
backbones = [backbones]
self.backbones = nn.ModuleDict({b: getattr(backbones_mod, b)() for b in backbones})
self.name = 'RetinaNet'
self.exporting = False
self.ratios = [1.0, 2.0, 0.5]
self.scales = [4 * 2**(i/3) for i in range(3)]
self.anchors = {}
self.classes = classes
self.threshold = config.get('threshold', 0.05)
self.top_n = config.get('top_n', 1000)
self.nms = config.get('nms', 0.5)
self.detections = config.get('detections', 100)
self.stride = max([b.stride for _, b in self.backbones.items()])
The classification and box regression heads each stack four 3×3 convolutions with ReLU, followed by an output convolution.
FocalLoss is based on torch.nn.functional.binary_cross_entropy_with_logits.
SmoothL1Loss is implemented by the project itself.
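The repo's FocalLoss is not reproduced here; a common formulation built on binary_cross_entropy_with_logits, matching the description above, might look like this (the alpha/gamma defaults are the standard paper values, assumed here; the loss is returned per element so the caller can mask and sum it):

import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLossSketch(nn.Module):
    'Per-element focal loss on logits: alpha_t * (1 - p_t)**gamma * BCE'
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, target):
        ce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
        p_t = torch.exp(-ce)  # probability assigned to the true label
        alpha_t = self.alpha * target + (1 - self.alpha) * (1 - target)
        return alpha_t * (1 - p_t) ** self.gamma * ce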
# classification and box regression heads
def make_head(out_size):
layers = []
for _ in range(4):
layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU()]
layers += [nn.Conv2d(256, out_size, 3, padding=1)]
return nn.Sequential(*layers)
anchors = len(self.ratios) * len(self.scales)
self.cls_head = make_head(classes * anchors)
self.box_head = make_head(4 * anchors)
self.cls_criterion = FocalLoss()
self.box_criterion = SmoothL1Loss(beta=0.11)
object.__repr__ is called by the repr() built-in function to compute the "official" string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment); if that is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an "informal" string representation of instances of that class is required. This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.
return '\n'.join([
' model: {}'.format(self.name),
' backbone: {}'.format(', '.join([k for k, _ in self.backbones.items()])),
' classes: {}, anchors: {}'.format(self.classes, len(self.ratios) * len(self.scales)),
])
If a pre-trained checkpoint is given, its weights are loaded while cls_head.8 is ignored.
if pre_trained:
# Initialize using weights from pre-trained model
if not os.path.isfile(pre_trained):
raise ValueError('No checkpoint {}'.format(pre_trained))
print('Fine-tuning weights from {}...'.format(os.path.basename(pre_trained)))
state_dict = self.state_dict()
chk = torch.load(pre_trained, map_location=lambda storage, loc: storage)
ignored = ['cls_head.8.bias', 'cls_head.8.weight']
weights = { k: v for k, v in chk['state_dict'].items() if k not in ignored }
state_dict.update(weights)
self.load_state_dict(state_dict)
del chk, weights
torch.cuda.empty_cache()
Otherwise, the backbone's own initialization method is called, and then the classification and regression heads are initialized.
else:
# Initialize backbone(s)
for _, backbone in self.backbones.items():
backbone.initialize()
# Initialize heads
def initialize_layer(layer):
if isinstance(layer, nn.Conv2d):
nn.init.normal_(layer.weight, std=0.01)
if layer.bias is not None:
nn.init.constant_(layer.bias, val=0)
self.cls_head.apply(initialize_layer)
self.box_head.apply(initialize_layer)
The last layer of the classification head receives a special prior initialization.
# Initialize class head prior
def initialize_prior(layer):
pi = 0.01
b = - math.log((1 - pi) / pi)
nn.init.constant_(layer.bias, b)
nn.init.normal_(layer.weight, std=0.01)
self.cls_head[-1].apply(initialize_prior)
During training, the input x also carries the preprocessed annotations.
The backbone(s) extract features, and the classification and regression heads are applied to them.
If training, _compute_loss is called to compute and return the losses.
if self.training: x, targets = x
# Backbones forward pass
features = []
for _, backbone in self.backbones.items():
features.extend(backbone(x))
# Heads forward pass
cls_heads = [self.cls_head(t) for t in features]
box_heads = [self.box_head(t) for t in features]
if self.training:
return self._compute_loss(x, cls_heads, box_heads, targets.float())
When exporting, the classification and box regression outputs are returned directly.
cls_heads = [cls_head.sigmoid() for cls_head in cls_heads]
if self.exporting:
self.strides = [x.shape[-1] // cls_head.shape[-1] for cls_head in cls_heads]
return cls_heads, box_heads
Otherwise, the inference post-processing is run.
generate_anchors generates anchor coordinates from scales/ratios.
decode filters the outputs by score and decodes the bounding boxes.
nms filters the results further.
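generate_anchors itself is not shown in this walkthrough; a sketch of the usual construction (one (x1, y1, x2, y2) box per ratio/scale pair for a given stride) follows, where the centering and ordering conventions are assumptions and may differ from the repo's implementation:

import torch

def generate_anchors_sketch(stride, ratios, scales):
    'One anchor per (scale, ratio) pair, expressed relative to a single feature-map cell'
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = (stride * scale) ** 2
            w = (area / ratio) ** 0.5
            h = w * ratio
            cx = cy = 0.5 * stride  # assumed: centered on the cell
            anchors.append([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])
    return torch.tensor(anchors)

print(generate_anchors_sketch(8, [1.0, 2.0, 0.5], [4 * 2 ** (i / 3) for i in range(3)]).shape)
# torch.Size([9, 4])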
# Inference post-processing
decoded = []
for cls_head, box_head in zip(cls_heads, box_heads):
# Generate level's anchors
stride = x.shape[-1] // cls_head.shape[-1]
if stride not in self.anchors:
self.anchors[stride] = generate_anchors(stride, self.ratios, self.scales)
# Decode and filter boxes
decoded.append(decode(cls_head, box_head, stride,
self.threshold, self.top_n, self.anchors[stride]))
# Perform non-maximum suppression
decoded = [torch.cat(tensors, 1) for tensors in zip(*decoded)]
return nms(*decoded, self.nms, self.detections)
generate_anchors generates anchor coordinates from scales/ratios.
snap_to_anchors builds the training targets with respect to the anchors.
cls_target, box_target, depth = [], [], []
for target in targets:
target = target[target[:, -1] > -1]
if stride not in self.anchors:
self.anchors[stride] = generate_anchors(stride, self.ratios, self.scales)
snapped = snap_to_anchors(
target, [s * stride for s in size[::-1]], stride,
self.anchors[stride].to(targets.device), self.classes, targets.device)
for l, s in zip((cls_target, box_target, depth), snapped): l.append(s)
return torch.stack(cls_target), torch.stack(box_target), torch.stack(depth)
_extract_targets produces the classification and regression targets.
depth is the sample-selection mask.
cls_losses, box_losses, fg_targets = [], [], []
for cls_head, box_head in zip(cls_heads, box_heads):
size = cls_head.shape[-2:]
stride = x.shape[-1] / cls_head.shape[-1]
cls_target, box_target, depth = self._extract_targets(targets, stride, size)
fg_targets.append((depth > 0).sum().float().clamp(min=1))
cls_head = cls_head.view_as(cls_target).float()
cls_mask = (depth >= 0).expand_as(cls_target).float()
cls_loss = self.cls_criterion(cls_head, cls_target)
cls_loss = cls_mask * cls_loss
cls_losses.append(cls_loss.sum())
box_head = box_head.view_as(box_target).float()
box_mask = (depth > 0).expand_as(box_target).float()
box_loss = self.box_criterion(box_head, box_target)
box_loss = box_mask * box_loss
box_losses.append(box_loss.sum())
fg_targets = torch.stack(fg_targets).sum()
cls_loss = torch.stack(cls_losses).sum() / fg_targets
box_loss = torch.stack(box_losses).sum() / fg_targets
return cls_loss, box_loss
The backbone name(s), number of classes, and model state_dict are saved.
iteration, optimizer, and scheduler are saved as well, if present.
checkpoint = {
'backbone': [k for k, _ in self.backbones.items()],
'classes': self.classes,
'state_dict': self.state_dict()
}
for key in ('iteration', 'optimizer', 'scheduler'):
if key in state:
checkpoint[key] = state[key]
torch.save(checkpoint, state['path'])
The backbone(s) are created first, and then the parameters are loaded into them.
torch.cuda.empty_cache releases all unoccupied cached memory currently held by the caching allocator, so that it can be used by other GPU applications and becomes visible in nvidia-smi.
@classmethod
def load(cls, filename):
if not os.path.isfile(filename):
raise ValueError('No checkpoint {}'.format(filename))
checkpoint = torch.load(filename, map_location=lambda storage, loc: storage)
# Recreate model from checkpoint instead of from individual backbones
model = cls(backbones=checkpoint['backbone'], classes=checkpoint['classes'])
model.load_state_dict(checkpoint['state_dict'])
state = {}
for key in ('iteration', 'optimizer', 'scheduler'):
if key in checkpoint:
state[key] = checkpoint[key]
del checkpoint
torch.cuda.empty_cache()
return model, state
If the ONNX opset version is lower than 9, an upsample_nearest2d function is defined.
import torch.onnx.symbolic
if opset is not None and opset < 9:
# Override Upsample's ONNX export from old opset if required (not needed for TRT 5.1+)
@torch.onnx.symbolic.parse_args('v', 'is')
def upsample_nearest2d(g, input, output_size):
height_scale = float(output_size[-2]) / input.type().sizes()[-2]
width_scale = float(output_size[-1]) / input.type().sizes()[-1]
return g.op("Upsample", input,
scales_f=(1, 1, height_scale, width_scale),
mode_s="nearest")
torch.onnx.symbolic.upsample_nearest2d = upsample_nearest2d
io.BytesIO is a stream implementation using an in-memory bytes buffer. It inherits BufferedIOBase; the buffer is discarded when the close() method is called. The optional argument initial_bytes is a bytes-like object containing initial data.
torch.onnx.export exports the model.
io.BytesIO.getvalue returns bytes containing the entire contents of the buffer.
# Export to ONNX
print('Exporting to ONNX...')
self.exporting = True
onnx_bytes = io.BytesIO()
zero_input = torch.zeros([1, 3, *size]).cuda()
extra_args = { 'opset_version': opset } if opset else {}
torch.onnx.export(self.cuda(), zero_input, onnx_bytes, **extra_args)
self.exporting = False
if onnx_only:
return onnx_bytes.getvalue()
generate_anchors generates anchor coordinates from scales/ratios.
An Engine object is returned.
# Build TensorRT engine
model_name = '_'.join([k for k, _ in self.backbones.items()])
anchors = [generate_anchors(stride, self.ratios, self.scales).view(-1).tolist()
for stride in self.strides]
return Engine(onnx_bytes.getvalue(), len(onnx_bytes.getvalue()), batch, precision,
self.threshold, self.top_n, anchors, self.nms, self.detections, calibration_files, model_name, calibration_table, verbose)
DaliDataIterator exposes an interface similar to torch.utils.data.DataLoader, but implements data-parallel loading with DALI.
The batch_size argument is the total batch size across devices.
'Data loader for data parallel using Dali'
def __init__(self, path, resize, max_size, batch_size, stride, world, annotations, training=False):
self.training = training
self.resize = resize
self.max_size = max_size
self.stride = stride
self.batch_size = batch_size // world
self.mean = [255.*x for x in [0.485, 0.456, 0.406]]
self.std = [255.*x for x in [0.229, 0.224, 0.225]]
self.world = world
self.path = path
contextlib.redirect_stdout is a context manager for temporarily redirecting sys.stdout to another file or file-like object; it adds flexibility to existing functions or classes whose output is hardwired to stdout.
A COCO instance is created.
COCOPipeline wraps nvidia.dali.pipeline.Pipeline.
nvidia.dali.pipeline.Pipeline.build builds the pipeline. A pipeline needs to be built before it can be run standalone; framework-specific plugins handle this step automatically.
# Setup COCO
with redirect_stdout(None):
self.coco = COCO(annotations)
self.ids = list(self.coco.imgs.keys())
if 'categories' in self.coco.dataset:
self.categories_inv = { k: i for i, k in enumerate(self.coco.getCatIds()) }
self.pipe = COCOPipeline(batch_size=self.batch_size, num_threads=2,
path=path, coco=self.coco, training=training, annotations=annotations, world=world,
device_id = torch.cuda.current_device(), mean=self.mean, std=self.std, resize=resize, max_size=max_size, stride=self.stride)
self.pipe.build()
__repr__ describes the loader.
return '\n'.join([
' loader: dali',
' resize: {}, max: {}'.format(self.resize, self.max_size),
])
return ceil(len(self.ids) // self.world / self.batch_size)
nvidia.dali.pipeline.Pipeline.run runs the pipeline and returns the results. If the pipeline was created with exec_pipelined set to True, this function will also begin prefetching the next iteration to speed up execution. It should not be mixed with nvidia.dali.pipeline.Pipeline.schedule_run(), nvidia.dali.pipeline.Pipeline.share_outputs(), and nvidia.dali.pipeline.Pipeline.release_outputs() on the same pipeline.
ctypes.c_void_p represents the C void * type. The value is represented as an integer, and the constructor accepts an optional integer initializer.
torch.Tensor.data_ptr returns the address of the first element of the tensor.
The results returned by self.pipe.run() are all of type nvidia.dali.backend.TensorListCPU or nvidia.dali.backend.TensorListGPU.
nvidia.dali.backend.TensorListCPU.copy_to_external copies the contents of this TensorList to an external pointer (of type ctypes.c_void_p) in CPU memory; this function is used internally by plugins to interface with tensors from supported deep learning frameworks.
nvidia.dali.backend.TensorListGPU.as_cpu returns a TensorListCPU object that is a copy of this TensorListGPU.
data, ratios, ids, num_detections = [], [], [], []
dali_data, dali_boxes, dali_labels, dali_ids, dali_attrs, dali_resize_img = self.pipe.run()
for l in range(len(dali_boxes)):
num_detections.append(dali_boxes.at(l).shape[0])
pyt_targets = -1 * torch.ones([len(dali_boxes), max(max(num_detections),1), 5])
for batch in range(self.batch_size):
id = int(dali_ids.at(batch)[0])
# Convert dali tensor to pytorch
dali_tensor = dali_data.at(batch)
tensor_shape = dali_tensor.shape()
datum = torch.zeros(dali_tensor.shape(), dtype=torch.float, device=torch.device('cuda'))
c_type_pointer = ctypes.c_void_p(datum.data_ptr())
dali_tensor.copy_to_external(c_type_pointer)
# Calculate image resize ratio to rescale boxes
prior_size = dali_attrs.as_cpu().at(batch)
resized_size = dali_resize_img.at(batch).shape()
ratio = max(resized_size) / max(prior_size)
if self.training:
# Rescale boxes
b_arr = dali_boxes.at(batch)
num_dets = b_arr.shape[0]
if num_dets != 0:
pyt_bbox = torch.from_numpy(b_arr).float()
pyt_bbox[:,0] *= float(prior_size[1])
pyt_bbox[:,1] *= float(prior_size[0])
pyt_bbox[:,2] *= float(prior_size[1])
pyt_bbox[:,3] *= float(prior_size[0])
# (l,t,r,b) -> (x,y,w,h) == (l,r, r-l, b-t)
pyt_bbox[:,2] -= pyt_bbox[:,0]
pyt_bbox[:,3] -= pyt_bbox[:,1]
pyt_targets[batch,:num_dets,:4] = pyt_bbox * ratio
# Arrange labels in target tensor
l_arr = dali_labels.at(batch)
if num_dets != 0:
pyt_label = torch.from_numpy(l_arr).float()
pyt_label -= 1 #Rescale labels to [0,79] instead of [1,80]
pyt_targets[batch,:num_dets, 4] = pyt_label.squeeze()
ids.append(id)
data.append(datum.unsqueeze(0))
ratios.append(ratio)
data = torch.cat(data, dim=0)
if self.training:
pyt_targets = pyt_targets.cuda(non_blocking=True)
yield data, pyt_targets
else:
ids = torch.Tensor(ids).int().cuda(non_blocking=True)
ratios = torch.Tensor(ratios).cuda(non_blocking=True)
yield data, ids, ratios
See the documentation COCO Reader with augmentations.
nvidia.dali.ops.COCOReader is a CPU operator. It reads data from a COCO dataset, which consists of directories containing images and an annotation file. For an image with m bboxes, it returns the bboxes as an (m, 4) tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and the labels as an (m, 1) tensor (m * category_id).
nvidia.dali.ops.nvJPEGDecoderSlice is a "mixed" operator. It partially decodes JPEG images with the nvJPEG library, using a cropping window of given size and anchor. The input must be supplied as three tensors in a specific order: encoded_data contains the encoded image data; begin contains the starting pixel coordinates of the crop in (x, y) format; size contains the pixel dimensions of the crop in (w, h) format. For begin and size, the coordinates must lie in the interval [0.0, 1.0]. The decoder output is in HWC layout.
Warning: this operator is now deprecated; use ImageDecoderSlice instead.
nvidia.dali.ops.RandomBBoxCrop is a CPU operator. It performs a prospective crop of an image while keeping the bounding boxes and labels consistent. The inputs must be supplied as two tensors: BBoxes contains bounding boxes represented as [l, t, r, b] or [x, y, w, h]; Labels contains the corresponding label for each bounding box. The resulting prospective crop is provided as two tensors: Begin contains the starting coordinates of the crop in (x, y) format; Size contains the crop size in (w, h) format. Bounding boxes are provided as an (m*4) tensor, where each bounding box is represented as [l, t, r, b] or [x, y, w, h]. Labels whose boxes' intersection-over-union with the crop window is below the thresholds are discarded. Note that when allow_no_crop is False and the thresholds do not include 0, it is best to increase num_attempts, otherwise the operator may loop for a long time.
nvidia.dali.ops.BbFlip is a CPU/GPU operator that performs a horizontal flip (mirror) of bounding boxes. The input is bounding box coordinates in [x, y, w, h] or [left, top, right, bottom] format; all coordinates are in the image coordinate system (i.e. 0.0-1.0).
nvidia.dali.ops.Flip is a CPU/GPU operator that flips images on the horizontal and/or vertical axes.
nvidia.dali.ops.CoinFlip is a support operator that produces a tensor filled with 0s and 1s (the results of random coin flips), usable as an argument for select operations.
nvidia.dali.ops.Uniform is a support operator that produces a tensor of uniformly distributed random numbers.
nvidia.dali.ops.Resize is a CPU/GPU operator that resizes images.
nvidia.dali.ops.Paste is a GPU operator that pastes the input image onto a larger canvas; the canvas size equals input size * ratio.
nvidia.dali.ops.CropMirrorNormalize is a CPU/GPU operator that performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting. It normalizes the input image and produces the output with the formula:
output = (input - mean) / std
Note that not providing any crop argument results in mirroring and normalization only. This operator allows sequence inputs.
super().__init__(batch_size=batch_size, num_threads=num_threads, device_id = device_id, prefetch_queue_depth=num_threads, seed=42)
self.path = path
self.training = training
self.coco = coco
self.stride = stride
self.iter = 0
self.reader = ops.COCOReader(annotations_file=annotations, file_root=path, num_shards=world,shard_id=torch.cuda.current_device(),
ltrb=True, ratio=True, shuffle_after_epoch=True, save_img_ids=True)
self.decode_train = ops.nvJPEGDecoderSlice(device="mixed", output_type=types.RGB)
self.decode_infer = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
self.bbox_crop = ops.RandomBBoxCrop(device='cpu', ltrb=True, scaling=[0.3, 1.0], thresholds=[0.1,0.3,0.5,0.7,0.9])
self.bbox_flip = ops.BbFlip(device='cpu', ltrb=True)
self.img_flip = ops.Flip(device='gpu')
self.coin_flip = ops.CoinFlip(probability=0.5)
if isinstance(resize, list): resize = max(resize)
self.rand_resize = ops.Uniform(range=[resize, float(max_size)])
self.resize_train = ops.Resize(device='gpu', interp_type=types.DALIInterpType.INTERP_CUBIC, save_attrs=True)
self.resize_infer = ops.Resize(device='gpu', interp_type=types.DALIInterpType.INTERP_CUBIC, resize_longer=max_size, save_attrs=True)
padded_size = max_size + ((self.stride - max_size % self.stride) % self.stride)
self.pad = ops.Paste(device='gpu', fill_value = 0, ratio=1.1, min_canvas_size=padded_size, paste_x=0, paste_y=0)
self.normalize = ops.CropMirrorNormalize(device='gpu', mean=mean, std=std, crop=padded_size, crop_pos_x=0, crop_pos_y=0)
nvidia.dali.pipeline.Pipeline.define_graph returns the list of output EdgeReferences. Users define this function to construct the operation graph of their pipeline.
self.reader() reads data from the dataset.
During training, the images are decoded and augmented; for inference they are only decoded and resized.
images, bboxes, labels, img_ids = self.reader()
if self.training:
crop_begin, crop_size, bboxes, labels = self.bbox_crop(bboxes, labels)
images = self.decode_train(images, crop_begin, crop_size)
resize = self.rand_resize()
images, attrs = self.resize_train(images, resize_longer=resize)
flip = self.coin_flip()
bboxes = self.bbox_flip(bboxes, horizontal=flip)
images = self.img_flip(images, horizontal=flip)
else:
images = self.decode_infer(images)
images, attrs = self.resize_infer(images)
resized_images = images
images = self.normalize(self.pad(images))
return images, bboxes, labels, img_ids, attrs, resized_images
torch.Tensor.nelement is an alias for torch.Tensor.numel.
If boxes is empty, zero-filled tensors are returned directly.
'Snap target boxes (x, y, w, h) to anchors'
num_anchors = anchors.size()[0] if anchors is not None else 1
width, height = (int(size[0] / stride), int(size[1] / stride))
if boxes.nelement() == 0:
return (torch.zeros([num_anchors, num_classes, height, width], device=device),
torch.zeros([num_anchors, 4, height, width], device=device),
torch.zeros([num_anchors, 1, height, width], device=device))
The anchors are broadcast to every location according to the output size.
boxes, classes = boxes.split(4, dim=1)
# Generate anchors
x, y = torch.meshgrid([torch.arange(0, size[i], stride, device=device, dtype=classes.dtype) for i in range(2)])
xyxy = torch.stack((x, y, x, y), 2).unsqueeze(0)
anchors = anchors.view(-1, 1, 1, 4).to(dtype=classes.dtype)
anchors = (xyxy + anchors).contiguous().view(-1, 4)
boxes are converted from [x, y, width, height] to [left, top, right, bottom], which simplifies the intersection-over-union computation.
# Compute overlap between boxes and anchors
boxes = torch.cat([boxes[:, :2], boxes[:, :2] + boxes[:, 2:] - 1], 1)
xy1 = torch.max(anchors[:, None, :2], boxes[:, :2])
xy2 = torch.min(anchors[:, None, 2:], boxes[:, 2:])
inter = torch.prod((xy2 - xy1 + 1).clamp(0), 2)
boxes_area = torch.prod(boxes[:, 2:] - boxes[:, :2] + 1, 1)
anchors_area = torch.prod(anchors[:, 2:] - anchors[:, :2] + 1, 1)
overlap = inter / (anchors_area[:, None] + boxes_area - inter)
For each anchor, the best-matching target box is kept.
box2delta converts the boxes into deltas relative to the anchors.
torch.ones_like returns a tensor filled with the scalar value 1, with the same size as input; torch.ones_like(input) is equivalent to torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).
depth is the sample-selection mask.
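box2delta is not shown in this walkthrough; a sketch of the usual encoding (normalized center offsets plus log size ratios), which is only an assumption about the repo's exact formula, is:

import torch

def box2delta_sketch(boxes, anchors):
    'Encode (x1, y1, x2, y2) boxes as deltas relative to (x1, y1, x2, y2) anchors'
    anchors_wh = anchors[:, 2:] - anchors[:, :2] + 1
    anchors_ctr = anchors[:, :2] + 0.5 * anchors_wh
    boxes_wh = boxes[:, 2:] - boxes[:, :2] + 1
    boxes_ctr = boxes[:, :2] + 0.5 * boxes_wh
    return torch.cat([
        (boxes_ctr - anchors_ctr) / anchors_wh,  # (dx, dy)
        torch.log(boxes_wh / anchors_wh)         # (dw, dh)
    ], dim=1)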
# Keep best box per anchor
overlap, indices = overlap.max(1)
box_target = box2delta(boxes[indices], anchors)
box_target = box_target.view(num_anchors, 1, width, height, 4)
box_target = box_target.transpose(1, 4).transpose(2, 3)
box_target = box_target.squeeze().contiguous()
depth = torch.ones_like(overlap) * -1
depth[overlap < 0.4] = 0 # background
depth[overlap >= 0.5] = classes[indices][overlap >= 0.5].squeeze() + 1 # objects
depth = depth.view(num_anchors, width, height).transpose(1, 2).contiguous()
Generate the target classes; each class entry is 0 or 1.
torch.Tensor.scatter_ writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
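A tiny worked example of the one-hot construction used below:

import torch

classes = torch.tensor([[2], [0]])  # one class index per anchor
one_hot = torch.zeros(2, 4)
one_hot.scatter_(1, classes, 1)
print(one_hot)  # tensor([[0., 0., 1., 0.],
                #         [1., 0., 0., 0.]])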
# Generate target classes
cls_target = torch.zeros((anchors.size()[0], num_classes + 1), device=device, dtype=boxes.dtype)
if classes.nelement() == 0:
classes = torch.LongTensor([num_classes], device=device).expand_as(indices)
else:
classes = classes[indices].long()
classes = classes.view(-1, 1)
classes[overlap < 0.4] = num_classes # background has no class
cls_target.scatter_(1, classes, 1)
cls_target = cls_target[:, :num_classes].view(-1, 1, width, height, num_classes)
cls_target = cls_target.transpose(1, 4).transpose(2, 3)
cls_target = cls_target.squeeze().contiguous()
return (cls_target.view(num_anchors, num_classes, height, width),
box_target.view(num_anchors, 4, height, width),
depth.view(num_anchors, 1, height, width))