The full code is available at the link. For brevity, the variable assignments, logging setup, and SummaryWriter setup are omitted here (which is why names like device, output_dir, and epochs appear below without definitions):
import logging
import matplotlib.pyplot as plt
import numpy as np
import torch
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.handlers import ModelCheckpoint, Timer
from ignite.metrics import Loss, RunningAverage
from tensorboardX import SummaryWriter
from data.transforms import build_untransform
from data.transforms.transforms import COLORMAP
from utils.metric import Label_Accuracy
plt.switch_backend('agg')
def do_train(
cfg,
model,
train_loader,
val_loader,
optimizer,
loss_fn
):  # all of the arguments above are instances created by the caller
Create the trainer engine
trainer = create_supervised_trainer(model, optimizer, loss_fn, device=device)
This function creates the trainer, which drives model training and abstracts away the familiar for index, data in enumerate(dataloader) style loop:
def create_supervised_trainer(model, optimizer, loss_fn,
device=None, non_blocking=False,
prepare_batch=_prepare_batch,
output_transform=lambda x, y, y_pred, loss: loss.item()):
"""
Factory function for creating a trainer for supervised models.
Args:
model (`torch.nn.Module`): the model to train.
optimizer (`torch.optim.Optimizer`): the optimizer to use.
loss_fn (torch.nn loss function): the loss function to use.
device (str, optional): device type specification (default: None).
Applies to both model and batches.
non_blocking (bool, optional): if True and this copy is between CPU and GPU, the copy may occur asynchronously
            with respect to the host. For other cases, this argument has no effect.  # the default is fine here
prepare_batch (callable, optional): function that receives `batch`, `device`, `non_blocking` and outputs
            tuple of tensors `(batch_x, batch_y)`.  # i.e. a callable the user can override
output_transform (callable, optional): function that receives 'x', 'y', 'y_pred', 'loss' and returns value
to be assigned to engine's state.output after each iteration. Default is returning `loss.item()`.
    Note: `engine.state.output` for this engine is defined by `output_transform` parameter and is the loss
of the processed batch by default.
Returns:
Engine: a trainer engine with supervised update function.
"""
if device:
model.to(device)
    def _update(engine, batch):  # zero grads -> prepare batch -> forward -> compute loss -> backward -> optimizer step -> return loss
model.train()
optimizer.zero_grad()
x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
y_pred = model(x)
loss = loss_fn(y_pred, y)
loss.backward()
optimizer.step()
return output_transform(x, y, y_pred, loss)
return Engine(_update)
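A minimal standalone usage sketch of this factory (the toy model, data, and hyperparameters are illustrative only, not from this project):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import create_supervised_trainer

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(torch.randn(64, 10),
                                  torch.randint(0, 2, (64,))),
                    batch_size=16)

trainer = create_supervised_trainer(model, optimizer, loss_fn)
state = trainer.run(loader, max_epochs=2)   # state.output is the last loss.item()
print(state.epoch, state.output)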
Create the evaluator. Here the metrics must be passed in as a dict:
evaluator = create_supervised_evaluator(model, metrics={'mean_iu': Label_Accuracy(cfg.MODEL.NUM_CLASSES),
'loss': Loss(loss_fn)}, device=device)
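Both factory functions default prepare_batch to the helper below, which simply moves each tensor of the batch to the target device: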
def _prepare_batch(batch, device=None, non_blocking=False):
"""Prepare batch for training: pass to a device with options.
"""
x, y = batch
return (convert_tensor(x, device=device, non_blocking=non_blocking),
convert_tensor(y, device=device, non_blocking=non_blocking))
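The evaluator factory follows the same pattern; a condensed sketch of create_supervised_evaluator under the same ignite version (details may differ slightly from the installed source):

def create_supervised_evaluator(model, metrics={}, device=None, non_blocking=False,
                                prepare_batch=_prepare_batch):
    if device:
        model.to(device)

    def _inference(engine, batch):
        model.eval()                      # eval mode: freeze dropout / batchnorm stats
        with torch.no_grad():             # inference only, no gradient bookkeeping
            x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
            y_pred = model(x)
            return y_pred, y              # metrics like Loss consume (y_pred, y)

    engine = Engine(_inference)
    for name, metric in metrics.items():  # each Metric registers its own event handlers
        metric.attach(engine, name)
    return engine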
Note the Loss() metric passed into create_supervised_evaluator:
class Loss(Metric):
"""
Calculates the average loss according to the passed loss_fn.
Args:
loss_fn (callable): a callable taking a prediction tensor, a target
tensor, optionally other arguments, and returns the average loss
over all observations in the batch.
output_transform (callable): a callable that is used to transform the
:class:`~ignite.engine.Engine`'s `process_function`'s output into the
form expected by the metric.
This can be useful if, for example, you have a multi-output model and
you want to compute the metric with respect to one of the outputs.
            The output is expected to be a tuple (prediction, target) or
(prediction, target, kwargs) where kwargs is a dictionary of extra
keywords arguments.
batch_size (callable): a callable taking a target tensor that returns the
first dimension size (usually the batch size).
"""
def __init__(self, loss_fn, output_transform=lambda x: x,
batch_size=lambda x: x.shape[0]):
super(Loss, self).__init__(output_transform)
self._loss_fn = loss_fn
self._batch_size = batch_size
def reset(self):
self._sum = 0
self._num_examples = 0
def update(self, output):
if len(output) == 2:
y_pred, y = output
kwargs = {}
else:
y_pred, y, kwargs = output
average_loss = self._loss_fn(y_pred, y, **kwargs)
if len(average_loss.shape) != 0:
raise ValueError('loss_fn did not return the average loss.')
N = self._batch_size(y)
self._sum += average_loss.item() * N
self._num_examples += N
def compute(self):
if self._num_examples == 0:
raise NotComputableError(
'Loss must have at least one example before it can be computed.')
return self._sum / self._num_examples
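A quick standalone check of how Loss accumulates (the shapes and loss function below are illustrative, not from this project):

import torch
import torch.nn as nn
from ignite.metrics import Loss

metric = Loss(nn.CrossEntropyLoss())
metric.reset()
for _ in range(3):                  # pretend we saw three validation batches
    y_pred = torch.randn(4, 10)     # batch of 4 samples, 10 classes
    y = torch.randint(0, 10, (4,))
    metric.update((y_pred, y))
print(metric.compute())             # example-weighted average over all 12 samples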
The class used for saving checkpoint files:
checkpointer = ModelCheckpoint(output_dir, 'fcn', checkpoint_period, n_saved=10, require_empty=False)
The class definition:
class ModelCheckpoint(object):
""" ModelCheckpoint handler can be used to periodically save objects to disk.
This handler expects two arguments:
- an :class:`~ignite.engine.Engine` object
- a `dict` mapping names (`str`) to objects that should be saved to disk.
See Notes and Examples for further details.
Args:
dirname (str):
Directory path where objects will be saved.
filename_prefix (str):
Prefix for the filenames to which objects will be saved. See Notes
for more details.
save_interval (int, optional):
if not None, objects will be saved to disk every `save_interval` calls to the handler.
Exactly one of (`save_interval`, `score_function`) arguments must be provided.
score_function (callable, optional):
if not None, it should be a function taking a single argument,
an :class:`~ignite.engine.Engine` object,
and return a score (`float`). Objects with highest scores will be retained.
Exactly one of (`save_interval`, `score_function`) arguments must be provided.
score_name (str, optional):
if `score_function` not None, it is possible to store its absolute value using `score_name`. See Notes for
more details.
n_saved (int, optional):
Number of objects that should be kept on disk. Older files will be removed.
atomic (bool, optional):
If True, objects are serialized to a temporary file,
and then moved to final destination, so that files are
            guaranteed not to be damaged (for example if an exception occurs during saving).
require_empty (bool, optional):
If True, will raise exception if there are any files starting with `filename_prefix`
in the directory 'dirname'.
create_dir (bool, optional):
            If True, will create directory 'dirname' if it doesn't exist.
save_as_state_dict (bool, optional):
If True, will save only the `state_dict` of the objects specified, otherwise the whole object will be saved.
Notes:
This handler expects two arguments: an :class:`~ignite.engine.Engine` object and a `dict`
mapping names to objects that should be saved.
These names are used to specify filenames for saved objects.
Each filename has the following structure:
`{filename_prefix}_{name}_{step_number}.pth`.
Here, `filename_prefix` is the argument passed to the constructor,
`name` is the key in the aforementioned `dict`, and `step_number`
is incremented by `1` with every call to the handler.
If `score_function` is provided, user can store its absolute value using `score_name` in the filename.
Each filename can have the following structure:
`{filename_prefix}_{name}_{step_number}_{score_name}={abs(score_function_result)}.pth`.
For example, `score_name="val_loss"` and `score_function` that returns `-loss` (as objects with highest scores
will be retained), then saved models filenames will be `model_resnet_10_val_loss=0.1234.pth`.
Examples:
>>> import os
>>> from ignite.engine import Engine, Events
>>> from ignite.handlers import ModelCheckpoint
>>> from torch import nn
>>> trainer = Engine(lambda batch: None)
>>> handler = ModelCheckpoint('/tmp/models', 'myprefix', save_interval=2, n_saved=2, create_dir=True)
>>> model = nn.Linear(3, 3)
>>> trainer.add_event_handler(Events.EPOCH_COMPLETED, handler, {'mymodel': model})
>>> trainer.run([0], max_epochs=6)
>>> os.listdir('/tmp/models')
['myprefix_mymodel_4.pth', 'myprefix_mymodel_6.pth']
"""
The class for time measurement:
timer = Timer(average=True)
The class definition:
class Timer:
""" Timer object can be used to measure (average) time between events.
Args:
average (bool, optional): if True, then when ``.value()`` method is called, the returned value
will be equal to total time measured, divided by the value of internal counter.
Attributes:
total (float): total time elapsed when the Timer was running (in seconds).
        step_count (int): internal counter, useful to measure average time, e.g. of processing a single batch.
Incremented with the ``.step()`` method.
running (bool): flag indicating if timer is measuring time.
Notes:
        When using ``Timer(average=True)`` do not forget to call ``timer.step()`` every time an event occurs. See
the examples below.
Examples:
Measuring total time of the epoch:
>>> from ignite.handlers import Timer
>>> import time
>>> work = lambda : time.sleep(0.1)
>>> idle = lambda : time.sleep(0.1)
>>> t = Timer(average=False)
>>> for _ in range(10):
... work()
... idle()
...
>>> t.value()
2.003073937026784
Measuring average time of the epoch:
>>> t = Timer(average=True)
>>> for _ in range(10):
... work()
... idle()
... t.step()
...
>>> t.value()
0.2003182829997968
Measuring average time it takes to execute a single ``work()`` call:
>>> t = Timer(average=True)
>>> for _ in range(10):
... t.resume()
... work()
... t.pause()
... idle()
... t.step()
...
>>> t.value()
0.10016545779653825
Using the Timer to measure average time it takes to process a single batch of examples:
>>> from ignite.engine import Engine, Events
>>> from ignite.handlers import Timer
>>> trainer = Engine(training_update_function)
>>> timer = Timer(average=True)
>>> timer.attach(trainer,
... start=Events.EPOCH_STARTED,
... resume=Events.ITERATION_STARTED,
... pause=Events.ITERATION_COMPLETED,
... step=Events.ITERATION_COMPLETED)
"""
# automatically adding handlers via a special `attach` method of `RunningAverage` handler
    # this behaves like an exponential moving average of the per-iteration loss
RunningAverage(output_transform=lambda x: x).attach(trainer, 'avg_loss')
Let's look at the definition of RunningAverage and its attach method:
class RunningAverage(Metric):
"""Compute running average of a metric or the output of process function.
Args:
src (Metric or None): input source: an instance of :class:`~ignite.metrics.Metric` or None. The latter
corresponds to `engine.state.output` which holds the output of process function.
alpha (float, optional): running average decay factor, default 0.98
output_transform (callable, optional): a function to use to transform the output if `src` is None and
corresponds the output of process function. Otherwise it should be None.
Examples:
.. code-block:: python
alpha = 0.98
acc_metric = RunningAverage(Accuracy(output_transform=lambda x: [x[1], x[2]]), alpha=alpha)
acc_metric.attach(trainer, 'running_avg_accuracy')
avg_output = RunningAverage(output_transform=lambda x: x[0], alpha=alpha)
avg_output.attach(trainer, 'running_avg_loss')
@trainer.on(Events.ITERATION_COMPLETED)
def log_running_avg_metrics(engine):
print("running avg accuracy:", engine.state.metrics['running_avg_accuracy'])
print("running avg loss:", engine.state.metrics['running_avg_loss'])
"""
    def __init__(self, src=None, alpha=0.98, output_transform=None):  # accepts a Metric instance or None
if not (isinstance(src, Metric) or src is None):
raise TypeError("Argument src should be a Metric or None.")
if not (0.0 < alpha <= 1.0):
raise ValueError("Argument alpha should be a float between 0.0 and 1.0.")
        if isinstance(src, Metric):  # if src is a Metric, output_transform must be None
if output_transform is not None:
raise ValueError("Argument output_transform should be None if src is a Metric.")
self.src = src
self._get_src_value = self._get_metric_value
self.iteration_completed = self._metric_iteration_completed
else:
if output_transform is None:
raise ValueError("Argument output_transform should not be None if src corresponds "
"to the output of process function.")
self._get_src_value = self._get_output_value
self.update = self._output_update
self.alpha = alpha
super(RunningAverage, self).__init__(output_transform=output_transform)
def reset(self):
self._value = None
def update(self, output):
# Implement abstract method
pass
def compute(self):
if self._value is None:
self._value = self._get_src_value()
else:
self._value = self._value * self.alpha + (1.0 - self.alpha) * self._get_src_value()
return self._value
def attach(self, engine, name):
# restart average every epoch
engine.add_event_handler(Events.EPOCH_STARTED, self.started)
# compute metric
engine.add_event_handler(Events.ITERATION_COMPLETED, self.iteration_completed)
# apply running average
engine.add_event_handler(Events.ITERATION_COMPLETED, self.completed, name)
def _get_metric_value(self):
return self.src.compute()
def _get_output_value(self):
return self.src
def _metric_iteration_completed(self, engine):
self.src.started(engine)
self.src.iteration_completed(engine)
def _output_update(self, output):
self.src = output
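The compute() method above is a plain exponential moving average, seeded with the first observed value; a toy trace with made-up numbers:

# Toy trace of the recurrence in compute():
# v_t = alpha * v_{t-1} + (1 - alpha) * x_t, seeded with the first value
alpha = 0.98
v = None
for x in [1.0, 2.0, 3.0]:
    v = x if v is None else alpha * v + (1 - alpha) * x
print(round(v, 4))  # 1.0596 -- with alpha=0.98 the average moves slowly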
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpointer, {'model': model.state_dict(),
'optimizer': optimizer.state_dict()})
The add_event_handler method of the Engine class:
def add_event_handler(self, event_name, handler, *args, **kwargs):
"""Add an event handler to be executed when the specified event is fired.
Args:
event_name: An event to attach the handler to. Valid events are from :class:`~ignite.engine.Events`
or any `event_name` added by :meth:`~ignite.engine.Engine.register_events`.
handler (callable): the callable event handler that should be invoked
*args: optional args to be passed to `handler`.
**kwargs: optional keyword args to be passed to `handler`.
Notes:
The handler function's first argument will be `self`, the :class:`~ignite.engine.Engine` object it
was bound to.
Note that other arguments can be passed to the handler in addition to the `*args` and `**kwargs`
passed here, for example during :attr:`~ignite.engine.Events.EXCEPTION_RAISED`.
Example usage:
.. code-block:: python
engine = Engine(process_function)
def print_epoch(engine):
print("Epoch: {}".format(engine.state.epoch))
engine.add_event_handler(Events.EPOCH_COMPLETED, print_epoch)
"""
if event_name not in self._allowed_events:
self._logger.error("attempt to add event handler to an invalid event %s.", event_name)
raise ValueError("Event {} is not a valid event for this Engine.".format(event_name))
event_args = (Exception(), ) if event_name == Events.EXCEPTION_RAISED else ()
self._check_signature(handler, 'handler', *(event_args + args), **kwargs)
self._event_handlers[event_name].append((handler, args, kwargs))
self._logger.debug("added handler for event %s.", event_name)
The dict {'model': model.state_dict(), 'optimizer': optimizer.state_dict()} above is passed to add_event_handler as an extra positional argument. When EPOCH_COMPLETED fires, the engine invokes the checkpointer's __call__ with the engine itself and that dict, and ModelCheckpoint then serializes each entry to disk.
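From the body of add_event_handler above, handlers are stored as (handler, args, kwargs) tuples and the engine is prepended at call time; a toy mock of that dispatch (a paraphrase, not the ignite source):

# Toy mock of the event dispatch: stored (handler, args, kwargs) tuples are
# invoked with the engine as the first argument.
event_handlers = {'EPOCH_COMPLETED': []}        # stands in for self._event_handlers

def add_event_handler(event_name, handler, *args, **kwargs):
    event_handlers[event_name].append((handler, args, kwargs))

def fire_event(engine, event_name):
    for handler, args, kwargs in event_handlers[event_name]:
        handler(engine, *args, **kwargs)

add_event_handler('EPOCH_COMPLETED',
                  lambda eng, to_save: print('saving:', list(to_save)),
                  {'model': 'model state', 'optimizer': 'optimizer state'})
fire_event('trainer', 'EPOCH_COMPLETED')        # prints: saving: ['model', 'optimizer']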
2. Attach the timer to the trainer
# automatically adding handlers via a special `attach` method of `Timer` handler
timer.attach(trainer, start=Events.EPOCH_STARTED, resume=Events.ITERATION_STARTED,
pause=Events.ITERATION_COMPLETED, step=Events.ITERATION_COMPLETED)
The timer's attach member function:
def attach(self, engine, start=Events.STARTED, pause=Events.COMPLETED, resume=None, step=None):
""" Register callbacks to control the timer.
Args:
engine (Engine):
Engine that this timer will be attached to.
start (Events):
Event which should start (reset) the timer.
pause (Events):
Event which should pause the timer.
resume (Events, optional):
Event which should resume the timer.
step (Events, optional):
Event which should call the `step` method of the counter.
Returns:
self (Timer)
"""
engine.add_event_handler(start, self.reset)
engine.add_event_handler(pause, self.pause)
if resume is not None:
engine.add_event_handler(resume, self.resume)
if step is not None:
engine.add_event_handler(step, self.step)
Logging events:
# adding handlers using `trainer.on` decorator API
@trainer.on(Events.ITERATION_COMPLETED)
def log_training_loss(engine):
iter = (engine.state.iteration - 1) % len(train_loader) + 1
if iter % log_period == 0:
logger.info("Epoch[{}] Iteration[{}/{}] Loss: {:.3f}"
.format(engine.state.epoch, iter, len(train_loader), engine.state.metrics['avg_loss']))
writer.add_scalars("loss", {'train': engine.state.metrics['avg_loss']}, engine.state.iteration)
# adding handlers using `trainer.on` decorator API
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
evaluator.run(train_loader)
metrics = evaluator.state.metrics
mean_iu = metrics['mean_iu']
avg_loss = metrics['loss']
logger.info("Training Results - Epoch: {} Mean IU: {:.3f} Avg Loss: {:.3f}"
.format(engine.state.epoch, mean_iu, avg_loss))
writer.add_scalars("mean_iu", {'train': mean_iu}, engine.state.epoch)
if val_loader is not None:
# adding handlers using `trainer.on` decorator API
@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(engine):
evaluator.run(val_loader)
metrics = evaluator.state.metrics
mean_iu = metrics['mean_iu']
avg_loss = metrics['loss']
logger.info("Validation Results - Epoch: {} Mean IU: {:.3f} Avg Loss: {:.3f}"
.format(engine.state.epoch, mean_iu, avg_loss)
)
writer.add_scalars("loss", {'validation': avg_loss}, engine.state.iteration)
writer.add_scalars("mean_iu", {'validation': mean_iu}, engine.state.epoch)
# adding handlers using `trainer.on` decorator API
@trainer.on(Events.EPOCH_COMPLETED)
def print_times(engine):
logger.info('Epoch {} done. Time per batch: {:.3f}[s] Speed: {:.1f}[samples/s]'
.format(engine.state.epoch, timer.value() * timer.step_count,
train_loader.batch_size / timer.value()))
timer.reset()
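Note on the arithmetic: the timer was attached with average=True and step() fires on every ITERATION_COMPLETED, so timer.value() is the average seconds per iteration. Multiplying by timer.step_count therefore gives the total time of the epoch (despite the "Time per batch" wording in the message), while train_loader.batch_size / timer.value() is the throughput in samples per second.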
@trainer.on(Events.EPOCH_COMPLETED)
def plot_output(engine):
model.eval()
dataset = val_loader.dataset
idx = np.random.choice(np.arange(len(dataset)), size=1).item()
val_x, val_y = dataset[idx]
val_x = val_x.to(device)
with torch.no_grad():
pred_y = model(val_x.unsqueeze(0))
orig_img, val_y = untransform(val_x.cpu().data, val_y)
pred_y = pred_y.max(1)[1].cpu().data[0].numpy()
pred_val = cm[pred_y]
seg_val = cm[val_y]
# matplotlib
fig = plt.figure(figsize=(9, 3))
plt.subplot(131)
plt.imshow(orig_img)
plt.axis("off")
plt.subplot(132)
plt.imshow(seg_val)
plt.axis("off")
plt.subplot(133)
plt.imshow(pred_val)
plt.axis("off")
writer.add_figure('show_result', fig, engine.state.iteration)
trainer.run(train_loader, max_epochs=epochs)
writer.close()