yumaomi

语义分割系列26-VIT+SETR——Transformer结构如何在语义分割中大放异彩

SETR：《Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspectivewith Transformers》

重新思考语义分割范式，使用Transformer实现语义分割。

论文链接：SETR

VIT：《An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale》

首次提出使用Transformer进行图像分类。

至于为什么要在介绍SETR的同时介绍Vision Transformer，因为SETR本质上是使用VIT做backbone提取特征，使用传统的decoder来实现语义分割任务。

本文将介绍以下内容：

Vision Transformer如何将Transformer应用于图像、为什么Vision Transformer能在图像领域中大放异彩？
Self-Attention 和 Multi-Head Self-Attention是如何实现的，为什么MSA效果会好？
SETR如何结合Vision Transformer来实现语义分割。
Vision Transformer代码实现、SETR代码实现。

论文部分

引文

Vision Transformer

SETR

Camvid数据集测试

SETR model

Dataset

Train

Result

Inference

论文部分

引文

Transformer模型早在2017年就已提出，用于机器翻译等任务。在当时也以《Attention is all you need》一文惊艳了大众，至那以后，Transformer与Attention在各个任务中所向披靡。Transformer和Attention机制的成功，也启发了语义分割领域的工作研究。陆续提出了基于Spatial Attention机制的Non-Local机制，基于Channel Attention和Spatial Attention机制的DANet，以及后续许多简化Attention繁杂计算的CrissCross Attention、Interlaced Sparse Self-Attention、EM Attention等等。但这些机制还是依赖于传统的CNN结构，还未有纯Transformer结构应用于图像领域。

直到Vision Transformer的出现，An Image is Worth 16x16 Words，VIT的出现将纯Transformer结构引入到图像分类中，将图像分块、嵌入以后使用Transformer进行计算，通过MLP来实现分类，并在ImageNet中取得优秀的效果。

VIT的出现启发了语义分割领域，Transformer这种基于Attention的机制，比起CNN需要使用卷积来提升感受野的操作，Attention无疑更加优秀。在任意层，Transformer就能实现全局的感受野，建立全局依赖。而且，CNN网络往往需要将原始图像的分辨率采用到8倍甚至32倍，这样就会损失一些信息，而Transformer无需进行下采样就能实现特征提取，保留了图像的更多信息。

因此，SETR采取了VIT作为语义分割encoder-decoder结构中的encoder结构，作为编码器来提取图像特征。SETR在论文提交当天在ADE20K任务中获得了第一名的成绩，这证明SETR在语义分割任务中确实能够获得十分优秀的效果。

Vision Transformer

语义分割系列26-VIT+SETR——Transformer结构如何在语义分割中大放异彩_第1张图片

图1 Vision Transformer model

由于Transformer结构最先被设计用于机器翻译，而文本中的数据类型为序列结构。所以，当Transformer用于图像时，图像也需要被处理成一个序列。也就是图中的Linear Projection of Flattened Patches，将一张图像分成9个patch。为了保留图像patch的顺序，需要对每一个patch标上序号，也就是Position Embedding，VIT中，将这个Position信息加到每一个图像patch中。这样，每一个图像patch中就包含了他的位置信息。

经过嵌入的image patched，输入到Transformer Encoder（右图）结构中，首先进行Layer Norm，再输入到Multi-Head Self-Attention（MSA）中进行计算注意力，经过残差模块，再输入到Layer Norm、MLP、计算残差，这样经过L个Transformer结构得到最终输出。最后，将这个输出结果输入到一个MLP中，得到预测结果。

其中，MSA是是Transformer的最重要的一环，Transformer为何如此有效主要归功于MSA结构。MSA顾名思义，由多个Self-Attention结构堆叠，称为多头-自注意力。

对于Self-Attention

语义分割系列26-VIT+SETR——Transformer结构如何在语义分割中大放异彩_第2张图片

图2 Attention机制和MSA机制

$\large Attention = Softmax(\frac{Q\odot K^T}{\sqrt{d_k}})\odot V$

而对于MSA，可以理解为多个计算多个Self-Attention的结果，将结果进行拼接，得到MSA的输出。

当然，这里可能会有一个问题，为什么需要计算多个Self Attention？

因为，在计算Self Attention时，数据维度导致了计算量过大，同时在高纬度空间内学习特征也比较困难。而MSA在计算Self Attention时，需要对数据进行一个降维（或者叫“截断”），这样只选到了原始数据中的一部分，计算出来的Attention自然不够全面。因此，计算多个经过降维（截断）的Self Attention，每一个attention计算过程并行，且参数不共享，保证计算结果的全面性和效率。同时，这里也有一个优势，多个Self Attention计算时，能够在不同的特征子空间内计算其Attention，使结果更加丰富，模型对数据特征的理解也更加深入。

import torch
from torch import nn

from einops import rearrange, repeat
from einops.layers.torch import Rearrange

# Multi-Head Self-Attention
class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, dropout = 0.):
        super().__init__()
        inner_dim = dim_head *  heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim = -1)
        self.dropout = nn.Dropout(dropout)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        qkv = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)
        attn = self.dropout(attn)

        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

if __name__ == "__main__":
    MSA = Attention(dim=256, heads = 8, dim_head = 64, dropout = 0.).cpu()
    # 1 is batch, 1024 is data, 256 is channels
    img = torch.randn(1, 1024, 256).cpu()
    preds = MSA(img)
    print(preds.shape)

下面也给出VIT作为语义分割encoder部分的代码实现，输出为四个 [batch, channels, h, w]。前三个，为中间的Transformer层输出，用于设计辅助损失。最后一个为最终输出，用于输出分割结果。

import torch
from torch import nn

from einops import rearrange, repeat
from einops.layers.torch import Rearrange

# helpers
def pair(t):
    return t if isinstance(t, tuple) else (t, t)

# classes 
class PreNorm(nn.Module):
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn
    def forward(self, x, **kwargs):
        return self.fn(self.norm(x), **kwargs)

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout = 0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout)
        )
    def forward(self, x):
        return self.net(x)

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, dropout = 0.):
        super().__init__()
        inner_dim = dim_head *  heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim = -1)
        self.dropout = nn.Dropout(dropout)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        qkv = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)
        attn = self.dropout(attn)

        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

class Transformer(nn.Module):
    def __init__(self, dim, depth, heads, dim_head, mlp_dim, dropout = 0., out_indices = (9, 14, 19, 23)):
        super().__init__()
        self.out_indices = out_indices
        assert self.out_indices[-1] == depth - 1

        self.layers = nn.ModuleList([])
        
        for _ in range(depth):
            self.layers.append(nn.ModuleList([
                PreNorm(dim, Attention(dim, heads = heads, dim_head = dim_head, dropout = dropout)),
                PreNorm(dim, FeedForward(dim, mlp_dim, dropout = dropout))
            ]))

    def forward(self, x):
        out = []
        for index, (attn, ff) in enumerate(self.layers):
            x = attn(x) + x
            x = ff(x) + x

            if index in self.out_indices:
                out.append(x)

        return out

class ViT(nn.Module):
    def __init__(self, *, image_size, patch_size, dim, depth, heads, mlp_dim, channels = 3, dim_head = 64, dropout = 0., emb_dropout = 0., out_indices = (9, 14, 19, 23)):
        super().__init__()
        image_height, image_width = pair(image_size)
        patch_height, patch_width = pair(patch_size)

        assert image_height % patch_height == 0 and image_width % patch_width == 0, 'Image dimensions must be divisible by the patch size.'

        num_patches = (image_height // patch_height) * (image_width // patch_width)
        patch_dim = channels * patch_height * patch_width

        self.to_patch_embedding = nn.Sequential(
            Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = patch_height, p2 = patch_width),
            nn.Linear(patch_dim, dim),
        )

        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.dropout = nn.Dropout(emb_dropout)

        self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, dropout, out_indices=out_indices)

        self.out = Rearrange("b (h w) c->b c h w", h=image_height//patch_height, w=image_width//patch_width)

    def forward(self, img):
        x = self.to_patch_embedding(img)
        b, n, _ = x.shape
        cls_tokens = repeat(self.cls_token, '1 1 d -> b 1 d', b = b)
        x = torch.cat((cls_tokens, x), dim=1)
        x += self.pos_embedding[:, :(n + 1)]
        x = self.dropout(x)

        out = self.transformer(x)

        for index, transformer_out in enumerate(out):
            # delete cls_tokens and transform output to [b, c, h, w]
            out[index] = self.out(transformer_out[:,1:,:])

        return out


import torch
if __name__ == "__main__":
    v = ViT(image_size = (256, 256), patch_size = 256//16, dim = 1024, depth = 24, heads = 16, mlp_dim = 2048, dropout = 0.1, emb_dropout = 0.1, out_indices = (9, 14, 19, 23)).cpu()
    img = torch.randn(1, 3, 256, 256).cpu()
    preds = v(img)
    for output in preds:
        print(output.size())

    """
    output:
        transformer layer 9:     torch.Size([1, 1024, 16, 16])  for aux loss
        transformer layer 14:    torch.Size([1, 1024, 16, 16])  for aux loss
        transformer layer 19:    torch.Size([1, 1024, 16, 16])  for aux loss
        transformer layer 23:    torch.Size([1, 1024, 16, 16])  for segmentation
    """

SETR

回到本文主题，VIT的成功启发了SETR。

图3 SETR model

在这里（图3），a中采用了VIT结构。假设一张图像x（H×W×3）大小，如果直接输入到VIT中，序列化成H×W×3，这对于Transformer的二次复杂度而言，运算量可能过大，因此，作者在这里做了一个下采样操作，将图像x映射成（H/16×W/16×3），这样就可以得到H×W×3/256的序列大小。将序列嵌入并编码位置后，得到最终的输入E = {e1 + p1, e2 + p2, · · · , eL + pL}，其中e是embedding，p是positon information，L为序列长度。

如上文提到，Vit的输出为 [1, 1024, 16, 16] 大小，也就是[batch, channels, h, w]，传统的语义分割的encoder输出模型。在这里，同样需要对结果进行上采样。

作者设计了三种上采样模式来完成，分别为：

Naive mode：直接上采样16倍，16->256。
Progressive UPsampling（PUP）通过卷积->2倍上采样->卷积->2倍上采样...的逐步上采样模式（图3 b）。
Multi-Level feature Aggregation (MLA)获取Transformer中间层结果，聚合后4倍上采样。

作者也在ADE20K、Pascal VOC等数据集上测试了几个上采样模块的结果。PUP和MLA结果会比Naive更好，同时，两者之间差距并不是很大。

在各个数据集中，SETR也获得比FCN baseline更好的效果。这说明SETR能够较好完成语义分割任务。

作者也可视化了中间层的中间结果，可以看到Transformer结构确实能够学到图像中的一些信息。

注：SETR模型在Camvid测试中给出。

Camvid数据集测试

SETR model

import torch
from torch import nn

from einops import rearrange, repeat
from einops.layers.torch import Rearrange

# helpers
def pair(t):
    return t if isinstance(t, tuple) else (t, t)

# classes 
class PreNorm(nn.Module):
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn
    def forward(self, x, **kwargs):
        return self.fn(self.norm(x), **kwargs)

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout = 0.):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout)
        )
    def forward(self, x):
        return self.net(x)

class Attention(nn.Module):
    def __init__(self, dim, heads = 8, dim_head = 64, dropout = 0.):
        super().__init__()
        inner_dim = dim_head *  heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head ** -0.5

        self.attend = nn.Softmax(dim = -1)
        self.dropout = nn.Dropout(dropout)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)

        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        qkv = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)
        attn = self.dropout(attn)

        out = torch.matmul(attn, v)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

class Transformer(nn.Module):
    def __init__(self, dim, depth, heads, dim_head, mlp_dim, dropout = 0., out_indices = (9, 14, 19, 23)):
        super().__init__()
        self.out_indices = out_indices
        assert self.out_indices[-1] == depth - 1

        self.layers = nn.ModuleList([])
        
        for _ in range(depth):
            self.layers.append(nn.ModuleList([
                PreNorm(dim, Attention(dim, heads = heads, dim_head = dim_head, dropout = dropout)),
                PreNorm(dim, FeedForward(dim, mlp_dim, dropout = dropout))
            ]))

    def forward(self, x):
        out = []
        for index, (attn, ff) in enumerate(self.layers):
            x = attn(x) + x
            x = ff(x) + x

            if index in self.out_indices:
                out.append(x)

        return out

class ViT(nn.Module):
    def __init__(self, *, image_size, patch_size, dim, depth, heads, mlp_dim, channels = 3, dim_head = 64, dropout = 0., emb_dropout = 0., out_indices = (9, 14, 19, 23)):
        super().__init__()
        image_height, image_width = pair(image_size)
        patch_height, patch_width = pair(patch_size)

        assert image_height % patch_height == 0 and image_width % patch_width == 0, 'Image dimensions must be divisible by the patch size.'

        num_patches = (image_height // patch_height) * (image_width // patch_width)
        patch_dim = channels * patch_height * patch_width

        self.to_patch_embedding = nn.Sequential(
            Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = patch_height, p2 = patch_width),
            nn.Linear(patch_dim, dim),
        )

        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.dropout = nn.Dropout(emb_dropout)

        self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, dropout, out_indices=out_indices)

        self.out = Rearrange("b (h w) c->b c h w", h=image_height//patch_height, w=image_width//patch_width)

    def forward(self, img):
        x = self.to_patch_embedding(img)
        b, n, _ = x.shape
        cls_tokens = repeat(self.cls_token, '1 1 d -> b 1 d', b = b)
        x = torch.cat((cls_tokens, x), dim=1)
        x += self.pos_embedding[:, :(n + 1)]
        x = self.dropout(x)

        out = self.transformer(x)

        for index, transformer_out in enumerate(out):
            # delete cls_tokens and transform output to [b, c, h, w]
            out[index] = self.out(transformer_out[:,1:,:])

        return out


class PUPHead(nn.Module):
    def __init__(self, num_classes):
        super(PUPHead, self).__init__()
        
        self.UP_stage_1 = nn.Sequential(
            nn.Conv2d(1024, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        )        
        self.UP_stage_2 = nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        )        
        self.UP_stage_3= nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        )        
        self.UP_stage_4= nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        )
    
        self.cls_seg = nn.Conv2d(256, num_classes, 3, padding=1)

    def forward(self, x):
        x = self.UP_stage_1(x)
        x = self.UP_stage_2(x)
        x = self.UP_stage_3(x)
        x = self.UP_stage_4(x)
        x = self.cls_seg(x)
        return x

class SETR(nn.Module):
    def __init__(self, num_classes, image_size, patch_size, dim, depth, heads, mlp_dim, channels = 3, dim_head = 64, dropout = 0., emb_dropout = 0., out_indices = (9, 14, 19, 23)):
        super(SETR, self).__init__()
        self.out_indices = out_indices
        self.num_classes = num_classes
        self.VIT = ViT( image_size=image_size, patch_size=patch_size, dim=dim, depth=depth, heads=heads, mlp_dim=mlp_dim, 
                        channels = channels, dim_head = dim_head, dropout = dropout, emb_dropout = emb_dropout, out_indices = out_indices)

        
        self.Head = nn.ModuleDict()

        for index, indices in enumerate(self.out_indices):
            self.Head["Head"+str(indices)] = PUPHead(num_classes)
        
    def forward(self, x):
        VIT_OUT = self.VIT(x)

        out = []
        for index, indices in enumerate(self.out_indices):
            # 最后一个是最后层的输出
            out.append(self.Head["Head"+str(indices)](VIT_OUT[index]))
        return out


if __name__ == "__main__":
    # VIT-Large  设置了16个patch
    SETRNet = SETR(num_classes=3, image_size=256, patch_size=256//16, dim=1024, depth = 24, heads = 16, mlp_dim = 2048, out_indices = (9, 14, 19, 23)).cpu()
    img = torch.randn(1, 3, 256, 256).cpu()
    preds = SETRNet(img)
    for output in preds:
        print("output: ",output.size())

Dataset

# 导入库
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import Dataset, DataLoader, random_split
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2
 
torch.manual_seed(17)
# 自定义数据集CamVidDataset
class CamVidDataset(torch.utils.data.Dataset):
    def __init__(self, images_dir, masks_dir):
        self.transform = A.Compose([
            A.Resize(256, 256),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ]) 
        self.ids = os.listdir(images_dir)
        self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
        self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]
 
    
    def __getitem__(self, i):
        # read data
        image = np.array(Image.open(self.images_fps[i]).convert('RGB'))
        mask = np.array( Image.open(self.masks_fps[i]).convert('RGB'))
        image = self.transform(image=image,mask=mask)
        
        return image['image'], image['mask'][:,:,0]
        
    def __len__(self):
        return len(self.ids)
    
    
# 设置数据集路径
DATA_DIR = r'../database/camvid/camvid/' # 根据自己的路径来设置
x_train_dir = os.path.join(DATA_DIR, 'train_images')
y_train_dir = os.path.join(DATA_DIR, 'train_labels')
x_valid_dir = os.path.join(DATA_DIR, 'valid_images')
y_valid_dir = os.path.join(DATA_DIR, 'valid_labels')
    
train_dataset = CamVidDataset(
    x_train_dir, 
    y_train_dir, 
)
val_dataset = CamVidDataset(
    x_valid_dir, 
    y_valid_dir, 
)
 
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=True,drop_last=True)

Train

# 在模型训练前，建议先加载vit的权重
model = SETR(num_classes=32, image_size=256, patch_size=256//16, dim=1024, depth = 24, heads = 16, mlp_dim = 2048).cuda()


from d2l import torch as d2l
from tqdm import tqdm
import pandas as pd
import monai
from torchcontrib.optim import SWA
# training loop 100 epochs
epochs_num = 100
# 选用SGD优化器来训练
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
schedule = monai.optimizers.LinearLR(optimizer, end_lr=0.05, num_iter=int(epochs_num*0.75))
# 使用SWA优化 来提升SGD的效果
steps_per_epoch = int(len(train_loader.dataset) / train_loader.batch_size)
swa_start = int(epochs_num*0.75)
optimizer = SWA(optimizer, swa_start=swa_start*steps_per_epoch, swa_freq=steps_per_epoch, swa_lr=0.05)

# 损失函数选用多分类交叉熵损失函数
lossf = nn.CrossEntropyLoss(ignore_index=255)


def evaluate_accuracy_gpu(net, data_iter, device=None):
    if isinstance(net, nn.Module):
        net.eval()  # Set the model to evaluation mode
        if not device:
            device = next(iter(net.parameters())).device
    # No. of correct predictions, no. of predictions
    metric = d2l.Accumulator(2)

    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                # Required for BERT Fine-tuning (to be covered later)
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            output = net(X)
            pred = output[-1]
            metric.add(d2l.accuracy(pred, y), d2l.size(y))
    return metric[0] / metric[1]


# 训练函数
def train_ch13(net, train_iter, test_iter, loss, optimizer, num_epochs, schedule, swa_start=swa_start, devices=d2l.try_all_gpus()):
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1], legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    # 用来保存一些训练参数

    loss_list = []
    train_acc_list = []
    test_acc_list = []
    epochs_list = []
    time_list = []
    lr_list = []
    

    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for i, (X, labels) in enumerate(train_iter):
            timer.start()

            if isinstance(X, list):
                X = [x.to(devices[0]) for x in X]
            else:
                X = X.to(devices[0])
            gt = labels.long().to(devices[0])

            net.train()
            optimizer.zero_grad()
            result = net(X)
            
            pred = result[-1]
            seg_loss = loss(result[-1], gt)

            aux_loss_1 = loss(result[0], gt)
            aux_loss_2 = loss(result[1], gt)
            aux_loss_3 = loss(result[2], gt)

            loss_sum = seg_loss + 0.2*aux_loss_1 + 0.3*aux_loss_2 + 0.4*aux_loss_3

            l = loss_sum
            loss_sum.sum().backward()
            optimizer.step()

            acc = d2l.accuracy(pred, gt)
            metric.add(l, acc, labels.shape[0], labels.numel())

            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,(metric[0] / metric[2], metric[1] / metric[3], None))
                
        if optimizer.state_dict()['param_groups'][0]['lr']>0.05:
            schedule.step()

        test_acc = evaluate_accuracy_gpu(net, test_iter)
        
        if (epoch + 1) >= swa_start:
            if epoch == 0 or epoch % 5 == 5 - 1 or epoch == num_epochs - 1:
                # Batchnorm update
                optimizer._reset_lr_to_swa()
                optimizer.swap_swa_sgd()
                optimizer.bn_update(train_iter, net, device='cuda')
                test_acc = evaluate_accuracy_gpu(net, test_iter)
                optimizer.swap_swa_sgd()
        
        animator.add(epoch + 1, (None, None, test_acc))

        print(f"epoch {epoch+1}/{epochs_num} --- loss {metric[0] / metric[2]:.3f} --- train acc {metric[1] / metric[3]:.3f} --- test acc {test_acc:.3f} --- lr {optimizer.state_dict()['param_groups'][0]['lr']} --- cost time {timer.sum()}")
        
        #---------保存训练数据---------------
        df = pd.DataFrame()
        loss_list.append(metric[0] / metric[2])
        train_acc_list.append(metric[1] / metric[3])
        test_acc_list.append(test_acc)
        epochs_list.append(epoch+1)
        time_list.append(timer.sum())
        lr_list.append(optimizer.state_dict()['param_groups'][0]['lr'])
        
        df['epoch'] = epochs_list
        df['loss'] = loss_list
        df['train_acc'] = train_acc_list
        df['test_acc'] = test_acc_list
        df["lr"] = lr_list
        df['time'] = time_list
        
        df.to_excel("../blork_file/savefile/test.xlsx")
        #----------------保存模型------------------- 
        if np.mod(epoch+1, 5) == 0:
            torch.save(net, f'../blork_file/checkpoints/test{epoch+1}.pth')

    # 保存下最后的model
    torch.save(net, f'../blork_file/checkpoints/test.pth')


train_ch13(model, train_loader, val_loader, lossf, optimizer, epochs_num, schedule=schedule)

Result

Transformer比较难训练，作者这里只训练了一半。

Inference

这里还做了一点Inference的实现，发现效果其实比较一般，损失函数选择mIoU可能会比交叉熵会好一点，或者将一部分类别删掉简化一下也会好一点。

import torch
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import torch.nn as nn

# 截取模型
# 读取老模型
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# 剪枝掉多余的decoder
class prune_model(nn.Module):
    def __init__(self, encode, decode):
        super(prune_model, self).__init__()
        self.encode = encode
        self.decode = decode

    def forward(self, x):
        x = self.encode(x)
        x = self.decode(x[-1])
        return x


model = torch.load(r"../checkpoints/test_last.pth")
# 创建新模型
new_model = prune_model(model.VIT, model.Head.Head23).to(device)
model = new_model#.cpu()

NCLASSES = 32
Cam_COLORMAP = [[128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128], [0, 32, 128],[0, 16, 128],[0, 64, 64],[0, 64, 32],
                [0, 64, 16],[64, 64, 128],[0, 32, 16],[32,32,32],[16,16,16],[32,16,128],
                [192,16,16],[32,32,196],
               ]

#32类
Cam_CLASSES = ['Animal','Archway','Bicyclist','Bridge','Building','Car','CartLuggagePram','Child',
               'Column_Pole','Fence','LaneMkgsDriv','LaneMkgsNonDriv','Misc_Text','MotorcycleScooter',
               'OtherMoving','ParkingBlock','Pedestrian','Road','RoadShoulder','Sidewalk','SignSymbol',
               'Sky', 'SUVPickupTruck','TrafficCone','TrafficLight', 'Train','Tree','Truck_Bus', 'Tunnel',
               'VegetationMisc', 'Void','Wall']
               
assert len(Cam_COLORMAP) == len(Cam_CLASSES) == 32


image = Image.open("../database/camvid/camvid/train_images/0001TP_008100.png").convert("RGB")
image = image.resize((256, 256))
temp = image

image = np.array(image)

Transform = A.Compose([
    A.Resize(256, 256),
    A.HorizontalFlip(),
    A.Normalize(),
    ToTensorV2(),
]) 

image = Transform(image=image)
image = image['image'].view(1, 3, 256, 256)

image = image.cuda()

preds = model(image)
out = np.array(preds.argmax(1).view(1, 256, 256).permute(1,2,0).view(256,256).cpu().detach().numpy())

seg_img = np.zeros((256, 256, 3))

colors = Cam_COLORMAP
colors[0][0] = 0
colors[0][1] = 0
colors[0][2] = 0

for c in range(NCLASSES):
    seg_img[:,:,0] += ((out[:,:] == c )*( colors[c][0] )).astype('uint8')
    seg_img[:,:,1] += ((out[:,:] == c )*( colors[c][1] )).astype('uint8')
    seg_img[:,:,2] += ((out[:,:] == c )*( colors[c][2] )).astype('uint8')

seg_img = Image.fromarray(np.uint8(seg_img))
image = Image.blend(temp,seg_img,0.5)
image = image.resize((960, 720), Image.BILINEAR)

plt.figure(figsize=(10, 8))
plt.imshow(image)
plt.show()

结果比较抽象，不过有大部分确实是预测到了而已。

本文总结

本文介绍了Vision Transformer和SETR在语义分割中的应用。
在介绍Vision Transformer时，还顺带讲解了Multi-Head Self-Attention的实现原理，以及为何MSA会比Self-Attention有效果。
对于SETR，本质是VIT+Decoder结构，其中Decoder主要由三种设计：Naive PUP MLA，本文只实现了PUP结构。
基于SETR在Camvid数据集上完成了一部分训练和推理（虽然我们的推理结果比较一般，这Transformer模型也确实比较难以训练）。但作者确实是在总多数据集上得到了比较好的一个效果。说明Transformer结构应用在语义分割领域是可行的。

你可能感兴趣的:(语义分割,pytorch,计算机视觉,transformer,人工智能,深度学习)

【论文阅读】实时全能分割模型万里守约论文阅读论文阅读图像分割图像处理计算机视觉
文章目录导言1、论文简介2、论文主要方法3、论文针对的问题4、论文创新点总结导言在最近的计算机视觉领域，针对实时多任务分割的需求日益增长，特别是在交互式分割、全景分割和视频实例分割等多种应用场景中。为了解决这些挑战，本文介绍了一种新方法——RMP-SAM（Real-TimeMulti-PurposeSegmentAnything），旨在实现实时的多功能分割。RMP-SAM结合了动态卷积与高效的模型
震惊！ “深度学习”都在学习什么扉间798 深度学习学习人工智能
常见的机器学习分类算法俗话说三个臭皮匠胜过诸葛亮这里面集成学习就是将单一的算法弱弱结合算法融合用投票给特征值加权重AdaBoost集成学习算法通过迭代训练一系列弱分类器，给予分类错误样本更高权重，使得后续弱分类器更关注这些样本，然后将这些弱分类器线性组合成强分类器，提高整体分类性能。（一）投票机制投票是一种直观且常用的算法融合策略。在多分类问题中，假设有多个分类器对同一数据进行分类判断。每个分类器
NLP高频面试题（十）——目前常见的几种大模型架构是啥样的 Chaos_Wang_ NLP常见面试题自然语言处理架构人工智能
深入浅出：目前常见的几种大模型架构解析随着Transformer模型的提出与发展，语言大模型迅速崛起，已经成为人工智能领域最为关注的热点之一。本文将为大家详细解析几种目前常见的大模型架构，帮助读者理解其核心差异及适用场景。1.什么是LLM（大语言模型）？LLM通常指参数量巨大、能够捕捉丰富语义信息的Transformer模型，它们通过海量的文本数据训练而成，能够实现高度逼真的文本生成、复杂的语言理
深度学习 | pytorch + torchvision + python 版本对应及环境安装 zfgfdgbhs 深度学习 python pytorch
目录一、版本对应二、安装命令（pip）1.版本（1）v2.5.1~v2.0.0（2）v1.13.1~v1.11.0（3）v1.10.1~v1.7.02.安装全过程（1）选择版本（2）安装结果参考文章一、版本对应下表来自pytorch的github官方文档：pytorch/vision:Datasets,TransformsandModelsspecifictoComputerVisionpytor
机器学习 Day01人工智能概述山北雨夜漫步机器学习人工智能
1.什么样的程序适合在gpu上运行计算密集型的程序：此类程序主要运算集中在寄存器，寄存器读写速度快，而GPU拥有强大的计算能力，能高效处理大量的寄存器运算，因此适合在GPU上运行。像科学计算中的数值模拟、密码破解等场景的程序，都属于计算密集型，在GPU上运行可大幅提升运算速度。易于并行的程序：GPU采用SIMD架构，有众多核心，同一时间每个核心适合做相同的事。易于并行的程序能充分利用GPU这一特性
《今日AI-人工智能-编程日报》-源自2025年3月20日小亦编辑部每日AI-人工智能-编程日报人工智能大数据
一、AI行业动态英伟达新一代AI芯片Rubin发布计划英伟达宣布其新一代AI芯片Rubin将于2026年下半年推出，下下一代AI芯片架构命名为Feynman，计划于2028年登场。同时，英伟达还推出了RTXPRO6000系列Blackwell专业卡，拥有24064核心、96GB显存和最高600W功耗。OpenAI星际之门数据中心建设进展OpenAI的首个数据中心“星际之门”预计于2026年中在德克
PyTorch核心基础知识点 niuTaylor 编程区 pytorch 人工智能 python
PyTorch核心基础知识点，结合最新特性与工业级实践，按优先级和逻辑关系分层解析：▍核心基石：张量编程（TensorProgramming）1.张量创建（8种生产级初始化）#设备自动选择（2024最佳实践）device="cuda"iftorch.cuda.is_available()else"mps"iftorch.backends.mps.is_available()else"cpu"#关键
一文讲清楚深度学习和机器学习平凡而伟大. 机器学习人工智能深度学习机器学习人工智能
目录1.定义机器学习（MachineLearning,ML）深度学习（DeepLearning,DL）2.工作原理机器学习深度学习3.应用场景机器学习深度学习4.主要区别5.为什么选择深度学习？6.总结深度学习和机器学习是人工智能（AI）领域中两个密切相关但有所区别的概念。要清楚地解释它们之间的关系，我们可以从定义、工作原理、应用场景以及两者的主要区别等方面进行探讨。1.定义机器学习（Machin
AIOps：解决企业IT挑战的智能利器雅菲奥朗认证培训 AIOps SRE 可观测性
前言：在当今数字化的时代，企业IT基础设施和应用程序规模不断扩大，面临着日益复杂的挑战。在这种情况下，AIOps人工智能运维成为解决企业IT运维困境的智能利器。AIOps与可观测性密切相关，可观测性是实现AIOps的基础。通过收集、监视和理解系统数据，AIOps能够自动化运维任务、实时监控系统状态、预测潜在问题，从而提高效率和稳定性。AIOps尤其适用于IT运维部门，这是一个迫切需要此类技术的群体
使用AIOps进行更好的事件管理茵赛飞3D CAD数据转换软件 pagerduty devops 人工智能运维
DevOps为科技界带来了更加协作和高效的工作流程。随着AIOps的集成，自动化更进一步，使用人工智能为团队提供更快的根本原因分析和算法降噪。主要从采用AIOps中受益的主要领域之一是事件管理。AIOps可以帮助DevOps团队自动化工作流程，以实现更智能、更高效的事件管理，从而腾出时间让IT运营团队成员专注于创新以改善用户体验。在本文中，我们将了解AIOps如何从检测和识别到响应改进事件管理，以
AI大模型编程能力对比：Deepseek&Claude&Gemini 黑夜路人（heiyeluren） AI人工智能人工智能 ai AIGC 语言模型
在当今快速发展的技术领域，人工智能（AI）模型在编程和数据处理方面的应用越来越广泛。不同的AI模型因其独特的设计理念和技术优势，适用于不同的编程任务和场景。本文将对三种主流的AI模型——DeepSeekv3、GeminiFlash2.0和Claude3.5Sonnet的编程能力进行详细对比，帮助读者根据具体需求选择最合适的工具。同时对DeepSeekv3、GeminiFlash2.0和Claude
DeepSeek：智能搜索与分析的新纪元 XRC2231 学习
在人工智能浪潮席卷全球的今天，DeepSeek如同一颗璀璨的新星，以其独特的魅力和强大的功能，在AI领域脱颖而出。DeepSeek，这一基于深度学习和数据挖掘技术的智能搜索与分析系统，不仅重新定义了搜索引擎的边界，更以其卓越的性能和广泛的应用场景，为全球用户带来了前所未有的智能体验。本文将从DeepSeek的定义、特点、应用场景、优势等方面进行全面而深入的介绍，带您领略这一新兴技术的独特魅力。一、
哈尔滨工业大学DeepSeek公开课人工智能：大模型原理技术与应用-从GPT到DeepSeek｜附视频下载方法你觉得205 人工智能机器学习大数据 ai 知识图谱 python 运维
导读INTRODUCTION今天继续哈尔滨工业大学车万翔教授带来了一场主题为“DeepSeek技术前沿与应用”的报告。本报告深入探讨了大语言模型在自然语言处理（NLP）领域的核心地位及其发展历程，从基础概念出发，延伸至语言模型在机器翻译、拼音输入法、语音识别等任务中的关键作用。强调了语言模型不仅辅助其他NLP任务，本身也蕴含大量知识，如地理信息、语义理解和推理能力。随着技术的发展，尤其是trans
Linux部署模型报错OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_mod dkgee linux pytorch 运维
报错内容：OSError:Errornofilenamedpytorch_model.bin,tf_model.h5,model.ckpt.indexorflax_model.msgpackfoundindirectory主要原因是transformer版本不对，需要升级pipinstall--upgradehuggingface_hubpipinstalltransformers[torch]其
大模型学习终极指南：从新手到专家的必经之路，全网最详尽解析，你敢挑战吗？大模型入门教程学习人工智能 AI 大模型大模型学习大模型教程 AI大模型
随着人工智能技术的飞速发展，大模型（Large-ScaleModels）已经成为推动自然语言处理（NLP）、计算机视觉（CV）等领域进步的关键因素。本文将为您详细介绍从零开始学习大模型直至成为专家的全过程，包括所需掌握的知识点、学习资源以及实践建议等。无论您是初学者还是有一定基础的专业人士，都能从中获得有价值的指导。一、基础知识准备在开始学习大模型之前，需要先掌握一些基础知识，这些知识将为后续的学
编程内容简述！恶霸不委屈开发语言青少年编程汇编 java python
编程是指通过计算机语言来开发软件、程序和应用的过程，通常通过编写一系列的指令，来让计算机完成特定的任务。编程可以涉及多个领域和技术，以下是一些主要的编程内容：1.编程语言编程语言是程序员与计算机进行沟通的桥梁，不同的编程语言适用于不同的任务。常见的编程语言有：Python：简单易学，适用于数据分析、人工智能、网页开发等。JavaScript：网页开发中不可或缺的语言，用于动态网页和前端开发。Jav
大模型Agent 和 RAG 的关系大数据追光猿大模型语言模型人工智能学习方法 transformer
Agent和RAG（Retrieval-AugmentedGeneration）是两种在自然语言处理（NLP）和人工智能领域中广泛使用的技术，它们在功能、目标和实现方式上既有区别又有联系。以下是它们的关系及其协同作用的详细分析。1.Agent和RAG的定义（1）Agent定义：Agent是一种智能体，能够感知环境并采取行动以完成特定任务。在NLP领域，Agent通常指一个基于大语言模型（LLM）的
国产模型能否挑战 GPT-4？一文拆解 DeepSeek-V3 架构与实战应用 AI筑梦师人工智能学习框架架构深度学习 python agi 人工智能 tensorflow
✳️一、引言✅1.1DeepSeek-V3发布背景与定位随着大模型技术的快速演进，从GPT-3到GPT-4，全球在通用人工智能方向取得了长足进展。但与此同时，开源社区始终缺乏一个真正兼顾性能、效率、中文能力和实用性的高质量大模型。DeepSeek-V3的推出正是在这个背景下的一次关键突破。DeepSeek-V3是由中国团队DeepSeek开发的第三代大语言模型，它具备以下几个核心特性：开源可商用：
Agent、RAG、LangChain的概念及作用北极冰雨大模型人工智能
Agent：概念：在人工智能中，Agent通常指的是能够执行任务或做出决策的实体，可以是简单的程序，也可以是复杂的系统，如自动化客服助手、推荐系统等，甚至可以是软件代理、机器人或虚拟助手等各种形式。作用：它能利用内置的大语言模型来做出规划，决定执行哪些步骤，以及每个步骤需要调用哪些工具（如RAG），之后调用相应的工具，最终完成任务。例如，在客服问答场景中，Agent可以根据用户的问题，规划出需要查
AI模型技术演进与行业应用图谱智能计算研究中心其他
内容概要当前AI模型技术正经历从基础架构到行业落地的系统性革新。主流深度学习框架如TensorFlow和PyTorch持续优化动态计算图与分布式训练能力，而MXNet凭借高效的异构计算支持在边缘场景崭露头角。与此同时，模型压缩技术通过量化和知识蒸馏将参数量降低60%-80%，联邦学习则通过加密梯度交换实现多机构数据协同训练。在应用层面，医疗诊断模型通过迁移学习在CT影像分类任务中达到98.2%的准
模型优化驱动产业应用创新智能计算研究中心其他
内容概要当前模型优化技术的迭代正沿着多维路径快速演进，其核心驱动力在于突破算法性能与产业需求间的适配瓶颈。以自适应学习机制与迁移学习框架为基础的优化策略，显著提升了模型在跨场景应用中的泛化能力，而超参数自动调优技术则通过PyTorch、TensorFlow等主流框架的接口标准化，降低了复杂模型的开发门槛。在部署层面，边缘计算与联邦学习的协同应用不仅缩短了金融预测、医疗影像分析等场景的响应延迟，更通
DeepSeek多语言AI高效应用实践智能计算研究中心其他
内容概要在人工智能技术快速迭代的背景下，DeepSeek系列模型凭借混合专家架构（MoE）与670亿参数规模，在多语言处理、视觉语言理解及复杂任务生成领域实现了突破性进展。本文系统性拆解其技术架构设计逻辑，聚焦论文写作、代码生成、SEO关键词拓展三大核心场景，分析模型在高生成质量、低使用成本维度的差异化优势。技术维度DeepSeekProver传统单模态模型多语言支持97种语言动态切换单一语种优化
基于roop/insightface将视频中包含指定人脸的视频片段提取并合并成新视频阆遤 python roop pytorch insightface
利用insightface.app.FaceAnalysis提最一个视频中包含指定人脸的视频片段，并将其合并成一个新视频，使用“buffalo_l”模型，模型需安装在代码当前目录下的.\models中。需要roop或其他支持pytorch、insightface、moviepy的环境。pytorch安装请见我其他文章。#cython:language_level=3str#-*-coding:ut
关于pytorch3d的安装诚威_lol_中大努力中人工智能 pytorch 人工智能 python
更新1：2025_2_04今天发现，原来的pytorch3d不见了，在我的aaa1环境中。重新安装，我发现最好用的还是去github下载最新的pytorch3d的zip，unzip之后，进去pipinstall-e.然后安装成功！1、参考文章1：windows安装PyTorch3D详细指南-哔哩哔哩(bilibili.com)这篇文章巨好2、参考文章2：pytorch3d/INSTALL.mdat
AI大模型训练教程 Small踢倒coffee_氕氘氚 python自学经验分享笔记
1.引言随着人工智能技术的快速发展，大模型（如GPT-3、BERT等）在自然语言处理、计算机视觉等领域取得了显著的成果。训练一个大模型需要大量的计算资源、数据和专业知识。本教程将带你了解如何从零开始训练一个AI大模型。2.准备工作2.1硬件要求GPU：推荐使用NVIDIA的高性能GPU，如A100、V100等。内存：至少64GBRAM。存储：SSD存储，至少1TB。#2.2软件环境操作系统：Lin
使用Jupyter Notebook进行深度学习编程 - 深度学习教程 shandianfk_com ChatGPT AI jupyter 深度学习 ide
大家好，今天我们要聊聊如何使用JupyterNotebook进行深度学习编程。深度学习是人工智能领域中的一项重要技术，通过模仿人脑神经网络的方式进行学习和分析。JupyterNotebook作为一个强大的工具，可以帮助我们轻松地进行深度学习编程，尤其适合初学者和研究人员。本文将带领大家一步步了解如何在JupyterNotebook中开展深度学习项目。一、什么是JupyterNotebook？Jup
Opencv之计算机视觉一闭月之泪舞计算机视觉计算机视觉 opencv python
一、环境准备使用opencv库来实现简单的计算机视觉。需要安装两个库：opencv-python和opencv-contrib-python，版本可以自行选择，注意不同版本的opencv中的某些函数名和用法可能不同pipinstallopencv-python==3.4.18.65-ihttps://pypi.tuna.tsinghua.edu.cn/simplepipinstallopencv-
计算机视觉总结 Trank-Lw 计算机视觉深度学习人工智能
以下是针对上述问题的详细解答，并结合代码示例进行说明：1.改进YOLOv5人脸检测模块，复杂光照场景准确率从98.2%提升至99.5%优化具体过程：光照补偿：在数据预处理阶段，采用自适应光照补偿算法，对图像进行实时增强，以减少光照变化对人脸检测的影响。数据增强：在训练数据中增加复杂光照场景下的样本，如强光、弱光、背光等，通过数据增强提高模型对不同光照条件的适应性。模型调整：对YOLOv5模型的网络
英伟达常用GPU参数速查表，含B300..... Ai17316391579 深度学习服务器人工智能机器学习服务器电脑计算机视觉深度学习神经网络
英伟达常用GPU参数速查表，收藏备用：含RTX5090、RTX4090D、L40、L20、A100、A800、H100、H800、H20、H200、B200、B300、GB300.....专注于高性能计算人工智能细分领域kyfwq001#5090##4090##英伟达“新核弹”B200发布##英伟达##英伟达B300##GPU##服务器##显卡##英伟达H800/A800芯片将禁售#
深度学习 Deep Learning 第8章深度学习优化 odoo中国 AI编程人工智能深度学习人工智能优化
深度学习第8章深度学习的优化章节概述本章深入探讨了深度学习中的优化技术，旨在解决模型训练过程中面临的各种挑战。优化是深度学习的核心环节，直接关系到模型的训练效率和最终性能。本章首先介绍了优化在深度学习中的特殊性，然后详细讨论了多种优化算法，包括随机梯度下降（SGD）、动量法、Nesterov动量法、AdaGrad、RMSProp和Adam等。此外，还探讨了参数初始化策略、自适应学习率方法以及二阶优
ztree设置禁用节点 3213213333332132 JavaScript ztree json setDisabledNode Ajax
ztree设置禁用节点的时候注意，当使用ajax后台请求数据,必须要设置为同步获取数据，否者会获取不到节点对象，导致设置禁用没有效果。 $(function(){ showTree(); setDisabledNode(); });
JVM patch by Taobao bookjovi java HotSpot
在网上无意中看到淘宝提交的hotspot patch，共四个，有意思，记录一下。 7050685：jsdbproc64.sh has a typo in the package name 7058036：FieldsAllocationStyle=2 does not work in 32-bit VM 7060619：C1 should respect inline and
将session存储到数据库中 dcj3sjt126com sql PHP session
CREATE TABLE sessions ( id CHAR(32) NOT NULL, data TEXT, last_accessed TIMESTAMP NOT NULL, PRIMARY KEY (id) ); <?php /** * Created by PhpStorm. * User: michaeldu * Date
Vector 171815164 vector
public Vector<CartProduct> delCart(Vector<CartProduct> cart, String id) { for (int i = 0; i < cart.size(); i++) { if (cart.get(i).getId().equals(id)) { cart.remove(i);
各连接池配置参数比较 g21121 连接池
排版真心费劲，大家凑合看下吧，见谅~ Druid DBCP C3P0 Proxool 数据库用户名称 Username Username User 数据库密码 Password Password Password 驱动名
[简单]mybatis insert语句添加动态字段 53873039oycg mybatis
mysql数据库,id自增,配置如下： <insert id="saveTestTb" useGeneratedKeys="true" keyProperty="id" parameterType=&
struts2拦截器配置云端月影 struts2拦截器
struts2拦截器interceptor的三种配置方法方法1. 普通配置法 <struts> <package name="struts2" extends="struts-default"> &
IE中页面不居中，火狐谷歌等正常 aijuans IE中页面不居中
问题是首页在火狐、谷歌、所有IE中正常显示，列表页的页面在火狐谷歌中正常，在IE6、7、8中都不中，觉得可能那个地方设置的让IE系列都不认识，仔细查看后发现，列表页中没写HTML模板部分没有添加DTD定义，就是<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3
String,int,Integer,char 几个类型常见转换 antonyup_2006 html sql .net
如何将字串 String 转换成整数 int? int i = Integer.valueOf(my_str).intValue(); int i=Integer.parseInt(str); 如何将字串 String 转换成Integer ? Integer integer=Integer.valueOf(str); 如何将整数 int 转换成字串 String ? 1.
PL/SQL的游标类型百合不是茶显示游标(静态游标)隐式游标游标的更新和删除 %rowtype ref游标(动态游标)
游标是oracle中的一个结果集,用于存放查询的结果; PL/SQL中游标的声明; 1,声明游标 2,打开游标(默认是关闭的); 3,提取数据 4,关闭游标注意的要点:游标必须声明在declare中,使用open打开游标,fetch取游标中的数据,close关闭游标隐式游标:主要是对DML数据的操作隐
JUnit4中@AfterClass @BeforeClass @after @before的区别对比 bijian1013 JUnit4 单元测试
一.基础知识 JUnit4使用Java5中的注解（annotation），以下是JUnit4常用的几个annotation： @Before：初始化方法对于每一个测试方法都要执行一次（注意与BeforeClass区别，后者是对于所有方法执行一次）@After：释放资源对于每一个测试方法都要执行一次（注意与AfterClass区别，后者是对于所有方法执行一次
精通Oracle10编程SQL(12)开发包 bijian1013 oracle 数据库 plsql
/* *开发包 *包用于逻辑组合相关的PL/SQL类型（例如TABLE类型和RECORD类型）、PL/SQL项（例如游标和游标变量）和PL/SQL子程序（例如过程和函数） */ --包用于逻辑组合相关的PL/SQL类型、项和子程序，它由包规范和包体两部分组成 --建立包规范：包规范实际是包与应用程序之间的接口，它用于定义包的公用组件，包括常量、变量、游标、过程和函数等 --在包规
【EhCache二】ehcache.xml配置详解 bit1129 ehcache.xml
在ehcache官网上找了多次，终于找到ehcache.xml配置元素和属性的含义说明文档了，这个文档包含在ehcache.xml的注释中！ ehcache.xml ： http://ehcache.org/ehcache.xml ehcache.xsd ： http://ehcache.org/ehcache.xsd ehcache配置文件的根元素是ehcahe ehcac
java.lang.ClassNotFoundException: org.springframework.web.context.ContextLoaderL 白糖_ java eclipse spring tomcat Web
今天学习spring+cxf的时候遇到一个问题：在web.xml中配置了spring的上下文监听器： <listener> <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class> </listener> 随后启动
angular.element boyitech AngularJS AngularJS API angular.element
angular.element 描述: 包裹着一部分DOM element或者是HTML字符串，把它作为一个jQuery元素来处理。（类似于jQuery的选择器啦）如果jQuery被引入了，则angular.element就可以看作是jQuery选择器，选择的对象可以使用jQuery的函数；如果jQuery不可用，angular.e
java-给定两个已排序序列，找出共同的元素。 bylijinnan java
import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class CommonItemInTwoSortedArray { /** * 题目：给定两个已排序序列，找出共同的元素。 * 1.定义两个指针分别指向序列的开始。 * 如果指向的两个元素
sftp 异常，有遇到的吗？求解 Chen.H java jcraft auth jsch jschexception
com.jcraft.jsch.JSchException: Auth cancel at com.jcraft.jsch.Session.connect(Session.java:460) at com.jcraft.jsch.Session.connect(Session.java:154) at cn.vivame.util.ftp.SftpServerAccess.connec
[生物智能与人工智能]神经元中的电化学结构代表什么? comsci 人工智能
我这里做一个大胆的猜想,生物神经网络中的神经元中包含着一些化学和类似电路的结构,这些结构通常用来扮演类似我们在拓扑分析系统中的节点嵌入方程一样,使得我们的神经网络产生智能判断的能力,而这些嵌入到节点中的方程同时也扮演着"经验"的角色.... 我们可以尝试一下...在某些神经
通过LAC和CID获取经纬度信息 dai_lm lac cid
方法1：用浏览器打开http://www.minigps.net/cellsearch.html，然后输入lac和cid信息(mcc和mnc可以填0)，如果数据正确就可以获得相应的经纬度方法2：发送HTTP请求到http://www.open-electronics.org/celltrack/cell.php?hex=0&lac=<lac>&cid=&
JAVA的困难分析 datamachine java
前段时间转了一篇SQL的文章（http://datamachine.iteye.com/blog/1971896），文章不复杂，但思想深刻，就顺便思考了一下java的不足，当砖头丢出来，希望引点和田玉。 -----------------------------------------------------------------------------------------
小学5年级英语单词背诵第二课 dcj3sjt126com english word
money 钱 paper 纸 speak 讲，说 tell 告诉 remember 记得，想起 knock 敲，击，打 question 问题 number 数字，号码 learn 学会，学习 street 街道 carry 搬运，携带 send 发送，邮寄，发射 must 必须 light 灯，光线，轻的 front
linux下面没有tree命令 dcj3sjt126com linux
centos p安装 yum -y install tree mac os安装 brew install tree 首先来看tree的用法 tree 中文解释：tree 功能说明：以树状图列出目录的内容。语　　法：tree [-aACdDfFgilnNpqstux][-I <范本样式>][-P <范本样式
Map迭代方式，Map迭代，Map循环蕃薯耀 Map循环 Map迭代 Map迭代方式
Map迭代方式，Map迭代，Map循环 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 蕃薯耀 2015年
Spring Cache注解+Redis hanqunfeng spring
Spring3.1 Cache注解依赖jar包：  <dependency> <groupId>org.springframework.data</groupId> <artifactId>spring-data-redis</artifactId>
Guava中针对集合的 filter和过滤功能 jackyrong filter
在guava库中，自带了过滤器(filter)的功能，可以用来对collection 进行过滤，先看例子： @Test public void whenFilterWithIterables_thenFiltered() { List<String> names = Lists.newArrayList("John"
学习编程那点事 lampcy 编程 android PHP html5
一年前的夏天，我还在纠结要不要改行，要不要去学php？能学到真本事吗？改行能成功吗？太多的问题，我终于不顾一切，下定决心，辞去了工作，来到传说中的帝都。老师给的乘车方式还算有效，很顺利的就到了学校，赶巧了，正好学校搬到了新校区。先安顿了下来，过了个轻松的周末，第一次到帝都，逛逛吧！接下来的周一，是我噩梦的开始，学习内容对我这个零基础的人来说，除了勉强完成老师布置的作业外，我已经没有时间和精力去
架构师之流处理---------bytebuffer的mark,limit和flip nannan408 ByteBuffer
1.前言。如题，limit其实就是可以读取的字节长度的意思，flip是清空的意思，mark是标记的意思。 2.例子. 例子代码: String str = "helloWorld"; ByteBuffer buff = ByteBuffer.wrap(str.getBytes()); Sy
org.apache.el.parser.ParseException: Encountered " ":" ": "" at line 1, column 1 Everyday都不同 $转义 el表达式
最近在做Highcharts的过程中，在写js时，出现了以下异常：严重: Servlet.service() for servlet jsp threw exception org.apache.el.parser.ParseException: Encountered " ":" ": "" at line 1,
用Java实现发送邮件到163 tntxia java实现
/* 在java版经常看到有人问如何用javamail发送邮件？如何接收邮件？如何访问多个文件夹等。问题零散，而历史的回复早已经淹没在问题的海洋之中。本人之前所做过一个java项目，其中包含有WebMail功能，当初为用java实现而对javamail摸索了一段时间，总算有点收获。看到论坛中的经常有此方面的问题，因此把我的一些经验帖出来，希望对大家有些帮助。此篇仅介绍用
探索实体类存在的真正意义 java小叶檀 POJO
一. 实体类简述实体类其实就是俗称的POJO,这种类一般不实现特殊框架下的接口，在程序中仅作为数据容器用来持久化存储数据用的 POJO（Plain Old Java Objects）简单的Java对象它的一般格式就是 public class A{ private String id; public Str