APolaris。

CVPR 2019 Relation-Shape CNN and ICCV 2019 DensePoint: Paper and Source Code Study Notes

These notes draw on a talk given by the author of both papers; the Bilibili link is at the end of the post.

Contents

Relation-Shape CNN

Relation-Shape CNN
Source code walkthrough

rscnn_msn_seg.py
pointnet2_modules.py

PointnetSAModuleBase
PointnetSAModuleMSG
PointnetSAModule

pytorch_utils.py

RSConv
GloAvgConv
GroupAll

DensePoint

DensePoint
Source code walkthrough

densepoint_cls_L6_k24_g2.py
pointnet2_modules.py
pytorch_utils.py

PointConv
EnhancedPointConv
FC

Open problems in point cloud processing
Research directions

Relation-Shape CNN

Relation-Shape Convolutional Neural Network for Point Cloud Analysis
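The authors argue that perceiving the shape of a point cloud helps with tasks such as classification and segmentation, so they design an encoding $\mathbf{f}_{P_{\text{sub}}}$ that describes the shape of the points inside a local region. For that region they choose a spherical neighborhood. The model is:
$\mathbf{f}_{P_{\text{sub}}}=\sigma\left(\mathcal{A}\left(\left\{\mathcal{T}\left(\mathbf{f}_{x_{j}}\right), \forall x_{j}\right\}\right)\right), \quad d_{ij}<r \;\; \forall x_{j} \in \mathcal{N}\left(x_{i}\right)$
Here $\mathcal{T}$ transforms the feature of each neighbor $x_j$, $\mathcal{A}$ aggregates the transformed features, $\sigma$ is a nonlinear activation, $d_{ij}$ is the Euclidean distance between $x_i$ and $x_j$, and $r$ is the radius of the sphere.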
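To make the formula concrete, here is a minimal pure-Python sketch for a single center point. It is only an illustration under simplifying assumptions: the relation between two points is reduced to their Euclidean distance, the transform multiplies each neighbor feature by a weight computed from that relation, the aggregation is a channel-wise max, and the activation is ReLU. The names `relation_shape_feature` and `toy_mapping` are made up for this example and do not come from the authors' code; `toy_mapping` stands in for the learned mapping.

```python
import math

def relation_shape_feature(center, points, feats, radius, mapping):
    """Encode the neighborhood of `center` as sigma(A({T(f_xj)})) with d_ij < r."""
    transformed = []
    for xj, fj in zip(points, feats):
        d_ij = math.dist(center, xj)           # relation between x_i and x_j
        if d_ij < radius:                       # keep only neighbors inside the ball
            w_ij = mapping(d_ij)                # weight derived from the relation
            transformed.append([w * f for w, f in zip(w_ij, fj)])  # T(f_xj)
    if not transformed:
        return None
    pooled = [max(ch) for ch in zip(*transformed)]   # A: channel-wise max
    return [max(0.0, v) for v in pooled]             # sigma: ReLU

def toy_mapping(d):
    # hypothetical stand-in for the learned mapping from relation to weights
    return [1.0 - d, 0.5]

center = (0.0, 0.0, 0.0)
points = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (1.0, 1.0, 1.0)]
feats = [[1.0, 2.0], [3.0, -4.0], [100.0, 100.0]]
encoded = relation_shape_feature(center, points, feats, radius=0.5, mapping=toy_mapping)
```

The third point lies outside the ball, so its large feature values never reach the output; `encoded` is approximately `[2.4, 1.0]`.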

Source code walkthrough

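Taking the segmentation network as the example: the model is in rscnn_msn_seg.py under the models directory, the parameters are in config_msn_partseg.yaml under the cfgs directory, and training is launched with train_partseg.sh. (I still prefer PyTorch; it reads more concisely and clearly than TF.)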
The code is heavily borrowed from Pointnet2_PyTorch.

rscnn_msn_seg.py

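The segmentation model has 4 PointnetSAModuleMSG layers, 2 PointnetSAModule layers, 4 PointnetFPModule layers and 2 Conv1d layers. PointnetSAModuleMSG extracts local features and PointnetSAModule extracts global features (global pooling). I only added comments to the forward part. The implementation details of each layer are in pointnet2_modules.py.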

class RSCNN_MSN(nn.Module):
    r"""
        PointNet2 with multi-scale grouping
        Semantic segmentation network that uses feature propogation layers
        Parameters
        ----------
        num_classes: int
            Number of semantics classes to predict over -- size of softmax classifier that run for each point
        input_channels: int = 6
            Number of input channels in the feature descriptor for each point.  If the point cloud is Nx9, this
            value should be 6 as in an Nx9 point cloud, 3 of the channels are xyz, and 6 are feature descriptors
        use_xyz: bool = True
            Whether or not to use the xyz position of a point as a feature
    """

    def __init__(self, num_classes, input_channels=0, relation_prior=1, use_xyz=True):
        super().__init__()

        self.SA_modules = nn.ModuleList()
        c_in = input_channels
        self.SA_modules.append(     # 0
            PointnetSAModuleMSG(
                npoint=1024,
                radii=[0.075, 0.1, 0.125],
                nsamples=[16, 32, 48],
                mlps=[[c_in, 64], [c_in, 64], [c_in, 64]],
                first_layer=True,
                use_xyz=use_xyz,
                relation_prior=relation_prior
            )
        )
        c_out_0 = 64*3

        c_in = c_out_0
        self.SA_modules.append(    # 1
            PointnetSAModuleMSG(
                npoint=256,
                radii=[0.1, 0.15, 0.2],
                nsamples=[16, 48, 64],
                mlps=[[c_in, 128], [c_in, 128], [c_in, 128]],
                use_xyz=use_xyz,
                relation_prior=relation_prior
            )
        )
        c_out_1 = 128*3

        c_in = c_out_1
        self.SA_modules.append(    # 2
            PointnetSAModuleMSG(
                npoint=64,
                radii=[0.2, 0.3, 0.4],
                nsamples=[16, 32, 48],
                mlps=[[c_in, 256], [c_in, 256], [c_in, 256]],
                use_xyz=use_xyz,
                relation_prior=relation_prior
            )
        )
        c_out_2 = 256*3

        c_in = c_out_2
        self.SA_modules.append(    # 3
            PointnetSAModuleMSG(
                npoint=16,
                radii=[0.4, 0.6, 0.8],
                nsamples=[16, 24, 32],
                mlps=[[c_in, 512], [c_in, 512], [c_in, 512]],
                use_xyz=use_xyz,
                relation_prior=relation_prior
            )
        )
        c_out_3 = 512*3
        
        self.SA_modules.append(   # 4   global pooling
            PointnetSAModule(
                nsample = 16,
                mlp=[c_out_3, 128], use_xyz=use_xyz
            )
        )
        global_out = 128
        
        self.SA_modules.append(   # 5   global pooling
            PointnetSAModule(
                nsample = 64,
                mlp=[c_out_2, 128], use_xyz=use_xyz
            )
        )
        global_out2 = 128

        self.FP_modules = nn.ModuleList()
        self.FP_modules.append(
            PointnetFPModule(mlp=[256 + input_channels, 128, 128])
        )
        self.FP_modules.append(PointnetFPModule(mlp=[512 + c_out_0, 256, 256]))
        self.FP_modules.append(PointnetFPModule(mlp=[512 + c_out_1, 512, 512]))
        self.FP_modules.append(
            PointnetFPModule(mlp=[c_out_3 + c_out_2, 512, 512])
        )

        self.FC_layer = nn.Sequential(
            pt_utils.Conv1d(128+global_out+global_out2+16, 128, bn=True), nn.Dropout(),
            pt_utils.Conv1d(128, num_classes, activation=None)
        )

    def _break_up_pc(self, pc):
        xyz = pc[..., 0:3].contiguous()
        features = (
            pc[..., 3:].transpose(1, 2).contiguous()
            if pc.size(-1) > 3 else None
        )

        return xyz, features

    def forward(self, pointcloud: torch.cuda.FloatTensor, cls):
        r"""
            pointcloud: Variable(torch.cuda.FloatTensor)
                (B, N, 3 + input_channels) tensor
                Point cloud to run predicts on
                Each point in the point-cloud MUST
                be formated as (x, y, z, features...)
        """
        # Split the input into coordinates (xyz) and per-point features
        xyz, features = self._break_up_pc(pointcloud)
        
        l_xyz, l_features = [xyz], [features]
        for i in range(len(self.SA_modules)):
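            # For the first 5 SA modules, shuffle each layer's li_xyz and li_features along the
            # point axis, then append them to l_xyz and l_features as input to the next layer;
            # the output of SA_modules[4] (global pooling) is the global feature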
            if i < 5:
                li_xyz, li_features = self.SA_modules[i](l_xyz[i], l_features[i])
                if li_xyz is not None:
                    random_index = np.arange(li_xyz.size()[1])
                    np.random.shuffle(random_index)
                    li_xyz = li_xyz[:, random_index, :]
                    li_features = li_features[:, :, random_index]
                l_xyz.append(li_xyz)
                l_features.append(li_features)
        # Compute a second global feature, this time from the 64-point level (l_xyz[3])
        _, global_out2_feat = self.SA_modules[5](l_xyz[3], l_features[3])
        # The PointnetFPModule layers run in the reverse of the order they were added (coarsest level first); presumably it reads more neatly this way
        for i in range(-1, -(len(self.FP_modules) + 1), -1):
            l_features[i - 1 - 1] = self.FP_modules[i](
                l_xyz[i - 1 - 1], l_xyz[i - 1], l_features[i - 1 - 1], l_features[i - 1]
            )
        # Concatenate the FP-module features, both global features and the object-class one-hot vector
        cls = cls.view(-1, 16, 1).repeat(1, 1, l_features[0].size()[2])         # object class one-hot-vector
        l_features[0] = torch.cat((l_features[0], l_features[-1].repeat(1, 1, l_features[0].size()[2]), global_out2_feat.repeat(1, 1, l_features[0].size()[2]), cls), 1)
        # Finally, two Conv1d layers produce the per-point segmentation scores
        return self.FC_layer(l_features[0]).transpose(1, 2).contiguous()
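The negative indexing in the feature-propagation loop is easy to misread, so here is a small pure-Python trace of which levels each step touches. The level labels are my own naming for the six entries that `l_xyz` and `l_features` hold after the SA loop; the sketch only replays the index arithmetic and does not run any network code. It also recomputes the input width of the first Conv1d in `FC_layer`.

```python
# Labels (my own) for the entries of l_xyz / l_features after the SA loop
levels = ["input", "SA0", "SA1", "SA2", "SA3", "SA4(global)"]
num_fp = 4  # len(self.FP_modules)

trace = []
for i in range(-1, -(num_fp + 1), -1):
    target = len(levels) + (i - 1 - 1)   # level whose features are overwritten
    source = len(levels) + (i - 1)       # coarser level that is interpolated from
    fp_index = num_fp + i                # position of the module in FP_modules
    trace.append((fp_index, levels[target], levels[source]))

# FP output (128) + two global features (128 each) + 16-dim one-hot object class
fc_in = 128 + 128 + 128 + 16
```

Under these labels, `FP_modules[3]` propagates from the 16-point level back to the 64-point level first, and `FP_modules[0]` finishes at the input resolution, which matches the channel sizes given to each `PointnetFPModule`.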

pointnet2_modules.py

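Presumably to maximize code reuse, the forward pass of every set-abstraction module is defined in _PointnetSAModuleBase, and both PointnetSAModuleMSG and PointnetSAModule are essentially defined by PointnetSAModuleMSG. The concrete RSConv, GroupAll and GloAvgConv operations are covered in the pytorch_utils.py section below (in the code, GroupAll is referenced as pointnet2_utils.GroupAll).
PointnetSAModuleMSG first applies farthest point sampling to the input points, then passes the sampled points together with the original point cloud and its features to the QueryAndGroup layer (similar to the sampling & grouping layer in PointNet++), which yields a tensor of shape [B, 3+C, npoint, nsample]. That tensor is then processed by the SharedRSConv layer. The part-segmentation network of Relation-Shape CNN has 4 such PointnetSAModuleMSG layers, each operating at 3 scales.
PointnetSAModule is the global convolutional pooling operation and consists of a GroupAll layer and a GloAvgConv layer. The network has 2 PointnetSAModule layers.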
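As a rough picture of what the grouping step does, the sketch below implements a ball query in pure Python. It is an assumption-laden illustration, not the CUDA kernel used by QueryAndGroup: the function name `ball_query` and the padding rule (repeat the first hit when fewer than `nsample` points fall inside the ball) are my own choices, modeled on the usual PointNet++ convention.

```python
import math

def ball_query(points, centers, radius, nsample):
    """For each center, return indices of up to `nsample` points within `radius`."""
    groups = []
    for c in centers:
        idx = [j for j, p in enumerate(points) if math.dist(c, p) < radius][:nsample]
        if idx:
            # pad short groups by repeating the first hit (assumed convention)
            idx += [idx[0]] * (nsample - len(idx))
        groups.append(idx)
    return groups

points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (1, 1, 1)]
groups = ball_query(points, centers=[(0, 0, 0), (1, 1, 1)], radius=0.2, nsample=3)
```

Every group has exactly `nsample` entries, which is what lets the grouped result be stored as a dense [B, 3+C, npoint, nsample] tensor.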

PointnetSAModuleBase

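This class mainly defines the forward pass: farthest point sampling first, then QueryAndGroup (sampling & grouping), then RSConv. With MSG, the result is the multi-scale features concatenated into one tensor; otherwise the computed features are returned directly. furthest_point_sample and gather_operation are implemented in pointnet2_utils.py, utils\csrc\sampling.c and utils\csrc\sampling_gpu.cu.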

class _PointnetSAModuleBase(nn.Module):

    def __init__(self):
        super().__init__()
        self.npoint = None
        self.groupers = None
        self.mlps = None

    def forward(self, xyz: torch.Tensor,
                features: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):
        r"""
        Parameters
        xyz :  (B, N, 3) tensor of the xyz coordinates of the points
        features : (B, N, C) tensor of the descriptors of the the points

        Returns
        new_xyz (B, npoint, 3) tensor of the new points' xyz
        new_features :(B, npoint, \sum_k(mlps[k][-1])) tensor of the new_points descriptors
        """

        new_features_list = []
        # (B,3,N)
        xyz_flipped = xyz.transpose(1, 2).contiguous()
        if self.npoint is not None:
            # Farthest point sampling; fps_idx has size [B, npoint]
            fps_idx = pointnet2_utils.furthest_point_sample(xyz, self.npoint)  
            # Gather the coordinates of the sampled points from xyz; new_xyz has size [B, npoint, 3]
            new_xyz = pointnet2_utils.gather_operation(xyz_flipped, fps_idx).transpose(1, 2).contiguous() 
            fps_idx = fps_idx.data
        else:
            new_xyz = None
            fps_idx = None
        # Compute features for each grouper (one per scale) and collect them in new_features_list
        for i in range(len(self.groupers)):
            new_features = self.groupers[i](xyz, new_xyz, features, fps_idx) if self.npoint is not None else self.groupers[i](xyz, new_xyz, features)  
            new_features = self.mlps[i](
                new_features
            )  # (B, mlp[-1], npoint)

            new_features_list.append(new_features)
        # Concatenate the features in new_features_list along the channel axis and return
        return new_xyz, torch.cat(new_features_list, dim=1)
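Since the real furthest_point_sample lives in the CUDA extension, a plain-Python version may help when reading the forward pass above. This is a minimal sketch of the greedy algorithm only; the function name and the choice to start from index 0 are assumptions on my part.

```python
import math

def farthest_point_sample(points, npoint, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the already chosen set."""
    chosen = [start]
    # distance from every point to its nearest chosen point so far
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < npoint:
        nxt = max(range(len(points)), key=lambda j: min_dist[j])
        chosen.append(nxt)
        for j, p in enumerate(points):
            min_dist[j] = min(min_dist[j], math.dist(p, points[nxt]))
    return chosen

points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
idx = farthest_point_sample(points, npoint=3)
```

The returned indices play the role of `fps_idx`; gathering `points[i]` for each index gives the equivalent of `new_xyz`.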

PointnetSAModuleMSG

The branches where npoint is None all correspond to PointnetSAModule.

class PointnetSAModuleMSG(_PointnetSAModuleBase):
    r"""Pointnet set abstrction layer with multiscale grouping
    Parameters
    npoint : int
        Number of points
    radii : list of float32
        list of radii to group with
    nsamples : list of int32
        Number of samples in each ball query
    mlps : list of list of int32
        Spec of the pointnet before the global max_pool for each scale
    bn : bool
        Use batchnorm
    """

    def __init__(
            self,
            *,
            npoint: int,
            radii: List[float],
            nsamples: List[int],
            mlps: List[List[int]],
            use_xyz: bool = True,
            bias = True,
            init = nn.init.kaiming_normal,
            first_layer = False,
            relation_prior = 1
    ):
        super().__init__()
        assert len(radii) == len(nsamples) == len(mlps)
        self.npoint = npoint
        self.groupers = nn.ModuleList()
        self.mlps = nn.ModuleList()
        
        # initialize shared mapping functions
        C_in = (mlps[0][0] + 3) if use_xyz else mlps[0][0]
        C_out = mlps[0][1]
        # Set in_channels according to relation_prior
        if relation_prior == 0:
            in_channels = 1
        elif relation_prior == 1 or relation_prior == 2:
            in_channels = 10
        else:
            assert False, "relation_prior can only be 0, 1, 2."
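        # Define the mapping modules that lift features to a higher-dimensional space; they are only
        # used when npoint is not None, i.e. in PointnetSAModuleMSG (npoint=None corresponds to PointnetSAModule)
        # The first layer adds mapping_func1, mapping_func2 and xyz_raising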
        if first_layer:
            mapping_func1 = nn.Conv2d(in_channels = in_channels, out_channels = math.floor(C_out / 2), kernel_size = (1, 1), 
                                      stride = (1, 1), bias = bias)
            mapping_func2 = nn.Conv2d(in_channels = math.floor(C_out / 2), out_channels = 16, kernel_size = (1, 1), 
                                  stride = (1, 1), bias = bias)
            xyz_raising = nn.Conv2d(in_channels = C_in, out_channels = 16, kernel_size = (1, 1), 
                                  stride = (1, 1), bias = bias)
            init(xyz_raising.weight)
            if bias:  # if a bias is used, initialize it to 0
                nn.init.constant(xyz_raising.bias, 0)
        # Otherwise only mapping_func1 and mapping_func2 are added
        elif npoint is not None:
            mapping_func1 = nn.Conv2d(in_channels = in_channels, out_channels = math.floor(C_out / 4), kernel_size = (1, 1), 
                                      stride = (1, 1), bias = bias)
            mapping_func2 = nn.Conv2d(in_channels = math.floor(C_out / 4), out_channels = C_in, kernel_size = (1, 1), 
                                  stride = (1, 1), bias = bias)
        # Initialize the weights of mapping_func1 and mapping_func2
        if npoint is not None:
            init(mapping_func1.weight)
            init(mapping_func2.weight)
            if bias:
                nn.init.constant(mapping_func1.bias, 0)
                nn.init.constant(mapping_func2.bias, 0)    
                     
            # channel raising mapping
            cr_mapping = nn.Conv1d(in_channels = C_in if not first_layer else 16, out_channels = C_out, kernel_size = 1, 
                                      stride = 1, bias = bias)
            init(cr_mapping.weight)
            nn.init.constant(cr_mapping.bias, 0)
        if first_layer:
            mapping = [mapping_func1, mapping_func2, cr_mapping, xyz_raising]
        elif npoint is not None:
            mapping = [mapping_func1, mapping_func2, cr_mapping]
        # For each radius, add a grouper and a conv layer at the corresponding scale
        for i in range(len(radii)):
            radius = radii[i]
            nsample = nsamples[i]
            # Similar to the sampling & grouping layer in PointNet++
            self.groupers.append(
                pointnet2_utils.QueryAndGroup(radius, nsample, use_xyz=use_xyz)  # [B, 3+C, npoint, nsample]
                if npoint is not None else pointnet2_utils.GroupAll(use_xyz) #  [B, 3+C, 1, N]
            )
            mlp_spec = mlps[i]
            # If xyz is used as an input feature, in_channels grows by 3
            if use_xyz:
                mlp_spec[0] += 3
            # Set up the RSConv layer of the model
            if npoint is not None:
                self.mlps.append(pt_utils.SharedRSConv(mlp_spec, mapping = mapping, relation_prior = relation_prior, first_layer = first_layer))
            else:   # global convolutional pooling
                self.mlps.append(pt_utils.GloAvgConv(C_in = C_in, C_out = C_out))

PointnetSAModule

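This is already defined by PointnetSAModuleMSG and contains only a GroupAll layer and a GloAvgConv layer. With the settings in rscnn_msn_seg.py above, GroupAll returns xyz and the features concatenated into a tensor of shape [B, C+3, 1, N]; GloAvgConv then extracts features and max-pools over all N points, giving a tensor of shape [B, c_out, 1].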

class PointnetSAModule(PointnetSAModuleMSG):
    r"""Pointnet set abstrction layer
     Parameters
    ----------
    npoint : int Number of features
    radius : float  Radius of ball
    nsample : int  Number of samples in the ball query
    mlp : list  Spec of the pointnet before the global max_pool
    bn : bool  Use batchnorm
    """
    def __init__(
            self,
            *,
            mlp: List[int],
            npoint: int = None,
            radius: float = None,
            nsample: int = None,
            use_xyz: bool = True,
    ):
        super().__init__(
            mlps=[mlp],
            npoint=npoint,
            radii=[radius],
            nsamples=[nsample],
            use_xyz=use_xyz
        )

pytorch_utils.py

RSConv

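RSConv is implemented mainly from the mapping functions designed above. Parameters can be shared chiefly because the input tensor carries not only the center point but also its neighbors and the relation between the center and each neighbor (the authors use the Euclidean distance).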

class RSConv(nn.Module):
    '''
    Input shape: (B, C_in, npoint, nsample)
    Output shape: (B, C_out, npoint)
    '''
    def __init__(
            self, 
            C_in, 
            C_out,
            activation = nn.ReLU(inplace=True),
            mapping = None,
            relation_prior = 1,
            first_layer = False
    ):
        super(RSConv, self).__init__()                                             
        self.bn_rsconv = nn.BatchNorm2d(C_in) if not first_layer else nn.BatchNorm2d(16)
        self.bn_channel_raising = nn.BatchNorm1d(C_out)
        self.bn_xyz_raising = nn.BatchNorm2d(16)
        if first_layer:
            self.bn_mapping = nn.BatchNorm2d(math.floor(C_out / 2))
        else: 
            self.bn_mapping = nn.BatchNorm2d(math.floor(C_out / 4))
        self.activation = activation
        self.relation_prior = relation_prior
        self.first_layer = first_layer
        self.mapping_func1 = mapping[0]
        self.mapping_func2 = mapping[1]
        self.cr_mapping = mapping[2]
        if first_layer:
            self.xyz_raising = mapping[3]
    def forward(self, input): # input: (B, 3 + 3 + C_in, npoint, centroid + nsample)
        # (B, C_in, npoint, nsample+1), input features
        x = input[:, 3:, :, :]           
        C_in = x.size()[1]
        nsample = x.size()[3]
        if self.relation_prior == 2:
            abs_coord = input[:, 0:2, :, :]
            delta_x = input[:, 3:5, :, :]
            zero_vec = Variable(torch.zeros(x.size()[0], 1, x.size()[2], nsample).cuda())
        else:
            abs_coord = input[:, 0:3, :, :]  # (B, 3, npoint, nsample+1), absolute coordinates
            delta_x = input[:, 3:6, :, :]    # (B, 3, npoint, nsample+1), normalized coordinates
        # (B, 3, npoint, nsample),  centroid point
        coord_xi = abs_coord[:, :, :, 0:1].repeat(1, 1, 1, nsample)   
        h_xi_xj = torch.norm(delta_x, p = 2, dim = 1).unsqueeze(1)
        # h_xi_xj size=(B, 10, npoint, nsample)
        if self.relation_prior == 1: 
            h_xi_xj = torch.cat((h_xi_xj, coord_xi, abs_coord, delta_x), dim = 1)
        elif self.relation_prior == 2:
            h_xi_xj = torch.cat((h_xi_xj, coord_xi, zero_vec, abs_coord, zero_vec, delta_x, zero_vec), dim = 1)
        del coord_xi, abs_coord, delta_x
        # Lift the relation vector to a higher-dimensional space with the predefined mapping layers, batch norm and activation function
        h_xi_xj = self.mapping_func2(self.activation(self.bn_mapping(self.mapping_func1(h_xi_xj))))
        if self.first_layer:
            x = self.activation(self.bn_xyz_raising(self.xyz_raising(x)))
        x = F.max_pool2d(self.activation(self.bn_rsconv(torch.mul(h_xi_xj, x))), kernel_size = (1, nsample)).squeeze(3)   # (B, C_in, npoint)
        del h_xi_xj
        x = self.activation(self.bn_channel_raising(self.cr_mapping(x)))
        return x
class RSConvLayer(nn.Sequential):
    def __init__(
            self,
            in_size: int,
            out_size: int,
            activation=nn.ReLU(inplace=True),
            conv=RSConv,
            mapping = None,
            relation_prior = 1,
            first_layer = False
    ):
        super(RSConvLayer, self).__init__()

        conv_unit = conv(
            in_size,
            out_size,
            activation = activation,
            mapping = mapping,
            relation_prior = relation_prior,
            first_layer = first_layer
        )

        self.add_module('RS_Conv', conv_unit)
                
class SharedRSConv(nn.Sequential):
    def __init__(
            self,
            args: List[int],
            *,
            activation=nn.ReLU(inplace=True),
            mapping = None,
            relation_prior = 1,
            first_layer = False
    ):
        super().__init__()

        for i in range(len(args) - 1):
            self.add_module(
                'RSConvLayer{}'.format(i),
                RSConvLayer(
                    args[i],
                    args[i + 1],
                    activation = activation,
                    mapping = mapping,
                    relation_prior = relation_prior,
                    first_layer = first_layer
                )
            )
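The 10-channel relation vector that RSConv builds for `relation_prior == 1` can be written out for a single pair of points. The sketch below is a pure-Python illustration with a made-up function name; it assumes `delta_x` is the neighbor coordinate minus the center coordinate, which is how I read the grouping code.

```python
import math

def relation_vector(x_i, x_j):
    """10 channels: [distance, x_i (3), x_j (3), x_j - x_i (3)]."""
    delta = [b - a for a, b in zip(x_i, x_j)]
    dist = math.sqrt(sum(d * d for d in delta))
    return [dist, *x_i, *x_j, *delta]

h = relation_vector((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

The length of 10 is why `in_channels` is set to 10 for `mapping_func1` when `relation_prior` is 1 or 2.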

GloAvgConv

Conv2d + BatchNorm2d + activation function (ReLU) + max_pool2d

class GloAvgConv(nn.Module):
    '''
    Input shape: (B, C_in, 1, nsample)
    Output shape: (B, C_out, npoint)
    '''
    def __init__(
            self, 
            C_in, 
            C_out, 
            init=nn.init.kaiming_normal, 
            bias = True,
            activation = nn.ReLU(inplace=True)
    ):
        super(GloAvgConv, self).__init__()
        self.conv_avg = nn.Conv2d(in_channels = C_in, out_channels = C_out, kernel_size = (1, 1), 
                                  stride = (1, 1), bias = bias) 
        self.bn_avg = nn.BatchNorm2d(C_out)
        self.activation = activation
        
        init(self.conv_avg.weight)
        if bias:
            nn.init.constant(self.conv_avg.bias, 0)
    def forward(self, x):
        nsample = x.size()[3]
        x = self.activation(self.bn_avg(self.conv_avg(x)))
        x = F.max_pool2d(x, kernel_size = (1, nsample)).squeeze(3)
        
        return x

GroupAll

In the rscnn_msn_seg.py model, this layer concatenates xyz and the features and returns a tensor of shape [B, 3+C, 1, N].

class GroupAll(nn.Module):
    def __init__(self, use_xyz: bool = True):
        super().__init__()
        self.use_xyz = use_xyz
    def forward(
            self,
            xyz: torch.Tensor,
            new_xyz: torch.Tensor,
            features: torch.Tensor = None
    ) -> Tuple[torch.Tensor]:
        r"""
        Parameters
        xyz : xyz coordinates of the features (B, N, 3)
        new_xyz :  Ignored
        features : Descriptors of the features (B, C, N)

        Returns
        new_features :  (B, C + 3, 1, N) tensor
        """
        # grouped_xyz shape = [B, 3, 1, N]
        grouped_xyz = xyz.transpose(1, 2).unsqueeze(2)
        if features is not None:
            grouped_features = features.unsqueeze(2)
            if self.use_xyz:
                # new_features shape = [B, 3 + C, 1, N]
                new_features = torch.cat([grouped_xyz, grouped_features],dim=1)  
            else:
                # new_features shape = [B, C, 1, N]
                new_features = grouped_features
        else:
            # new_features shape = [B, 3, 1, N]
            new_features = grouped_xyz
        return new_features

DensePoint

DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing
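Motivation: bringing in contextual (multi-scale) information helps recognize the pattern of a target.
Context: potential semantic dependencies between a target pattern and its surroundings.
The most direct approach is multi-scale learning, but multi-scale grouping as used in PointNet++ increases the parameter count and FLOPs, so in practice at most about three scales are used. Intuitively, information at different scales reflects different semantic levels, yet multi-scale learning inside a single layer produces features at one semantic level only, not at different levels.
The authors want to aggregate multiple levels of context for recognition, as illustrated in the figure from the paper.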

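The implementation follows DenseNet: information produced at different layers with different receptive fields is aggregated, and filter grouping is used to strengthen the PConv convolution.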

In the end the method performs fairly well on ModelNet40 and ModelNet10 classification and is robust to noise and missing data.

Source code walkthrough

densepoint_cls_L6_k24_g2.py

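The DensePoint classification network consists of two stages, a global pooling module and a final fully connected module. Stage 1 contains one PointnetSAModuleMSG layer, stage 2 contains 4 PointnetSAModuleMSG layers, global pooling contains one PointnetSAModule layer, and the final FC_layer has 3 fully connected layers with dropout between them.
Which conv operation a PointnetSAModuleMSG layer uses depends on its arguments. With the settings here, stage 1 contains a single PointConv layer (PPool); in stage 2 the first PointnetSAModuleMSG also uses PointConv, the following three use EnhancedPointConv (PConv), and global pooling uses GloAvgConv. PointConv, EnhancedPointConv, GloAvgConv and FC are defined in pytorch_utils.py.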

# DensePoint: 2 PPools + 3 PConvs + 1 global pool;
class DensePoint(nn.Module):
    r"""
        PointNet2 with multi-scale grouping
        Semantic segmentation network that uses feature propogation layers

        Parameters
        ----------
        num_classes: int
            Number of semantics classes to predict over -- size of softmax classifier that run for each point
        input_channels: int = 6
            Number of input channels in the feature descriptor for each point.  If the point cloud is Nx9, this
            value should be 6 as in an Nx9 point cloud, 3 of the channels are xyz, and 6 are feature descriptors
        use_xyz: bool = True
            Whether or not to use the xyz position of a point as a feature
    """

    def __init__(self, num_classes, input_channels=0, use_xyz=True):
        super().__init__()

        self.SA_modules = nn.ModuleList()
        
        # stage 1 begin
        self.SA_modules.append(
            PointnetSAModuleMSG(
                npoint=512,
                radii=[0.25],
                nsamples=[64],
                mlps=[[input_channels, 96]],
                use_xyz=use_xyz,
                pool=True
            )
        )
        # stage 1 end
        
        # stage 2 begin
        input_channels = 96
        self.SA_modules.append(
            PointnetSAModuleMSG(
                npoint=128,
                radii=[0.32],
                nsamples=[64],
                mlps=[[input_channels, 93]],
                use_xyz=use_xyz,
                pool=True
            )
        )
        
        input_channels = 93
        self.SA_modules.append(
            PointnetSAModuleMSG(
                npoint=128,
                radii=[0.39],
                nsamples=[16],
                mlps=[[input_channels, 96]],
                group_number=2,
                use_xyz=use_xyz,
                after_pool=True
            )
        )
        
        input_channels = 117
        self.SA_modules.append(
            PointnetSAModuleMSG(
                npoint=128,
                radii=[0.39],
                nsamples=[16],
                mlps=[[input_channels, 96]],
                group_number=2,
                use_xyz=use_xyz
            )
        )
        
        input_channels = 141
        self.SA_modules.append(
            PointnetSAModuleMSG(
                npoint=128,
                radii=[0.39],
                nsamples=[16],
                mlps=[[input_channels, 96]],
                group_number=2,
                use_xyz=use_xyz,
                before_pool=True
            )
        )
        # stage 2 end
       
        # global pooling
        input_channels = 165
        self.SA_modules.append(
            PointnetSAModule(
                mlp=[input_channels, 512], use_xyz=use_xyz
            )
        )

        self.FC_layer = nn.Sequential(
            pt_utils.FC(512, 512, activation=nn.ReLU(inplace=True), bn=True),
            nn.Dropout(p=0.5),
            pt_utils.FC(512, 256, activation=nn.ReLU(inplace=True), bn=True),
            nn.Dropout(p=0.5),
            pt_utils.FC(256, num_classes, activation=None)
        )

    def _break_up_pc(self, pc):
        xyz = pc[..., 0:3].contiguous()
        features = (
            pc[..., 3:].transpose(1, 2).contiguous()
            if pc.size(-1) > 3 else None
        )
        return xyz, features

    def forward(self, pointcloud: torch.cuda.FloatTensor):
        r"""
            Forward pass of the network

            Parameters
            ----------
            pointcloud: Variable(torch.cuda.FloatTensor)
                (B, N, 3 + input_channels) tensor
                Point cloud to run predicts on
                Each point in the point-cloud MUST
                be formated as (x, y, z, features...)
        """
        xyz, features = self._break_up_pc(pointcloud)
        for module in self.SA_modules:
            xyz, features = module(xyz, features)
        return self.FC_layer(features.squeeze(-1))

pointnet2_modules.py

This file implements PointnetSAModuleMSG, PointnetSAModule and PointnetFPModule. It is a modification of the earlier Relation-Shape CNN implementation: _PointnetSAModuleBase, PointnetSAModule and PointnetFPModule are essentially unchanged, and the main change is in PointnetSAModuleMSG, which now chooses among PointConv, EnhancedPointConv and GloAvgConv based on its arguments.

class PointnetSAModuleMSG(_PointnetSAModuleBase):
    r"""Pointnet set abstrction layer with multiscale grouping
    Parameters
    npoint :  Number of points
    radii : list of radii to group with
    nsamples :  Number of samples in each ball query
    mlps : Spec of the pointnet before the global max_pool for each scale
    bn : bool, Use batchnorm
    """
    def __init__(
            self,
            *,
            npoint: int,
            radii: List[float],
            nsamples: List[int],
            mlps: List[List[int]],
            group_number = 1,
            use_xyz: bool = True,
            pool: bool = False,
            before_pool: bool = False,
            after_pool: bool = False,
            bias = True,
            init = nn.init.kaiming_normal
    ):
        super().__init__()
        assert len(radii) == len(nsamples) == len(mlps)
        self.pool = pool
        self.npoint = npoint
        self.groupers = nn.ModuleList()
        self.mlps = nn.ModuleList()
        if pool:
            # Define and initialize the PointConv operation
            C_in = (mlps[0][0] + 3) if use_xyz else mlps[0][0]
            C_out = mlps[0][1]
            pconv = nn.Conv2d(in_channels = C_in, out_channels = C_out, kernel_size = (1, 1), 
                                       stride = (1, 1), bias = bias)
            init(pconv.weight)
            if bias:
                nn.init.constant(pconv.bias, 0)
            convs = [pconv]
        # Choose the convolution according to the constructor arguments
        for i in range(len(radii)):
            radius = radii[i]
            nsample = nsamples[i]
            self.groupers.append(
                pointnet2_utils.QueryAndGroup(radius, nsample, use_xyz=use_xyz)
                if npoint is not None else pointnet2_utils.GroupAll(use_xyz)
            )
            mlp_spec = mlps[i]
            if use_xyz:
                mlp_spec[0] += 3
            if npoint is None:
                self.mlps.append(pt_utils.GloAvgConv(C_in = mlp_spec[0], C_out = mlp_spec[1]))
            elif pool:
                self.mlps.append(pt_utils.PointConv(C_in = mlp_spec[0], C_out = mlp_spec[1], convs = convs))
            else:
                self.mlps.append(pt_utils.EnhancedPointConv(C_in = mlp_spec[0], C_out = mlp_spec[1], group_number = group_number, before_pool = before_pool, after_pool = after_pool))

pytorch_utils.py

The concrete PointConv, EnhancedPointConv, GloAvgConv and related operations. GloAvgConv is unchanged from Relation-Shape CNN.

PointConv

PointConv is fairly simple: it applies the preconfigured convs passed in, with the structure convs + BatchNorm2d + activation function (ReLU) + max pooling.

class PointConv(nn.Module):
    '''
    Input shape: (B, C_in, npoint, nsample)
    Output shape: (B, C_out, npoint)
    '''
    def __init__(self, C_in, C_out, convs=None):
        super(PointConv, self).__init__()
        self.bn = nn.BatchNorm2d(C_out)
        self.activation = nn.ReLU(inplace=True)
        self.pconv = convs[0]
    def forward(self, x): # x: (B, C_in, npoint, nsample)
        nsample = x.size(3)
        x = self.activation(self.bn(self.pconv(x)))
        return F.max_pool2d(x, kernel_size = (1, nsample)).squeeze(3)
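What the 1x1 convolution followed by max pooling amounts to for a single group of neighbors can be shown without PyTorch. The sketch below is a simplified stand-in with made-up names: it applies the same linear map and ReLU to every neighbor and then takes the channel-wise max, and it leaves out batch norm entirely.

```python
def point_conv(group, weight, bias):
    """Shared linear map + ReLU per neighbor, then max over the neighbors.

    group:  list of neighbor feature vectors, each of length C_in
    weight: C_out rows of C_in coefficients (the role of the 1x1 Conv2d)
    """
    activated = []
    for f in group:
        out = [sum(w * v for w, v in zip(row, f)) + b for row, b in zip(weight, bias)]
        activated.append([max(0.0, o) for o in out])    # ReLU
    return [max(ch) for ch in zip(*activated)]           # max over the nsample axis

weight = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]   # C_in = 2, C_out = 3
bias = [0.0, 0.0, 0.0]
group = [[1.0, 2.0], [3.0, -1.0], [-2.0, 0.5]]
pooled = point_conv(group, weight, bias)
shuffled = point_conv(group[::-1], weight, bias)
```

Because the weights are shared and the pooling is a max, reordering the neighbors leaves the result unchanged, which is the property that makes this kind of layer suitable for unordered point sets.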

EnhancedPointConv

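Each layer extracts features from the input points through conv layers and concatenates them with the features obtained so far. The classification model uses EnhancedPointConv 3 times; for a more intuitive picture, see Figure 3 and the figure above it in the paper.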

class EnhancedPointConv(nn.Module):
    '''
    Input shape: (B, C_in, npoint, nsample)
    Output shape: (B, C_out, npoint)
    '''
    def __init__(self, C_in, C_out, group_number=1, before_pool=False, after_pool=False, init=nn.init.kaiming_normal, bias=True):
        super(EnhancedPointConv, self).__init__()
        self.before_pool, self.after_pool = before_pool, after_pool
        C_small = math.floor(C_out/4)
        self.conv_phi = nn.Conv2d(in_channels = C_in, out_channels = C_out, groups = group_number, kernel_size = (1, 1),
                                  stride = (1, 1), bias = bias)    # ~\phi function: grouped version
        self.conv_psi = nn.Conv1d(in_channels = C_out, out_channels = C_small, kernel_size = 1,
                              stride = 1, bias = bias)             # \psi function
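        # Unless this layer directly follows a pool layer, add a batch norm on the input;
        # if it directly precedes a pool layer, add a batch norm on the concatenated output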
        if not after_pool:
            self.bn_cin = nn.BatchNorm2d(C_in)
        self.bn_phi = nn.BatchNorm2d(C_out)
        if before_pool:
            self.bn_concat = nn.BatchNorm1d(C_in-3+C_small)
        self.activation = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(p=0.2)

        init(self.conv_phi.weight)
        init(self.conv_psi.weight)
        if bias:
            nn.init.constant(self.conv_phi.bias, 0)
            nn.init.constant(self.conv_psi.bias, 0)

    def forward(self, input): # x: (B, C_in, npoint, nsample)
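        # input[0] is the grouped tensor (B, C_in, npoint, nsample) around the points chosen by farthest point sampling; input[1] holds the features produced by the previous operation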
        x, last_feat = input[0], input[1]
        nsample = x.size(3)
        # Unless this layer follows a pool operation, apply batch norm and activation to the input first
        if not self.after_pool:
            x = self.activation(self.bn_cin(x))
        # Two convolutions produce the new features, which are concatenated with the previous features to form the input features of the next layer
        x = self.activation(self.bn_phi(self.conv_phi(x)))
        x = F.max_pool2d(x, kernel_size=(1, nsample)).squeeze(3)
        x = torch.cat((last_feat, self.dropout(self.conv_psi(x))), dim=1)
        # If this layer comes right before a pool operation, one more batch norm is applied
        if self.before_pool:
            x = self.activation(self.bn_concat(x))
        return x
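The dense concatenation explains the odd-looking input_channels values 117, 141 and 165 in densepoint_cls_L6_k24_g2.py. A short pure-Python check, with a helper name of my own, reproduces them from the fact that each EnhancedPointConv appends `C_small = floor(C_out / 4)` channels to the features it receives:

```python
import math

def dense_growth(c_start, c_out, num_layers):
    """Feature width after each EnhancedPointConv, which appends floor(c_out / 4) channels."""
    growth = math.floor(c_out / 4)        # C_small in the code
    widths = [c_start]
    for _ in range(num_layers):
        widths.append(widths[-1] + growth)
    return widths

widths = dense_growth(c_start=93, c_out=96, num_layers=3)
```

With `C_out = 96` the growth per layer is 24, matching the `k24` in the file name, and the final width of 165 is the input size given to the global pooling module.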

FC

class FC(nn.Sequential):

    def __init__(
            self,
            in_size: int,
            out_size: int,
            *,
            activation=nn.ReLU(inplace=True),
            bn: bool = False,
            init=None,
            preact: bool = False,
            name: str = ""
    ):
        super().__init__()

        fc = nn.Linear(in_size, out_size, bias=not bn)
        if init is not None:
            init(fc.weight)
        if not bn:
            nn.init.constant(fc.bias, 0)

        if preact:
            if bn:
                self.add_module(name + 'bn', BatchNorm1d(in_size))

            if activation is not None:
                self.add_module(name + 'activation', activation)

        self.add_module(name + 'fc', fc)

        if not preact:
            if bn:
                self.add_module(name + 'bn', BatchNorm1d(out_size))

            if activation is not None:
                self.add_module(name + 'activation', activation)
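
The only thing preact changes in FC is the order in which the sub-modules are registered, and nn.Sequential runs them in registration order. The sketch below mirrors that branching in plain Python; fc_layer_order is a hypothetical helper written for illustration and is not part of pytorch_utils.py.

```python
def fc_layer_order(bn=False, has_activation=True, preact=False, name=""):
    """Return the sub-module names in the order FC.__init__ registers them."""
    extras = []
    if bn:
        extras.append(name + "bn")
    if has_activation:
        extras.append(name + "activation")
    if preact:
        # Pre-activation: BN and activation run on the input, before the linear layer.
        return extras + [name + "fc"]
    # Default: linear layer first, then BN and activation on its output.
    return [name + "fc"] + extras


post = fc_layer_order(bn=True)
pre = fc_layer_order(bn=True, preact=True)
print(post)  # ['fc', 'bn', 'activation']
print(pre)   # ['bn', 'activation', 'fc']
```

This ordering is also why the BatchNorm1d layer is built with in_size when preact is set and with out_size otherwise: it has to match the width of whatever tensor reaches it.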

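Open problems in point cloud processing

Efficient processing of point clouds with millions of points
Fusing data from multiple sensors (2D and 3D data)
High-accuracy recognition
Robustness

Research directions

Geometric deep learning (geometric DL), segmentation, detection, completion, registration
capsule, GAN, one-shot/zero-shot, meta-learning, NAS

References:
bilibili: "Exploring Deep Learning for 3D Point Cloud Processing" (深度学习在3D点云处理中的探索), a talk by Yongcheng Liu (刘永成), PhD student at the National Laboratory of Pattern Recognition, Chinese Academy of Sciences
github: yongchengliu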
