[Paper Reading] End-to-End Object Detection with Transformers


  • Loss Function
  • Network Architecture
    • Transformer
  • Running the Code

Innovations:

  1. Introduces the transformer into detection and proposes the DETR (DEtection TRansformer) model
  2. Removes the hand-designed priors used by earlier detectors (anchors, NMS)
  3. High accuracy on large objects, thanks to the transformer's non-local computations, but lower accuracy on small objects

Loss Function

The network outputs a fixed set of N predicted boxes (with class labels). Since it is hard to measure the quality of such a set directly, the paper defines a set-based loss. The ground-truth boxes are first padded with ∅ (no object) to a sequence of length N so they can be matched against the network output; then, among all permutations of this sequence, the one with the lowest matching cost against the N predictions is selected and used for optimization. This yields a one-to-one assignment between predictions and ground truth, so no NMS post-processing is needed.
How exactly is the loss implemented?? (the matching objective is written out below; the code is discussed in the loss section)
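For reference, the set-prediction objective from the paper: an optimal permutation $\hat{\sigma}$ of the N slots is found by minimizing a pairwise matching cost, and the final Hungarian loss is then computed on the matched pairs ($c_i$ is the ground-truth class, $b_i$ the ground-truth box, $\hat{p}_{\sigma(i)}$ and $\hat{b}_{\sigma(i)}$ the corresponding predictions):

$$\hat{\sigma} = \underset{\sigma \in \mathfrak{S}_N}{\arg\min} \sum_{i=1}^{N} \mathcal{L}_{\mathrm{match}}\big(y_i, \hat{y}_{\sigma(i)}\big), \qquad \mathcal{L}_{\mathrm{match}} = -\mathbb{1}_{\{c_i \neq \emptyset\}}\, \hat{p}_{\sigma(i)}(c_i) + \mathbb{1}_{\{c_i \neq \emptyset\}}\, \mathcal{L}_{\mathrm{box}}\big(b_i, \hat{b}_{\sigma(i)}\big)$$

$$\mathcal{L}_{\mathrm{Hungarian}}(y, \hat{y}) = \sum_{i=1}^{N} \Big[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \emptyset\}}\, \mathcal{L}_{\mathrm{box}}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big) \Big]$$

where $\mathcal{L}_{\mathrm{box}}$ is a weighted sum of the L1 and GIoU box losses.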

Network Architecture

(Figure: DETR architecture overview, image 1 from the original post)
The architecture has three parts (a short shape-flow sketch follows the list):

  1. backbone: [3, H, W] → [2048, H/32, W/32]
  2. encoder-decoder transformer
  3. FFN (feed-forward network) prediction heads
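A minimal shape-flow summary of these stages, written as comments (shapes assume the default hidden_dim=256 and num_queries=100; this is a reading aid, not repo code):

# input image:            [3, H, W]
# resnet50 backbone:      [2048, H/32, W/32]
# 1x1 conv projection:    [256, H/32, W/32], flattened to (H/32)*(W/32) tokens of dim 256
# transformer encoder:    (H/32)*(W/32) tokens -> same shape, with global self-attention
# transformer decoder:    100 object queries -> [100, 256]
# FFN heads:              class logits [100, num_classes + 1], boxes [100, 4] (normalized)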

Transformer

If transformers are new to you, I recommend Hung-yi Lee's transformer lectures (search on Bilibili) and https://jalammar.github.io/illustrated-transformer/ before reading this part.

Things I did not fully understand while reading the paper:
How is the positional encoding expressed?? (a reference formula is given below; the PositionEmbeddingSine code later in this post implements it)
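For reference, the sine encoding follows the formulation of Attention Is All You Need, applied independently to the x and y coordinate of every feature-map location, with $d = 128$ per axis in DETR (256 channels after concatenating the two axes):

$$PE_{(pos,\,2i)} = \sin\!\Big(\frac{pos}{10000^{2i/d}}\Big), \qquad PE_{(pos,\,2i+1)} = \cos\!\Big(\frac{pos}{10000^{2i/d}}\Big)$$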

Running the Code

Data processing (only selected parts of the code are explained; basic data loading and transforms are not covered):
When a batch is built, collate_fn is passed to the DataLoader to assemble it. Image pixels are wrapped in the NestedTensor class so that images of different sizes can be handled: each image is padded to the largest height and width in its batch, so within a batch all padded images have the same size, while the padded size may differ between batches.

# DETR/util/misc.py
import torch


def collate_fn(batch):
    # print(len(batch))        # 2 (batch size)
    # print(batch[0][0].shape) # same as the 'size' entry in batch[0][1]
    # print(batch[0][1])       # annotation dict for image 0
    # batch is a list of (img, target) pairs; zip(*batch) regroups it into
    # a tuple of images and a tuple of targets
    batch = list(zip(*batch)) 
    batch[0] = NestedTensor.from_tensor_list(batch[0]) # batch[0] now holds the padded image batch
    return tuple(batch)

class NestedTensor(object):
    def __init__(self, tensors, mask):
        self.tensors = tensors
        self.mask = mask

    def to(self, *args, **kwargs):
        cast_tensor = self.tensors.to(*args, **kwargs)
        cast_mask = self.mask.to(*args, **kwargs) if self.mask is not None else None
        return type(self)(cast_tensor, cast_mask)

    def decompose(self):
        return self.tensors, self.mask

    @classmethod  # can be called on the class itself, without an instance
    def from_tensor_list(cls, tensor_list):  # cls refers to the class, as opposed to self for an instance
        # TODO make this more general
        if tensor_list[0].ndim == 3:
            # TODO make it support different-sized images
            max_size = tuple(max(s) for s in zip(*[img.shape for img in tensor_list]))
            #print(max_size) # per-dimension maximum (largest C, H, W) among the images in this batch
            #print(len(tensor_list)) # 2
            # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
            batch_shape = (len(tensor_list),) + max_size
            #print(batch_shape)
            b, c, h, w = batch_shape
            dtype = tensor_list[0].dtype
            device = tensor_list[0].device
            tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
            mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
            for img, pad_img, m in zip(tensor_list, tensor, mask):
                pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
                m[: img.shape[1], :img.shape[2]] = False
        else:
            raise ValueError('not supported')
        return cls(tensor, mask)

    def __repr__(self):
        return repr(self.tensors)
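A minimal usage sketch of how collate_fn and NestedTensor fit together with a DataLoader (the tiny in-memory dataset here is a stand-in for the repo's COCO dataset, used only to show the padded shapes):

import torch
from torch.utils.data import DataLoader

# two images of different sizes with dummy targets (stand-in for datasets/coco.py)
dummy_data = [
    (torch.rand(3, 480, 640), {"labels": torch.tensor([1])}),
    (torch.rand(3, 512, 512), {"labels": torch.tensor([2, 3])}),
]

loader = DataLoader(dummy_data, batch_size=2, collate_fn=collate_fn)
samples, targets = next(iter(loader))
print(samples.tensors.shape)  # torch.Size([2, 3, 512, 640]) -- padded to the per-batch max H and W
print(samples.mask.shape)     # torch.Size([2, 512, 640])    -- True on padded pixels, False on image pixels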

网络结构

  1. backbone: consists of a resnet50 and a PositionEmbeddingSine layer. The output of the last resnet50 stage is fed into PositionEmbeddingSine to encode positions: for each pixel of the feature map, a 128-dimensional vector is generated from its x coordinate and another 128-dimensional vector from its y coordinate, each interleaving sine and cosine values at different frequencies (see the stack + flatten in the code). The x and y encodings are then concatenated into a 256-dimensional positional embedding. (A quick shape check follows the code below.)
# DETR/models/position_encoding.py
import math

import torch
from torch import nn


class PositionEmbeddingSine(nn.Module):
    """
    This is a more standard version of the position embedding, very similar to the one
    used by the Attention is all you need paper, generalized to work on images.
    """
    def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):
        super().__init__()
        self.num_pos_feats = num_pos_feats # half of the final embedding dimension (128 each for x and y)
        self.temperature = temperature
        self.normalize = normalize
        if scale is not None and normalize is False:
            raise ValueError("normalize should be True if scale is passed")
        if scale is None:
            scale = 2 * math.pi
        self.scale = scale

    def forward(self, tensor_list):
        # ipdb.set_trace()  # debugging breakpoint left in by the author of this post
        x = tensor_list.tensors
        # mask: starting from the top-left corner, the region covered by the resized
        # original image is False; the padded region is True
        mask = tensor_list.mask
        not_mask = ~mask
        y_embed = not_mask.cumsum(1, dtype=torch.float32) # cumulative sum down the columns (y direction)
        x_embed = not_mask.cumsum(2, dtype=torch.float32) # cumulative sum along the rows (x direction)
        if self.normalize:
            eps = 1e-6
            y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale # y_embed[:, -1:, :] is the largest cumulative value
            x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale

        dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device) 
        dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats) # 128

        pos_x = x_embed[:, :, :, None] / dim_t # torch.Size([2, 24, 36, 128])
        pos_y = y_embed[:, :, :, None] / dim_t
        pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)
        pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)
        pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
        return pos # torch.Size([2, 256, 24, 36])
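A quick shape check for the layer above (using the NestedTensor class from earlier; the 24×36 feature-map size is just an example):

pos_layer = PositionEmbeddingSine(num_pos_feats=128, normalize=True)
feat = torch.rand(2, 2048, 24, 36)                 # backbone output for a 2-image batch
mask = torch.zeros(2, 24, 36, dtype=torch.bool)    # no padded pixels in this toy example
pos = pos_layer(NestedTensor(feat, mask))
print(pos.shape)  # torch.Size([2, 256, 24, 36]) -- first 128 channels from y, last 128 from x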
  2. Encoder (printed module structure: 6 identical TransformerEncoderLayer blocks, d_model=256, FFN dim 2048)
TransformerEncoder(
  (layers): ModuleList(
    (0): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
    (1): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
    (2): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
    (3): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
    (4): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
    (5): TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
  )
)
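Before this encoder is applied, the projected feature map is flattened so that every feature-map pixel becomes one token. A minimal, self-contained sketch of that reshaping (mirroring the logic in models/transformer.py; the shapes are only an example):

import torch

src = torch.rand(2, 256, 24, 36)                   # projected backbone features [B, 256, H', W']
pos = torch.rand(2, 256, 24, 36)                   # matching positional encoding
mask = torch.zeros(2, 24, 36, dtype=torch.bool)    # True would mark padded pixels

src_seq = src.flatten(2).permute(2, 0, 1)   # [864, 2, 256] -- 864 = 24 * 36 tokens
pos_seq = pos.flatten(2).permute(2, 0, 1)   # [864, 2, 256]
mask_seq = mask.flatten(1)                  # [2, 864]      -- key-padding mask for self-attention
print(src_seq.shape, pos_seq.shape, mask_seq.shape)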

  3. Decoder (printed module structure: 6 TransformerDecoderLayer blocks, each with self-attention plus cross-attention over the encoder output)
TransformerDecoder(
  (layers): ModuleList(
    (0): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
    (1): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
    (2): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
    (3): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
    (4): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
    (5): TransformerDecoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (multihead_attn): MultiheadAttention(
        (out_proj): Linear(in_features=256, out_features=256, bias=True)
      )
      (linear1): Linear(in_features=256, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=256, bias=True)
      (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
      (dropout3): Dropout(p=0.1, inplace=False)
    )
  )
  (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
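The decoder does not consume the image features as its target sequence; it starts from 100 learned object queries. A minimal sketch of how they are prepared, following the logic in models/transformer.py:

import torch
from torch import nn

num_queries, hidden_dim, bs = 100, 256, 2
query_embed = nn.Embedding(num_queries, hidden_dim)

# every image in the batch gets the same 100 learned query embeddings
queries = query_embed.weight.unsqueeze(1).repeat(1, bs, 1)  # [100, 2, 256]
# the decoder target sequence starts as zeros; the queries act as the (learned)
# positional encoding added to q and k inside every decoder layer
tgt = torch.zeros_like(queries)                             # [100, 2, 256]
print(queries.shape, tgt.shape)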
  4. DETR
# DETR/models/detr.py (MLP and NestedTensor are defined elsewhere in the repo)
from torch import nn


class DETR(nn.Module):
    """ This is the DETR module that performs object detection """
    def __init__(self, backbone, transformer, num_classes, num_queries, aux_loss=False):
        """ Initializes the model.
        Parameters:
            backbone: torch module of the backbone to be used. See backbone.py
            transformer: torch module of the transformer architecture. See transformer.py
            num_classes: number of object classes
            num_queries: number of object queries, ie detection slot. This is the maximal number of objects
                         DETR can detect in a single image. For COCO, we recommend 100 queries.
            aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.
        """
        super().__init__()
        self.num_queries = num_queries # 100
        self.transformer = transformer
        hidden_dim = transformer.d_model
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3) # MLP with 3 linear layers
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        self.input_proj = nn.Conv2d(backbone.num_channels, hidden_dim, kernel_size=1)
        self.backbone = backbone
        self.aux_loss = aux_loss

    def forward(self, samples: NestedTensor):
        """ The forward expects a NestedTensor, which consists of:
               - samples.tensors: batched images, of shape [batch_size x 3 x H x W]
               - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels 

            It returns a dict with the following elements:
               - "pred_logits": the classification logits (including no-object) for all queries.
                                Shape= [batch_size x num_queries x (num_classes + 1)]
               - "pred_boxes": The normalized boxes coordinates for all queries, represented as
                               (center_x, center_y, height, width). These values are normalized in [0, 1],
                               relative to the size of each individual image (disregarding possible padding).
                               See PostProcess for information on how to retrieve the unnormalized bounding box.
               - "aux_outputs": Optional, only returned when auxilary losses are activated. It is a list of
                                dictionnaries containing the two above keys for each decoder layer.
        """
        if not isinstance(samples, NestedTensor):
            samples = NestedTensor.from_tensor_list(samples)
        features, pos = self.backbone(samples) # self.backbone joins two modules: the resnet50 and PositionEmbeddingSine()
        src, mask = features[-1].decompose()
        hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0] # torch.Size([6, 2, 100, 256])
        outputs_class = self.class_embed(hs) # torch.Size([6, 2, 100, 92])
        outputs_coord = self.bbox_embed(hs).sigmoid() # torch.Size([6, 2, 100, 4])
        out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]} # [2, 100, 92] [2, 100, 4]
        if self.aux_loss:
            out['aux_outputs'] = [{'pred_logits': a, 'pred_boxes': b}
                                  for a, b in zip(outputs_class[:-1], outputs_coord[:-1])] # outputs of the first 5 decoder layers feed the auxiliary losses
        return out
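A quick way to poke at these output shapes without building the model by hand is the pretrained checkpoint published via torch.hub (this assumes internet access and the hub entry point provided by the DETR repo):

import torch

model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()

img = torch.rand(1, 3, 800, 1066)   # dummy (un-normalized) image, just for shape checking
with torch.no_grad():
    out = model(img)
print(out['pred_logits'].shape)     # torch.Size([1, 100, 92]) -- 91 COCO classes + no-object
print(out['pred_boxes'].shape)      # torch.Size([1, 100, 4])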

Loss computation
First, HungarianMatcher (models/matcher.py) performs the matching; it computes three costs. For the classification cost cost_class, the predicted probabilities at the ground-truth class indices are gathered ([2, 100, num_classes] becomes [2, 100, num_obj]) and negated. For the box cost, the pairwise L1 distance between all 2×100 predicted boxes and all ground-truth boxes in the batch is computed directly. Finally, a GIoU cost is computed. The three costs are combined with weights into the final cost matrix, and linear_sum_assignment from scipy.optimize runs the Hungarian algorithm (each prediction can be assigned to at most one ground-truth box, and vice versa) to select the best one-to-one matching for each image (a simplified sketch follows).
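A condensed, self-contained sketch of that matching step (simplified from models/matcher.py: the GIoU cost is omitted here and the weights are illustrative; the real matcher adds a third cost_giou term):

import torch
from scipy.optimize import linear_sum_assignment


@torch.no_grad()
def simple_matcher(pred_logits, pred_boxes, targets, w_class=1.0, w_bbox=5.0):
    """pred_logits: [B, 100, C+1], pred_boxes: [B, 100, 4];
    targets: list (one dict per image) with 'labels' [num_obj] and 'boxes' [num_obj, 4]."""
    bs, num_queries = pred_logits.shape[:2]
    out_prob = pred_logits.flatten(0, 1).softmax(-1)      # [B*100, C+1]
    out_bbox = pred_boxes.flatten(0, 1)                   # [B*100, 4]

    tgt_ids = torch.cat([t['labels'] for t in targets])   # all gt labels in the batch
    tgt_bbox = torch.cat([t['boxes'] for t in targets])   # all gt boxes in the batch

    cost_class = -out_prob[:, tgt_ids]                    # gather gt-class probabilities, negate
    cost_bbox = torch.cdist(out_bbox, tgt_bbox, p=1)      # pairwise L1 distance to every gt box
    C = (w_bbox * cost_bbox + w_class * cost_class).view(bs, num_queries, -1).cpu()

    sizes = [len(t['boxes']) for t in targets]
    # one Hungarian assignment per image, on that image's slice of the cost matrix
    return [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(sizes, -1))]

Each element of the returned list is a (prediction indices, ground-truth indices) pair for one image, which is what the loss computation below consumes.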

The actual losses are then computed on the matched pairs: cross-entropy (negative log-likelihood) for classification, and L1 plus GIoU for box regression. The cardinality error measures the gap between the number of ground-truth objects and the number of non-empty predictions; it is only a diagnostic, not a real loss, and no gradient flows through it (a sketch follows).
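A short sketch of that cardinality error (following the idea of loss_cardinality in models/detr.py; purely diagnostic, hence the no_grad):

import torch
import torch.nn.functional as F


@torch.no_grad()
def cardinality_error(pred_logits, targets):
    """L1 gap between the number of gt objects and the number of predictions
    whose argmax class is not the last ('no object') class."""
    no_object_class = pred_logits.shape[-1] - 1
    tgt_lengths = torch.as_tensor([len(t['labels']) for t in targets],
                                  device=pred_logits.device)
    card_pred = (pred_logits.argmax(-1) != no_object_class).sum(1)   # [B]
    return F.l1_loss(card_pred.float(), tgt_lengths.float())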

Inference
(image 2 from the original post)
All 100 final predictions per image are passed directly to the evaluation (a simplified post-processing sketch follows).
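A simplified sketch of what happens to those 100 predictions before COCO evaluation (based on PostProcess in models/detr.py: per-query softmax, drop the no-object class, rescale the normalized cxcywh boxes to absolute xyxy coordinates; no confidence threshold and no NMS):

import torch


@torch.no_grad()
def postprocess(pred_logits, pred_boxes, img_h, img_w):
    """pred_logits: [B, 100, C+1]; pred_boxes: [B, 100, 4] normalized (cx, cy, w, h)."""
    prob = pred_logits.softmax(-1)
    scores, labels = prob[..., :-1].max(-1)          # best real class per query, no-object excluded

    cx, cy, w, h = pred_boxes.unbind(-1)             # cxcywh -> xyxy
    boxes = torch.stack([cx - 0.5 * w, cy - 0.5 * h,
                         cx + 0.5 * w, cy + 0.5 * h], dim=-1)
    boxes = boxes * torch.tensor([img_w, img_h, img_w, img_h],
                                 dtype=boxes.dtype, device=boxes.device)
    return scores, labels, boxes   # all 100 queries are kept and scored by the COCO evaluator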
