MyDLNote - Inpainting: Image Inpainting with Learnable Bidirectional Attention Maps

Image Inpainting with Learnable Bidirectional Attention Maps

This blog tries to extract the main message of each paper: it is neither a full translation nor a rough skim. The motivation of the paper and the details of the network design are what these posts focus on.

Paper:

http://openaccess.thecvf.com/content_ICCV_2019/papers/Xie_Image_Inpainting_With_Learnable_Bidirectional_Attention_Maps_ICCV_2019_paper.pdf

This post covers a model proposed to fix two problems of partial convolution (Image Inpainting for Irregular Holes Using Partial Convolutions): its re-normalization is not learned automatically, and its mask-updating only considers forward propagation. The paper proposes a learnable attention map module that learns feature re-normalization and mask-updating, and that adapts effectively to irregular holes and to propagation through the convolution layers. For background, see my earlier post on partial convolution.

[Related post] MyDLNote - Network: [18ECCV] Image Inpainting for Irregular Holes Using Partial Convolutions

Contents

Image Inpainting with Learnable Bidirectional Attention Maps

Abstract

Introduction

Proposed Method

Revisiting Partial Convolution

Learnable Attention Maps

Learnable Bidirectional Attention Maps

Model Architecture

Loss Functions



Abstract

Most convolutional network (CNN)-based inpainting methods adopt standard convolution to indistinguishably treat valid pixels and holes, making them limited in handling irregular holes and more likely to generate inpainting results with color discrepancy and blurriness. Partial convolution has been suggested to address this issue, but it adopts handcrafted feature renormalization, and only considers forward mask-updating. In this paper, we present a learnable attention map module for learning feature re-normalization and mask-updating in an end-to-end manner, which is effective in adapting to irregular holes and propagation of convolution layers. Furthermore, learnable reverse attention maps are introduced to allow the decoder of U-Net to concentrate on filling in irregular holes instead of reconstructing both holes and known regions, resulting in our learnable bidirectional attention maps. Qualitative and quantitative experiments show that our method performs favorably against state-of-the-arts in generating sharper, more coherent and visually plausible inpainting results. The source code and pre-trained models will be available at: https://github.com/Vious/LBAM_inpainting/.

CNN-based inpainting methods do not handle irregular holes well and tend to produce color discrepancy and blurriness.

Partial convolution handles irregular holes much better, but it uses a handcrafted feature re-normalization and only considers forward mask-updating.

[MyNote: the re-normalization here refers to the scaling factor sum(1)/sum(M) in the PConv paper, which rescales the contribution of the valid (unmasked) inputs; for example, if only 3 of the 9 pixels under a 3×3 kernel are valid, the output is scaled by 9/3 = 3. See Eqn. (1) of the PConv paper.]

This paper proposes a learnable attention map module that learns feature re-normalization and mask-updating in an end-to-end manner.

Learnable reverse attention maps are also introduced so that the U-Net decoder concentrates on filling the irregular holes instead of reconstructing both holes and known regions, which yields the learnable bidirectional attention maps.


Introduction

There may exist multiple potential solutions for the given holes in an image, i.e., the holes can be filled with any plausible hypotheses coherent with the surrounding known regions. And the holes can be of complex and irregular patterns, further increasing the difficulty of image inpainting.

The problem: the holes can be filled with any plausible hypothesis that is coherent with the surrounding known regions, and their complex, irregular shapes make inpainting even harder.

PatchMatch, gradually fill in holes by searching and copying similar patches from known regions. Albeit exemplar-based methods are effective in hallucinating detailed textures, they are still limited in capturing high-level semantics, and may fail to generate complex and non-repetitive structures.

Although exemplar-based methods are effective at hallucinating detailed textures, they are still limited in capturing high-level semantics and may fail to generate complex, non-repetitive structures.

Benefited from the powerful representation ability and large scale training, CNN-based methods are effective in hallucinating semantically plausible result. And adversarial loss [8] has also been deployed to improve the perceptual quality and naturalness of the result. Nonetheless, most existing CNN-based methods usually adopt standard convolution which indistinguishably treats valid pixels and holes. Thus, they are limited in handling irregular holes and more likely to generate inpainting results with color discrepancy and blurriness. As a remedy, several postprocessing techniques [10, 34] have been introduced but are still inadequate in resolving the artifacts.

CNN-based methods benefit from strong representation ability and large-scale training, so they can hallucinate semantically plausible results, and adversarial losses have been used to improve the perceptual quality and naturalness of the output. However, most existing CNN-based methods cannot distinguish valid pixels from holes; they are therefore limited in handling irregular holes and more likely to produce color discrepancy and blurriness. Several post-processing techniques [10, 34] have been introduced as a remedy, but they are still inadequate for removing the artifacts.

For better handling irregular holes and suppressing color discrepancy and blurriness, partial convolution (PConv) [17] has been suggested. In each PConv layer, mask convolution is used to make the output conditioned only on the unmasked input, and feature re-normalization is introduced for scaling the convolution output. A mask-updating rule is further presented to update a mask for the next layer, making PConv very effective in handling irregular holes. Nonetheless, PConv adopts hard 0-1 mask and handcrafted feature re-normalization by absolutely trusting all filling-in intermediate features. Moreover, PConv considers only forward mask-updating and simply employs allone mask for decoder features.

Partial convolution (PConv) handles irregular holes better and suppresses color discrepancy and blurriness. In each PConv layer, mask convolution makes the output depend only on the unmasked input, and feature re-normalization is introduced to scale the convolution output. A mask-updating rule then produces the mask for the next layer, which makes PConv very effective for irregular holes. Nevertheless, PConv adopts a hard 0-1 mask (and in the decoder an all-one mask, i.e., the whole image is reconstructed) and a handcrafted feature re-normalization (the sum(1)/sum(M) factor of the original paper), absolutely trusting all filled-in intermediate features, and it only considers forward mask-updating.

In this paper, we take a step forward and present the modules of learnable bidirectional attention maps for the re-normalization of features on both encoder and decoder of the U-Net architecture. To begin with, we revisit PConv without bias, and show that the mask convolution can be safely avoided and the feature re-normalization can be interpreted as a re-normalization guided by hard 0-1 mask. To overcome the limitations of hard 0-1 mask and handcrafted mask-updating, we present a learnable attention map module for learning feature re-normalization and mask-updating. Benefited from the end-to-end training, the learnable attention map is effective in adapting to irregular holes and propagation of convolution layers.

This paper proposes learnable bidirectional attention map modules for re-normalizing the features of both the encoder and the decoder of a U-Net. First, the authors revisit PConv without bias and show that the mask convolution can be safely avoided and that the feature re-normalization can be interpreted as a re-normalization guided by a hard 0-1 mask.

To overcome the limitations of the hard 0-1 mask and the handcrafted mask-updating, a learnable attention map module is proposed for learning feature re-normalization and mask-updating. Thanks to end-to-end training, the learnable attention map adapts effectively to irregular holes and to propagation through the convolution layers.

Furthermore, PConv simply uses all-one mask on the decoder features, making the decoder should hallucinate both holes and known regions. Note that the encoder features of known region will be concatenated, it is natural that the decoder is only required to focus on the inpainting of holes. Therefore, we further introduce learnable reverse attention maps to allow the decoder of U-Net concentrate only on filling in holes, resulting in our learnable bidirectional attention maps. In contrast to PConv, the deployment of learnable bidirectional attention maps empirically is beneficial to network training, making it feasible to include adversarial loss for improving visual quality of the result.

Moreover, PConv simply uses an all-one mask on the decoder features, so the decoder has to hallucinate both the holes and the known regions. Since the encoder features of the known regions are already passed to the decoder through the skip connections of the U-Net, the decoder naturally only needs to focus on the holes.

Therefore, learnable reverse attention maps are further introduced so that the U-Net decoder concentrates only on filling in holes, which gives the learnable bidirectional attention maps.

Compared with PConv, deploying the learnable bidirectional attention maps is empirically beneficial to network training, making it feasible to include an adversarial loss to improve the visual quality of the results.


Proposed Method

Revisiting Partial Convolution

PConv is illustrated in Fig. 2(a).

A PConv layer generally involves three steps, i.e., (i) mask convolution, (ii) feature re-normalization, and (iii) mask-updating. Denote by F^{in} the input feature map and M the corresponding hard 0-1 mask. We further let W be the convolution filter and b be its bias. To begin with, we introduce the convolved mask M^c = M \otimes k_{1/9}, where ⊗ denotes the convolution operator, k_{1/9} denotes a 3 × 3 convolution filter with each element being 1/9. The process of PConv can be formulated as,

F^{conv} = W^{T} (F^{in} ⊙ M)    (1)

F^{out} = A ⊙ F^{conv} + b    (2)

M' = f_M(M^c)    (3)

where A = f_A(M^c ) denotes the attention map, and M' = f_M(M^c ) denotes the updated mask. We further define the activation functions for attention map and updated mask as,

f_A(M^c) = 1/M^c if M^c > 0, and 0 otherwise    (4)

f_M(M^c) = 1 if M^c > 0, and 0 otherwise    (5)

From Eqns. (1)∼(5) and Fig. 2(a), PConv can also be explained as a special interplay model between mask and convolution feature map.

k_{1/9} is just a 3×3 convolution kernel whose elements are all 1/9. When the abstract and the introduction say the re-normalization is handcrafted and fixed, they mean exactly that this k_{1/9} is fixed.

f_A is precisely the scaling factor sum(1)/sum(M) of the original PConv paper, used to rescale the contribution of the valid (unmasked) inputs.

f_M is the mask update.

PConv can thus also be interpreted as a particular interplay model between the mask and the convolution feature map. A minimal code sketch of one PConv step is given below.
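To make this interplay concrete, here is a minimal PyTorch-style sketch of one bias-free PConv step following Eqns. (1)–(5); the shapes, kernel size and variable names are illustrative and are not taken from the official code.

```python
import torch
import torch.nn.functional as F

def pconv_step(feat, mask, weight):
    """One bias-free PConv step: mask convolution, handcrafted re-normalization
    by A = f_A(M^c), and hard 0-1 mask update M' = f_M(M^c).
    feat:   (N, C_in, H, W) input feature map F^{in}
    mask:   (N, 1, H, W) hard 0-1 mask M (1 = known, 0 = hole)
    weight: (C_out, C_in, 3, 3) convolution filter W
    """
    # Eqn (1): mask convolution, so the output depends only on unmasked inputs
    conv = F.conv2d(feat * mask, weight, padding=1)

    # Convolved mask M^c = M ⊗ k_{1/9}, i.e. the fraction of valid pixels per window
    k = torch.full((1, 1, 3, 3), 1.0 / 9, device=mask.device)
    mc = F.conv2d(mask, k, padding=1)

    # Eqn (4): attention map A = 1/M^c where M^c > 0, else 0
    # (this is exactly the sum(1)/sum(M) factor of the PConv paper)
    attn = torch.where(mc > 0, 1.0 / mc.clamp(min=1e-8), torch.zeros_like(mc))

    # Eqn (2) without bias: feature re-normalization
    out = conv * attn

    # Eqn (5): hard 0-1 mask update M' = f_M(M^c)
    new_mask = (mc > 0).float()
    return out, new_mask

# toy usage
feat = torch.randn(1, 3, 8, 8)
mask = (torch.rand(1, 1, 8, 8) > 0.4).float()
weight = torch.randn(16, 3, 3, 3)
out, new_mask = pconv_step(feat, mask, weight)
print(out.shape, new_mask.shape)  # torch.Size([1, 16, 8, 8]) torch.Size([1, 1, 8, 8])
```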

However, PConv adopts the hand-crafted convolution filter k_{1/9} as well as hand-crafted activation functions f_A(M^c) and f_M(M^c), thereby giving some leeway for further improvements.

Moreover, the non-differentiable f_M(M^c) also increases the difficulty of end-to-end learning. To our best knowledge, it remains a difficult issue to incorporate adversarial loss to train a U-Net with PConv.

Furthermore, PConv only considers the mask and its updating for encoder features. As for decoder features, it simply adopts all-one mask, making PConv limited in filling holes.

The authors argue that an adversarial loss cannot be applied well to a U-Net built from PConv. (Why? Is it because the mask update is non-differentiable? And why would a non-differentiable mask update hurt the effectiveness of the adversarial loss?)

PConv only updates the mask in the encoder, while the decoder uses an all-one mask, which limits its inpainting ability. (The known regions are already passed from the encoder to the decoder via skip connections, so the decoder naturally only needs to learn the hole regions.)


Figure 2. Interplay models between mask and intermediate feature for PConv and our learnable bidirectional attention maps. Here, the white holes in M^{in} denote the missing region with value 0, and the black area denotes the known region with value 1.

Learnable Attention Maps

The convolution layer without bias has been widely adopted in U-Net for image-to-image translation and image inpainting. When the bias is removed, it can be readily seen from Eqn. (2) that the convolution features in updated holes are zeros. Thus, the mask convolution in Eqn. (1) is equivalently rewritten as the standard convolution F^{conv} = W^{T} F^{in} (Eqn. (6)).

Then, the feature re-normalization in Eqn. (2) can be interpreted as the element-wise product of the convolution feature and the attention map, F^{out} = F^{conv} ⊙ A.

Convolution layers without the bias b are commonly used for both image-to-image translation and image inpainting. Eqn. (2) can then be understood as an element-wise product between the convolution feature map and the attention map.

Even so, the handcrafted convolution filter k_{1/9} is fixed and not adapted to the mask. The activation function for the updated mask absolutely trusts the inpainting result in the region M^c > 0, but it is more sensible to assign higher confidence to the region with higher M^c.

However, the kernel k_{1/9} is fixed and not adapted to the mask. The activation function for the updated mask absolutely trusts the inpainting result wherever M^c > 0, whereas it would be more sensible to assign higher confidence to regions with a larger M^c. [In other words, PConv simply marks every position with M^c > 0 as valid in the updated mask; this paper argues that the updated mask should instead reflect how large M^c actually is.]

To overcome the above limitations, we suggest the learnable attention map, which generalizes PConv without bias from three aspects.

First, to make the mask adaptive to irregular holes and to propagation along the layers, we substitute k_{1/9} with layer-wise, learnable convolution filters k_M, i.e., M^c = M ⊗ k_M.

Second, instead of the hard 0-1 mask-updating, we modify the activation function for the updated mask to g_M(M^c) = (ReLU(M^c))^{\alpha},

where \alpha \geqslant 0 is a hyperparameter and we set \alpha = 0.8. One can see that g_M(M^c) degenerates into f_M(M^c) when \alpha = 0.

Third, we introduce an asymmetric Gaussian-shaped form as the activation function for attention map,

where \alpha, \mu, \gamma_l, and \gamma_r are the learnable parameters; we initialize them as \alpha = 1.1, \mu = 2.0, \gamma_l = 1.0, \gamma_r = 1.0 and learn them in an end-to-end manner.
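For reference, one common way to write such an asymmetric Gaussian-shaped activation, with amplitude \alpha, center \mu and separate left/right widths \gamma_l and \gamma_r, is sketched below; the paper's exact Eqn. (9) should be consulted for the precise form.

```latex
g_A(M^c) =
\begin{cases}
\alpha \, e^{-\gamma_l (M^c - \mu)^2}, & M^c < \mu \\
1 + (\alpha - 1) \, e^{-\gamma_r (M^c - \mu)^2}, & M^c \ge \mu
\end{cases}
```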

To sum up, the learnable attention map adopts Eqn. (6) in Step (i), and the next two steps are formulated as F^{out} = F^{conv} ⊙ g_A(M^c) and M' = g_M(M^c),

Fig. 2(b) illustrates the interplay model of learnable attention map. In contrast to PConv, our learnable attention map is more flexible and can be end-to-end trained, making it effective in adapting to irregular holes and propagation of convolution layers.

To address the fixed k_{1/9} and the hard 0-1 mask update (which only checks whether M^c > 0), three improvements are proposed:

First, to let the mask adapt to irregular holes and propagate along the layers, layer-wise, learnable convolution filters k_M are used in place of k_{1/9}.

Second, the activation function for the updated mask is changed to Eqn. (8). [Concretely, the piecewise 0-1 function is replaced by a ReLU followed by a gamma-style power transform with exponent \alpha.]

Third, an asymmetric Gaussian-shaped function, Eqn. (9), is introduced as the activation function for the attention map, where \alpha, \mu, \gamma_l, and \gamma_r are learnable parameters initialized as \alpha = 1.1, \mu = 2.0, \gamma_l = 1.0, \gamma_r = 1.0 and learned end-to-end.

Fig. 2(b) shows the interplay model of the learnable attention map. Compared with PConv, the learnable attention map is more flexible and can be trained end-to-end, so it adapts effectively to irregular holes and to propagation through the convolution layers. A code sketch of such a forward attention layer is given below.
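Below is a minimal PyTorch sketch of what a learnable forward attention layer could look like under the descriptions above: a learnable mask filter k_M, a soft mask update g_M(M^c) = (ReLU(M^c))^α, and an asymmetric Gaussian-shaped g_A with amplitude α (named a in the code), center μ and left/right widths γ_l, γ_r. The exact parameterizations are Eqns. (8)–(9) of the paper, and the single-channel mask, channel counts and 4×4/stride-2 filters here are assumptions, so treat this as an approximation rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardAttention(nn.Module):
    """Sketch of a learnable forward attention layer (encoder side).
    Replaces the fixed k_{1/9} with a learnable mask filter k_M, and the hard
    0-1 activations with the smooth g_A / g_M described in the text."""

    def __init__(self, in_ch, out_ch, alpha=0.8):
        super().__init__()
        # feature convolution W (no bias), 4x4 / stride 2 as in the paper's U-Net
        self.feat_conv = nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1, bias=False)
        # learnable mask convolution k_M (single-channel mask for simplicity)
        self.mask_conv = nn.Conv2d(1, 1, 4, stride=2, padding=1, bias=False)
        self.alpha = alpha  # exponent of g_M, set to 0.8 in the paper
        # learnable parameters of the asymmetric Gaussian g_A (α, μ, γ_l, γ_r above)
        self.a = nn.Parameter(torch.tensor(1.1))
        self.mu = nn.Parameter(torch.tensor(2.0))
        self.gamma_l = nn.Parameter(torch.tensor(1.0))
        self.gamma_r = nn.Parameter(torch.tensor(1.0))

    def g_A(self, mc):
        # asymmetric Gaussian-shaped activation (assumed form): different
        # widths on the two sides of mu, peak value a at mu
        left = self.a * torch.exp(-self.gamma_l * (mc - self.mu) ** 2)
        right = 1 + (self.a - 1) * torch.exp(-self.gamma_r * (mc - self.mu) ** 2)
        return torch.where(mc < self.mu, left, right)

    def g_M(self, mc):
        # soft mask update: ReLU followed by a power transform with exponent alpha
        return F.relu(mc) ** self.alpha

    def forward(self, feat, mask):
        conv = self.feat_conv(feat)   # standard convolution (Eqn. (6))
        mc = self.mask_conv(mask)     # M^c = M ⊗ k_M
        out = conv * self.g_A(mc)     # re-normalization by the attention map
        new_mask = self.g_M(mc)       # soft updated mask for the next layer
        return out, new_mask

# toy usage
layer = ForwardAttention(3, 64)
feat = torch.randn(1, 3, 256, 256)
mask = (torch.rand(1, 1, 256, 256) > 0.3).float()
out, new_mask = layer(feat, mask)
print(out.shape, new_mask.shape)  # (1, 64, 128, 128), (1, 1, 128, 128)
```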

Learnable Bidirectional Attention Maps

When incorporating PConv with U-Net for inpainting, the method only updates the masks along with the convolution layers for encoder features. However, all-one mask is generally adopted for decoder features. As a result, the (L-l)-th layer of decoder feature in both known regions and holes should be hallucinated using both (l+1)-th layer of encoder feature and (l+1)-th layer of decoder feature. Actually, the l-th layer of encoder feature will be concatenated with the (L-l)-th layer of decoder feature, and we can only focus on the generation of the (L-l)-th layer of decoder feature in the holes.

We further introduce learnable reverse attention maps to the decoder features. Denote by M^c_e the convolved mask for the encoder feature F^{in}_e. Let M^c_d = M_d ⊗ k_{M_d} be the convolved mask for the decoder feature F^{in}_d. The first two steps of the learnable reverse attention map can be formulated as,

where W_e and W_d are the convolution filters. And we define g_A(M^c_d) as the reverse attention map. Then, the mask M^c_d is updated and deployed to the former decoder layer,

Fig. 2(c) illustrates the interplay model of reverse attention map. In contrast to forward attention maps, both encoder feature (mask) and decoder feature (mask) are considered. Moreover, the updated mask in reverse attention map is applied to the former decoder layer, while that in forward attention map is applied to the next encoder layer.

PConv only applies partial convolution on the encoder side and ignores the decoder. The paper therefore proposes a reverse learnable attention map model, which together with the forward one forms the Learnable Bidirectional Attention Maps. The reverse module has the same form as the forward one; only its inputs and outputs differ, and the decoder output is the sum of the two re-normalized branches.

Fig. 2(c) shows the interplay model of the reverse attention map. Unlike the forward attention map, both the encoder feature (mask) and the decoder feature (mask) are taken into account. Moreover, the mask updated by the reverse attention map is applied to the previous decoder layer, while the mask updated by the forward attention map is applied to the next encoder layer. A simplified sketch of such a reverse attention step follows.
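The following is a rough, single-resolution sketch of the reverse attention idea described above: the skip-connected encoder feature and the decoder feature are each convolved and re-normalized by an attention map computed from their own convolved mask, the two branches are summed (as noted in the paragraph above), and the soft-updated reverse mask is handed to the former decoder layer. The 3×3 same-resolution convolutions, the sigmoid stand-in for g_A, and the way the branches are merged are all simplifications; the precise formulation is Eqn. (12) of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionStep(nn.Module):
    """Simplified single-resolution sketch of a reverse attention step."""

    def __init__(self, enc_ch, dec_ch, out_ch):
        super().__init__()
        self.w_e = nn.Conv2d(enc_ch, out_ch, 3, padding=1, bias=False)  # W_e
        self.w_d = nn.Conv2d(dec_ch, out_ch, 3, padding=1, bias=False)  # W_d
        self.k_me = nn.Conv2d(1, 1, 3, padding=1, bias=False)           # mask filter for M_e
        self.k_md = nn.Conv2d(1, 1, 3, padding=1, bias=False)           # mask filter k_{M_d}

    @staticmethod
    def g_A(mc):
        # stand-in attention activation; the paper uses the asymmetric Gaussian of Eqn. (9)
        return torch.sigmoid(mc)

    @staticmethod
    def g_M(mc):
        # soft mask update with exponent 0.8, as in Eqn. (8)
        return F.relu(mc) ** 0.8

    def forward(self, feat_e, mask_e, feat_d, mask_d):
        mc_e = self.k_me(mask_e)   # M^c_e from the encoder-side mask
        mc_d = self.k_md(mask_d)   # M^c_d from the reverse (decoder) mask
        out = self.w_e(feat_e) * self.g_A(mc_e) + self.w_d(feat_d) * self.g_A(mc_d)
        new_mask_d = self.g_M(mc_d)  # passed on to the former decoder layer
        return out, new_mask_d
```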

By incorporating forward and reverse attention maps with U-Net, Fig. 3 shows the full learnable bidirectional attention maps. Given an input image I^{in} with irregular holes, we use M^{in} to denote the binary mask, where ones indicate the valid pixels and zeros indicate the pixels in holes. From Fig. 3, the forward attention maps take M^{in} as the input mask for the re-normalization of the first layer of encoder features, and gradually update and apply the mask to the next encoder layers. In contrast, the reverse attention maps take 1 - M^{in} as the input for the re-normalization of the last (i.e., L-th) layer of decoder features, and gradually update and apply the mask to the former decoder layers. Benefiting from end-to-end learning, our learnable bidirectional attention maps (LBAM) are more effective in handling irregular holes. The introduction of reverse attention maps allows the decoder to concentrate only on filling in irregular holes, which is also helpful to inpainting performance. Our LBAM is also beneficial to network training, making it feasible to exploit adversarial loss for improving visual quality.

In short: the forward attention maps start from M^{in} (1 for valid pixels, 0 for holes) at the first encoder layer and propagate toward the later encoder layers, while the reverse attention maps start from 1 - M^{in} at the last (L-th) decoder layer and propagate toward the earlier decoder layers. This lets the decoder focus only on the holes, improves the handling of irregular holes, and makes it feasible to train LBAM with an adversarial loss.


Figure 3. The network architecture of our model. The circle with a triangle inside denotes the operation of Eqn. (12); g_A and g_M represent the activation function of Eqn. (9) and the mask-updating function of Eqn. (8).

Model Architecture


We modify the U-Net architecture [11] of 14 layers by removing the bottleneck layer and incorporating with bidirectional attention maps (see Fig. 3). In particular, forward attention layers are applied to the first six layers of encoder, while reverse attention layers are adopted to the last six layers of decoder. For all the U-Net layers and the forward and reverse attention layers, we use convolution filters with the kernel size of 4 × 4, stride 2 and padding 1, and no bias parameters are used. In the U-Net backbone, batch normalization and leaky ReLU nonlinearity are used to the features after re-normalization, and tanh nonlinearity is deployed right after convolution for the last layer. Fig. 3 also provides the size of feature map for each layer, and more details of the network architecture are given in the suppl.

In short: a 14-layer U-Net with the bottleneck layer removed and the bidirectional attention maps added (see Fig. 3). Forward attention layers are applied to the first six encoder layers and reverse attention layers to the last six decoder layers. All U-Net layers and all forward/reverse attention layers use 4×4 convolution filters with stride 2 and padding 1, without bias. In the backbone, batch normalization and leaky ReLU are applied to the features after re-normalization, and a tanh nonlinearity follows the convolution of the last layer. Fig. 3 also lists the feature-map size of each layer; more architectural details are in the supplementary material. A sketch of this layer configuration is given below.
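As a concrete illustration of this layer configuration, here is a hedged sketch of one encoder block, one decoder block, and the output layer. The channel counts, the LeakyReLU slope of 0.2, and the use of transposed convolutions for upsampling are assumptions, and in the real model the BatchNorm/LeakyReLU acts on the features after the attention re-normalization rather than directly after the convolution.

```python
import torch.nn as nn

# One encoder block of the U-Net backbone: 4x4 conv, stride 2, padding 1, no bias.
# In the actual model, BatchNorm and LeakyReLU are applied after the attention
# re-normalization; here they are bundled with the conv only for readability.
def encoder_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# One decoder block: 4x4 transposed conv for upsampling, same normalization/activation.
def decoder_block(in_ch, out_ch):
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# The last decoder layer ends with tanh right after the convolution.
final_layer = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
    nn.Tanh(),
)
```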

Loss Functions

Pixel Reconstruction Loss.

Perceptual Loss.

Style Loss.

Adversarial Loss.

Adversarial loss [8] has been widely adopted in image generation [24, 27, 38] and low-level vision [16] for improving the visual quality of generated images. In order to improve the training stability of GANs, Arjovsky et al. [1] exploit the Wasserstein distance for measuring the distribution discrepancy between generated and real images, and Gulrajani et al. [9] further introduce a gradient penalty for enforcing the Lipschitz constraint on the discriminator. Following [9], we formulate the adversarial loss accordingly, where D(·) represents the discriminator, \hat{I} is sampled from I^{gt} and I^{out} by linear interpolation with a randomly selected factor, and λ is set to 10 in our experiments. We empirically find that it is difficult to train the PConv model when including adversarial loss. Fortunately, the incorporation of learnable attention maps is helpful to ease the training, making it feasible to learn LBAM with adversarial loss. Please refer to the suppl. for the network architecture of the 7-layer discriminator used in our implementation.
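Below is a hedged PyTorch sketch of a WGAN-GP style adversarial term as described here (a gradient penalty with λ = 10 on samples \hat{I} interpolated between I^{gt} and I^{out}); the function names and sign conventions are illustrative and may differ from the paper's exact formulation.

```python
import torch

def gradient_penalty(discriminator, real, fake, lam=10.0):
    """WGAN-GP gradient penalty: push the gradient norm of D toward 1 at points
    interpolated between real (I_gt) and generated (I_out) images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_interp = discriminator(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()

def d_loss(discriminator, real, fake):
    # Wasserstein critic loss plus gradient penalty
    return (discriminator(fake.detach()).mean() - discriminator(real).mean()
            + gradient_penalty(discriminator, real, fake.detach()))

def g_loss(discriminator, fake):
    # adversarial term for the generator (the inpainting network)
    return -discriminator(fake).mean()
```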

Model Objective

Taking the above loss functions into account, the model objective of our LBAM can be formed as,

where λ1, λ2, λ3, and λ4 are the tradeoff parameters. In our implementation, we empirically set λ1 = 1, λ2 = 0.1, λ3 = 0.05 and λ4 = 120.
 
