XingHe_XingHe_

[Paper Reading Notes] 2020_CIKM_Partial Relationship Aware Influence Diffusion via a Multi-channel Encoding Scheme for Social Recommendation

Paper: https://doi.org/10.1145/3340531.3412016
Venue: CIKM
Published: 2020
Authors and affiliations:

Bo Jin, Dalian University of Technology, Dalian, China
Ke Cheng, Dalian University of Technology, Dalian, China
Liang Zhang∗, Dongbei University of Finance and Economics, Dalian, China
Yanjie Fu, University of Central Florida, Orange City, FL
Minghao Yin, Northeast Normal University, Changchun, China
Lu Jiang, Northeast Normal University, Changchun, China

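Datasets (described in the main text):

Yelp (the version used by DiffNet)
Flickr (the version used by DiffNet)

Code:

(not released by the authors)

Other:

Articles written by others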

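Brief summary of the key ideas: the paper uses DiffNet's datasets, so it is probably positioned as a challenge to DiffNet++. ① A "channel" is simply one dimension of the embedding. ② "Multi-label" means one user interacts with multiple items (labels). ③ A "partial relationship" is a connected sub-component (subgraph) of the social network formed by certain relations. ④ Attention is computed in a channel-wise way instead of the traditional node-wise way, which makes it finer-grained. ⑤ The datasets themselves are sparse: social recommendation (SoRec) was originally meant to address sparsity, yet this paper relies precisely on sparsity, which feels a bit like designing the method around the data (if the data were not sparse, the computational cost would explode).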

In the data preparation step, we filter out users with fewer than 2 historical action records and 2 social neighbors in both datasets.

ABSTRACT

(1) Social recommendation tasks exploit social connections to enhance recommendation performance.
(2) To fully utilize each user’s first-order and high-order neighborhood preferences, recent approaches incorporate an influence diffusion process for better user preference modeling.
(3) Despite the superior performance of these models, they either neglect the latent individual interests hidden in user-item interactions or rely on computationally expensive graph attention models to uncover the item-induced sub-relations, which essentially determine the influence propagation paths.
(4) Since these sparse substructures are derived from the original social network, we call them partial relationships between users.
(5) We argue that such relationships can be modeled directly, so that both personal interests and shared interests propagate along a few channels (or dimensions) of the latent user embeddings.
(6) To this end, we propose a partial relationship aware influence diffusion structure via a computationally efficient multi-channel encoding scheme.
- Specifically, the encoding scheme first simplifies the graph attention operation based on a channel-wise sparsity assumption,
- and then adds an InfluenceNorm function to maintain such sparsity.
- Moreover, ChannelNorm is designed to alleviate the oversmoothing problem in graph neural network models.
(7) Extensive experiments on two benchmark datasets show that our method is comparable to state-of-the-art graph attention-based social recommendation models while capturing user interests according to partial relationships more efficiently.

CCS CONCEPTS

• Human-centered computing → Social recommendation; • Information systems → Recommender systems.

KEYWORDS

Social Network; Recommendation; Graph Neural Network

1 INTRODUCTION

(1) Discerning user preferences from sparse user behavior data is a key issue in recommendation tasks. Social recommendation has emerged as a pioneering direction based on social influence theory, which states that connected people tend to show similar interest patterns [30, 33, 35, 39].
- Ideally, a successful social recommendation system should capture both temporal (or evolving) individual interests and shared interests. Hence, uncovering self interests [2] hidden in user-item interactions while identifying shared interests according to social influence becomes essential.
- However, unlike single-label graph learning tasks, a recommendation system based on a social network is usually a multi-label graph learning task with extreme label (or user-item interaction) sparsity.
- In fact, it is the sparse label-induced social relationship substructures [4, 37] that determine the social influence channels.
- As a result, the traditional assumption that linked users in a social network tend to share the same interests no longer holds for all users.
  - For instance, Figure 1 illustrates the frequency distribution of similar behaviors between each user and all of their neighbors in the social networks of Yelp and Flickr. The right-skewed distributions show that although users exhibit some behaviors similar to their neighbors', they largely maintain their own interests. Therefore, it is important to capture social influence according to sparse substructures rather than the original social network structure.
(2) Previous studies on social recommendation model social effects in various ways, such as trust propagation, regularization losses, matrix factorization, network embedding, and deep neural networks [30, 31, 38]. However, these studies have some limitations.
- First, most works model friends’ influence statically or dynamically without differentiating its importance,
  - such as spatial Graph Neural Network (GNN) approaches [11, 30, 36], which ignore the differences between social effects and can further lead to oversmoothing.
- Second, although attention-based models [29–31] have been proposed to distinguish social effects, they mainly compute node-wise similarities, which are computationally expensive in large-scale social recommendation tasks.
- In essence, Diffnet++ [29] and DANSER [31] achieve superior performance because they model the label-substructures [4] by incorporating user-item interactions into the information diffusion process over the social network.
- In our work, we call the label-induced substructures partial relationships. Moreover, light-weight convolution [28] suggests that node-wise computational complexity can be reduced by splitting the node representation into chunks.
  - Hence, we argue that such partial relationships can be reflected directly in each dimension of the user embedding, without burdensome node-wise attention models.
(3) In light of this, we propose MEGCN (Multi-channel Encoding Graph Convolutional Network), a partial relationship aware influence diffusion model for social recommendation.
- The key idea behind this model is feature-wise message computation with sparse regularization in the influence diffusion process.
- More specifically, the layer-wise diffusion process starts with an initial embedding for each user, which can be a free vector that captures latent interests or any explicit features.
- Then a GCN-like information aggregation is conducted in each layer to capture neighborhood context. At its core, under the assumption of channel-wise graph sparsity, the traditional node-wise message computation is replaced by feature-wise computation, so that user interests and shared interests are captured simultaneously in the layer-wise information diffusion process.
(4) We summarize the contributions of this paper as follows:
- We propose MEGCN, a partial relationship aware influence diffusion model with channel-wise sparsity, to efficiently capture both self interests and shared interests in social recommendation tasks.
  - In the information diffusion process, MEGCN realizes feature-wise message computation under the graph channel-wise sparsity assumption.
- We design two normalization schemes, InfluenceNorm and ChannelNorm, to guarantee channel-wise sparsity, so that they can model partial relationships and alleviate the oversmoothing problem in graph convolutional networks.
- We demonstrate the effectiveness of our model on two real-world datasets. Experimental results show that MEGCN achieves recommendation performance comparable to graph attention-based models at a much lower computational cost.

2 PRELIMINARIES

2.1 Problem Formalization

Definition 1 (partial relationship).

(1) Unlike graph-based single-label learning tasks, a social recommendation task fundamentally involves a social network as well as a user-item interaction graph. Each user in the user-item interaction graph can be regarded as a node with multiple labels (i.e., items), which form various label-substructures when combined with the social network structure.
(2) Consequently, self interests and shared interests can propagate simultaneously along these label-induced social sub-network structures, which we call partial relationships between users.

Definition 2 (graph with channel-wise sparsity).

(1) A graph has channel-wise sparsity when the element-wise multiplication of the embeddings of two neighbors yields a sparse embedding whose non-zero dimensions form information passing channels.
(2) As a result, sparse partial relationships can be modeled using these sparse channels.

============

(1) In general, a social network can be described as a set of nodes and links, with a node representation matrix $X$ and an adjacency matrix $A$ containing link weights. The degree matrix $D_{ii} = \sum_j A_{ij}$ is the sum of all link weights of node $i$. Consistent with most studies, the identity matrix $I$ is used in the graph Laplacian operation to add self-loops to the graph.
(2) In a social recommendation system, there are two sets of entities (a user set $U$ and an item set $V$) and two graphs (a user-item interaction graph $R$ and a social network $S$).
- Note that the interaction graph $R$ can be an explicit or implicit feedback-based rating graph.
- In particular, in an implicit user-item interaction scenario (e.g., purchasing an item or voting for a song), we let $r_{ai}$ denote the link weight between user $a$ and item $i$: $r_{ai} = 1$ if user $a$ interacted with item $i$, and $r_{ai} = 0$ otherwise.
- We let $d$ denote the number of channels (dimensions) of an entity embedding,
- $M$ the number of users in the social network,
- and $N$ the total number of links between users.
(3) Besides, associated feature matrices $P$ and $Q$ of users (e.g., user profiles) and items (e.g., item text or visual representations) are usually provided.
- The recommendation problem can then be formalized as a link prediction task on the graph $R$: given $R$, $S$, $P$ and $Q$, social recommendation aims to predict users’ unknown interests in items in $R$, i.e., $\tilde{R} = f(R, S, P, Q)$. With the problem defined, we introduce the preliminaries that are closely related to our proposed model.
(4) Note that social recommendation is a typical graph-based multi-label learning task in which one user interacts with multiple items, so label-induced sub-relations among users need to be modeled.
- As explained before, our work aims to model such sparse relationships via channel-wise sparsity.
- After splitting the embedding of user $u$ into unit chunks [28], each chunk or channel of the latent vector $X_u$ can be regarded as an independent information passage in user $u$’s ego network $S_u$, thus modeling the label-substructure relationships in the social network.
- Our work models such relationships in the aforementioned function $f$. In the rest of the paper, we use $\odot$ to denote element-wise multiplication.
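To make the notation concrete, here is a minimal stdlib-only Python sketch: it builds the implicit-feedback matrix $R$, the diagonal of the degree matrix $D$ for an unweighted social graph, and the element-wise product $\odot$ of two user embeddings, whose non-zero entries are the channels a partial relationship could use. The function names and the toy interactions, links, and embeddings are all hypothetical.

```python
from typing import List, Tuple


def build_implicit_R(interactions: List[Tuple[int, int]], num_users: int, num_items: int) -> List[List[int]]:
    """Implicit feedback: r_ai = 1 if user a interacted with item i, otherwise 0."""
    R = [[0] * num_items for _ in range(num_users)]
    for a, i in interactions:
        R[a][i] = 1
    return R


def degrees(links: List[Tuple[int, int]], num_users: int) -> List[int]:
    """Diagonal of D for an unweighted, undirected social graph S: D_ii = sum_j A_ij."""
    deg = [0] * num_users
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg


def hadamard(x: List[float], y: List[float]) -> List[float]:
    """Element-wise product x ⊙ y."""
    return [a * b for a, b in zip(x, y)]


def active_channels(x: List[float], y: List[float]) -> List[int]:
    """Channels where x ⊙ y is non-zero, i.e. the passages a partial relationship can use."""
    return [c for c, v in enumerate(hadamard(x, y)) if v != 0.0]


R = build_implicit_R([(0, 1), (0, 2), (1, 2)], num_users=3, num_items=4)
deg = degrees([(0, 1), (1, 2)], num_users=3)
x_u = [0.0, 0.9, 0.5, 0.0]  # sparse toy embeddings that overlap only on channel 2
x_s = [0.7, 0.0, 0.4, 0.0]
print(R)    # [[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(deg)  # [1, 2, 1]
print(active_channels(x_u, x_s))  # [2]
```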

2.2 Graph Convolutional Network

(1) Graph Convolutional Network [3] models graph message aggregation as a spectral convolution, in which node representations are transformed into the Fourier domain and multiplied by a filter $g_\theta = \mathrm{diag}(\theta)$ parameterized by $\theta \in \mathbb{R}^d$:
- where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = U \Lambda U^T$.
- However, eigenvalue decomposition is computationally expensive.
- Therefore, it was suggested that $g_\theta$ could be approximated by a truncated expansion in terms of Chebyshev polynomials $T_k(x)$ up to $K$-th order [12]:
- where $\hat{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_N$ is the rescaled eigenvalue matrix and $\lambda_{max}$ denotes the largest eigenvalue of $L$,
- and $\theta'_k$ is the vector of Chebyshev coefficients.
(2) In particular, when $\lambda_{max} \approx 2$ and $K = 1$, we obtain the first-order form of GCN [3]:
- where the free parameters $\theta'_0$ and $\theta'_1$ are shared over the whole graph.
(3) In effect, GCN constrains the number of parameters to prevent overfitting and normalizes the updated node vectors to alleviate numerical instability (i.e., the renormalization trick $I_N + D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \longrightarrow \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}}$, with $\hat{A} = A + I_N$ and $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$). The final update function of GCN can be formulated as:
- which can be regarded as an average over the embeddings of the first-order neighbors and the central node. This can easily make nodes indistinguishable and thus cause the oversmoothing problem.
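The equations that (1) and (2) introduce with a colon were not captured in these notes. For reference, here are the standard forms from the spectral GCN derivation cited as [3] and [12], written with the symbols defined above and $x$ as a graph signal; the paper's own notation and equation numbering may differ slightly.

$$
g_\theta \star x = U g_\theta U^T x
$$

$$
g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k \, T_k(\hat{\Lambda}),
\qquad T_0(x) = 1,\; T_1(x) = x,\; T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)
$$

$$
g_{\theta'} \star x \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-\frac{1}{2}} A D^{-\frac{1}{2}} x
$$

The last line uses $\lambda_{max} \approx 2$, so the rescaled Laplacian $\frac{2}{\lambda_{max}} L - I_N$ becomes $L - I_N = -D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, and $K = 1$ keeps only $T_0$ and $T_1$.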
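The renormalized update referred to above is, in its standard form, $H^{(l+1)} = \sigma(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} \Theta^{(l)})$, where $\Theta^{(l)}$ is a layer weight matrix (notation assumed here, not taken from the paper). The stdlib-only Python sketch below applies only the propagation matrix, dropping weights and the activation as a simplification, on a made-up 4-node path graph. Stacking 30 such steps drives the cosine distance between every pair of node embeddings toward zero, which is the oversmoothing effect described above.

```python
import math
from typing import List

Matrix = List[List[float]]


def renormalized_adjacency(A: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, where D_ii = sum_j (A + I)_ij."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(P: Matrix, H: Matrix) -> Matrix:
    """One propagation step H' = P H (weight matrix and activation omitted)."""
    n, dim = len(H), len(H[0])
    return [[sum(P[i][k] * H[k][c] for k in range(n)) for c in range(dim)] for i in range(n)]


def cosine(x: List[float], y: List[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def max_pairwise_gap(H: Matrix) -> float:
    """Largest cosine distance between any two node embeddings."""
    n = len(H)
    return max(1.0 - cosine(H[i], H[j]) for i in range(n) for j in range(i + 1, n))


A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path graph 0-1-2-3
P = renormalized_adjacency(A)
H = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
gaps = [max_pairwise_gap(H)]
for _ in range(30):
    H = propagate(P, H)
    gaps.append(max_pairwise_gap(H))
print(round(gaps[0], 3), gaps[-1] < 1e-3)  # 1.0 True
```

The rows converge to multiples of one vector because repeated multiplication by this matrix keeps only its dominant eigenvector (proportional to $\hat{D}^{\frac{1}{2}}\mathbf{1}$ for a connected graph), which is why deep stacks of plain GCN layers blur node identities.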

2.3 Graph Attention Network

(1) Graph Attention Network [20] turns the graph aggregation and combination operations of GCN into a weighted summation by performing self-attention on each connected node pair. The weight between a center node $u$ and a neighbor node $s$ can be measured by:
- where $\mathrm{sim}$ is a similarity function, e.g., cosine similarity in most cases.
(2) The coefficients are then normalized by a softmax function, which constrains the transformed values and avoids numerical instability:
(3) Then, the updated node embedding in a graph attention layer can be formulated as:
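The three equations elided in (1) to (3) can be sketched as follows, in the similarity-based form these notes describe (the original GAT instead scores each pair with a learned attention vector and a LeakyReLU). Here $W$ is a shared linear transform, $\mathcal{N}(u)$ is $u$'s neighborhood (usually including $u$ itself), and $\sigma$ is a nonlinearity; these are assumptions about notation, and the paper's exact formulas may differ.

$$
e_{us} = \mathrm{sim}(W X_u,\, W X_s)
$$

$$
\alpha_{us} = \frac{\exp(e_{us})}{\sum_{s' \in \mathcal{N}(u)} \exp(e_{us'})}
$$

$$
X'_u = \sigma\Big(\sum_{s \in \mathcal{N}(u)} \alpha_{us}\, W X_s\Big)
$$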

2.4 Multi-channel Encoding Scheme

(1) Graph Attention Network uses a cosine similarity metric to measure the importance weights of first-order neighbors. More formally, given the central user node $u$’s representation $X_u$ and the representation $X^s_u$ of one of its first-order neighbors $s$, the similarity score can be calculated as:
- where $s$ denotes one of its neighbors
- and $d$ is the number of dimensions or channels.
- After applying the above operation to all first-order neighbors, the similarity values of all neighbor nodes are normalized by the softmax function to obtain the final node-wise linking weights.
(2) However, this method is computationally expensive because it requires many pairwise similarity computations.
- For instance, for nodes with hundreds of neighbors, the computational cost is unacceptable.
- Under the channel-wise sparsity assumption, we can simplify GAT while modeling label-substructure relationships on the social network structure.
(3) When graph attention operations are performed on the nodes of a graph with sparse channel representations, the pairwise node similarity can be approximated as:
- where the channel set $\{topk\}$ contains the channels with the largest element-wise multiplication values between $u$ and $s$.
(4) This approximation mainly depends on the channel-wise sparsity between two adjacent nodes;
- for example, when the channels are sparse enough, the equation reduces to a single multiplication without the summation.
(5) Under such circumstances, the linking weight between users $u$ and $s$ can be approximated by:
(6) Then, the node-wise multiplication between nodes is simplified into element-wise multiplication, and the weighted aggregation in GAT can be transformed into:
(7) Note that the above normalization is performed in a channel-wise way, whereas the original graph attention mechanism performs message computation, aggregation, and update in a node-wise way.
- In essence, they have the same mathematical expression.
- However, channel-wise sparsity ensures that a few channels are sufficient for measuring importance, so we can perform message computation after message aggregation instead of before it, which greatly reduces the computational complexity of the original GAT-based models.
- Besides, the equation can be regarded as controlling influence propagation on only some of the channels, which reflects partial relationships among users.
- In particular, it can be regarded as chunk-based aggregation with chunk size 1, where only a limited number of chunks (channels) transfer information.
- The framework of MEGCN is shown in Figure 2.
  
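  Figure 2: MEGCN vs. the GAT model. MEGCN is motivated by two observations: 1) GAT-based models are mainly suited to single-label graph learning tasks; 2) GAT-based models compute node similarity via dot products, which is computationally expensive. On the left, we show that GAT-based models need to be stacked multiple times in multi-label learning tasks. On the right, we show that the channel-wise sparsity of MEGCN yields a light-weight version of the GAT model that remains applicable to multi-label learning tasks by modeling label-substructures. Channel similarity is computed by element-wise multiplication.
  
  Figure 3: Comparison of the mean squared error between MEGCN and GAT outputs under different dimension sizes. As shown in (a), when a graph has channel-wise sparsity, the outputs of the two models are indistinguishable. As shown in (b), when the channels carry enough sparse information to distinguish different neighbors, the outputs of the two models are indistinguishable.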
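A minimal stdlib-only sketch of the contrast drawn in this section, under stated assumptions. Node-wise (GAT-style) attention scores each neighbor with the full product $\sum_{i=1}^{d} X_u^i X_s^i$ and softmax-normalizes over neighbors, so every channel shares the same neighbor weights. The channel-wise variant keeps the per-channel products $X_u \odot X_s$ and softmax-normalizes each channel separately over the neighbors. This is one reading of the simplified aggregation described above, not the paper's exact equation; the function names and toy embeddings are hypothetical. When each neighbor overlaps with $u$ on a single channel, the top-1 product equals the full sum, which is the approximation with the $\{topk\}$ channel set.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot_similarity(x_u: Vec, x_s: Vec) -> float:
    """Full node-wise similarity: sum over all d channels of X_u^i * X_s^i."""
    return sum(a * b for a, b in zip(x_u, x_s))


def topk_similarity(x_u: Vec, x_s: Vec, k: int) -> float:
    """Keep only the k largest element-wise products (the {topk} channel set)."""
    prods = sorted((a * b for a, b in zip(x_u, x_s)), reverse=True)
    return sum(prods[:k])


def node_wise_aggregate(x_u: Vec, neighbors: List[Vec]) -> Vec:
    """GAT-style: one scalar weight per neighbor, shared by all channels."""
    weights = softmax([dot_similarity(x_u, x_s) for x_s in neighbors])
    return [sum(w * x_s[c] for w, x_s in zip(weights, neighbors)) for c in range(len(x_u))]


def channel_wise_aggregate(x_u: Vec, neighbors: List[Vec]) -> Vec:
    """Channel-wise: for each channel c, softmax over neighbors of x_u[c] * x_s[c]."""
    out = []
    for c in range(len(x_u)):
        weights = softmax([x_u[c] * x_s[c] for x_s in neighbors])
        out.append(sum(w * x_s[c] for w, x_s in zip(weights, neighbors)))
    return out


# Sparse toy embeddings: each neighbor overlaps with u on at most one channel.
x_u = [1.0, 0.0, 2.0, 0.0]
neighbors = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 1.5, 0.0], [0.0, 3.0, 0.0, 0.0]]
for x_s in neighbors:
    print(dot_similarity(x_u, x_s), topk_similarity(x_u, x_s, k=1))
print(node_wise_aggregate(x_u, neighbors))
print(channel_wise_aggregate(x_u, neighbors))
```

Channel 1 shows a side effect worth noting: $u$ has no interest there, so all neighbor scores tie and the channel still receives a uniform average of its neighbors' values, whereas in the node-wise version that channel is weighted by the (small) node-level score of the third neighbor.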

2.5 Sanity Check and Discussions

(1) To motivate our model, we test the channel-wise sparsity assumption on handcrafted datasets.
- First, we generate 100 nodes whose embedding dimensions each follow a standard normal distribution. Without loss of generality, the dimension size is chosen from {16, 32, 64}.
- Then, we form the network structure by generating a random variable that follows a uniform distribution. Whenever the value of the variable is larger than a sparse rate, defined as the percentage of non-zero dimensions of the original node embedding, we add a link between two nodes.
- Note that channel-wise sparsity in our work means sparsity along a few channels after element-wise multiplication of connected node embeddings. Therefore, it is hard to generate sparsity after the multiplication directly.
- Instead, we use node representation sparsity to approximate channel-wise sparsity.
- In particular, we test different levels of node representation sparsity and adjacency matrix sparsity to show how they affect the results.
(2) As shown in Figure 3(a), the mean squared error difference between the MEGCN output and the GAT output with the same sparse node representation is minimal when the node sparsity percentage is very low.
- The graph and node embeddings are generated with an adjacency matrix sparsity of 0.2, and each node has around 20 neighbors.
- As for the number of neighbors, when the number of linking channels in each pair is larger than the number of neighbors, GAT and MEGCN achieve similar performance. The results are shown in Figure 3(b) with a node representation sparsity rate of 0.2.
- However, in most cases, the number of channels can be far smaller than the number of neighbors, which makes it hard to distinguish nodes within each channel. In response, neighborhood sampling techniques could be used to limit channel-wise information confusion.
(3) To this end, we can adopt the same ideas in social network-based recommendation systems, which are multi-label learning tasks with extreme label sparsity.
- The label-substructures induce the partial relationships among users in the original social network.
- We need to develop some extra normalization approaches to ensure channel-wise sparsity, so that personal interests and shared interests can propagate simultaneously along the sparse channels.
- We hope the normalization function in Eq. (11) can generate sparse channel links, which requires the normalization function to shrink most channel values to zero while enlarging the non-zero ones. Hence, the equation can generate a sparse update of the user representation and further lead to channel-wise sparsity.
- In general, the softmax function is regarded as a useful normalization function.
- However, after a user’s representation is updated, the values across channels become imbalanced due to the nonlinear normalization of the softmax function. As a result, this can lead to exploding or vanishing gradients on some linking channels, as well as overfitting.
- Therefore, we should normalize the values in each channel of the node representation after each update step.
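A rough, stdlib-only re-creation of the sanity check in this section, under explicit assumptions that differ from the authors' setup: node embeddings are standard normal with each entry kept non-zero with probability `keep_rate` (one reading of the "sparse rate"), each node pair is linked independently with probability `link_prob` (one reading of the adjacency sparsity of 0.2, giving about 20 neighbors for 100 nodes), GAT is reduced to dot-product scores with softmax over neighbors, and the MEGCN-style output uses per-channel softmax over neighbors. It reports the mean squared difference between the two aggregated outputs. All names, parameters, and the seed are hypothetical, and it is not expected to reproduce Figure 3's numbers.

```python
import math
import random
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def sparse_gaussian_nodes(n: int, dim: int, keep_rate: float, rng: random.Random) -> List[Vec]:
    """Standard-normal embeddings; each entry is non-zero with probability keep_rate."""
    return [[rng.gauss(0.0, 1.0) if rng.random() < keep_rate else 0.0 for _ in range(dim)]
            for _ in range(n)]


def random_neighbors(n: int, link_prob: float, rng: random.Random) -> List[List[int]]:
    """Undirected random graph: each pair is linked independently with probability link_prob."""
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_prob:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def gat_like(x_u: Vec, nb: List[Vec]) -> Vec:
    """Node-wise: dot-product scores, softmax over neighbors, shared across channels."""
    w = softmax([sum(a * b for a, b in zip(x_u, x)) for x in nb])
    return [sum(wi * x[c] for wi, x in zip(w, nb)) for c in range(len(x_u))]


def channel_wise(x_u: Vec, nb: List[Vec]) -> Vec:
    """Channel-wise: per-channel products, softmax over neighbors within each channel."""
    out = []
    for c in range(len(x_u)):
        w = softmax([x_u[c] * x[c] for x in nb])
        out.append(sum(wi * x[c] for wi, x in zip(w, nb)))
    return out


def mse_between_models(n=100, dim=16, keep_rate=0.2, link_prob=0.2, seed=0) -> float:
    """Mean squared difference between the two aggregated outputs over all nodes/channels."""
    rng = random.Random(seed)
    X = sparse_gaussian_nodes(n, dim, keep_rate, rng)
    nbrs = random_neighbors(n, link_prob, rng)
    total, count = 0.0, 0
    for u in range(n):
        if not nbrs[u]:
            continue  # isolated node: nothing to aggregate
        nb = [X[v] for v in nbrs[u]]
        a, b = gat_like(X[u], nb), channel_wise(X[u], nb)
        total += sum((p - q) ** 2 for p, q in zip(a, b))
        count += len(a)
    return total / count if count else 0.0


for dim in (16, 32, 64):
    for keep in (0.1, 0.5, 0.9):
        print(dim, keep, round(mse_between_models(dim=dim, keep_rate=keep), 4))
```

Sweeping `keep_rate` and `link_prob` in this way mirrors the two axes of the experiment (node representation sparsity and adjacency sparsity), but the trend it prints depends on these simplifying assumptions.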

3 THE PROPOSED MODEL

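Figure 4: Illustration of the proposed model. The left subfigure shows the overall model framework, and the right subfigure shows MEGCN within the influence diffusion process. In MEGCN, sparse influence is used to transform the multi-channel influence and the users' channel-wise sparsity. Note that the normalization mask can be regarded as a kind of interest separation on the social network that models partial relationships: it divides the social network into interest substructures with sparse links in each channel. In particular, MEGCN captures multi-channel feature similarity via element-wise product operations. InfluenceNorm is proposed to guarantee the sparsity of social influence. In addition, ChannelNorm is designed to balance channel values, thereby preventing numerical problems and increasing node distinction to alleviate the oversmoothing problem.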

3.1 Model Framework

We introduce MEGCN into the social recommendation system; the overall framework is shown in Figure 4. The model is composed of five layers: Embedding Layer, Fusion Layer, User Action Aggregating Layer, Influence Diffusion Layer, and Prediction Layer. We detail each part as follows:

3.1.1 Embedding Layer.

(1) For each user or item, the original discrete one-hot encoding produces extreme feature sparsity.
(2) Hence, we use an embedding layer to encode users and items with corresponding continuous values. Formally, given the one-hot representation of a user or an item, the embedding layer performs an index operation on the free user embedding matrix $X$ or the item embedding matrix $Y$. For instance, user $a$’s free latent embedding is obtained as:
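The index operation is equivalent to multiplying the one-hot vector $e_a$ by the embedding matrix, i.e. $x_a = X^T e_a$ when $X$ stores one user per row (a layout assumed here), which is why embedding layers are implemented as lookups. A minimal sketch with a made-up $3 \times 3$ matrix; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if j == index else 0.0 for j in range(size)]


def embed_by_matmul(onehot: List[float], X: Matrix) -> List[float]:
    """e_a^T X: a weighted sum of the rows of X with one-hot weights (O(M*d))."""
    return [sum(onehot[r] * X[r][c] for r in range(len(X))) for c in range(len(X[0]))]


def embed_by_lookup(index: int, X: Matrix) -> List[float]:
    """The equivalent index operation: return row `index` of X (O(d))."""
    return list(X[index])


X = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # 3 users, d = 3 (toy values)
a = 1
print(embed_by_lookup(a, X))  # [0.4, 0.5, 0.6]
print(embed_by_matmul(one_hot(a, len(X)), X) == embed_by_lookup(a, X))  # True
```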

3.1.2 Fusion Layer.

(1) Besides free embeddings, items and users have associated side information, e.g., text descriptions, which can be regarded as feature embeddings.
(2) The fusion layer takes the free embeddings $X$, $Y$ ($X_a$ for user $a$ or $Y_i$ for item $i$) and the feature embeddings $P$, $Q$ ($P_a$ for user $a$ or $Q_i$ for item $i$) as input, and outputs an initial user fusion embedding $h^0_a$ or an item fusion embedding $v_i$ via a fully connected neural network. For instance, the initial fusion embedding for user $a$ is calculated as:
- where $W$ and $b$ are the parameters to be learned,
- and $\sigma$ is the activation function.
(3) By setting $W$ and $b$ to an identity matrix and a zero vector, respectively (and taking $\sigma$ as the identity), we could simplify the fusion layer into a concatenation operation. Similarly, we obtain the item fusion embedding $v_i$ for item $i$.
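A minimal sketch of the fusion step $h^0_a = \sigma(W [X_a; P_a] + b)$, assuming the input is the concatenation of the free embedding and the feature embedding and using ReLU as a stand-in for $\sigma$ (the actual activation is not specified in these notes). The calls check the remark in (3): with $W = I$, $b = 0$ and an identity activation the layer reduces to plain concatenation, while with ReLU the negative entry is clipped. All names and toy values are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def relu(z: float) -> float:
    return max(0.0, z)


def identity(z: float) -> float:
    return z


def fuse(x_a: Vec, p_a: Vec, W: List[Vec], b: Vec, act: Callable[[float], float] = relu) -> Vec:
    """h0_a = act(W @ [x_a ; p_a] + b)."""
    z = x_a + p_a  # list concatenation [x_a ; p_a]
    return [act(sum(w * v for w, v in zip(row, z)) + bias) for row, bias in zip(W, b)]


def eye(n: int) -> List[Vec]:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


x_a, p_a = [0.5, -1.0], [2.0]
print(fuse(x_a, p_a, eye(3), [0.0] * 3, act=identity))  # [0.5, -1.0, 2.0]
print(fuse(x_a, p_a, eye(3), [0.0] * 3))                # [0.5, 0.0, 2.0]
W = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(fuse(x_a, p_a, W, [0.0, 0.5]))                    # [2.5, 0.0]
```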

3.1.3 User Action Aggregating Layer.

(1) We capture a user’s behavior encoding by mean pooling over the fusion embeddings of the items the user has interacted with in the past. Note that this method was first proposed in SVD++ [10]. For instance, we obtain user $a$’s behavior encoding $w_a$ by:
- where $v_b$ is the item fusion vector for item $b$ in user $a$’s action history $R_a$,
- and $w_a$ denotes the behavior encoding of user $a$ learned from historical actions.
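Mean pooling as described, i.e. $w_a = \frac{1}{|R_a|} \sum_{b \in R_a} v_b$, in a minimal stdlib sketch. The item fusion vectors are toy values and `behavior_encoding` is a hypothetical name; raising an error on an empty history is a design choice here, since how the paper handles users without interactions is not covered in these notes.

```python
from typing import Dict, List

Vec = List[float]


def behavior_encoding(history: List[int], item_fusion: Dict[int, Vec]) -> Vec:
    """w_a = (1 / |R_a|) * sum_{b in R_a} v_b  (mean pooling over interacted items)."""
    if not history:
        raise ValueError("user has no interactions; w_a is undefined")
    d = len(item_fusion[history[0]])
    total = [0.0] * d
    for b in history:
        for c, val in enumerate(item_fusion[b]):
            total[c] += val
    return [t / len(history) for t in total]


v = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 4.0]}
print(behavior_encoding([0, 2], v))  # [2.0, 2.0]
```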

3.1.4 Influence Diffusion Layer.

(1) By feeding user interest into the influence diffusion layer, we model the propagation dynamics of users’ interests in the social network $S$.
- For each user $a$, we use $h^k_a$ to represent the interest after $k$ hops of dynamic influence diffusion in the social network.
- Each diffusion layer contains three operations:
  - selecting neighbors to diffuse information,
  - aggregating neighbor influence,
  - and combining self interest with neighbors’ influence [30].
- To capture partial relationships among users in the social network, we propose MEGCN for the information diffusion process.
- In MEGCN, we use channel-wise similarity to measure the importance of a neighbor’s social influence in each channel.
- Each dimension of $h_a$ can be regarded as a channel of an independent interest. (Note: this is the focus of the paper.)
- For instance, interests in color and shape could be stored in the first and second channels, respectively.
- Besides, we assume users’ interests in some channels are inherently stable, which means users will selectively absorb information from neighbors with partial relationships.
- Therefore, with sparse channel representations, we can capture interest similarity on different channels between connected nodes. The proposed model places the message computation part after the message aggregation part, instead of before it as in other GNN models.

3.1.4.1 Aggregating

The model can be regarded as subgraph information diffusion on each channel, where nodes are distinguished from each other by controlling the sparse linking rate.
We perform a linear transformation on the aggregated information from neighbors $\tilde{a}$ in the user’s ego network $S_a$ to obtain the neighbor influence vector $c^k$:
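The aggregation formula is omitted in these notes. A minimal sketch, assuming mean aggregation over the ego-network neighbors $\tilde{a} \in S_a$ followed by a shared linear map $W^k$, i.e. $c^k = W^k \cdot \frac{1}{|S_a|}\sum_{\tilde{a} \in S_a} h^k_{\tilde{a}}$; whether the paper uses a sum, a mean, or a bias term is not stated here, and all names are hypothetical. Note that this step only aggregates; the channel-wise similarity is applied afterwards, matching the ordering described above.

```python
from typing import Dict, List

Vec = List[float]
Matrix = List[Vec]


def matvec(W: Matrix, x: Vec) -> Vec:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def neighbor_influence(a: int, social: Dict[int, List[int]], h: Dict[int, Vec], W: Matrix) -> Vec:
    """c^k_a = W^k · mean_{ã in S_a} h^k_ã  (mean aggregation is an assumption)."""
    nbrs = social[a]
    if not nbrs:
        return [0.0] * len(W)  # no neighbors: no influence (a design choice)
    d = len(h[nbrs[0]])
    mean = [sum(h[n][c] for n in nbrs) / len(nbrs) for c in range(d)]
    return matvec(W, mean)


social = {0: [1, 2], 1: [0], 2: [0]}
h = {0: [1.0, 1.0], 1: [2.0, 0.0], 2: [0.0, 4.0]}
W = [[1.0, 0.0], [0.0, 0.5]]
print(neighbor_influence(0, social, h, W))  # [1.0, 1.0]
```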

3.1.4.2 InfluenceNorm

(1) InfluenceNorm first conducts elemen-twise multiplication to identify important channels from the neighborhood influence vector. (InfluenceNorm首先执行elemen twise乘法，从邻域影响向量中识别重要通道。)
(2) Then, after normalization, InfluenceNorm generates a sparse channel-wise mask that determines importance weight of each channel in different label-substructure relations. (然后，在标准化之后，InfluenceForm生成一个 稀疏的通道掩码 ，该掩码确定不同标签子结构关系中每个通道的重要性权重。)
(3) In effect, inspired by graph attention network, which uses softmax to normalize the importance weight between each node pair, we use softmax as the normalization function. The sparse influence is computed as: (实际上，受图形注意网络(该网络使用softmax规范化每个节点对之间的重要性权重)的启发，我们使用softmax作为规范化函数。稀疏影响的计算公式为：)
(4) We add a residual part $\alpha c^k$ to avoid the vanishing-gradient problem, with a hyperparameter $\alpha$ controlling its weight. (我们添加了一个残差部分 $\alpha c^k$ 以避免 梯度消失 问题，并用超参数 $\alpha$ 控制其权重。)
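The InfluenceNorm formula is also missing here. The sketch below is one hedged reading of steps (1) to (4), not the paper's exact equation. It multiplies the user's own embedding $h$ element-wise with the neighbor influence vector $c$, applies a softmax over channels to obtain the mask, reweights $c$ with that mask, and adds the residual $\alpha c$. The function name `influence_norm` and the default `alpha=0.2` are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(x: Vector) -> Vector:
    """Numerically stable softmax over the channel dimension."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def influence_norm(h: Vector, c: Vector, alpha: float = 0.2) -> Vector:
    """Hypothetical InfluenceNorm: channel-wise mask from h * c, then a residual alpha * c."""
    scores = [hi * ci for hi, ci in zip(h, c)]  # step (1): element-wise multiplication
    mask = softmax(scores)                       # steps (2)-(3): softmax -> channel-wise mask
    return [mi * ci + alpha * ci for mi, ci in zip(mask, c)]  # step (4): residual alpha * c
```

Because the softmax weights sum to 1 across channels, a few channels dominate the mask. This is the channel imbalance that ChannelNorm is meant to correct.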

3.1.4.3 ChannelNorm

(1) After performing softmax operation in InfluenceNorm, we obtain node embeddings with highly imbalanced channel values. (在InfluenceNorm中执行softmax操作后，我们获得了 通道值高度不平衡 的节点嵌入。)
(2) Hence, the model may suffer from numerical instability in some channels, which can lead to a local minimum. (因此，该模型在某些通道中可能面临数值不稳定问题，这将导致局部极小值。)
Meanwhile, channel-wise sparsity will divide the original social network into multi-channel label-substructures, i.e., partial relationships, which can be better modeled with larger substructure diversity or channel-wise distinction. (同时，通道稀疏性将原始社交网络划分为多通道标签子结构，即部分关系，可以用更大的子结构多样性或通道差异更好地建模。)
(3) Hence, we adopt the ideas from PairNorm [41] to control the value imbalance between channels, and propose ChannelNorm: (因此，我们采用PairNorm[41]的思想来控制通道之间的数值不平衡，并提出了 ChannelNorm：)
- where $\gamma$ and $\beta$ represent trainable channel-wise scale and shift vectors in the ChannelNorm operation, ( $\gamma$ 和 $\beta$ 表示ChannelNorm操作中 可训练的通道尺度 和 移位向量，)
- and $\delta$ is the hyperparameter that controls the normalized variance. ( $\delta$ 是控制标准化方差的超参数。)
(4) We add such parameters for a better channel-wise distinction. After updating sparse influence with ChannelNorm, users preserve their own interests while accepting sparse influence from neighbors, which can preserve necessary distinction between each node pair, thus alleviating the oversmoothing problem. (我们添加这些参数是为了更好地区分通道。在使用ChannelNorm更新稀疏影响后，用户在接受邻居的稀疏影响的同时保留了自己的兴趣，这可以在每个节点对之间保留必要的区别，从而缓解 过度平滑 问题。)
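Since the ChannelNorm equation is missing, here is a PairNorm-style sketch built only from the symbols defined above. Each channel is centered across users, rescaled so that its variance equals $\delta$, and then passed through the trainable scale $\gamma$ and shift $\beta$. Treating $\delta$ as a target variance (rather than, say, a target norm) and normalizing over users are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = users, columns = channels


def channel_norm(H: Matrix, gamma: List[float], beta: List[float],
                 delta: float = 1.0, eps: float = 1e-8) -> Matrix:
    """Hypothetical ChannelNorm: per-channel centering and rescaling across users.

    Each channel j is centered over all users, rescaled so its variance is `delta`,
    then mapped through the trainable scale gamma[j] and shift beta[j].
    """
    n, d = len(H), len(H[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [H[i][j] for i in range(n)]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        scale = math.sqrt(delta / (var + eps))
        for i in range(n):
            out[i][j] = gamma[j] * (col[i] - mu) * scale + beta[j]
    return out
```

With $\gamma = 1$ and $\beta = 0$, every channel ends up with the same variance $\delta$, so no single channel dominates after the softmax in InfluenceNorm.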

3.1.5 Prediction Layer.

After $K$ hops of social influence propagation, we obtain each user’s final embedding by combining the information from influence diffusion and user action aggregation. The final predicted rating is the inner product of the user and item latent vectors. (在社交影响传播 $K$ 跳之后，我们通过结合影响扩散和用户行为聚合得到的信息，获得每个用户的最终嵌入。最终的预测评分为用户和物品两个潜在向量的内积。)
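Here is a minimal sketch of the prediction layer. It assumes a DiffNet-style fusion, which these notes do not confirm because the fusion equation is missing: the diffused embedding $h_a^K$ is added to the mean embedding of the items in the user's history $R_a$, and the score is the inner product with the item vector. The helper names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def final_user_embedding(h_K: Vector, history: List[int],
                         item_emb: Dict[int, Vector]) -> Vector:
    """Assumed fusion: diffused embedding + mean of the embeddings of consumed items."""
    d = len(h_K)
    agg = [0.0] * d
    for i in history:
        for j in range(d):
            agg[j] += item_emb[i][j]
    if history:
        agg = [x / len(history) for x in agg]
    return [a + b for a, b in zip(h_K, agg)]


def predict(u: Vector, v: Vector) -> float:
    """Predicted preference = inner product of user and item latent vectors."""
    return sum(a * b for a, b in zip(u, v))
```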

3.1.6 Loss Function.

We use the same pair-wise ranking-based loss function as in [16, 30]: (我们使用与[16,30]相同的 基于成对排序的损失函数 ：)
- where $D_a = \{(i, j) | i \in R_a \land j \in V - R_a\}$ denotes the pairwise training data for user $a$ with user history action set $R_a$ (表示用户 $a$ 和用户历史操作集 $R_a$ 的成对训练数据)
- and $\sigma(x)$ is the sigmoid function.
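The loss equation itself is missing. The sketch below assumes it takes the standard BPR form $\mathcal{L} = \sum_a \sum_{(i,j)\in D_a} -\ln \sigma(\hat r_{ai} - \hat r_{aj})$, with the L2 term from the regularization subsection added separately. It computes that loss in a numerically stable way. `score` is a hypothetical callback returning $\hat r_{ai}$.

```python
import math
from typing import Callable, Iterable, Tuple


def neg_log_sigmoid(x: float) -> float:
    """Compute -ln(sigmoid(x)) = ln(1 + e^(-x)) without overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(triples: Iterable[Tuple[int, int, int]],
             score: Callable[[int, int], float]) -> float:
    """Sum of -ln sigma(r_ai - r_aj) over (user a, observed item i, unobserved item j) from D_a."""
    return sum(neg_log_sigmoid(score(a, i) - score(a, j)) for a, i, j in triples)
```

When positive and negative items score equally, each pair contributes $\ln 2$. The loss shrinks toward 0 as observed items are ranked well above unobserved ones.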

3.2 Training

In this section, we introduce some implementation details of model training. (在本节中，我们介绍模型训练部分的一些实现细节。)

3.2.1 Mini-Batch Training.

In practice, to avoid huge computational cost, we divide the training data into batches such that each user’s training records are kept in the same mini-batch. This avoids repeatedly computing each user $a$’s latent embedding $h^k_a$ in the iterative influence diffusion layers [30]. In every epoch, we shuffle the training set to generate different batches. (在实践中，为了避免巨大的计算消耗，我们在划分小批量时确保每个用户的训练记录位于同一个小批量中，这有助于避免在迭代影响扩散层中重复计算每个用户 $a$ 的潜在嵌入 $h^k_a$ [30]。在每个epoch中，我们都会打乱训练集以生成不同的批次。)
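The batching rule can be sketched as follows, assuming training records are `(user, item)` pairs. `user_grouped_batches` is a hypothetical helper. Users are shuffled rather than individual records, so each user's records always land in one batch while batch composition still changes from epoch to epoch.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (user, item)


def user_grouped_batches(records: List[Record], batch_size: int, seed: int) -> List[List[Record]]:
    """Split records into mini-batches so that all records of one user land in the same batch.

    A batch may exceed batch_size when a single user has more records than that.
    """
    by_user: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        by_user[rec[0]].append(rec)
    users = list(by_user)
    random.Random(seed).shuffle(users)  # shuffle users, not individual records
    batches: List[List[Record]] = []
    current: List[Record] = []
    for u in users:
        if current and len(current) + len(by_user[u]) > batch_size:
            batches.append(current)
            current = []
        current.extend(by_user[u])
    if current:
        batches.append(current)
    return batches
```

If you pass the epoch index as `seed`, each epoch gets a different shuffle that can still be reproduced.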

3.2.2 Negative Sampling.

Since we only observe positive feedback in the two original datasets, we use negative sampling to obtain pseudo-negative feedback at each training iteration. We assume that every item without implicit feedback in the training data has an equal probability of being selected as a negative sample. (由于我们只在原始的两个数据集中观察到正反馈，我们在训练过程的每次迭代中使用负采样技术获得 伪负反馈 ，并假设训练数据中所有没有隐式反馈的物品被选为负样本的概率相等。)
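Uniform negative sampling can be sketched with rejection sampling. The sketch draws an item id uniformly and keeps it only if the user has no observed feedback on it, so every unobserved item is equally likely. Item ids `0..num_items-1`, sampling with replacement, and the helper name are assumptions.

```python
import random
from typing import List, Set


def sample_negatives(positives: Set[int], num_items: int, k: int,
                     rng: random.Random) -> List[int]:
    """Draw k pseudo-negative items uniformly from the items the user has no feedback on."""
    if len(positives) >= num_items:
        raise ValueError("user has interacted with every item; no negatives available")
    negatives: List[int] = []
    while len(negatives) < k:
        j = rng.randrange(num_items)  # uniform over all item ids
        if j not in positives:        # reject observed items, keep unobserved ones
            negatives.append(j)
    return negatives
```

Rejection sampling is efficient here because each user has interacted with only a small fraction of items, so few draws are rejected.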

3.2.3 Dropout and Regularization.

Since MEGCN uses sparse influence to update users’ interests, the model can easily get stuck in a local minimum. (由于MEGCN使用稀疏影响来更新用户的兴趣，因此该模型很容易陷入局部极小值。)
Hence, we use feature-wise dropout between different diffusion layers. (因此，我们在不同的扩散层之间使用 基于特征的dropout。)
To reduce the effects of channel information mixture when the number of neighbors is huge, we also apply dropout on the adjacency matrix. This can be regarded as sampling among neighbors, so that users update their interests based on a subset of neighbors. (为了减少邻居数量巨大时通道信息混合的影响，我们还对邻接矩阵应用了dropout，这可以看作是对邻居的采样，使用户根据邻居的子集更新兴趣。)
As for the regularization, we use L2-regularization on both user and item embeddings with the same regularization rate. (对于正则化，我们在用户和项目嵌入上使用L2正则化，并且正则化率相同。)
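Here is a minimal sketch of the two dropout variants. It assumes inverted dropout on embedding channels and independent removal of each social link. The probabilities and helper names are illustrative and do not come from the paper.

```python
import random
from typing import Dict, List


def feature_dropout(h: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout on the channels of one embedding (applied between diffusion layers)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in h]


def edge_dropout(social: Dict[int, List[int]], p: float,
                 rng: random.Random) -> Dict[int, List[int]]:
    """Drop each social link with probability p; users then aggregate from a random subset of neighbors."""
    return {a: [b for b in nbrs if rng.random() >= p] for a, nbrs in social.items()}
```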

3.3 Discussion

3.3.1 Time Complexity Analysis.

(1) Compared with traditional recommendation models, the main additional time costs of the social recommendations occur in the layer-wise influence diffusion process. (与传统的推荐模型相比，社会推荐的额外时间成本主要发生在 分层影响扩散过程中。)
(2) More specifically, linear transform operation in GCN only costs $O(MKd^2)$ , (更具体地说，GCN中的线性变换操作只需花费 $O(MKd^2)$ ，)
- where $M$ is the number of users in graph, ( $M$ 是图中的用户数)
- $K$ is the step of influence diffusion, ( $K$ 是影响力扩散的步数)
- and $d$ is the length of features or channels. ( $d$ 是特征或通道的长度。)
(3) MEGCN uses sparse influence to capture multi-channel distinction between social influence and self interests, which costs an additional time complexity of $O(MKd)$ compared with GCN. ( MEGCN使用 稀疏影响 来捕捉社会影响和个人兴趣之间的 多通道区别 ，与 GCN 相比，这需要额外的时间复杂度 $O(MKd)$ 。)
However, GAT needs to compute an attention score between each pair of connected nodes, so it is the most time-consuming algorithm. Its computational complexity is $O(M^2 K d^2)$ in the full version and $O(MNK d^2)$ in the sparse version ($N$ denotes the number of links), around ten times larger than that of the other models. (然而，GAT需要计算每对连接节点之间的注意力分数，因此它是最耗时的算法，完整版本的计算复杂度为 $O(M^2 K d^2)$ ，稀疏版本的计算复杂度为 $O(MNK d^2)$ （ $N$ 表示链路数），大约是其他模型的十倍。)

4 EXPERIMENTS

In this section, we evaluate our methods via the experiments on two real-world recommendation tasks. (在本节中，我们通过在两个实际推荐任务上的实验来评估我们的方法。)

4.1 Data Preparation

We evaluate the performance of our model on two recommendation datasets with social network: Yelp and Flickr. (我们在两个具有社交网络的推荐数据集：Yelp和Flickr上评估了我们的模型的性能。)
These two datasets are provided in DiffNet [30] and have been explained in detail. (DiffNet[30]中提供了这两个数据集，并对其进行了详细解释。)
In the data preparation step, we filter out users with fewer than 2 historical action records and 2 social neighbors in both datasets. (在数据准备阶段，我们筛选出两个数据集中历史动作记录和社交邻居少于2个的用户。)
Then the datasets for experiments are built as follows. We randomly hold out 10% of each dataset as the test set and 10% as the validation set, and use the remaining 80% as the training set. Detailed statistics of the two datasets after preprocessing are shown in Table 1. (然后按如下方式构建实验数据集。我们随机将10%的数据划分为测试集，10%为验证集，其余80%为训练集。预处理后的两个数据集的详细统计数据如表1所示。)
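The preprocessing can be sketched as below, under two assumptions. First, the filter keeps users with at least 2 records and at least 2 neighbors, which is one reading of the sentence above. Second, the 80/10/10 split is done over individual interaction records rather than per user.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (user, item)


def filter_users(records: List[Record], social: Dict[int, List[int]],
                 min_count: int = 2) -> List[Record]:
    """Keep records of users with >= min_count interactions and >= min_count social neighbors."""
    counts = Counter(u for u, _ in records)
    keep = {u for u, c in counts.items()
            if c >= min_count and len(social.get(u, [])) >= min_count}
    return [r for r in records if r[0] in keep]


def split_80_10_10(records: List[Record], seed: int = 0
                   ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Randomly split records into train (80%), validation (10%) and test (10%) sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = n_val = len(shuffled) // 10
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```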

4.2 Baselines

Since the social recommendation task involves social networks, we compare MEGCN with various state-of-the-art social network-based graph neural networks: (由于社交推荐任务涉及社交网络，我们将MEGCN与各种最先进的基于社交网络的图形神经网络进行比较：)

SVD++ [13]. SVD++ only utilizes user-item interaction graph without any other side information, and it can be seen as the baseline model for recommendation systems without social networks. (SVD++只利用用户项交互图，没有任何其他方面的信息，可以看作是没有社交网络的推荐系统的基线模型。)
TrustSVD [10]. TrustSVD is the first model to use social networks in recommendation systems. The model only aggregates one-hop neighborhood information without other similarity assumptions. (TrustSVD是第一个在推荐系统中使用社交网络的模型。该模型只聚合一跳邻域信息，没有其他相似性假设。)
DiffNet [30]. Compared with TrustSVD, DiffNet incorporates multi-hop dynamic diffusion layers. The method uses first-order approximation of graph spectral network.
GCN [3]. Compared with DiffNet, GCN adds a renormalization trick to the aggregating operation. (与DiffNet相比，GCN在聚合操作上增加了一个重归一化技巧。)
GAT [20]. GAT leverages an attention layer to calculate the correlation score between two neighbor nodes and sets it as the rate of information aggregation. (GAT利用注意层计算两个相邻节点之间的相关性得分，并将其设置为信息聚合速率。)
DualGAT [31]. DualGAT extends social effects from user domain to item domain and leverages dual graph attention mechanism to collaboratively learn representations for static and dynamic social effects. Similar works can be found in [29] and [7]. In essence, their work can model item (label)-substructure by incorporating user-item interactions in the information diffusion process of the social network. Other works [42, 43] that utilize attention mechanism without social network effects are not shown in our experiments. (DualGAT将社会效果从用户域扩展到项目域，并利用双图注意机制协作学习静态和动态社会效果的表示。类似的作品可以在[29]和[7]中找到。本质上，他们的工作可以通过在社交网络的信息扩散过程中加入用户项目交互来模拟项目（标签）子结构。我们的实验中没有展示其他利用注意机制而没有社交网络效应的作品[42,43]。)
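For reference, the renormalization trick mentioned for the GCN baseline is the standard GCN normalization $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, where $\tilde{D}$ is the degree matrix of $A + I$. The sketch below uses dense matrices, so it only suits toy graphs, and the function name is illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def renormalized_adjacency(A: Matrix) -> Matrix:
    """GCN renormalization trick: D~^{-1/2} (A + I) D~^{-1/2}, with D~ the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # self-loops guarantee deg >= 1, so no division by zero
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
```

The self-loop keeps each user's own embedding in the aggregation. The symmetric scaling keeps high-degree users from dominating their neighbors' updates.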

4.3 Evaluation Metrics

As our work focuses on recommending top-N items, we use two ranking-based evaluation metrics, which are Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG). The two metrics are defined as: (由于我们的工作重点是推荐排名前N的项目，我们使用了两个基于排名的评估指标，即命中率（HR）和标准化贴现累积收益（NDCG）。这两个指标定义为：)

4.3.1 Hit Ratio

measures the number of items the user likes in the test data that are successfully predicted in the top-N ranking list. It can be given as: (衡量测试数据中用户喜欢的物品被成功预测在前N排名列表中的数量。可给出如下公式：)

4.3.2 Normalized Discounted Cumulative Gain

considers the hit positions of the items and gives a higher score if the hit items are in the top positions [19]. (考虑物品的命中位置，如果命中物品位于靠前的位置，则会给出更高的分数[19]。)
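The HR and NDCG formulas are missing from these notes. The sketch below uses common per-user definitions, which are assumptions here. HR@N is the fraction of the user's test items that appear in the top-N list. NDCG@N uses binary relevance with a $1/\log_2(\text{rank}+1)$ discount, normalized by the best achievable DCG.

```python
import math
from typing import List, Set


def hit_ratio_at_n(ranked: List[int], test_items: Set[int], n: int) -> float:
    """Fraction of the user's test items that appear in the top-n ranked list."""
    if not test_items:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in test_items)
    return hits / len(test_items)


def ndcg_at_n(ranked: List[int], test_items: Set[int], n: int) -> float:
    """Binary-relevance NDCG@n: hits near the top of the list earn more credit."""
    dcg = sum(1.0 / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
              for pos, item in enumerate(ranked[:n]) if item in test_items)
    ideal_hits = min(len(test_items), n)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with `ranked = [5, 3, 9, 1]`, test items `{3, 1}`, and N = 3, only item 3 is hit (at rank 2). This gives HR@3 = 0.5 and an NDCG@3 below 0.5 because the hit is not at rank 1.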

4.4 Results Summary

4.4.1 Parameter Setting.

For all the models that are based on the latent factor models, we initialize the latent vectors with small random values. (对于所有基于潜在因子模型的模型，我们用小的随机值初始化潜在向量。)
In the model learning process, we use Adam as the optimizing method for gradient descent based methods with an initial learning rate of 0.001. (在模型学习过程中，我们使用Adam作为基于梯度下降的方法的优化方法，初始学习率为0.001。)
The batch size is set to 512.
For the parameter settings, we set the embedding dimension to 64.
Moreover, we adjust some baseline models to make sure all the models have consistent embedding dimensions.
In our model, there are four hyperparameters to be fine-tuned.
- learning rate $lr$ is selected from $[10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}]$,
- normalization variance $\delta$ is searched in the range of $[1, 2, 4, 8]$,
- L2 regularization rate $\lambda$ is selected from $[0.01, 0.02, 0.03, 0.04]$,
- and residual rate $\alpha$ is searched in the range of $[0.1, 0.2, 0.3, 0.4, 0.5]$.
We fine-tune those hyperparameters to get the best results on validation set.
The values of two metrics in the test datasets are obtained after 400 epochs.
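The four search spaces above give $5 \times 4 \times 4 \times 5 = 400$ configurations, assuming they are explored as a full Cartesian grid. The notes do not say whether the search was exhaustive. In the sketch below, `evaluate` is a hypothetical callback that trains a model with the given configuration and returns a validation metric.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Search spaces as listed in the parameter settings.
GRID = {
    "lr": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],
    "delta": [1, 2, 4, 8],
    "l2": [1e-2, 2e-2, 3e-2, 4e-2],
    "alpha": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def grid_search(evaluate: Callable[[Dict[str, float]], float]
                ) -> Tuple[Dict[str, float], float, int]:
    """Try every combination and keep the one with the best validation score."""
    names = list(GRID)
    best_cfg: Dict[str, float] = {}
    best_score, tried = float("-inf"), 0
    for values in product(*(GRID[k] for k in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        tried += 1
        if score > best_score:  # strict '>' keeps the first best config on ties
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, tried
```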

4.4.2 Overall Comparison.

(1) Table 2 compares the average HR and NDCG of the benchmarks with our MEGCN model. (表2比较了基准的平均HR和NDCG与我们的MEGCN模型。)
(2) First, we observe that our model outperforms the others on both datasets, which indicates that the channel-wise sparsity-based model performs better than traditional graph attention models. (首先，我们可以观察到，我们的模型在两个数据集上都优于其他模型，这表明基于通道稀疏性的模型比传统的图注意力模型表现更好。)
(3) Second, the performance of GCN is better than DiffNet in most cases, which indicates that the stacking of layers with more parameters can enhance the performance of the model. (第二，在大多数情况下，GCN的性能优于DiffNet，这表明具有更多参数的层的叠加可以提高模型的性能。)
(4) Third, we find that GAT models outperform the other Laplacian smoothing-based models, which indicates the importance of distinguishing different neighbors. (第三，我们发现GAT模型优于其他基于拉普拉斯平滑的模型，这表明了区分不同邻居的重要性。)
In addition, DualGAT achieves the best performance among the baseline models, since it considers both static and dynamic social influence. (此外，DualGAT在基线模型中表现最好，因为它同时考虑了 静态和动态社会影响 。)

4.4.3 Partial Relationship Analysis.

To find out how channel-wise sparsity in MEGCN helps improve model performance, we conduct extra experiments using Flickr dataset. (为了了解MEGCN中的通道稀疏性如何帮助提高模型性能，我们使用Flickr数据集进行了额外的实验。)
We obtain users’ representations after two hops of influence diffusion based on our model. (基于我们的模型，我们在两跳影响扩散后获得用户的表示。)
As shown in Figure 5, about 30% of channels perform like GCN with scale weight around 1, and the others are typical MEGCN outputs with sparse influence update. (如图5所示，大约30%的通道的性能类似于GCN，标度权重约为1，其他通道是典型的MEGCN输出，具有稀疏影响更新。)
Based on our assumption in equation 10, they model partial relationships with enough channel-wise sparsity. (基于我们在等式10中的假设，它们以足够的信道稀疏性来模拟部分关系。)
As a result, MEGCN assigns different weights to different channels for better recommendation performance. (因此，MEGCN为不同的通道分配不同的权重，以获得更好的推荐性能。)

4.4.4 Oversmoothing Problem.

(1) We compare the models’ performance after multi-hop influence diffusion. (我们比较了多跳影响扩散后模型的性能。)
(2) One of the disadvantages of GCN models is the oversmoothing problem, which refers to the phenomenon that as nodes dynamically aggregate information from their neighbors, nodes will become similar with each other after multiple hops of diffusion. (GCN模型的缺点之一是 过度平滑 问题，它指的是当节点动态地聚集来自其邻居的信息时，节点在经过多跳扩散后会变得彼此相似的现象。)
(3) We test our model with three ascending diffusion depths, which are shown to have increasingly negative effects on recommendation performance in [1]. Nevertheless, as illustrated in Table 3, our model can achieve consistent prediction performance even after many hops of information aggregation, which shows that our model can alleviate oversmoothing problem in the dynamic influence diffusion process with InfluenceNorm and ChannelNorm. (我们用三个上升的扩散深度来测试我们的模型，在[1]中，这三个扩散深度对推荐性能的负面影响越来越大。然而，如表3所示，我们的模型即使在多次信息聚合之后也能实现一致的预测性能，这表明我们的模型可以缓解带有InfluenceNorm和ChannelNorm的动态影响扩散过程中的过度平滑问题。)
(4) In sum, the model can separate multi-channel interests in the social influence context. (总之，该模型可以在社会影响背景下分离多通道兴趣。)
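The toy sketch below makes the oversmoothing effect concrete. It repeatedly applies plain GCN-like averaging, where each user averages itself and its neighbors, on a 4-user path graph. It then tracks the largest per-channel gap between users. The sketch illustrates the problem only and does not implement InfluenceNorm or ChannelNorm. The graph and embeddings are made up for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_diffuse(h: Dict[int, Vector], social: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One hop of plain (GCN-like) smoothing: each user averages itself and its neighbors."""
    out = {}
    for a, vec in h.items():
        group = [a] + social.get(a, [])
        out[a] = [sum(h[b][j] for b in group) / len(group) for j in range(len(vec))]
    return out


def spread(h: Dict[int, Vector]) -> float:
    """Largest per-channel gap between users; it shrinks toward 0 as embeddings oversmooth."""
    d = len(next(iter(h.values())))
    return max(max(v[j] for v in h.values()) - min(v[j] for v in h.values()) for j in range(d))


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0], 3: [0.0, 1.0]}
hk = h
for _ in range(50):
    hk = mean_diffuse(hk, path)
```

On this graph the spread drops from 1.0 to 0.5 after one hop and falls below 0.001 after 50 hops. At that point all four users have nearly identical embeddings.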

4.4.5 Time Cost and Channel Independence.

We list the feature correlation scores and training/testing time costs of different models in Table 4. (我们在表4中列出了不同模型的特征相关分数和训练/测试时间成本。)
On one hand, we observe that MEGCN is around 10 times more computationally efficient than advanced attention-based GNN models such as GAT and DualGAT, while yielding a similarly low correlation score. This indicates that MEGCN better models partial relationships in different channels. (一方面，我们可以观察到，与先进的基于注意力的GNN模型（如GAT和DualGAT）相比，MEGCN的计算效率大约是它们的10倍，同时产生了同样较低的相关分数。这表明MEGCN能更好地建模不同通道中的部分关系。)
On the other hand, it displays similar time costs compared with traditional GCN models. (另一方面，它显示出与传统GCN模型类似的时间成本。)
In particular, the feature-wise correlation matrices of user interests generated by DiffNet and MEGCN are shown in Figure 6. The lighter heatmap directly shows the lower correlation between features under MEGCN. (特别是，DiffNet和MEGCN生成的用户兴趣特征相关矩阵如图6所示。通过MEGCN模型，浅色热图直接显示了特征之间较低的相关性得分。)
Better channel independence stands for a more reliable partial relationship. (更好的通道独立性意味着更可靠的部分关系。)
Both MEGCN and DualGAT perform well on this metric, which indicates that MEGCN can model partial relationships as GAT-based models do. (MEGCN和DualGAT在这种度量上都取得了很好的性能，这表明MEGCN可以像基于GAT的模型一样建模部分关系。)
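These notes do not give the exact definition of the feature correlation score. One plausible version, assumed here, is the mean absolute Pearson correlation between every pair of distinct channels, computed over all users' embeddings. It requires at least two channels, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = users, columns = channels


def channel_correlation(H: Matrix) -> Matrix:
    """Pearson correlation between every pair of channels, computed over users."""
    n, d = len(H), len(H[0])
    cols = [[H[i][j] for i in range(n)] for j in range(d)]
    means = [sum(c) / n for c in cols]
    centered = [[x - m for x in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    corr = [[0.0] * d for _ in range(d)]
    for p in range(d):
        for q in range(d):
            denom = norms[p] * norms[q]
            corr[p][q] = sum(x * y for x, y in zip(centered[p], centered[q])) / denom if denom else 0.0
    return corr


def correlation_score(H: Matrix) -> float:
    """Mean absolute off-diagonal correlation: lower means more independent channels."""
    corr = channel_correlation(H)
    d = len(corr)
    off = [abs(corr[p][q]) for p in range(d) for q in range(d) if p != q]
    return sum(off) / len(off)
```

Two identical channels score 1.0, and perfectly uncorrelated channels score 0.0. The lower MEGCN score described above would fall toward the 0 end of this scale.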

5 RELATED WORK

5.1 Social Recommendation

Social effects have been widely exploited in recommendation systems to solve data sparsity problems. (在推荐系统中，社会效应被广泛用于解决 数据稀疏 问题。)
Fundamentally, current studies can be divided into similarity-based methods and model-based methods. (从根本上讲，目前的研究可以分为 基于相似性 的方法和 基于模型 的方法。)
Especially, most current methods leverage deep neural networks [2, 6, 21, 26, 30, 39] to capture nonlinear information. (尤其是，目前大多数方法都利用 深度神经网络 [2,6,21,26,30,39]来捕捉非线性信息。)
However, they either assume social influence dominates the user behavior or are too computationally expensive. Note that, besides direct social effects, various peer relations [24, 34] can be constructed to improve representation learning and recommendation performance. (然而，他们要么认为社会影响主导了用户行为，要么计算成本太高。请注意，除了直接的社会影响外，还可以构建各种 同伴关系 [24,34]，以提高表征学习和推荐表现。)

5.2 Graph Neural Networks

(1) Traditional graph embedding methods [8, 9, 17] and the original Graph Convolution Network [3] mainly focus on embedding nodes from a single fixed graph, thus lacking the ability to generalize to unseen nodes. (传统的 图嵌入 方法[8,9,17]和原始 图卷积网络 [3]主要关注于从单个固定图中嵌入节点，因此缺乏推广到未见过的节点的能力。)
(2) Then, [11, 22, 36] proposed to aggregate graph information in an inductive way, and the dynamic structure and feature information aggregation mechanism have helped achieve impressive performance in graph-involved tasks. (然后，[11,22,36]提出以 归纳的方式 聚合图形信息，动态结构 和 特征信息聚合机制 有助于在涉及图形的任务中获得令人印象深刻的性能。)
(3) However, in essence, the Graph Convolution Network is a special form of Laplacian smoothing [14], a spectral method that constrains neighbors to be similar. Therefore, it is not appropriate to directly use GCN and its variants [11, 36] in the social recommendation task, since they tend to assign equal importance weights to neighbors. (然而，在本质上，图卷积网络 是 拉普拉斯平滑 的一种特殊形式[14]，这是一种 频谱方法，限制邻域相似 。因此，在社会推荐任务中直接使用GCN及其变体[11,36]是不合适的，因为它们 倾向于对邻居同等重视。)
(4) One possible solution is to utilize graph attention network [20] as it can learn the latent embeddings of each node by attending to its neighbors following a self-attention strategy. (一种可能的解决方案是利用 图注意网络[20]，因为它可以通过遵循 自我注意策略 关注其邻居来学习每个节点的 潜在嵌入 。)
Nevertheless, it is computationally expensive in large scale networks. (然而，在大规模网络中，它的计算成本很高。)
Besides, various works have proposed to leverage substructures [22, 23, 25] information in graph-based learning models. (此外，各种研究已经提出在基于图的学习模型中利用子结构 [22,23,25]信息。)

5.3 Multi-label Learning

(1) Multi-label learning [40] has been widely applied to different machine learning tasks, such as text categorization [15, 18] and image annotation [27]. (多标签学习[40]已广泛应用于不同的机器学习任务，如 文本分类 [15,18] 和 图像标注 [27]。)
(2) However, in most cases, a multi-label learning task faces the empty relevant label problem [32], and model performance can be largely affected when incomplete multi-labels are used for learning. (然而，在大多数情况下，多标签学习任务面临空相关标签问题 [32]，当使用不完整的多标签进行学习时，模型性能会受到很大影响。)
(3) [5] proposes to handle such problems by learning from semi-supervised weak-label data. ([5] 提出了利用 半监督弱标签数据 来处理这类问题。)
(4) Inspired by [5], since graph-based models are typically semi-supervised, multi-task learning can be realized with graph-based models. [37] is one of the earliest works applying multi-label learning to recommendation tasks. (受[5]的启发，由于基于图的模型通常是半监督模型，因此可以借助基于图的模型实现多任务学习。[37]是最早将多标签学习应用于推荐任务的著作之一。)

6 CONCLUSION

(1) In this paper, we propose MEGCN, a graph neural network based on channel-wise sparsity. (本文提出了一种基于信道稀疏性的图神经网络MEGCN。)
(2) MEGCN simplifies the GAT operation and uses two modules, InfluenceNorm and ChannelNorm, to capture both self interest and shared interest in the influence diffusion process of the social recommendation task. (MEGCN简化了GAT的操作，并利用两个模块，即InfluenceNorm和ChannelNorm，在社会推荐任务的影响扩散过程中捕获了 个人兴趣 和 共享兴趣 。)
(3) Essentially, our work can model the sparse label-induced structures in the original social network, namely partial relationship, without suffering from expensive computational cost of graph attention based models. (本质上，我们的工作可以对原始社会网络中的稀疏标签诱导结构（即部分关系）进行建模，而不必承受基于图形注意模型的昂贵计算成本。)
(4) Finally, experimental results validate the effectiveness of the proposed models. In particular, the MEGCN achieves the highest HR and NDCG score on all datasets. (最后，实验结果验证了所提模型的有效性。特别是，MEGCN在所有数据集上的HR和NDCG得分最高。)
(1) Future research will be conducted from both theoretical and practical perspectives. Theoretically, we will explore how to further improve the performance of the MEGCN models from the optimization point of view. (未来的研究将从理论和实践两个角度进行。理论上，我们将从优化的角度探讨如何进一步提高MEGCN模型的性能。)
(2) Besides, since graph sub-structure has been validated to be useful for graph modeling [22, 25], we will explore the usefulness of such information in the social recommendation task. (此外，由于图的子结构已被证实对图建模有用[22,25]，我们将探讨这些信息在社会推荐任务中的有用性。)
(3) In practice, we will apply the method to other graph-based multi-label tasks and test whether it is applicable to all kinds of tasks. (在实践中，我们将把该方法应用于其他基于图的多标签任务，并测试它是否适用于所有类型的任务。)

ACKNOWLEDGMENTS

REFERENCES
