This post is part of my lecture-notes series for Professor Zou Deqing's Network Security Seminar; this session's topic was large models.
A classmate presented Data Augmentation Approaches for Source Code Models: A Survey.
The slides were entirely in English, and I picked up a lot of new technical terms.
The English layout looked great; it seems this classmate read the English paper directly and built the slides from it.
I hope that within three years, ideally within one, I too can read English papers in my own field directly and absorb what I learn.
Paper: https://arxiv.org/pdf/2305.19915.pdf
Code: https://github.com/terryyz/DataAug4Code
The survey looks interesting; it may be worth reading directly.
“To the best of our awareness, our paper constitutes the first comprehensive survey offering an in-depth examination of DA techniques for source code models.”
By showcasing advanced techniques and novel applications in source code models, this survey emphasizes the importance of context, robustness, and adaptability in language models. Here are some key takeaways:
Understanding Complex Structures: Just as the survey explores deeper structure information in source code, NLP can benefit from enhanced analysis of complex linguistic structures. Techniques used for analyzing intricate source code structures could be adapted to understand nuanced grammatical and syntactic elements in natural language.
Augmenting Contextual Analysis: The survey’s focus on augmenting the natural language context in source code can inspire NLP researchers to develop more sophisticated context-aware models. This could improve the understanding of context in conversations, literary texts, or nuanced language use.
Leveraging Pre-Trained Models: The exploration of utilizing existing models for source code can be mirrored in NLP. Research can focus on how pre-trained language models (like GPT or BERT) can be fine-tuned or adapted for specific linguistic tasks or languages.
Method Stacking and Optimization: The survey’s discussion on method stacking and optimization can encourage NLP researchers to combine multiple methodologies (like tokenization, semantic analysis, etc.) in innovative ways to enhance model performance.
Adversarial Examples and Robustness: The application of adversarial examples for robustness in source code models can inspire similar approaches in NLP to create more robust and resilient language models, especially against evolving linguistic patterns and adversarial attacks.
Low-Resource Language Focus: The survey’s attention to low-resource domains in source code could encourage NLP researchers to focus more on low-resource languages, developing techniques to enhance language models where there is sparse data.
Contrastive Learning: Applying contrastive learning from the survey to NLP can lead to better disambiguation and understanding of subtle differences and similarities in language usage, enhancing tasks like sentiment analysis or text classification.
Mitigating Social Bias: The focus on mitigating social bias in source code models is directly applicable to NLP. Research can explore how language models can be made more equitable and less biased, ensuring fair and unbiased language understanding.
Exploring Few-Shot Learning: Inspired by the survey, NLP can delve into few-shot learning techniques to build models that learn effectively from limited data, which is crucial for rare languages or specific linguistic phenomena.
Cross-Disciplinary Applications: The survey’s diverse applications of data augmentation in source code can inspire NLP researchers to seek cross-disciplinary applications, such as in legal text analysis, literary studies, or cross-lingual understanding.
The paper comprehensively analyzes data augmentation techniques in the context of source code.
The paper first explains the concept of data augmentation and its function.
The paper then examines the primary data augmentation methods commonly employed in source code research and explores augmentation approaches for typical source code applications and tasks.
Finally, the paper concludes by outlining the current challenges in the field and suggesting potential directions for future source code research.
Table 1: Comparing a selection of DA methods by various aspects relating to their applicability, dependencies, and requirements. PL, NL, EI, Prob, Tok, KWE, TA, and LA stand for Programming Language, Natural Language, Example Interpolation, Probability, Tokenization, Keyword Extraction, Task-Agnostic, and Language-Agnostic.
PL and NL determine if the DA method is applied to the programming language or natural language context. Preprocess denotes preprocessing required besides the program parsing.
Parsing refers to the type of feature used by the DA method during program parsing. Level denotes the depth at which data is modified by the DA.
TA and LA represent whether the DA method can be applied to different tasks or programming languages. As most papers do not clearly state if their DA methods are TA and LA, we subjectively denote the applicability.
Abbreviation | Full Form |
---|---|
PL | Programming Language |
NL | Natural Language |
EI | Example Interpolation |
Prob | Probability |
Tok | Tokenization |
KWE | Keyword Extraction |
TA | Task-Agnostic |
LA | Language-Agnostic |
Source code models are trained on large-scale corpora of source code and are therefore able to model the contextual representations of given code snippets.
Data augmentation (DA) techniques aim to improve the model’s performance in various aspects (e.g., accuracy and robustness) by increasing training example diversity through data synthesis.
Compared to images and plain text, source code is less flexible to augment due to its strict programming syntax rules.
DA approaches for source code should preserve the functionality and syntax of the original code snippets so that the augmented code snippets can still be successfully compiled.
A common first step is to use a parser to build a concrete syntax tree from the code and further transform it into an abstract syntax tree (AST), which simplifies the representation while maintaining the key information.
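As a minimal illustration of this step, Python’s built-in ast module can parse a snippet into an abstract syntax tree (the snippet itself is invented for the example; the indent argument requires Python 3.9+):

```python
import ast

# A toy snippet to parse (invented for this example).
code = """
def add(a, b):
    return a + b
"""

# Parse into an abstract syntax tree (AST): the tree keeps the key
# structure (function definition, arguments, return expression) while
# discarding layout details such as whitespace and comments.
tree = ast.parse(code)
print(ast.dump(tree, indent=2))
```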
A large number of DA methods utilize predetermined rules to transform programs without breaking syntax rules and semantics.
Specifically, these rules mainly leverage ASTs implicitly to transform the code snippets.
(1) Basic program syntax
(2) Deeper structural information
(3) Augmenting the natural language context
MHM is a method that iteratively renames identifiers in code snippets.
Considered as an approach for generating adversarial training examples, MHM greatly improves the robustness of source code models.
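MHM couples such renamings with a sampling procedure (see the optimization discussion below); the sketch here shows only the core renaming operation via Python's ast module, with invented variable names, not the authors' implementation (ast.unparse requires Python 3.9+):

```python
import ast

class RenameIdentifier(ast.NodeTransformer):
    """Rename every occurrence of one variable; semantics are unchanged."""

    def __init__(self, old_name, new_name):
        self.old_name, self.new_name = old_name, new_name

    def visit_Name(self, node):
        if node.id == self.old_name:
            node.id = self.new_name
        return node

code = "def f(x):\n    y = x * 2\n    return y"
tree = RenameIdentifier("y", "tmp_0").visit(ast.parse(code))
print(ast.unparse(tree))  # a functionally identical, differently named snippet
```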
Srikant et al. consider program obfuscations as adversarial perturbations, where they rename program variables in an attempt to hide the program’s intent from a reader.
By applying these perturbed examples during the training stage, source code models become more robust to adversarial attacks.
BUGLABAug contains more rules, emphasizing both the programming language and natural language, such as comment deletion, comparison expression mirroring, and if-else branch swapping.
The evaluation of BUGLABAug demonstrates that DA methods can be exploited for self-supervised bug detection and repair.
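BUGLABAug's actual rule set is richer than this, but as a rough sketch of the comparison-mirroring rule alone (again via Python's ast module; the input snippet is invented):

```python
import ast

class MirrorComparison(ast.NodeTransformer):
    """Rewrite `a < b` into the equivalent `b > a` (comparison mirroring)."""

    _mirror = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node):
        self.generic_visit(node)
        # Only handle simple two-operand comparisons such as `a < b`.
        if len(node.ops) == 1 and type(node.ops[0]) in self._mirror:
            node.left, node.comparators[0] = node.comparators[0], node.left
            node.ops[0] = self._mirror[type(node.ops[0])]()
        return node

tree = MirrorComparison().visit(ast.parse("if x < limit:\n    handle(x)"))
print(ast.unparse(tree))  # prints the mirrored form: if limit > x: handle(x)
```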
Transpiler uses compiler transforms as data augmentation, automatically generating a dataset of equivalent functions.
Specifically, they define 11 compiler transforms by exploiting ASTs of the programs.
Three different augmentation schemes are built via combinations of the AST with control-flow graphs (CFG), use-define chains (UDG), and declaration-reference mapping (DRM).
Control Transformations
Control Transformations rewrite control-flow statements or modify the control flow between functions.
This transformation involves passing variables as function arguments, updating their values, and changing the control flow of the caller and callee.
Declaration Transformations
Declaration Transformations consist of 14 transformers that modify, add, or remove declarations in source code.
Declaration Transformations make it necessary for DA to update all usages of a variable, which can be carried out elegantly using the DRM representation.
API Transformations
API Transformations exploit the fact that various APIs can be used to solve the same problem.
Programmers are known to favor different APIs, so tampering with API usage is an effective strategy for changing stylistic patterns.
QRA augments examples by rewriting natural language queries when performing code search and code question answering.
It rewrites queries with minor rule-based modifications that share the same semantics as the original.
Specifically, it consists of three operations: randomly deleting a word, randomly swapping the positions of two words, and randomly copying a word.
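A minimal sketch of these three operations (a generic re-implementation of the idea, not the authors' code):

```python
import random

def augment_query(query: str, rng: random.Random) -> str:
    """Apply one small, semantics-preserving edit to a natural language query."""
    words = query.split()
    if len(words) < 2:
        return query
    op = rng.choice(["delete", "swap", "copy"])
    i = rng.randrange(len(words))
    if op == "delete":
        del words[i]
    elif op == "swap":
        j = rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    else:  # copy: duplicate one word in place
        words.insert(i, words[i])
    return " ".join(words)

rng = random.Random(0)
print(augment_query("how to sort a list of tuples by the second field", rng))
```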
KeyDAC augments both the natural language and the programming language, with an emphasis on query keywords.
For natural language queries, it follows the rules in QRA but only modifies non-keywords.
In terms of programming language augmentation, KeyDAC simply uses ASTs to rename program variables.
Model-based techniques target training various models to augment data.
(1) Utilization of existing models
(2) Models specifically designed for source code
Mi et al. utilize Auxiliary Classifier Generative Adversarial Networks (ACGAN) (Odena et al.) to generate augmented programs…
In order to increase the training data for code summarization, CDA-CS (Song et al.) uses the pre-trained BERT model (Devlin et al.) to replace synonyms for non-keywords in code comments, which benefits downstream source code tasks.
IRGen is a genetic-algorithm-based model that uses compiler intermediate representation (LLVM IR) to augment source code embeddings: IRGen turns a piece of source code into a range of semantically identical but syntactically distinct IR codes to improve the model’s contextual understanding.
Back-translation relies on multilingual generative source code models for unsupervised programming language translation.
Unlike in NLP, back-translation here is defined as translating between two programming languages via natural language as the intermediate language.
Transcoder is a generative source code model that performs source-to-source translation to augment cross-language source code.
Mixup involves interpolating the inputs and labels of two or more actual examples.
Such methods are hard to deploy in the realm of source code, as each code snippet is constrained by its unique program grammar and functionality.
One workaround is fusing multiple real examples into a single input via model embeddings.
Dong et al. merge rule-based techniques for source code models with Mixup to blend the representations of the original code snippet and its transformation via linear interpolation.
Binary Interpolation serves as a data augmentation strategy, which interchangeably swaps features between samples using elements acquired from a Bernoulli distribution.
Linear Extrapolation is another data augmentation approach that generates new data points beyond the existing feature space by extending current features in accordance with a uniform distribution.
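A sketch of these three interpolation flavors on model embeddings (dimensions, distribution parameters, and step sizes are illustrative, not the papers' exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
h1, h2 = rng.normal(size=64), rng.normal(size=64)  # embeddings of two snippets

# Mixup: a convex combination of two representations (labels are mixed alike).
lam = rng.beta(2.0, 2.0)
mixed = lam * h1 + (1 - lam) * h2

# Binary interpolation: swap features element-wise with a Bernoulli mask.
mask = rng.binomial(1, 0.5, size=h1.shape)
binary = mask * h1 + (1 - mask) * h2

# Linear extrapolation: push a sample beyond the existing feature space.
step = rng.uniform(0.0, 0.5)
extrapolated = h1 + step * (h1 - h2)
```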
3.1 Method Stacking
3.2 Optimization
Method stacking is the combination of different data augmentation methods.
Typically, the combination entails two types: same-type DA or a mixture of different DA methods.
Mi et al. combine rule-based code transformation schemes with model-based DA using ACGAN to create an augmented corpus for model training.
CDA-CS encompasses two kinds of DA techniques: rule-based non-keyword extraction and model-based non-keyword replacement.
Empirical evidence from Chen and Lampouras shows that combining back-translation and variable renaming can improve code completion performance.
In certain scenarios, such as enhancing robustness and minimizing computational cost, optimally selecting specific augmented example candidates is crucial.
We denote such goal-oriented candidate selection in DA as optimization.
(1) Probabilistic Selection
(2) Model-based Selection
(3) Rule-based Selection
MHM adopts the Markov chain Monte Carlo technique to choose adversarial examples via identifier replacement.
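As a toy sketch of that sampling idea, here is a Metropolis-Hastings-style acceptance rule that walks over candidate transformations and keeps those that raise the victim model's loss (the candidate pool and loss function are mock stand-ins, not MHM's actual procedure):

```python
import math
import random

def mh_select(candidates, loss_fn, current, steps=100, temperature=1.0, seed=0):
    """Walk over candidates; always keep moves that increase the loss,
    and keep worse moves with probability exp(delta / temperature)."""
    rng = random.Random(seed)
    cur_loss = loss_fn(current)
    for _ in range(steps):
        proposal = rng.choice(candidates)
        delta = loss_fn(proposal) - cur_loss
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, cur_loss = proposal, cur_loss + delta
    return current

# Mock usage: a stand-in loss that simply prefers longer identifier names.
print(mh_select(["tmp_0", "tmp_1", "var_name_x"], loss_fn=len, current="v"))
```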
QMDP uses a Q-learning approach to strategically select and execute rule-based structural transformations on the source code, thereby guiding the generation of adversarial examples.
In BUGLABAug, Allamanis et al. model the probability of applying a specific rewrite rule at a location in a code snippet, similar to a pointer network.
The DAMP method is an emblematic approach, which optimizes based on the model loss to select and generate adversarial examples via variable renaming.
SPACE performs selection and perturbation of code identifiers’ embeddings via gradient ascent, aiming to maximize the impact on model performance while upholding the semantic and grammatical correctness of the programming language.
ALERT uses a genetic algorithm in its gradient-based selection strategy.
It iteratively evolves a population of candidate solutions, guided by a fitness function that calculates the model’s confidence difference, aiming to identify the most potent adversarial examples.
IRGen utilizes a genetic-algorithm-based optimization technique with a fitness function based on IR similarity.
ACCENT and RADAR apply evaluation metrics such as BLEU and CodeBLEU, respectively, to guide the selection and replacement process, aiming for maximum adversarial impact.
STRATA employs a rule-based technique to select high-impact subtokens that significantly alter the model’s interpretation of the code.
Robustness presents a critical and complex dimension of software engineering, necessitating the creation of semantically-conserved adversarial examples to discern and mitigate vulnerabilities within source code models.
In the domain of software engineering, the resources of programming languages are severely imbalanced.
As source code models are trained on open-source repositories and forums, the programming language resource imbalance can adversely impact their performance on the resource-scarce programming languages.
In order to increase data in the low-resource domain for representation learning, Li et al. (2022f) add more training data to enhance source code model embeddings by unleashing the power of compiler IR.
Ahmad et al. (2023) propose to use source code models to perform back-translation DA, taking into consideration the scenario of low-resource programming languages.
Meanwhile, Chen and Lampouras (2023) underscore the fact that source code datasets are markedly smaller than their NLP equivalents, which often encompass millions of instances.
As a result, they investigate code completion tasks under this constraint and experiment with back-translation and variable renaming.
Shen et al. contend that the generation of Bash comments is hampered by a dearth of training data and thus explore model-based DA methods for this task.
The retrieval augmentation frameworks for source code models incorporate retrieval-augmented examples from the training set when pre-training or fine-tuning source code models.
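A toy sketch of the retrieval-augmentation idea, using TF-IDF similarity to prepend the nearest training snippet as extra context (the snippets, tokenizer, and concatenation scheme are invented for illustration; the surveyed frameworks use their own retrievers):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "training set" of code snippets.
train = [
    "def add(a, b): return a + b",
    "def mul(a, b): return a * b",
    "def read_file(path): return open(path).read()",
]

vectorizer = TfidfVectorizer(token_pattern=r"\w+")
index = vectorizer.fit_transform(train)

def retrieve_augment(query_code: str) -> str:
    """Prepend the most similar training snippet to the input as context."""
    sims = cosine_similarity(vectorizer.transform([query_code]), index)[0]
    return train[sims.argmax()] + "\n" + query_code

print(retrieve_augment("def sub(a, b): return a - b"))
```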
Contrastive learning enables models to learn an embedding space in which similar samples are close to each other while dissimilar ones are far apart.
As training datasets commonly contain limited sets of positive samples, DA methods are preferred for constructing samples similar to the positive ones.
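A minimal numpy sketch of the InfoNCE objective commonly used in such setups, where the DA-produced view of a snippet serves as the positive (dimensions and noise scale are invented for illustration):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE loss for one anchor: pull the augmented (positive) view
    close and push the other samples (negatives) away."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=32)                    # embedding of a code snippet
z_aug = z + 0.05 * rng.normal(size=32)     # embedding of its augmented view
negatives = [rng.normal(size=32) for _ in range(8)]
print(info_nce(z, z_aug, negatives))
```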
Liu et al. (2023b) make use of contrastive learning with DA to devise superior pre-training paradigms for source code models.
Some works study the advantages of this application in source code tasks such as defect detection (Cheng et al., 2022), clone detection (Zubkov et al., 2022; Wang et al., 2022a), and code search (Shi et al., 2022b, 2023; Li et al., 2022b).
Code authorship attribution is the process of identifying the author of a given piece of code, usually achieved by source code models.
Code clone detection refers to the task of identifying whether a given code snippet is cloned and modified from an original sample, and can be called plagiarism detection in some cases.
Jain et al. (2021) propose correct-by-construction DA via compiler information to generate many functionally equivalent variants of a training sample, and show its effectiveness in improving model robustness on BigCloneBench and a self-collected JavaScript dataset.
Jia et al. (2023) show that training with adversarial examples produced via obfuscation transformations can significantly improve the robustness of source code models.
Zubkov et al. (2022) provide a comparison of multiple contrastive learning approaches, combined with rule-based transformations, for the clone detection task.
Pinku et al. (2023) later use Transcompiler to translate between limited source code in Python and Java, thereby increasing the training data for cross-language code clone detection.
Defect detection, in other words bug or vulnerability detection, aims to capture bugs in given code snippets.
The task can be considered a binary classification task, where the labels are either true or false.
Allamanis et al. (2021) implement BUGLAB-Aug, a DA framework for self-supervised bug detection and repair.
BUGLAB-Aug has two sets of code transformation rules: one set of bug-inducing rewrites, and one set of rewrites serving as DA.
Their approach boosts the performance and robustness of source code models simultaneously.
Cheng et al. (2022) present a path-sensitive code embedding technique called ContraFlow, which uses self-supervised contrastive learning to detect defects based on value-flow paths.
ContraFlow utilizes DA to generate contrastive value-flow representations of three datasets (namely D2A, Fan, and FFMPeg+Qemu) to learn the (dis)similarity among programs.
Ding et al. (2021) present a novel self-supervised model focusing on identifying (dis)similar functionalities of source code.
Specifically, they design code transformation heuristics to automatically create bugged programs and similar code for augmenting pre-training data.
Code summarization is the task of generating a comment for a piece of source code, and is thus also named code comment generation.
Zhang et al. (2020c) apply MHM to perturb training examples and mix them with the original ones for adversarial training, which effectively improves the robustness of source code models in summarizing adversarial code snippets.
Zhang et al. (2020b) develop a retrieval-augmentation framework for code summarization, relying on similar code-summary pairs to generate new summaries on the PCSD and JCSD datasets.
Based on this framework, Liu et al. leverage a hybrid GNN to propose a novel retrieval-augmented code summarization method and use it during model training on the self-collected CCSD dataset.
Code search, or code retrieval, is a text-code task that searches for code snippets based on given natural language queries.
Source code models on this task need to map the semantics of the text to the source code.
Bahrami et al. (2021) increase the code search queries by augmenting the natural language context, such as docstrings, code comments, and commit messages.
Shi et al. (2023) introduce soft data augmentation (SoDa), which requires no external transformation rules on code and text.
With SoDa, the model predicts tokens based on dynamic masking or replacement when processing CodeSearchNet.
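A rough sketch of the dynamic-masking idea on a token sequence: the masked positions are re-sampled on every call, so each training pass sees a different corrupted view of the same example (the mask symbol and probability are illustrative, not SoDa's exact settings):

```python
import random

MASK = "<mask>"

def dynamic_mask(tokens, mask_prob=0.15, rng=None):
    """Randomly mask tokens, re-sampling the positions on every call."""
    rng = rng or random.Random()
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]

tokens = "def get_user ( uid ) : return db . find ( uid )".split()
print(dynamic_mask(tokens, rng=random.Random(1)))
print(dynamic_mask(tokens, rng=random.Random(2)))  # a different view
```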
Code completion requires source code models to generate lines of code to complete given programming challenges.
Lu et al. (2022) propose a retrieval-augmented code completion framework composed of a rule-based DA module, generating on the PY150 and GitHub Java Corpus datasets.
Wang et al. (2023) customize over 30 transformations specifically for code, covering docstrings, function and variable names, code syntax, and code format, and benchmark generative source code models on HumanEval and MBPP.
The code translation task is to translate source code written in one programming language into another.
Ahmad et al. (2023) apply data augmentation through back-translation to enhance unsupervised code translation.
They use pre-trained sequence-to-sequence models to translate code into natural language summaries and then back into code in a different programming language.
Chen and Lampouras (2023) utilize back-translation and variable augmentation techniques to improve code translation on CodeTrans.
CQA can be formulated as a task where source code models are required to generate a textual answer based on a given code snippet and a question.
The code classification task categorizes programs according to their functionality.
Zhang et al. (2022) incorporate simplex interpolation, an example-interpolation DA approach on IR, to create intermediate embeddings on POJ-104 from CodeXGLUE.
Dong et al. (2023a) also explore example-interpolation DA to fuse the embeddings of code snippets.
They evaluate the method on two datasets, JAVA250 and Python800.
The goal of method name prediction is to predict the name of a method given the program.
Yefet et al. (2020) attack and defend source code models using variable-name-replaced adversarial programs on the Code2Seq dataset.
Pour et al. (2021) propose a search-based testing framework specifically for adversarial robustness.
They generate adversarial Java examples with ten popular refactoring operators widely used in Java.
Rabin et al. (2021) and Yu et al. (2022b) both implement data augmentation frameworks with various transformation rules for processing Java source code on the Code2Seq dataset.
Type prediction, or type inference, aims to predict parameter and function types in programs.
Bielik and Vechev (2020) conduct adversarial attacks on source code models with examples of transformed ASTs.
They instantiate the attack as type prediction on JavaScript and TypeScript.
Jain et al. (2021) apply compiler transforms, with 11 rules, to generate many functionally equivalent variants of programs in DeepTyper.
Li et al. (2022e) incorporate srcML meta-grammar embeddings to augment the syntactic features of examples in three datasets: DeepTyper, Typilus Data, and CodeSearchNet.
Currently, there is a noticeable gap in the in-depth exploration and theoretical understanding of DA methods for source code.
Much of the previous work introduces new methods or demonstrates how DA techniques can be effective for downstream tasks.
However, these studies often overlook the why and the how, particularly from a mathematical perspective.
We should understand DA from a broader perspective, not just by looking at experimental results.
An emergent research opportunity lies in exploring the potential of DA in the source code domain with the help of large language models (LLMs) trained on large amounts of text and source code.
LLMs are capable of context generation based on prompted instructions and provided examples, making them a natural choice for automating the DA process in NLP.
In contrast, the exploration of prompt-based DA in source code domains remains a relatively untouched research area.
DA methods covered by our survey cannot be directly generalized to tasks such as API recommendation and API sequence generation, as most of them only target program-level augmentation, not the API level.
We observe a gap in DA techniques between these two different layers, which provides opportunities for future work to explore.
Additionally, source code modeling has not fully justified DA for out-of-distribution generalization.
Existing methods have made sufficient progress on function-level code snippets and common programming languages.
The emphasis on code snippets at the function level fails to capture the intricacies and complexities of programming in real-world scenarios, where developers often work with multiple files and folders simultaneously.
Therefore, we highlight the importance of exploring DA approaches at the project level.
At the same time, limited by data resources, augmentation methods for low-resource languages are scarce, even though these languages have greater demand for DA.
As source code models have advanced software development, they may be used to develop human-centric applications such as human resources and education, where biased programs may result in unjustified and unethical decisions for underrepresented people.
While social bias in NLP has been well studied and can be mitigated with DA, social bias in source code has not yet been brought to attention.
To make these models more responsible in source code, we urge more research on mitigating bias.
In few-shot scenarios, models are required to achieve performance that rivals traditional machine learning models, yet the amount of training data is extremely limited.
Mainstream pre-trained source code models obtain rich semantic knowledge through language modeling.
The room for improvement that traditional DA methods offer pre-trained source code models has been greatly compressed.
Therefore, how to provide models with fast generalization and problem-solving capability by generating high-quality augmented data in few-shot scenarios is an interesting question.
Whereas there are well-accepted DA frameworks for CV (e.g., the default augmentation libraries in PyTorch, RandAugment) and for NLP (e.g., NL-Augmenter), a corresponding library of generalized DA techniques for source code models is conspicuously absent.
Furthermore, as existing DA methods are usually evaluated on different datasets, it is hard to truly determine their efficacy.
Therefore, we posit that the progression of DA research would be greatly facilitated by the establishment of standardized and unified benchmark tasks, along with datasets for contrasting and evaluating the effectiveness of different augmentation methods.