[晓理紫]每日论文分享(有中文摘要,源码或项目地址)--大模型、扩散模型、视觉导航

专属领域论文订阅

关注{晓理紫|小李子},每日更新论文,如感兴趣,请转发给有需要的同学,谢谢支持
VX关注,并留下邮箱可获得每日定时推送

分类:

  • 大语言模型LLM
  • 视觉模型VLM
  • 扩散模型
  • 视觉导航
  • 具身智能,机器人
  • 强化学习
  • 开放词汇,检测分割


== LLM ==

标题: Towards Language-Driven Video Inpainting via Multimodal Large Language Models

作者: Jianzong Wu, Xiangtai Li, Chenyang Si

中文摘要: 我们引入了一个新任务——语言驱动的视频修复,它使用自然语言指令来指导修复过程。这种方法克服了传统视频修复方法依赖人工标注二值掩码的局限性,该标注过程通常繁琐且费时费力。我们提出了“通过指令从视频中移除对象”(ROVI)数据集,包含5,650个视频和9,091个修复结果,用于支持该任务的训练和评估。我们还提出了一种新的基于扩散的语言驱动视频修复框架,这是该任务的第一个端到端基线,它集成了多模态大语言模型,以有效理解并执行复杂的基于语言的修复请求。综合实验结果展示了数据集的多功能性以及模型在各种语言指导的修复场景中的有效性。我们将公开数据集、代码和模型。

摘要: We introduce a new task – language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset’s versatility and the model’s effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

[Downlink:]http://arxiv.org/abs/2401.10226v1

[Project:]https://jianzongwu.github.io/projects/rovi


标题: Supervised Fine-tuning in turn Improves Visual Foundation Models

作者: Xiaohu Jiang, Yixiao Ge, Yuying Ge

中文摘要: 像CLIP这样的图像-文本训练近年来主导了视觉基础模型的预训练。后续工作尝试将区域级视觉学习引入CLIP的预训练,但由于缺乏大规模区域级数据集而面临可扩展性挑战。受自然语言处理中监督微调(SFT,如指令微调)的启发,我们探索了细粒度SFT在预训练之后进一步增强视觉基础模型的潜力,并由此提出了一种两阶段方法ViSFT(Vision SFT)来释放视觉基础模型的细粒度知识。在ViSFT中,视觉基础模型先在若干域内任务上进行视觉联合学习得到增强,然后在域外基准上测试。在8块V100 GPU上用ViSFT更新不到2天,一个拥有超过4.4B参数的视觉Transformer模型在各种域外基准(包括视觉和视觉-语言场景)上都取得了提升。

摘要: Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP’s pretraining but face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing such as instruction tuning, we explore the potential of fine-grained SFT in enhancing the generation of vision foundation models after their pretraining. Thus a two-stage method ViSFT (Vision SFT) is proposed to unleash the fine-grained knowledge of vision foundation models. In ViSFT, the vision foundation model is enhanced by performing visual joint learning on some in-domain tasks and then tested on out-of-domain benchmarks. With updating using ViSFT on 8 V100 GPUs in less than 2 days, a vision transformer with over 4.4B parameters shows improvements across various out-of-domain benchmarks including vision and vision-linguistic scenarios.

[Downlink:]http://arxiv.org/abs/2401.10222v1

[GitHub:]https://github.com/TencentARC/ViSFT/tree/main


标题: MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

作者: Yue Huang, Jiawen Shi, Yuan Li

中文摘要: 大型语言模型(LLMs)因其令人印象深刻的自然语言处理(NLP)能力而受到广泛关注。最近,许多研究聚焦于LLMs的工具使用能力,它们主要考察LLMs如何有效地配合给定的特定工具。然而,在LLM充当智能代理的场景中(如AutoGPT和MetaGPT等应用),LLM需要参与复杂的决策过程,包括决定是否使用工具,以及从可用工具集合中选择最合适的工具来满足用户请求。因此,本文提出了MetaTool,一个旨在评估LLMs是否具有工具使用意识并能否正确选择工具的基准。具体来说,我们在基准中构建了一个名为ToolE的数据集。该数据集包含各种类型的用户查询,这些查询以提示的形式触发LLMs使用工具,涵盖单工具和多工具场景。随后,我们设置了工具使用意识和工具选择两类任务。在工具选择中,我们从不同角度定义了四个子任务:相似选项下的工具选择、特定场景下的工具选择、可能存在可靠性问题的工具选择以及多工具选择。我们对八个流行的LLM进行了实验,发现其中大多数仍然难以有效地选择工具,凸显了LLM与真正的智能代理之间的差距。不过,通过错误分析我们发现仍有很大的改进空间。最后,我们总结了对工具开发者的建议——强烈建议工具开发者根据工具将要接入的下游LLM,选择合适的重写模型来生成新的工具描述。我们的代码位于 https://github.com/HowieHwong/MetaTool。

摘要: Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities. Recently, many studies have focused on the tool utilization ability of LLMs. They primarily investigated how LLMs effectively collaborate with given specific tools. However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests. Therefore, in this paper, we introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools. Specifically, we create a dataset called ToolE within the benchmark. This dataset contains various types of user queries in the form of prompts that trigger LLMs to use tools, including both single-tool and multi-tool scenarios. Subsequently, we set the tasks for both tool usage awareness and tool selection. We define four subtasks from different perspectives in tool selection, including tool selection with similar choices, tool selection in specific scenarios, tool selection with possible reliability issues, and multi-tool selection. We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools, highlighting the existing gaps between LLMs and genuine intelligent agents. However, through the error analysis, we found there is still significant room for improvement. Finally, we conclude with insights for tool developers – we strongly recommend that tool developers choose an appropriate rewrite model for generating new descriptions based on the downstream LLM the tool will apply to. Our code is in \href{https://github.com/HowieHwong/MetaTool}{Github}.

[Downlink:]http://arxiv.org/abs/2310.03128v4

[GitHub:]https://github.com/HowieHwong/MetaTool
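
作为补充说明,下面给出一个构造这类“是否用工具、选哪个工具”评测提示的最小示意函数(Python)。其中的提示措辞、工具名称与描述均为虚构示例,并非ToolE基准中的原始模板,仅用于说明评测输入的大致形式。

```python
def build_tool_selection_prompt(user_query: str, tools: dict) -> str:
    """拼出一个工具选择评测提示:列出候选工具及描述,要求模型先判断是否需要工具,
    再从候选中选出最合适的一个(或回答NONE)。措辞仅作示意,非基准原始模板。"""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return (
        "You are an assistant that can optionally call external tools.\n"
        f"Available tools:\n{tool_lines}\n\n"
        f"User request: {user_query}\n\n"
        "First decide whether a tool is needed. If yes, answer with exactly one tool "
        "name from the list; if no tool is needed, answer NONE."
    )

# 用法示意(工具名与描述均为虚构)
tools = {
    "weather_lookup": "Query current weather for a given city.",
    "unit_converter": "Convert values between measurement units.",
}
print(build_tool_selection_prompt("明天上海会下雨吗?", tools))
```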


标题: Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

作者: Zdeněk Kasner, Ondřej Dušek

中文摘要: 我们研究开放大型语言模型(LLMs)在多大程度上能够从结构化数据生成连贯且相关的文本。为避免基准数据泄露进LLM训练数据所带来的偏差,我们收集了Quintd-1:一个面向五个数据到文本(D2T)生成任务的临时构建基准,由从公共API收集的标准格式结构化数据记录组成。我们利用无参考评估指标和LLMs的上下文学习能力,从而无需人工撰写的参考文本即可测试模型。我们的评估聚焦于在词元(token)级别标注语义准确性错误,并结合人工标注者与基于GPT-4的度量。对模型跨领域、跨任务行为的系统考察表明,具有7B参数的最先进开放LLMs能够在零样本设置下,从各种标准数据格式生成流畅连贯的文本。然而,我们也表明输出的语义准确性仍是主要问题:在我们的基准上,80%的开放LLM输出包含语义错误(按人工标注者统计;按GPT-4统计为91%)。我们的代码、数据和模型输出可从 https://d2t-llm.github.io 获得。

摘要: We investigate to which extent open large language models (LLMs) can generate coherent and relevant text from structured data. To prevent bias from benchmarks leaked into LLM training data, we collect Quintd-1: an ad-hoc benchmark for five data-to-text (D2T) generation tasks, consisting of structured data records in standard formats gathered from public APIs. We leverage reference-free evaluation metrics and LLMs’ in-context learning capabilities, allowing us to test the models with no human-written references. Our evaluation focuses on annotating semantic accuracy errors on token-level, combining human annotators and a metric based on GPT-4. Our systematic examination of the models’ behavior across domains and tasks suggests that state-of-the-art open LLMs with 7B parameters can generate fluent and coherent text from various standard data formats in zero-shot settings. However, we also show that semantic accuracy of the outputs remains a major issue: on our benchmark, 80% of outputs of open LLMs contain a semantic error according to human annotators (91% according to GPT-4). Our code, data, and model outputs are available at https://d2t-llm.github.io.

[Downlink:]http://arxiv.org/abs/2401.10186v1

[Project:]https://d2t-llm.github.io


标题: Hyperbolic Image-Text Representations

作者: Karan Desai, Maximilian Nickel, Tanmay Rajpurohit

中文摘要: 视觉和语言概念天然地组织成层次结构,例如文本概念“狗”蕴含所有包含狗的图像。尽管这很直观,但当前的大规模视觉语言模型(如CLIP)并没有显式地捕捉这种层次结构。我们提出了MERU,一个为图像和文本生成双曲表示的对比模型。双曲空间具有适合嵌入树状数据的几何性质,因此MERU可以更好地捕捉图像-文本数据集中潜在的层次结构。结果表明,MERU学到了高度可解释且结构化的表示空间,同时在图像分类、图文检索等标准多模态任务上与CLIP性能相当。我们的代码和模型可从 https://www.github.com/facebookresearch/meru 获得。

摘要: Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept “dog” entails all images that contain dogs. Despite being intuitive, current large-scale vision and language models such as CLIP do not explicitly capture such hierarchy. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP’s performance on standard multi-modal tasks like image classification and image-text retrieval. Our code and models are available at https://www.github.com/facebookresearch/meru

[Downlink:]http://arxiv.org/abs/2304.09172v3

[GitHub:]https://www.github.com/facebookresearch/meru
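
为了直观说明“双曲(Lorentz)表示”与普通余弦相似度的差别,下面给出一个在双曲面模型中计算图文嵌入测地距离的最小示意(Python/PyTorch)。其中把欧氏向量“补时间分量”提升到双曲面的做法以及曲率取值均为简化假设,并非MERU论文或官方代码的精确实现。

```python
import torch

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentz内积: <x,y>_L = -x0*y0 + sum_i xi*yi,输入形状为 (..., d+1)。"""
    time = -x[..., 0] * y[..., 0]
    space = (x[..., 1:] * y[..., 1:]).sum(dim=-1)
    return time + space

def lift_to_hyperboloid(v: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """把编码器输出的欧氏向量v(形状 (..., d))提升到曲率为-curv的双曲面上。
    这里采用简化的“补时间分量”方式,并非MERU中的指数映射,仅作示意。"""
    x0 = torch.sqrt(1.0 / curv + (v * v).sum(dim=-1, keepdim=True))
    return torch.cat([x0, v], dim=-1)

def hyperbolic_distance(x: torch.Tensor, y: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """双曲面上两点的测地距离 d = arccosh(-curv * <x,y>_L) / sqrt(curv)。"""
    inner = torch.clamp(-curv * lorentz_inner(x, y), min=1.0 + 1e-6)  # 数值稳定
    return torch.acosh(inner) / curv ** 0.5

# 用法示意:假设图像/文本编码器各输出512维向量
img_feat = torch.randn(4, 512)
txt_feat = torch.randn(4, 512)
img_h = lift_to_hyperboloid(img_feat)
txt_h = lift_to_hyperboloid(txt_feat)
print(hyperbolic_distance(img_h, txt_h).shape)  # torch.Size([4])
```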


标题: FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

作者: Xiang Chen, Duanzheng Song, Honghao Gui

中文摘要: 尽管LLM具有令人印象深刻的生成能力,但在现实应用中,它们会受到与事实相冲突的幻觉的困扰。准确识别LLM生成文本中的幻觉,尤其是在复杂推理场景下,仍是一个相对未被探索的领域。为填补这一空白,我们提出了FactCHD,一个专门用于检测LLM事实冲突型幻觉的基准。FactCHD包含覆盖多种事实模式的多样化数据集,包括普通、多跳、比较和集合运算等类型。FactCHD的一个独特之处在于整合了基于事实的证据链,显著提高了对检测器解释能力的评估深度。在不同LLM上的实验暴露了当前方法在准确检测事实错误方面的不足。此外,我们提出了Truth-Triangulator,它综合了工具增强的ChatGPT与基于Llama2的LoRA微调所给出的反思性判断,旨在通过融合预测结果与证据来产生更可信的检测。基准数据集可在 https://github.com/zjunlp/FactCHD 获取。

摘要: Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors’ explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce Truth-Triangulator that synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence. The benchmark dataset is available at https://github.com/zjunlp/FactCHD.

[Downlink:]http://arxiv.org/abs/2310.12086v2

[GitHub:]https://github.com/zjunlp/FactCHD


== VLM ==

标题: The Manga Whisperer: Automatically Generating Transcriptions for Comics

作者: Ragav Sachdeva, Andrew Zisserman

中文摘要: 在过去几十年里,日本漫画(Manga)已经跨越文化和语言的界限,成为真正风靡全球的现象。然而,漫画对视觉线索和插图的固有依赖,使得视觉障碍人士很难接触它。在这项工作中,我们试图消除这一重大障碍,目标是让每个人都能欣赏并积极参与漫画。具体来说,我们解决说话人分离(diarisation)问题,即以全自动的方式生成“谁在何时说了什么”的转录。为此,我们做出了以下贡献:(1)我们提出了一个统一模型Magi,它能够(a)检测分格、文本框和角色框,(b)按身份对角色进行聚类(无需预先知道聚类数量),以及(c)将对话与其说话者相关联;(2)我们提出了一种新方法,能够按照阅读顺序对检测到的文本框排序并生成对话转录;(3)我们使用公开可用的[英文]漫画页面为该任务标注了评估基准。代码、评估数据集和预训练模型见:https://github.com/ragavsachdeva/magi。

摘要: In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged by everyone. Specifically, we tackle the problem of diarisation i.e. generating a transcription of who said what and when, in a fully automatic way. To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters apriori), and (c) associate dialogues to their speakers; (2) we propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript; (3) we annotate an evaluation benchmark for this task using publicly available [English] manga pages. The code, evaluation datasets and the pre-trained model can be found at: https://github.com/ragavsachdeva/magi.

[Downlink:]http://arxiv.org/abs/2401.10224v1

[GitHub:]https://github.com/ragavsachdeva/magi


标题: Supervised Fine-tuning in turn Improves Visual Foundation Models

作者: Xiaohu Jiang, Yixiao Ge, Yuying Ge

中文摘要: 像CLIP这样的图像-文本训练近年来主导了视觉基础模型的预训练。后续工作尝试将区域级视觉学习引入CLIP的预训练,但由于缺乏大规模区域级数据集而面临可扩展性挑战。受自然语言处理中监督微调(SFT,如指令微调)的启发,我们探索了细粒度SFT在预训练之后进一步增强视觉基础模型的潜力,并由此提出了一种两阶段方法ViSFT(Vision SFT)来释放视觉基础模型的细粒度知识。在ViSFT中,视觉基础模型先在若干域内任务上进行视觉联合学习得到增强,然后在域外基准上测试。在8块V100 GPU上用ViSFT更新不到2天,一个拥有超过4.4B参数的视觉Transformer模型在各种域外基准(包括视觉和视觉-语言场景)上都取得了提升。

摘要: Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP’s pretraining but face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing such as instruction tuning, we explore the potential of fine-grained SFT in enhancing the generation of vision foundation models after their pretraining. Thus a two-stage method ViSFT (Vision SFT) is proposed to unleash the fine-grained knowledge of vision foundation models. In ViSFT, the vision foundation model is enhanced by performing visual joint learning on some in-domain tasks and then tested on out-of-domain benchmarks. With updating using ViSFT on 8 V100 GPUs in less than 2 days, a vision transformer with over 4.4B parameters shows improvements across various out-of-domain benchmarks including vision and vision-linguistic scenarios.

[Downlink:]http://arxiv.org/abs/2401.10222v1

[GitHub:]https://github.com/TencentARC/ViSFT/tree/main


标题: MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

作者: Changyao Tian, Xizhou Zhu, Yuwen Xiong

中文摘要: 为交错排列的图像-文本数据开发生成模型兼具研究和实用价值,这要求模型既能理解交错序列,又能进而生成图像和文本。然而,现有尝试受限于固定数量的视觉词元无法高效捕捉图像细节的问题,这在多图像场景中尤为突出。为此,本文提出了MM-Interleaved,一个面向交错图像-文本数据的端到端生成模型。它引入了多尺度、多图像的特征同步器模块,使生成过程可以直接访问先前上下文中的细粒度图像特征。MM-Interleaved在成对及交错的图像-文本语料上进行端到端预训练,并通过监督微调阶段进一步增强,使模型提升了遵循复杂多模态指令的能力。实验证明了MM-Interleaved的多功能性:既能按多模态指令识别视觉细节,也能在文本与视觉条件下生成一致的图像。代码和模型可在 https://github.com/OpenGVLab/MM-Interleaved 获得。

摘要: Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the issue that the fixed number of visual tokens cannot efficiently capture image details, which is particularly problematic in the multi-image scenarios. To address this, this paper presents MM-Interleaved, an end-to-end generative model for interleaved image-text data. It introduces a multi-scale and multi-image feature synchronizer module, allowing direct access to fine-grained image features in the previous context during the generation process. MM-Interleaved is end-to-end pre-trained on both paired and interleaved image-text corpora. It is further enhanced through a supervised fine-tuning phase, wherein the model improves its ability to follow complex multi-modal instructions. Experiments demonstrate the versatility of MM-Interleaved in recognizing visual details following multi-modal instructions and generating consistent images following both textual and visual conditions. Code and models are available at \url{https://github.com/OpenGVLab/MM-Interleaved}.

[Downlink:]http://arxiv.org/abs/2401.10208v1

[GitHub:]https://github.com/OpenGVLab/MM-Interleaved


标题: VMamba: Visual State Space Model

作者: Yue Liu, Yunjie Tian, Yuzhong Zhao

摘要: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning. While CNNs exhibit remarkable scalability with linear complexity w.r.t. image resolution, ViTs surpass them in fitting capabilities despite contending with quadratic complexity. A closer inspection reveals that ViTs achieve superior visual modeling performance through the incorporation of global receptive fields and dynamic weights. This observation motivates us to propose a novel architecture that inherits these components while enhancing computational efficiency. To this end, we draw inspiration from the recently introduced state space model and propose the Visual State Space Model (VMamba), which achieves linear complexity without sacrificing global receptive fields. To address the encountered direction-sensitive issue, we introduce the Cross-Scan Module (CSM) to traverse the spatial domain and convert any non-causal visual image into order patch sequences. Extensive experimental results substantiate that VMamba not only demonstrates promising capabilities across various visual perception tasks, but also exhibits more pronounced advantages over established benchmarks as the image resolution increases. Source code has been available at https://github.com/MzeroMiko/VMamba.

[Downlink:]http://arxiv.org/abs/2401.10166v1

[GitHub:]https://github.com/MzeroMiko/VMamba
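
VMamba的Cross-Scan Module(CSM)核心思想是把2D特征图按四个方向展开为一维patch序列,扫描后再合并,以缓解一维状态空间模型的方向敏感问题。下面是按这一思路写的展开/合并示意(Python/PyTorch),张量形状与函数名均为假设,并非官方实现;序列真正送入的S6/Mamba块此处省略。

```python
import torch

def cross_scan(x: torch.Tensor) -> torch.Tensor:
    """把形状为 (B, C, H, W) 的特征图展开成4条一维序列,输出形状 (B, 4, C, H*W)。
    四个方向:行优先、列优先,以及二者的逆序。"""
    B, C, H, W = x.shape
    row = x.flatten(2)                              # (B, C, H*W) 行优先
    col = x.transpose(2, 3).flatten(2)              # (B, C, H*W) 列优先
    scans = torch.stack([row, col], dim=1)          # (B, 2, C, L)
    scans = torch.cat([scans, scans.flip(-1)], 1)   # 再加两条逆序 -> (B, 4, C, L)
    return scans

def cross_merge(scans: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """把4条序列按各自扫描顺序还原回 (B, C, H, W) 并求和,作为融合输出。"""
    B, K, C, L = scans.shape
    fwd = scans[:, :2] + scans[:, 2:].flip(-1)      # 逆序序列先翻转回正序 (B, 2, C, L)
    row = fwd[:, 0].view(B, C, H, W)
    col = fwd[:, 1].view(B, C, W, H).transpose(2, 3)
    return row + col

x = torch.randn(2, 96, 14, 14)
seqs = cross_scan(x)            # 每条序列可分别送入一维状态空间模型处理
out = cross_merge(seqs, 14, 14)
print(seqs.shape, out.shape)    # torch.Size([2, 4, 96, 196]) torch.Size([2, 96, 14, 14])
```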


标题: Hyperbolic Image-Text Representations

作者: Karan Desai, Maximilian Nickel, Tanmay Rajpurohit

中文摘要: 视觉和语言概念天然地组织成层次结构,例如文本概念“狗”蕴含所有包含狗的图像。尽管这很直观,但当前的大规模视觉语言模型(如CLIP)并没有显式地捕捉这种层次结构。我们提出了MERU,一个为图像和文本生成双曲表示的对比模型。双曲空间具有适合嵌入树状数据的几何性质,因此MERU可以更好地捕捉图像-文本数据集中潜在的层次结构。结果表明,MERU学到了高度可解释且结构化的表示空间,同时在图像分类、图文检索等标准多模态任务上与CLIP性能相当。我们的代码和模型可从 https://www.github.com/facebookresearch/meru 获得。

摘要: Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept “dog” entails all images that contain dogs. Despite being intuitive, current large-scale vision and language models such as CLIP do not explicitly capture such hierarchy. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP’s performance on standard multi-modal tasks like image classification and image-text retrieval. Our code and models are available at https://www.github.com/facebookresearch/meru

[Downlink:]http://arxiv.org/abs/2304.09172v3

[GitHub:]https://www.github.com/facebookresearch/meru


标题: VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

作者: Xianfu Cheng, Weixiao Zhou, Xiang Li

中文摘要: 场景文本识别(STR)是一项具有挑战性的任务,旨在识别自然场景图像中的文本。尽管当前最先进的STR模型性能很高,但由于依赖视觉编码器加序列解码器的混合架构,它们的推理效率通常较低。在这项工作中,我们提出了面向快速高效场景文本识别的视觉可置换提取器VIPTR,它在STR领域的高性能与快速推理之间取得了令人印象深刻的平衡。具体来说,VIPTR采用一个金字塔结构的视觉语义提取器,由多个自注意力层构成,并舍弃了传统的序列解码器。这一设计带来了一个轻量且高效、能够处理不同尺寸输入的模型。在中英文场景文本识别的多个标准数据集上的大量实验验证了VIPTR的优越性。值得注意的是,VIPTR-T(Tiny)变体在达到与其他轻量级模型相当的竞争力准确率的同时,取得了SOTA推理速度;VIPTR-L(Large)变体则在保持较低参数量和良好推理速度的同时获得了更高的识别精度。我们的方法兼顾高精度与高效率,为STR挑战提供了有说服力的解决方案,可极大地惠及需要快速可靠文本识别的实际应用。代码公开于 https://github.com/cxfyxl/VIPTR。

摘要: Scene Text Recognition (STR) is a challenging task that involves recognizing text within images of natural scenes. Although current state-of-the-art models for STR exhibit high performance, they typically suffer from low inference efficiency due to their reliance on hybrid architectures comprised of visual encoders and sequence decoders. In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR. Specifically, VIPTR leverages a visual-semantic extractor with a pyramid structure, characterized by multiple self-attention layers, while eschewing the traditional sequence decoder. This design choice results in a lightweight and efficient model capable of handling inputs of varying sizes. Extensive experimental results on various standard datasets for both Chinese and English scene text recognition validate the superiority of VIPTR. Notably, the VIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the VIPTR-L (Large) variant attains greater recognition accuracy, while maintaining a low parameter count and favorable inference speed. Our proposed method provides a compelling solution for the STR challenge, which blends high accuracy with efficiency and greatly benefits real-world applications requiring fast and reliable text recognition. The code is publicly available at https://github.com/cxfyxl/VIPTR.

[Downlink:]http://arxiv.org/abs/2401.10110v1

[GitHub:]https://github.com/cxfyxl/VIPTR


== diffusion policy@diffusion formulation@diffusion model ==

标题: A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting

作者: Wouter Van Gansbeke, Bert De Brabandere

中文摘要: 全景分割和实例分割网络通常需要借助专门的目标检测模块、复杂的损失函数和特设的后处理步骤来训练,以处理实例掩码的排列不变性。这项工作基于Stable Diffusion,提出了一种用于全景分割的潜在扩散方法,从而得到一个省去上述复杂组件的简单架构。我们的训练过程包含两步:(1)训练一个浅层自编码器,将分割掩码投影到潜空间;(2)训练一个扩散模型,在潜空间中进行以图像为条件的采样。生成模型的使用开启了对掩码补全(修复)的探索,这在交互式分割中很有应用价值。实验在全景分割和掩码修复两方面都取得了有希望的结果。虽然没有刷新最先进水平,但我们模型的简单性、通用性和掩码补全能力都是可取的特性。

摘要: Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to handle the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture which omits these complexities. Our training process consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. The use of a generative model unlocks the exploration of mask completion or inpainting, which has applications in interactive segmentation. The experimental validation yields promising results for both panoptic segmentation and mask inpainting. While not setting a new state-of-the-art, our model’s simplicity, generality, and mask completion capability are desirable properties.

[Downlink:]http://arxiv.org/abs/2401.10227v1

[GitHub:]https://github.com/segments-ai/latent-diffusion-segmentation
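
论文训练流程的第一步是“训练一个浅层自编码器把分割掩码投影到潜空间”。下面给出该步骤的一个极简示意(Python/PyTorch):通道数、下采样倍率和损失选择均为假设,并非作者的实际配置;第二步在潜空间训练扩散模型此处从略。

```python
import torch
import torch.nn as nn

class ShallowMaskAutoencoder(nn.Module):
    """把one-hot分割掩码 (B, K, H, W) 压缩到低分辨率潜空间再重建的浅层自编码器。"""
    def __init__(self, num_classes: int = 64, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(              # 两次stride-2卷积,分辨率降为1/4
            nn.Conv2d(num_classes, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_dim, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(              # 对称上采样回原分辨率
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, mask_onehot: torch.Tensor):
        z = self.encoder(mask_onehot)              # (B, latent_dim, H/4, W/4),供扩散模型使用
        logits = self.decoder(z)                   # (B, K, H, W),用交叉熵重建掩码
        return z, logits

# 训练示意:掩码以类别索引给出,先转one-hot
model = ShallowMaskAutoencoder()
mask = torch.randint(0, 64, (2, 128, 128))
onehot = torch.nn.functional.one_hot(mask, 64).permute(0, 3, 1, 2).float()
z, logits = model(onehot)
loss = torch.nn.functional.cross_entropy(logits, mask)
loss.backward()
```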


标题: Towards Language-Driven Video Inpainting via Multimodal Large Language Models

作者: Jianzong Wu, Xiangtai Li, Chenyang Si

中文摘要: 我们引入了一个新任务——语言驱动的视频修复,它使用自然语言指令来指导修复过程。这种方法克服了传统视频修复方法依赖人工标注二值掩码的局限性,该标注过程通常繁琐且费时费力。我们提出了“通过指令从视频中移除对象”(ROVI)数据集,包含5,650个视频和9,091个修复结果,用于支持该任务的训练和评估。我们还提出了一种新的基于扩散的语言驱动视频修复框架,这是该任务的第一个端到端基线,它集成了多模态大语言模型,以有效理解并执行复杂的基于语言的修复请求。综合实验结果展示了数据集的多功能性以及模型在各种语言指导的修复场景中的有效性。我们将公开数据集、代码和模型。

摘要: We introduce a new task – language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset’s versatility and the model’s effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.

[Downlink:]http://arxiv.org/abs/2401.10226v1

[Project:]https://jianzongwu.github.io/projects/rovi


标题: Hierarchical Masked 3D Diffusion Model for Video Outpainting

作者: Fanda Fan, Chaoxu Guo, Litong Gong

中文摘要: 视频外扩(outpainting)旨在合理补全视频帧边缘的缺失区域。与图像外扩相比,它面临一个额外挑战:模型需要保持填充区域的时间一致性。本文提出了一种用于视频外扩的掩码3D扩散模型。我们利用掩码建模技术训练3D扩散模型,从而可以用多个引导帧来衔接多个视频片段的推理结果,保证时间一致性并减少相邻帧之间的抖动。同时,我们提取视频的全局帧作为提示,通过交叉注意力引导模型获取当前视频片段之外的信息。我们还引入了一种混合式由粗到细的推理流水线,以缓解伪影累积问题。现有的由粗到细流水线仅采用填充(infilling)策略,由于稀疏帧之间的时间间隔过大而导致性能下降;我们的流水线得益于掩码建模的双向学习,在生成稀疏帧时可以混合使用填充与插值策略。实验表明,我们的方法在视频外扩任务中取得了最先进的结果。更多结果和代码见 https://fanfanda.github.io/M3DDM/。

摘要: Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling strategy, which brings degradation because the time interval of the sparse frames is too large. Our pipeline benefits from bidirectional learning of the mask modeling and thus can employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results in video outpainting tasks. More results and codes are provided at our https://fanfanda.github.io/M3DDM/.

[Downlink:]http://arxiv.org/abs/2309.02119v2

[Project:]https://fanfanda.github.io/M3DDM/


标题: FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

作者: Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang

摘要: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated convolution that elevates frequency awareness, resulting in generating speech with accurate frequency information, and (3) We introduce a bag of tricks that boosts the generation quality of the proposed model. In our experiments, FreGrad achieves 3.7 times faster training time and 2.2 times faster inference speed compared to our baseline while reducing the model size by 0.6 times (only 1.78M parameters) without sacrificing the output quality. Audio samples are available at: https://mm.kaist.ac.kr/projects/FreGrad.

[Downlink:]http://arxiv.org/abs/2401.10032v1

[Project:]https://mm.kaist.ac.kr/projects/FreGrad
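
FreGrad的第一个关键组件是用离散小波变换把复杂波形分解为子带小波,使扩散声码器在更简洁的特征空间上工作。下面用最简单的Haar小波演示一级分解与完美重建(Python/NumPy);FreGrad实际采用的小波类型与分解层数请以论文和官方实现为准。

```python
import numpy as np

def haar_dwt(x: np.ndarray):
    """一级Haar离散小波变换:把长度为偶数的一维信号分解为近似(低频)和细节(高频)两个子带。"""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # 低频子带,长度减半
    detail = (even - odd) / np.sqrt(2.0)   # 高频子带,长度减半
    return approx, detail

def haar_idwt(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Haar逆变换,由两个子带完美重建原信号。"""
    x = np.empty(approx.size * 2, dtype=approx.dtype)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

# 用法示意:对一段“波形”做分解,扩散模型可分别在各子带上建模
wave = np.sin(np.linspace(0, 8 * np.pi, 1024)).astype(np.float32)
low, high = haar_dwt(wave)
recon = haar_idwt(low, high)
print(low.shape, high.shape, np.allclose(recon, wave, atol=1e-6))  # (512,) (512,) True
```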


标题: BlenDA: Domain Adaptive Object Detection through diffusion-based blending

作者: Tzuhsuan Huang, Chen-Che Huang, Chung-Hao Ku

中文摘要: 无监督域自适应(UDA)旨在将利用源域标注数据训练得到的模型迁移到目标域的无标注数据上。为了解决源域与目标域之间差距过大的问题,我们提出了一种新的域自适应目标检测正则化方法BlenDA:生成中间域的伪样本及其对应的软域标签用于自适应训练。中间样本通过将源图像与其对应的翻译图像动态混合得到,翻译图像由现成的预训练文本到图像扩散模型生成,该模型以目标域的文本标签为输入,具有出色的图像到图像翻译质量。在两个自适应基准上的实验结果表明,我们的方法可以显著提升最先进的域自适应目标检测器——对抗查询Transformer(AQT)的性能。特别是在Cityscapes到Foggy Cityscapes的自适应中,我们在Foggy Cityscapes数据集上取得了53.4%的mAP,比此前的最先进结果高出1.5%。值得注意的是,我们的方法同样适用于各类域自适应目标检测范式。代码可从以下网址获得:https://github.com/aiiu-lab/BlenDA

摘要: Unsupervised domain adaptation (UDA) aims to transfer a model learned using labeled data from the source domain to unlabeled data in the target domain. To address the large domain gap issue between the source and target domains, we propose a novel regularization method for domain adaptive object detection, BlenDA, by generating the pseudo samples of the intermediate domains and their corresponding soft domain labels for adaptation training. The intermediate samples are generated by dynamically blending the source images with their corresponding translated images using an off-the-shelf pre-trained text-to-image diffusion model which takes the text label of the target domain as input and has demonstrated superior image-to-image translation quality. Based on experimental results from two adaptation benchmarks, our proposed approach can significantly enhance the performance of the state-of-the-art domain adaptive object detector, Adversarial Query Transformer (AQT). Particularly, in the Cityscapes to Foggy Cityscapes adaptation, we achieve an impressive 53.4% mAP on the Foggy Cityscapes dataset, surpassing the previous state-of-the-art by 1.5%. It is worth noting that our proposed method is also applicable to various paradigms of domain adaptive object detection. The code is available at:https://github.com/aiiu-lab/BlenDA

[Downlink:]http://arxiv.org/abs/2401.09921v1

[GitHub:]https://github.com/aiiu-lab/BlenDA
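
BlenDA的核心操作是把源域图像与其翻译图像(由现成的预训练文生图扩散模型按目标域文本标签得到)按动态比例混合,生成中间域伪样本及对应的软域标签。下面是该混合步骤的示意(Python/PyTorch);混合系数的采样方式为假设,翻译图像用随机张量占位。

```python
import torch

def blend_to_intermediate_domain(src_img, translated_img, alpha=None):
    """按混合系数alpha把源域图像与翻译后的(目标域风格)图像线性混合,
    返回中间域伪样本以及软域标签 [源域权重, 目标域权重]。"""
    if alpha is None:
        alpha = float(torch.rand(()))        # 动态采样混合系数(示意;论文中的调度方式请以原文为准)
    blended = (1.0 - alpha) * src_img + alpha * translated_img
    soft_domain_label = torch.tensor([1.0 - alpha, alpha])
    return blended, soft_domain_label

# 用法示意:translated_img 应来自以目标域文本标签为条件的图到图翻译结果,这里用随机张量占位
src = torch.rand(3, 512, 512)
translated = torch.rand(3, 512, 512)
x_mix, y_domain = blend_to_intermediate_domain(src, translated)
print(x_mix.shape, y_domain)
```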


标题: Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

作者: Zijie Pan, Jiachen Lu, Xiatian Zhu

中文摘要: 高分辨率3D对象生成仍然是一项具有挑战性的任务,主要原因是带完整标注的训练数据有限。近期的进展试图借助在大规模精选网络数据集上预训练的图像生成模型,通过分数蒸馏采样(SDS)等知识迁移技术来克服这一限制。要高效满足高分辨率渲染的要求,通常需要采用基于潜在表示的模型,例如潜在扩散模型(LDM)。在这一框架中出现了一个重大挑战:为了计算单个图像像素的梯度,必须让梯度从指定的潜空间反向传播经过图像模型中被冻结的组件,例如LDM中使用的VAE编码器。然而,这条梯度传播路径从未被优化,在训练期间处于不受控状态。我们发现,未经约束的梯度会削弱3D模型从图像生成模型获取纹理相关信息的能力,导致外观合成质量变差。为了解决这一总体挑战,我们提出了一种称为像素级梯度裁剪(PGC)的创新操作,它可以无缝集成到现有的3D生成模型中,从而提升其合成质量。具体来说,我们通过高效地裁剪像素级梯度来控制随机梯度的幅值,同时保留关键的纹理相关梯度方向。尽管方法简单且额外开销极小,大量实验仍证明了PGC在提升现有3D生成模型高分辨率对象渲染性能方面的有效性。

摘要: High-resolution 3D object generation remains a challenging task primarily due to the limited availability of comprehensive annotated training data. Recent advancements have aimed to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets, using knowledge transfer techniques like Score Distillation Sampling (SDS). Efficiently addressing the requirements of high-resolution rendering often necessitates the adoption of latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: To compute gradients for individual image pixels, it is necessary to backpropagate gradients from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized, remaining uncontrolled during training. We find that the unregulated gradients adversely affect the 3D model’s capacity in acquiring texture-related information from the image generative model, leading to poor quality appearance synthesis. To address this overarching challenge, we propose an innovative operation termed Pixel-wise Gradient Clipping (PGC) designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by clipping the pixel-wise gradients efficiently, while preserving crucial texture-related gradient directions. Despite this simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of our PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.

[Downlink:]http://arxiv.org/abs/2310.12474v4

[Project:]https://fudan-zvg.github.io/PGC-3D
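
像素级梯度裁剪(PGC)的思路是:在像素梯度反传回3D表示之前,逐像素限制其模长、保留其方向。下面给出一个可夹在“渲染图像→冻结的VAE编码器”之间的示意实现(Python/PyTorch);阈值大小与按通道维取范数等细节均为假设,并非官方代码。

```python
import torch

def pixelwise_gradient_clip(grad: torch.Tensor, max_norm: float = 0.1) -> torch.Tensor:
    """对形状为 (B, C, H, W) 的像素梯度逐像素裁剪:
    对每个空间位置计算其在通道维上的L2范数,超过max_norm时整体缩放,方向保持不变。"""
    pixel_norm = grad.norm(dim=1, keepdim=True)                  # (B, 1, H, W)
    scale = (max_norm / (pixel_norm + 1e-8)).clamp(max=1.0)      # 只缩小,不放大
    return grad * scale

class _PGCFunction(torch.autograd.Function):
    """前向恒等、反向裁剪像素梯度的autograd算子,可插入渲染图像与冻结编码器之间。"""
    @staticmethod
    def forward(ctx, img, max_norm):
        ctx.max_norm = max_norm
        return img.view_as(img)

    @staticmethod
    def backward(ctx, grad_output):
        return pixelwise_gradient_clip(grad_output, ctx.max_norm), None

# 用法示意:rendered是可微渲染得到的图像,后续本应送入冻结的LDM/VAE编码器计算SDS损失
rendered = torch.rand(1, 3, 64, 64, requires_grad=True)
clipped_path = _PGCFunction.apply(rendered, 0.1)
loss = (clipped_path ** 2).sum()     # 用一个简单的占位损失代替SDS
loss.backward()
print(rendered.grad.norm(dim=1).max() <= 0.1 + 1e-6)  # True:每个像素的梯度范数都被裁剪
```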


== Visual Navigation@Visual Exploration @ VSLAM ==

标题: SEINE: Structure Encoding and Interaction Network for Nuclei Instance Segmentation

作者: Ye Zhang, Linghan Cai, Ziyue Wang

中文摘要: 组织病理学图像中的细胞核实例分割对生物学分析和癌症诊断非常重要,但由于两方面原因仍具挑战性:(1)嫌色细胞核的核内与核外区域视觉表现相似,经常导致欠分割;(2)现有方法缺乏对细胞核结构的探索,导致实例预测碎片化。为了解决这些问题,本文提出了一种结构编码与交互网络SEINE,它设计了细胞核的结构建模方案,并利用细胞核之间的结构相似性来提升每个分割实例的完整性。具体来说,SEINE引入了一种基于轮廓的结构编码(SE),该编码考虑细胞核结构与语义之间的相关性,从而得到细胞核结构的合理表示。在此编码基础上,我们提出以清晰细胞核为原型的结构引导注意力(SGA),以增强模糊细胞核的结构学习。为进一步加强结构学习能力,我们提出语义特征融合(SFF)来提高语义分支与结构分支的语义一致性。此外,还应用了位置增强(PE)方法来抑制错误的核边界预测。大量实验证明了我们方法的优越性,SEINE在四个数据集上取得了最先进(SOTA)的性能。代码可从 https://github.com/zhangye-zoe/SEINE 获得。

摘要: Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis but remains challenging for two reasons. (1) Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods lack the exploration of nuclei structure, resulting in fragmented instance predictions. To address these problems, this paper proposes a structure encoding and interaction network, termed SEINE, which develops the structure modeling scheme of nuclei and exploits the structure similarity between nuclei to improve the integrality of each segmented instance. Concretely, SEINE introduces a contour-based structure encoding (SE) that considers the correlation between nuclei structure and semantics, realizing a reasonable representation of the nuclei structure. Based on the encoding, we propose a structure-guided attention (SGA) that takes the clear nuclei as prototypes to enhance the structure learning for the fuzzy nuclei. To strengthen the structural learning ability, a semantic feature fusion (SFF) is presented to boost the semantic consistency of semantic and structure branches. Furthermore, a position enhancement (PE) method is applied to suppress incorrect nuclei boundary predictions. Extensive experiments demonstrate the superiority of our approaches, and SEINE achieves state-of-the-art (SOTA) performance on four datasets. The code is available at \href{https://github.com/zhangye-zoe/SEINE}{https://github.com/zhangye-zoe/SEINE}.

[Downlink:]http://arxiv.org/abs/2401.09773v1

[GitHub:]https://github.com/zhangye-zoe/SEINE


标题: SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

作者: Yang Zhan, Zhitong Xiong, Yuan Yuan

中文摘要: 大型语言模型(LLMs)最近被扩展到视觉语言领域,获得了令人印象深刻的通用多模态能力。然而,面向遥感(RS)数据的多模态大语言模型(MLLMs)的探索仍处于起步阶段,性能尚不令人满意。在这项工作中,我们提出了SkyEyeGPT,一个专为遥感视觉语言理解设计的统一多模态大语言模型。为此,我们精心构建了一个遥感多模态指令微调数据集,包含单任务和多任务对话指令;经人工校验后,得到一个含968k样本的高质量遥感指令遵循数据集。我们的研究表明,凭借简单而有效的设计,SkyEyeGPT无需额外的编码模块即可在差异很大的任务上表现出色。具体来说,遥感视觉特征经对齐层投影到语言域后,与任务特定指令一起输入基于LLM的遥感解码器,以预测遥感开放式任务的答案。此外,我们设计了一种两阶段微调方法,以在不同粒度上增强指令遵循和多轮对话能力。在8个遥感视觉语言任务数据集上的实验证明了SkyEyeGPT在图像级和区域级任务(如图像描述和视觉定位)上的优势;特别地,在一些定性测试中,SkyEyeGPT相较GPT-4V表现出令人鼓舞的结果。在线演示、代码和数据集将在 https://github.com/ZhanYang-nwpu/SkyEyeGPT 发布。

摘要: Large language models (LLMs) have recently been extended to the vision-language realm, obtaining impressive general multi-modal capabilities. However, the exploration of multi-modal large language models (MLLMs) for remote sensing (RS) data is still in its infancy, and the performance is not satisfactory. In this work, we introduce SkyEyeGPT, a unified multi-modal large language model specifically designed for RS vision-language understanding. To this end, we meticulously curate an RS multi-modal instruction tuning dataset, including single-task and multi-task conversation instructions. After manual verification, we obtain a high-quality RS instruction-following dataset with 968k samples. Our research demonstrates that with a simple yet effective design, SkyEyeGPT works surprisingly well on considerably different tasks without the need for extra encoding modules. Specifically, after projecting RS visual features to the language domain via an alignment layer, they are fed jointly with task-specific instructions into an LLM-based RS decoder to predict answers for RS open-ended tasks. In addition, we design a two-stage tuning method to enhance instruction-following and multi-turn dialogue ability at different granularities. Experiments on 8 datasets for RS vision-language tasks demonstrate SkyEyeGPT’s superiority in image-level and region-level tasks, such as captioning and visual grounding. In particular, SkyEyeGPT exhibits encouraging results compared to GPT-4V in some qualitative tests. The online demo, code, and dataset will be released in https://github.com/ZhanYang-nwpu/SkyEyeGPT.

[Downlink:]http://arxiv.org/abs/2401.09712v1

[GitHub:]https://github.com/ZhanYang-nwpu/SkyEyeGPT


标题: Cross-Modality Perturbation Synergy Attack for Person Re-identification

作者: Yunpeng Gong, others

中文摘要: 近年来,有大量研究专注于解决基于RGB图像的单模态人员再识别(ReID)系统中的安全问题。然而,在涉及红外摄像机捕获的图像的实际应用中更常见的跨模态场景的安全性没有得到足够的重视。跨模态ReID的主要挑战在于有效地处理不同模态之间的视觉差异。例如,红外图像通常是灰度的,不像可见光图像包含颜色信息。现有的攻击方法主要集中在可见图像模态的特征上,而忽略了其他模态的特征以及不同模态之间数据分布的差异。这种疏忽可能会潜在地破坏这些方法在跨不同模态的图像检索中的有效性。这项研究代表了对跨模态ReID模型安全性的首次探索,并提出了一种专门为跨模态ReID设计的通用扰动攻击。这种攻击通过利用来自不同模态数据的梯度来优化扰动,从而破坏鉴别器并加强模态之间的差异。我们在两个广泛使用的跨模态数据集上进行了实验,即RegDB和SYSU,这不仅证明了我们方法的有效性,而且为未来跨模态ReID系统的鲁棒性增强提供了见解。

摘要: In recent years, there has been significant research focusing on addressing security concerns in single-modal person re-identification (ReID) systems that are based on RGB images. However, the safety of cross-modality scenarios, which are more commonly encountered in practical applications involving images captured by infrared cameras, has not received adequate attention. The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities. For instance, infrared images are typically grayscale, unlike visible images that contain color information. Existing attack methods have primarily focused on the characteristics of the visible image modality, overlooking the features of other modalities and the variations in data distribution among different modalities. This oversight can potentially undermine the effectiveness of these methods in image retrieval across diverse modalities. This study represents the first exploration into the security of cross-modality ReID models and proposes a universal perturbation attack specifically designed for cross-modality ReID. This attack optimizes perturbations by leveraging gradients from diverse modality data, thereby disrupting the discriminator and reinforcing the differences between modalities. We conducted experiments on two widely used cross-modality datasets, namely RegDB and SYSU, which not only demonstrated the effectiveness of our method but also provided insights for future enhancements in the robustness of cross-modality ReID systems.

[Downlink:]http://arxiv.org/abs/2401.10090v1


标题: A Semantic Approach for Big Data Exploration in Industry 4.0

作者: Idoia Berges, Víctor Julio Ramírez-Durán, Arantza Illarramendi

中文摘要: 自动化、物联网、大数据和云计算技术的增长趋势导致了第四次工业革命(工业4.0),在这场革命中,可以可视化和识别模式和见解,从而更好地理解数据,并可以改善制造过程。然而,很多时候,数据探索的任务对制造专家来说很困难,因为他们可能对分析预先设计的可视化中没有出现的数据感兴趣,因此他们必须得到信息技术专家的帮助。在本文中,我们提出了一个基于语义的可视化查询系统,该系统是为真实的工业4.0场景开发的,允许领域专家以友好的方式探索和可视化数据。该系统的主要新颖之处在于它结合使用了首先进行语义注释的捕获数据,以及还与语义描述相关联的机器的2D定制数字表示。这些描述使用本体的术语来表达,其中,除其他外,用于捕获属于工业4.0场景的机器的性能指标的传感器已经被建模。此外,这种语义描述允许:在更高的抽象层次上制定查询,基于数据的格式和性质提供结果的定制图形可视化,以及下载丰富的数据以支持进一步类型的分析。

摘要: The growing trends in automation, Internet of Things, big data and cloud computing technologies have led to the fourth industrial revolution (Industry 4.0), where it is possible to visualize and identify patterns and insights, which results in a better understanding of the data and can improve the manufacturing process. However, many times, the task of data exploration results difficult for manufacturing experts because they might be interested in analyzing also data that does not appear in pre-designed visualizations and therefore they must be assisted by Information Technology experts. In this paper, we present a proposal materialized in a semantic-based visual query system developed for a real Industry 4.0 scenario that allows domain experts to explore and visualize data in a friendly way. The main novelty of the system is the combined use that it makes of captured data that are semantically annotated first, and a 2D customized digital representation of a machine that is also linked with semantic descriptions. Those descriptions are expressed using terms of an ontology, where, among others, the sensors that are used to capture indicators about the performance of a machine that belongs to a Industry 4.0 scenario have been modeled. Moreover, this semantic description allows to: formulate queries at a higher level of abstraction, provide customized graphical visualizations of the results based on the format and nature of the data, and download enriched data enabling further types of analysis.

[Downlink:]http://arxiv.org/abs/2401.09789v1


标题: Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box Method

作者: Chenxi Yang, Yujia Liu, Dingquan Li

中文摘要: 无参考图像质量评估(NR-IQA)旨在在不依赖原始参考图像的情况下,预测与人类感知一致的图像质量分数,是各类视觉任务中的关键组件。确保NR-IQA方法的鲁棒性,对于可靠地比较不同图像处理技术以及在推荐中提供一致的用户体验至关重要。针对NR-IQA的攻击方法为测试其鲁棒性提供了有力工具。然而,现有的NR-IQA攻击方法严重依赖NR-IQA模型的梯度,在梯度信息不可得时便受到限制。本文提出了一种开创性的、基于查询的针对NR-IQA方法的黑盒攻击。我们提出了分数边界的概念,并利用带多个分数边界的自适应迭代方法;同时,初始攻击方向的设计利用了人类视觉系统(HVS)的特性。实验表明,我们的方法优于所有参与比较的最先进攻击方法,并大幅领先此前的黑盒方法。在我们的攻击下,性能良好的NR-IQA模型DBCNN的Spearman等级相关系数(SROCC)下降了0.6381,揭示了NR-IQA模型在黑盒攻击面前的脆弱性。所提出的攻击方法也为进一步探究NR-IQA的鲁棒性提供了有力工具。

摘要: No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman’s rank-order correlation coefficient (SROCC) decline of 0.6381 attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness.

[Downlink:]http://arxiv.org/abs/2401.05217v2
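
为帮助理解“基于查询的黑盒攻击”的基本流程,下面给出一个随机搜索式的通用示意(Python/NumPy):只通过查询模型得分迭代更新扰动,不使用任何梯度。注意这只是通用框架的示意,并非论文中基于分数边界与HVS初始方向的具体算法;score_fn等名称为假设。

```python
import numpy as np

def query_based_attack(image: np.ndarray, score_fn, eps: float = 8 / 255,
                       steps: int = 200, seed: int = 0) -> np.ndarray:
    """随机搜索式黑盒攻击示意:仅通过查询score_fn(无梯度)使预测的质量分尽量下降。
    每步提出一个新的扰动候选,只有当查询到的分数更低时才接受。"""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(image)
    best_score = score_fn(np.clip(image + delta, 0.0, 1.0))
    for _ in range(steps):
        candidate = np.clip(delta + rng.normal(0, eps / 4, size=image.shape), -eps, eps)
        score = score_fn(np.clip(image + candidate, 0.0, 1.0))   # 每个候选消耗一次查询
        if score < best_score:                                   # 分数被压得更低则接受
            best_score, delta = score, candidate
    return np.clip(image + delta, 0.0, 1.0)

# 用法示意:score_fn应替换为待测NR-IQA模型的打分接口,这里用一个虚构的代理函数
fake_scorer = lambda img: float(img.mean() * 100)
img = np.random.rand(3, 224, 224).astype(np.float32)
adv = query_based_attack(img, fake_scorer)
print(fake_scorer(img), fake_scorer(adv))
```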

