[晓理紫]每日论文分享(有中文摘要,源码或项目地址)--机器人、强化学习

专属领域论文订阅

关注{晓理紫|小李子},每日更新论文,如感兴趣,请转发给有需要的同学,谢谢支持

如果你觉得本文对你有所帮助,请关注我,每日准时为你推送最新论文。


分类:

  • 大语言模型LLM
  • 视觉模型VLM
  • 扩散模型
  • 视觉导航
  • 具身智能,机器人
  • 强化学习
  • 开放词汇,检测分割

== Robotic Agent ==

标题: Workspace Optimization Techniques to Improve Prediction of Human Motion During Human-Robot Collaboration

作者: Yi-Shiuan Tung, Matthew B. Luebbers, Alessandro Roncone

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12965v1

中文摘要: 理解人类的意图对于安全有效的人机协作至关重要。虽然当前最先进的人类目标预测方法利用学习模型来处理人类运动数据的不确定性,但这类数据本质上是随机且高方差的,限制了这些模型在需要协调的交互(包括安全关键或近距离任务)中的效用。我们的关键见解是:机器人队友可以在交互之前有意地布置共享工作空间,以减少人类运动的方差,从而获得与分类器无关的目标预测改进。在这项工作中,我们提出了一种算法方法,让机器人在共享的人机工作空间中布置实体物体,并利用增强现实投影"虚拟障碍",针对给定任务集优化人类运动的易读性(legibility)。我们通过两项人类受试者研究,将我们的方法与其他工作空间布置策略进行了比较:一项在虚拟2D导航领域,另一项在涉及机械臂的真实桌面操作领域。我们评估了从每种条件下学习到的人类运动预测模型的准确性,结果表明,我们基于虚拟障碍的工作空间优化技术能用更少的训练数据获得更高的机器人预测准确率。

摘要: Understanding human intentions is critical for safe and effective human-robot collaboration. While state of the art methods for human goal prediction utilize learned models to account for the uncertainty of human motion data, that data is inherently stochastic and high variance, hindering those models’ utility for interactions requiring coordination, including safety-critical or close-proximity tasks. Our key insight is that robot teammates can deliberately configure shared workspaces prior to interaction in order to reduce the variance in human motion, realizing classifier-agnostic improvements in goal prediction. In this work, we present an algorithmic approach for a robot to arrange physical objects and project “virtual obstacles” using augmented reality in shared human-robot workspaces, optimizing for human legibility over a given set of tasks. We compare our approach against other workspace arrangement strategies using two human-subjects studies, one in a virtual 2D navigation domain and the other in a live tabletop manipulation domain involving a robotic manipulator arm. We evaluate the accuracy of human motion prediction models learned from each condition, demonstrating that our workspace optimization technique with virtual obstacles leads to higher robot prediction accuracy using less training data.
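
下面给出一个极简的示意代码(非论文原始算法):把"布置虚拟障碍使通往不同目标的路径更容易区分"这一核心直觉写成可运行的小例子,用网格图上最短路径的重叠程度近似"易读性",并在少量候选障碍布置中挑选得分最高的一种。其中 `legibility_score`、候选布置等均为假设性示意,仅用于说明思路。

```python
import itertools
import networkx as nx

def path_overlap(p1, p2):
    """两条路径共享节点的比例(Jaccard),越低表示目标越容易区分。"""
    s1, s2 = set(p1), set(p2)
    return len(s1 & s2) / len(s1 | s2)

def legibility_score(grid_size, start, goals, obstacles):
    """加入虚拟障碍后,计算通往不同目标的最短路径之间的平均可区分度。"""
    G = nx.grid_2d_graph(grid_size, grid_size)
    G.remove_nodes_from(obstacles)           # 虚拟障碍:直接从网格中删除对应格子
    try:
        paths = [nx.shortest_path(G, start, g) for g in goals]
    except nx.NetworkXNoPath:
        return float("-inf")                 # 障碍阻断了某个目标,排除该布置
    pairs = list(itertools.combinations(paths, 2))
    return 1.0 - sum(path_overlap(a, b) for a, b in pairs) / len(pairs)

# 穷举少量候选障碍布置,选出使路径最易区分(即运动最"易读")的一种
start, goals = (0, 0), [(7, 2), (7, 7), (2, 7)]
candidates = [frozenset(), frozenset({(3, 3)}), frozenset({(3, 3), (4, 4)}),
              frozenset({(2, 2), (5, 5)})]
best = max(candidates, key=lambda obs: legibility_score(8, start, goals, obs))
print("最优虚拟障碍布置:", sorted(best))
```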


标题: Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected Improvement

作者: Nikolaus Feith, Elmar Rueckert

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12662v1

中文摘要: 交互式机器学习(IML)寻求将人类专业知识整合到机器学习过程中。然而,大多数现有算法不能应用于现实世界场景,因为它们的状态空间和/或动作空间被限制为离散值。此外,所有现有方法的交互仅限于在多个方案之间做出选择。因此,我们提出了一种基于贝叶斯优化(BO)的新框架。交互式贝叶斯优化(IBO)支持机器学习算法和人类之间的协作。该框架捕获用户偏好,并为用户提供一个手动制定策略的界面。此外,我们还引入了一个新的采集函数——偏好期望改进(Preference Expected Improvement,PEI),利用用户偏好的概率模型来提高系统效率。我们的方法旨在确保机器能够从人类的专业知识中受益,从而实现更加一致和有效的学习过程。在这项工作中,我们将该方法应用于仿真任务和使用Franka Panda机器人的真实任务,以展示人机协作。

摘要: Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes. However, most existing algorithms cannot be applied to Realworld Scenarios because their state spaces and/or action spaces are limited to discrete values. Furthermore, the interaction of all existing methods is restricted to deciding between multiple proposals. We therefore propose a novel framework based on Bayesian Optimization (BO). Interactive Bayesian Optimization (IBO) enables collaboration between machine learning algorithms and humans. This framework captures user preferences and provides an interface for users to shape the strategy by hand. Additionally, we’ve incorporated a new acquisition function, Preference Expected Improvement (PEI), to refine the system’s efficiency using a probabilistic model of the user preferences. Our approach is geared towards ensuring that machines can benefit from human expertise, aiming for a more aligned and effective learning process. In the course of this work, we applied our method to simulations and in a real world task using a Franka Panda robot to show human-robot collaboration.
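
下面是一个带偏好加权的贝叶斯优化循环的简化示意(基于对 PEI 的一种假设性简化:用标准期望改进 EI 乘以用户偏好概率;论文中 PEI 的确切定义请以原文为准)。其中 `preference_prob` 的偏好模型、目标函数等均为示意性假设。

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """标准 EI(最大化问题)。"""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def preference_prob(X, pref_center, length=0.3):
    """用户偏好的概率模型(此处用以偏好点为中心的 RBF 作占位,真实系统应由交互数据学习)。"""
    d2 = np.sum((X - pref_center) ** 2, axis=1)
    return np.exp(-d2 / (2 * length ** 2))

def objective(x):                      # 待优化的黑盒目标(示例)
    return -np.sum((x - 0.7) ** 2, axis=-1)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 2))     # 初始样本
y = objective(X)
pref_center = np.array([0.6, 0.8])     # 假设已从用户处引出的偏好区域

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    pei = expected_improvement(mu, sigma, y.max()) * preference_prob(cand, pref_center)
    x_next = cand[np.argmax(pei)]      # "偏好期望改进":EI 乘以偏好概率(对论文 PEI 的简化假设)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("最优点估计:", X[np.argmax(y)])
```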


标题: Modeling Resilience of Collaborative AI Systems

作者: Diaeddin Rimawi, Antonio Liotta, Marco Todescato

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12632v1

中文摘要: 协作人工智能系统(CAIS)与人类协作执行动作,以实现共同的目标。CAIS可以使用训练好的AI模型来控制人机交互,也可以利用人类交互以在线方式动态地向人类学习。在带人类反馈的在线学习中,AI模型在学习状态下通过系统传感器监测人类交互而不断演化,并在运行状态下基于所学内容驱动CAIS的自主组件。因此,任何影响这些传感器的破坏性事件都可能影响AI模型做出准确决策的能力,并降低CAIS的性能。因此,对CAIS管理者来说,能够自动跟踪系统性能、了解CAIS在此类破坏性事件下的韧性至关重要。在本文中,我们提供了一个新的框架,用于对系统经历破坏性事件时的CAIS性能进行建模。在此框架下,我们引入了一个CAIS性能演化模型,该模型配备了一组度量,旨在支持CAIS管理者在决策过程中实现系统所需的韧性。我们在一个真实案例研究中测试了该框架:一个机器人在系统经历破坏性事件时与人类在线协作。案例研究表明,我们的框架可以在CAIS中采用,并集成到CAIS活动的在线执行中。

摘要: A Collaborative Artificial Intelligence System (CAIS) performs actions in collaboration with the human to achieve a common goal. CAISs can use a trained AI model to control human-system interaction, or they can use human interaction to dynamically learn from humans in an online fashion. In online learning with human feedback, the AI model evolves by monitoring human interaction through the system sensors in the learning state, and actuates the autonomous components of the CAIS based on the learning in the operational state. Therefore, any disruptive event affecting these sensors may affect the AI model’s ability to make accurate decisions and degrade the CAIS performance. Consequently, it is of paramount importance for CAIS managers to be able to automatically track the system performance to understand the resilience of the CAIS upon such disruptive events. In this paper, we provide a new framework to model CAIS performance when the system experiences a disruptive event. With our framework, we introduce a model of performance evolution of CAIS. The model is equipped with a set of measures that aim to support CAIS managers in the decision process to achieve the required resilience of the system. We tested our framework on a real-world case study of a robot collaborating online with the human, when the system is experiencing a disruptive event. The case study shows that our framework can be adopted in CAIS and integrated into the online execution of the CAIS activities.
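
下面给出从性能时间序列中计算常见韧性指标的通用示意代码。注意:这里的指标定义(最大性能损失、恢复时间、面积型韧性)是韧性文献中的常见写法,并非论文提出的那套具体度量,仅用于说明"跟踪破坏性事件前后系统性能演化"的思路。

```python
import numpy as np

def resilience_metrics(perf, t_disrupt, baseline=None, tol=0.95):
    """从性能时间序列中提取常见的韧性指标(通用定义,非论文原始度量)。
    perf: 每个时间步的系统性能; t_disrupt: 破坏性事件发生的时间步。"""
    perf = np.asarray(perf, dtype=float)
    if baseline is None:
        baseline = perf[:t_disrupt].mean()          # 事件前的基准性能
    after = perf[t_disrupt:]
    max_drop = baseline - after.min()               # 最大性能损失
    recovered = np.nonzero(after >= tol * baseline)[0]
    time_to_recover = int(recovered[0]) if recovered.size else None
    # 面积型韧性:事件后实际性能与基准性能之比的平均值(越接近 1 韧性越好)
    area_resilience = float(np.clip(after, 0, None).sum() / (baseline * after.size))
    return {"baseline": baseline, "max_drop": max_drop,
            "time_to_recover": time_to_recover, "area_resilience": area_resilience}

# 示例:第 10 步发生传感器故障,性能骤降后逐渐恢复
perf = [1.0] * 10 + [0.4, 0.5, 0.65, 0.8, 0.92, 0.97, 1.0, 1.0]
print(resilience_metrics(perf, t_disrupt=10))
```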


标题: Chat Failures and Troubles: Reasons and Solutions

作者: Manal Helal, Patrick Holthaus, Gabriella Lakatos

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2309.03708v2

中文摘要: 本文研究了人机交互(HRI)中导致聊天失败和故障的一些常见问题。针对给定用例的设计决策包括:选择合适的机器人和聊天模型、识别导致故障的常见问题、确定潜在的解决方案,以及规划持续改进。总之,建议使用闭环控制算法来指导预训练人工智能(AI)模型的使用,并结合词汇过滤、在新数据集上批量重新训练模型、从数据流中在线学习,和/或使用强化学习模型来自我更新已训练的模型,从而减少错误。

摘要: This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case’s design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.
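
下面用一个极简的 Python 草图示意文中建议的闭环模式:生成回复、做词汇过滤、把失败案例记录下来供后续重训或在线学习。其中 `generate_reply` 只是占位函数,并非任何真实对话模型的 API。

```python
BLOCKLIST = {"damn", "shut up"}           # 需要过滤的词表(示例)
failure_log = []                          # 失败样本留待批量重训或在线学习

def generate_reply(user_msg):
    """占位函数:实际系统中由预训练对话模型(LLM)生成回复。"""
    return "I did not understand that." if len(user_msg) < 3 else f"You said: {user_msg}"

def vocabulary_filter(text):
    return not any(bad in text.lower() for bad in BLOCKLIST)

def closed_loop_chat(user_msg, max_retries=2):
    """闭环控制:生成 -> 校验 -> 不合格则记录并重试/降级。"""
    for _ in range(max_retries + 1):
        reply = generate_reply(user_msg)
        if vocabulary_filter(reply) and "did not understand" not in reply:
            return reply
        failure_log.append((user_msg, reply))     # 记录失败案例,供重训或 RL 自更新使用
    return "Sorry, could you rephrase that?"      # 安全降级回复

print(closed_loop_chat("Hello robot"))
print(closed_loop_chat("hi"))
print("failures:", failure_log)
```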


标题: Self context-aware emotion perception on human-robot interaction

作者: Zihan Lin, Francisco Cruz, Eduardo Benitez Sandoval

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2401.10946v1

中文摘要: 情感识别在人机交互的各个领域都起着至关重要的作用。在与人类的长期交互中,机器人需要持续且准确地做出反应;然而,主流的情绪识别方法大多侧重于短期情绪识别,而忽视了情绪被感知时所处的上下文。人类在感知情绪时会考虑上下文信息,不同的上下文可能导致完全不同的情绪表达。在本文中,我们介绍了自我上下文感知模型(SCAM),该模型采用二维情绪坐标系来锚定并重新标注不同的情绪,同时结合了其独特的信息保留结构和上下文损失。该方法在音频、视觉和多模态设置上均带来了显著提升:听觉模态的准确率从63.10%上升到72.46%,视觉模态的准确率从77.03%提高到80.82%,多模态的准确率从77.48%上升到78.93%。未来,我们将通过心理学实验在机器人上验证SCAM的可靠性和可用性。

摘要: Emotion recognition plays a crucial role in various domains of human-robot interaction. In long-term interactions with humans, robots need to respond continuously and accurately, however, the mainstream emotion recognition methods mostly focus on short-term emotion recognition, disregarding the context in which emotions are perceived. Humans consider that contextual information and different contexts can lead to completely different emotional expressions. In this paper, we introduce self context-aware model (SCAM) that employs a two-dimensional emotion coordinate system for anchoring and re-labeling distinct emotions. Simultaneously, it incorporates its distinctive information retention structure and contextual loss. This approach has yielded significant improvements across audio, video, and multimodal. In the auditory modality, there has been a notable enhancement in accuracy, rising from 63.10% to 72.46%. Similarly, the visual modality has demonstrated improved accuracy, increasing from 77.03% to 80.82%. In the multimodal, accuracy has experienced an elevation from 77.48% to 78.93%. In the future, we will validate the reliability and usability of SCAM on robots through psychology experiments.
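
下面是"在二维情绪坐标系中锚定并重新标注情绪"这一思路的极简示意。坐标轴按常见的效价-唤醒(valence-arousal)空间假设,锚点数值为虚构示例,并非论文给定的参数。

```python
import numpy as np

# 假设的二维情绪坐标(valence, arousal),仅作示意,非论文给定的锚点
ANCHORS = {
    "happy":   ( 0.8,  0.5),
    "angry":   (-0.6,  0.8),
    "sad":     (-0.7, -0.5),
    "calm":    ( 0.4, -0.6),
    "neutral": ( 0.0,  0.0),
}

def relabel(valence, arousal):
    """把连续的情绪坐标重新锚定到最近的离散情绪标签。"""
    p = np.array([valence, arousal])
    dists = {name: np.linalg.norm(p - np.array(xy)) for name, xy in ANCHORS.items()}
    return min(dists, key=dists.get)

# 例:同一段语音在不同上下文下得到的坐标,可能被重新标注为不同情绪
print(relabel(0.7, 0.4))    # -> happy
print(relabel(-0.1, 0.1))   # -> neutral
```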


== Reinforcement Learning ==

标题: HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

作者: Qinhong Zhou, Sunli Chen, Yisong Wang

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12975v1

Project: https://vis-www.cs.umass.edu/hazard/.|

中文摘要: 高保真虚拟环境的最新进展是构建能够感知、推理并与物理世界交互的具身智能体的主要驱动力之一。通常,这些环境除非智能体与之交互,否则保持不变。然而,在现实场景中,智能体还可能面临以意外事件为特征、动态变化的环境,并且需要迅速做出相应的行动。为了弥补这一差距,我们提出了一个新的模拟具身基准,称为HAZARD,专门用于评估具身智能体在动态情况下的决策能力。HAZARD由火灾、洪水和大风三种意外灾难场景组成,并特别支持利用大型语言模型(LLMs)来辅助常识推理和决策。该基准使我们能够评估自主智能体在强化学习(RL)、基于规则和基于搜索等多种管线下,于动态变化环境中的决策能力。作为使用大型语言模型应对这一挑战的第一步,我们进一步开发了一个基于LLM的智能体,并对其解决这些挑战性任务的前景和难点进行了深入分析。HAZARD可在 https://vis-www.cs.umass.edu/hazard/ 获取。

摘要: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind, and specifically supports the utilization of large language models (LLMs) to assist common sense reasoning and decision-making. This benchmark enables us to evaluate autonomous agents’ decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods in dynamically changing environments. As a first step toward addressing this challenge using large language models, we further develop an LLM-based agent and perform an in-depth analysis of its promise and challenge of solving these challenging tasks. HAZARD is available at https://vis-www.cs.umass.edu/hazard/.


标题: Personalized Algorithmic Recourse with Preference Elicitation

作者: Giovanni De Toni, Paolo Viappiani, Stefano Teso

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2205.13743v5

Project: https://openreview.net/forum?id=8sg2I9zXgO|

中文摘要: 算法追索(AR)是计算一系列动作的问题,用户执行这些动作后,即可推翻不希望出现的机器决策。至关重要的是,该动作序列不应要求用户付出过多努力。然而,大多数AR方法假设所有用户的动作成本相同,因此可能会向某些用户推荐代价过高、有失公平的追索方案。基于这一观察,我们提出了PEAR,这是第一个能够根据任何最终用户的需求提供个性化算法追索的人在回路方法。PEAR借鉴贝叶斯偏好引出(Bayesian Preference Elicitation)的思想,通过向目标用户提出选择集查询,迭代地细化对动作成本的估计。查询本身通过最大化"选择的期望效用"(Expected Utility of Selection)来计算,这是一种同时考虑成本估计和用户响应不确定性的原则性信息增益度量。PEAR将偏好引出集成到结合蒙特卡洛树搜索的强化学习智能体中,以快速找到有希望的追索方案。我们在真实数据集上的实证评估表明,PEAR只需少量迭代即可产生高质量的个性化追索方案。

摘要: Algorithmic Recourse (AR) is the problem of computing a sequence of actions that – once performed by a user – overturns an undesirable machine decision. It is paramount that the sequence of actions does not require too much effort for users to implement. Yet, most approaches to AR assume that actions cost the same for all users, and thus may recommend unfairly expensive recourse plans to certain users. Prompted by this observation, we introduce PEAR, the first human-in-the-loop approach capable of providing personalized algorithmic recourse tailored to the needs of any end-user. PEAR builds on insights from Bayesian Preference Elicitation to iteratively refine an estimate of the costs of actions by asking choice set queries to the target user. The queries themselves are computed by maximizing the Expected Utility of Selection, a principled measure of information gain accounting for uncertainty on both the cost estimate and the user’s responses. PEAR integrates elicitation into a Reinforcement Learning agent coupled with Monte Carlo Tree Search to quickly identify promising recourse plans. Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations.
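
下面是"选择集查询 + 选择的期望效用(EUS)"这一思路的简化示意:用粒子近似用户成本权重的后验,EUS 取为在该后验下用户从候选集中选到最优方案的期望效用,然后在少量候选集中挑 EUS 最大的一组作为下一个查询。方案、效用形式与粒子更新方式均为示意性假设,并非论文原始实现。

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# 对用户动作成本权重的后验用粒子近似(真实系统中应由贝叶斯偏好引出不断更新)
weight_particles = rng.dirichlet(np.ones(3), size=500)      # 500 个候选权重向量

# 候选追索方案:每行是各类动作的使用量,效用 = 收益 - 加权成本(此处均为假设)
plans = np.array([[1, 0, 2],
                  [0, 2, 1],
                  [2, 1, 0],
                  [1, 1, 1]], dtype=float)
benefit = np.array([3.0, 2.5, 2.8, 2.6])

def utilities(w):
    return benefit - plans @ w                               # 每个方案在权重 w 下的效用

def expected_utility_of_selection(choice_set):
    """EUS:在后验权重下,用户从候选集中选到的最优方案的期望效用。"""
    u = np.array([utilities(w)[list(choice_set)] for w in weight_particles])
    return u.max(axis=1).mean()

# 在所有大小为 2 的候选集里,挑 EUS 最大的那组作为下一次向用户提出的选择查询
best_query = max(itertools.combinations(range(len(plans)), 2),
                 key=expected_utility_of_selection)
print("下一次选择集查询:方案", best_query)
```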


标题: DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity

作者: Kang-Won Lee, Yuzhe Qin, Xiaolong Wang

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12496v1

Project: https://lee-kangwon.github.io/dextouch/|

中文摘要: 触觉是熟练执行各种任务的基本能力,提供了在不依赖视觉信息的情况下搜索和操纵对象的能力。随着时间的推移,人们进行了广泛的研究,将这些人类触觉能力应用于机器人。在本文中,我们介绍了一个多指机器人系统,该系统旨在利用触觉搜索和操纵物体,而不依赖于视觉信息。使用触觉传感器搜索随机定位的目标物体,并操纵这些物体完成模拟日常生活的任务。这项研究的目的是赋予机器人类似人类的触觉能力。为了实现这一点,在机器人手的一侧实现了二元触觉传感器,以最小化Sim2Real间隙。通过模拟中的强化学习来训练策略,并将训练好的策略转移到真实环境中,我们证明了即使在没有视觉信息的环境中,使用触觉传感器进行对象搜索和操纵也是可能的。此外,还进行了一项消融研究,以分析触觉信息对操作任务的影响。我们的项目页面可在https://lee-kangwon.github.io/dextouch/

摘要: The sense of touch is an essential ability for skillfully performing a variety of tasks, providing the capacity to search and manipulate objects without relying on visual information. Extensive research has been conducted over time to apply these human tactile abilities to robots. In this paper, we introduce a multi-finger robot system designed to search for and manipulate objects using the sense of touch without relying on visual information. Randomly located target objects are searched using tactile sensors, and the objects are manipulated for tasks that mimic daily-life. The objective of the study is to endow robots with human-like tactile capabilities. To achieve this, binary tactile sensors are implemented on one side of the robot hand to minimize the Sim2Real gap. Training the policy through reinforcement learning in simulation and transferring the trained policy to the real environment, we demonstrate that object search and manipulation using tactile sensors is possible even in an environment without vision information. In addition, an ablation study was conducted to analyze the effect of tactile information on manipulative tasks. Our project page is available at https://lee-kangwon.github.io/dextouch/
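
下面是一个小示意:把仿真中的连续接触力二值化成 0/1 触觉信号,再与关节状态拼接成不含视觉的策略观测,这也是文中"用二元触觉传感器缩小 Sim2Real 差距"的直观做法。阈值、维度等均为假设值。

```python
import numpy as np

FORCE_THRESHOLD = 0.5  # N,超过该力视为"接触";阈值为假设值

def binary_tactile(contact_forces):
    """把仿真中的连续接触力二值化,模拟真实手上安装的二元触觉传感器。"""
    return (np.asarray(contact_forces) > FORCE_THRESHOLD).astype(np.float32)

def build_observation(joint_pos, joint_vel, contact_forces):
    """策略输入:关节状态 + 二值触觉信号(不包含任何视觉信息)。"""
    return np.concatenate([joint_pos, joint_vel, binary_tactile(contact_forces)])

# 例:16 关节灵巧手、12 个触觉点
obs = build_observation(np.zeros(16), np.zeros(16),
                        np.array([0.0, 0.8, 0.2] * 4))
print(obs.shape, obs[-12:])   # 触觉部分只有 0/1,真实与仿真的观测形式因此一致
```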


标题: CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

作者: Siyuan Qi, Shuo Chen, Yexin Li

PubTime: 2024-01-19

Downlink: http://arxiv.org/abs/2401.10568v1

GitHub: https://github.com/bigai-ai/civrealm.|

摘要: The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization’s profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.


标题: LangProp: A code optimization framework using Language Models applied to driving

作者: Shu Ishida, Gianluca Corrado, George Fedoseev

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2401.10314v1

GitHub: https://github.com/shuishida/LangProp.|

中文摘要: LangProp是一个在监督/强化学习环境中迭代优化大型语言模型(LLMs)所生成代码的框架。虽然LLMs可以零样本地产生合理的解决方案,但这些解决方案往往是次优的;特别是对于代码生成任务,初始代码很可能在某些边缘情况下失败。LangProp自动在输入-输出对数据集上评估代码性能,并捕捉任何异常,然后在训练循环中将结果反馈给LLM,使LLM能够迭代改进其生成的代码。通过为该代码优化过程采用度量和数据驱动的训练范式,可以方便地借鉴传统机器学习技术(如模仿学习、DAgger和强化学习)中的研究成果。我们在CARLA中展示了自动驾驶代码自动优化的首个概念验证,表明LangProp可以生成可解释、透明的驾驶策略,并且这些策略可以用度量和数据驱动的方式进行验证和改进。我们的代码将开源,可从 https://github.com/shuishida/LangProp 获得。

摘要: LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting. While LLMs can generate sensible solutions zero-shot, the solutions are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, as well as catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA, showing that LangProp can generate interpretable and transparent driving policies that can be verified and improved in a metric- and data-driven way. Our code will be open-sourced and is available at https://github.com/shuishida/LangProp.
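
下面用一个可运行的小例子示意 LangProp 所描述的训练循环:在输入-输出对上执行候选代码、捕获异常、汇总反馈,再交给(此处用占位函数代替的)LLM 重写。`ask_llm_to_improve` 为假设的占位函数,并非 LangProp 的真实接口。

```python
def evaluate_candidate(code_str, dataset, func_name="solve"):
    """在输入-输出对上运行候选代码,统计得分并收集异常信息作为反馈。"""
    namespace = {}
    try:
        exec(code_str, namespace)                       # 加载候选实现
        func = namespace[func_name]
    except Exception as e:
        return 0.0, [f"load error: {e!r}"]
    score, feedback = 0, []
    for x, y_true in dataset:
        try:
            y = func(x)
            score += int(y == y_true)
            if y != y_true:
                feedback.append(f"input={x!r}: got {y!r}, expected {y_true!r}")
        except Exception as e:                          # 捕获边界情况下的运行时异常
            feedback.append(f"input={x!r}: raised {e!r}")
    return score / len(dataset), feedback

def ask_llm_to_improve(code_str, feedback):
    """占位函数:真实框架中由 LLM 根据反馈重写代码。"""
    return code_str.replace("x * 2", "abs(x) * 2")      # 示意性的"改进"

dataset = [(2, 4), (5, 10), (-3, 6)]                     # 负数输入是故意设置的边界情况
code = "def solve(x):\n    return x * 2"
for step in range(3):                                    # 迭代优化循环
    acc, fb = evaluate_candidate(code, dataset)
    print(f"iter {step}: accuracy={acc:.2f}, feedback={fb}")
    if acc == 1.0:
        break
    code = ask_llm_to_improve(code, fb)
```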


标题: CQLite: Communication-Efficient Multi-Robot Exploration Using Coverage-biased Distributed Q-Learning

作者: Ehsan Latif, Ramviyas Parasuraman

PubTime: 2024-01-18

Downlink: http://arxiv.org/abs/2307.00500v2

GitHub: https://github.com/herolab-uga/cqlite|

中文摘要: 前沿探索和强化学习在历史上被用来解决让多个移动机器人自主且协作地探索复杂环境的问题。这些方法需要维护内部全局地图用于导航,但没有考虑机器人之间通信和信息共享的高昂成本。本研究提出了一种新的分布式Q学习技术CQLite,旨在最小化机器人之间的数据通信开销,同时在多机器人探索中实现快速收敛和彻底覆盖。所提出的CQLite方法采用自组织(ad hoc)地图合并,并只在最近识别出的前沿选择性地共享更新后的Q值,从而显著降低通信成本。对CQLite收敛性和效率的理论分析,以及利用多个机器人在模拟室内地图上进行的大量数值验证,证明了该方法的新颖性。凭借计算量和通信量超过2倍的降低以及更好的建图性能,CQLite超越了快速探索随机树和深度强化学习等前沿多机器人探索技术。相关代码开源于 https://github.com/herolab-uga/cqlite 。

摘要: Frontier exploration and reinforcement learning have historically been used to solve the problem of enabling many mobile robots to autonomously and cooperatively explore complex surroundings. These methods need to keep an internal global map for navigation, but they do not take into consideration the high costs of communication and information sharing between robots. This study offers CQLite, a novel distributed Q-learning technique designed to minimize data communication overhead between robots while achieving rapid convergence and thorough coverage in multi-robot exploration. The proposed CQLite method uses ad hoc map merging, and selectively shares updated Q-values at recently identified frontiers to significantly reduce communication costs. The theoretical analysis of CQLite’s convergence and efficiency, together with extensive numerical verification on simulated indoor maps utilizing several robots, demonstrates the method’s novelty. With over 2x reductions in computation and communication alongside improved mapping performance, CQLite outperformed cutting-edge multi-robot exploration techniques like Rapidly Exploring Random Trees and Deep Reinforcement Learning. Related codes are open-sourced at \url{https://github.com/herolab-uga/cqlite}.
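
下面是"只在新识别出的前沿选择性共享更新后的 Q 值"这一通信模式的简化示意;Q 表结构、合并规则(取较大值)等均为示意性假设,并非 CQLite 的原始实现。

```python
from collections import defaultdict

class CQLiteAgent:
    """示意:只在新发现的前沿处选择性共享 Q 值,以降低通信量(合并规则为假设)。"""
    def __init__(self, alpha=0.5, gamma=0.9):
        self.Q = defaultdict(float)         # 键:(前沿单元, 动作)
        self.alpha, self.gamma = alpha, gamma
        self.recently_updated = set()       # 本轮新探索前沿对应的键

    def update(self, frontier, action, reward, next_frontier, actions):
        key = (frontier, action)
        best_next = max((self.Q[(next_frontier, a)] for a in actions), default=0.0)
        self.Q[key] += self.alpha * (reward + self.gamma * best_next - self.Q[key])
        self.recently_updated.add(key)

    def outgoing_message(self):
        """只打包最近更新的条目,而不是整张 Q 表/全局地图。"""
        msg = {k: self.Q[k] for k in self.recently_updated}
        self.recently_updated.clear()
        return msg

    def merge(self, msg):
        for k, v in msg.items():            # 简单取较大值合并(示意)
            self.Q[k] = max(self.Q[k], v)

r1, r2 = CQLiteAgent(), CQLiteAgent()
r1.update(frontier=(3, 4), action="go", reward=1.0,
          next_frontier=(3, 5), actions=["go", "stay"])
r2.merge(r1.outgoing_message())             # 通信量只取决于新前沿条目的数量
print(dict(r2.Q))
```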


== Object Detection ==

标题: GALA: Generating Animatable Layered Assets from a Single Scan

作者: Taeksoo Kim, Byungjun Kim, Shunsuke Saito

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12979v1

Project: https://snuvclab.github.io/gala/|

中文摘要: 我们介绍了GALA,这是一个以单层着装3D人体网格作为输入,并将其分解为完整多层3D资产的框架。其输出可以与其他资产组合,以创建具有任意姿态的新的着装人体化身。现有的重建方法通常将着装人体视为单层几何,忽略了人体与发型、衣物和配饰固有的组合性,从而限制了网格在下游应用中的效用。将单层网格分解为独立的层是一项具有挑战性的任务,因为它需要为严重遮挡的区域合成合理的几何形状和纹理;此外,即使分解成功,网格在姿态和体型方面也没有规范化,无法与新的身份和姿态进行连贯的组合。为了应对这些挑战,我们提出利用预训练2D扩散模型的通用知识,作为人体和其他资产的几何与外观先验。我们首先使用从多视图2D分割中提取的3D表面分割来分离输入网格;然后使用一种新的姿态引导分数蒸馏采样(SDS)损失,在姿态空间和规范空间中合成不同层的缺失几何;在完成高保真3D几何体的补全后,我们还将同样的SDS损失应用于其纹理,以获得包括最初被遮挡区域在内的完整外观。通过一系列分解步骤,我们在共享的规范空间中获得了按姿态和体型规范化的多层3D资产,因此可以轻松地与新的身份组合,并用新的姿态重新驱动。与现有方案相比,我们的实验证明了该方法在分解、规范化和组合任务上的有效性。

摘要: We present GALA, a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars with any pose. Existing reconstruction approaches often treat clothed humans as a single-layer of geometry and overlook the inherent compositionality of humans with hairstyles, clothing, and accessories, thereby limiting the utility of the meshes for downstream applications. Decomposing a single-layer mesh into separate layers is a challenging task because it requires the synthesis of plausible geometry and texture for the severely occluded regions. Moreover, even with successful decomposition, meshes are not normalized in terms of poses and body shapes, failing coherent composition with novel identities and poses. To address these challenges, we propose to leverage the general knowledge of a pretrained 2D diffusion model as geometry and appearance prior for humans and other assets. We first separate the input mesh using the 3D surface segmentation extracted from multi-view 2D segmentations. Then we synthesize the missing geometry of different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once we complete inpainting high-fidelity 3D geometry, we also apply the same SDS loss to its texture to obtain the complete appearance including the initially occluded regions. Through a series of decomposition steps, we obtain multiple layers of 3D assets in a shared canonical space normalized in terms of poses and human shapes, hence supporting effortless composition to novel identities and reanimation with novel poses. Our experiments demonstrate the effectiveness of our approach for decomposition, canonicalization, and composition tasks compared to existing solutions.
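
GALA 依赖的姿态引导 SDS 损失建立在标准的分数蒸馏采样之上。作为参考,DreamFusion 式的标准 SDS 梯度可写为下式(GALA 的变体在此基础上加入姿态等条件,具体形式以论文为准):

```latex
\nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\,\epsilon}\!\left[\, w(t)\,
      \big(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\big)\,
      \frac{\partial x(\theta)}{\partial \theta} \right],
\qquad x_t = \alpha_t\, x(\theta) + \sigma_t\,\epsilon,\quad \epsilon\sim\mathcal{N}(0, I)
```

其中 x(θ) 为可微渲染得到的图像,ε̂_φ 为预训练扩散模型在条件 y 与时间步 t 下的噪声预测,w(t) 为时间步权重。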


标题: Tracking Any Object Amodally

作者: Cheng-Yen Hsieh, Tarasha Khurana, Achal Dave

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2312.12433v2

Project: https://tao-amodal.github.io|

中文摘要: 非模态感知(amodal perception),即从部分可见信息中理解物体完整结构的能力,是一项即使婴儿也具备的基本技能。它的意义延伸到自动驾驶等应用,在这些应用中,清楚地理解被严重遮挡的物体至关重要。然而,现代检测和跟踪算法经常忽略这一关键能力,这可能是因为大多数数据集只提供模态(可见部分)标注。为了解决非模态数据的稀缺问题,我们引入了TAO-Amodal基准,其在数千个视频序列中包含880个不同类别。我们的数据集为可见和被遮挡的物体(包括部分超出画面的物体)同时提供非模态和模态边界框。为了利用物体持久性增强非模态跟踪,我们使用一个轻量级插件模块amodal expander,通过在数百个视频序列上进行数据增强微调,将标准的模态跟踪器转换为非模态跟踪器。在TAO-Amodal上,我们对遮挡目标的检测和跟踪分别实现了3.3%和1.6%的提升;在以人为对象的评估中,与最先进的模态基线相比,我们的方法带来了2倍的显著改进。

摘要: Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most datasets. To address the scarcity of amodal data, we introduce the TAO-Amodal benchmark, featuring 880 diverse categories in thousands of video sequences. Our dataset includes amodal and modal bounding boxes for visible and occluded objects, including objects that are partially out-of-frame. To enhance amodal tracking with object permanence, we leverage a lightweight plug-in module, the amodal expander, to transform standard, modal trackers into amodal ones through fine-tuning on a few hundred video sequences with data augmentation. We achieve a 3.3% and 1.6% improvement on the detection and tracking of occluded objects on TAO-Amodal. When evaluated on people, our method produces dramatic improvements of 2x compared to state-of-the-art modal baselines.


标题: SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

作者: Hanxue Gu, Roy Colglazier, Haoyu Dong

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12974v1

GitHub: https://github.com/mazurowski-lab/SegmentAnyBone.|

中文摘要: 磁共振成像(MRI)在放射学中至关重要,它提供了对人体的非侵入性和高质量的洞察。将MRI精确分割成不同的器官和组织将是非常有益的,因为这能对图像内容形成更高层次的理解,并实现对准确诊断和有效治疗计划至关重要的测量。具体而言,在MRI中分割骨骼将允许对肌肉骨骼状况进行更定量的评估,而这种评估在当前的放射学实践中基本缺失。骨MRI分割的难度体现在:公开可用的算法很有限,而且文献中的算法通常只针对特定的解剖区域。在我们的研究中,我们提出了一个通用、公开可用的深度学习模型,用于在多个标准MRI部位进行骨骼分割。所提出的模型可以在两种模式下运行:全自动分割和基于提示的分割。我们的贡献包括:(1)收集并标注了涵盖多种MRI协议的新数据集,包括跨不同解剖区域的300多个标注体数据和8485张标注切片;(2)研究了用于自动分割的几种标准网络架构和策略;(3)提出SegmentAnyBone,这是一种创新的基于基础模型的方法,扩展了Segment Anything Model(SAM);(4)我们的算法与以往方法的比较分析;以及(5)我们的算法在不同解剖部位、MRI序列以及外部数据集上的泛化分析。我们在 https://github.com/mazurowski-lab/SegmentAnyBone 公开发布了我们的模型。

摘要: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment planning. Specifically, segmenting bones in MRI would allow for more quantitative assessments of musculoskeletal conditions, while such assessments are largely absent in current radiological practice. The difficulty of bone MRI segmentation is illustrated by the fact that limited algorithms are publicly available for use, and those contained in the literature typically address a specific anatomic area. In our study, we propose a versatile, publicly available deep-learning model for bone segmentation in MRI across multiple standard MRI locations. The proposed model can operate in two modes: fully automated segmentation and prompt-based segmentation. Our contributions include (1) collecting and annotating a new MRI dataset across various MRI protocols, encompassing over 300 annotated volumes and 8485 annotated slices across diverse anatomic regions; (2) investigating several standard network architectures and strategies for automated segmentation; (3) introducing SegmentAnyBone, an innovative foundational model-based approach that extends Segment Anything Model (SAM); (4) comparative analysis of our algorithm and previous approaches; and (5) generalization analysis of our algorithm across different anatomical locations and MRI sequences, as well as an external dataset. We publicly release our model at https://github.com/mazurowski-lab/SegmentAnyBone.
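
由于 SegmentAnyBone 基于 SAM 构建,下面用公开的 segment_anything 库示意其描述的两种工作模式(全自动分割与基于提示的分割)。其中的权重文件名为假设,SegmentAnyBone 的实际加载与调用方式请以其 GitHub 仓库为准。

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# 以下权重文件名为假设;SegmentAnyBone 的实际加载方式请以其仓库 README 为准
sam = sam_model_registry["vit_b"](checkpoint="segment_any_bone_vit_b.pth")

image = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)   # 这里用随机图代替一张 MRI 切片

# 模式一:全自动分割,对整幅切片生成候选掩码
auto_masks = SamAutomaticMaskGenerator(sam).generate(image)
print("自动模式得到掩码数:", len(auto_masks))

# 模式二:基于提示的分割,给出骨骼上的一个正样本点
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(point_coords=np.array([[128, 96]]),
                                     point_labels=np.array([1]))
print("提示模式掩码形状:", masks.shape, "置信度:", scores)
```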


标题: Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration

作者: Yifan Zhang, Siyu Ren, Junhui Hou

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12452v1

GitHub: https://github.com/Eaphan/NCLR|

摘要: This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. Specifically, our approach, named NCLR, focuses on 2D-3D neural calibration, a novel pretext task that estimates the rigid transformation aligning camera and LiDAR coordinate systems. First, we propose the learnable transformation alignment to bridge the domain gap between image and point cloud data, converting features into a unified representation space for effective comparison and matching. Second, we identify the overlapping area between the image and point cloud with the fused features. Third, we establish dense 2D-3D correspondences to estimate the rigid transformation. The framework not only learns fine-grained matching from points to pixels but also achieves alignment of the image and point cloud at a holistic level, understanding their relative pose. We demonstrate NCLR’s efficacy by applying the pre-trained backbone to downstream tasks, such as LiDAR-based 3D semantic segmentation, object detection, and panoptic segmentation. Comprehensive experiments on various datasets illustrate the superiority of NCLR over existing self-supervised methods. The results confirm that joint learning from different modalities significantly enhances the network’s understanding abilities and effectiveness of learned representation. Code will be available at \url{https://github.com/Eaphan/NCLR}.
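
在建立了稠密的 2D-3D 对应之后,相机与 LiDAR 之间的刚体变换可以用标准的 PnP 求解器恢复。下面用 OpenCV 的 solvePnPRansac 在合成对应点上做一个小示意(这只演示最后一步的几何求解,并非论文的可学习对齐流程):

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[720.0, 0, 320.0], [0, 720.0, 240.0], [0, 0, 1.0]])   # 假设的相机内参

# 构造一组合成的 3D 点(LiDAR 坐标系)及其在图像上的投影,模拟网络输出的稠密对应
pts_3d = rng.uniform([-2, -2, 4], [2, 2, 10], size=(200, 3))
rvec_gt = np.array([0.05, -0.1, 0.02])
tvec_gt = np.array([0.2, -0.1, 0.3])
proj, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)
pts_2d = proj.reshape(-1, 2) + rng.normal(0, 0.5, size=(200, 2))    # 加入像素噪声

# 用 PnP + RANSAC 从 2D-3D 对应中恢复刚体变换(即相机-LiDAR 外参)
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d.astype(np.float32),
                                             pts_2d.astype(np.float32), K, None)
R, _ = cv2.Rodrigues(rvec)
print("估计平移:", tvec.ravel(), " 内点数:", len(inliers))
```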


标题: NIV-SSD: Neighbor IoU-Voting Single-Stage Object Detector From Point Cloud

作者: Shuai Liu, Di Wang, Quan Wang

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12447v1

GitHub: https://github.com/Say2L/NIV-SSD.|

中文摘要: 以前的单级检测器通常遭受定位精度和分类置信度之间的不一致。为了解决错位问题,我们引入了一种新的纠正方法,称为邻居IoU投票(NIV)策略。通常,分类和回归被视为独立的分支,因此很难在它们之间建立联系。因此,分类置信度不能准确反映回归质量。NIV策略可以作为分类和回归分支之间的桥梁,通过从回归输出中计算两种类型的统计数据来校正分类置信度。此外,为了缓解点密集的完整对象(容易对象)和点稀疏的不完整对象(困难对象)检测精度的不平衡,我们提出了一种新的数据扩充方案,称为对象重采样。它通过将简单对象的一部分随机转换为困难对象来对简单对象进行欠采样和对困难对象进行过采样。最后,结合NIV策略和目标重采样增强,我们设计了一种高效的单级检测器,称为NIV-SSD。在几个数据集上的大量实验表明了NIV策略的有效性和NIV-SSD检测器的竞争性能。代码将在https://github.com/Say2L/NIV-SSD上提供。

摘要: Previous single-stage detectors typically suffer the misalignment between localization accuracy and classification confidence. To solve the misalignment problem, we introduce a novel rectification method named neighbor IoU-voting (NIV) strategy. Typically, classification and regression are treated as separate branches, making it challenging to establish a connection between them. Consequently, the classification confidence cannot accurately reflect the regression quality. NIV strategy can serve as a bridge between classification and regression branches by calculating two types of statistical data from the regression output to correct the classification confidence. Furthermore, to alleviate the imbalance of detection accuracy for complete objects with dense points (easy objects) and incomplete objects with sparse points (difficult objects), we propose a new data augmentation scheme named object resampling. It undersamples easy objects and oversamples difficult objects by randomly transforming part of easy objects into difficult objects. Finally, combining the NIV strategy and object resampling augmentation, we design an efficient single-stage detector termed NIV-SSD. Extensive experiments on several datasets indicate the effectiveness of the NIV strategy and the competitive performance of the NIV-SSD detector. The code will be available at https://github.com/Say2L/NIV-SSD.
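
下面把"邻居 IoU 投票"的核心思想写成一个简化示意:用邻近预测框与当前框的 IoU 统计量来修正分类置信度,回归越一致则置信度越高。其中的邻居阈值与融合方式(几何平均)均为假设,并非论文的精确公式。

```python
import numpy as np

def iou(a, b):
    """轴对齐框 IoU,框格式 [x1, y1, x2, y2]。"""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def niv_rectify(boxes, scores, neighbor_thr=0.3):
    """邻居 IoU 投票(简化示意):邻居框与当前框的平均 IoU 越高,
    说明回归结果越一致,分类置信度相应上调;反之下调。"""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    rectified = scores.copy()
    for i in range(len(boxes)):
        ious = [iou(boxes[i], boxes[j]) for j in range(len(boxes)) if j != i]
        neighbors = [v for v in ious if v > neighbor_thr]
        consistency = np.mean(neighbors) if neighbors else 0.0
        rectified[i] = np.sqrt(scores[i] * consistency)   # 融合方式为假设(几何平均)
    return rectified

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [0.5, 0, 10.5, 10], [40, 40, 50, 50]]
scores = [0.9, 0.85, 0.8, 0.9]
print(niv_rectify(boxes, scores))   # 孤立的高分框(最后一个)被显著降权
```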


标题: MAST: Video Polyp Segmentation with a Mixture-Attention Siamese Transformer

作者: Geng Chen, Junqing Yang, Xiaozhou Pu

PubTime: 2024-01-23

Downlink: http://arxiv.org/abs/2401.12439v1

GitHub: https://github.com/Junqing-Yang/MAST.|

中文摘要: 从结肠镜检查视频中准确分割息肉,对息肉治疗和结直肠癌的早期预防具有重要意义。然而,由于难以对结肠镜视频中的长程时空关系建模,这一任务颇具挑战。在本文中,我们用一种新的混合注意力孪生Transformer(Mixture-Attention Siamese Transformer,MAST)来解决这一具有挑战性的任务,它通过混合注意力机制显式地建模长程时空关系,以实现精确的息肉分割。具体来说,我们首先构建一个孪生Transformer架构,对成对的视频帧进行联合编码以获得特征表示;然后设计一个混合注意力模块来利用帧内和帧间的相关性,用丰富的时空关系增强特征;最后,增强后的特征被送入两个并行解码器以预测分割图。据我们所知,MAST是第一个专用于视频息肉分割的Transformer模型。在大规模SUN-SEG基准上的大量实验表明,与最先进的竞争方法相比,MAST具有更优的性能。我们的代码已公开于 https://github.com/Junqing-Yang/MAST 。

摘要: Accurate segmentation of polyps from colonoscopy videos is of great significance to polyp treatment and early prevention of colorectal cancer. However, it is challenging due to the difficulties associated with modelling long-range spatio-temporal relationships within a colonoscopy video. In this paper, we address this challenging task with a novel Mixture-Attention Siamese Transformer (MAST), which explicitly models the long-range spatio-temporal relationships with a mixture-attention mechanism for accurate polyp segmentation. Specifically, we first construct a Siamese transformer architecture to jointly encode paired video frames for their feature representations. We then design a mixture-attention module to exploit the intra-frame and inter-frame correlations, enhancing the features with rich spatio-temporal relationships. Finally, the enhanced features are fed to two parallel decoders for predicting the segmentation maps. To the best of our knowledge, our MAST is the first transformer model dedicated to video polyp segmentation. Extensive experiments on the large-scale SUN-SEG benchmark demonstrate the superior performance of MAST in comparison with the cutting-edge competitors. Our code is publicly available at https://github.com/Junqing-Yang/MAST.
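
下面用 PyTorch 写一个混合注意力模块的简化示意:对孪生编码得到的当前帧/参考帧特征分别做帧内自注意力与帧间交叉注意力,再用一个可学习系数加权混合。维度、门控形式等均为示意性假设。

```python
import torch
import torch.nn as nn

class MixtureAttention(nn.Module):
    """简化示意:帧内自注意力 + 帧间交叉注意力,用可学习权重加权混合。"""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.tensor(0.5))      # 混合系数(此处用单个标量,为假设)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_cur, feat_ref):
        # feat_*: (B, N, C),由孪生编码器分别对当前帧与参考帧编码得到
        intra_out, _ = self.intra(feat_cur, feat_cur, feat_cur)   # 帧内相关性
        inter_out, _ = self.inter(feat_cur, feat_ref, feat_ref)   # 帧间相关性
        g = torch.sigmoid(self.gate)
        return self.norm(feat_cur + g * intra_out + (1 - g) * inter_out)

B, N, C = 2, 196, 256                     # 2 个样本、14x14 个 token、256 维特征
cur, ref = torch.randn(B, N, C), torch.randn(B, N, C)
fused = MixtureAttention(C)(cur, ref)
print(fused.shape)                        # torch.Size([2, 196, 256])
```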


