[晓理紫]每日论文分享(有中文摘要,源码或项目地址)--大模型、扩散模型、视觉

专属领域论文订阅

关注{晓理紫|小李子},每日更新论文,如感兴趣,请转发给有需要的同学,谢谢支持

如果你感觉对你有所帮助,请关注我,每日准时为你推送最新论文。

为了答谢各位网友的支持,从今日起免费为300名读者提供订阅主题论文服务,只需VX关注公号并回复{邮箱+论文主题}(如:[email protected] + chatgpt@large language model @LLM),主题必须是同一个领域,最多三个关键词。解释权归博主所有


分类:

  • 大语言模型LLM
  • 视觉模型VLM
  • 扩散模型
  • 视觉语言导航VLN
  • 强化学习 RL
  • 模仿学习 IL
  • 机器人
  • 开放词汇检测与分割

== LLM ==

标题: Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models

作者: Mitodru Niyogi, Arnab Bhattacharya

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.18034v1

Project: https://www.bharatgpts.com|

中文摘要: 我们介绍了Gyan AI Paramanu(“atom”),这是一个面向印度语言的新型语言模型家族。它是一组自回归的单语、双语和多语言印度语模型,在单个GPU上从头开始预训练,覆盖10种印度语言(阿萨姆语、孟加拉语、印地语、康卡尼语、迈蒂利语、马拉地语、奥里亚语、梵语、泰米尔语、泰卢固语)和5种文字(孟加拉文、天城文、奥里亚文、泰米尔文、泰卢固文),参数量从13.29M到367.5M不等。这些模型在单个GPU上以1024的上下文长度进行预训练,非常高效、小巧、快速且强大。我们还开发了一个高效、先进的印度语分词器,甚至可以对未见过的语言进行分词。为避免多语言模型mParamanu中的“多语言诅咒”,我们按语言类型学分组,在使用相同文字的可比语料库上进行预训练。我们对预训练模型在孟加拉语、印地语和梵语上的开放式文本生成进行了人工评估,指标包括语法、连贯性、创造性和真实性。尽管规模比标准7B LLM小20到66倍,我们的孟加拉语、印地语和梵语模型仍大幅优于GPT-3.5-Turbo(ChatGPT)、Bloom 7B、LLaMa-2 7B、OPT 6.7B、GPT-J 6B、GPTNeo 1.3B、GPT2-XL等大语言模型(LLM)。在我们的预训练模型上运行推理只需CPU,不需要GPU。我们还用各自语言的23k条指令对预训练的孟加拉语、印地语、马拉地语、泰米尔语和泰卢固语模型进行了指令微调。我们的预训练和指令微调模型是同类模型中的首创,是迄今为止为印度语言开发的最强大、最高效的小型生成语言模型,各项结果表明:不依赖大量算力和庞大参数量也可以得到高质量的生成语言模型。我们计划在https://www.bharatgpts.com发布我们的模型。

摘要: We present Gyan AI Paramanu (“atom”), a family of novel language models for Indian languages. It is a collection of auto-regressive monolingual, bilingual, and multilingual Indic language models pretrained from scratch on a single GPU for 10 Indian languages (Assamese, Bangla, Hindi, Konkani, Maithili, Marathi, Odia, Sanskrit, Tamil, Telugu) across 5 scripts (Bangla, Devanagari, Odia, Tamil, Telugu) of varying sizes ranging from 13.29M to 367.5M.The models are pretrained with a context size of 1024 on a single GPU. The models are very efficient, small, fast, and powerful. We have also developed an efficient most advanced Indic tokenizer that can even tokenize unseen languages. In order to avoid the “curse of multi-linguality” in our multilingual mParamanu model, we pretrained on comparable corpora by typological grouping using the same script. We performed human evaluation of our pretrained models for open end text generation on grammar, coherence, creativity, and factuality metrics for Bangla, Hindi, and Sanskrit. Our Bangla, Hindi, and Sanskrit models outperformed GPT-3.5-Turbo (ChatGPT), Bloom 7B, LLaMa-2 7B, OPT 6.7B, GPT-J 6B, GPTNeo 1.3B, GPT2-XL large language models (LLMs) by a large margin despite being smaller in size by 66 to 20 times compared to standard 7B LLMs. To run inference on our pretrained models, CPU is enough, and GPU is not needed. We also instruction-tuned our pretrained Bangla, Hindi, Marathi, Tamil, and Telugu models on 23k instructions in respective languages. Our pretrained and instruction-tuned models which are first of its kind, most powerful efficient small generative language models ever developed for Indic languages, and the various results lead to the conclusion that high quality generative language models are possible without high amount of compute power and humongous number of parameters. We plan to release our models at https://www.bharatgpts.com.


标题: An Empirical Study of Scaling Law for OCR

作者: Miao Rang, Zhenni Bi, Chuanjian Liu

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.00028v3

GitHub: https://github.com/large-ocr-model/large-ocr-model.github.io|

中文摘要: 模型大小、数据量、计算量与模型性能之间的规律已经在自然语言处理(NLP)领域得到了广泛研究。然而,光学字符识别(OCR)中的缩放定律尚未被研究。为此,我们开展了一项全面的研究,考察了文本识别领域中性能与模型规模、数据量和计算量之间的相关性。研究最终证明,在其他影响因素保持不变时,性能与模型大小以及训练数据量之间存在平滑的幂律关系。此外,我们还构建了一个名为REBU-Syn的大规模数据集,包含600万个真实样本和1800万个合成样本。基于我们的缩放定律和新数据集,我们成功训练了一个场景文本识别模型,在6个常用测试基准上达到了新的最先进水平,top-1平均准确率为97.42%。模型和数据集已在https://github.com/large-ocr-model/large-ocr-model.github.io上公开。

摘要: The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP). However, the scaling laws in Optical Character Recognition (OCR) have not yet been investigated. To address this, we conducted comprehensive studies that involved examining the correlation between performance and the scale of models, data volume and computation in the field of text recognition. Conclusively, the study demonstrates smooth power laws between performance and model size, as well as training data volume, when other influencing factors are held constant. Additionally, we have constructed a large-scale dataset called REBU-Syn, which comprises 6 million real samples and 18 million synthetic samples. Based on our scaling law and new dataset, we have successfully trained a scene text recognition model, achieving a new state-of-the-art on 6 common test benchmarks with a top-1 average accuracy of 97.42%. The models and dataset are publicly available at https://github.com/large-ocr-model/large-ocr-model.github.io.
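
下面是一个极简示意(非论文代码),说明摘要中“性能与模型规模、训练数据量之间的平滑幂律”通常如何拟合:在对数-对数空间做线性回归,得到形如 E ≈ a·N^b 的关系。示例中的数据点为虚构数值,仅用于演示拟合流程。

```python
import numpy as np

# 虚构的示例数据:模型参数量 N 与对应的错误率 E(并非论文中的实验结果)
N = np.array([1e6, 5e6, 2e7, 1e8, 5e8])      # 模型规模
E = np.array([0.30, 0.22, 0.15, 0.10, 0.07])  # 1 - 准确率

# 幂律 E = a * N^b 在对数空间变为线性:log E = log a + b * log N
b, log_a = np.polyfit(np.log(N), np.log(E), deg=1)
a = np.exp(log_a)
print(f"拟合结果: E ≈ {a:.3g} * N^({b:.3f})")

# 用拟合的幂律外推更大模型的预期错误率
print("外推 N=2e9:", a * (2e9) ** b)
```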


标题: Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

作者: Zhen Qin, Daoyuan Chen, Bingchen Qian

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2312.06353v3

GitHub: https://github.com/alibaba/FederatedScope/tree/FedKSeed|

中文摘要: 预训练的大语言模型(LLM)需要微调来提升其对自然语言指令的响应能力。联邦学习提供了一种利用终端设备上丰富数据微调LLM而不损害数据隐私的途径。现有的大多数LLM联邦微调方法依赖参数高效微调技术,其性能可能达不到全参数微调的高度。然而,由于巨大的通信成本,LLM的联邦全参数微调并非易事。这项工作提出了FedKSeed,它采用基于一组有限随机种子的零阶优化,将服务器与客户端之间需要传输的内容显著降低到仅有几个随机种子和标量梯度,总计只有几千字节,使十亿参数级LLM在设备上的联邦全参数微调成为可能。在此基础上,我们提出了一种按概率区分的种子采样策略,优先考虑对模型精度影响更大的扰动。在六个场景下、使用不同LLM、数据集和数据划分进行的实验表明,我们的方法在通信效率和新任务泛化方面都优于现有的联邦LLM微调方法。

摘要: Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.
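
下面用一个极简示意(并非FedKSeed官方实现)说明“有限随机种子 + 零阶优化”如何把通信量压到几个种子和标量梯度:客户端只回传种子编号与一个标量,服务器用相同种子重建扰动方向并更新全参数模型。其中的损失函数、参数维度、学习率等均为演示用的假设值。

```python
import numpy as np

rng_global = np.random.default_rng(0)
dim = 1000                      # 为演示而设的参数规模(假设值)
theta = np.zeros(dim)           # 服务器端维护的全参数模型
K, eps, lr = 16, 1e-3, 0.1      # 候选种子数、扰动幅度、学习率(假设值)
seed_pool = list(range(K))

def loss(w, data):
    # 占位损失函数:真实场景中应为 LLM 在本地数据上的损失
    return np.mean((w - data) ** 2)

def client_step(theta, data):
    """客户端:抽一个种子,做两次前向,只回传 (seed, 标量梯度)。"""
    seed = int(rng_global.choice(seed_pool))
    z = np.random.default_rng(seed).standard_normal(dim)   # 由种子唯一确定的扰动方向
    g = (loss(theta + eps * z, data) - loss(theta - eps * z, data)) / (2 * eps)
    return seed, g          # 通信量:一个整数 + 一个标量

def server_update(theta, seed, g):
    """服务器:用相同种子重建 z,再做一步零阶 SGD 更新。"""
    z = np.random.default_rng(seed).standard_normal(dim)
    return theta - lr * g * z

local_data = np.ones(dim)        # 虚构的本地数据
for _ in range(200):
    seed, g = client_step(theta, local_data)
    theta = server_update(theta, seed, g)
print("训练后损失:", loss(theta, local_data))
```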


标题: Efficient Large Language Models: A Survey

作者: Zhongwei Wan, Xin Wang, Che Liu

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2312.03863v3

GitHub: https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey|

中文摘要: 大语言模型(LLM)在自然语言理解、语言生成和复杂推理等重要任务中表现出卓越的能力,有可能对我们的社会产生重大影响。然而,这种能力伴随着巨大的资源需求,凸显出迫切需要开发有效的技术来应对其效率挑战。在这篇综述中,我们对高效LLM研究进行了系统而全面的回顾。我们按照由三个主要类别组成的分类法组织文献,分别从以模型为中心、以数据为中心和以框架为中心的角度,涵盖不同但相互关联的高效LLM主题。我们还创建了一个GitHub仓库(https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey),汇总了本综述中收录的论文,并将持续维护、纳入新出现的研究成果。我们希望这篇综述可以作为宝贵的资源,帮助研究人员和从业者系统地了解高效LLM的研究进展,并激励他们为这一重要而令人兴奋的领域做出贡献。

摘要: Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges.In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.


标题: KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

作者: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.18079v1

中文摘要: LLM越来越多地被用于文档分析和摘要等需要大上下文窗口的应用,在这种情况下,KV缓存激活成为推理期间内存消耗的主要来源。量化是压缩KV缓存激活的一种有前景的方法;然而,现有方案无法在超低精度(例如低于4比特)下准确表示激活。在这项工作中,我们提出了KVQuant,通过一系列量化缓存KV激活的新方法来解决该问题,包括:(i)每通道Key量化,即调整对Key激活做量化的维度,使其更好地匹配分布;(ii)RoPE前Key量化,即在旋转位置编码之前量化Key激活,以减轻其对量化的影响;(iii)非均匀KV缓存量化,即推导逐层的、按敏感度加权的非均匀数据类型,以更好地表示分布;(iv)逐向量的稠密-稀疏量化,即为每个向量单独分离离群值,以尽量减小量化范围的偏斜;以及(v)Q-Norm,即对量化质心做归一化以缓解分布偏移,为2比特量化带来额外收益。将我们的方法应用于LLaMA、LLaMA-2和Mistral模型后,在Wikitext-2和C4上,3比特量化带来的困惑度(perplexity)退化小于0.1,优于现有方法。我们的方法使得在单个A100-80GB GPU上可以为LLaMA-7B模型提供上下文长度高达100万的服务,在8卡GPU系统上可达1000万。

摘要: LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in ultra-low precisions, such as sub-4-bit. In this work, we present KVQuant, which addresses this problem by incorporating novel methods for quantizing cached KV activations, including: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges; and (v) Q-Norm, where we normalize quantization centroids in order to mitigate distribution shift, providing additional benefits for 2-bit quantization. By applying our method to the LLaMA, LLaMA-2, and Mistral models, we achieve <0.1 perplexity degradation with 3-bit quantization on both Wikitext-2 and C4, outperforming existing approaches. Our method enables serving the LLaMA-7B model with a context length of up to 1 million on a single A100-80GB GPU and up to 10 million on an 8-GPU system.
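
下面给出“每通道(Per-Channel)Key量化”这一思想的简化示意,并与逐张量量化的误差做对比。这只是按摘要描述写的演示代码,并非KVQuant官方实现,也不包含其非均匀数据类型、离群值分离等其余组件。

```python
import torch

def quantize_key_per_channel(K, n_bits=3):
    """对 KV cache 中的 Key 做每通道对称均匀量化的示意。
    K: [seq_len, head_dim],每一列(通道)单独计算缩放系数。"""
    qmax = 2 ** (n_bits - 1) - 1
    scale = K.abs().amax(dim=0, keepdim=True) / qmax      # 形状 [1, head_dim]
    q = torch.clamp(torch.round(K / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q, scale):
    return q.float() * scale

K = torch.randn(128, 64) * torch.linspace(0.1, 5.0, 64)   # 模拟通道间量级差异很大的 Key
q, scale = quantize_key_per_channel(K, n_bits=3)
err_per_channel = (dequantize(q, scale) - K).abs().mean()

# 对比:整张量只用一个缩放系数(per-tensor)的量化误差
scale_t = K.abs().max() / 3
err_per_tensor = ((torch.clamp(torch.round(K / scale_t), -4, 3) * scale_t) - K).abs().mean()
print(f"per-channel 误差 {err_per_channel:.4f} vs per-tensor 误差 {err_per_tensor:.4f}")
```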


== CLIP@ViT @ VLM @ visual model ==

标题: M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

作者: Xingning Dong, Zipeng Feng, Chunluan Zhou

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17797v1

GitHub: https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP|

中文摘要: 我们提出了一种推进基于适配(adaptation)的预训练、以实现高效且有效的零样本视频-文本检索的多模态方案,称为M2-RAAP。在CLIP等流行的图像-文本模型之上,当前大多数基于适配的视频-文本预训练方法面临三个主要问题:数据语料噪声大、预训练耗时长、性能增益有限。为此,我们开展了一项涵盖视频-文本预训练四个关键步骤的全面研究,具体包括:1)数据过滤与精炼,2)视频输入类型选择,3)时序建模,4)视频特征增强。我们将这项实证研究总结为M2-RAAP方案,其中我们的技术贡献在于:1)数据过滤与文本改写流水线,产生100万对高质量双语视频-文本数据;2)用关键帧替换视频输入以加速预训练;3)辅助字幕引导(ACG)策略以增强视频特征。我们通过在来自不同语言的两个精炼视频-文本数据集上适配三个图像-文本基础模型进行了大量实验,验证了M2-RAAP用于基于适配的预训练的鲁棒性和可复现性。结果表明,M2-RAAP在显著减少数据量(-90%)和时间消耗(-95%)的情况下取得了更优的性能,在四个英文和两个中文零样本检索数据集上建立了新的SOTA。我们正在整理精炼后的双语数据标注和代码库,将在https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP上发布。

摘要: We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text pre-training methods are confronted by three major issues, i.e., noisy data corpus, time-consuming pre-training, and limited performance gain. Towards this end, we conduct a comprehensive study including four critical steps in video-text pre-training. Specifically, we investigate 1) data filtering and refinement, 2) video input type selection, 3) temporal modeling, and 4) video feature enhancement. We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features. We conduct extensive experiments by adapting three image-text foundation models on two refined video-text datasets from different languages, validating the robustness and reproducibility of M2-RAAP for adaptation-based pre-training. Results demonstrate that M2-RAAP yields superior performance with significantly reduced data (-90%) and time consumption (-95%), establishing a new SOTA on four English zero-shot retrieval datasets and two Chinese ones. We are preparing our refined bilingual data annotations and codebase, which will be available at https://github.com/alipay/Ant-Multi-Modal-Framework/tree/main/prj/M2_RAAP.


标题: Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data

作者: Chenhui Zhang, Sherrie Wang

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17600v1

Project: https://vleo.danielz.ch/|https://huggingface.co/collections/mit-ei/vleo-benchmark-datasets-65b789b0466555489cce0d70|

中文摘要: 大型视觉语言模型(VLM)在涉及视觉输入和自然语言指令的复杂任务中表现出令人印象深刻的性能。然而,尚不清楚它们在自然图像上的能力能在多大程度上迁移到地球观测(EO)数据上;这类数据主要是卫星和航空影像,在VLM训练数据中相对少见。在这项工作中,我们提出了一个全面的基准,通过评估VLM在场景理解、定位与计数以及变化检测任务上的能力,来衡量其成为EO数据实用工具的进展。受现实应用的驱动,我们的基准包括城市监测、救灾、土地利用和保护等场景。我们发现,尽管GPT-4V等最先进的VLM拥有广泛的世界知识,在位置理解和图像描述等开放式任务中表现强劲,但其糟糕的空间推理能力限制了它们在目标定位和计数任务上的实用性。我们的基准将在https://vleo.danielz.ch/和https://huggingface.co/collections/mit-ei/vleo-benchmark-datasets-65b789b0466555489cce0d70上公开,以便于模型评估。

摘要: Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions. However, it remains unclear to what extent capabilities on natural images transfer to Earth observation (EO) data, which are predominantly satellite and aerial images less common in VLM training data. In this work, we propose a comprehensive benchmark to gauge the progress of VLMs toward being useful tools for EO data by assessing their abilities on scene understanding, localization and counting, and change detection tasks. Motivated by real-world applications, our benchmark includes scenarios like urban monitoring, disaster relief, land use, and conservation. We discover that, although state-of-the-art VLMs like GPT-4V possess extensive world knowledge that leads to strong performance on open-ended tasks like location understanding and image captioning, their poor spatial reasoning limits usefulness on object localization and counting tasks. Our benchmark will be made publicly available at https://vleo.danielz.ch/ and on Hugging Face at https://huggingface.co/collections/mit-ei/vleo-benchmark-datasets-65b789b0466555489cce0d70 for easy model evaluation.


标题: Calibrating Segmentation Networks with Margin-based Label Smoothing

作者: Balamurali Murugesan, Bingyuan Liu, Adrian Galdran

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2209.09641v2

GitHub: https://github.com/Bala93/MarginLoss|

中文摘要: 尽管深度神经网络推动的视觉识别任务取得了不可否认的进展,但最近有证据表明这些模型的校准很差,导致预测过于自信。训练中最小化交叉熵损失的标准做法促使预测的softmax概率去匹配one-hot标签。然而,这会使正确类别的softmax前激活显著大于其余激活,从而加剧误校准问题。分类文献中的最新观察表明,隐式或显式地最大化预测熵的损失函数能带来最先进的校准性能。尽管有这些发现,这些损失在校准医学图像分割网络这一相关任务中的作用仍未被探索。在这项工作中,我们为当前最先进的校准损失提供了一个统一的约束优化视角。具体而言,这些损失可以被视为对logit距离施加等式约束的线性惩罚(或拉格朗日项)的近似。这揭示了此类潜在等式约束的一个重要局限:其梯度会不断把模型推向无信息量的解,这可能妨碍在基于梯度的优化中于判别性能与模型校准之间取得最佳折衷。基于这些观察,我们提出了一种基于不等式约束的简单而灵活的推广,它对logit距离施加了可控的间隔(margin)。在多个公开医学图像分割基准上的综合实验表明,我们的方法在网络校准方面在这些任务上取得了新的最先进结果,同时判别性能也有所提升。

摘要: Despite the undeniable progress in visual recognition tasks fueled by deep neural networks, there exists recent evidence showing that these models are poorly calibrated, resulting in over-confident predictions. The standard practices of minimizing the cross entropy loss during training promote the predicted softmax probabilities to match the one-hot label assignments. Nevertheless, this yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations, which exacerbates the miscalibration problem. Recent observations from the classification literature suggest that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. Despite these findings, the impact of these losses in the relevant task of calibrating medical image segmentation networks remains unexplored. In this work, we provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian term) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of public medical image segmentation benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, whereas the discriminative performance is also improved.
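
结合摘要中“对logit距离施加可控间隔(margin)的不等式约束”的表述,下面给出一个简化的示意损失:在交叉熵之外,对超过间隔 m 的logit距离施加线性惩罚。具体权重与实现细节请以论文及其官方代码库为准,此处仅为示意。

```python
import torch
import torch.nn.functional as F

def margin_logit_loss(logits, targets, margin=10.0, lam=0.1):
    """交叉熵 + 基于间隔的 logit 距离惩罚(示意实现)。
    logits: [N, C](分割任务可先把像素维展平到 N)。"""
    ce = F.cross_entropy(logits, targets)
    # logit 距离:最大 logit 减去各类别 logit,恒为非负
    dist = logits.max(dim=1, keepdim=True).values - logits      # [N, C]
    # 不等式约束 dist_j <= margin 的线性惩罚,ReLU 即 max(0, ·)
    penalty = F.relu(dist - margin).mean()
    return ce + lam * penalty

logits = torch.randn(8, 4) * 20        # 故意放大的 logit,模拟过度自信
targets = torch.randint(0, 4, (8,))
print(margin_logit_loss(logits, targets))
```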


标题: GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

作者: Rafi Ibn Sultan, Chengyin Li, Hui Zhu

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2311.11319v2

GitHub: https://github.com/rafiibnsultan/GeoSAM/tree/main|

中文摘要: Segment Anything Model(SAM)在自然图像分割上表现出令人印象深刻的性能。然而,它很难处理航空和卫星影像等地理图像,尤其是在分割道路、人行道和人行横道等交通出行基础设施时。这种较差的性能源于这些目标的狭长形态、它们的纹理与周围环境融为一体,以及来自树木、建筑物、车辆和行人等物体的干扰,这些都会误导模型,产生不准确的分割图。为了应对这些挑战,我们提出了Geographical SAM(GeoSAM),这是一种新的基于SAM的框架,它利用来自零样本学习的密集视觉提示和来自预训练CNN分割模型的稀疏视觉提示来实施微调策略。所提出的GeoSAM优于现有的地理图像分割方法,在道路基础设施和行人基础设施上分别高出26%和7%,平均高出17%,代表了利用基础模型分割地理图像中交通出行基础设施(包括道路与行人基础设施)的重大飞跃。源代码可在此GitHub仓库中找到:https://github.com/rafiibnsultan/GeoSAM/tree/main。

摘要: The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 26%, 7%, and 17% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images. The source code can be found on this GitHub repository: https://github.com/rafiibnsultan/GeoSAM/tree/main.


标题: Synchformer: Efficient Synchronization from Sparse Cues

作者: Vladimir Iashin, Weidi Xie, Esa Rahtu

PubTime: 2024-01-29

Downlink: http://arxiv.org/abs/2401.16423v1

Project: https://www.robots.ox.ac.uk/|

GitHub: https://github.com/v-iashin/Synchformer|

摘要: Our objective is audio-visual synchronization with a focus on ‘in-the-wild’ videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art performance in both dense and sparse settings. We also extend synchronization model training to AudioSet, a million-scale ‘in-the-wild’ dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.


标题: CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting

作者: Jiezhi Yang, Khushi Desai, Charles Packer

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.18075v1

摘要: We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting, a method for predicting future 3D scenes given past observations, such as 2D ego-centric images. Our method maps an image to a distribution over plausible 3D latent scene configurations using a probabilistic encoder, and predicts the evolution of the hypothesized scenes through time. Our latent scene representation conditions a global Neural Radiance Field (NeRF) to represent a 3D scene model, which enables explainable predictions and straightforward downstream applications. This approach extends beyond previous neural rendering work by considering complex scenarios of uncertainty in environmental states and dynamics. We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations. Additionally, we auto-regressively predict latent scene representations as a partially observable Markov decision process, utilizing a mixture density network. We demonstrate the utility of our method in realistic scenarios using the CARLA driving simulator, where CARFF can be used to enable efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving visual occlusions.


== diffusion policy@diffusion formulation@diffusion model ==

标题: BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

作者: Zhennan Wu, Yang Li, Han Yan

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17053v2

Project: https://www.youtube.com/watch?v=PxIBtd6G0mA|

中文摘要: 我们提出了BlockFusion,这是一个基于扩散的模型,它以单元块的形式生成3D场景,并能无缝地拼接新块来扩展场景。BlockFusion使用从完整3D场景网格中随机裁剪出的3D块数据集进行训练。通过逐块拟合,所有训练块被转换成混合神经场:一个包含几何特征的三平面(tri-plane),后接一个用于解码符号距离值的多层感知机(MLP)。随后采用变分自编码器将三平面压缩到潜在三平面空间,并在该空间上执行去噪扩散过程。在潜在表示上进行扩散可以实现高质量且多样化的3D场景生成。要在生成过程中扩展场景,只需添加与当前场景重叠的空块,并外推现有的潜在三平面来填充新块。外推是在去噪迭代期间,用来自重叠三平面的特征样本来调控生成过程完成的。潜在三平面外推产生语义和几何上有意义的过渡,与现有场景和谐融合。此外,还使用2D布局条件机制来控制场景元素的摆放和排列。实验结果表明,BlockFusion能够在室内和室外场景中生成多样、几何一致且无界的大型3D场景,其形状质量前所未有。

摘要: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.


标题: On Inference Stability for Diffusion Models

作者: Viet Nguyen, Giang Vu, Tung Nguyen Thanh

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2312.12431v2

GitHub: https://github.com/VinAIResearch/SA-DPM|

中文摘要: 去噪概率模型(DPM)是新兴的一类生成模型,在生成多样且高质量的图像方面表现出色。然而,当前大多数DPM训练方法往往忽略了时间步之间的相关性,限制了模型有效生成图像的性能。值得注意的是,我们从理论上指出,这一问题可能由预测轨迹与真实轨迹之间的累积估计差距引起。为了最小化该差距,我们提出了一种新的序列感知(sequence-aware)损失,旨在缩小估计差距以提高采样质量。此外,我们从理论上证明,与DPM中的传统损失相比,我们提出的损失函数是估计损失的一个更紧的上界。在CIFAR10、CelebA和CelebA-HQ等多个基准数据集上的实验结果一致表明,以FID和Inception Score衡量,我们提出的方法相比多个DPM基线在图像生成质量上有显著提升。我们的代码和预训练检查点可在https://github.com/VinAIResearch/SA-DPM获取。

摘要: Denoising Probabilistic Models (DPMs) represent an emerging domain of generative models that excel in generating diverse and high-quality images. However, most current training methods for DPMs often neglect the correlation between timesteps, limiting the model’s performance in generating images effectively. Notably, we theoretically point out that this issue can be caused by the cumulative estimation gap between the predicted and the actual trajectory. To minimize that gap, we propose a novel sequence-aware loss that aims to reduce the estimation gap to enhance the sampling quality. Furthermore, we theoretically show that our proposed loss function is a tighter upper bound of the estimation loss in comparison with the conventional loss in DPMs. Experimental results on several benchmark datasets including CIFAR10, CelebA, and CelebA-HQ consistently show a remarkable improvement of our proposed method regarding the image generalization quality measured by FID and Inception Score compared to several DPM baselines. Our code and pre-trained checkpoints are available at https://github.com/VinAIResearch/SA-DPM.


标题: Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

作者: Jingbo Zhang, Xiaoyu Li, Ziyu Wan

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2305.11588v2

Project: https://eckertzhang.github.io/Text2NeRF.github.io/|

GitHub: https://github.com/eckertzhang/Text2NeRF|

中文摘要: 文本驱动的3D场景生成可广泛应用于对3D场景有大量需求的视频游戏、影视行业和元宇宙应用。然而,现有的文本到3D生成方法局限于生成几何简单、风格梦幻且缺乏真实感的3D物体。在这项工作中,我们提出了Text2NeRF,它能够纯粹从文本提示生成具有复杂几何结构和高保真纹理的各种3D场景。为此,我们采用NeRF作为3D表示,并利用预训练的文本到图像扩散模型来约束NeRF的3D重建,使其符合场景描述。具体来说,我们用扩散模型推断与文本相关的图像作为内容先验,并用单目深度估计方法提供几何先验,二者共同用于更新NeRF模型。为了保证不同视角之间的纹理和几何一致性,我们引入了一种渐进式的场景补全与更新策略来进行场景的新视角合成。我们的方法不需要额外的训练数据,只需要场景的自然语言描述作为输入。大量实验表明,Text2NeRF在从各种自然语言提示生成照片般逼真、多视角一致且多样化的3D场景方面优于现有方法。我们的代码可在https://github.com/eckertzhang/Text2NeRF获取。

摘要: Text-driven 3D scene generation is widely applicable to video gaming, film industry, and metaverse applications that have a large demand for 3D scenes. However, existing text-to-3D generation methods are limited to producing 3D objects with simple geometries and dreamlike styles that lack realism. In this work, we present Text2NeRF, which is able to generate a wide range of 3D scenes with complicated geometric structures and high-fidelity textures purely from a text prompt. To this end, we adopt NeRF as the 3D representation and leverage a pre-trained text-to-image diffusion model to constrain the 3D reconstruction of the NeRF to reflect the scene description. Specifically, we employ the diffusion model to infer the text-related image as the content prior and use a monocular depth estimation method to offer the geometric prior. Both content and geometric priors are utilized to update the NeRF model. To guarantee textured and geometric consistency between different views, we introduce a progressive scene inpainting and updating strategy for novel view synthesis of the scene. Our method requires no additional training data but only a natural language description of the scene as the input. Extensive experiments demonstrate that our Text2NeRF outperforms existing methods in producing photo-realistic, multi-view consistent, and diverse 3D scenes from a variety of natural language prompts. Our code is available at https://github.com/eckertzhang/Text2NeRF.


标题: Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

作者: Zhongjie Duan, Chengyu Wang, Cen Chen

PubTime: 2024-01-29

Downlink: http://arxiv.org/abs/2401.16224v1

Project: https://ecnu-cilab.github.io/DiffutoonProjectPage/|

中文摘要: 卡通着色(toon shading)是动画中的一种非真实感渲染任务,其主要目的是以扁平化、风格化的外观来渲染物体。随着扩散模型走到图像合成方法的前沿,本文深入研究了一种基于扩散模型的创新卡通着色形式,旨在将逼真的视频直接渲染成动漫风格。在视频风格化中,现有方法面临持续的挑战,尤其是在保持一致性和获得高视觉质量方面。在本文中,我们将卡通着色问题建模为四个子问题:风格化、一致性增强、结构引导和上色。为了应对视频风格化中的挑战,我们提出了一种有效的卡通着色方法,称为Diffutoon。Diffutoon能够以动漫风格渲染细节极其丰富、高分辨率且时长较长的视频,还可以通过一个附加分支根据提示编辑内容。我们通过定量指标和人工评估来验证Diffutoon的效果。值得注意的是,在我们的实验中,Diffutoon超越了开源和闭源的基线方法。我们已在GitHub上发布源代码和示例视频(项目页面:https://ecnu-cilab.github.io/DiffutoonProjectPage/)。

摘要: Toon shading is a type of non-photorealistic rendering task of animation. Its primary purpose is to render objects with a flat and stylized appearance. As diffusion models have ascended to the forefront of image synthesis methodologies, this paper delves into an innovative form of toon shading based on diffusion models, aiming to directly render photorealistic videos into anime styles. In video stylization, extant methods encounter persistent challenges, notably in maintaining consistency and achieving high visual quality. In this paper, we model the toon shading problem as four subproblems: stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called Diffutoon. Diffutoon is capable of rendering remarkably detailed, high-resolution, and extended-duration videos in anime style. It can also edit the content according to prompts via an additional branch. The efficacy of Diffutoon is evaluated through quantitative metrics and human evaluation. Notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. Our work is accompanied by the release of both the source code and example videos on Github (Project page: https://ecnu-cilab.github.io/DiffutoonProjectPage/).


标题: Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

作者: Daniel Geng, Andrew Owens

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.18085v1

中文摘要: 扩散模型能够根据文本描述生成令人印象深刻的图像,这些模型的扩展还允许用户在相对粗糙的尺度上编辑图像。然而,用扩散模型精确编辑图像中物体的布局、位置、姿态和形状仍然很困难。为此,我们提出了运动引导(motion guidance),这是一种零样本技术,允许用户指定密集而复杂的运动场,指明图像中每个像素应该移动到哪里。运动引导的工作原理是:利用经过现成光流网络反传得到的梯度来引导扩散采样过程。具体来说,我们设计了一种引导损失,鼓励样本具有光流网络所估计的期望运动,同时在视觉上与源图像保持相似。通过在从扩散模型采样的同时引导样本降低引导损失,我们可以得到经过运动编辑的图像。我们证明了该技术适用于复杂运动,并能对真实图像和生成图像进行高质量的编辑。

摘要: Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, the ability to precisely edit the layout, position, pose, and shape of objects in images with diffusion models is still difficult. To this end, we propose motion guidance, a zero-shot technique that allows a user to specify dense, complex motion fields that indicate where each pixel in an image should move. Motion guidance works by steering the diffusion sampling process with the gradients through an off-the-shelf optical flow network. Specifically, we design a guidance loss that encourages the sample to have the desired motion, as estimated by a flow network, while also being visually similar to the source image. By simultaneously sampling from a diffusion model and guiding the sample to have low guidance loss, we can obtain a motion-edited image. We demonstrate that our technique works on complex motions and produces high quality edits of real and generated images.
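
下面是“用可微光流网络的梯度引导扩散采样”这一思路的简化示意。其中 denoiser(去噪模型)与 flow_net(可微光流网络)均为假设的占位接口,采样循环采用最简单的欧拉式更新,并非论文官方实现。

```python
import torch

def guided_sampling(denoiser, flow_net, x_src, target_flow, steps, sigmas, w=1.0):
    """运动引导采样的简化示意。denoiser(x, sigma)->x0_hat、
    flow_net(a, b)->光流 均为假设的占位接口;sigmas 需含 steps+1 个噪声水平。"""
    x = torch.randn_like(x_src) * sigmas[0]
    for i in range(steps):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        x = x.detach().requires_grad_(True)
        x0_hat = denoiser(x, sigma)                       # 当前步对干净图像的估计
        # 引导损失:估计运动要接近目标运动场,同时外观要接近源图像
        motion_term = (flow_net(x_src, x0_hat) - target_flow).abs().mean()
        appearance_term = (x0_hat - x_src).abs().mean()
        loss = motion_term + 0.1 * appearance_term
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            d = (x - x0_hat) / sigma                      # 常规的去噪方向
            x = x + (sigma_next - sigma) * d - w * grad   # 叠加引导梯度
    return x.detach()

# 仅作冒烟测试的占位网络(真实场景应替换为扩散去噪器与 RAFT 等可微光流网络)
denoiser = lambda x, sigma: x / (1 + sigma)               # 假设的去噪器
flow_net = lambda a, b: (b - a).mean(dim=1, keepdim=True).repeat(1, 2, 1, 1)  # 假设的“光流”
x_src = torch.rand(1, 3, 32, 32)
target_flow = torch.zeros(1, 2, 32, 32)
sigmas = torch.linspace(10.0, 0.01, steps=11)
print(guided_sampling(denoiser, flow_net, x_src, target_flow, steps=10, sigmas=sigmas).shape)
```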


标题: Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

作者: Zhipeng Bao, Yijun Li, Krishna Kumar Singh

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2312.06712v2

中文摘要: 尽管基于扩散的文本到图像(T2I)模型最近取得了重大进展,当前系统仍难以保证与文本提示对齐的组合式生成,尤其是多物体生成。这项工作阐明了这种不对齐的根本原因,指出其与注意力激活分数过低和注意力掩码重叠有关。虽然以往的研究分别处理过这些问题,但我们认为整体性的方案才是关键。因此,我们提出了两个新目标:Separate损失与Enhance损失,分别用于减少物体掩码重叠和最大化注意力分数。我们的方法不同于传统的测试时自适应技术,而是专注于微调关键参数,从而提升了可扩展性和泛化性。综合评估表明,我们的模型在图像真实感、文本-图像对齐和适应性方面表现出色,明显优于主流基线。最终,这项研究为具有更强组合生成能力和更广适用性的T2I扩散模型铺平了道路。

摘要: Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for the multi-object generation. This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. While previous research efforts have individually tackled these issues, we assert that a holistic approach is paramount. Thus, we propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores, respectively. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability. Comprehensive evaluations demonstrate the superior performance of our model in terms of image realism, text-image alignment, and adaptability, notably outperforming prominent baselines. Ultimately, this research paves the way for T2I diffusion models with enhanced compositional capacities and broader applicability.
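
按摘要描述,下面给出两个目标的简化示意:Separate损失抑制不同物体交叉注意力图之间的重叠,Enhance损失提高每个物体注意力的峰值。attn_maps 为假设输入(每个目标token对应的交叉注意力图),具体形式以论文为准。

```python
import torch

def separate_loss(attn_maps):
    """attn_maps: [K, H, W],K 个目标 token 的交叉注意力图(已归一化)。
    惩罚两两之间的重叠(逐像素取最小值后求和)。"""
    K = attn_maps.shape[0]
    loss = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            loss = loss + torch.minimum(attn_maps[i], attn_maps[j]).sum()
    return loss / max(K * (K - 1) / 2, 1)

def enhance_loss(attn_maps):
    """鼓励每个目标的注意力图具有足够高的峰值激活。"""
    peaks = attn_maps.flatten(1).max(dim=1).values        # 每个目标的最大激活
    return (1.0 - peaks).clamp(min=0).mean()

attn = torch.rand(3, 16, 16).softmax(dim=-1)              # 假设的注意力图,仅作演示
print(separate_loss(attn), enhance_loss(attn))
```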


== Visual Navigation@VLN @ Visual Language Navigation ==

标题: SubPipe: A Submarine Pipeline Inspection Dataset for Segmentation and Visual-inertial Localization

作者: Olaya Álvarez-Tuñón, Luiza Ribeiro Marnet, László Antal

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2401.17907v1

GitHub: https://github.com/remaro-network/SubPipe-dataset|

中文摘要: 本文介绍了SubPipe,一个用于SLAM、目标检测和图像分割的水下数据集。SubPipe由OceanScan MST运营的轻型自主水下航行器(LAUV)采集,其携带的传感器套件包括两台相机、一台侧扫声纳和一套惯性导航系统等。该AUV被部署在管道检测环境中,海底管道部分被沙子覆盖。AUV的位姿真值由导航传感器估计得到。侧扫声纳图像和RGB图像分别带有目标检测和分割标注。我们在SubPipe上对最先进的分割、目标检测和SLAM方法进行了基准测试,以展示该数据集在应用计算机视觉算法方面的挑战与机遇。据作者所知,这是第一个提供真实管道检测场景的带标注水下数据集。数据集和实验已在https://github.com/remaro-network/SubPipe-dataset公开。

摘要: This paper presents SubPipe, an underwater dataset for SLAM, object detection, and image segmentation. SubPipe has been recorded using a LAUV, operated by OceanScan MST, and carrying a sensor suite including two cameras, a side-scan sonar, and an inertial navigation system, among other sensors. The AUV has been deployed in a pipeline inspection environment with a submarine pipe partially covered by sand. The AUV’s pose ground truth is estimated from the navigation sensors. The side-scan sonar and RGB images include object detection and segmentation annotations, respectively. State-of-the-art segmentation, object detection, and SLAM methods are benchmarked on SubPipe to demonstrate the dataset’s challenges and opportunities for leveraging computer vision algorithms. To the authors’ knowledge, this is the first annotated underwater dataset providing a real pipeline inspection scenario. The dataset and experiments are publicly available online at https://github.com/remaro-network/SubPipe-dataset.


标题: Cognitive TransFuser: Semantics-guided Transformer-based Sensor Fusion for Improved Waypoint Prediction

作者: Hwan-Soo Choi, Jongoh Jeong, Young Hoo Cho

PubTime: 2024-01-31

Downlink: http://arxiv.org/abs/2308.02126v2

摘要: Sensor fusion approaches for intelligent self-driving agents remain key to driving scene understanding given visual global contexts acquired from input sensors. Specifically, for the local waypoint prediction task, single-modality networks are still limited by strong dependency on the sensitivity of the input sensor, and thus recent works therefore promote the use of multiple sensors in fusion in feature level in practice. While it is well known that multiple data modalities encourage mutual contextual exchange, it requires global 3D scene understanding in real-time with minimal computation upon deployment to practical driving scenarios, thereby placing greater significance on the training strategy given a limited number of practically usable sensors. In this light, we exploit carefully selected auxiliary tasks that are highly correlated with the target task of interest (e.g., traffic light recognition and semantic segmentation) by fusing auxiliary task features and also using auxiliary heads for waypoint prediction based on imitation learning. Our RGB-LIDAR-based multi-task feature fusion network, coined Cognitive TransFuser, augments and exceeds the baseline network by a significant margin for safer and more complete road navigation in the CARLA simulator. We validate the proposed network on the Town05 Short and Town05 Long Benchmark through extensive experiments, achieving up to 44.2 FPS real-time inference time.


标题: Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation

作者: Chanyoung Chung, Georgios Georgakis, Patrick Spieler

PubTime: 2024-01-30

Downlink: http://arxiv.org/abs/2401.17484v1

中文摘要: 了解远距离的地形拓扑对越野机器人任务的成功至关重要,尤其是在高速导航时。目前几何建图严重依赖的激光雷达传感器,在较远距离建图时只能提供稀疏的测量。为了应对这一挑战,我们提出了一种新的基于学习的方法,能够仅使用机载第一视角图像实时预测远距离地形高程图。我们提出的方法由三个主要部分组成。首先,引入一个基于Transformer的编码器,学习第一视角图像与先前鸟瞰高程图预测之间的跨视角关联。其次,提出一种方向感知的位置编码,将复杂非结构化地形上的3D车辆位姿信息与多视角视觉图像特征相结合。最后,提出一种历史增强的可学习地图嵌入,以提升高程图预测之间的时间一致性,从而有利于下游导航任务。我们使用真实世界的越野驾驶数据,通过实验验证了所提方法在复杂非结构化地形中用于自主越野机器人导航的适用性。此外,我们将该方法与当前最先进的方法进行了定性和定量比较。大量实地实验表明,我们的方法在准确预测地形高程的同时,能有效捕捉远距离的整体地形拓扑,优于基线模型。最后,我们进行了消融研究,以突出并理解所提方法各关键组件的作用,并验证它们对提升越野机器人导航能力的适用性。

摘要: Understanding terrain topology at long-range is crucial for the success of off-road robotic missions, especially when navigating at high-speeds. LiDAR sensors, which are currently heavily relied upon for geometric mapping, provide sparse measurements when mapping at greater distances. To address this challenge, we present a novel learning-based approach capable of predicting terrain elevation maps at long-range using only onboard egocentric images in real-time. Our proposed method is comprised of three main elements. First, a transformer-based encoder is introduced that learns cross-view associations between the egocentric views and prior bird-eye-view elevation map predictions. Second, an orientation-aware positional encoding is proposed to incorporate the 3D vehicle pose information over complex unstructured terrain with multi-view visual image features. Lastly, a history-augmented learn-able map embedding is proposed to achieve better temporal consistency between elevation map predictions to facilitate the downstream navigational tasks. We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain using real-world offroad driving data. Furthermore, the method is qualitatively and quantitatively compared against the current state-of-the-art methods. Extensive field experiments demonstrate that our method surpasses baseline models in accurately predicting terrain elevation while effectively capturing the overall terrain topology at long-ranges. Finally, ablation studies are conducted to highlight and understand the effect of key components of the proposed approach and validate their suitability to improve offroad robotic navigation capabilities.


标题: Regressing Transformers for Data-efficient Visual Place Recognition

作者: María Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov

PubTime: 2024-01-29

Downlink: http://arxiv.org/abs/2401.16304v1

中文摘要: 视觉位置识别是计算机视觉中的一项关键任务,对定位和导航系统尤为重要。现有方法通常依赖对比学习:训练图像描述子,使相似图像在潜空间中的距离较小,不相似图像的距离较大。然而,这种方法难以保证基于距离的图像相似度表示足够准确,尤其是在使用二值成对标签训练时,并且往往需要复杂的重排序(re-ranking)策略。这项工作引入了一个新的视角,把位置识别构造成一个回归问题,使用相机视场重叠度作为学习所用的相似度真值。通过优化图像描述子使其直接与分级相似度标签对齐,该方法无需昂贵的重排序即可增强排序能力,带来数据高效的训练以及在多个基准数据集上的强泛化能力。

摘要: Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained to have small distance for similar images and larger distance for dissimilar ones in a latent space. However, this approach struggles to ensure accurate distance-based image similarity representation, particularly when training with binary pairwise labels, and complex re-ranking strategies are required. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach enhances ranking capabilities without expensive re-ranking, offering data-efficient training and strong generalization across several benchmark datasets.
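
下面是“把位置识别当作回归问题”的最小示意:让两幅图像描述子的余弦相似度直接回归到相机视场重叠度这一分级标签(取值0到1),以代替二值成对标签上的对比学习。网络与数据均为占位,仅用于说明损失形式。

```python
import torch
import torch.nn.functional as F

class DescriptorNet(torch.nn.Module):
    """占位的描述子网络:真实实现通常采用 Transformer/CNN 主干。"""
    def __init__(self, in_dim=512, out_dim=128):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim)

    def forward(self, feats):
        return F.normalize(self.proj(feats), dim=-1)       # 单位球面上的描述子

net = DescriptorNet()
feats_a = torch.randn(32, 512)          # 一批图像对的特征(虚构)
feats_b = torch.randn(32, 512)
overlap = torch.rand(32)                # 分级相似度标签:视场重叠度,取值 0~1

da, db = net(feats_a), net(feats_b)
sim = (da * db).sum(dim=-1).clamp(0, 1) # 余弦相似度截断到 [0, 1]
loss = F.mse_loss(sim, overlap)         # 直接回归到重叠度,而非二值对比标签
loss.backward()
print(loss.item())
```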


