On-Device Neural Net Inference with Mobile GPUs


Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann

Google Research, 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA


Abstract

On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at https://www.tensorflow.org/lite.


1 Introduction

On-device machine learning (ML) offers a variety of benefits. The most apparent is the improved inference latency: By skipping the data upload to the server and wait-time for the inference result, the app can respond more quickly to the user’s request. Removing the server dependency has additional benefits, such as:

  • Removing the need to maintain inference servers,
  • Running with limited or no connectivity, and
  • Reducing privacy concerns as the user data remains on the device.

However, on-device ML is not trivial. Despite both recent advances in mobile hardware technology and efforts to efficiently run deep networks on mobile devices, mobile CPUs continue to be less powerful than those found in servers. Running deep net inference on a mobile device means adding a significant compute-intensive task to the CPU, which competes with existing logic. Fully utilizing the mobile CPU comes with additional unwanted costs, e.g. increased energy consumption leads to shorter battery life, and an increase in the phone’s thermal profile causes throttling, resulting in slower computation.

Dynamic frequency scaling (CPU throttling) is a power management technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted “on the fly” depending on the actual needs, to conserve power and reduce the amount of heat generated by the chip.


Hardware accelerators such as digital signal processors offer solutions to overcome these challenges. The demand for on-device ML has led to a recent trend of phone manufacturers integrating dedicated neural processing units (NPUs) into high-end next-generation phones, which account for only a small fraction of the current distribution of mobile devices.

Our primary goal is a fast inference engine with wide coverage for TensorFlow Lite (TFLite) [8]. By leveraging the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, we can achieve real-time performance for various deep network models. Table 1 demonstrates that the GPU has significantly more compute power than the CPU.

Table 1: Example of available compute power on mobile in gigaflops (billion floating point operations per second). FP16 and FP32 refer to 16- and 32-bit floating point arithmetic, respectively.

This paper presents the techniques we adopt for TFLite GPU and how we achieve an average acceleration of 2-9x for various deep networks on GPU compared to CPU inference. We first describe the general mobile GPU architecture and GPU programming, followed by how we materialize this with Compute Shaders for Android devices with OpenGL ES 3.1+ [16], and Metal Shaders for iOS devices with iOS 9+ [1].


2 Related Work

Various research efforts from both academia and industry endeavor to bring deep neural network inference, previously limited to servers, to mobile devices. Those efforts can be roughly categorized into three strategies:

  • Network architecture-driven,
  • Hardware-driven, and
  • ML framework-driven.



Neural network researchers have focused on optimizing their network architectures explicitly for processing on-device in various domains such as image classification [10, 21], object localization [11], and image enhancements [13, 14]. Many of these techniques involve reducing the model size by re-designing the network architecture and adding pre-/post-training quantization of weights. With these, one can achieve faster computation and smaller memory footprint, leading to reduced inference latency at the cost of slightly degraded model accuracy. MorphNet [9] takes a unique path of reducing the number of floating point operations per second which is optimized during training of the model. Our work is complementary to these efforts and instead focuses on optimizing the inference engine that runs the neural network rather than the model or training.

Major hardware manufacturers have made architectural changes responding to demands for faster mobile inference, and are publishing software development kits (SDKs) to expose them: Arm Compute Library [4], Huawei HiAI SDK [12], MediaTek NeuroPilot SDK [17], and Qualcomm SNPE SDK [20]. These libraries are vendor-specific and either cannot be re-used on a different architecture or do not guarantee the expected performance boost on other platforms. Our work does not add new hardware or SDKs. Instead, we use well-established hardware, the mobile GPU, and well-supported graphics and compute standards such as OpenGL [16] and Metal [1], to achieve high-performance neural network inference.


Apple presented the Metal Performance Shaders with support of convolutional neural networks [3] accelerated by GPU. This is a solution built on top of the Metal API and allows custom operations. Our approach is analogous to Apple’s on iOS devices. Apple also released CoreML [2], an end-to-end solution for inference on mobile devices using CPU, GPU, and NPU, if available.

Android introduced the Android Neural Networks API [7], which serves as a layer between hardware and higher-level ML frameworks and which vendors must implement for Android 8.1 or later. Our work has wider coverage and does not depend on a specific Android version, nor does it require vendors to implement individual APIs for deep network processing.


Some of the latest mobile-friendly ML frameworks are:

  • Caffe2 [6] which focuses on CPU inference and uses Arm Compute Library for Arm Mali GPUs.
  • MACE [24] which employs OpenCL, which is not a part of the standard Android OS.

TFLite GPU leverages the mobile GPU with OpenGL ES for Android devices and Metal for iOS devices. The specific version requirements are OpenGL ES 3.1+, available on more than 52% of all Android devices [23], and iOS 9+. One of our biggest strengths is that our framework employs open standards, i.e. it is not limited to a specific hardware vendor, and thus covers a wide range of devices.


3 General Architecture

This section explains the general architecture of TFLite GPU, consisting of an initialization phase followed by a model inference phase. The techniques in this section are independent of the architecture of the underlying GPU.

3.1 Initialization

TFLite provides APIs for the delegation of the execution of neural network sub-graphs to another library. We exploit this feature to integrate the GPU backend into TFLite. Given a neural net model, TFLite first checks whether it can execute all the operators in the model with our GPU delegate. Our GPU backend identifies supported operators, and TFLite then partitions the graph into several sub-graphs, substituting the sub-graphs with virtual “delegate nodes”. From that point, the GPU backend is responsible for executing this sub-graph, as depicted in Figure 1. Unsupported operators are by default computed by the CPU. Ideally, the whole graph would be compatible with our mobile GPU backend for maximum performance.


Figure 1: TFLite’s delegate mechanism: Operations supported by the GPU delegate will run on the GPU, and the rest on the CPU.
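
To make the delegation concrete, below is a minimal sketch of enabling the GPU backend from C++ through TFLite’s delegate API. This is a sketch, not the engine’s internals; the V2 API names come from later TensorFlow Lite releases and may differ across versions, and error handling is omitted.

```cpp
#include <memory>

#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Build an interpreter and hand all supported sub-graphs to the GPU
// delegate; unsupported operators automatically fall back to the CPU.
std::unique_ptr<tflite::Interpreter> BuildGpuInterpreter(
    const tflite::FlatBufferModel& model) {
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(model, resolver)(&interpreter);

  // TFLite replaces supported sub-graphs with virtual "delegate nodes"
  // that the GPU backend executes.
  TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
  TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
  interpreter->ModifyGraphWithDelegate(delegate);
  // Note: the delegate must outlive the interpreter; release it with
  // TfLiteGpuDelegateV2Delete() after the interpreter is destroyed.
  return interpreter;
}
```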

As our mobile GPU inference engine is primarily designed for high-performance execution, we first inspect the model and resolve obvious inefficiencies. For example:

  • Merging pad as an option of another op where it was previously described separately.
  • Removing superfluous identity operations, e.g. resize with scale one or single input add/concat.

While these inefficiencies might be caught by the architect, artifacts such as these crop up inevitably, and we should still optimize these whenever possible.
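
As an illustration, detecting such identity operations can be as simple as the following sketch; the IR types and fields here are hypothetical, not TFLite GPU’s actual graph representation.

```cpp
#include <vector>

// Hypothetical minimal IR node: an op type, a resize scale, and the ids
// of the nodes producing its inputs.
struct Node {
  enum class Op { kAdd, kConcat, kResize, kConv2d, kOther } op;
  float scale = 1.0f;       // only meaningful for kResize
  std::vector<int> inputs;  // producer node ids
};

// An op is an identity if it provably forwards its single input
// unchanged: a resize with scale 1, or an add/concat with one input.
bool IsIdentity(const Node& n) {
  switch (n.op) {
    case Node::Op::kResize:
      return n.inputs.size() == 1 && n.scale == 1.0f;
    case Node::Op::kAdd:
    case Node::Op::kConcat:
      return n.inputs.size() == 1;
    default:
      return false;
  }
}
```

A cleanup pass then rewires every consumer of an identity node to that node’s single input and drops the node from the graph.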


Note that, in contrast to CPU backends, which work without initialization, GPU backends require initialization involving shader compilation and optimization by the driver before inference. The cost of this process depends on network size and may take from a few milliseconds to seconds, but it is incurred once and not again for subsequent runs until the cache is invalidated for one of the following reasons: the application is updated or re-installed, the device is rebooted, the cache memory runs out, or other OS-specific reasons apply.


3.2 Running Inference

The inference phase is fairly straightforward. The input tensors are reshaped to the PHWC4 format, detailed later in Section 4, if their channel size is not equal to 4. For each operator, shader programs are linked by binding resources such as the operator’s input/output tensors, weights, etc., and dispatched, i.e. inserted into the command queue. The GPU driver then takes care of scheduling and executing all shader programs in the queue, and makes the result available to the CPU via CPU/GPU synchronization. There might be a final conversion from PHWC4 to HWC if the output tensor has a channel size not equal to 4.
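At the API level, dispatching an operator looks roughly like the following OpenGL ES 3.1 sketch; the binding indices and buffer roles are illustrative assumptions, not TFLite GPU’s actual internals.

```cpp
#include <GLES3/gl31.h>

// Enqueue one operator's compute shader. `program` is a compiled and
// linked compute program; `input`, `weights`, and `output` are SSBOs
// holding PHWC4 data. Binding indices are hypothetical.
void DispatchOp(GLuint program, GLuint input, GLuint weights, GLuint output,
                GLuint groups_x, GLuint groups_y, GLuint groups_z) {
  glUseProgram(program);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, /*index=*/0, input);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, /*index=*/1, weights);
  glBindBufferBase(GL_SHADER_STORAGE_BUFFER, /*index=*/2, output);
  // Inserts the program into the command queue; the driver schedules it
  // without synchronizing with the CPU.
  glDispatchCompute(groups_x, groups_y, groups_z);
  // Make this op's writes visible to the next shader that reads them.
  glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
}
```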


Figure 2: Example of PHWC4 memory layout (best viewed in color). A tensor of shape $(H{=}8, W{=}6, C{=}12)$ is split into 4-element slices of size $(H, W, 4)$ which are stored sequentially as a continuous 2D array of size $(HC/4{=}24, 4W{=}24)$.

H * (C / 4) = HC/4 = 8 * (12 / 4) = 24
4 * W = 4W = 4 * 6 = 24

For maximum performance, one should avoid CPU/GPU synchronization at all costs and, preferably, never leave the GPU context if real-time processing is needed. The most ideal scenario would be the following: the camera provides an RGBA texture that goes directly to TFLite GPU, and the output of the network is then rendered directly to the screen.


Shader Program Optimization
In the GPU inference engine, operators exist in the form of shader programs. The shader programs eventually get compiled and inserted into the command queue and the GPU executes programs from this queue without synchronization with the CPU.

To reduce the number of shader programs in the command queue, we consolidate them into meaningful aggregates while maximizing parallelism and well-defined data dependencies.

The following techniques are employed when generating the source code for the shader programs:

  • Fusing element-wise operators with computationally expensive operators, e.g. activations with convolution, to reduce the number of shader programs.
  • In-lining parameters and small objects directly into the shader program to reduce memory I/O overhead.
  • Baking uniforms into the source code, instead of passing them in the run-time, allowing drivers to produce more optimal code.
  • Creating specialized versions of shaders, e.g. convolution specialized for a certain kernel size, to manually optimize shaders for particular cases.
  • Implementing specializations of shader programs optimized for a certain GPU architecture to improve the op’s performance in that environment.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inlining eliminates function-call overhead (pushing to the stack, saving and restoring registers) at the cost of larger generated code, which can lower the instruction cache hit rate.
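
As an illustration of baking uniforms into the source and fusing a cheap element-wise op into a neighboring one, here is a hypothetical generator for a fused “bias add + ReLU” compute shader. It is a minimal sketch under assumed naming, not TFLite GPU’s actual code generator.

```cpp
#include <string>

// Generates GLSL for a fused "bias add + ReLU" over a PHWC4 tensor.
// The pixel count and slice count are baked into the source as literals
// instead of run-time uniforms, letting the driver unroll and optimize.
// Hypothetical sketch, not TFLite GPU's actual generator.
std::string GenerateBiasReluSource(int h, int w, int channels) {
  const int slices = channels / 4;  // assume channels divisible by 4
  std::string src;
  src += "#version 310 es\n";
  src += "layout(local_size_x = 64) in;\n";
  src += "layout(std430, binding = 0) readonly buffer B0 { vec4 data[]; } input0;\n";
  src += "layout(std430, binding = 1) readonly buffer B1 { vec4 data[]; } bias0;\n";
  src += "layout(std430, binding = 2) writeonly buffer B2 { vec4 data[]; } output0;\n";
  // Baked constants: tensor geometry appears as literals in the source.
  src += "const int kPixels = " + std::to_string(h * w) + ";\n";
  src += "const int kSlices = " + std::to_string(slices) + ";\n";
  src += "void main() {\n";
  src += "  int pixel = int(gl_GlobalInvocationID.x);\n";
  src += "  if (pixel >= kPixels) return;\n";
  src += "  for (int s = 0; s < kSlices; ++s) {\n";
  // PHWC4: slice s of a given pixel lives at s * kPixels + pixel.
  src += "    vec4 v = input0.data[s * kPixels + pixel] + bias0.data[s];\n";
  src += "    output0.data[s * kPixels + pixel] = max(v, vec4(0.0));\n";
  src += "  }\n";
  src += "}\n";
  return src;
}
```

Because kSlices is a compile-time constant in the generated source, the driver can fully unroll the loop, which it could not do if the slice count arrived as a run-time uniform.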


After the source code for each program is generated, each shader gets compiled. This compilation step can take a while, from several milliseconds to seconds. Typically, app developers can hide this latency while loading the model or starting the app for the first time. Once all shader programs are compiled, the GPU backend is ready for inference.

4 Data Layout

Most modern GPUs use a homogeneous coordinate [18] system which represents points in space with coordinates $(x, y, z, w)$. A homogeneous coordinate $(x, y, z, w)$, where $w \neq 0$, represents a point $(x/w, y/w, z/w, 1)$ in a 3D space. This allows affine transformations and projective transformations to be represented in the form of 4D matrix multiplications. GPUs are essentially processors optimized for 4-element vector compute and load/store operations.
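
For example, a translation by $(t_x, t_y, t_z)$, which is not a linear map of 3D points, becomes a single 4D matrix multiplication in homogeneous coordinates (with $w{=}1$ the result is $(x{+}t_x, y{+}t_y, z{+}t_z, 1)$):

```latex
\begin{pmatrix} x + t_x w \\ y + t_y w \\ z + t_z w \\ w \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
```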

While TFLite does not restrict tensors to a certain shape, many operators assume 4D input/output tensors shaped as $[B, H, W, C]$, where $B$, $H$, $W$, $C$ respectively represent batch size, height, width, and channel size. For convenience, the rest of the paper will mostly describe tensors assuming a batch size of 1, or $[H, W, C]$ for short. This simplified example can be generalized if we consider batches to be a concatenation of multiple tensors.

In TFLite GPU, a tensor is split into 4-channel slices which are stored sequentially in memory. If the number of channels is not divisible by 4, it is padded with zeroes. This memory layout, called PHWC4 (Figure 2), optimally reduces cache misses in the graphics architecture. This is tightly coupled with how compute threads are executed on the GPU, which defines the order of computation and, more importantly, the order of memory load instructions.
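
To make the layout concrete, here is a minimal sketch of an HWC-to-PHWC4 conversion mirroring Figure 2; this is a hypothetical helper, not TFLite GPU’s actual implementation.

```cpp
#include <vector>

// Convert a dense HWC float tensor into PHWC4: channels are grouped into
// slices of 4, each slice stored as a contiguous (H, W, 4) plane, and the
// last slice zero-padded when C is not divisible by 4. Hypothetical helper.
std::vector<float> HwcToPhwc4(const std::vector<float>& hwc,
                              int h, int w, int c) {
  const int slices = (c + 3) / 4;  // ceil(C / 4)
  std::vector<float> phwc4(static_cast<size_t>(slices) * h * w * 4, 0.0f);
  for (int s = 0; s < slices; ++s) {
    for (int y = 0; y < h; ++y) {
      for (int x = 0; x < w; ++x) {
        for (int i = 0; i < 4; ++i) {
          const int ch = s * 4 + i;
          if (ch < c) {  // positions past C stay zero (padding)
            phwc4[(((s * h + y) * w) + x) * 4 + i] =
                hwc[((y * w) + x) * c + ch];
          }
        }
      }
    }
  }
  return phwc4;
}
```

With $H{=}8$, $W{=}6$, $C{=}12$ as in Figure 2, this produces three $(8, 6, 4)$ slices laid out back to back.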

References

https://arxiv.org/abs/1907.01989
https://ar5iv.org/abs/1907.01989
