吃果冻不吐果冻皮

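TensorRT-LLM Step-by-Step Tutorial (Part 2): Offline Environment Setup, Model Quantization and Inference

As large models have taken off, the models deployed to production keep getting larger (from billions to hundreds of billions of parameters), which has sharply increased inference cost. As a result, many inference frameworks have emerged to reduce inference latency and increase throughput.

This series covers inference with TensorRT-LLM. This article, the second in the series, walks through model quantization and inference based on Bloom.

In addition, the LLM-related blog posts I have written, along with their code, are collected on GitHub in llm-action; feel free to use them.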

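Environment Setup

Base configuration:

CUDA: 12.2
Image: nvcr.io/nvidia/pytorch:23.10-py3

Because the server cannot access the Internet, the image, installation packages, source code to compile and so on must be prepared in advance. Next, install TensorRT-LLM. Building and running TensorRT-LLM in Docker is recommended; the installation follows the Docker image build steps in the TensorRT-LLM repository.

First, start the Docker container and enter it.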

docker run -dt --name tensorrt_llm_lgd \
--restart=always \
--gpus all \
--network=host \
--shm-size=4g \
-m 64G \
-v /home/guodong.li/workspace:/workspace \
-w /workspace \
nvcr.io/nvidia/pytorch:23.10-py3 \
/bin/bash

docker exec -it tensorrt_llm_lgd bash

Install PyTorch, TensorRT, mpi4py and other packages:

# Uninstall the preinstalled TensorRT packages
pip uninstall -y tensorrt
pip uninstall -y torch-tensorrt

pip install mpi4py -i http://nexus3.xxx.com/repository/pypi/simple --trusted-host nexus3.xxx.com
pip install polygraphy-0.48.1-py2.py3-none-any.whl -i http://nexus3.xxx.com/repository/pypi/simple --trusted-host nexus3.xxx.com

# Reinstall PyTorch
pip install torch==2.1.0 -i http://nexus3.xxx.com/repository/pypi/simple --trusted-host nexus3.xxx.com
pip uninstall -y transformer-engine

# Reinstall TensorRT
tar -xf /tmp/TensorRT.tar -C /usr/local/
mv /usr/local/TensorRT-9.1.0.4 /usr/local/tensorrt
pip install /usr/local/tensorrt/python/tensorrt-*-cp310-*.whl -i http://nexus3.xxx.com/repository/pypi/simple --trusted-host nexus3.xxx.com

Set the environment variable in /etc/profile:

export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:${LD_LIBRARY_PATH}
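Note that /etc/profile is read by login shells; an interactive shell opened with `docker exec -it ... bash` is usually not a login shell and may not pick the setting up until you run `source /etc/profile`. The short Python sketch below is one way to confirm the directory is really on the library search path before building. The helper name `has_trt_lib` is hypothetical; only the path comes from this article.

```python
import os

# Library directory used in this article's setup.
TRT_LIB_DIR = "/usr/local/tensorrt/lib"


def has_trt_lib(ld_library_path: str, lib_dir: str = TRT_LIB_DIR) -> bool:
    """Return True if lib_dir is one of the ':'-separated entries of ld_library_path."""
    entries = [p.rstrip("/") for p in ld_library_path.split(":") if p]
    return lib_dir.rstrip("/") in entries


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if has_trt_lib(current):
        print("OK: TensorRT libraries are on LD_LIBRARY_PATH")
    else:
        print(f"{TRT_LIB_DIR} is missing; try: source /etc/profile")
```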

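Build TensorRT-LLM from the root of the TensorRT-LLM source tree (--cuda_architectures "80-real" compiles only for compute capability 8.0 GPUs such as the A100):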

python3 ./scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --cuda_architectures "80-real"

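Because the build is offline, the following configuration files need to be modified:

Change the pip index: https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/scripts/build_wheel.py#L65
Change the Git remote repository URL: https://github.com/NVIDIA/TensorRT-LLM/blob/release/0.5.0/cpp/tests/CMakeLists.txt#L19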
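Both edits amount to pointing network access at internal mirrors. As a hedged illustration, the hypothetical helper below (not part of build_wheel.py) builds the same `-i <index> --trusted-host <host>` pair that every pip command in this article uses, from a single index URL, so the mirror address lives in one place if you script the offline install. How you wire it into build_wheel.py depends on that file's contents at your version.

```python
from typing import List
from urllib.parse import urlparse


def offline_pip_args(index_url: str) -> List[str]:
    """Return ['-i', index_url, '--trusted-host', host] for an internal PyPI mirror."""
    host = urlparse(index_url).hostname
    if not host:
        raise ValueError(f"not a valid index URL: {index_url!r}")
    # --trusted-host is needed because the mirror in this article is served over plain HTTP.
    return ["-i", index_url, "--trusted-host", host]


if __name__ == "__main__":
    mirror = "http://nexus3.xxx.com/repository/pypi/simple"
    print(" ".join(["pip", "install", "mpi4py", *offline_pip_args(mirror)]))
```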

Install TensorRT-LLM:

pip install ./build/tensorrt_llm*.whl -i http://nexus3.xxx.com/repository/pypi/simple --trusted-host nexus3.xxx.com

The environment setup is now complete.

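Hands-On Development with Bloom: Overview

Next, we use the Bloom model as an example of hands-on TensorRT-LLM development.

Main files in the Bloom example:

build.py: builds the TensorRT engine used to run the Bloom model.
run.py: runs model inference.
summarize.py: uses the model to summarize articles from the CNN Dailymail dataset.
hf_bloom_convert.py: converts an HF-format model into the binary files consumed by build.py via --bin_model_dir.

TensorRT-LLM currently supports the following features for Bloom:

FP16
INT8 and INT4 weight-only quantization
INT8 KV cache quantization
SmoothQuant quantization
Tensor parallelism

LLM quantization was briefly introduced in my earlier article, Overview of LLM Quantization; when time permits, I will cover common LLM quantization techniques in more detail.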

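Downloading Data and Models

Download the Bloom model; this article uses bloomz-3b for quantization and inference.

# git-lfs must be installed first (it is usually already installed at this point).
# git lfs install

# Download the model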
rm -rf /workspace/model/bloomz-3b
mkdir -p /workspace/model/bloomz-3b && git clone https://huggingface.co/bigscience/bloomz-3b /workspace/model/bloomz-3b

Download the datasets; this article uses the CNN Dailymail and LAMBADA datasets.

https://huggingface.co/datasets/ccdv/cnn_dailymail
https://huggingface.co/datasets/lambada

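Building the TensorRT Engine

TensorRT-LLM builds the TensorRT engine from Bloom's HF checkpoint. If no checkpoint directory is specified, TensorRT-LLM builds the engine with dummy weights.

The engines below are built with the build.py script. build.py normally needs only one GPU, but if all the GPUs required for inference are available, adding --parallel_build enables parallel building and makes engine construction faster.

Note: parallel_build currently supports a single node only.

Common hf_bloom_convert.py arguments:

out_dir (-o): output directory for the converted model.
in_file (-i): path to the original model.
tensor_parallelism: tensor-parallel degree used at inference time.
calibrate_kv_cache (-kv): generates scaling factors for the KV cache; used when the KV cache is stored in INT8.
smoothquant: sets the α parameter when quantizing the model with SmoothQuant, and outputs INT8 weights. 0.5 is a good value for a first attempt; it must lie in [0, 1].
storage_type (-t): data type used to store the model parameters.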
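The α passed to --smoothquant is the migration strength from the SmoothQuant paper: for each input channel j, the smoothing factor is s_j = max|X_j|^α / max|W_j|^(1−α). Activation column j is divided by s_j and the matching weight row is multiplied by it, so X·W is unchanged while activation outliers are shifted into the weights. The pure-Python sketch below applies that published formula to a toy 2×2 example; it is an illustration, not code taken from hf_bloom_convert.py.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def smoothing_factors(act_absmax: Sequence[float], weight_absmax: Sequence[float],
                      alpha: float = 0.5) -> List[float]:
    """SmoothQuant factor per input channel j: max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    eps = 1e-8  # guard against all-zero channels
    return [max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha)
            for a, w in zip(act_absmax, weight_absmax)]


def smooth(x: Matrix, w: Matrix, s: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """x: tokens x in_features, w: in_features x out_features.
    Divide activation column j by s_j and multiply weight row j by s_j."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for v in row] for j, row in enumerate(w)]
    return x_s, w_s


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    x = [[8.0, 0.1], [-6.0, 0.2]]      # channel 0 carries activation outliers
    w = [[0.5, -0.25], [1.0, 0.75]]
    act_absmax = [max(abs(row[j]) for row in x) for j in range(2)]
    w_absmax = [max(abs(v) for v in row) for row in w]
    s = smoothing_factors(act_absmax, w_absmax, alpha=0.5)
    x_s, w_s = smooth(x, w, s)
    print("factors:", s)
    print("original:", matmul(x, w))
    print("smoothed:", matmul(x_s, w_s))
```

With α = 0.5 the largest smoothed activation and the largest smoothed weight in each channel become equal, which is why 0.5 is a sensible first try.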

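Common build.py arguments:

model_dir: directory of the original HF model.
bin_model_dir: directory of the converted binary files, used for SmoothQuant or KV cache quantization.
dtype: model data type.
use_gemm_plugin: enables the GEMM plugin and sets its data type.
use_gpt_attention_plugin: enables the GPT attention plugin and sets its data type.
output_dir: engine output directory.
use_layernorm_plugin: enables the LayerNorm plugin and sets its data type.
use_weight_only: enables weight-only quantization, quantizing the weights of the various GEMMs to INT4/INT8.
weight_only_precision: weight precision for weight-only quantization; it only takes effect together with use_weight_only.
use_smooth_quant: quantizes both the activations and the weights of the various GEMMs with SmoothQuant. For finer-grained quantization, use the --per_channel and --per_token options.
per_channel: by default, a single static scaling factor is applied to the GEMM result. per_channel instead uses a different static scaling factor for each channel. The latter is usually more accurate but slightly slower.
per_token: by default, a single static scaling factor is used to scale activations into the INT8 range. per_token chooses a custom scaling factor for each token at runtime. The latter is usually more accurate but slightly slower.
int8_kv_cache: by default, the KV cache uses dtype. int8_kv_cache selects INT8 quantization for the KV cache.
use_parallel_embedding: embedding parallelism is disabled by default; setting this flag enables it.
embedding_sharding_dim: tries to reduce the engine size by sharing the embedding lookup table between two layers. Note: this may not take effect when the required conditions are not met.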
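To see why --per_token can help, the sketch below compares one shared (per-tensor) activation scale with one scale per token (row), using symmetric INT8 clamped to [-127, 127]. It is a simplified illustration on made-up numbers, not TensorRT-LLM's kernel: with a per-tensor scale, the scale is set by the largest token, so a token with small values is crushed to a few coarse levels.

```python
from typing import List

INT8_MAX = 127


def quantize(row: List[float], scale: float) -> List[int]:
    """Symmetric INT8: q = clamp(round(v / scale), -127, 127)."""
    return [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in row]


def per_tensor_scales(x: List[List[float]]) -> List[float]:
    """One static scale shared by every token (row)."""
    amax = max(abs(v) for row in x for v in row)
    return [max(amax, 1e-8) / INT8_MAX] * len(x)


def per_token_scales(x: List[List[float]]) -> List[float]:
    """One scale per token, computed from that token's own range."""
    return [max(max(abs(v) for v in row), 1e-8) / INT8_MAX for row in x]


def max_relative_error(x: List[List[float]], scales: List[float]) -> float:
    """Worst quantize -> dequantize error, relative to each token's own max |value|."""
    worst = 0.0
    for row, s in zip(x, scales):
        restored = [q * s for q in quantize(row, s)]
        amax = max(max(abs(v) for v in row), 1e-8)
        worst = max(worst, max(abs(a - b) for a, b in zip(row, restored)) / amax)
    return worst


if __name__ == "__main__":
    # Two tokens whose magnitudes differ by 200x (made-up values).
    x = [[100.0, -50.0, 25.0], [0.5, -0.25, 0.125]]
    print("per-tensor:", max_relative_error(x, per_tensor_scales(x)))
    print("per-token: ", max_relative_error(x, per_token_scales(x)))
```

The per-token variant pays for this with extra scale computation at runtime, which matches the "more accurate but slightly slower" trade-off described above.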

FP16

Build the engine from the HF weights on a single GPU in float16 precision. --use_gemm_plugin is enabled to prevent accuracy issues.

python build.py --model_dir /workspace/model/bloomz-3b \
                --dtype float16 \
                --use_gemm_plugin float16 \
                --use_gpt_attention_plugin float16 \
                --output_dir /workspace/model/bloomz-3b_trt_engines/fp16/1-gpu/

Output engine files:

> tree -h  /workspace/model/bloomz-3b_trt_engines/fp16/1-gpu/
├── [6.8G]  bloom_float16_tp1_rank0.engine
├── [1.2K]  config.json
└── [327K]  model.cache

INT8 Weight-Only Quantization (W8A16)

Build the engine on a single GPU with INT8 weight-only quantization:

python build.py --model_dir /workspace/model/bloomz-3b \
                --dtype float16 \
                --use_gemm_plugin float16 \
                --use_gpt_attention_plugin float16 \
                --use_weight_only \
                --output_dir /workspace/model/bloomz-3b_trt_engines/int8_weight_only/1-gpu/

Output engine files:

> tree -h /workspace/model/bloomz-3b_trt_engines/int8_weight_only/1-gpu/

├── [4.6G]  bloom_float16_tp1_rank0.engine
├── [1.2K]  config.json
└── [317K]  model.cache
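In W8A16 mode only the weights are stored in INT8; activations stay in FP16 and the weights are scaled back to floating point for the matrix multiplication. A minimal pure-Python sketch, assuming one symmetric scale per output channel (row), looks like this; the function names are hypothetical and the real fused kernel is not shown.

```python
from typing import List, Tuple


def quantize_weight_per_channel(w: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """w holds one row per output channel; returns INT8 rows and one float scale per row."""
    q_rows, scales = [], []
    for row in w:
        s = max(max(abs(v) for v in row), 1e-8) / 127.0
        scales.append(s)
        q_rows.append([max(-127, min(127, round(v / s))) for v in row])
    return q_rows, scales


def w8a16_linear(x: List[float], q_rows: List[List[int]], scales: List[float]) -> List[float]:
    """y[o] = sum_i x[i] * (q[o][i] * scale[o]).

    Activations stay in floating point; because the scale is shared by a whole
    output row, it can be applied once after the dot product."""
    return [sum(xi * qi for xi, qi in zip(x, row)) * s for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -4.0]]
    x = [1.0, 2.0, 3.0]
    q, s = quantize_weight_per_channel(w)
    exact = [sum(xi * wi for xi, wi in zip(x, row)) for row in w]
    print("int8 weights:", q)
    print("exact:", exact, "w8a16:", w8a16_linear(x, q, s))
```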

FP16 + 2-Way Tensor Parallelism

Build the engine with 2-way tensor parallelism:

python build.py --model_dir /workspace/model/bloomz-3b \
                --dtype float16 \
                --use_gemm_plugin float16 \
                --use_gpt_attention_plugin float16 \
                --output_dir /workspace/model/bloomz-3b_trt_engines/fp16/2-gpu/ \
                --world_size 2

Output engine files:

> tree -h  /workspace/model/bloomz-3b_trt_engines/fp16/2-gpu/

├── [4.0G]  bloom_float16_tp2_rank0.engine
├── [4.0G]  bloom_float16_tp2_rank1.engine
├── [1.2K]  config.json
└── [327K]  model.cache
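The three engine sizes above (6.8G for FP16, 4.6G for INT8 weight-only and 4.0G per rank for 2-way TP) can be roughly reproduced from the bloomz-3b shape (hidden size 2560, 30 layers, vocabulary 250880) by counting only the large weight matrices. `tree -h` reports 1024-based units, so the sketch divides by 1024³. The rules for how the embedding, LM head and GEMM weights are stored and split are assumptions chosen because they match these numbers, not something read from the TensorRT-LLM source.

```python
GIB = 1024 ** 3  # `tree -h` uses 1024-based units


def bloom_engine_gib(hidden: int, layers: int, vocab: int,
                     gemm_bytes: float = 2, embed_bytes: float = 2, tp: int = 1) -> float:
    """Rough per-rank engine size that counts only the large weight matrices.

    Assumptions (chosen to match the sizes above, not read from TensorRT-LLM):
    - each layer has 12 * hidden**2 GEMM weights (QKV 3h^2, attention dense h^2, MLP 8h^2);
    - GEMM weights and the LM head are split evenly across `tp` ranks;
    - the input embedding is replicated on every rank, and the LM head is a separate
      copy of the embedding kept at `embed_bytes` even in weight-only mode.
    Biases and LayerNorm parameters are ignored.
    """
    gemm_params = 12 * hidden * hidden * layers
    embed_params = vocab * hidden
    total = (gemm_params * gemm_bytes / tp          # transformer GEMMs
             + embed_params * embed_bytes           # input embedding (replicated)
             + embed_params * embed_bytes / tp)     # LM head (split)
    return total / GIB


if __name__ == "__main__":
    shape = dict(hidden=2560, layers=30, vocab=250880)   # bloomz-3b
    print(f"FP16, 1 GPU:         {bloom_engine_gib(**shape):.1f} GiB")
    print(f"INT8 weight-only:    {bloom_engine_gib(**shape, gemm_bytes=1):.1f} GiB")
    print(f"FP16, TP=2 per rank: {bloom_engine_gib(**shape, tp=2):.1f} GiB")
```

This prints 6.8, 4.6 and 4.0 GiB. It also shows why weight-only INT8 saves less than half: under these assumptions, roughly 2.4 GiB of embedding and LM head stay in FP16.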

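INT8 Weight-Only Quantization & INT8 KV Cache Quantization

The following combines INT8 weight-only quantization with INT8 KV cache quantization.

For the INT8 KV cache, hf_bloom_convert.py provides the --calibrate-kv-cache option (short form -kv). Setting -kv calibrates the model and then exports the scaling factors needed for INT8 KV cache inference.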

python3 hf_bloom_convert.py \
-i /workspace/model/bloomz-3b \
-o /workspace/model/bloom-c-model/int8_kv_cache/3b \
--calibrate-kv-cache -t float16

Output:

> tree -h /workspace/model/bloom-c-model/int8_kv_cache/3b
/workspace/model/bloom-c-model/int8_kv_cache/3b
└── [ 28K]  1-gpu
    ├── [2.1K]  config.ini
    ├── [5.0K]  model.final_layernorm.bias.bin
    ├── [5.0K]  model.final_layernorm.weight.bin
    ├── [5.0K]  model.layers.0.attention.dense.bias.bin
    ├── [ 12M]  model.layers.0.attention.dense.weight.0.bin
    ├── [ 15K]  model.layers.0.attention.query_key_value.bias.0.bin
    ├── [   4]  model.layers.0.attention.query_key_value.scale_y_quant_orig.bin
    ├── [ 38M]  model.layers.0.attention.query_key_value.weight.0.bin
    ├── [5.0K]  model.layers.0.input_layernorm.bias.bin
    ├── [5.0K]  model.layers.0.input_layernorm.weight.bin
    ├── [5.0K]  model.layers.0.mlp.dense_4h_to_h.bias.bin
    ├── [ 50M]  model.layers.0.mlp.dense_4h_to_h.weight.0.bin
    ├── [ 20K]  model.layers.0.mlp.dense_h_to_4h.bias.0.bin
    ├── [ 50M]  model.layers.0.mlp.dense_h_to_4h.weight.0.bin
    ├── [5.0K]  model.layers.0.post_attention_layernorm.bias.bin
    ├── [5.0K]  model.layers.0.post_attention_layernorm.weight.bin
    ├── [5.0K]  model.layers.10.attention.dense.bias.bin
    ├── [ 12M]  model.layers.10.attention.dense.weight.0.bin
    ├── [ 15K]  model.layers.10.attention.query_key_value.bias.0.bin
    ├── [   4]  model.layers.10.attention.query_key_value.scale_y_quant_orig.bin
    ├── [ 38M]  model.layers.10.attention.query_key_value.weight.0.bin
    ├── [5.0K]  model.layers.10.input_layernorm.bias.bin
    ├── [5.0K]  model.layers.10.input_layernorm.weight.bin
    ├── [5.0K]  model.layers.10.mlp.dense_4h_to_h.bias.bin
    ├── [ 50M]  model.layers.10.mlp.dense_4h_to_h.weight.0.bin
    ├── [ 20K]  model.layers.10.mlp.dense_h_to_4h.bias.0.bin
    ├── [ 50M]  model.layers.10.mlp.dense_h_to_4h.weight.0.bin
    ├── [5.0K]  model.layers.10.post_attention_layernorm.bias.bin
    ├── [5.0K]  model.layers.10.post_attention_layernorm.weight.bin
	...
    ├── [5.0K]  model.word_embeddings_layernorm.bias.bin
    ├── [5.0K]  model.word_embeddings_layernorm.weight.bin
    └── [1.2G]  model.wpe.bin
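Each `query_key_value.scale_y_quant_orig.bin` above is 4 bytes, which is consistent with a single float32 scale per layer for the KV cache. The sketch below shows the general idea under that assumption: take the largest absolute value seen during calibration, map it to 127, and store the scale as a little-endian float32. The byte order, the amax/127 rule and the helper names are assumptions, not the script's documented format.

```python
import struct
from typing import Iterable, List


def kv_cache_scale(calibration_values: Iterable[float]) -> float:
    """Symmetric per-tensor scale that maps the largest observed |value| to 127."""
    amax = max(abs(v) for v in calibration_values)
    return max(amax, 1e-8) / 127.0


def pack_scale(scale: float) -> bytes:
    """One little-endian float32: exactly 4 bytes (byte order is an assumption)."""
    return struct.pack("<f", scale)


def unpack_scale(data: bytes) -> float:
    (value,) = struct.unpack("<f", data)
    return value


def quantize_kv(values: List[float], scale: float) -> List[int]:
    """Store K/V entries as INT8 codes; multiply by `scale` to read them back."""
    return [max(-127, min(127, round(v / scale))) for v in values]


if __name__ == "__main__":
    observed = [0.5, -12.7, 3.0, 8.25]   # made-up K/V activations from calibration
    s = kv_cache_scale(observed)
    blob = pack_scale(s)
    print(len(blob), unpack_scale(blob), quantize_kv(observed, s))
```

If the format assumption holds, a file from the listing could be inspected with `unpack_scale(open(path, "rb").read())`.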

Build the engine with both INT8 weight-only quantization and INT8 KV cache quantization:

# Build model with both INT8 weight-only and INT8 KV cache enabled

python build.py --bin_model_dir=/workspace/model/bloom-c-model/int8_kv_cache/3b/1-gpu \
                --dtype float16 \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --use_layernorm_plugin \
                --int8_kv_cache \
                --output_dir /workspace/model/bloom-3b-c-model/int8_kv_cache/ \
                --use_weight_only

Result:

tree -h /workspace/model/bloom-3b-c-model/int8_kv_cache/
/workspace/model/bloom-3b-c-model/int8_kv_cache/
├── [4.6G]  bloom_float16_tp1_rank0.engine
├── [1.2K]  config.json
└── [ 78K]  model.cache

0 directories, 3 files

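SmoothQuant Quantization (W8A8)

Unlike the FP16 build, which processes the HF weights and loads them directly into TensorRT-LLM, SmoothQuant needs INT8 weights that must be preprocessed before the engine is built.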

python3 hf_bloom_convert.py \
-i /workspace/model/bloomz-3b \
-o /workspace/model/bloom-3b-c-model/smooth/ \
--smoothquant 0.5 \
--tensor-parallelism 1 \
--storage-type float16

Result:

> tree -h /workspace/model/bloom-3b-c-model/smooth/
/workspace/model/bloom-3b-c-model/smooth/
└── [100K]  1-gpu
    ├── [2.1K]  config.ini
    ├── [5.0K]  model.final_layernorm.bias.bin
    ├── [5.0K]  model.final_layernorm.weight.bin
    ├── [5.0K]  model.layers.0.attention.dense.bias.bin
    ├── [   4]  model.layers.0.attention.dense.scale_w_quant_orig.bin
    ├── [ 10K]  model.layers.0.attention.dense.scale_w_quant_orig.col.bin
    ├── [   4]  model.layers.0.attention.dense.scale_x_orig_quant.bin
    ├── [   4]  model.layers.0.attention.dense.scale_y_accum_quant.bin
    ├── [ 10K]  model.layers.0.attention.dense.scale_y_accum_quant.col.bin
    ├── [   4]  model.layers.0.attention.dense.scale_y_quant_orig.bin
    ├── [ 10K]  model.layers.0.attention.dense.smoother.0.bin
    ├── [ 12M]  model.layers.0.attention.dense.weight.0.bin
    ├── [6.2M]  model.layers.0.attention.dense.weight.int8.0.bin
    ├── [6.2M]  model.layers.0.attention.dense.weight.int8.col.0.bin
    ├── [ 15K]  model.layers.0.attention.query_key_value.bias.0.bin
    ├── [ 30K]  model.layers.0.attention.query_key_value.scale_w_quant_orig.bin
    ├── [ 30K]  model.layers.0.attention.query_key_value.scale_w_quant_orig.col.0.bin
    ├── [   4]  model.layers.0.attention.query_key_value.scale_x_orig_quant.bin
    ├── [ 30K]  model.layers.0.attention.query_key_value.scale_y_accum_quant.bin
    ├── [ 30K]  model.layers.0.attention.query_key_value.scale_y_accum_quant.col.0.bin
    ├── [   4]  model.layers.0.attention.query_key_value.scale_y_quant_orig.bin
    ├── [ 38M]  model.layers.0.attention.query_key_value.weight.0.bin
    ├── [ 19M]  model.layers.0.attention.query_key_value.weight.int8.0.bin
    ├── [ 19M]  model.layers.0.attention.query_key_value.weight.int8.col.0.bin
    ├── [5.0K]  model.layers.0.input_layernorm.bias.bin
    ├── [5.0K]  model.layers.0.input_layernorm.weight.bin
    ├── [5.0K]  model.layers.0.mlp.dense_4h_to_h.bias.bin
    ├── [   4]  model.layers.0.mlp.dense_4h_to_h.scale_w_quant_orig.bin
    ├── [ 10K]  model.layers.0.mlp.dense_4h_to_h.scale_w_quant_orig.col.bin
    ├── [   4]  model.layers.0.mlp.dense_4h_to_h.scale_x_orig_quant.bin
    ├── [   4]  model.layers.0.mlp.dense_4h_to_h.scale_y_accum_quant.bin
    ├── [ 10K]  model.layers.0.mlp.dense_4h_to_h.scale_y_accum_quant.col.bin
    ├── [   4]  model.layers.0.mlp.dense_4h_to_h.scale_y_quant_orig.bin
    ├── [ 40K]  model.layers.0.mlp.dense_4h_to_h.smoother.0.bin
    ├── [ 50M]  model.layers.0.mlp.dense_4h_to_h.weight.0.bin
    ├── [ 25M]  model.layers.0.mlp.dense_4h_to_h.weight.int8.0.bin
    ├── [ 25M]  model.layers.0.mlp.dense_4h_to_h.weight.int8.col.0.bin
    ├── [ 20K]  model.layers.0.mlp.dense_h_to_4h.bias.0.bin
    ├── [   4]  model.layers.0.mlp.dense_h_to_4h.scale_w_quant_orig.bin
    ├── [ 40K]  model.layers.0.mlp.dense_h_to_4h.scale_w_quant_orig.col.0.bin
    ├── [   4]  model.layers.0.mlp.dense_h_to_4h.scale_x_orig_quant.bin
    ├── [   4]  model.layers.0.mlp.dense_h_to_4h.scale_y_accum_quant.bin
    ├── [ 40K]  model.layers.0.mlp.dense_h_to_4h.scale_y_accum_quant.col.0.bin
    ├── [   4]  model.layers.0.mlp.dense_h_to_4h.scale_y_quant_orig.bin
    ├── [ 50M]  model.layers.0.mlp.dense_h_to_4h.weight.0.bin
    ├── [ 25M]  model.layers.0.mlp.dense_h_to_4h.weight.int8.0.bin
    ├── [ 25M]  model.layers.0.mlp.dense_h_to_4h.weight.int8.col.0.bin
    ├── [5.0K]  model.layers.0.post_attention_layernorm.bias.bin
    ├── [5.0K]  model.layers.0.post_attention_layernorm.weight.bin
	...
    ├── [5.0K]  model.word_embeddings_layernorm.bias.bin
    ├── [5.0K]  model.word_embeddings_layernorm.weight.bin
    └── [1.2G]  model.wpe.bin
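Compared with the earlier conversion, each GEMM now gets extra files: INT8 weights (`weight.int8*`) and several scales (`scale_x_orig_quant`, `scale_w_quant_orig`, `scale_y_accum_quant`, plus `.col` per-channel variants). Their names suggest the usual W8A8 recipe: quantize activations and weights to INT8, accumulate the products in integers, then rescale once. The sketch below shows that recipe for one dot product with per-tensor scales; mapping it onto the exact semantics of each file is an assumption.

```python
from typing import List


def quantize(values: List[float], scale: float) -> List[int]:
    return [max(-127, min(127, round(v / scale))) for v in values]


def absmax_scale(values: List[float]) -> float:
    return max(max(abs(v) for v in values), 1e-8) / 127.0


def w8a8_dot(x: List[float], w: List[float]) -> float:
    """Quantize activations and weights to INT8, accumulate in integers, rescale once."""
    sx, sw = absmax_scale(x), absmax_scale(w)
    qx, qw = quantize(x, sx), quantize(w, sw)
    acc = sum(a * b for a, b in zip(qx, qw))  # an INT32 accumulator on real hardware
    return acc * sx * sw                      # dequantize the accumulated result


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    w = [0.25, 0.5, -1.0]
    exact = sum(a * b for a, b in zip(x, w))
    print("exact:", exact, "w8a8:", w8a8_dot(x, w))
```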

Enable INT8 quantization with the --use_smooth_quant option. By default, the engine is built with per-tensor quantization (_per_tensor_):

# Build model for SmoothQuant in the _per_tensor_ mode.
python3 build.py --bin_model_dir=/workspace/model/bloom-3b-c-model/smooth/1-gpu \
                 --use_smooth_quant \
                 --output_dir "/workspace/model/bloom-3b-c-model/smooth-quant" \
                 --use_gpt_attention_plugin float16

Result:

> tree -h /workspace/model/bloom-3b-c-model/smooth-quant
/workspace/model/bloom-3b-c-model/smooth-quant
├── [3.4G]  bloom_float16_tp1_rank0.engine
├── [1.2K]  config.json
└── [516K]  model.cache

0 directories, 3 files

Building with per-token activation scaling plus per-channel weight scaling (_per_token_ + _per_channel_) is also supported:

# Build model for SmoothQuant in the _per_token_ + _per_channel_ mode
python3 build.py --bin_model_dir=/workspace/model/bloom-3b-c-model/smooth/1-gpu \
                 --use_smooth_quant \
                 --use_gpt_attention_plugin float16 \
                 --output_dir "/workspace/model/bloom-3b-c-model/smooth-quant-channel-token" \
                 --per_token \
                 --per_channel

Result:

tree -h /home/guodong.li/workspace/model/bloom-3b-c-model/smooth-quant-channel-token
/home/guodong.li/workspace/model/bloom-3b-c-model/smooth-quant-channel-token
├── [4.6G]  bloom_float16_tp1_rank0.engine
├── [1.2K]  config.json
└── [516K]  model.cache

0 directories, 3 files

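Notes:

The GPT attention plugin (--use_gpt_attention_plugin) is currently required for SmoothQuant.
--bin_model_dir is used instead of --model_dir because SmoothQuant needs the INT8 weights and the various scales stored in the binary files.

Model Inference

Next, run the model for inference and evaluate it with the ROUGE metric.

Common summarize.py arguments:

hf_model_location: location of the HF model and its tokenizer.
test_hf: evaluate the HF model.
test_trt_llm: evaluate the TensorRT-LLM engine.
data_type: data type; used together with test_hf to convert the model parameters to half precision.
dataset_path: dataset cache directory.
engine_dir: engine directory.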

FP16

python summarize.py --test_trt_llm \
                    --hf_model_location /workspace/model/bloomz-3b \
                    --data_type fp16 \
                    --engine_dir /workspace/model/bloomz-3b_trt_engines/fp16/1-gpu/

INT8 Weight-Only Quantization

python summarize.py --test_trt_llm \
                    --hf_model_location /workspace/model/bloomz-3b \
                    --data_type fp16 \
                    --engine_dir /workspace/model/bloomz-3b_trt_engines/int8_weight_only/1-gpu/

Run log:

[11/14/2023-09:54:48] [TRT-LLM] [I] Load tokenizer takes: 0.6626021862030029 sec
[11/14/2023-09:54:54] [TRT] [I] Loaded engine size: 4708 MiB
[11/14/2023-09:54:55] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 6142, GPU 46624 (MiB)
[11/14/2023-09:54:55] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 6144, GPU 46634 (MiB)
[11/14/2023-09:54:55] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:54:55] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +4703, now: CPU 0, GPU 4703 (MiB)
[11/14/2023-09:54:55] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 6149, GPU 48652 (MiB)
[11/14/2023-09:54:55] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 6149, GPU 48660 (MiB)
[11/14/2023-09:54:55] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:54:56] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4703 (MiB)
[11/14/2023-09:54:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 6195, GPU 48680 (MiB)
[11/14/2023-09:54:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 6196, GPU 48690 (MiB)
[11/14/2023-09:54:56] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:54:57] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4703 (MiB)
[11/14/2023-09:54:58] [TRT-LLM] [I] Load engine takes: 9.880424976348877 sec
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
[11/14/2023-09:54:59] [TRT-LLM] [I] ---------------------------------------------------------
[11/14/2023-09:54:59] [TRT-LLM] [I] TensorRT-LLM Generated :
[11/14/2023-09:54:59] [TRT-LLM] [I]  Article : ['(CNN)James Best, best known for his portrayal of bumbling sheriff Rosco P. Coltrane on TV's "The Dukes of Hazzard," died Monday after a brief illness. He was 88. Best died in hospice in Hickory, North Carolina, of complications from pneumonia, said Steve Latshaw, a longtime friend and Hollywood colleague. Although he'd been a busy actor for decades in theater and in Hollywood, Best didn't become famous until 1979, when "The Dukes of Hazzard's" cornpone charms began beaming into millions of American homes almost every Friday night. For seven seasons, Best's Rosco P. Coltrane chased the moonshine-running Duke boys back and forth across the back roads of fictitious Hazzard County, Georgia, although his "hot pursuit" usually ended with him crashing his patrol car. Although Rosco was slow-witted and corrupt, Best gave him a childlike enthusiasm that got laughs and made him endearing. His character became known for his distinctive "kew-kew-kew" chuckle and for goofy catchphrases such as "cuff 'em and stuff 'em!" upon making an arrest. Among the most popular shows on TV in the early '80s, "The Dukes of Hazzard" ran until 1985 and spawned TV movies, an animated series and video games. Several of Best's "Hazzard" co-stars paid tribute to the late actor on social media. "I laughed and learned more from Jimmie in one hour than from anyone else in a whole year," co-star John Schneider, who played Bo Duke, said on Twitter. "Give Uncle Jesse my love when you see him dear friend." "Jimmy Best was the most constantly creative person I have ever known," said Ben Jones, who played mechanic Cooter on the show, in a Facebook post. "Every minute of his long life was spent acting, writing, producing, painting, teaching, fishing, or involved in another of his life's many passions." 
Born Jewel Guy on July 26, 1926, in Powderly, Kentucky, Best was orphaned at 3 and adopted by Armen and Essa Best, who renamed him James and raised him in rural Indiana. Best served in the Army during World War II before launching his acting career. In the 1950s and 1960s, he accumulated scores of credits, playing a range of colorful supporting characters in such TV shows as "The Twilight Zone," "Bonanza," "The Andy Griffith Show" and "Gunsmoke." He later appeared in a handful of Burt Reynolds' movies, including "Hooper" and "The End." But Best will always be best known for his "Hazzard" role, which lives on in reruns. "Jimmie was my teacher, mentor, close friend and collaborator for 26 years," Latshaw said. "I directed two of his feature films, including the recent 'Return of the Killer Shrews,' a sequel he co-wrote and was quite proud of as he had made the first one more than 50 years earlier." People we've lost in 2015 . CNN's Stella Chan contributed to this story.']
[11/14/2023-09:54:59] [TRT-LLM] [I]
 Highlights : ['James Best, who played the sheriff on "The Dukes of Hazzard," died Monday at 88 .\n"Hazzard" ran from 1979 to 1985 and was among the most popular shows on TV .']
[11/14/2023-09:54:59] [TRT-LLM] [I]
 Summary : [[' Actor James Best, best known for his role as bumbling sheriff Rosco P. Coltrane on TV's "The Dukes of Hazzard," has died at age 88.']]
[11/14/2023-09:54:59] [TRT-LLM] [I] ---------------------------------------------------------
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
[11/14/2023-09:55:10] [TRT-LLM] [I] TensorRT-LLM (total latency: 10.436434745788574 sec)
[11/14/2023-09:55:10] [TRT-LLM] [I] TensorRT-LLM beam 0 result
[11/14/2023-09:55:11] [TRT-LLM] [I]   rouge1 : 30.60846842935061
[11/14/2023-09:55:11] [TRT-LLM] [I]   rouge2 : 11.315593160478784
[11/14/2023-09:55:11] [TRT-LLM] [I]   rougeL : 24.043680494718327
[11/14/2023-09:55:11] [TRT-LLM] [I]   rougeLsum : 26.250663629946125
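The ROUGE values in the log appear to be reported on a 0-100 scale. ROUGE-1 measures unigram overlap between the generated summary and the reference highlights. The simplified sketch below computes a ROUGE-1 F1 score on lower-cased whitespace tokens; reference implementations such as the `rouge_score` package use their own tokenizer and optional stemming, so their numbers will differ somewhat.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lower-cased whitespace tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # each word counts at most min(ref, cand) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    highlights = 'James Best, who played the sheriff on "The Dukes of Hazzard," died Monday at 88 .'
    summary = ('Actor James Best, best known for his role as bumbling sheriff Rosco P. Coltrane '
               'on TV\'s "The Dukes of Hazzard," has died at age 88.')
    print(f"ROUGE-1 F1 x 100: {100 * rouge1_f1(highlights, summary):.1f}")
```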

FP16 + 2-Way Tensor Parallelism

mpirun -n 2 --allow-run-as-root \
    python summarize.py --test_trt_llm \
                        --hf_model_location /workspace/model/bloomz-3b \
                        --data_type fp16 \
                        --engine_dir /workspace/model/bloomz-3b_trt_engines/fp16/2-gpu/

Run log:

[11/14/2023-09:58:13] [TRT-LLM] [MPI_Rank 1] [I] Load tokenizer takes: 0.4274311065673828 sec
[11/14/2023-09:58:13] [TRT-LLM] [MPI_Rank 0] [I] Load tokenizer takes: 0.45519232749938965 sec
[11/14/2023-09:58:17] [TRT] [I] Loaded engine size: 4094 MiB
[11/14/2023-09:58:18] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5533, GPU 41994 (MiB)
[11/14/2023-09:58:18] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 5534, GPU 42004 (MiB)
[11/14/2023-09:58:18] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:19] [TRT] [I] Loaded engine size: 4094 MiB
[11/14/2023-09:58:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5529, GPU 46010 (MiB)
[11/14/2023-09:58:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 5530, GPU 46020 (MiB)
[11/14/2023-09:58:20] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +4088, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +4088, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5749, GPU 43220 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 5749, GPU 43228 (MiB)
[11/14/2023-09:58:23] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5749, GPU 47236 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 5749, GPU 47244 (MiB)
[11/14/2023-09:58:23] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 5796, GPU 47262 (MiB)
[11/14/2023-09:58:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 5796, GPU 47272 (MiB)
[11/14/2023-09:58:23] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:24] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:24] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 5796, GPU 43246 (MiB)
[11/14/2023-09:58:24] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 5796, GPU 43256 (MiB)
[11/14/2023-09:58:24] [TRT] [W] TensorRT was linked against cuDNN 8.9.4 but loaded cuDNN 8.9.2
[11/14/2023-09:58:24] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:24] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 4088 (MiB)
[11/14/2023-09:58:25] [TRT-LLM] [MPI_Rank 0] [I] Load engine takes: 11.81023645401001 sec
[11/14/2023-09:58:25] [TRT-LLM] [MPI_Rank 1] [I] Load engine takes: 11.762826204299927 sec
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I] ---------------------------------------------------------
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I] TensorRT-LLM Generated :
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I]  Article : ['(CNN)James Best, best known for his portrayal of bumbling sheriff Rosco P. Coltrane on TV's "The Dukes of Hazzard," died Monday after a brief illness. He was 88. Best died in hospice in Hickory, North Carolina, of complications from pneumonia, said Steve Latshaw, a longtime friend and Hollywood colleague. Although he'd been a busy actor for decades in theater and in Hollywood, Best didn't become famous until 1979, when "The Dukes of Hazzard's" cornpone charms began beaming into millions of American homes almost every Friday night. For seven seasons, Best's Rosco P. Coltrane chased the moonshine-running Duke boys back and forth across the back roads of fictitious Hazzard County, Georgia, although his "hot pursuit" usually ended with him crashing his patrol car. Although Rosco was slow-witted and corrupt, Best gave him a childlike enthusiasm that got laughs and made him endearing. His character became known for his distinctive "kew-kew-kew" chuckle and for goofy catchphrases such as "cuff 'em and stuff 'em!" upon making an arrest. Among the most popular shows on TV in the early '80s, "The Dukes of Hazzard" ran until 1985 and spawned TV movies, an animated series and video games. Several of Best's "Hazzard" co-stars paid tribute to the late actor on social media. "I laughed and learned more from Jimmie in one hour than from anyone else in a whole year," co-star John Schneider, who played Bo Duke, said on Twitter. "Give Uncle Jesse my love when you see him dear friend." "Jimmy Best was the most constantly creative person I have ever known," said Ben Jones, who played mechanic Cooter on the show, in a Facebook post. "Every minute of his long life was spent acting, writing, producing, painting, teaching, fishing, or involved in another of his life's many passions." 
Born Jewel Guy on July 26, 1926, in Powderly, Kentucky, Best was orphaned at 3 and adopted by Armen and Essa Best, who renamed him James and raised him in rural Indiana. Best served in the Army during World War II before launching his acting career. In the 1950s and 1960s, he accumulated scores of credits, playing a range of colorful supporting characters in such TV shows as "The Twilight Zone," "Bonanza," "The Andy Griffith Show" and "Gunsmoke." He later appeared in a handful of Burt Reynolds' movies, including "Hooper" and "The End." But Best will always be best known for his "Hazzard" role, which lives on in reruns. "Jimmie was my teacher, mentor, close friend and collaborator for 26 years," Latshaw said. "I directed two of his feature films, including the recent 'Return of the Killer Shrews,' a sequel he co-wrote and was quite proud of as he had made the first one more than 50 years earlier." People we've lost in 2015 . CNN's Stella Chan contributed to this story.']
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I]
 Highlights : ['James Best, who played the sheriff on "The Dukes of Hazzard," died Monday at 88 .\n"Hazzard" ran from 1979 to 1985 and was among the most popular shows on TV .']
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I]
 Summary : [[' Actor James Best, best known for his role as bumbling sheriff Rosco P. Coltrane on TV's "The Dukes of Hazzard," has died at age 88.']]
[11/14/2023-09:58:27] [TRT-LLM] [MPI_Rank 0] [I] ---------------------------------------------------------
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
/workspace/TensorRT-LLM/examples/bloom/summarize.py:165: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  [torch.tensor(line_encoded[i], dtype=torch.int32), pad],
[11/14/2023-09:58:42] [TRT-LLM] [MPI_Rank 0] [I] TensorRT-LLM (total latency: 14.928563356399536 sec)
[11/14/2023-09:58:42] [TRT-LLM] [MPI_Rank 0] [I] TensorRT-LLM beam 0 result
[11/14/2023-09:58:43] [TRT-LLM] [MPI_Rank 0] [I]   rouge1 : 27.12991734291884
[11/14/2023-09:58:43] [TRT-LLM] [MPI_Rank 0] [I]   rouge2 : 8.273487794146279
[11/14/2023-09:58:43] [TRT-LLM] [MPI_Rank 0] [I]   rougeL : 21.08356714989421
[11/14/2023-09:58:43] [TRT-LLM] [MPI_Rank 0] [I]   rougeLsum : 23.51165220383353
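With --world_size 2 the engine is split into two ranks (rank0 and rank1), and `mpirun -n 2` starts one process per rank. In the Megatron-style tensor parallelism this relies on, a linear layer can be split along its output features (each rank computes a slice and the slices are gathered) or along its input features (each rank computes a partial sum and the partial sums are added, i.e. an all-reduce). The plain-Python sketch below checks both splits against the unsplit result; it illustrates the general technique only, and exactly how TensorRT-LLM partitions each Bloom layer is not shown here.

```python
from typing import List

Matrix = List[List[int]]  # one row per output feature


def matvec(w: Matrix, x: List[int]) -> List[int]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def column_parallel(w: Matrix, x: List[int], world_size: int) -> List[int]:
    """Split output features across ranks; concatenating the slices is an all-gather."""
    if len(w) % world_size:
        raise ValueError("output features must be divisible by world_size")
    shard = len(w) // world_size
    slices = [matvec(w[r * shard:(r + 1) * shard], x) for r in range(world_size)]
    return [y for part in slices for y in part]


def row_parallel(w: Matrix, x: List[int], world_size: int) -> List[int]:
    """Split input features across ranks; summing the partial results is an all-reduce."""
    if len(x) % world_size:
        raise ValueError("input features must be divisible by world_size")
    shard = len(x) // world_size
    partials = [matvec([row[r * shard:(r + 1) * shard] for row in w], x[r * shard:(r + 1) * shard])
                for r in range(world_size)]
    return [sum(vals) for vals in zip(*partials)]


if __name__ == "__main__":
    w = [[1, 2, 3, 4], [0, -1, 2, 1], [3, 0, 0, 1], [2, 2, -2, 0]]
    x = [1, 0, -1, 2]
    print(matvec(w, x), column_parallel(w, x, 2), row_parallel(w, x, 2))
```

The communication step (gather or all-reduce) is one reason a 2-GPU run is not automatically faster than a single-GPU run for a model as small as bloomz-3b.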

SmoothQuant Quantization

Per-tensor mode:

python summarize.py --test_trt_llm \
                    --hf_model_location /workspace/model/bloomz-3b \
                    --data_type fp16 \
                    --engine_dir /workspace/model/bloom-3b-c-model/smooth-quant

Per-token + per-channel mode:

python summarize.py --test_trt_llm \
                    --hf_model_location /workspace/model/bloomz-3b \
                    --data_type fp16 \
                    --engine_dir /workspace/model/bloom-3b-c-model/smooth-quant-channel-token

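Summary

This article briefly covered setting up a TensorRT-LLM environment and then quantizing and running inference on Bloom. Writing these posts takes effort; if you found it helpful, likes, bookmarks and follows are welcome.

References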

https://github.com/NVIDIA/TensorRT-LLM/tree/v0.5.0
https://github.com/NVIDIA/TensorRT-LLM/blob/v0.5.0/docker/Dockerfile.multi
https://github.com/NVIDIA/TensorRT-LLM/blob/v0.5.0/docs/source/installation.md

数字孪生下的智慧城市（城市大脑）建设方案——建模层百态老人智慧城市人工智能
要构建城市信息模型（CIM）、建筑信息模型（BIM）及仿真模型，并实现L3级精度的核心区三维建模，需结合多源数据与多层级标准，具体步骤如下：1.CIM建模层构建L3级精度标准定义CIM模型分为7级（CIM1-CIM7），其中CIM3级对应标准模型，需满足以下要求：三维框架表达：包括建筑物、道路、场地、管线等实体的基本结构。内外表面建模：用倾斜摄影、BIM或CAD数据细化建筑内外表面细节。数据源：卫
hive分区和分桶详解 CodeShelby hive 大数据 hive
1、分区表分区表实际上就是对应一个HDFS文件系统上的独立的文件夹，该文件夹下是该分区所有的数据文件。Hive中的分区就是分目录，把一个大的数据集根据业务需要分割成小的数据集。在查询时通过WHERE子句中的表达式选择查询所需要的指定的分区，这样的查询效率会提高很多。1）分区表基本操作（1）引入分区表（需要根据日期对日志进行管理,通过部门信息模拟）dept_20200401.logdept_2020
Task01：线性回归；Softmax与分类模型、多层感知机恰人陈 pytorch 机器学习深度学习神经网络
一、mxnet相关函数用法mxnet.nd用法对标numpy库(1)nd.concatfrommxnetimportndnd.concat(X,Y,dim=0)nd.concat(X,Y,dim=1)X,Y为两个矩阵nd.concat为连接矩阵，dim表示连接的维度，若原来两个矩阵为（4,3），dim=0就表示新生成矩阵为（8,3）dim=1表示新生成矩阵为（4,6）(2)y+=xy=y+x这样的
【单层神经网络】基于MXNet库简化实现线性回归辰尘_星启神经网络 mxnet 线性回归
写在前面同最开始的两篇文章完整程序及注释'''导入使用的库'''#基本frommxnetimportautograd,nd,gluon#模型、网络frommxnet.gluonimportnnfrommxnetimportinit#学习frommxnet.gluonimportlossasgloss#数据集frommxnet.gluonimportdataasgdata'''生成测试数据集'''#
初入机器学习辰尘_星启机器学习人工智能深度学习 python mxnet
写在前面本专栏专门撰写深度学习相关的内容，防止自己遗忘，也为大家提供一些个人的思考一切仅供参考概念辨析深度学习：本质是建模，将训练得到的模型作为系统的一部分使用侧重于发现样本集中隐含的规律难点是认识并了解模型，合理设置初始模型，要对建模对象有比较深刻的认识依赖大量的准确训练样本强化学习：本质是系统，直接将训练得到的模型视作系统本身（激进的像“端到端”）侧重于最大化当前环境下的奖励，最终目标是寻找环
SaaS架构详解 Rainbow酱架构 sass paas
SaaS架构详解架构图IaaS层定义基础设施即服务（Infrastructure-as-a-Service），指把IT基础设施作为一种服务通过网络对外提供，并根据用户对资源的实际使用量或占用量进行计费的一种服务模式。作用在这种服务模型中，普通用户不用自己构建一个数据中心等硬件设施，而是通过租用的方式，利用Internet从IaaS服务提供商获得计算机基础设施服务，包括服务器、存储和网络等服务。内容
线性回归基础学习 Remoa 人工智能线性回归优化 gluon mxnet loss
线性回归基础学习目录：理论知识样例代码测试参考文献一、理论知识线性回归思维导图NDArray：MXNet中存储和变换数据的主要工具，提供GPU计算和自动求梯度等功能线性回归可以用神经网络图表示，也可以用矢量计算表示在Gluon中，data模块提供了有关数据处理的工具，nn模块定义了大量神经网络的层，loss模块定义了各种损失函数在MXNet的init模块(initializer)提供了模型参数化的
基于Stackelberg博弈的光伏用户群优化定价模型(Matlab代码实现）砌墙_2301 matlab
‍个人主页欢迎来到本博客❤️❤️博主优势：博客内容尽量做到思维缜密，逻辑清晰，为了方便读者。⛳️座右铭：行百里者，半于九十。本文目录如下：目录1概述2运行结果3文献来源4Matlab代码、数据、文章下载1概述文献来源：摘要：在由多主体组成的光伏用户群中,用户间存在光伏电量共享。然而,在现有的分布式光伏上网政策下,用户间的共享水平很低。为了提高用户间光伏电量共享水平,根据用户的用电特性,构建了光伏用
力扣 215. 数组中的第K个最大元素 pursuit_csdn 力扣热题 100 leetcode 算法
https://leetcode.cn/problems/kth-largest-element-in-an-array题目返回数组nums中的第k大数思路桶排，把数据都调整为正数，放置到对应的桶位置，最后遍历桶获得第K大的数代码classSolution{public:intfindKthLargest(vector&nums,intk){intmark[20010];memset(mark,0
React 和 Vue _使用区别 m0_74823490 vue.js react.js javascript
目录一、框架介绍1.Vue2.React?二、框架结构1.创建应用2.框架结构三、使用区别1.单页面组成2.样式3.显示响应式数据4.响应式html标签属性5.控制元素显隐6.条件渲染7.渲染列表react和vue是目前前端比较流行的两大框架，前端程序员应该将两种框架都掌握，本文总结一些基本知识点的使用区别。一、框架介绍1.VueVue是一个框架，也是一个生态。其功能覆盖了大部分前端开发常见的需求
蓝桥杯python基础算法（2-2）——基础算法（C）——递归 X _X Python Lanqiao 算法
四、递归递归出口：这是递归过程中的终止条件，防止函数无限制地调用自身。当前问题如何变成子问题：这是递归函数中最重要的部分，即如何将当前问题逐步简化为更小的子问题。例题-汉诺塔Hanoi塔由n个大小不同的圆盘和三根木柱a,b,c组成。开始时，这n个圆盘由大到小依次套在a柱上，如图所示。要求把a柱上n个圆盘按下述规则移到c柱上：(1)一次只能移一个圆盘；(2)圆盘只能在三个柱上存放；(3)在移动过程中
深度学习：基于MindNLP的RAG应用开发 Landy_Jay 深度学习人工智能
什么是RAG？RAG（Retrieval-AugmentedGeneration，检索增强生成）是一种结合检索（Retrieval）和生成（Generation）的技术，旨在提升大语言模型（LLM）生成内容的准确性、相关性和时效性。基本思想：通过外部知识库动态检索与用户查询相关的信息，并将检索结果作为上下文输入生成模型，辅助生成更可靠的回答。与传统LLM的区别：传统LLM仅依赖预训练参数中的静态知
跟李沐学AI：视频生成类论文精读（Movie Gen、HunyuanVideo） Landy_Jay 人工智能
MovieGen：ACastofMediaFoundationModels简介MovieGen是Meta公司提出的一系列内容生成模型，包含了3.2.1预训练数据MovieGen采用大约100M的视频-文本对和1B的图片-文本对进行预训练。图片-文本对的预训练流程与Meta提出的Emu:Enhancingimagegenerationmodelsusingphotogenicneedlesinaha
MATLAB 实现基于MPA（海洋捕食者算法）进行时间序列预测模型的项目详细实例 nantangyuxi MATLAB matlab 算法人工智能回归 cnn 支持向量机大数据
目录MTFSTLTFSB实她基她MPTFS（海洋捕食者算法）进行时间序列预测模型她项目详细实例...1项目背景介绍...1项目目标她意义...1项目挑战...2项目特点她创新...3项目应用领域...3项目效果预测图程序设计...4项目模型架构...5项目模型描述及代码示例...5项目模型算法流程图...6项目目录结构设计及各模块功能说明...7项目部署她应用...9项目扩展...11项目应该注意
OpenAI紧急加播：ChatGPT上新深度搜索，持续思考30分钟输出1万字，刷榜“人类最后的考试” 量子位
就在开源的DeepSeek-R1被整合进各路AI搜索工具之际，OpenAI临时举行小型发布会。4点27通知，8点开始直播。ChatGPT上新“DeepResearch”，把推理大模型的思考能力用于联网搜索。据介绍，DeepResearch功能可在数十分钟完成人类专家需要几个小时的复杂研究任务。在“人类最后的考试”上，DeepResearch刷新了最高分，比o3-mini高推理设置分数高出一倍。该测
PyTorch生态系统中的连续深度学习：使用Torchdyn实现连续时间神经网络
神经常微分方程（NeuralODEs）是深度学习领域的创新性模型架构，它将神经网络的离散变换扩展为连续时间动力系统。与传统神经网络将层表示为离散变换不同，NeuralODEs将变换过程视为深度（或时间）的连续函数。这种方法为机器学习开创了新的研究方向，尤其在生成模型、时间序列分析和物理信息学习等领域具有重要应用。本文将基于Torchdyn（一个专门用于连续深度学习和平衡模型的PyTorch扩展库）
Mixture of Experts（MoE）学习笔记南七小僧人工智能网站开发医疗器械研发学习笔记人工智能 MoE 大模型
1学习动机第一次了解到MoE（Mixtureofexperts），是在GPT-4模型架构泄漏事件，听说GPT-4的架构是8个GPT-3级别大小的模型以MoE架构（8*220B）组合成一个万亿参数级别的模型。不过在这之后开源社区并没有对MoE架构进行很多的探索，更多的工作还是聚焦在预训练新的大模型，在Llama2或其他模型上做Fine-tune，以及扩展大模型的ContextLength。12月8号
开发者关心的那些事圣子足道 ios 游戏编程 apple 支付
我要在app里添加IAP，必须要注册自己的产品标识符（product identifiers）。产品标识符是什么？产品标识符（Product Identifiers）是一串字符串，它用来识别你在应用内贩卖的每件商品。App Store用产品标识符来检索产品信息，标识符只能包含大小写字母（A-Z）、数字（0-9）、下划线（-）、以及圆点(.)。你可以任意排列这些元素，但我们建议你创建标识符时使用
负载均衡器技术Nginx和F5的优缺点对比 bijian1013 nginx F5
对于数据流量过大的网络中，往往单一设备无法承担，需要多台设备进行数据分流，而负载均衡器就是用来将数据分流到多台设备的一个转发器。目前有许多不同的负载均衡技术用以满足不同的应用需求，如软/硬件负载均衡、本地/全局负载均衡、更高
LeetCode[Math] - #9 Palindrome Number Cwind java Algorithm 题解 LeetCode Math
原题链接：#9 Palindrome Number 要求：判断一个整数是否是回文数，不要使用额外的存储空间难度：简单分析：题目限制不允许使用额外的存储空间应指不允许使用O(n)的内存空间，O(1)的内存用于存储中间结果是可以接受的。于是考虑将该整型数反转，然后与原数字进行比较。注：没有看到有关负数是否可以是回文数的明确结论，例如
画图板的基本实现 15700786134 画图板
要实现画图板的基本功能，除了在qq登陆界面中用到的组件和方法外，还需要添加鼠标监听器，和接口实现。首先，需要显示一个JFrame界面： public class DrameFrame extends JFrame { //显示
linux的ps命令被触发 linux
Linux中的ps命令是Process Status的缩写。ps命令用来列出系统中当前运行的那些进程。ps命令列出的是当前那些进程的快照，就是执行ps命令的那个时刻的那些进程，如果想要动态的显示进程信息，就可以使用top命令。要对进程进行监测和控制，首先必须要了解当前进程的情况，也就是需要查看当前进程，而 ps 命令就是最基本同时也是非常强大的进程查看命令。使用该命令可以确定有哪些进程正在运行
Android 音乐播放器下一曲连续跳几首歌肆无忌惮_ android
最近在写安卓音乐播放器的时候遇到个问题。在MediaPlayer播放结束时会回调 player.setOnCompletionListener(new OnCompletionListener() { @Override public void onCompletion(MediaPlayer mp) { mp.reset(); Log.i("H
java导出txt文件的例子知了ing java servlet
代码很简单就一个servlet,如下： package com.eastcom.servlet; import java.io.BufferedOutputStream; import java.io.IOException; import java.net.URLEncoder; import java.sql.Connection; import java.sql.Resu
Scala stack试玩, 提高第三方依赖下载速度矮蛋蛋 scala sbt
原文地址： http://segmentfault.com/a/1190000002894524 sbt下载速度实在是惨不忍睹, 需要做些配置优化下载typesafe离线包, 保存为ivy本地库 wget http://downloads.typesafe.com/typesafe-activator/1.3.4/typesafe-activator-1.3.4.zip 解压r
phantomjs安装(linux，附带环境变量设置) ，以及casperjs安装。 alleni123 linux spider
1. 首先从官网 http://phantomjs.org/下载phantomjs压缩包，解压缩到/root/phantomjs文件夹。 2. 安装依赖 sudo yum install fontconfig freetype libfreetype.so.6 libfontconfig.so.1 libstdc++.so.6 3. 配置环境变量 vi /etc/profil
JAVA IO FileInputStream和FileOutputStream，字节流的打包输出百合不是茶 java核心思想 JAVA IO操作字节流
在程序设计语言中，数据的保存是基本，如果某程序语言不能保存数据那么该语言是不可能存在的，JAVA是当今最流行的面向对象设计语言之一，在保存数据中也有自己独特的一面，字节流和字符流 1，字节流是由字节构成的，字符流是由字符构成的字节流和字符流都是继承的InputStream和OutPutStream ,java中两种最基本的就是字节流和字符流类 FileInputStream
Spring基础实例（依赖注入和控制反转） bijian1013 spring
前提条件：在http://www.springsource.org/download网站上下载Spring框架，并将spring.jar、log4j-1.2.15.jar、commons-logging.jar加载至工程1.武器接口 package com.bijian.spring.base3; public interface Weapon { void kil
HR看重的十大技能 bijian1013 提升能力 HR 成长
一个人掌握何种技能取决于他的兴趣、能力和聪明程度，也取决于他所能支配的资源以及制定的事业目标，拥有过硬技能的人有更多的工作机会。但是，由于经济发展前景不确定，掌握对你的事业有所帮助的技能显得尤为重要。以下是最受雇主欢迎的十种技能。　　一、解决问题的能力　　每天，我们都要在生活和工作中解决一些综合性的问题。那些能够发现问题、解决问题并迅速作出有效决
【Thrift一】Thrift编译安装 bit1129 thrift
什么是Thrift The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and s
【Avro三】Hadoop MapReduce读写Avro文件 bit1129 mapreduce
Avro是Doug Cutting(此人绝对是神一般的存在）牵头开发的。开发之初就是围绕着完善Hadoop生态系统的数据处理而开展的（使用Avro作为Hadoop MapReduce需要处理数据序列化和反序列化的场景）,因此Hadoop MapReduce集成Avro也就是自然而然的事情。这个例子是一个简单的Hadoop MapReduce读取Avro格式的源文件进行计数统计，然后将计算结果
nginx定制500，502，503，504页面 ronin47 nginx　错误显示
server { listen 80; error_page 500/500.html; error_page 502/502.html; error_page 503/503.html; error_page 504/504.html; location /test {return502;}} 配置很简单，和配
java-1.二叉查找树转为双向链表 bylijinnan 二叉查找树
import java.util.ArrayList; import java.util.List; public class BSTreeToLinkedList { /* 把二元查找树转变成排序的双向链表题目：输入一棵二元查找树，将该二元查找树转换成一个排序的双向链表。要求不能创建任何新的结点，只调整指针的指向。 10 / \ 6 14 / \
Netty源码学习-HTTP-tunnel bylijinnan java netty
Netty关于HTTP tunnel的说明： http://docs.jboss.org/netty/3.2/api/org/jboss/netty/channel/socket/http/package-summary.html#package_description 这个说明有点太简略了一个完整的例子在这里： https://github.com/bylijinnan
JSONUtil.serialize(map)和JSON.toJSONString(map)的区别 coder_xpf jquery json map val()
JSONUtil.serialize(map)和JSON.toJSONString(map)的区别数据库查询出来的map有一个字段为空通过System.out.println()输出 JSONUtil.serialize(map)： {"one":"1","two":"nul
Hibernate缓存总结 cuishikuan 开源 ssh javaweb hibernate缓存三大框架
一、为什么要用Hibernate缓存？ Hibernate是一个持久层框架，经常访问物理数据库。为了降低应用程序对物理数据源访问的频次，从而提高应用程序的运行性能。缓存内的数据是对物理数据源中的数据的复制，应用程序在运行时从缓存读写数据，在特定的时刻或事件会同步缓存和物理数据源的数据。二、Hibernate缓存原理是怎样的？ Hibernate缓存包括两大类：Hib
CentOs6 dalan_123 centos
首先su - 切换到root下面1、首先要先安装GCC GCC-C++ Openssl等以来模块：yum -y install make gcc gcc-c++ kernel-devel m4 ncurses-devel openssl-devel2、再安装ncurses模块yum -y install ncurses-develyum install ncurses-devel3、下载Erang
10款用 jquery 实现滚动条至页面底端自动加载数据效果 dcj3sjt126com JavaScript
无限滚动自动翻页可以说是web2.0时代的一项堪称伟大的技术，它让我们在浏览页面的时候只需要把滚动条拉到网页底部就能自动显示下一页的结果，改变了一直以来只能通过点击下一页来翻页这种常规做法。无限滚动自动翻页技术的鼻祖是微博的先驱：推特(twitter)，后来必应图片搜索、谷歌图片搜索、google reader、箱包批发网等纷纷抄袭了这一项技术，于是靠滚动浏览器滚动条
ImageButton去边框&Button或者ImageButton的背景透明 dcj3sjt126com imagebutton
在ImageButton中载入图片后，很多人会觉得有图片周围的白边会影响到美观，其实解决这个问题有两种方法一种方法是将ImageButton的背景改为所需要的图片。如：android:background="@drawable/XXX" 第二种方法就是将ImageButton背景改为透明，这个方法更常用在XML里； <ImageBut
JSP之c:foreach eksliang jsp forearch
原文出自：http://www.cnblogs.com/draem0507/archive/2012/09/24/2699745.html <c:forEach>标签用于通用数据循环，它有以下属性属性描述是否必须缺省值 items 进行循环的项目否无 begin 开始条件否 0 end 结束条件否集合中的最后一个项目 step 步长否 1
Android实现主动连接蓝牙耳机 gqdy365 android
在Android程序中可以实现自动扫描蓝牙、配对蓝牙、建立数据通道。蓝牙分不同类型，这篇文字只讨论如何与蓝牙耳机连接。大致可以分三步：一、扫描蓝牙设备： 1、注册并监听广播： BluetoothAdapter.ACTION_DISCOVERY_STARTED BluetoothDevice.ACTION_FOUND BluetoothAdapter.ACTION_DIS
android学习轨迹之四：org.json.JSONException: No value for hyz301 json
org.json.JSONException: No value for items 在JSON解析中会遇到一种错误，很常见的错误 06-21 12:19:08.714 2098-2127/com.jikexueyuan.secret I/System.out﹕ Result:{"status":1,"page":1,&
干货分享：从零开始学编程系列汇总 justjavac 编程
程序员总爱重新发明轮子，于是做了要给轮子汇总。从零开始写个编译器吧系列 (知乎专栏) 从零开始写一个简单的操作系统 (伯乐在线) 从零开始写JavaScript框架 (图灵社区) 从零开始写jQuery框架 (蓝色理想 ) 从零开始nodejs系列文章 (粉丝日志) 从零开始编写网络游戏
jquery-autocomplete 使用手册 macroli jquery Ajax 脚本
jquery-autocomplete学习一、用前必备官方网站：http://bassistance.de/jquery-plugins/jquery-plugin-autocomplete/ 当前版本：1.1 需要JQuery版本：1.2.6 二、使用 <script src="./jquery-1.3.2.js" type="text/ja
PLSQL-Developer或者Navicat等工具连接远程oracle数据库的详细配置以及数据库编码的修改超声波 oracle plsql
　　在服务器上将Oracle安装好之后接下来要做的就是通过本地机器来远程连接服务器端的oracle数据库，常用的客户端连接工具就是PLSQL-Developer或者Navicat这些工具了。刚开始也是各种报错，什么TNS:no listener;TNS:lost connection;TNS:target hosts...花了一天的时间终于让PLSQL-Developer和Navicat等这些客户
数据仓库数据模型之：极限存储--历史拉链表 superlxw1234 极限存储数据仓库数据模型拉链历史表
在数据仓库的数据模型设计过程中，经常会遇到这样的需求： 1. 数据量比较大; 2. 表中的部分字段会被update,如用户的地址，产品的描述信息，订单的状态等等; 3. 需要查看某一个时间点或者时间段的历史快照信息，比如，查看某一个订单在历史某一个时间点的状态，比如，查看某一个用户在过去某一段时间内，更新过几次等等; 4. 变化的比例和频率不是很大，比如，总共有10
10点睛Spring MVC4.1-全局异常处理 wiselyman spring mvc
10.1 全局异常处理使用@ControllerAdvice注解来实现全局异常处理; 使用@ControllerAdvice的属性缩小处理范围 10.2 演示演示控制器 package com.wisely.web; import org.springframework.stereotype.Controller; import org.spring