As is well known, Hugging Face hosts a large number of pretrained models. DeepSeek also publishes the full, unquantized DeepSeek-R1:671B model there, but Hugging Face is currently not directly reachable from mainland China.
In addition, for quick deployment one usually reaches for a model management tool such as Ollama, which works much like a Docker engine for models. However, the format Ollama expects (GGUF) is not always provided on Hugging Face directly; DeepSeek is one such case, so a format conversion is needed.
The Ollama site also offers model downloads (reachable from mainland China), but these are mostly dynamically quantized builds; to get the full-precision DeepSeek you must download from Hugging Face. Some models are also published only on Hugging Face and are absent from the Ollama registry.
This section uses moka-ai/m3e-base as the example: download the model from Hugging Face, convert it to the GGUF format Ollama supports, and import it into Ollama.
Hugging Face: https://huggingface.co (currently not downloadable from mainland China)
Mirror site: HF-Mirror (https://hf-mirror.com)
Ollama: https://ollama.com
1. [Model download] Hugging Face officially provides the Python library huggingface_hub for downloading models and cloning repositories; git or wget also work (see the examples after this list).
2. [Format conversion tool] (converts the Hugging Face model format into the format Ollama supports, GGUF)
[Recommended: llama.cpp] https://github.com/ggml-org/llama.cpp
[Optional] llama-cpp-python: https://github.com/abetlen/llama-cpp-python
3. The operating system used here is Ubuntu Server 24.04 LTS.
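Two quick alternatives to the download script used later (both assume the HF-Mirror endpoint; the huggingface-cli tool ships with huggingface_hub, and git clone additionally needs git-lfs for the weight files):
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download moka-ai/m3e-base --local-dir /data/hfo/models/moka-ai
git lfs install && git clone https://hf-mirror.com/moka-ai/m3e-base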
Build environment
apt install build-essential cmake   # cmake is needed for the build step below
mkdir -p /data/llama && cd /data/llama
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
Build (if the goal is only format conversion, no build is needed; the Python conversion scripts in the repository suffice)
The build here follows the official documentation and enables CUDA. The GPU is a V100 (compute capability 7.0), so -DCMAKE_CUDA_ARCHITECTURES=70 selects the matching CUDA architecture.
[root@host /data/llama/llama.cpp]# cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=70 -DCMAKE_CUDA_FLAGS="-Wno-deprecated-gpu-targets"
[root@host /data/llama/llama.cpp]# cmake --build build --config Release -j$(nproc)
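If the build succeeds, the binaries land under build/bin; in recent llama.cpp checkouts they carry a llama- prefix (names have changed across versions, so treat this as a sanity check rather than a guarantee):
ls build/bin/ | grep llama-    # e.g. llama-cli, llama-server, llama-quantize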
[root@host /data/llama/llama.cpp]#
[root@host /data/llama/llama.cpp]# python -m venv .venv
[root@host /data/llama/llama.cpp]# ls -l *.py
-rwxr-xr-x 1 root root 235826 Mar 1 03:54 convert_hf_to_gguf.py
-rwxr-xr-x 1 root root 17613 Mar 1 03:55 convert_hf_to_gguf_update.py
-rwxr-xr-x 1 root root 19106 Mar 1 03:55 convert_llama_ggml_to_gguf.py
-rwxr-xr-x 1 root root 18612 Mar 1 03:55 convert_lora_to_gguf.py
[root@host /data/llama/llama.cpp]# source .venv/bin/activate
(.venv) root@host:/data/llama/llama.cpp# pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
[root@host ~]# mkdir -p /data/hfo && cd /data/hfo
[root@host /data/hfo]# python -m venv .venv
(.venv) root@host:/data/hfo# pip install transformers huggingface_hub torch tensorflow datasets tokenizers onnx safetensors \
-i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
The script here is only an example; adapt it as needed.
Changing the default hub endpoint via environment variable:
Method 1: set the environment variable directly in the Python download script.
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com' must be placed before the `from huggingface_hub import ...` line, so that huggingface_hub reads the variable at import time; otherwise downloads will still go through the official Hugging Face endpoint.
The variable can also be set on the command line with export HF_ENDPOINT="https://hf-mirror.com", but since a venv is used here, setting it inside the script is more convenient.
Method 2: modify huggingface_hub directly, changing the _HF_DEFAULT_ENDPOINT variable in
.venv/lib/python3.12/site-packages/huggingface_hub/constants.py:
_HF_DEFAULT_ENDPOINT = "https://hf-mirror.com"
#_HF_DEFAULT_ENDPOINT = "https://huggingface.co"
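Method 2 as a one-liner sketch (assumes the Python 3.12 venv path shown above; -i.bak keeps a backup of the original file):
sed -i.bak 's|_HF_DEFAULT_ENDPOINT = "https://huggingface.co"|_HF_DEFAULT_ENDPOINT = "https://hf-mirror.com"|' .venv/lib/python3.12/site-packages/huggingface_hub/constants.py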
(.venv) root@host:/data/hfo# cat download.py
import os
import argparse

# HF_ENDPOINT must be set before importing huggingface_hub (Method 1 above).
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
from huggingface_hub import snapshot_download

parser = argparse.ArgumentParser(
    description="Download Hugging Face Models To Local",
    usage="python download.py --model-name ",
    formatter_class=argparse.RawTextHelpFormatter
)
parser.add_argument(
    "--model-name",
    type=str,
    required=True,
    help="Hugging Face Repo Models Name (moka-ai/m3e-base)")
args = parser.parse_args()
model_name = args.model_name
#model_name = "moka-ai/m3e-base"

# Download into a per-organization directory, e.g. /data/hfo/models/moka-ai
down_dir = "/data/hfo/models"
org_name = model_name.split("/")[0]
down_dir_final = os.path.join(down_dir, org_name)
os.makedirs(down_dir_final, exist_ok=True)
local_dir = snapshot_download(repo_id=model_name, local_dir=down_dir_final)
print(f"Model downloaded to: {local_dir}")
Model download
(.venv) root@host:/data/hfo# python download.py --help
usage: python download.py --model-name
Download Hugging Face Models To Local
options:
-h, --help show this help message and exit
--model-name MODEL_NAME
Hugging Face Repo Models Name (moka-ai/m3e-base)
(.venv) root@host:/data/hfo# python download.py --model-name moka-ai/m3e-base
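Before converting, it is worth checking that the snapshot contains what the converter needs; for m3e-base that is in particular model.safetensors plus config.json and the tokenizer files (file names vary by repository):
ls /data/hfo/models/moka-ai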
Format conversion
python convert_hf_to_gguf.py --outfile /data/hfo/ollama/moka-ai/m3e-base.gguf /data/hfo/models/moka-ai
(.venv) root@host:/data/llama/llama.cpp# python convert_hf_to_gguf.py --outfile /data/hfo/ollama/moka-ai/m3e-base.gguf /data/hfo/models/moka-ai
INFO:hf-to-gguf:Loading model: moka-ai
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:token_embd_norm.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:token_embd_norm.weight, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:position_embd.weight, torch.float32 --> F32, shape = {768, 512}
INFO:hf-to-gguf:token_types.weight, torch.float32 --> F32, shape = {768, 2}
INFO:hf-to-gguf:token_embd.weight, torch.float32 --> F16, shape = {768, 21128}
INFO:hf-to-gguf:blk.0.attn_output_norm.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_output_norm.weight, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_output.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.float32 --> F16, shape = {768, 768}
INFO:hf-to-gguf:blk.0.attn_k.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.float32 --> F16, shape = {768, 768}
INFO:hf-to-gguf:blk.0.attn_q.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.float32 --> F16, shape = {768, 768}
INFO:hf-to-gguf:blk.0.attn_v.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.float32 --> F16, shape = {768, 768}
INFO:hf-to-gguf:blk.0.ffn_up.bias, torch.float32 --> F32, shape = {3072}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.float32 --> F16, shape = {768, 3072}
INFO:hf-to-gguf:blk.0.layer_output_norm.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.layer_output_norm.weight, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.ffn_down.bias, torch.float32 --> F32, shape = {768}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float32 --> F16, shape = {3072, 768}
... (the same tensor-mapping lines repeat for blk.1 through blk.11; omitted here) ...
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 512
INFO:hf-to-gguf:gguf: embedding length = 768
INFO:hf-to-gguf:gguf: feed forward length = 3072
INFO:hf-to-gguf:gguf: head count = 12
INFO:hf-to-gguf:gguf: layer norm epsilon = 1e-12
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type unk to 100
INFO:gguf.vocab:Setting special token type sep to 102
INFO:gguf.vocab:Setting special token type pad to 0
WARNING:gguf.vocab:No handler for special token type cls with id 101 - skipping
INFO:gguf.vocab:Setting special token type mask to 103
INFO:gguf.vocab:Setting special token type bos to 0
INFO:gguf.vocab:Setting special token type eos to 2
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/data/hfo/ollama/moka-ai/m3e-base.gguf: n_tensors = 197, total_size = 204.4M
Writing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 204M/204M [00:00<00:00, 364Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /data/hfo/ollama/moka-ai/m3e-base.gguf
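The exported file is F16 ("gguf: file type = 1" in the log above). If llama.cpp was built, the file can optionally be quantized further with the llama-quantize binary to shrink it, at some cost in accuracy (binary path per the build above; the Q8_0 target is just an example):
/data/llama/llama.cpp/build/bin/llama-quantize /data/hfo/ollama/moka-ai/m3e-base.gguf /data/hfo/ollama/moka-ai/m3e-base.Q8_0.gguf Q8_0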
Import into Ollama
ollama create <custom-model-name> -f ./Modelfile
ollama create moka-ai/m3e-base -f ./Modelfile
root@host:/data/llama/llama.cpp# cd /data/hfo/ollama/moka-ai/ # the target directory of the conversion above
root@host:/data/hfo/ollama/moka-ai# cat Modelfile
FROM ./m3e-base.gguf
# remaining directives omitted; configure as needed
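A slightly fuller Modelfile sketch (num_ctx mirrors the "context length = 512" reported by the conversion log; since m3e-base is an embedding model, generation-oriented directives such as TEMPLATE are unnecessary):
FROM ./m3e-base.gguf
PARAMETER num_ctx 512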
root@host:/data/hfo/ollama/moka-ai# cat create.sh
#!/bin/bash
ollama create moka-ai/m3e-base -f ./Modelfile
root@host:/data/hfo/ollama/moka-ai# bash create.sh
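Once create finishes, a quick sanity check that the model is registered and serves embeddings (the prompt text is arbitrary; /api/embeddings is Ollama's embeddings endpoint, assumed here to be listening on the default port 11434):
ollama list | grep m3e
curl http://localhost:11434/api/embeddings -d '{"model": "moka-ai/m3e-base", "prompt": "hello world"}'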