A Roundup of Large AI Models (LLMs) and Chatbots (continuously updated), by pickmind

Original post: https://blog.pickmind.xyz/article/3c87123f-d283-4a05-8e43-4ee8550cf22f

Table of Contents

  • Domestically Approved Large Models (China)
  • Chinese Large-Model "Abyss" Chart (China)
  • Open-source Large Language Models Leaderboard (international)
  • lmsys Chatbot Arena Leaderboard (international)
  • Open LLM Leaderboard (international)
  • AlpacaEval Leaderboard (international)
  • CLUE 1.1 Overall Leaderboard (China)
  • CLiB Chinese LLM Capability Leaderboard (China)
  • C-Eval Leaderboard (China)

Domestically Approved Large Models (China)

| Product | Company | Open source | Approved | Link |
| --- | --- | --- | --- | --- |
| ERNIE Bot (文心一言) | Baidu | | 2023-08-31 | https://wenxin.baidu.com/ |
| Doubao / Skylark (豆包, 云雀) | Douyin (ByteDance) | | 2023-08-31 | https://www.doubao.com/login |
| GLM | Zhipu AI | | 2023-08-31 | https://chatglm.cn |
| Zidong Taichu (紫东太初) | Chinese Academy of Sciences | | 2023-08-31 | https://xihe.mindspore.cn |
| Baichuan (百川) | Baichuan AI | | 2023-08-31 | https://baichuan-ai.com/home |
| SenseNova (日日新) | SenseTime | | 2023-08-31 | https://sensetime.com/cn |
| ABAB | MiniMax | | 2023-08-31 | https://api.minimax.chat |
| InternLM (书生) | Shanghai AI Laboratory | | 2023-08-31 | https://intern-ai.org.cn/ |
| Spark (星火) | iFLYTEK | | 2023-08-31 | https://xinghuo.xfyun.cn/ |

Chinese Large-Model "Abyss" Chart (China)

Source: unknown.

Open-source Large Language Models Leaderboard (international)

https://accubits.com/large-language-models-leaderboard/

The rankings change constantly; follow the link for the latest leaderboard.


lmsys Chatbot Arena Leaderboard (international)

From UC Berkeley.

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

The rankings change constantly; follow the link for the latest leaderboard.

| Model | Arena Elo rating | MT-bench (score) | MMLU | License |
| --- | --- | --- | --- | --- |
| [GPT-4](https://openai.com/research/gpt-4) | 1193 | 8.99 | 86.4 | Proprietary |
| [Claude-1](https://www.anthropic.com/index/introducing-claude) | 1161 | 7.9 | 77 | Proprietary |
| [Claude-2](https://www.anthropic.com/index/claude-2) | 1134 | 8.06 | 78.5 | Proprietary |
| [Claude-Instant-1](https://www.anthropic.com/index/introducing-claude) | 1130 | 7.85 | 73.4 | Proprietary |
| [GPT-3.5-Turbo](https://openai.com/blog/chatgpt) | 1118 | 7.94 | 70 | Proprietary |
| [Vicuna-33B](https://huggingface.co/lmsys/vicuna-33b-v1.3) | 1097 | 7.12 | 59.2 | Non-commercial |
| [Llama-2-70B-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 1060 | 6.86 | 63 | Llama 2 Community |
| [WizardLM-13B-V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2) | 1046 | 7.2 | 52.7 | Llama 2 Community |
| [Vicuna-13B-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) | 1046 | 6.57 | 55.8 | Llama 2 Community |
| [MPT-30B-chat](https://huggingface.co/mosaicml/mpt-30b-chat) | 1043 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
| [Guanaco-33B](https://huggingface.co/timdettmers/guanaco-33b-merged) | 1036 | 6.53 | 57.6 | Non-commercial |
| [CodeLlama-34B-Instruct](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) | 1032 | | | Llama 2 Community |
| [PaLM 2 (Chat-Bison)](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models) | 1008 | 6.4 | | Proprietary |
| [Vicuna-7B-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 1003 | 6.17 | 49.8 | Llama 2 Community |
| [Llama-2-13B-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | 999 | 6.65 | 53.6 | Llama 2 Community |
| [Llama-2-7B-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 979 | 6.27 | 45.8 | Llama 2 Community |
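
The Arena Elo ratings above are derived from pairwise human votes between anonymized models. As a rough illustration of how such ratings behave (this is the textbook Elo update, not lmsys's exact computation, and the K-factor here is an arbitrary choice):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats one rated r_b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32) -> tuple[float, float]:
    """Update both ratings after one comparison (score_a: 1 = win, 0.5 = tie, 0 = loss)."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Using two ratings from the table above: GPT-4 (1193) vs GPT-3.5-Turbo (1118).
print(round(expected_score(1193, 1118), 3))  # ≈ 0.606: GPT-4 expected to win ~61% of votes
```

Note that the Elo update is zero-sum: whatever rating one model gains, its opponent loses.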

Open LLM Leaderboard (international)

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

The rankings change constantly; follow the link for the latest leaderboard.

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
| --- | --- | --- | --- | --- | --- |
| [Uni-TianYan](https://huggingface.co/uni-tianyan/Uni-TianYan) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_uni-tianyan__Uni-TianYan)) | 73.81 | 72.1 | 87.4 | 69.91 | 65.81 |
| [ORCA_LLaMA_70B_QLoRA](https://huggingface.co/fangloveskari/ORCA_LLaMA_70B_QLoRA) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__ORCA_LLaMA_70B_QLoRA)) | 73.4 | 72.27 | 87.74 | 70.23 | 63.37 |
| [Platypus2-70B-instruct](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_garage-bAInd__Platypus2-70B-instruct)) | 73.13 | 71.84 | 87.94 | 70.48 | 62.26 |
| [Llama-2-70b-instruct-v2](https://huggingface.co/upstage/Llama-2-70b-instruct-v2) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_upstage__Llama-2-70b-instruct-v2)) | 72.95 | 71.08 | 87.89 | 70.58 | 62.25 |
| [Platypus_QLoRA_LLaMA_70b](https://huggingface.co/fangloveskari/Platypus_QLoRA_LLaMA_70b) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__Platypus_QLoRA_LLaMA_70b)) | 72.94 | 72.1 | 87.46 | 71.02 | 61.18 |
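
The Average column is simply the arithmetic mean of the four benchmark scores, which can be checked against the first row (Uni-TianYan) of the table above:

```python
# The "Average" column is the arithmetic mean of the four benchmark scores.
# Checking the Uni-TianYan row from the table:
scores = {"ARC": 72.1, "HellaSwag": 87.4, "MMLU": 69.91, "TruthfulQA": 65.81}
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # ≈ 73.805, reported on the leaderboard as 73.81
```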

AlpacaEval Leaderboard (international)

From Stanford.

https://tatsu-lab.github.io/alpaca_eval/

The rankings change constantly; follow the link for the latest leaderboard.

| Model Name | Win Rate | Length |
| --- | --- | --- |
| [GPT-4](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json) | 95.28% | 1365 |
| [LLaMA2 Chat 70B](https://ai.meta.com/llama/) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-70b-chat-hf/model_outputs.json)) | 92.66% | 1790 |
| [Claude 2](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2/model_outputs.json) | 91.36% | 1069 |
| [OpenChat V3.1 13B](https://github.com/imoneoi/openchat) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v3.1-13b/model_outputs.json)) | 89.49% | 1484 |
| [ChatGPT](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/chatgpt/model_outputs.json) | 89.37% | 827 |
| [WizardLM 13B V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/wizardlm-13b-v1.2/model_outputs.json)) | 89.17% | 1635 |
| [Vicuna 33B v1.3](https://huggingface.co/lmsys/vicuna-33b-v1.3) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-33b-v1.3/model_outputs.json)) | 88.99% | 1479 |
| [Claude](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude/model_outputs.json) | 88.39% | 1082 |
| [Humpback LLaMA2 70B](https://arxiv.org/abs/2308.06259) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/humpback-llama2-70b/model_outputs.json)) | 87.94% | 1822 |
| [OpenBuddy-LLaMA2-70B-v10.1](https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-llama2-70b-v10.1/model_outputs.json)) | 87.67% | 1077 |
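
AlpacaEval's Win Rate is the share of evaluation instructions on which an automatic annotator prefers the model's output over a reference model's output. A minimal sketch of the aggregation step only (in the real harness the per-instruction judgments come from an LLM annotator, not from this function):

```python
def win_rate(judgments: list[float]) -> float:
    """judgments: per-instruction preference for the candidate model
    (1.0 = preferred over the reference, 0.5 = tie, 0.0 = reference preferred).
    Returns the win rate as a percentage."""
    return 100 * sum(judgments) / len(judgments)

# Toy example: candidate preferred on 3 of 4 instructions, with one tie.
print(win_rate([1.0, 1.0, 0.5, 1.0]))  # 87.5
```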

CLUE 1.1 Overall Leaderboard (China)

https://www.cluebenchmarks.com/rank.html

The rankings change constantly; follow the link for the latest leaderboard.

| Rank | Model | Institution | Date | Score 1.1 | Certified | AFQMC | TNEWS 1.1 | IFLYTEK | OCNLI_50K | WSC 1.1 | CSL | CMRC2018 | CHID 1.1 | C3 1.1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Yuyan (玉言) | NetEase Fuxi | 23-07-31 | 87.050 | Pending | 86.45 | 74.04 | 67.96 | 86.33 | 95.73 | 97.6 | 84.25 | 95.956 | 95.138 |
| 2 | HunYuan-NLP 1T | Tencent Hunyuan LLM team | 22-11-26 | 86.918 | Pending | 85.11 | 70.44 | 67.54 | 86.5 | 96 | 96.2 | 87.9 | 98.848 | 93.723 |
| 3 | Tongyi-AliceMind (通义) | Alibaba DAMO Academy NLP | 22-11-22 | 86.685 | Pending | 84.07 | 73.47 | 67.42 | 85.87 | 94.33 | 95.03 | 86.8 | 99.208 | 93.969 |
| 4 | HUMAN | CLUE | 19-12-01 | 86.678 | Certified | 81 | 71 | 80.3 | 90.3 | 98 | 84 | 92.4 | 87.10 | 96.00 |
| 5 | CHAOS | OPPO Research Institute (Rongzhi team) | 22-11-09 | 86.552 | Pending | 83.37 | 73.22 | 65.81 | 86.37 | 94.6 | 95.7 | 87.2 | 99.217 | 93.477 |
| 6 | WenJin | Meituan NLP | 22-10-20 | 86.313 | Pending | 84.49 | 73.04 | 64.38 | 86.23 | 94.44 | 95.67 | 86.25 | 98.898 | 93.415 |
| 7 | OBERT | OPPO Xiaobu Assistant | 22-11-07 | 84.783 | Pending | 81.02 | 67.75 | 66 | 84.53 | 91.3 | 99.93 | 84.05 | 97.578 | 90.892 |
| 8 | HunYuan_nlp | Tencent TEG | 22-05-11 | 84.730 | Pending | 83.37 | 64.01 | 66.58 | 85.23 | 92.27 | 93.87 | 87.9 | 98.512 | 90.831 |
| 9 | ShenNonG | Tencent Cloud Xiaowei AI (云小微) | 21-12-01 | 84.351 | Pending | 82.57 | 65.56 | 64.42 | 85.97 | 94.21 | 91.23 | 86.5 | 97.932 | 90.769 |
| 10 | ShenZhou | QQ Browser Lab | 21-09-19 | 83.873 | Pending | 80.55 | 65.36 | 67.65 | 86.37 | 89.08 | 90.97 | 87.85 | 97.923 | 89.108 |

CLiB Chinese LLM Capability Leaderboard (China)

https://github.com/jeinlee1991/chinese-llm-benchmark

The rankings change constantly; follow the link for the latest leaderboard.

| Category | Model | Score | Rank |
| --- | --- | --- | --- |
| Commercial | gpt4 | 95.8 | 1 |
| Commercial | chatgpt-3.5 | 93.8 | 2 |
| Commercial | ERNIE Bot (文心一言) v2.2 | 88.3 | 3 |
| Commercial | SenseTime SenseChat | 83.2 | 4 |
| Open-source | BELLE-Llama2-13B-chat-0.4M | 80.0 | 5 |
| Open-source | belle-llama-13b-2m | 79.2 | 6 |
| Commercial | Baichuan-53B | 79.0 | 7 |
| Commercial | iFLYTEK Spark (星火) v1.5 | 77.7 | 8 |
| Commercial | 360 Zhinao (智脑) | 77.0 | 9 |
| Commercial | ChatGLM (official) | 76.9 | 10 |

C-Eval Leaderboard (China)

https://cevalbenchmark.com/static/leaderboard_zh.html

The rankings change constantly; follow the link for the latest leaderboard.

| # | Model | Institution | Submitted | Avg | Avg (Hard) | STEM | Social Sciences | Humanities | Other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [云天书](https://cevalbenchmark.com/static/model_zh.html?method=%E4%BA%91%E5%A4%A9%E4%B9%A6) | Shenzhen Yuntian Algorithm Technology | 2023/8/31 | 77.1 | 55.2 | 70.4 | 88 | 78.6 | 77.9 |
| 1 | [Galaxy](https://cevalbenchmark.com/static/model_zh.html?method=Galaxy) | Zuoyebang | 2023/8/23 | 73.7 | 60.5 | 71.4 | 86 | 71.6 | 68.8 |
| 2 | [YaYi](https://cevalbenchmark.com/static/model_zh.html?method=YaYi) | Zhongke Wenge (中科闻歌) | 2023/9/4 | 71.8 | 60.3 | 70.6 | 81.3 | 71.5 | 65.8 |
| 3 | [AiLMe-100B v3](https://cevalbenchmark.com/static/model_zh.html?method=AiLMe-100B%20v3) | APUS | 2023/9/4 | 71.6 | 57.9 | 68.5 | 72.3 | 71.2 | 77 |
| 4 | [Mengzi](https://cevalbenchmark.com/static/model_zh.html?method=Mengzi) | Langboat (澜舟科技) | 2023/8/25 | 71.5 | 48.8 | 62.3 | 87.2 | 76.8 | 68.6 |
| 5 | [DFM2.0](https://cevalbenchmark.com/static/model_zh.html?method=DFM2.0) | AISpeech & SJTU | 2023/9/2 | 71.2 | 46.1 | 59.1 | 80.5 | 75.5 | 80.3 |
| 6 | [ChatGLM2](https://cevalbenchmark.com/static/model_zh.html?method=ChatGLM2) | Tsinghua & Zhipu.AI | 2023/6/25 | 71.1 | 50 | 64.4 | 81.6 | 73.7 | 71.3 |
| 7 | [UniGPT2.0 (山海)](https://cevalbenchmark.com/static/model_zh.html?method=UniGPT2.0%EF%BC%88%E5%B1%B1%E6%B5%B7%EF%BC%89) | Unisound (云知声) | 2023/8/28 | 70 | 52.8 | 65.7 | 78.7 | 67 | 72.9 |
| 8 | [360GPT-S2](https://cevalbenchmark.com/static/model_zh.html?method=360GPT-S2) | 360 | 2023/8/29 | 69 | 42 | 59.4 | 82 | 70.6 | 72.9 |
| 9 | [InternLM-123B](https://cevalbenchmark.com/static/model_zh.html?method=InternLM-123B) | Shanghai AI Lab & SenseTime | 2023/8/22 | 68.8 | 50 | 63.5 | 81.4 | 72.7 | 63 |
| 10 | [GPT-4*](https://cevalbenchmark.com/static/model_zh.html?method=GPT-4*) | OpenAI | 2023/5/15 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
