Original article: https://blog.pickmind.xyz/article/3c87123f-d283-4a05-8e43-4ee8550cf22f
The first batch of LLMs approved for public release in China:
Product | Company | Open Source | Approval Date | Link |
---|---|---|---|---|
文心一言 (ERNIE Bot) | Baidu | No | 2023-08-31 | https://wenxin.baidu.com/ |
豆包 / 云雀大模型 (Doubao / Lark) | Douyin (ByteDance) | No | 2023-08-31 | https://www.doubao.com/login |
GLM | Zhipu AI | Yes | 2023-08-31 | https://chatglm.cn |
紫东太初 (Zidong Taichu) | Chinese Academy of Sciences | No | 2023-08-31 | https://xihe.mindspore.cn |
百川 (Baichuan) | Baichuan Intelligence | Yes | 2023-08-31 | https://baichuan-ai.com/home |
日日新 (SenseNova) | SenseTime | No | 2023-08-31 | https://sensetime.com/cn |
ABAB | MiniMax | No | 2023-08-31 | https://api.minimax.chat |
书生 (InternLM) | Shanghai AI Laboratory | No | 2023-08-31 | https://intern-ai.org.cn/ |
星火 (Spark) | iFLYTEK | No | 2023-08-31 | https://xinghuo.xfyun.cn/ |
Source: unknown.
https://accubits.com/large-language-models-leaderboard/
Rankings change constantly; follow the link for the latest leaderboard.
From UC Berkeley:
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Rankings change constantly; follow the link for the latest leaderboard.
Model | ⭐ Arena Elo rating | MT-bench (score) | MMLU | License |
---|---|---|---|---|
[GPT-4](https://openai.com/research/gpt-4) | 1193 | 8.99 | 86.4 | Proprietary |
[Claude-1](https://www.anthropic.com/index/introducing-claude) | 1161 | 7.9 | 77 | Proprietary |
[Claude-2](https://www.anthropic.com/index/claude-2) | 1134 | 8.06 | 78.5 | Proprietary |
[Claude-Instant-1](https://www.anthropic.com/index/introducing-claude) | 1130 | 7.85 | 73.4 | Proprietary |
[GPT-3.5-Turbo](https://openai.com/blog/chatgpt) | 1118 | 7.94 | 70 | Proprietary |
[Vicuna-33B](https://huggingface.co/lmsys/vicuna-33b-v1.3) | 1097 | 7.12 | 59.2 | Non-commercial |
[Llama-2-70B-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 1060 | 6.86 | 63 | Llama 2 Community |
[WizardLM-13B-V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2) | 1046 | 7.2 | 52.7 | Llama 2 Community |
[Vicuna-13B-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) | 1046 | 6.57 | 55.8 | Llama 2 Community |
[MPT-30B-chat](https://huggingface.co/mosaicml/mpt-30b-chat) | 1043 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
[Guanaco-33B](https://huggingface.co/timdettmers/guanaco-33b-merged) | 1036 | 6.53 | 57.6 | Non-commercial |
[CodeLlama-34B-Instruct](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) | 1032 | | | Llama 2 Community |
[PaLM-Chat-Bison-001](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models) | 1008 | 6.4 | | Proprietary |
[Vicuna-7B-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 1003 | 6.17 | 49.8 | Llama 2 Community |
[Llama-2-13B-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | 999 | 6.65 | 53.6 | Llama 2 Community |
[Llama-2-7B-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 979 | 6.27 | 45.8 | Llama 2 Community |
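The Arena Elo column is computed from pairwise human votes between anonymous models. A minimal sketch of the standard Elo update rule is below; the step size `k` and the tie handling are illustrative textbook choices, not LMSYS's exact procedure (the arena applies its own variant over many battles).

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update after a single head-to-head battle.

    r_a, r_b: current ratings of models A and B.
    winner: 'a', 'b', or 'tie'.
    k: update step size (an illustrative assumption, not LMSYS's setting).
    """
    # Logistic expectation: a 400-point gap means ~10:1 win odds.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# An upset (the lower-rated model wins) shifts ratings more than an expected win:
print(elo_update(1000, 1100, "a"))
print(elo_update(1100, 1000, "a"))
```

Because each update is zero-sum, the average rating of the pool stays constant; only relative standing moves.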
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Rankings change constantly; follow the link for the latest leaderboard.
Model | Average ⬆️ | ARC | HellaSwag | MMLU | TruthfulQA |
---|---|---|---|---|---|
[Uni-TianYan](https://huggingface.co/uni-tianyan/Uni-TianYan) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_uni-tianyan__Uni-TianYan)) | 73.81 | 72.1 | 87.4 | 69.91 | 65.81 |
[ORCA_LLaMA_70B_QLoRA](https://huggingface.co/fangloveskari/ORCA_LLaMA_70B_QLoRA) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__ORCA_LLaMA_70B_QLoRA)) | 73.4 | 72.27 | 87.74 | 70.23 | 63.37 |
[Platypus2-70B-instruct](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_garage-bAInd__Platypus2-70B-instruct)) | 73.13 | 71.84 | 87.94 | 70.48 | 62.26 |
[Llama-2-70b-instruct-v2](https://huggingface.co/upstage/Llama-2-70b-instruct-v2) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_upstage__Llama-2-70b-instruct-v2)) | 72.95 | 71.08 | 87.89 | 70.58 | 62.25 |
[Platypus_QLoRA_LLaMA_70b](https://huggingface.co/fangloveskari/Platypus_QLoRA_LLaMA_70b) ([details](https://huggingface.co/datasets/open-llm-leaderboard/details_fangloveskari__Platypus_QLoRA_LLaMA_70b)) | 72.94 | 72.1 | 87.46 | 71.02 | 61.18 |
From Stanford:
https://tatsu-lab.github.io/alpaca_eval/
Rankings change constantly; follow the link for the latest leaderboard.
Model Name | Win Rate | Length |
---|---|---|
GPT-4 ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json)) | 95.28% | 1365 |
[LLaMA2 Chat 70B](https://ai.meta.com/llama/) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-70b-chat-hf/model_outputs.json)) | 92.66% | 1790 |
Claude 2 ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2/model_outputs.json)) | 91.36% | 1069 |
[OpenChat V3.1 13B](https://github.com/imoneoi/openchat) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v3.1-13b/model_outputs.json)) | 89.49% | 1484 |
ChatGPT ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/chatgpt/model_outputs.json)) | 89.37% | 827 |
[WizardLM 13B V1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/wizardlm-13b-v1.2/model_outputs.json)) | 89.17% | 1635 |
[Vicuna 33B v1.3](https://huggingface.co/lmsys/vicuna-33b-v1.3) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-33b-v1.3/model_outputs.json)) | 88.99% | 1479 |
Claude ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude/model_outputs.json)) | 88.39% | 1082 |
[Humpback LLaMA2 70B](https://arxiv.org/abs/2308.06259) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/humpback-llama2-70b/model_outputs.json)) | 87.94% | 1822 |
[OpenBuddy-LLaMA2-70B-v10.1](https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16) ([outputs](https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-llama2-70b-v10.1/model_outputs.json)) | 87.67% | 1077 |
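AlpacaEval's Win Rate is the share of prompts on which an automatic judge prefers the model's answer over a reference model's. A minimal sketch with hypothetical preference labels; counting ties as half a win is an assumption here, not necessarily AlpacaEval's exact convention:

```python
def win_rate(preferences):
    """Fraction of head-to-head judgments the candidate model wins.

    preferences: one label per prompt — 'model', 'reference', or 'tie'.
    Ties count as half a win (an illustrative convention).
    """
    score = {"model": 1.0, "reference": 0.0, "tie": 0.5}
    return sum(score[p] for p in preferences) / len(preferences)

# Hypothetical judgments over 5 prompts:
print(win_rate(["model", "model", "tie", "reference", "model"]))  # 0.7
```

The Length column in the table reports average output length in tokens; longer answers can bias judge preferences, which is why it is shown alongside the win rate.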
https://www.cluebenchmarks.com/rank.html
Rankings change constantly; follow the link for the latest leaderboard.
Rank | Model | Institution | Eval Date | Score 1.1 | Certification | AFQMC | TNEWS1.1 | IFLYTEK | OCNLI_50K | WSC1.1 | CSL | CMRC2018 | CHID1.1 | C3 1.1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 玉言 | NetEase Fuxi | 23-07-31 | 87.050 | Pending | 86.45 | 74.04 | 67.96 | 86.33 | 95.73 | 97.6 | 84.25 | 95.956 | 95.138 |
2 | HunYuan-NLP 1T | Tencent Hunyuan LLM Team | 22-11-26 | 86.918 | Pending | 85.11 | 70.44 | 67.54 | 86.5 | 96 | 96.2 | 87.9 | 98.848 | 93.723 |
3 | 通义-AliceMind | DAMO Academy NLP | 22-11-22 | 86.685 | Pending | 84.07 | 73.47 | 67.42 | 85.87 | 94.33 | 95.03 | 86.8 | 99.208 | 93.969 |
4 | HUMAN | CLUE | 19-12-01 | 86.678 | Certified | 81 | 71 | 80.3 | 90.3 | 98 | 84 | 92.4 | 87.10 | 96.00 |
5 | CHAOS | OPPO Research Institute (Rongzhi Team) | 22-11-09 | 86.552 | Pending | 83.37 | 73.22 | 65.81 | 86.37 | 94.6 | 95.7 | 87.2 | 99.217 | 93.477 |
6 | WenJin | Meituan NLP | 22-10-20 | 86.313 | Pending | 84.49 | 73.04 | 64.38 | 86.23 | 94.44 | 95.67 | 86.25 | 98.898 | 93.415 |
7 | OBERT | OPPO Xiaobu Assistant | 22-11-07 | 84.783 | Pending | 81.02 | 67.75 | 66 | 84.53 | 91.3 | 99.93 | 84.05 | 97.578 | 90.892 |
8 | HunYuan_nlp | Tencent TEG | 22-05-11 | 84.730 | Pending | 83.37 | 64.01 | 66.58 | 85.23 | 92.27 | 93.87 | 87.9 | 98.512 | 90.831 |
9 | ShenNonG | Tencent Cloud Xiaowei AI | 21-12-01 | 84.351 | Pending | 82.57 | 65.56 | 64.42 | 85.97 | 94.21 | 91.23 | 86.5 | 97.932 | 90.769 |
10 | ShenZhou | QQ Browser Lab | 21-09-19 | 83.873 | Pending | 80.55 | 65.36 | 67.65 | 86.37 | 89.08 | 90.97 | 87.85 | 97.923 | 89.108 |
https://github.com/jeinlee1991/chinese-llm-benchmark
Rankings change constantly; follow the link for the latest leaderboard.
Category | Model | Total Score | Rank |
---|---|---|---|
Commercial | gpt4 | 95.8 | 1 |
Commercial | chatgpt-3.5 | 93.8 | 2 |
Commercial | 文心一言 v2.2 | 88.3 | 3 |
Commercial | SenseTime SenseChat | 83.2 | 4 |
Open source | BELLE-Llama2-13B-chat-0.4M | 80.0 | 5 |
Open source | belle-llama-13b-2m | 79.2 | 6 |
Commercial | Baichuan-53B | 79.0 | 7 |
Commercial | iFLYTEK Spark v1.5 | 77.7 | 8 |
Commercial | 360 Zhinao (360智脑) | 77.0 | 9 |
Commercial | ChatGLM (official) | 76.9 | 10 |
https://cevalbenchmark.com/static/leaderboard_zh.html
Rankings change constantly; follow the link for the latest leaderboard.
# | Model | Organization | Submitted | Average | Average (Hard) | STEM | Social Sciences | Humanities | Other |
---|---|---|---|---|---|---|---|---|---|
0 | [云天书](https://cevalbenchmark.com/static/model_zh.html?method=%E4%BA%91%E5%A4%A9%E4%B9%A6) | Shenzhen Yuntian Algorithm Technology | 2023/8/31 | 77.1 | 55.2 | 70.4 | 88 | 78.6 | 77.9 |
1 | [Galaxy](https://cevalbenchmark.com/static/model_zh.html?method=Galaxy) | Zuoyebang | 2023/8/23 | 73.7 | 60.5 | 71.4 | 86 | 71.6 | 68.8 |
2 | [YaYi](https://cevalbenchmark.com/static/model_zh.html?method=YaYi) | Zhongke Wenge | 2023/9/4 | 71.8 | 60.3 | 70.6 | 81.3 | 71.5 | 65.8 |
3 | [AiLMe-100B v3](https://cevalbenchmark.com/static/model_zh.html?method=AiLMe-100B%20v3) | APUS | 2023/9/4 | 71.6 | 57.9 | 68.5 | 72.3 | 71.2 | 77 |
4 | [Mengzi](https://cevalbenchmark.com/static/model_zh.html?method=Mengzi) | Langboat Technology | 2023/8/25 | 71.5 | 48.8 | 62.3 | 87.2 | 76.8 | 68.6 |
5 | [DFM2.0](https://cevalbenchmark.com/static/model_zh.html?method=DFM2.0) | AISpeech & SJTU | 2023/9/2 | 71.2 | 46.1 | 59.1 | 80.5 | 75.5 | 80.3 |
6 | [ChatGLM2](https://cevalbenchmark.com/static/model_zh.html?method=ChatGLM2) | Tsinghua & Zhipu.AI | 2023/6/25 | 71.1 | 50 | 64.4 | 81.6 | 73.7 | 71.3 |
7 | [UniGPT2.0 (山海)](https://cevalbenchmark.com/static/model_zh.html?method=UniGPT2.0%EF%BC%88%E5%B1%B1%E6%B5%B7%EF%BC%89) | Unisound (云知声) | 2023/8/28 | 70 | 52.8 | 65.7 | 78.7 | 67 | 72.9 |
8 | [360GPT-S2](https://cevalbenchmark.com/static/model_zh.html?method=360GPT-S2) | 360 | 2023/8/29 | 69 | 42 | 59.4 | 82 | 70.6 | 72.9 |
9 | [InternLM-123B](https://cevalbenchmark.com/static/model_zh.html?method=InternLM-123B) | Shanghai AI Lab & SenseTime | 2023/8/22 | 68.8 | 50 | 63.5 | 81.4 | 72.7 | 63 |
10 | [GPT-4*](https://cevalbenchmark.com/static/model_zh.html?method=GPT-4*) | OpenAI | 2023/5/15 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |