Pitfalls Encountered When Fine-Tuning ChatGLM3

Problem: Various fine-tuning problems · THUDM/ChatGLM3 · Discussion #253 · GitHub

Traceback (most recent call last):
  File "/opt/projects/chatglm3-test/scripts/finetune.py", line 171, in <module>
    main()
  File "/opt/projects/chatglm3-test/scripts/finetune.py", line 137, in main
    print(train_dataset[0]['input_ids'])
  File "/opt/projects/chatglm3-test/scripts/preprocess_utils.py", line 127, in __getitem__
    a_ids = self.tokenizer.encode(text=data_item['prompt'], add_special_tokens=True, truncation=True,
KeyError: 'prompt'

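Looking at the code at that line in preprocess_utils.py makes the cause clear: for a plain chat model, the dataset class does not read the data in the officially documented format shown below: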

```json
[
  {
    "conversations": [
      {
        "role": "system",
        "content": ""
      },
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      },
      // ... Multi Turn
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "content": ""
      }
    ]
  }
  // ...
]
``` 

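The key "prompt" does not exist in this format, hence the KeyError. The official fine-tuning script has since been updated; I will try the latest version another day.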
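Until then, one possible workaround is to flatten each conversation record into single-turn pairs that carry a "prompt" key. The following is a minimal sketch, not the official conversion. Only the "prompt" key is confirmed by the traceback; the "response" key, the helper name `conversations_to_pairs`, and the choice to prepend the system message to each prompt are assumptions made for illustration. Note that it discards multi-turn history, so each pair stands alone.

```python
import json


def conversations_to_pairs(record):
    """Flatten one {"conversations": [...]} record into prompt/response dicts.

    Hypothetical helper: the "response" key and the way the system message
    is joined to the prompt are assumptions, not taken from the official script.
    """
    pairs = []
    system = ""
    last_user = None
    for turn in record["conversations"]:
        role, content = turn["role"], turn["content"]
        if role == "system":
            system = content
        elif role == "user":
            last_user = content
        elif role == "assistant" and last_user is not None:
            # Pair the most recent user message with this assistant reply.
            prompt = f"{system}\n{last_user}" if system else last_user
            pairs.append({"prompt": prompt, "response": content})
            last_user = None
    return pairs


# A small made-up record in the conversation format.
sample = {
    "conversations": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello"},
        {"role": "user", "content": "Bye"},
        {"role": "assistant", "content": "Goodbye"},
    ]
}

for pair in conversations_to_pairs(sample):
    # One JSON object per line; ensure_ascii=False keeps Chinese text readable.
    print(json.dumps(pair, ensure_ascii=False))
```

Before training on the converted file, check which keys preprocess_utils.py actually reads in your checkout, since they may differ from the ones assumed here.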
