Fine-tuning llama3 with autodl and llama-factory (Part 2)

I. Constructing the fine-tuning dataset

The dataset is built with a new Python script:

import os
import json

# Update the folder path to the correct location
folder_path = r'pico_corpus_brat_annotated_files/pico_corpus_brat_annotated_files'

# Read file content
def read_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

# Parse annotation file
def parse_ann_file(file_path):
    annotations = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.strip().split('\t')
            if len(parts) == 3:
                tag, annotation_info, annotated_text = parts
                annotation_info = annotation_info.split(' ')
                annotation_type = annotation_info[0]
                start_pos = int(annotation_info[1])
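                end_pos = int(annotation_info[2])
                # Assumption: the original listing breaks off here. Collecting the
                # parsed fields and returning them is a guess at the intended behavior.
                annotations.append((tag, annotation_type, start_pos, end_pos, annotated_text))
    return annotations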
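
Since the script above is cut off, here is a minimal, self-contained sketch of how the parsed annotations might be turned into one fine-tuning record. It is not the author's code. It assumes the Alpaca-style `instruction` / `input` / `output` layout that llama-factory accepts; the helper names `parse_ann_text` and `build_record`, the instruction wording, the sample sentence, and the entity labels `Intervention` and `Population` are all hypothetical and used only for illustration.

```python
import json


def parse_ann_text(ann_text):
    """Parse brat text-bound annotations ("T" lines) from a string."""
    annotations = []
    for line in ann_text.splitlines():
        parts = line.strip().split('\t')
        # Keep only lines shaped like: T1<TAB>Type start end<TAB>text
        if len(parts) != 3 or not parts[0].startswith('T'):
            continue
        tag, info, text = parts
        fields = info.split(' ')
        # Discontinuous spans such as "0 5;10 15" have more than three
        # fields; this sketch simply skips them.
        if len(fields) != 3:
            continue
        annotations.append({
            'tag': tag,
            'type': fields[0],
            'start': int(fields[1]),
            'end': int(fields[2]),
            'text': text,
        })
    return annotations


def build_record(abstract, annotations):
    """Build one Alpaca-style record (hypothetical prompt wording)."""
    entities = [{'type': a['type'], 'text': a['text']} for a in annotations]
    return {
        'instruction': 'Extract the PICO entities from the following abstract.',
        'input': abstract,
        'output': json.dumps(entities, ensure_ascii=False),
    }


# Made-up sample data, not taken from the corpus.
sample_txt = 'Aspirin reduced pain in adults.'
sample_ann = 'T1\tIntervention 0 7\tAspirin\nT2\tPopulation 24 30\tadults\n'

record = build_record(sample_txt, parse_ann_text(sample_ann))
print(json.dumps(record, ensure_ascii=False, indent=2))
```

In practice one such record would be produced per `.txt` / `.ann` pair in `folder_path`, and the resulting list written to a single JSON file for training.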
           
