[Machine Translation] [mRASP] Getting the mRASP code running (2): train, finetune

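During training, the original code uses inotifywait to watch for new checkpoints asynchronously, so generation runs alongside training. See mRASP-master/train/misc/monitor.sh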
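The generate-while-training idea can be sketched roughly as follows. This is a minimal illustration, not the contents of monitor.sh: the function names `is_checkpoint` and `watch_checkpoints`, the `checkpoint*.pt` naming pattern, and the `echo` standing in for the real generation step are all assumptions. It needs the inotify-tools package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a file name looks like a fairseq checkpoint.
# The checkpoint*.pt pattern is an assumption about how checkpoints are named.
is_checkpoint() {
    case "$1" in
        checkpoint*.pt) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical watcher: block on inotifywait and react to each finished checkpoint.
# Both close_write and moved_to are watched, since a checkpoint may be written
# in place or written to a temporary file and then renamed.
watch_checkpoints() {
    local ckpt_dir="$1"
    inotifywait -m -e close_write -e moved_to --format '%f' "${ckpt_dir}" |
    while read -r fname; do
        if is_checkpoint "${fname}"; then
            # Placeholder: a real script would launch generation here.
            echo "new checkpoint: ${ckpt_dir}/${fname}"
        fi
    done
}
```

Running `watch_checkpoints /path/to/checkpoints &` before starting training would leave the watcher in the background, reporting each checkpoint as it lands.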

The command used for generation is defined in mRASP-master/train/scripts/common_scripts.sh

command=${gpu_cmd}"fairseq-generate ${test_path} \
    -s ${src} \
    -t ${tgt} \
    --skip-invalid-size-inputs-valid-test \
    --path ${ckpts} \
    --batch-size ${eval_batch_size} \
    --beam ${beam_size} \
    --nbest 1 \
    ${cpu_cmd} \
    --lenpen ${length_penalty} \
    --max-len-a ${max_len_a} \
    --max-len-b ${max_len_b} \
    --max-source-positions ${max_source_positions} \
    --max-target-positions ${max_target_positions} | grep -E '[S|H|P|T]-[0-9]+' > ${final_res_file}
    "

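This fairseq-generate call does not specify a task, so it defaults to translation. mRASP, however, requires the first token fed to the decoder to be a language tag. Without it, there is no way to control which target language gets generated.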
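Before changing the command, it can help to check that the language tag you plan to pass actually exists in the model's vocabulary. The sketch below assumes tags follow a `LANG_TOK_XX` pattern and that the dictionary is a fairseq-style text file with one `<token> <count>` pair per line. Both helper names are hypothetical, so check the naming against your own dictionary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a language tag such as LANG_TOK_FR from a code such as fr.
# The LANG_TOK_ prefix is an assumption about how the vocabulary names its tags.
make_lang_token() {
    printf 'LANG_TOK_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Hypothetical helper: succeed if the token is the first field of some line
# in a fairseq-style dictionary file ("<token> <count>" per line).
dict_has_token() {
    local dict_file="$1" token="$2"
    awk -v tok="${token}" '$1 == tok { found = 1 } END { exit !found }' "${dict_file}"
}
```

For example, `dict_has_token dict.fr.txt "$(make_lang_token fr)"` returns a non-zero status when the tag is missing, which suggests the tag pattern or the dictionary path is wrong.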

For comparison, here is the generation command that mRASP2 provides in mRASP2-master/eval.sh

command=${gpu_cmd}"fairseq-generate ${test_path} \
    --user-dir ${repo_dir}/mcolt \
    -s ${src} \
    -t ${tgt} \
    --skip-invalid-size-inputs-valid-test \
    --path ${ckpts} \
    --max-tokens 1024 \
    --task translation_w_langtok \
    ${options} \
    --lang-prefix-tok ${lang_token} \
    --max-source-positions ${max_source_positions} \
    --max-target-positions ${max_target_positions} \
    --nbest 1 | grep -E '[S|H|P|T]-[0-9]+' > ${final_res_file}
    "

The mRASP command needs a few key arguments added:

--user-dir mRASP-master/user_dir # mRASP's translation_w_langtok file lives in this directory

--task translation_w_langtok

--lang-prefix-tok ${lang_token}
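Put together, the adjusted call might look like the sketch below. It only prints the command so it can be inspected before running. The helper name `build_generate_cmd` and every value in the example call (data directory, checkpoint name, `LANG_TOK_FR` as the target-language tag) are placeholders rather than values from the repository, and the remaining decoding options from the original command are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a fairseq-generate command that includes the three
# extra arguments. All values passed in are placeholders chosen by the caller.
build_generate_cmd() {
    local test_path="$1" src="$2" tgt="$3" ckpt="$4" user_dir="$5" lang_token="$6"
    printf '%s ' \
        fairseq-generate "${test_path}" \
        --user-dir "${user_dir}" \
        -s "${src}" -t "${tgt}" \
        --path "${ckpt}" \
        --task translation_w_langtok \
        --lang-prefix-tok "${lang_token}" \
        --skip-invalid-size-inputs-valid-test \
        --nbest 1
    printf '\n'
}

# Example with placeholder values; this prints the command and does not run it.
build_generate_cmd data-bin en fr checkpoint_best.pt mRASP-master/user_dir LANG_TOK_FR
```

Printing first makes it easy to confirm that the task and the language tag are present before spending GPU time on decoding.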
