论文翻译:NeurIPS-2024.Zhehao Zhang.DARG: Dynamic Evaluation of Large Language Models via Adaptive
DARG:DynamicEvaluationofLargeLanguageModelsviaAdaptiveReasoningGraphhttps://openreview.net/pdf?id=5IFeCNA7zR文章目录DARG:通过自适应推理图动态评估大型语言模型摘要1引言2方法:DARG2.1推理图2.2推理图构建2.3推理图扰动2.4测试用例生成3实验3.1数学推理:GSM8K3.2社会