Alab-NII/complex_ques_decomposition

This repository accompanies the paper "Identifying Where Large Language Models Struggle in Answering Complex Questions".

Reproduction of Results

Table 1: Automatic and human scores (green) in the decomposition stage

python3 eval_decompose_automatic.py
python3 eval_decompose_human.py

Table 2: LLM-as-a-Judge accuracy (based on Llama 3.3 70B) in the sub-problem-solving stage

python3 eval_sub_problem_get_scores.py

Table 3: LLM-as-a-Judge accuracy (Llama 3.3 70B) for full-QA performance using zero-shot-CoT

python3 eval_full_get_scores.py

Running the Pipeline

Stage 1: Decomposition

python3 run_s1_decompose.py

Stage 2: Subproblem Solving

python3 run_s2_ans_sub.py

Stage 2: Evaluation

python3 eval_sub_problem_llm.py

Full-QA

python3 run_full_qa.py

Full-QA: Evaluation

python3 eval_full_qa_llm.py
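
The commands above can also be chained from one driver script. The following is a minimal sketch, not part of the repository: the script names come from this README, but the execution order (taken from the order in which they are listed), the absence of command-line arguments, and the helper names `build_commands` and `run_pipeline` are assumptions. Any setup the scripts need (model access, data paths, API keys) is not covered here.

```python
import subprocess
import sys
from pathlib import Path

# Script names are copied from this README. The order is an assumption
# based on the order in which the README lists the steps.
PIPELINE = [
    ("Stage 1: Decomposition", "run_s1_decompose.py"),
    ("Stage 2: Subproblem Solving", "run_s2_ans_sub.py"),
    ("Stage 2: Evaluation", "eval_sub_problem_llm.py"),
    ("Full-QA", "run_full_qa.py"),
    ("Full-QA: Evaluation", "eval_full_qa_llm.py"),
]


def build_commands(python=sys.executable):
    """Return one argv list per step (hypothetical helper, no extra arguments assumed)."""
    return [[python, script] for _, script in PIPELINE]


def run_pipeline(repo_dir=".", dry_run=True):
    """Print each step and, unless dry_run is set, execute it; stop at the first failure."""
    for (label, script), cmd in zip(PIPELINE, build_commands()):
        print(f"[{label}] {' '.join(cmd)}")
        if dry_run:
            continue
        if not (Path(repo_dir) / script).is_file():
            raise FileNotFoundError(f"{script} not found in {repo_dir}")
        # check=True raises CalledProcessError if the script exits with a non-zero status.
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Dry run by default: only lists the commands that would be executed.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(repo_dir="path/to/clone", dry_run=False)` would execute the steps one after another and abort on the first failing script, so a failed decomposition run is not silently followed by the later stages.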
