Using the cold start dataset provided by Ares for SFT training, the MathVision evaluation score is only 29. #2

@gaozilve-max

Thank you for your contribution; this is an outstanding piece of work. I encountered a slight issue during the cold start reproduction phase and would like to seek assistance.
My training configuration is as follows:
```yaml
### model
model_name_or_path: ./Qwen/Qwen2.5-VL-7B-Instruct
image_max_pixels: 2007040
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true
freeze_multi_modal_projector: true
freeze_language_model: false
deepspeed: ./LLaMA-Factory/examples/deepspeed/ds_z3_config.json

### dataset
dataset_dir: ./huggingface.co/datasets/ares_sft
dataset: filter_data_final
template: qwen2_vl
cutoff_len: 32768
max_samples: null
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: ./checkpoints/llama_factory/ares_coldstart_big_image_filtered
logging_steps: 10
save_steps: 10
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-5
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: true
gradient_checkpointing: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### special_tokens
add_tokens: ",,,"
skip_special_tokens: false
resize_vocab: true
```
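Two things in this configuration may be worth checking before comparing against the paper. The `add_tokens` value displays here as `",,,"`, which may just be the page stripping the token names, but if the file on disk really looks like that, every entry is empty. Also, the global batch size depends on the number of GPUs, which is not stated above. The following is only a minimal sketch, assuming `add_tokens` is read as a comma-separated list; the helper names, the example tokens `<tok_a>`/`<tok_b>`, and `num_gpus=8` are hypothetical and not taken from LLaMA-Factory or the paper.

```python
def parse_add_tokens(value: str) -> list:
    """Split a comma-separated add_tokens string and reject empty entries."""
    tokens = [t.strip() for t in value.split(",")]
    empty_positions = [i for i, t in enumerate(tokens) if not t]
    if empty_positions:
        raise ValueError(
            f"add_tokens has empty entries at positions {empty_positions}: {value!r}"
        )
    return tokens


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Global batch size per optimizer step under data parallelism."""
    return per_device * grad_accum * num_gpus


if __name__ == "__main__":
    # Hypothetical token names, for illustration only.
    print(parse_add_tokens("<tok_a>,<tok_b>"))
    # num_gpus=8 is an assumed value; the post does not state the GPU count.
    print(effective_batch_size(per_device=1, grad_accum=8, num_gpus=8))
    try:
        parse_add_tokens(",,,")
    except ValueError as err:
        print("invalid add_tokens:", err)
```

If the global batch size differs from the one used for the released checkpoint, the learning rate of 2.0e-5 may not transfer directly, so it can help to report the GPU count alongside the config.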

My training logs are as follows:

[Image: training logs]

When I evaluated on the MathVision dataset, I used GPT-4o-mini as the judge and enabled the prefetch function. However, the accuracy was only 0.29 (29%), which is well below the result reported in the original paper. Could there be an issue in my reproduction process?
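One way to narrow this down is to separate responses that can be scored by a simple rule from those that depend entirely on the LLM judge. The sketch below is not the evaluation toolkit's implementation; it assumes each record has a `response` and a ground-truth `answer`, and that the model wraps its final answer in `\boxed{...}`, which may not match the actual output format of the trained checkpoint.

```python
import re
from typing import Optional


def extract_boxed(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} (no nested braces), if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1] if matches else None


def normalize(text: str) -> str:
    """Remove whitespace, lowercase, and drop trailing periods."""
    return "".join(text.split()).lower().rstrip(".")


def summarize(records: list) -> dict:
    """Count rule-resolved correct/wrong answers and unresolved responses."""
    counts = {"correct": 0, "wrong": 0, "unresolved": 0}
    for rec in records:
        extracted = extract_boxed(rec["response"])
        if extracted is None:
            # No parseable final answer: this item would need the judge.
            counts["unresolved"] += 1
        elif normalize(extracted) == normalize(rec["answer"]):
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        {"response": "So the result is \\boxed{ 42 }.", "answer": "42"},
        {"response": "Therefore \\boxed{B}", "answer": "A"},
        {"response": "I think it is 7", "answer": "7"},
    ]
    print(summarize(demo))
```

A large `unresolved` share would suggest the score depends heavily on the judge or on an output-format mismatch (for example, truncated generations or missing answer tags) rather than on the model's reasoning.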

I would be very grateful if you could help me resolve this problem.
