I was trying to use Mistral-7B as the reward model, but I kept getting this error:

"ValueError: Found modules on cpu/disk. Using Exllama or Exllamav2 backend requires all the modules to be on GPU. You can deactivate exllama backend by setting disable_exllama=True in the quantization config object."

All I did was change the model and tokenizer names in the "configs/ppo_flan_sentiments.yml" file, and change the `__init__` method of the `ZeroShotRewardModel` class to match my model.
Can someone tell me what I can do to resolve this error?


I am passing the reward model and config to trlx.train() in the main method, in a similar way to what is done in ppo_flan_sentiments.py.