This feature originated in GPT4all.
Consider adding a dependency on the nvidia-cuda-runtime package for the CUDA build of xllamacpp.

Once that dependency is in place, a modified version of the code here could automatically set the LD_LIBRARY_PATH (Linux) or PATH (Windows) environment variable so that the runtime libraries are found at inference time. That should let xllamacpp run without a separately configured CUDA runtime package.
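An untested sketch of what that could look like, assuming the nvidia-cuda-runtime-cu12 wheel layout (libraries under `nvidia/cuda_runtime/lib` on Linux and `nvidia/cuda_runtime/bin` on Windows); the function name is hypothetical:

```python
import importlib.util
import os
import sys
from pathlib import Path


def _prepend_cuda_runtime_to_search_path() -> None:
    """Make a pip-installed CUDA runtime visible to the dynamic loader.

    Sketch only: assumes the nvidia-cuda-runtime-cu12 wheel layout, which
    ships libraries under <site-packages>/nvidia/cuda_runtime/{lib,bin}.
    Must run before the native xllamacpp extension is loaded.
    """
    try:
        spec = importlib.util.find_spec("nvidia.cuda_runtime")
    except ModuleNotFoundError:
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return  # wheel not installed; fall back to a system CUDA runtime

    pkg_dir = Path(next(iter(spec.submodule_search_locations)))
    if sys.platform == "win32":
        # Windows resolves DLLs via PATH and per-process DLL directories.
        dll_dir = pkg_dir / "bin"
        if dll_dir.is_dir():
            os.add_dll_directory(str(dll_dir))
            os.environ["PATH"] = f"{dll_dir}{os.pathsep}{os.environ.get('PATH', '')}"
    else:
        # Linux resolves shared objects via LD_LIBRARY_PATH. Note that the
        # glibc loader snapshots this variable at process start, so updating
        # it here mainly helps subprocesses; preloading libcudart with
        # ctypes.CDLL(..., mode=ctypes.RTLD_GLOBAL) is a common fallback.
        lib_dir = pkg_dir / "lib"
        if lib_dir.is_dir():
            os.environ["LD_LIBRARY_PATH"] = (
                f"{lib_dir}{os.pathsep}{os.environ.get('LD_LIBRARY_PATH', '')}"
            )
```

Something like this would presumably be called at the top of the package's `__init__.py`, before the compiled extension is imported, mirroring what GPT4all does.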
AMD is currently building pip-installable ROCm packages for both Windows and Linux, so the same feature should soon be possible for ROCm users, e.g. on Strix Halo machines.