Does anyone know whether this can be used for our models? It seems that the coefficients (weights) can be converted to integers. Could we quantize once, save the result, and use the new model in the codebase? That could significantly reduce the model's size and memory footprint, and thus its loading time.

Extract from the blog:
get_quantized_phi = fn ->
  {:ok, %{params: model_state, model: model} = model_info} =
    Bumblebee.load_model({:hf, "microsoft/Phi-3-mini-4k-instruct"})

  IO.inspect(model_state, label: "Unquantized")
  {quantized_model, quantized_model_state} = Axon.Quantization.quantize(model, model_state)
  IO.inspect(quantized_model_state, label: "Quantized")
  %{model_info | model: quantized_model, params: quantized_model_state}
end

quantized_model = get_quantized_phi.()
:ok
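
To make the "coefficients turned into integers" idea concrete, here is a minimal sketch of symmetric absmax int8 quantization in plain Elixir, with no dependencies. It is an illustration only and I am not claiming it is what Axon.Quantization does internally; the module name AbsmaxSketch and the sample weights are made up for the example.

```elixir
# Illustration only: symmetric "absmax" int8 quantization in plain Elixir.
# AbsmaxSketch is a hypothetical module, not part of Axon, Nx, or Bumblebee.
defmodule AbsmaxSketch do
  # Map a non-empty list of floats to integers in -127..127,
  # plus a single float scale factor shared by the whole list.
  def quantize(weights) do
    max_abs = weights |> Enum.map(&abs/1) |> Enum.max()
    # Avoid dividing by zero when every weight is 0.
    scale = if max_abs == 0, do: 1.0, else: max_abs / 127
    {Enum.map(weights, &round(&1 / scale)), scale}
  end

  # Recover approximate floats from the integers and the scale.
  def dequantize({ints, scale}), do: Enum.map(ints, &(&1 * scale))
end

# Made-up sample weights, just to show the round trip.
weights = [0.5, -1.27, 0.0, 1.0]
{ints, scale} = AbsmaxSketch.quantize(weights)
IO.inspect(ints, label: "int8 values")
IO.inspect(AbsmaxSketch.dequantize({ints, scale}), label: "dequantized")
```

Each integer fits in one byte instead of the four bytes of a 32-bit float, which is where the size reduction would come from. The dequantized values are only approximations of the originals, so accuracy would need to be checked on our models.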