Quantization?

Does anyone know whether this can be used for our models? It seems the coefficients (the model weights) can be converted to integers. Could we quantize once, save the result, and load that quantized model in the codebase? That could significantly reduce the model's size and memory footprint, and therefore its loading time.

[<img width="723" alt="Screenshot 2024-09-06 at 11 10 05" src="https://github.com/user-attachments/assets/0a2dd937-c374-4d2f-8613-01df9c631106">](https://dockyard.com/blog/2024/08/20/where-are-nx-axon-bumblebee-headed)

Extract from the blog:

```elixir
get_quantized_phi = fn ->
  {:ok, %{params: model_state, model: model} = model_info} =
    Bumblebee.load_model({:hf, "microsoft/Phi-3-mini-4k-instruct"})

  IO.inspect(model_state, label: "Unquantized")
  {quantized_model, quantized_model_state} = Axon.Quantization.quantize(model, model_state)
  IO.inspect(quantized_model_state, label: "Quantized")
  %{model_info | model: quantized_model, params: quantized_model_state}
end

quantized_model = get_quantized_phi.()

:ok
```
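To make the "coefficients turned into integers" idea concrete, here is a rough sketch of symmetric (absmax) int8 quantization in plain Elixir. This is an assumption about the general technique, not a description of what `Axon.Quantization.quantize/2` does internally; the `QuantSketch` module and its functions are made up for illustration.

```elixir
defmodule QuantSketch do
  # Hypothetical helper: map floats to int8 values plus one shared scale.
  # Expects a non-empty list of floats.
  def quantize(weights) do
    absmax = weights |> Enum.map(&abs/1) |> Enum.max()
    # Avoid dividing by zero when all weights are 0.0.
    scale = if absmax == 0, do: 1.0, else: absmax / 127
    ints = Enum.map(weights, fn w -> round(w / scale) end)
    {ints, scale}
  end

  # Recover approximate floats; the rounding error is the accuracy cost.
  def dequantize({ints, scale}), do: Enum.map(ints, &(&1 * scale))

  # Pack as a 4-byte float32 scale followed by 1 byte per weight.
  def to_binary({ints, scale}) do
    body = for i <- ints, into: <<>>, do: <<i::signed-integer-8>>
    <<scale::float-32, body::binary>>
  end
end

weights = [1.27, -0.64, 0.0, 0.32]
{ints, scale} = QuantSketch.quantize(weights)
packed = QuantSketch.to_binary({ints, scale})

IO.inspect(ints, label: "int8 values")
IO.inspect(byte_size(packed), label: "packed bytes")
IO.inspect(length(weights) * 4, label: "float32 bytes")
```

Each weight goes from 4 bytes (float32) to 1 byte plus a shared scale, so for large tensors the stored size would be roughly a quarter of the original, assuming one scale per tensor. Whether the quantized state can be saved once and reloaded directly is exactly the open question here.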

