Hello!
How can one run LLaMa models with the great FP8 inference speedup?
Would one need to train a new LLM from scratch, or is it possible to convert existing models while keeping the same accuracy?
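
To make the question concrete, here is a minimal sketch of the kind of post-training conversion I have in mind, assuming PyTorch's `torch.float8_e4m3fn` dtype with simple per-tensor scaling. This is purely illustrative and not this project's actual API:

```python
import torch

def quantize_to_fp8(weight: torch.Tensor):
    # Scale so the largest magnitude maps to the FP8 E4M3 max (448).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = weight.abs().max() / fp8_max
    w_fp8 = (weight / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor):
    # Recover an approximation of the original full-precision weights.
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_fp8, s = quantize_to_fp8(w)
w_hat = dequantize(w_fp8, s)
print("max abs error:", (w - w_hat).abs().max().item())
```

Is this kind of direct conversion enough in practice, or does FP8 require calibration or quantization-aware training to preserve accuracy?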
Thank you very much and thank you for all the awesome work!