Hello!
How can one run LLaMa models with the great FP8 inference speedup?
Would one need to train a new LLM from scratch, or is it possible to convert existing models while keeping the same accuracy?
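
To make the question concrete, here is a minimal sketch of the kind of post-training conversion I have in mind, assuming PyTorch's `torch.float8_e4m3fn` dtype with simple per-tensor scaling. This is purely illustrative and not this project's actual API:

```python
import torch

def quantize_to_fp8(weight: torch.Tensor):
    # Scale so the largest magnitude maps to the FP8 E4M3 max (448).
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = weight.abs().max() / fp8_max
    w_fp8 = (weight / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor):
    # Recover an approximation of the original full-precision weights.
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_fp8, s = quantize_to_fp8(w)
w_hat = dequantize(w_fp8, s)
print("max abs error:", (w - w_hat).abs().max().item())
```

Is this kind of direct conversion enough in practice, or does FP8 require calibration or quantization-aware training to preserve accuracy?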
Thank you very much and thank you for all the awesome work!