
Change Llama2 from the Turbine implementation to the Sharktank one#2170

Draft
gpetters-amd wants to merge 1 commit into nod-ai:main from gpetters-amd:sharktank

Conversation

@gpetters-amd
Contributor

There are still two outstanding issues I'd like some comments on, but otherwise this should be basically done.

huggingface_hub.snapshot_download(
repo_id=self.hf_model_name, cache_dir=cache_dir
)
# TODO: Convert to gguf, delete cache
Contributor Author


The way sharktank recommends generating the .gguf file is to use a CLI tool from llama.cpp. Is that still the best way to produce it, or do we have a way to do it using sharktank?
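For reference, the llama.cpp route mentioned above would look roughly like the sketch below. The script name and flags are from recent llama.cpp; the snapshot path and output filename are placeholders, not part of this PR.

```shell
# Hypothetical sketch: convert the HF snapshot (from snapshot_download)
# to GGUF using llama.cpp's converter. Paths are placeholders.
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py \
    /path/to/hf/snapshot \
    --outfile llama2.gguf \
    --outtype f16
```

The `--outtype` flag controls the tensor precision of the emitted GGUF (e.g. `f32`, `f16`, `q8_0`).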

model = PagedLlamaModelV1(dataset.root_theta, llama_config)

fxb = FxProgramsBuilder(model)
self.torch_ir = export(fxb)
Contributor Author


Not sure why, but this is producing an empty module. Any idea what I'm missing?
