fix: cache warmup RuntimeError on mps #46239
```python
# Skip warmup on MPS: there is a limit on the maximum size a single buffer can have on MPS,
# which from testing seems to be about 2/3 of the total device memory (tested on Apple silicon).
# This causes the warmup function to raise a `RuntimeError: Invalid buffer size: XX.XX GiB`.
# NOTE: not tested on Intel Macs
continue
```
Please provide a repro; AFAIK it does not fail for me, even when loading a big Mixtral to max capacity!
Also, rather than skipping, maybe reduce the allocation, and please benchmark the speed loss.
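One way to read the "reduce the allocation" suggestion is to cap the single warmup block below the backend's per-buffer limit instead of skipping the warmup entirely. This is a minimal sketch that assumes the limit is already known and passed in as `max_buffer_bytes`; the helper name `clamped_warmup_bytes` and the `headroom` factor are hypothetical, not part of transformers.

```python
def clamped_warmup_bytes(
    total_byte_count: int, max_buffer_bytes: int, headroom: float = 0.9
) -> int:
    """Bytes to pre-allocate in a single warmup block.

    Hypothetical alternative to skipping: cap the one big allocation
    below the backend's per-buffer limit (`max_buffer_bytes`, assumed known).
    """
    if total_byte_count < 0 or max_buffer_bytes <= 0:
        raise ValueError("total_byte_count must be >= 0 and max_buffer_bytes > 0")
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0, 1]")
    # Leave some slack under the hard limit instead of allocating right at it.
    cap = int(max_buffer_bytes * headroom)
    # Small models are warmed up fully; large ones only up to the cap.
    return min(total_byte_count, cap)
```

For example, `clamped_warmup_bytes(70 * 1024**3, 58 * 1024**3)` returns 90% of 58 GiB, where 58 GiB is only an example value echoing the limit reported in this thread. A clamped warmup pre-reserves just part of the memory, so whether it keeps any speed benefit is exactly what the requested benchmark would have to show.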
We won't need pre-allocation with safetensors on MPS once safetensors/safetensors#767 is merged: we allocate the MTLBuffers, fill them with pread, and then hand them zero-copy to torch via DLPack. Since we don't go through torch's allocation stack, the warmup will become unnecessary, at least for MPS.
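The loading pattern described here (allocate the buffer yourself, fill it with `pread`, hand it over without copying via DLPack) can be illustrated on the CPU. This is a minimal sketch under stated assumptions: a NumPy array stands in for the MTLBuffer and `np.from_dlpack` stands in for `torch.from_dlpack`; `load_zero_copy` is a hypothetical helper, not the safetensors implementation.

```python
import os
from typing import Tuple

import numpy as np


def load_zero_copy(path: str, offset: int, count: int) -> Tuple[np.ndarray, np.ndarray]:
    """Read `count` float32 values at byte `offset` into a buffer we allocate
    ourselves, then export that same memory through DLPack.

    Hypothetical helper: the NumPy array plays the role of the MTLBuffer and
    np.from_dlpack plays the role of torch.from_dlpack.
    """
    # 1. Allocate the destination buffer directly (no framework allocator involved).
    buf = np.empty(count, dtype=np.float32)
    fd = os.open(path, os.O_RDONLY)
    try:
        # 2. Positional read straight into the buffer's memory; no
        #    intermediate bytes object is created.
        nread = os.preadv(fd, [memoryview(buf).cast("B")], offset)
    finally:
        os.close(fd)
    if nread != buf.nbytes:
        raise OSError(f"short read: got {nread} of {buf.nbytes} bytes")
    # 3. Hand the same memory to the consumer via the DLPack protocol (zero-copy).
    tensor = np.from_dlpack(buf)
    return buf, tensor
```

Writing `np.arange(10, dtype=np.float32)` to a file and calling `load_zero_copy(path, offset=8, count=4)` yields `[2.0, 3.0, 4.0, 5.0]`, and `np.shares_memory(buf, tensor)` is true, so no copy was made. `os.preadv` is only available on some Unix platforms, and `np.from_dlpack` needs NumPy 1.23 or newer.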
As we discussed over messages, you in fact cannot allocate a single buffer larger than 58 GB out of your 96 GB available. It'd be interesting to see the value of `total_byte_count` when you load your Mixtral model.
Added more details in the PR description.
The only reason I can see for your load not crashing is that you don't set `device_map="mps"`, which skips the cache warmup function altogether.
ArthurZucker left a comment:

Ty, confirmed on MPS it's not slower anyway!
Skip warmup on MPS: there is a limit on the maximum size a single buffer can have on MPS, which from testing seems to be about 2/3 of the total device memory (tested on Apple silicon). This causes the warmup function to raise a `RuntimeError: Invalid buffer size: XX.XX GiB`.

NOTE: not tested on Intel Macs, but I assume the same issue arises since it's also an `mps` backend.

EDIT: from this old thread, it appears the MTLBuffer limit on Intel Macs is capped to a hardcoded value, so this PR is even more needed on that platform.
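The guard this PR adds amounts to skipping the warmup allocation for MPS devices while still warming up every other device. A minimal sketch, assuming the per-device byte counts are already computed and the allocation itself is injected as a callable; `warmup_devices` and `allocate` are hypothetical names, not the actual transformers code.

```python
from typing import Callable, Dict, List


def warmup_devices(
    byte_count_per_device: Dict[str, int],
    allocate: Callable[[str, int], None],
) -> List[str]:
    """Pre-allocate one block per device, skipping MPS.

    Hypothetical stand-in for the warmup loop: `allocate` is whatever
    performs the real allocation (e.g. a torch.empty call on that device).
    """
    warmed = []
    for device, byte_count in byte_count_per_device.items():
        # Skip warmup on MPS: a single MTLBuffer cannot exceed the device's
        # maximum buffer length, so one large allocation can raise
        # "RuntimeError: Invalid buffer size".
        if device.startswith("mps"):
            continue
        allocate(device, byte_count)
        warmed.append(device)
    return warmed
```

With `{"cuda:0": 1024, "mps": 2048}` as input, only `cuda:0` is passed to `allocate`. The design keeps the change local: other backends behave exactly as before, and MPS simply falls back to allocating lazily during loading.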
Running:
yields:
You can run on your machine to get info on the hard limits of your system:
Mine shows:
which is consistent with the `RuntimeError` I get in the `caching_allocator_warmup` fn. Note that I have the space on my machine to load such a model: I'm at 36 GB, and even if I were to go a little above that, I know I can fit a large amount of data in swap anyway (~36 GB before my OS OOMs the process).