---
layout: distill
title: "Conclusions and Further Reading"
description: "Thank you for reading! Here we'll include a few more references for further study."
date: 2025-02-04
future: true
htmlwidgets: true
hidden: false

section_number: 11

previous_section_url: "../jax-stuff"
previous_section_name: "Part 10: JAX"

next_section_url: "../gpus"
next_section_name: "Part 12: GPUs"

giscus_comments: true

authors:
  - name: Jacob Austin
    affiliations:
      name: Google DeepMind
  - name: Sholto Douglas
  - name: Roy Frostig
  - name: Anselm Levskaya
  - name: Charlie Chen
  - name: Sharad Vikram
  - name: Federico Lebron
  - name: Peter Choy
  - name: Vinay Ramasesh
  - name: Albert Webson
  - name: "Reiner Pope<sup>*</sup>"

toc:
  - name: Acknowledgments
  - name: Further Reading
  - name: Feedback

_styles: >
  .fake-img {
    background: #bbb;
    border: 1px solid rgba(0, 0, 0, 0.1);
    box-shadow: 0 0px 4px rgba(0, 0, 0, 0.1);
    margin-bottom: 12px;
  }
  .fake-img p {
    font-family: monospace;
    color: white;
    text-align: left;
    margin: 12px 0;
    text-align: center;
    font-size: 16px;
  }
  .algorithm {
    padding: 10px;
    margin-top: 5px;
    margin-bottom: 5px;
    border-style: dashed;
    background-color: #fffaf2;
  }
  .algorithm li {
    margin-bottom: 0px;
  }
---

Thank you for reading the whole thing, and congratulations on making it all the way to the end. Before we conclude, a few acknowledgments:

## Acknowledgments

This document represents a significant collective investment from many people at Google DeepMind, whom we'd like to briefly acknowledge!

* James Bradbury, Reiner Pope, and Blake Hechtman originally derived many of the ideas in this manuscript and were among the first to understand the systems view of the Transformer.
* Sholto Douglas wrote the first version of this doc and is responsible for kicking off the project. More than anyone, he is responsible for the overall narrative of this doc.
* Jacob Austin led the work of transforming this first version from rough notes into a more polished and comprehensive artifact. He did much of the editing, formatting, and releasing of this document, and coordinated contributions from other authors.
* Most of the figures and animations were made by Anselm Levskaya and Charlie Chen.
* Charlie Chen wrote the inference section and drew many of the inference figures.
* Roy Frostig helped with publication, editing, and many other steps of the journey.

We'd also like to thank the many others who gave critical feedback throughout the process, in particular Zak Stone, Nikhil Sethi, Caitlin Stanton, Alek Dimitriev, Sridhar Lakshmanamurthy, Albert Magyar, Diwakar Gupta, Jeff Dean, Corry Wang, Matt Johnson, and Peter Hawkins. Thanks also to Ruiqi Gao for help with the HTML formatting.

Thank you all!

Before you go, you might also enjoy reading the new [Part 12](../gpus) on NVIDIA GPUs!

## Further Reading

There is a bunch of related writing, including the following:

There remains a lot of room for comprehensive writing in this area, so we hope this manuscript encourages more of it! We also believe that this is a fruitful area to study and research. In many cases, such research can be done even without access to many hardware accelerators.

## Feedback

Please leave comments or questions so that we can improve this further. You can reach our corresponding author, Jacob Austin, at jacobaustin123 [at] gmail [dot] com, or suggest edits by posting issues, pull requests, or discussions on GitHub.