Thread

5 tweets

1
Yeah, matrix operations are very interesting from a computational point of view. With LLMs it’s also only matrix-vector ops, where memory and computation are both O(n^2), but for matrix-matrix mult, things are much more interesting… twitter.com/karpathy/statu…
2
Mat-mat-mult is O(n^3) if done naively, but it works on only O(n^2) data. That already hints there is some potential for data re-use. Now the funny thing is that even if the matrices are too big for the cache, you can still re-order the computation so that the part you are working on fits into the cache.
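A minimal sketch of that re-ordering, usually called blocking or tiling. The names `matmul_naive` and `matmul_blocked` and the `tile` parameter are made up for illustration, and pure Python will not show any cache effect by itself; the point is only the loop order, which a compiled implementation would follow.

```python
def matmul_naive(a, b):
    """Textbook triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, tile=2):
    """Same arithmetic, but the loops walk over tile x tile blocks.

    `tile` is a hypothetical tuning knob: in a real kernel it would be
    chosen so the blocks of A, B and C in flight fit into the cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # block row of C
        for j0 in range(0, m, tile):          # block column of C
            for p0 in range(0, k, tile):      # block along the shared dimension
                # Everything below touches only three small blocks.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Both functions do the same O(n^3) multiply-adds in the same order along `p`; only the order in which the data is visited changes, so each loaded block gets re-used many times before it is evicted.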
3
That way, even if your matrices are in the GBs, you can fully saturate whatever computing pipeline you have by working on pieces that fit into the cache. Amazing! Footnotes next =>
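A rough back-of-the-envelope sketch for "pieces that fit into the cache". It assumes square tiles, three tiles resident at once (one each from A, B and C), and 8-byte elements. The function name `max_square_tile` and all three assumptions are illustrative, not taken from the thread; real kernels pick tile shapes more carefully.

```python
import math


def max_square_tile(cache_bytes, bytes_per_element=8, tiles_resident=3):
    """Largest b such that tiles_resident * b * b elements fit in the cache.

    Assumption: one b x b tile each of A, B and C is held at the same time.
    """
    elements_per_tile = cache_bytes // (tiles_resident * bytes_per_element)
    return math.isqrt(elements_per_tile)


# Example: a 32 KiB cache with double-precision values.
b = max_square_tile(32 * 1024)
print(b)
```

Under these assumptions the tile size depends only on the cache size, not on the matrix size, which is why the trick still works for matrices in the GBs.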