Mikio L. Braun

1

Yeah, matrix operations are very interesting from a computational point of view. With LLMs it's also only matrix-vector ops, where memory and computation are both O(n^2), but for matrix-matrix multiplication, things are much more interesting… twitter.com/karpathy/statu…

Aug 16, 2023 · 05:15
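
To put rough numbers on the O(n^2) vs. O(n^3) point, here is a back-of-the-envelope sketch that counts floating-point operations per matrix/vector element touched. The function names and the counting model (one multiply plus one add per term, every element counted once) are my own assumptions for illustration, not a measurement.

```python
def matvec_intensity(n):
    """Flops per element touched for an n x n matrix times an n-vector."""
    flops = 2 * n * n        # one multiply and one add per matrix entry
    data = n * n + 2 * n     # the matrix, the input vector, the output vector
    return flops / data


def matmul_intensity(n):
    """Flops per element touched for the naive n x n times n x n product."""
    flops = 2 * n ** 3       # n multiply-adds for each of the n^2 outputs
    data = 3 * n * n         # two input matrices and one output matrix
    return flops / data


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, matvec_intensity(n), matmul_intensity(n))
```

In this model the matrix-vector ratio stays below 2 no matter how large n gets, while the matrix-matrix ratio grows like 2n/3. That growing ratio is the room for data re-use.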

2

Mat-mat-mult is O(n^3) if done naively, on O(n^2) data, which already hints that there is some potential for data re-use. Now the funny thing is that even if the matrices are too big for the cache, you can still re-order the computation so that the pieces you work on fit into the cache.

Aug 16, 2023 · 05:36

3

That way, even if your matrices are in the GBs, you can fully saturate whatever computing pipeline you have by working on pieces that fit into the cache. Amazing! Footnotes next =>

Aug 16, 2023 · 05:37
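
A minimal sketch of that re-ordering, usually called blocking or tiling. The function names and the `block` parameter are hypothetical, and pure Python will not show the speedup. The point is only the loop structure: each block triple touches three small tiles that could stay in cache in a compiled implementation.

```python
def matmul_naive(A, B):
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, block=64):
    """Same product, but computed tile by tile.

    `block` is an assumed tuning knob: in a real implementation it would be
    chosen so that three block x block tiles fit into the cache.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                # Work only on one tile of A, one of B and one of C.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = A[i], C[i]
                    for k in range(k0, min(k0 + block, m)):
                        a = row_a[k]
                        row_b = B[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += a * row_b[j]
    return C
```

Both functions do exactly the same multiply-adds. Only the order changes, so that each loaded tile gets re-used many times before it is evicted.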

4

I know that asymptotically, the exponent is more like 2.37... en.wikipedia.org/wiki/Computati… The article mentions issues with numerical stability, and the algorithm is recursive in that it reduces to smaller matrices. If the size is below a threshold, you'd use the normal algorithm.

Computational complexity of matrix multiplication - Wikipedia

en.wikipedia.org

Aug 16, 2023 · 05:39
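
To illustrate the "recurse, then fall back below a threshold" structure, here is a sketch using Strassen's algorithm. Note the swap: Strassen has exponent log2(7) ≈ 2.81, not 2.37, and I use it only because it is the simplest sub-cubic scheme. The helper names and the `threshold` parameter are my own, and the sketch falls back to the normal algorithm for odd sizes as well.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive(A, B):
    """The normal O(n^3) algorithm for square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, threshold=2):
    n = len(A)
    # Below the (assumed) threshold, or for odd sizes, use the normal algorithm.
    if n <= threshold or n % 2:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), threshold)
    M2 = strassen(add(A21, A22), B11, threshold)
    M3 = strassen(A11, sub(B12, B22), threshold)
    M4 = strassen(A22, sub(B21, B11), threshold)
    M5 = strassen(add(A11, A12), B22, threshold)
    M6 = strassen(sub(A21, A11), add(B11, B12), threshold)
    M7 = strassen(sub(A12, A22), add(B21, B22), threshold)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

The extra additions and subtractions are where the numerical stability concerns come from. With floating-point inputs the result can differ slightly from the normal algorithm, so compare with a tolerance rather than exact equality.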

5

This is a really fascinating topic. I remember GotoBLAS, an implementation in handwritten assembly that was optimized to minimize misses in the translation lookaside buffer (TLB), which caches virtual-to-physical address translations. There's this and much more in cs.utexas.edu/users/pingali/…

Aug 16, 2023 · 05:44