Mikio Braun
@mikiobraun
Replying to @rasbt
No, I mean factorizing the attention matrices themselves and replacing them with low-rank approximations after training.
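A minimal sketch of what this could look like, assuming "attention matrices" means the trained projection weights (e.g. the query projection) and that truncated SVD is the factorization used; the reply specifies neither. The names `low_rank_factors` and `W_q`, and the sizes, are hypothetical, and `W_q` is a synthetic stand-in rather than a real trained weight.

```python
import numpy as np


def low_rank_factors(W, rank):
    """Factor W into A @ B, the best rank-`rank` approximation in Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # shape (d_in, rank), singular values folded in
    B = Vt[:rank, :]            # shape (rank, d_out)
    return A, B


rng = np.random.default_rng(0)
d_model, rank = 64, 8

# Hypothetical stand-in for a trained projection: nearly low-rank plus small noise.
W_q = (rng.standard_normal((d_model, rank)) @ rng.standard_normal((rank, d_model))
       + 0.01 * rng.standard_normal((d_model, d_model)))

A, B = low_rank_factors(W_q, rank)

# After training, x @ W_q is replaced by two thinner matmuls: (x @ A) @ B.
x = rng.standard_normal((4, d_model))
full = x @ W_q
approx = (x @ A) @ B
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)

params_full = W_q.size            # d_model * d_model
params_low = A.size + B.size      # 2 * d_model * rank
print(f"relative error: {rel_err:.4f}, params: {params_full} -> {params_low}")
```

How much accuracy survives depends on how quickly the singular values of the real trained weights decay; the synthetic matrix here is built to be nearly low-rank, so it flatters the approximation.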
ML and AI expert