Mikio Braun
@mikiobraun
Replying to @rasbt
No, I mean factorizing the attention matrices themselves and replacing them with low-rank approximations after training.
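The idea mentioned here, taking an already-trained weight matrix and replacing it with a low-rank factorization, can be sketched with a truncated SVD. This is a minimal illustration in NumPy, not from the thread itself; the matrix `W` is a random stand-in for a trained attention weight matrix, and `low_rank_approx` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained d x d attention weight matrix.
W = rng.standard_normal((512, 512))

def low_rank_approx(W, rank):
    # Truncated SVD: keep only the top-`rank` singular values/vectors.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (d, rank)
    B = Vt[:rank, :]             # shape (rank, d)
    # W is approximated by A @ B, storing 2*d*rank values instead of d*d.
    return A, B

A, B = low_rank_approx(W, rank=64)
W_approx = A @ B
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
```

At inference time one would apply `A` and `B` as two smaller matrix multiplications instead of one large one; how much accuracy survives depends on how quickly the singular values of the trained matrix decay.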