Mikio Braun
@mikiobraun
Replying to @rasbt
No, I mean factorizing the attention matrices themselves and replacing them with low-rank approximations after training.
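The idea mentioned here, taking an already-trained weight matrix and replacing it with a low-rank factorization, can be sketched with a truncated SVD. This is a minimal illustration in NumPy, not from the thread itself; the matrix `W` is a random stand-in for a trained attention weight matrix, and `low_rank_approx` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained d x d attention weight matrix.
W = rng.standard_normal((512, 512))

def low_rank_approx(W, rank):
    # Truncated SVD: keep only the top-`rank` singular values/vectors.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (d, rank)
    B = Vt[:rank, :]             # shape (rank, d)
    # W is approximated by A @ B, storing 2*d*rank values instead of d*d.
    return A, B

A, B = low_rank_approx(W, rank=64)
W_approx = A @ B
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
```

At inference time one would apply `A` and `B` as two smaller matrix multiplications instead of one large one; how much accuracy survives depends on how quickly the singular values of the trained matrix decay.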