
Thread

2 tweets

1
Yeah it's wholly impractical. I was reading papers (yeah… I know) and stumbled upon Linformer… it uses low-rank matrix approximation on the attention matrices… whatever happened to that?
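
[Editor's note: for the curious, a minimal sketch of Linformer's low-rank trick. The core idea is to project the keys and values from sequence length n down to a small rank k, so the attention scores matrix is n×k instead of n×n. The function name and the fixed random projections E and F are illustrative assumptions here; in the paper the projections are learned.]

```python
import numpy as np

def linformer_attention(Q, K, V, k=32, seed=0):
    """Self-attention with a Linformer-style low-rank projection.

    Keys and values (each n x d) are compressed along the sequence
    axis to k x d, so the scores matrix is only n x k. E and F are
    hypothetical fixed random projections; the paper learns them.
    """
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((k, n)) / np.sqrt(n)  # projects keys
    F = rng.standard_normal((k, n)) / np.sqrt(n)  # projects values

    K_proj = E @ K  # (k, d) compressed keys
    V_proj = F @ V  # (k, d) compressed values

    scores = Q @ K_proj.T / np.sqrt(d)  # (n, k), not (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over k
    return weights @ V_proj  # (n, d)

# Toy usage: 512 tokens, 64-dim head, rank k=32.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((512, 64)) for _ in range(3))
print(linformer_attention(Q, K, V, k=32).shape)  # (512, 64)
```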
2
@paul_rietschka I stumbled upon that because the LoRA paper kept talking about how very low-rank changes can make significant improvements and so on (not surprised about that tbh). So yeah, always seeing the best in people, I suspect they kept increasing model sizes for marketing purposes.
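
[Editor's note: a minimal sketch of the low-rank update LoRA is built on, under the usual W + B·A formulation. The names lora_forward, W, A, B, and the dimensions are illustrative, not from any specific library; initializing B to zero so the update starts as a no-op does match the paper.]

```python
import numpy as np

def lora_forward(x, W, A, B):
    """Forward pass through a frozen weight W plus a low-rank update B @ A.

    Instead of fine-tuning all of W (d_out x d_in), LoRA learns two small
    matrices A (r x d_in) and B (d_out x r) with r << min(d_out, d_in),
    so the effective weight is W + B @ A.
    """
    return x @ (W + B @ A).T

d_in, d_out, r = 768, 768, 8  # rank-8 update: ~12k params vs ~590k for full W
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01      # trainable down-projection
B = np.zeros((d_out, r))                       # trainable up-projection, zero-init
x = rng.standard_normal((4, d_in))             # batch of 4 inputs
print(lora_forward(x, W, A, B).shape)          # (4, 768)
```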