1
Yeah it‘s wholly impractical. I was reading papers (yeah… I know) and stumbled upon linformer… it uses low rank matrix approximation on the attention matrices… whatever happened to that?
ML and AI expert
2 tweets