Mikio Braun
@mikiobraun
According to this medium.com/codex/gpt-4-wi…, GPT-3 was trained on 1024 GPUs for 34 days. Even if those were 4090s, that'd be about 2.2e23 floating-point ops, so yeah, still off by a factor of 500, but not by much!
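The tweet's estimate can be sanity-checked with a quick back-of-envelope calculation. A sketch in Python, assuming ~83 TFLOP/s as the approximate FP32 peak of an RTX 4090 and perfect utilization (both simplifying assumptions, not figures from the thread):

```python
# Back-of-envelope check: 1024 GPUs running for 34 days.
gpus = 1024
days = 34
flops_per_gpu = 83e12  # assumed ~83 TFLOP/s, roughly RTX 4090 FP32 peak

seconds = days * 24 * 3600
total_flops = gpus * seconds * flops_per_gpu
print(f"{total_flops:.1e}")  # on the order of 1e23, in line with the tweet
```

Real training would use mixed precision and well under 100% utilization, so this only pins down the order of magnitude, which is all the comparison in the tweet needs.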