Mikio Braun
@mikiobraun
According to this medium.com/codex/gpt-4-wi…, GPT-3 was trained on 1024 GPUs for 34 days. Even if those were 4090s, that'd be about 2.2e23 floating-point ops, so yeah, still off by a factor of 500, but not by much!
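The tweet's estimate can be sanity-checked with a quick back-of-envelope calculation. A sketch in Python, assuming ~83 TFLOP/s as the approximate FP32 peak of an RTX 4090 and perfect utilization (both simplifying assumptions, not figures from the thread):

```python
# Back-of-envelope check: 1024 GPUs running for 34 days.
gpus = 1024
days = 34
flops_per_gpu = 83e12  # assumed ~83 TFLOP/s, roughly RTX 4090 FP32 peak

seconds = days * 24 * 3600
total_flops = gpus * seconds * flops_per_gpu
print(f"{total_flops:.1e}")  # on the order of 1e23, in line with the tweet
```

Real training would use mixed precision and well under 100% utilization, so this only pins down the order of magnitude, which is all the comparison in the tweet needs.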