Mikio Braun
@mikiobraun
Part of a thread
Also, bare stochastic gradient descent is not trivial to get to perform well. There's a reason we have so many variants of it.
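Here is a minimal sketch of the point on a made-up problem: an ill-conditioned quadratic with one steep and one flat direction, with Gaussian noise added to the gradient to mimic minibatch sampling. The scales, noise level, learning rates and step count are all hypothetical choices. Momentum is only one of the many SGD variants, picked here for illustration.

```python
import random

# Toy ill-conditioned quadratic: f(x) = 0.5 * sum(s_i * x_i^2).
# SCALES are hypothetical: one steep direction (100) and one flat one (1).
SCALES = [100.0, 1.0]
X0 = [1.0, 1.0]


def loss(x):
    return 0.5 * sum(s * xi * xi for s, xi in zip(SCALES, x))


def noisy_grad(x, rng, noise):
    # Exact gradient plus Gaussian noise, standing in for minibatch sampling.
    return [s * xi + rng.gauss(0.0, noise) for s, xi in zip(SCALES, x)]


def sgd(lr, steps=100, noise=0.01, seed=0):
    """Bare SGD: x <- x - lr * g."""
    rng = random.Random(seed)
    x = list(X0)
    for _ in range(steps):
        g = noisy_grad(x, rng, noise)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return loss(x)


def sgd_momentum(lr, beta=0.9, steps=100, noise=0.01, seed=0):
    """Heavy-ball momentum: v <- beta * v + g, then x <- x - lr * v."""
    rng = random.Random(seed)
    x = list(X0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = noisy_grad(x, rng, noise)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return loss(x)


if __name__ == "__main__":
    for lr in (0.015, 0.025):
        print(f"lr={lr}: bare SGD loss={sgd(lr):.3g}, "
              f"momentum loss={sgd_momentum(lr):.3g}")
```

On this toy problem bare SGD is stable only for lr below 2/100 = 0.02 because of the steep direction, so lr=0.025 blows up. lr=0.015 is stable, but the flat direction then shrinks by only a factor of 0.985 per step. With beta=0.9 the momentum version stays stable at both learning rates (its limit here is lr < 2(1+beta)/100 = 0.038) and ends with a much lower loss.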