Thread

17 tweets

2
OK, they are just getting started, but already quite laser-focused on automation and toolchain integration. IMHO that's just one part of making ML work.
4
To expand a bit on this, IMHO automating model training and deployment is something you'll definitely want to make easy and quick EVENTUALLY. But your first steps should focus on defining the problem, gathering some data, setting up evaluation, and then iteratively finding a candidate.
5
All of this is highly manual, and if you spend too much time on automation up front, you're spending time on something you might not even need because the data isn't good enough, there are no suitable algorithms, etc.
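A minimal sketch of what those manual first steps could look like before any automation: a fixed held-out split, one metric, and a loop over candidates. Everything here is an assumption for illustration (the toy data, the two candidates, the function names); the point is only that the evaluation setup stays fixed while the candidates change.

```python
import random


def make_toy_data(n=200, seed=0):
    """Hypothetical binary classification data: label is 1 when a noisy x exceeds 5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 1 if x + rng.gauss(0, 1) > 5 else 0
        rows.append((x, y))
    return rows


def split(rows, test_fraction=0.25):
    """Hold out the last part of the data; the split stays fixed across candidates."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(1 for x, y in rows if predict(x) == y) / len(rows)


def fit_majority(train):
    """Baseline candidate: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label


def fit_threshold(train):
    """Second candidate: pick the cut-off on x with the best training accuracy."""
    cutoffs = [i / 2 for i in range(21)]
    best = max(cutoffs, key=lambda t: accuracy(lambda x: int(x > t), train))
    return lambda x: int(x > best)


def evaluate_candidates(candidates, train, test):
    """Fit every candidate on train and score it on the same held-out test set."""
    return {name: accuracy(fit(train), test) for name, fit in candidates.items()}


rows = make_toy_data()
train, test = split(rows)
results = evaluate_candidates(
    {"majority_baseline": fit_majority, "threshold": fit_threshold}, train, test
)
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Adding the next candidate is one more entry in the dictionary, which is about as much tooling as this stage needs.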
6
Also, the use cases where you actually want to deploy a model behind an API in a scalable fashion are quite specific. Very often, computing predictions in batch and storing them in a database is good enough. No need for k8s.
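As a rough sketch of the batch alternative: score everything in one go, write the results to a table, and let consumers read them with a plain query. The `predict` function, the feature names, and the `predictions` table are hypothetical stand-ins, and SQLite is used only because it ships with Python.

```python
import sqlite3


def predict(features):
    # Hypothetical stand-in for a trained model: a hand-written linear score.
    return 0.3 * features["visits"] + 0.7 * features["purchases"]


def run_batch(conn, customers):
    """Score all customers and upsert the results into a predictions table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "customer_id INTEGER PRIMARY KEY, score REAL NOT NULL)"
    )
    rows = [(c["customer_id"], predict(c)) for c in customers]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (customer_id, score) VALUES (?, ?)",
            rows,
        )
    return len(rows)


def lookup(conn, customer_id):
    """What a consumer does instead of calling a model API: a simple read."""
    row = conn.execute(
        "SELECT score FROM predictions WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
customers = [
    {"customer_id": 1, "visits": 10, "purchases": 2},
    {"customer_id": 2, "visits": 3, "purchases": 0},
]
written = run_batch(conn, customers)
print(written, lookup(conn, 1), lookup(conn, 99))
```

Rerunning the job on a schedule replaces the scores in place, so there is no serving infrastructure to keep alive between runs.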
7
Good overview by @ketanumare of the ML project lifecycle, highlighting that you'll need automation because ML projects are highly iterative: you're not doing these steps once, you'll run them over and over again.
8
In my experience, things like data availability, quality, and reproducibility are not purely technological problems that can be solved with the right tool; they have organizational and cultural aspects as well.
10
Alright, finally we come to notebooks vs production pipelines. To me, one of the biggest challenges still is how to go back and forth between the two, because you'll eventually have to get back to work on the next version of your model.
11
I had to drop out, but great Space by @kelseyhightower. My recommendation for anyone interested in diving into this area would be to understand the ways of working first and then how tools can support them.
12
ML in production is a very young area which is also quite diverse in terms of applications and technical requirements. Most tools have one particular application in mind. This is not like web frameworks, which have had a decade or more to standardize.
13
Just because Google uses it (e.g. TensorFlow, k8s) does not mean that your problems are also well suited for it. It's like with databases: SQLite might be good enough depending on your needs.
14
Also, many of the tools (esp. open source) have academic roots and aren't really geared towards production use, nor designed to be usable by people who don't really understand what they are doing. It's super easy to shoot yourself in the foot.
15
On the other hand, a very interesting and fast-moving field! One of the reasons I find it so interesting :)
16
OK, one last thought. I think people tend to focus on the ML pipeline and the tools for building it (orchestration, data, training, deployment, etc.), but an ML project is all about figuring out the details of this pipeline. The question is which tools/practices help you with that.
17
To take web development as an analogy, some parts concern the system in production (ORM, scalability, etc.), and then there is stuff that makes development easier (e.g. hot reloading, database migrations, scaffolding, etc.). Same for MLOps.