Thread

5 tweets

1
My impression from last week's #stratadata was that there are many startups working on either data cataloguing or end2end ML. It reminds me of a few years back, when there were many big data startups. I think what is interesting is that end2end ML is not fully understood yet.
2
For example, one version of end2end ML is to deploy "pure ML" models from notebooks, but there are use cases where the model is only one part of a bigger system that also includes feature computation, post-processing, and so on.
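To make the distinction in the tweet above concrete, here is a minimal sketch, not taken from the thread. Every name in it (compute_features, pure_model, post_process, ServingPipeline) is hypothetical, and the weights and threshold are made up. It only illustrates that what gets deployed may be the whole chain rather than the model alone.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names, weights, and thresholds here are hypothetical illustrations.


def compute_features(raw: dict) -> List[float]:
    """Feature computation: turn a raw request into model inputs."""
    return [float(raw["amount"]), float(len(raw["items"]))]


def pure_model(features: Sequence[float]) -> float:
    """Stand-in for the "pure ML" model exported from a notebook."""
    weights = [0.5, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def post_process(score: float) -> str:
    """Post-processing: map the raw score to a decision."""
    return "review" if score >= 10.0 else "accept"


@dataclass
class ServingPipeline:
    """The deployable unit: features -> model -> post-processing."""

    featurize: Callable[[dict], Sequence[float]]
    model: Callable[[Sequence[float]], float]
    finalize: Callable[[float], str]

    def predict(self, raw: dict) -> str:
        return self.finalize(self.model(self.featurize(raw)))


pipeline = ServingPipeline(compute_features, pure_model, post_process)
```

Deploying only pure_model would leave compute_features and post_process to be reimplemented by whoever calls it, whereas deploying pipeline keeps the three steps together.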
3
Likewise, training might involve not just the model but other preprocessing/feature generation steps and pipelines, backtesting, etc. And there is the whole monitoring topic.
4
I remember that with big data there was a similar range of approaches. In the end we settled on Spark and friends, which are again close to SQL in what they can do. Maybe the same will happen for end2end ML, too?
5
Oh, and the whole topic of how to effectively collaborate on notebooks, or whether they are good enough for production code, is also still open IMHO.