Thread

4 tweets

1
Has this happened to you? 🤔
- New DS project
- People are excited and re-implement a paper or article they found
- A lot of discussion about what to improve
- You spend months on the data and training pipeline
- You launch and it looks absolutely horrible 😱
What went wrong?
2
For every ML project, the data is unique. What worked for other projects won't necessarily work for your data. Even experts' opinions count for less if they haven't looked at the data. They can point out errors, but not tell you exactly how to fix them. A better way?
3
- Rigorously evaluate whatever you do on test data.
- Automate testing as much as possible.
- Let testing drive your pipeline.
- Make writing those tests the first thing you do.
It's not unlike TDD, except the tests run on actual data.
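The workflow above can be sketched as an ordinary automated test: write the evaluation gate first, against held-out data, and let it fail until the pipeline meets it. A minimal sketch; all names here (`predict`, `TEST_SET`, `MIN_ACCURACY`) are hypothetical placeholders, not from the thread.

```python
# Held-out test data: (features, expected label) pairs.
# In a real project this comes from a frozen evaluation split.
TEST_SET = [
    ((5.1, 3.5), 0),
    ((6.7, 3.0), 1),
    ((5.0, 3.4), 0),
    ((6.3, 2.8), 1),
]

MIN_ACCURACY = 0.75  # launch gate: the build fails below this


def predict(features):
    """Stand-in model: a trivial threshold rule, to be replaced by the real pipeline."""
    return 1 if features[0] > 6.0 else 0


def accuracy(model, test_set):
    """Fraction of examples the model labels correctly."""
    hits = sum(model(x) == y for x, y in test_set)
    return hits / len(test_set)


def test_model_meets_accuracy_gate():
    # Written before the model exists; drives pipeline work, TDD-style.
    assert accuracy(predict, TEST_SET) >= MIN_ACCURACY
```

Run under any test runner (e.g. pytest) in CI, so every pipeline change is evaluated against the same held-out data automatically.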