(Hot Take 🌶️)
The call graph for your data pipelines should have depth no more than 2.
OK, what I meant by that: I frequently see pipelines structured like:
first_we_do_x
then_we_do_y
and_another_thing
and then when you look into first_we_do_x, you see that it consists of more non-trivial substeps, and so on.
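A minimal, hypothetical sketch of what this anti-pattern tends to look like (all names and logic invented for illustration): the top level reads like three steps, but the real data flow hides a level down.

```python
def first_we_do_x(raw):
    # Surprise: this "step" is itself a pipeline of non-trivial substeps,
    # invisible from the top level.
    parsed = [line.split(",") for line in raw]
    cleaned = [row for row in parsed if len(row) == 2]
    return {name: int(value) for name, value in cleaned}

def then_we_do_y(totals):
    return {name: value * 2 for name, value in totals.items()}

def and_another_thing(totals):
    return sum(totals.values())

def run_pipeline(raw):
    # Looks like depth 1, is actually depth 2+ once you open each step.
    data = first_we_do_x(raw)
    data = then_we_do_y(data)
    return and_another_thing(data)

print(run_pipeline(["a,1", "b,2", "bad"]))  # -> 6
```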
So it looks like you're decomposing the task, but you're actually obfuscating the flow of data. I personally find it better when the data flow is always visible at the top level: you process the different parts of the data there and compose them there.
Put differently: every level should work at some level of abstraction that makes it possible to understand what is going on without having to dive into all the subfunctions first.
If you realize you need to break complex processing out into subfunctions, do so in a way that is separate from the data pipeline, and make them generalized enough that they can stand on their own.
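Here is one hedged sketch of the restructured version (again, all names invented): the helpers are generic enough to stand on their own, and the whole data flow is visible at the top level.

```python
def parse_csv_rows(lines):
    """Generic helper: split comma-separated lines into field lists."""
    return [line.split(",") for line in lines]

def keep_rows_with_width(rows, width):
    """Generic helper: drop rows that don't have exactly `width` fields."""
    return [row for row in rows if len(row) == width]

def rows_to_int_totals(rows):
    """Generic helper: turn (name, value) rows into a name -> int mapping."""
    return {name: int(value) for name, value in rows}

def run_pipeline(raw):
    # The entire data flow is visible here, at one level of abstraction;
    # each helper below is a standalone, reusable piece.
    rows = parse_csv_rows(raw)
    valid = keep_rows_with_width(rows, 2)
    totals = rows_to_int_totals(valid)
    doubled = {name: value * 2 for name, value in totals.items()}
    return sum(doubled.values())

print(run_pipeline(["a,1", "b,2", "bad"]))  # -> 6
```

Same behavior as before, but now you can read the whole pipeline in one place without opening any subfunction.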
Is this always possible? No idea. I also don't claim that everything I do always looks like this. Just that when I manage to structure it like this, it gets much easier to understand and work with.
Maybe there's some overlap with ideas like POJOs, functional side-effect-free programming, and dependency injection. Essentially: build simple things that can be composed to do what you want.
(end of rant 🌶️)