Thoughts from a professional machine learning engineer who has seen it all!
After working several years as a machine learning engineer, I feel like I’ve seen pretty much everything you can see when it comes to coding. Though we have standards like Python’s PEP-8, most people (including myself, if I’m being honest) don’t follow those standards. Like poets, every coder has their own little flair to how they do things.
Just as there is good poetry and bad poetry, I’ve seen some really well written code out there, and… well… I’ve also seen some pretty bad code, too. Now, you might be thinking, “Shouldn’t I just follow some sort of standard like PEP-8?” And to be totally honest, that’s not a bad place to start. I personally don’t agree with everything in PEP-8, for example, but generally speaking, I think PEP-8 is quite fine.
But the problem with standards is that they are hard to remember. I myself have been trying to get better about aligning my own “coding flair” to something closer to PEP-8, but I frequently forget what the standard says on any specific topic. That said, the considerations below are written to be a little more intuitive and easy to remember. Plus, they don’t at all conflict with standards like PEP-8, so again, I’m pro-standard… even if they’re hard to remember.
Let’s jump into the considerations!
There’s a great episode of the TV show Silicon Valley where lead character Richard Hendricks tries to sell customers on how amazing his software product is. The problem is that it is way too confusing to use, but to prove that it’s still great, he hires a focus group and spends a solid 8 hours teaching them how to properly use the product. At the end, he succeeds in convincing the focus group that the product is great, but the product fails to catch on after that because the members of that focus group are unable to explain how the product works to their friends.
Friends, the fact of the matter is that you don’t get to have an 8 hour focus group with the people who use your code! It doesn’t matter how well the code may function; people have this tendency to check out whenever they get too bamboozled by something. And unfortunately, code meant for reuse that ends up not being reused due to confusion is pretty much worthless.
If you’ve ever looked at any of my code, you’ll notice that I annotate almost every single line of code. (Example) Now, we could argue about whether I go too far with my annotations, but in my personal experience, people annotate their code far less often than they should. Yes, variable, class, and function names should be self-explanatory to another coder reading your code, but it’s not always evident to that coder what a piece of code is doing when it is new to them. As a machine learning engineer myself, I run into this pretty often since the data science field is constantly evolving. Annotating your code helps others to very clearly understand what your script is doing in plain language.
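To make that concrete, here is a minimal sketch of the kind of annotation I have in mind. The function name and the logic are made up purely for illustration; the point is that each step gets a plain-language note, so a reader who has never seen the code can follow along.

```python
def fill_missing_with_median(values):
    """Replace None entries with the median of the known values."""
    # Pulling out the entries that actually have a value, in sorted order
    known = sorted(v for v in values if v is not None)

    # Returning a copy of the input as-is if there is nothing to compute a median from
    if not known:
        return list(values)

    # Finding the middle position of the sorted known values
    mid = len(known) // 2

    # Averaging the two middle values for an even count; otherwise taking the middle one
    if len(known) % 2 == 0:
        median = (known[mid - 1] + known[mid]) / 2
    else:
        median = known[mid]

    # Swapping each missing entry for the median
    return [median if v is None else v for v in values]
```

Is every one of those comments strictly necessary? Maybe not, but nobody has to guess what the even/odd branch is for.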
The general wisdom out there is that you should always write functions or classes when writing your code, and I would generally agree with that. Functions and classes are great for modularization and reusability of your code. If you have a lot of code that gets reused for objects or actions, then yes, by all means please write functions and classes.
But there is a balance in knowing when it is appropriate to use functions and classes. I know some developers who will write functions and classes any chance they get, but it can get out of hand in cases where code is neither reused nor has any reason to be modularized. A script for a very narrow scenario can quickly become way too long when functions are written for something that is run only once. Remember, functions and classes can cost you code readability, and as we touched on in the first point, confusion due to lack of readability can be a turnoff to other developers.
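Here is a small, made-up illustration of that balance. Both halves compute the same thing (the average of the passing scores), and the data and function names are hypothetical. For a script that runs once, the inline version is arguably the easier read.

```python
# Over-modularized: three single-use functions for a three-step, run-once script
def load_scores():
    return [72, 88, 95, 61]


def keep_passing(scores, cutoff):
    return [s for s in scores if s >= cutoff]


def average(scores):
    return sum(scores) / len(scores)


wrapped_result = average(keep_passing(load_scores(), 70))

# The same logic written inline, top to bottom
scores = [72, 88, 95, 61]
passing = [s for s in scores if s >= 70]
inline_result = sum(passing) / len(passing)
```

If any of those steps started showing up in a second script, that would be my cue to pull it out into a function.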
We talked about a balance of using classes and functions at the last point, and what I’m getting at here is that if you have a big solution with a lot of moving parts, try to break them down into smaller chunks. There are a lot of benefits to doing this, including…
- Reusability across multiple projects
- Resiliency so that if one piece of the solution goes down, it doesn’t bring down the whole thing as would happen with a monolithic solution
- Testability to ensure that each individual piece is working as expected on its own (aka unit testing)
- Shareability in the sense that it is a lot easier for multiple developers to work on the solution when they are not all working from a single massive, monolithic script
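As a rough sketch of the testability point, here are two small, hypothetical pieces of a text-processing solution, each with its own tiny test. The names are mine and not from any particular project. Because each piece stands alone, a failing test points straight at the piece that broke.

```python
def normalize(text):
    """Lowercase the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def count_words(text):
    """Count how often each word appears in the normalized text."""
    counts = {}
    for word in normalize(text).split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def test_normalize():
    # Checking the normalization piece on its own
    assert normalize("  Hello   WORLD ") == "hello world"


def test_count_words():
    # Checking the counting piece, which builds on normalize
    assert count_words("a b A") == {"a": 2, "b": 1}
```

A test runner such as pytest would pick up the `test_` functions automatically, but calling them by hand works just as well for a quick check.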
I’m definitely a picture person, and while you of course have to get to the nitty gritty details in the code, a diagram can go a long way in helping your fellow developers understand the high level picture of what your code is doing. I work a lot on Amazon Web Services (AWS), and those of you who are familiar with AWS know that there are a LOT of different services that can be chained together to build a full solution. Before I show my teammates the code behind all of this, I create a high level diagram just to show an overarching picture of how these services interact with one another. I also do the same with my work on Kubernetes.
Speaking of diagrams, I can’t recommend diagrams.net enough here. Diagrams.net is a free, web-based diagram creator very similar to Microsoft Visio, except diagrams.net also has a ton of icons for services like AWS or Kubernetes or general database icons. It’s so easy to drop a few icons down representing AWS services and then connect them together with arrows. And did I mention it’s free? (Well yes, I actually did… but it’s worth stating again.)