Tools I use daily to improve both learning and creating
It had been a while since I last developed a complete ML project: data mining, sampling, strategy, testing, and deployment all felt new again when I decided to build a machine learning project that would integrate with what I’ve been working hard on over the last six months: decentralized finance.
I’ve had my dose of machine learning, including time-series-focused ML, and I have recently started digging deep into financial machine learning. I’ve had to redesign my toolkit, since my Jupyter Notebook setup was no longer productive enough for me.
In this article, I’d like to talk about some of the tools I am using right now for both learning and creating that I have found incredibly helpful in boosting my productivity.
Without further ado, let’s start with my scientific IDE of choice: Spyder.
I know this is nothing new. You have probably already heard a lot about this IDE. Chances are you have even tried it, but felt that nothing compared to Jupyter Notebook. After all, nothing is more comfortable than multi-language support and an easily shareable, highly interactive interface.
Furthermore, when it comes to learning, it’s awesome to be able to integrate notes and code in a single notebook, right? Sometimes it isn’t, at least for me, but I’ll cover this in more detail in later paragraphs.
Every ML project needs a well-defined structure, and while interactive notebooks are incredibly efficient for quick experiments, they can become a disadvantage. In my experience, you lose focus on writing organized code and find yourself running the same code cell over and over, just swapping a couple of variables and hoping for better results.
Coding in a directory-based project structure, by contrast, makes you think twice before rerunning the whole codebase; it reinforces theoretical concepts and, again in my experience, lets you find issues much faster.
Besides, if you really need to experiment, Spyder gives you an IPython console with all variables synced to your code’s.
Then, why not VS Code?
Great question. I think it comes down to personal preference and what else you work on besides scientific development. For example, I also do frontend, backend, and smart contract development. Scientific development is nothing like those, so I like having a separate interface for each kind of work. Besides, Spyder already comes with everything you need for machine learning, without having to install plugins.
Let’s get back to learning. I like taking notes, and you probably do too. As I mentioned earlier, Jupyter lets you integrate text notes into your code. For example, if you were learning about a specific sampling technique, you could have a notebook where you write the technique’s definition or formula, then insert a code cell that shows it in action.
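As a hypothetical illustration of that pattern, the code cell for an event-based sampling technique such as the symmetric CUSUM filter might look like the minimal pure-Python sketch below. The price series and threshold are made up for illustration:

```python
def cusum_filter(values, threshold):
    """Symmetric CUSUM filter: flag the indices where the cumulative
    upward or downward drift of the series exceeds `threshold`."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        s_pos = max(0.0, s_pos + diff)   # cumulative upward drift
        s_neg = min(0.0, s_neg + diff)   # cumulative downward drift
        if s_pos >= threshold:
            events.append(i)
            s_pos = 0.0                  # reset after sampling an event
        elif s_neg <= -threshold:
            events.append(i)
            s_neg = 0.0
    return events

# Toy price series: a small rise, a drop, then a recovery.
prices = [100.0, 100.4, 100.9, 100.7, 100.1, 99.4, 99.9, 100.6]
print(cusum_filter(prices, 1.0))  # → [5, 7]
```

In a notebook, the cell above would sit right under the Markdown cell holding the filter’s definition, so the formula and its behavior stay side by side.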
There are many cases where this approach produces amazing results. For example, if you have done frontend development and had a chance to check out Mozilla’s MDN Web Docs, you already know how much those examples at the beginning of each page help.
Still, I have found it better to separate non-code notions from code implementations. It has helped me focus when learning or revising intermediate and advanced concepts.
Since most (if not all) ML concepts involve a mathematical component, I’ve decided to take those notes in LaTeX, specifically with TexPad and Gummi.
I work on both a MacBook and a Linux PC, so I use TexPad on the Mac and Gummi on the Linux machine. What they have in common is that they are live LaTeX editors, and both offer features that speed up the workflow, such as shortcuts and autocompletion (TexPad only).
LaTeX can be tricky to learn, but I have found this documentation to guide me. Hopefully, it can help you too.
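To give an idea of the format, here is a minimal sketch of the kind of note file I mean: a definition in prose followed by its formula. The exponentially weighted moving average is just an example topic:

```latex
\documentclass{article}
\usepackage{amsmath}

\begin{document}

\section*{Smoothing: exponentially weighted moving average}

The EWMA smooths a series $y_t$ by mixing each new observation with the
previous estimate, weighted by a factor $\alpha \in (0, 1]$:
\[
  \hat{y}_t = \alpha\, y_t + (1 - \alpha)\, \hat{y}_{t-1},
  \qquad \hat{y}_0 = y_0 .
\]
Larger $\alpha$ reacts faster to changes; smaller $\alpha$ smooths more.

\end{document}
```

With a live editor, the rendered formula updates as you type, which is what makes this workflow fast enough for note-taking.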
Orange3 enables you to intuitively visualize and analyze your data through a simple UI. You can build workflows for your data and view it from different perspectives at the same time.
The interface is easy to use and offers a fairly extensive toolbox for handling your data. For now, I use the tool mainly for visualization, since I prefer to do data preparation in my own code.
I discovered this tool only about two weeks ago, so my knowledge of it is still limited. But I find myself using it frequently, especially to visualize datasets I haven’t worked with firsthand before.
Datasets take up a lot of space, especially raw data that hasn’t been treated yet. You can store them on your hard disk, but that won’t be pretty unless you have an external HDD/SSD dedicated solely to your datasets. I prefer storing my datasets in cloud storage, or better, decentralized cloud storage (DCS).
Storj does its job just fine. Below are a couple of reasons why I have chosen this platform:
- You can easily access the object’s raw data directly.
- You get 150GB of free space.
- It’s decentralized, meaning your data is stored across different nodes and remains highly available. Furthermore, this introduces another layer of security.
- It’s focused on developers.
I organize each project’s datasets by creating Storj buckets, which I then access whenever I need the data.
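As a sketch of that workflow using Storj’s `uplink` CLI (the bucket and file names below are hypothetical), it boils down to creating a bucket once and copying data in and out:

```shell
# Create a bucket for the project's datasets (one-time setup).
uplink mb sj://defi-datasets

# Upload a raw dataset to the bucket.
uplink cp ./raw/prices.csv sj://defi-datasets/prices.csv

# List what's currently stored.
uplink ls sj://defi-datasets

# Pull a dataset back down when a project needs it.
uplink cp sj://defi-datasets/prices.csv ./data/prices.csv
```

The same bucket can also be reached through Storj’s S3-compatible gateway, which is handy when a script already speaks S3.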
I hope you have found this article useful and have decided to try at least one of these tools. If you have found better alternatives to the proposed projects, don’t hesitate to write them down in the comments section.
- Spyder → why you should reconsider using Notebooks and give this IDE a try.
- Gummi and TexPad → the best LaTeX editors I found that are designed to take mathematical notes and improve your learning process.
- Orange3 → a data mining framework that enables you to work with your data through an easy-to-use interface.
- Storj → decentralized cloud storage primarily for developers. You can store your datasets here.
Thank you for reading!