Hello World!
Boring machine learning
NaN Recipes is a collection of ideas created with NaNs, the basic building blocks of machine learning. It is a personal blog for sharing personal learnings, or not necessarily even learnings, just thoughts that might still be of interest. The topics are probably going to be a random mixture of machine learning and plain old computering, but no strict promises, as the first real post is yet to be written.
Also, in the last five years, machine learning has transformed into a wave of software 2.0 that has begun eating the world 2.0, so much so that this introduction needs more meaningful categories than just an all-inclusive “machine learning” or ML.

The most distinct theme of the blog is to approach machine learning from a pragmatic angle, which is sometimes called applied ML to differentiate it from ML research, though I’m not sure all researchers like what that wording implies (non-applicability). But I have a more descriptive term for the pragmatic approach: boring machine learning, which is also good news for researchers, because it implies that research is the exciting branch of ML!
Why would anyone care about something that is boring, though? What does it even mean to be boring?? Let’s see, maybe we can come to an agreement if we look at some high-level goals for our production systems:
- simple,
- stable,
- obvious,
- predictable,
- uneventful,
or just plain unexciting. So all in all, we’re definitely talking about something that’s seriously boring, but in a good way, like good old logistic regression! In fact, boringness is the highest attainable quality of software systems, especially when combined with the footgun characteristics of machine learning.
To summarize, the discussion will be more based on intuition than theory, and be more about systems than models. Also, since this is a boring machine learning blog, at least there won’t be any MathTeX font issues. In the end, you won’t find any exciting research discoveries here, but instead, mostly just 1001 ways to use logistic regression.
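To set the tone, here’s way number one of the 1001: plain logistic regression on a made-up toy dataset. This is just a minimal sketch assuming scikit-learn, and it’s exactly as boring as advertised:

```python
# Way 1 of 1001: boring logistic regression on a synthetic toy problem.
# Assumes scikit-learn is installed; the dataset is made up on the spot.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

Simple, stable, obvious, predictable, uneventful.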
Why start a blog in 2022
In all honesty, there’s only one basic reason behind creating this blog, and it’s a selfish one: I want to improve my thinking and pick up more ML skills, and explaining ideas on the internet is (hopefully) a pretty good way to learn. Deep understanding of abstract concepts requires both reading and writing about them, but even beyond that, explaining to others is a powerful regularizer that forces one to fully think through all assumptions and derivations, something that self-study doesn’t enforce. This idea of learning by explaining (by learning to explain) has some pretty famous origins.
A related reason is that learning by doing is another very effective way to pick up new skills. Anecdotally, whenever I happened to own a programming book, it was a sure sign that I would never actually learn to use that language well. Owning physical copies of Programming Perl and JavaScript: The Good Parts practically guaranteed that I never really used either of those languages for anything significant, whereas learning C++ and Python on the job, by working on actual problems, supercharged those skills. Similarly, I never took any database courses, because I thought databases were dinosaurs from a time before lower case letters had been invented, which in turn guaranteed that I’m currently spending half of my time writing SQL.
So that’s the plan: (1) learn a little of something by doing, (2) learn more by explaining it here, and (3) profit?! Maybe discussing niche tech problems will also allow me to reach some nice niche tech people, which might further lead to level three of effective learning: learning from people who are better than you.
Causal context
Let’s model the blog author as a language model: a model that is surely wrong, but hopefully useful. That approach allows us to predict all of the author’s future words, provided we know his past context well enough.
Amiga 1000
The relevant context starts with an Amiga 1000 computer, which I got in the Beige Age. I future-proofed the machine on the very first day by installing a 256KB memory expansion, and was really pleased with myself, because 512KB ought to be enough for anything I would ever need. Little did I know at the time: as of today, it’s again been 0 days since last running out of memory.
The Amiga was a true full stack experience, all the way from soldering hardware expansions, stepping through OS trampolines, programming real-time 8-bit audio effects, nerding out on fractals, ray-tracing, or speech synthesis, sampling sounds, making music and art, to playing games of course. Thinking about it now, it’s surprising that I had zero problems with the machine, even as I regularly plugged in DIY electronics to the CPU socket, or lugged it to live techno gigs in warehouses like Lepakko.
After the Amiga I was drawn to more “serious” Unix and BSD computering systems like SunOS and SGI IRIX, and for a short time I even worked with legendary Xerox workstations, though my job then was to help migrate users out of Xerox and into Windows NT. With this background it didn’t take long for me to find Linux, and with it, to eventually learn to upgrade my OS only every four years. Some ten years ago, I even had my mobile phone running Linux.
Python 2001
Yeah, so my Python career turns 21 this year, and I’ve been having a blast, even if I still don’t understand basic stuff like inheritance. I originally sought out Python 1.6 and Numeric as an open source replacement for MATLAB, and boy was that a good prophecy (yes it was). Back then, because Python was slow, I used Python as a glue language to patch together C++ routines with SWIG, but now I can use Python as a glue language to patch together C++ routines with pybind11! In between, I’ve mindlessly waded through plain C, boost::python, ctypes, Numba, and Cython, though in all honesty, TorchScript has mostly removed the need for C++ in recent times.
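To make the glue idea concrete, here’s the laziest variant from the list above: calling a C routine from Python with ctypes, no compiler needed. A minimal sketch, and the math library lookup is platform-dependent, so treat it as an illustration rather than portable code:

```python
import ctypes
import ctypes.util

# Locate and load the C math library; find_library is platform-dependent
# (it can return None on Windows, for instance).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of cos(double) -> double before calling it.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by the C routine
```

SWIG and pybind11 play the same trick for whole C++ libraries, just with a lot more machinery (and a compiler) in between.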
When I picked up Python, I was working at Nokia, developing all sorts of real-time audio processing systems for mobile devices. One of the more “innovative” projects was called Mobile Dancer: an audio beat tracker running in real time on a 220 MHz device and controlling a dancing 3D character. Obviously, the software on the device was written in C++ rather than Python, but we used Python a lot for prototyping and testing.
Later on, I focused more on machine learning and less on signal processing, which thankfully also meant more Python and less C++. One of my later projects at Nokia was called URL4POI: crawling the web to enrich Nokia Maps’ place database, using machine learning to cross-reference web pages with their place entities. Python was a perfect match for that project, because it has a rich ecosystem for web crawling, parsing crappy HTML, extracting features, and training and evaluating logistic regression models. In the end, we were able to add a million new web page links for restaurants, shops, tourist attractions, and other places.
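A toy version of that kind of pipeline could look like the sketch below. To be clear, the pages and labels are invented for illustration, and the actual URL4POI system was of course far more involved:

```python
# Hypothetical sketch of a parse -> features -> logistic regression pipeline.
# Assumes beautifulsoup4 and scikit-learn are installed; the HTML snippets
# and labels below are made up.
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pages = [
    "<html><body><h1>Mario's Pizzeria</h1><p>Menu & opening hours",
    "<html><body><h1>City Museum</h1><p>Exhibitions and tickets",
    "<html><body><h1>Luigi's Trattoria</h1><p>Pasta menu, reservations",
    "<html><body><h1>Harbour Gallery</h1><p>Current exhibitions",
]
labels = [1, 0, 1, 0]  # 1 = restaurant page, 0 = something else

# Parse the (possibly crappy) HTML and keep only the visible text;
# html.parser tolerates the unclosed tags above just fine.
texts = [BeautifulSoup(html, "html.parser").get_text(" ") for html in pages]

# Bag-of-words features, then good old logistic regression.
features = CountVectorizer().fit_transform(texts)
model = LogisticRegression().fit(features, labels)
print(model.predict(features))
```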
Data 5e12
So yeah, my machine learning past stretches even a bit further back than Python; I stopped calling it “pattern recognition” as soon as Tom Mitchell’s book dropped. Though, looking back now, the ratio of machine learning to artisanal heuristics wasn’t very good during those early years, and who knows, what we do today will probably look just as horribly handcrafted twenty years from now.
I ended up leaving Nokia eventually, because the future for boring machine learning there started looking bleak. Nokia’s services business wasn’t doing very well, which also meant having little data and little potential for business impact. Even Nokia’s office buildings were being taken over by new companies, and I was so diehard that I decided to join Supercell, which had occupied one of Nokia’s buildings. Oh, and they also had 5TB of data flowing in daily, plus an inverted company culture, where no one tells you what you can (not) put live.
Although I was hired to do game and business analytics rather than machine learning, I managed to sneak in enough machine learning features that before long, I wasn’t doing analytics but machine learning again (the boring variety). That brings us to the present time and concludes the causal context for this blog; with this information, all of the future posts can be inferred, word by word.
Professional timeline
- Supercell: data scientist, machine learning engineer, game analyst, machine learning lead; helping to make mobile games that are played for years (2013–)
- Nokia: machine learning for Nokia Maps, Nokia Music, and Nokia Research; mobile audio software development on ARM9E (2000–2013)
- MSc in Signal Processing (with distinction) from Tampere University of Technology; thesis (2001)
- Audio Research Group at Tampere University of Technology: signal processing research for speech, audio, and music analysis (1997–1999)
- Instrumentointi Oy: aviation test software development (1997)
TL;DR
This blog is about machine learning (ML) and computering, both approached from a pragmatic, or boring, angle, which means that the posts are mostly about simple stuff that works, such as logistic regression. The motivation for starting the blog is to teach myself more about boring-variety ML by explaining stuff that I’ve done. I’m a dinosaur-grade senior individual who just passed my first two decades of ML & Python.
ACK
NaN Recipes is built with Hugo, designed with Tailwind, and served with Netlify. Created with <3 in dark mode.