Open source operationalisation: from cooking at home to a three-star restaurant kitchen.

Reece Clifford
5 min read · Feb 16, 2021

In my previous article, I suggested that operationalising an analytical model, including one built with open source, is a bit like the difference between cooking in your kitchen at home and cooking in a restaurant kitchen. I said there were four categories of similarities:

1. People

2. Ingredients and Data

3. Equipment and Technology

4. Processes

People

The most obvious difference is the number of people involved. When you cook at home, it is probably just you, or perhaps you and your partner. In a restaurant kitchen, each dish may have two or three people involved. To serve an entire meal, you need many, many more. When you move a model from development to production, you need far more than just the data scientist who built the model. You also need the data engineer, the Head of Analytics, IT and the business units. More people need to know about the model: how to understand its insights, how to repeat its performance, and what their role is in its deployment.

This leads us to the second change: the recipe. In a restaurant, how all the different dishes are cooked and served needs to be the same every time. Imagine if two people sitting at the same table were served tomato soup, one with cream and bread, and one without either! I don’t think that restaurant would be around for long. In analytics, the operationalised model needs to do the same thing each time.
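To make the “same dish every time” idea concrete, here is a minimal sketch of the kind of consistency check a team might run before each deployment. The `predict` function, the sample inputs and the baseline file are all illustrative, not taken from any particular project.

```python
# A minimal sketch of a prediction consistency ("same recipe") check.
# The predict function, sample inputs and baseline path are hypothetical.
import json

import numpy as np


def check_consistency(predict, sample_inputs, baseline_path, tol=1e-6):
    """Fail loudly if today's predictions drift from the recorded baseline."""
    new_scores = np.asarray(predict(sample_inputs))
    with open(baseline_path) as f:
        baseline = np.asarray(json.load(f))
    if not np.allclose(new_scores, baseline, atol=tol):
        raise AssertionError("Predictions differ from the recorded baseline: "
                             "the model is no longer serving the same dish.")


if __name__ == "__main__":
    # Stand-in 'model' and data so the sketch runs end to end.
    predict = lambda X: [sum(row) * 0.1 for row in X]
    sample = [[1, 2, 3], [4, 5, 6]]
    with open("baseline.json", "w") as f:      # record the baseline once
        json.dump([sum(row) * 0.1 for row in sample], f)
    check_consistency(predict, sample, "baseline.json")  # passes: same every time
```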

Ingredients

The second similarity is in the ingredients or the data used. If you’re cooking at home and forget an ingredient, you can pop to the shop for it, or substitute something else. It’s not that simple in a restaurant. You need to have your ingredients ready, and they need to fit your menu. You therefore need to make sure your ingredients are delivered on time, in the required quantities, and that they are prepared as needed.

The same goes for data. Whether you are calculating monthly rolling averages of credit card payments or removing duplicates as part of data pre-processing, the data needs to be delivered on time, in the right quantity and prepared as needed. Production needs to mimic the way the data was presented for model development, because if you feed the model different data, you will get different results.
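One common way to keep development and production aligned is to put the preparation steps into a single function that both environments call. The sketch below assumes a hypothetical payments table with customer_id, month and payment_amount columns; the column names and window size are illustrative only.

```python
# A minimal sketch of sharing data preparation between development and
# production: the same function deduplicates and builds the rolling-average
# feature in both places. Column names are illustrative, not prescriptive.
import pandas as pd


def prepare_payments(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate and add a 3-month rolling average of card payments."""
    df = df.drop_duplicates(subset=["customer_id", "month"])
    df = df.sort_values(["customer_id", "month"])
    df["payment_3m_avg"] = (
        df.groupby("customer_id")["payment_amount"]
          .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
    )
    return df


# Development: train_features = prepare_payments(historical_payments)
# Production:  score_features = prepare_payments(latest_payments)
# Same code path, so the model sees data prepared exactly the same way.
```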

Equipment

Third, we need to consider the equipment and technology used. At home, you know your oven and hob, and also your knives, pans and utensils. In a restaurant, however, there are more ovens, more fridges and more pans, as well as new equipment like the fryer, the griddle and the pizza oven. The same situation arises when operationalising a model. You now have to worry about working with CI/CD software, creating a model package, ensuring the model code is pushed to the correct Git repository and handling any unit testing. New equipment and technology added to deliver the highest business value can also pose challenges. For example, using IoT sensors may mean your models have to run with PyTorch Mobile rather than full PyTorch, on devices with far less compute power. This, in turn, might mean you cannot fit as many layers into your deep learning model. All of this affects the move from the development to the production environment.
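As an illustration of that last point, the sketch below packages a deliberately small PyTorch model for a mobile or edge target using TorchScript and the PyTorch Mobile optimiser. The model architecture, layer sizes and file name are made up for the example.

```python
# A minimal sketch of packaging a small model for an edge target with
# PyTorch Mobile. The network and file name here are purely illustrative.
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile


class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Fewer, narrower layers than a server-side model, to fit mobile compute.
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)


model = SmallNet().eval()
scripted = torch.jit.script(model)                    # TorchScript for deployment
mobile_model = optimize_for_mobile(scripted)          # optimise for PyTorch Mobile
mobile_model._save_for_lite_interpreter("model.ptl")  # artefact a CI/CD pipeline would publish
```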

Processes

Fourth, and potentially most important, are the processes. Without processes, the entire activity becomes a free-for-all. Processes enable each person to understand their responsibilities. In a restaurant, for example, who plates the meals? Where do dirty pots and pans go to be washed? Does the sauce go on at the table, or before the chef calls for service? These questions shape how the restaurant operates: its processes.

The same is true when operationalising open source. The more people involved, the more everyone needs to understand what exactly they are responsible for, what is expected of them and where they hand over to the next person in the chain.

These processes make the difference between a cheap café and a three-star restaurant. A star is a mark of a restaurant’s quality, service and desirability. Stars are awarded based on several factors, including the ingredients, mastery of flavour, cooking techniques, value for money and consistency between visits. The biggest thing processes deliver is consistency.

Anyone can get lucky and cook a fantastic dish once. Repeating it, however, and keeping on creating fabulous meals that are the same every time, is something else entirely. Being able to do something once may be good enough sometimes: say, if you wanted your model to answer a single question at one point in time and provide insights for a single decision. Most of us, however, need to operationalise open source continuously, successfully and efficiently.

There is, of course, more than one level of ‘good’. This is reflected in the number of stars available to restaurants. A one-star restaurant shows very good cooking, a two-star restaurant is worth a detour, and a three-star restaurant is exceptional and worth a special journey. One size also doesn’t fit all for open source operationalisation. However, there are some consistent elements we need to consider.

We need to look at open source model development as part of its broader use, in the analytics lifecycle. In the lifecycle (diagram below), the Model section, where model development takes place, is at the top of the left-hand circle. It is just one aspect of a much larger process. Viewing model development like this increases collaboration and understanding for all those involved in the process.

As silly as it sounds, if the developer understands that the model they are building is going to be deployed alongside other models and business rules as part of a decision flow, they will have a better understanding of what they need to deliver. When more detail is added on how deployment will take place, that understanding becomes more refined, and the process more efficient.

In my next article, I will look at the challenges when a process like this is not understood, across the same four categories.

For those who would rather watch than read, you can view the on-demand webinar where I run through these ideas too.

