Should we build, integrate or buy an MLOps platform? | Syndicai

Last week I had the chance to speak at the Data Summit ML Summit conference and share my thoughts on MLOps platforms. During a 30-minute talk, I tried to answer one of the most common questions AI executives are asking today: “Should I build, integrate or buy an MLOps platform?”

This article summarizes the main takeaways from the presentation. We will not walk through the whole process of building an MLOps platform. Instead, I would like to present a few pros and cons of each approach that might help answer the question above.

Why do we need to talk about MLOps?

So what exactly is MLOps?

The shortest answer is "it depends". Data scientists, software engineers, and DevOps engineers each look at MLOps from a different angle.

Nevertheless, all of them share one goal: to make delivering ML models to production efficient, fast, and secure. MLOps is the practice of collaboration and communication between data scientists and other IT teams to manage the entire ML lifecycle.

Broadly speaking, MLOps covers areas such as:

data collection and cleaning,
data preparation,
training, tuning, and testing the model,
production implementation,
monitoring, and optimization.

However, after many conversations with companies building machine learning solutions, I noticed one recurring problem. The biggest and most common challenge is taking the first step towards MLOps: a production-ready deployment.

What is production-ready deployment?

When talking about production-ready deployment, I mean the following aspects:

Security - applying best practices to protect the model and the data used in the deployment.
Scalability - the ability to expand the system as usage of the model grows.
High availability - keeping the system running without interruption, even when individual components fail.
Monitoring - collecting information on how the model behaves in production.
Versioning - keeping track of every version of the model and code so that any of them can be run in the production environment.
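To make the list above concrete, here is a minimal Python sketch that treats the five aspects as a checklist a team could review before a release. The `DeploymentReadiness` class, its field names, and the helper functions are hypothetical illustrations, not part of any MLOps tool; in a real setup each flag would be backed by actual checks rather than set by hand.

```python
from dataclasses import dataclass, fields


@dataclass
class DeploymentReadiness:
    """Hypothetical checklist mirroring the five production-ready aspects.

    Field names are illustrative only; they are not taken from any specific tool.
    """
    security: bool = False           # model and data protected (auth, encryption, secrets)
    scalability: bool = False        # can scale out as model usage grows
    high_availability: bool = False  # keeps serving when a component fails
    monitoring: bool = False         # collects information on model behaviour
    versioning: bool = False         # every model/code version can be run again


def missing_aspects(readiness: DeploymentReadiness) -> list[str]:
    """Return the names of aspects not yet covered, in declaration order."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


def is_production_ready(readiness: DeploymentReadiness) -> bool:
    """A deployment counts as production-ready only if no aspect is missing."""
    return not missing_aspects(readiness)


if __name__ == "__main__":
    candidate = DeploymentReadiness(security=True, versioning=True)
    print(missing_aspects(candidate))  # ['scalability', 'high_availability', 'monitoring']
    print(is_production_ready(candidate))  # False
```

Keeping the aspects as explicit fields makes it obvious when a new deployment skips one of them, which is exactly the gap that tends to appear when teams build their first deployment under time pressure.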

Covering all of these aspects makes it hard to find a solution that fits our needs perfectly. Most of you have probably heard of tools such as AWS SageMaker, MLflow, or Kubeflow. Even so, companies keep looking for more tailored solutions. They either find one and buy it, build one in-house from scratch, or integrate open-source tools.

Let’s investigate the pros and cons of each approach, starting with building an MLOps platform in-house.

Build MLOps platform

A couple of months ago, I took part in a project based on one of the AWS services. It promised many benefits for AI model deployment. Unfortunately, after two months of working with it, it turned out that it did not cover a few things we needed. We then looked at some open-source projects, but most of them turned out to be completely unusable for us. Because we had already invested so much time, we decided to keep using the AWS service.

To cover the missing functionality, we built our own command-line tool. Its biggest advantage was that it greatly simplified configuring the deployment environment. The tool was tailored to our needs and easy to configure. However, to write it I had to learn the AWS service in depth, so building the tool took much longer than one might expect. Solving the problems we ran into also often required team discussions to choose the best approach.

In retrospect, I can also say that time pressure and the desire to benefit from such a tool as soon as possible often leave no time to write full documentation of everything it can do. Moreover, as more people start using the tool, updating it, adding features, and handling unforeseen cases, many of those changes also go undocumented.

The last thing I would point out is that cloud providers develop their services very quickly. If we depend on particular services, we must constantly follow their development and update our own tools accordingly, adapting them to newer versions or fixing incompatibilities. Painful maintenance and updates are what we most often forget about when building our own solution.

Integrate using open-source

The next option is to integrate open-source solutions. Integration means selecting existing tools (most often open-source ones) and configuring them to work together so that we can deploy models to the production platform. From my own experience and observations, this is the most frequently chosen path. However, despite its substantial benefits, this approach has a couple of drawbacks worth noting.

But let's start with the benefits. Ready-made tools impose a particular way of using them. For example, when using Flux to deploy models to a Kubernetes (k8s) cluster, we need to understand YAML and then write a configuration in that format that declares what we want to achieve. On the other hand, imposing such a scheme standardizes the process of delivering new models. This is undoubtedly a big advantage because it enables automation.
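The automation benefit of an imposed scheme is easiest to see in code. Below is a minimal Python sketch, assuming a deployment manifest has already been parsed from YAML into a dictionary; the required keys `name`, `image`, and `replicas` and the `validate_manifest` helper are hypothetical and do not correspond to any particular Flux or Kubernetes resource. Because every model is described with the same structure, a single function can check any of them before it reaches the cluster.

```python
from typing import Any

# Hypothetical schema: keys every model manifest must declare, with their expected types.
REQUIRED_FIELDS: dict[str, type] = {
    "name": str,      # model/service name
    "image": str,     # container image that serves the model
    "replicas": int,  # desired number of serving instances
}


def validate_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the manifest is valid."""
    problems = []
    for key, expected_type in REQUIRED_FIELDS.items():
        if key not in manifest:
            problems.append(f"missing field: {key}")
        elif not isinstance(manifest[key], expected_type):
            problems.append(
                f"field {key} should be {expected_type.__name__}, "
                f"got {type(manifest[key]).__name__}"
            )
    # Extra rule that only applies once the type is correct.
    if isinstance(manifest.get("replicas"), int) and manifest["replicas"] < 1:
        problems.append("replicas must be at least 1")
    return problems


if __name__ == "__main__":
    manifests = [
        {"name": "churn-model", "image": "registry.example.com/churn:1.4.0", "replicas": 2},
        {"name": "fraud-model", "replicas": "3"},
    ]
    for m in manifests:
        print(m.get("name"), validate_manifest(m))
    # churn-model []
    # fraud-model ['missing field: image', 'field replicas should be int, got str']
```

In a GitOps setup, a check like this would typically run in CI on every pull request, so a malformed manifest is rejected before the deployment tool ever tries to apply it.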

Another big advantage is the ability to replace individual components if they no longer provide the functionality we need. For example, you may use Argo CD instead of Flux, or Helm instead of Kustomize.

Since integration usually relies on open-source tools, we do not incur any licensing costs, which is undoubtedly a big plus. However, again from my own observations, very few people pay attention to how such software is licensed. There are several types of open-source licenses with different terms, and it is worth checking them in detail.

The biggest disadvantage is the difficulty of keeping individual tools up to date throughout the deployment process. As a result, tools are often configured once, and individual components are updated very rarely or not at all.
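One lightweight way to stop integrated components from silently going stale is to compare the versions you have pinned against the latest available releases on a schedule. The sketch below assumes you already have both version maps; the component names and version numbers are made-up examples, and fetching the latest versions from each project's release feed is deliberately left out.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain 'MAJOR.MINOR.PATCH' string (optionally prefixed with 'v') into a tuple of ints.

    Assumes purely numeric parts and the same number of parts on both sides of a
    comparison; pre-release tags such as '-rc1' are not handled in this sketch.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def outdated_components(pinned: dict[str, str], latest: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each component whose pinned version lags behind the latest one to (pinned, latest)."""
    report = {}
    for name, current in pinned.items():
        newest = latest.get(name)
        # Tuples compare element by element, so (1, 4, 2) < (1, 4, 10) as intended.
        if newest is not None and parse_version(current) < parse_version(newest):
            report[name] = (current, newest)
    return report


if __name__ == "__main__":
    # Hypothetical example data, not real release information.
    pinned = {"gitops-controller": "2.0.0", "packaging-tool": "3.12.0", "model-server": "1.4.2"}
    latest = {"gitops-controller": "2.1.0", "packaging-tool": "3.12.0", "model-server": "1.4.10"}
    for name, (current, newest) in outdated_components(pinned, latest).items():
        print(f"{name}: pinned {current}, latest {newest}")
```

Comparing parsed tuples instead of raw strings matters here: as strings, "1.4.10" sorts before "1.4.2", which would hide exactly the kind of update this check is meant to surface.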

Another downside is that some features described in the documentation do not actually work. It is impossible to write completely bug-free software, but keep in mind that if we hit such a bug, we will have to find a fix or a workaround ourselves.

The last downside worth mentioning is that we can never be sure the chosen tools will not force us to write our own integration layer connecting them, just to cover some missing functionality.

Buy a ready-made solution

The last option for implementing an MLOps platform is simply to buy a ready-made solution. My experience from several projects has shown me that many open-source model deployment tools are designed mainly for software engineers, not ML specialists. Software engineers have no problem thinking about infrastructure abstractly; they understand how it works and what they need to do to deliver a finished solution. In my view, this is the key difference between integrating tools and buying software. Both often bring similar benefits, but when we buy a dedicated tool, we can expect it to be easy to use.

Another big advantage of purchased software is its high security standard. I think that of the three approaches, buying a ready-made solution gives us the most confidence that security is taken care of.

The last big advantage of commercial solutions is technical support. If we have a problem or some functionality does not work properly, we can report it and expect it to be solved, without having to look for fixes or workarounds ourselves. Technical support can very often also help when integrating with other solutions.

On the other hand, the first and most obvious disadvantage of a purchased solution is the license cost. Many companies, especially at an early stage, cannot afford it because of the high price, which causes a lot of friction. When choosing a tool, it is also worth checking whether buying it will cause vendor lock-in, i.e. tie us to a particular cloud service provider.

The next problem is matching the tool to our actual needs. If a tool offers a multitude of features, it is worth considering whether we really need them. Paying for functionality we will not use, because we already cover it with a different tool in our workflow, does not make much sense. Let me just mention that, based on these experiences, we noticed that the steps performed during deployment are highly repetitive, and therefore at Syndicai we decided to build our own tool that solves the problems mentioned above.

Summary

To sum up, I would like to highlight three main takeaways from the presentation.

1. MLOps is not just DevOps. The field is very young and much more complicated than DevOps. The problem we most often see in companies is that they build their own tool without a clear understanding of what a production deployment requires and which areas it needs to cover.

2. Look closely at popular solutions: check whether their functionality meets our expectations and whether their licensing allows us to use them fully in our deployment process.

3. At present, we will most often have to integrate tools, either open-source or purchased. However, it is worth considering whether buying a ready-made tool, along with the support that comes with it, would significantly speed up delivering business value from our models.

Disclaimer: The above article is not a transcript of the talk. If you are interested in more detailed information, please subscribe to our newsletter.