What exactly is Machine Learning Operations?⋅ Marcin Laskowski
Why we need to talk about Machine Learning Operations, and how it is different from DevOps.
When we talk about Machine Learning, we usually have the data and the model in mind, along with all the challenges of the Exploratory Data Analysis workflow. However, there is a big piece missing from the puzzle: deployment. To extract real business value from an ML model, you need to somehow put it into production. And this ‘somehow’ is a methodology called Machine Learning Operations.
Why do we need to talk about MLOps?
We hear a lot of great news about new Machine Learning models, techniques, and tools, and everybody can agree that the field is moving forward extremely quickly. However, there is a big bottleneck in AI: taking models and moving them into production, where they actually start to solve real problems, is a big challenge. Just as DevOps did some 15 years ago for getting code from a single repository into production, the whole AI ecosystem is now fighting to achieve a similar outcome with ML models: executing a build pipeline that takes a model to production, testing and validating every single step along the way. It sounds easy, but it’s not.
What does the term “Machine Learning Operations” really mean?
MLOps is not yet clearly defined, both in terms of the definition itself (companies mean slightly different things by it) and in terms of who should be responsible for it within a company (the ML team or the backend team). The whole topic is still fresh, and the terminology has not yet been established.
At Syndicai, when we talk about MLOps, we mean the ability to move AI models from data scientists’ machines or laptops to remote machine clusters at scale. The main goal of that process is to automate the whole workflow, taking care of resource management, orchestration, data and model versioning, and monitoring.
How is MLOps different from DevOps?
The main idea behind DevOps is that a piece of software is tested and QA-ed by the development team and, once stable, goes into production.
In the field of AI, things look very different.
Unlike DevOps, MLOps is much more experimental at its core. Throughout the workflow, Machine Learning engineers try different features, parameters, and model architectures. It is important not only to version all of the code but also to be able to reproduce previously obtained results.
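To make the reproducibility point concrete, here is a minimal sketch in Python. It assumes that an experiment can be described by a plain dictionary of parameters, including a random seed. The names `fingerprint` and `run_experiment`, the parameter keys, and the placeholder score are all hypothetical and stand in for a real training run.

```python
import hashlib
import json
import random


def fingerprint(params: dict) -> str:
    """Return a short, stable hash of the experiment parameters."""
    # Sorting the keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_experiment(params: dict) -> dict:
    """Hypothetical stand-in for a training run; returns a record to store."""
    rng = random.Random(params["seed"])  # seeded, so the run is repeatable
    # Placeholder metric: a real run would train and evaluate a model here.
    score = round(rng.uniform(0.7, 0.9), 4)
    return {"id": fingerprint(params), "params": params, "score": score}


params = {"seed": 42, "learning_rate": 0.01, "architecture": "small_cnn"}
first = run_experiment(params)
second = run_experiment(params)

# Same parameters and seed give the same record, so the result is reproducible.
print(first == second)
```

Storing such a record next to the code version is one simple way to find and rerun an earlier experiment; dedicated experiment tracking tools cover the same idea in far more depth.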
Once a model is deployed, the constant flow of new data causes its accuracy to decrease over time, which is not the case with regular software. That is why we need to retrain and redeploy models that technically work properly (with no bugs) but have suffered a significant drop in performance.
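A retraining decision of this kind might be sketched as follows. This is only an illustration: the function name `needs_retraining`, the `max_drop` threshold of 0.05, and the accuracy values are assumptions, and it presumes that labelled feedback is available to measure accuracy in production.

```python
from statistics import mean


def needs_retraining(baseline_accuracy, recent_accuracies, max_drop=0.05):
    """Return True when recent mean accuracy fell more than max_drop below baseline."""
    if not recent_accuracies:
        return False  # nothing observed yet, so no evidence of a drop
    return baseline_accuracy - mean(recent_accuracies) > max_drop


# Accuracy measured at deployment time versus accuracy on recent batches.
stable = needs_retraining(0.92, [0.91, 0.92, 0.90])
degraded = needs_retraining(0.92, [0.88, 0.85, 0.83])

print(stable, degraded)
```

The model in the second case has no bug in the usual sense; it simply no longer matches the incoming data, which is the situation where retraining and redeploying is called for.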
When it comes to computing resources, we need to be aware that Machine Learning experiments can be very heavy workloads. So, unlike regular software, they often need to run on large machines from day one. Very often the machines also need to be changed or scaled depending on the traffic.
In a DevOps environment, it is mainly the infrastructure that is under constant supervision. In MLOps, the model itself also needs to be monitored, constantly checking what happens with its parameters and metrics in order to retrain when needed.
In addition to software tests such as unit and integration tests, there is a need for model validation and model training tests.
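As a rough illustration of what a model validation test could look like, the sketch below checks a candidate model against a held-out set before it is allowed to go further. The names `validate_model`, `good_model`, and `bad_model`, the toy data, and the 0.8 threshold are all made up for the example.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) pairs that predict() labels correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def validate_model(predict, examples, min_accuracy=0.8):
    """Return (passed, measured_accuracy) for a candidate model."""
    measured = accuracy(predict, examples)
    return measured >= min_accuracy, measured


# Toy held-out set: the label is 1 when the single feature is positive.
holdout = [(-2, 0), (-1, 0), (1, 1), (2, 1), (3, 1)]


def good_model(x):
    return 1 if x > 0 else 0


def bad_model(x):
    return 1  # always predicts the positive class


good_passed, good_acc = validate_model(good_model, holdout)
bad_passed, bad_acc = validate_model(bad_model, holdout)

print(good_passed, good_acc)
print(bad_passed, bad_acc)
```

Both toy models run without errors, so ordinary unit tests would pass for each of them; only the validation step tells them apart.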
Awareness of Machine Learning Operations is constantly growing. More and more companies are building their services on top of AI, which requires them to automate their processes. We can also see that the whole movement empowers the industry and brings new tools to life, all in order to make Data Scientists more effective and efficient.
We need to be aware that, unlike ‘normal software’, Machine Learning models need to be treated with a different set of practices and tools. It is all about bringing a Continuous Integration flow into the Data Science workflow.