MLOps: Methods and tools for machine learning

When we think of computer vision and AI applications all around us, from self-driving cars to smart home devices, most of us are unaware of the complex processes involved in taking those products from blueprint all the way to production. That work involves a hands-on approach by teams of specialists ranging from CV engineers to data scientists, ML engineers, and more. Without harmonious and efficient workflows, moving through the machine learning project lifecycle is slow, inefficient, and certainly not beneficial to enterprises that rely heavily on ML application production.

What is Machine Learning Operations (MLOps)?

Machine Learning Operations (MLOps) is a practice for collaboration and communication between data scientists and operations professionals to help manage the development and production machine learning (ML) lifecycle. It is derived from DevOps principles and aims to increase automation and improve the quality of ML pipelines while also focusing on business and regulatory requirements. Executing MLOps requires the collaboration of data scientists, DevOps engineers, operators, and IT.

All of the following are vital components of MLOps needed to achieve promising results:

Monitoring
Governance
Security
Training data
Infrastructure management
ML model serving
Model training
Model re-training
Model validation
Diagnostics
Model version control

Why do we need MLOps?

With the rapid pace of ML model development, more and more companies rely on these models to drive business decisions and performance. Many tech startups build their core products on state-of-the-art ML models, so the importance of MLOps is becoming more pronounced. The key factors behind this importance are:

Efficiency: MLOps can automate many of the time-consuming tasks in the machine learning lifecycle, allowing data scientists to focus on what they do best: researching and developing new ML models.
Scalability: MLOps makes it easier to manage, update, and deploy numerous ML models across different environments, contributing to better scalability and flexibility.
Reproducibility: MLOps includes versioning for data, models, and parameters, enabling model reproducibility and auditability, which is vital in regulated industries.
Monitoring: MLOps involves continuous monitoring and validation of models to ensure they perform as expected over time. This allows for quick detection and correction of issues, like model drift.

DevOps vs. MLOps

DevOps is a set of practices that combines software development (Dev) and IT operations (Ops) to shorten the systems development life cycle and provide continuous delivery with high software quality. It focuses mainly on automating and monitoring every step of software construction, from integration, testing, and releasing to deployment and infrastructure management.

MLOps, on the other hand, borrows principles from DevOps and extends this philosophy to machine learning model development. Machine learning has its own set of challenges that are not seen in traditional software development, including data versioning, model training, model versioning, experimentation, model validation, and performance monitoring. MLOps aims to address these challenges and enable a smooth and efficient machine learning lifecycle.

Lifecycle overview

A properly functioning MLOps lifecycle, as used by most data science teams, can be described with these steps:

Data Preparation: This covers the whole data pipeline that takes raw data and returns input data for ML models. It includes raw data collection and storage, followed by the processing steps that data scientists define to produce the data used in model training.
Model Development: This is where the actual machine learning takes place. Although increasingly sophisticated AutoML platforms keep being released, this step of the MLOps lifecycle remains the least automated one, because complicated ML systems usually need continuous manual changes and tuning. Overall, in this stage ML models are built, trained, and tested.
Model Validation: Here each trained model is tested on unseen data that has no overlap with the training data. Models are evaluated against predefined metrics to ensure they perform as expected and generalize well.
Model Deployment: Once validated, models are deployed into a production environment where they can start providing predictions.
Model Monitoring: Once in production, models are continuously monitored to ensure they perform as expected on incoming user data.
Model Updating: In practice, no deployed model remains unchanged for long. It usually needs to be updated either because of model drift (the distribution of the real-time data stream changes and the deployed model's performance deteriorates) or because the performance requirements change.

MLOps involves automating as much of this manual process as possible and ensuring smooth transitions between each of these steps. The goal is to improve the efficiency and quality of machine learning in production.
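The steps above can be sketched as a chain of small functions. This is an illustrative sketch only: the function names, the synthetic data, the toy single-threshold "model", and the 0.8 accuracy gate are all assumptions made for the example and do not come from any particular MLOps tool.

```python
import random

def prepare_data(n=200, seed=0):
    """Data preparation: turn synthetic 'raw' values into (feature, label) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        rows.append((x, int(x > 0.5)))
    return rows

def split(rows, holdout_fraction=0.25):
    """Hold out validation rows that never overlap with the training rows."""
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]

def train(train_rows):
    """Model development: a toy model that learns one decision threshold."""
    positives = [x for x, y in train_rows if y == 1]
    negatives = [x for x, y in train_rows if y == 0]
    # Midpoint between the largest negative and the smallest positive example.
    return (max(negatives) + min(positives)) / 2

def validate(threshold, rows):
    """Model validation: accuracy on unseen data."""
    correct = sum(int(x > threshold) == y for x, y in rows)
    return correct / len(rows)

def run_pipeline(min_accuracy=0.8):
    """Run every stage and 'deploy' only if the validation gate is passed."""
    rows = prepare_data()
    train_rows, val_rows = split(rows)
    model = train(train_rows)
    accuracy = validate(model, val_rows)
    # The 0.8 gate is an arbitrary, assumed requirement for this sketch.
    return {"model": model, "accuracy": accuracy, "deployed": accuracy >= min_accuracy}

result = run_pipeline()
```

In a real pipeline each function would be a separately automated stage, and monitoring and updating would feed back into `prepare_data` and `train`.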

CI/CD for machine learning

MLOps adopts a practice from DevOps called CI/CD, which stands for Continuous Integration/Continuous Deployment. CI/CD for machine learning allows rapid delivery and deployment of code to production, resulting in an automated pipeline that produces production-ready code more quickly than would otherwise be possible. It is a continuous cycle of updating the model, identifying its faults, updating it again based on new data, and repeating.

Through CI/CD, you can automate your machine learning pipeline and notice a great reduction in the need for intervention by data scientists in the process, allowing them to concentrate on much more valuable aspects of the lifecycle. All of this is also known as CI/CD pipeline automation.
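One way such automation can reduce manual intervention is a promotion gate that a CI/CD job runs before deploying a newly trained model. The sketch below is hypothetical: the function name `should_promote`, the metric names, and every threshold are assumptions chosen for illustration.

```python
def should_promote(candidate, production, min_accuracy=0.90,
                   max_latency_ms=100.0, max_regression=0.01):
    """Decide whether a candidate model may replace the production model.

    `candidate` and `production` are dicts with 'accuracy' and 'latency_ms'
    keys. All thresholds are illustrative assumptions.
    """
    reasons = []
    if candidate["accuracy"] < min_accuracy:
        reasons.append("accuracy below minimum")
    if candidate["latency_ms"] > max_latency_ms:
        reasons.append("latency above budget")
    if production["accuracy"] - candidate["accuracy"] > max_regression:
        reasons.append("regression against production model")
    # An empty list of reasons means every check passed.
    return (not reasons), reasons

ok, reasons = should_promote(
    {"accuracy": 0.95, "latency_ms": 40.0},
    {"accuracy": 0.93, "latency_ms": 45.0},
)
```

A CI job could fail the build when `ok` is false and print `reasons`, so that a data scientist only needs to step in when a check is not met.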

Continuous training (CT)

This is a concept unique to MLOps that keeps the model fresh by retraining it as the data changes, which guards against drift and data skew. With CT, the model can be retrained automatically at the first signs of decay. Operators can set the retraining frequency according to need. For example, retraining can be triggered daily, weekly, or monthly, only once new data is available, only once model performance drops, or only when initiated manually.
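The retraining triggers listed above (schedule, new data, performance drop, manual request) can be combined in a single policy function. This is a minimal sketch; the name `should_retrain` and the default thresholds (seven days, 1,000 new rows, 0.90 accuracy) are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def should_retrain(now, last_trained, new_rows, current_accuracy, *,
                   max_age=timedelta(days=7), min_new_rows=1000,
                   min_accuracy=0.90, manual_request=False):
    """Return the list of triggers that fired; an empty list means no retraining."""
    reasons = []
    if manual_request:
        reasons.append("manual")
    if now - last_trained >= max_age:
        reasons.append("schedule")            # e.g. a weekly retraining cycle
    if new_rows >= min_new_rows:
        reasons.append("new_data")            # enough fresh data has arrived
    if current_accuracy < min_accuracy:
        reasons.append("performance_drop")    # the model shows signs of decay
    return reasons

triggers = should_retrain(
    now=datetime(2024, 1, 10),
    last_trained=datetime(2024, 1, 1),
    new_rows=5000,
    current_accuracy=0.80,
)
```

Returning the reasons rather than a plain boolean makes it easy to log why a retraining run was started.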

Data versioning

In traditional software development, version control systems like Git are used to manage different versions of the codebase. This allows developers to track changes, collaborate efficiently, and roll back to a previous state if something goes wrong.

In machine learning projects, the data that you use to train your models is just as important as the code, if not more so. Changes in the data can have a significant impact on model performance, and it's often necessary to experiment with different versions of a dataset to optimize results. Thus, data versioning in the context of MLOps is the practice of applying these same version control principles to datasets.

Here's what data versioning typically involves in MLOps:

Tracking Changes: Just as you'd track changes to the code, data versioning involves tracking changes to datasets. This includes changes to the data itself as well as metadata, like the source of the data and how it was collected and preprocessed.
Collaboration: Data versioning makes it easier for teams to collaborate on a project. For example, two data scientists working on the same project might want to experiment with different versions of a dataset. Data versioning allows them to do this without stepping on each other's toes.
Reproducibility: With data versioning, you can always go back and see exactly what data a model was trained on. This makes experiments reproducible, which is crucial for debugging and optimization.
Rolling Back: If something goes wrong, data versioning allows you to roll back to a previous version of a dataset.
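The ideas above (tracking changes, reproducibility, and rolling back) can be illustrated with content hashing. The sketch below is a toy, in-memory illustration and not how any specific tool works internally: `dataset_version` and `DatasetStore` are hypothetical names, and it assumes the dataset is small and JSON-serializable.

```python
import hashlib
import json

def dataset_version(rows, metadata):
    """Return a short, deterministic version id for the data plus its metadata."""
    payload = json.dumps({"rows": rows, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

class DatasetStore:
    """Toy snapshot store: every committed version can be checked out again."""

    def __init__(self):
        self._snapshots = {}
        self.history = []  # version ids in commit order

    def commit(self, rows, metadata):
        version = dataset_version(rows, metadata)
        if version not in self._snapshots:
            # A JSON round-trip makes an independent copy of the data.
            snapshot = json.loads(json.dumps({"rows": rows, "metadata": metadata}))
            self._snapshots[version] = snapshot
        self.history.append(version)
        return version

    def checkout(self, version):
        """Return the exact rows that belong to a given version id."""
        return self._snapshots[version]["rows"]

store = DatasetStore()
v1 = store.commit([[1, 0], [2, 1]], {"source": "survey"})
v2 = store.commit([[1, 0], [2, 1], [3, 1]], {"source": "survey"})
rolled_back = store.checkout(v1)  # roll back to the earlier dataset
```

Because the version id is derived from the content, recording it next to a trained model is enough to state exactly which data the model saw.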

Implementing data versioning can be challenging due to the large size of datasets commonly used in machine learning. There are tools available, like DVC (Data Version Control), that are specifically designed for data versioning in machine learning projects, which can handle large amounts of data and integrate with existing version control systems like Git.

Beyond versioning, a number of platforms provide toolsets for general exploratory data analysis, visualization, data processing, and data validation as well. In the community, these data-related processes are collectively called ML Data Ops, and one of the companies providing an extensive toolset through its data pipelines is SuperAnnotate.

Model versioning

Just as with data versioning, the concept of version control is borrowed from software development. In the context of machine learning, model versioning refers to the practice of keeping track of every version of an ML model, its parameters, hyperparameters, and associated training data.

Here's why model versioning is so important in MLOps:

Reproducibility: When you have a record of every model and its parameters, you can reproduce any model at any time. This is particularly valuable when you need to debug a model, understand its behavior, or validate its results.
Experimentation: Machine learning often involves experimenting with different model architectures, parameters, and types of data. Model versioning makes it possible to keep track of these experiments so you can compare results and choose the best approach.
Collaboration: In a team setting, model versioning helps ensure everyone is on the same page. Each team member can see what models have been trained, what changes have been made, and why.
Rollback and Auditing: If a new model is causing problems, model versioning allows you to roll back to a previous version. It also provides a record for auditing purposes, which can be important in industries with strict regulatory requirements.

Implementing model versioning can be more complex than traditional code versioning because a model version involves not only the model architecture and parameters but also the data, preprocessing code, training code, and sometimes the training environment (like the specific version of the ML library used). Therefore, it often involves specific tools designed for this purpose. Tools like MLflow, Neptune, and CometML are commonly used for model versioning in the MLOps context. These tools help a data science team not only to version all its models but to effectively track the experiments conducted by every single data scientist in the team.
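To make concrete what a model version has to capture, here is a minimal sketch of a version record and a registry with rollback. It does not reproduce the API of MLflow, Neptune, or CometML; `ModelVersion`, `ModelRegistry`, and all field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class ModelVersion:
    name: str
    hyperparameters: dict
    data_version: str   # id of the dataset used for training
    code_version: str   # e.g. a Git commit hash of the training code
    environment: dict   # e.g. library versions used during training

    def fingerprint(self):
        """Deterministic id: changing any field yields a new version."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

class ModelRegistry:
    def __init__(self):
        self.versions = {}
        self.production_history = []  # fingerprints, oldest first

    def register(self, version):
        fingerprint = version.fingerprint()
        self.versions[fingerprint] = version
        return fingerprint

    def promote(self, fingerprint):
        if fingerprint not in self.versions:
            raise KeyError(f"unknown model version: {fingerprint}")
        self.production_history.append(fingerprint)

    def rollback(self):
        """Return to the previously promoted version."""
        if len(self.production_history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.production_history.pop()
        return self.production_history[-1]

    @property
    def production(self):
        return self.production_history[-1] if self.production_history else None
```

The promotion history doubles as a simple audit trail: it records which version was in production and in what order.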

Experiment tracking

It is important to understand that machine learning is, by its nature, an iterative process in which data scientists train multiple models with different hyperparameters, algorithms, and datasets to find the optimal model. Each of these iterations is an "experiment," and keeping track of them becomes crucial as their number increases.

Experiment tracking in MLOps typically involves tracking the following:

Model Version: Keeping track of the model architecture, hyperparameters, and any changes made to the model in each experiment.
Data: Tracking which dataset was used, including any changes or preprocessing steps applied to the data.
Performance Metrics: Tracking metrics such as accuracy, precision, recall, AUC-ROC, loss function, etc., depending on the problem at hand. This helps to compare models and understand which experiment led to the best model.
Training and Evaluation Details: This could include details about the training and evaluation process, such as the number of epochs, batch size, split ratio, etc.
Computational Resources: Recording the computational resources used during the training can be important for optimizing costs and resources.
Experiment Outcome: Finally, the outcomes of each experiment should be logged, with reasons for why certain models performed better or worse.

There are various tools available that provide experiment-tracking capabilities, such as MLflow, Neptune, and TensorBoard, among others. These tools provide a structured way to store and retrieve experiment details, allowing data scientists to efficiently manage and navigate through the multitude of experiments.
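To make the tracked items concrete, here is a minimal in-memory sketch of an experiment log. It is not the interface of MLflow, Neptune, or TensorBoard; the class `ExperimentTracker`, its methods, and the example values are assumptions made for illustration.

```python
import json

class ExperimentTracker:
    """Keeps one record per experiment so that runs can be compared later."""

    def __init__(self):
        self.runs = []

    def log_run(self, run_id, params, data_version, metrics, notes=""):
        record = {
            "run_id": run_id,
            "params": dict(params),        # hyperparameters, epochs, batch size
            "data_version": data_version,  # which dataset was used
            "metrics": dict(metrics),      # accuracy, loss, and so on
            "notes": notes,                # why the run did well or badly
        }
        self.runs.append(record)
        return record

    def best_run(self, metric, higher_is_better=True):
        """Return the run with the best value for `metric`, or None."""
        candidates = [r for r in self.runs if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])

    def to_jsonl(self):
        """Serialize all runs, one JSON object per line, for storage."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.runs)

tracker = ExperimentTracker()
tracker.log_run("run-1", {"lr": 0.1, "epochs": 5}, "data-v1",
                {"accuracy": 0.91, "loss": 0.30})
tracker.log_run("run-2", {"lr": 0.01, "epochs": 20}, "data-v1",
                {"accuracy": 0.94, "loss": 0.21}, notes="lower learning rate")
best = tracker.best_run("accuracy")
```

Dedicated tools add what this sketch leaves out, such as persistent storage, a UI for comparing runs, and logging of computational resources.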

Model serving

Model serving refers to the process of deploying a trained machine learning model in a production environment so that it can be used to make predictions on new data. The key challenge here is to host the model in such a way that it can provide low-latency responses, handle large-scale traffic, and be easily updated or rolled back as necessary.

Model serving often involves the following:

Packaging the model: The trained model must be packaged in a format that the serving system can understand. This often involves saving the model weights along with the code required to run inference.
Deployment: This is where the packaged model is deployed to a serving system. This could be a server in a data center, a cloud-based machine learning platform, or even an edge device like a smartphone.
Serving the model: The deployed model needs to be set up to accept input data, run predictions, and return the results. This needs to be done in a way that's efficient and scalable.
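The three steps above can be sketched with a packaged toy model and a request handler. This is a simplified illustration under stated assumptions: the `MODEL` dictionary stands in for a packaged model, the linear scoring rule and the JSON request shape (`{"features": [...]}`) are invented for the example, and a real deployment would place `handle_request` behind a web server or serving framework.

```python
import json

# "Packaged" model: weights plus the metadata needed to run inference.
# The values are made up for illustration.
MODEL = {"weights": [0.4, -0.2], "bias": 0.1, "version": "demo-1"}

def predict(model, features):
    """Inference code shipped with the weights: a simple linear score."""
    if len(features) != len(model["weights"]):
        raise ValueError("unexpected number of features")
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]

def handle_request(body, model=MODEL):
    """Accept a JSON request body, run a prediction, return (status, JSON)."""
    try:
        payload = json.loads(body)
        features = [float(v) for v in payload["features"]]
        score = predict(model, features)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed input is rejected instead of crashing the server.
        return 400, json.dumps({"error": str(exc)})
    response = {"prediction": score, "model_version": model["version"]}
    return 200, json.dumps(response)

status, response_body = handle_request('{"features": [1.0, 2.0]}')
```

Returning the model version with each prediction makes it easier to trace results back to a specific deployment after an update or rollback.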

Model monitoring

Once a model is served and is in use, it's crucial to continuously monitor its performance to ensure that it's working as expected. This process is known as model monitoring and it involves:

Performance monitoring: This involves tracking key metrics like prediction accuracy, latency, and throughput. These metrics can help identify any issues with the model or the serving infrastructure.
Data monitoring: Since the performance of a model can degrade if the input data changes over time (a problem known as data drift), it's important to monitor the input data for any significant changes.
Alerting: If any issues are detected during monitoring, it's crucial to alert the relevant team members so that the issue can be addressed promptly.
Model updating/retraining: Based on the insights gained from monitoring, the model might need to be updated or retrained. This could involve tweaking the model architecture, retraining the model with new data, or even replacing the model entirely.
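Monitoring input data for significant changes, followed by alerting, can be sketched with the Population Stability Index (PSI), one common way to compare a live feature distribution with a reference distribution. PSI is swapped in here for illustration, since the text does not prescribe a metric. The function names are hypothetical, and the 0.25 alert threshold is a widely used rule of thumb rather than a fixed standard.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (`expected`) and a live sample (`actual`)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range go into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def check_drift(reference, live, threshold=0.25):
    """Return the PSI and whether it is high enough to alert the team."""
    psi = population_stability_index(reference, live)
    return {"psi": psi, "alert": psi > threshold}

reference = [i / 100 for i in range(100)]       # feature values seen in training
shifted = [0.5 + i / 200 for i in range(100)]   # live values shifted upward
report = check_drift(reference, shifted)
```

In practice a check like this would run on a schedule for each important feature, with an alert routed to the responsible team and possibly to a retraining job.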

The model serving and monitoring stages are where the "Ops" in MLOps really comes into play. They involve many of the same principles as traditional DevOps, like CI/CD (Continuous Integration/Continuous Deployment), infrastructure as code, and monitoring and logging.

There are several tools and platforms available that can facilitate these stages, such as Kubernetes for serving models at scale, and Prometheus and Grafana for monitoring model performance.

Machine learning project lifecycle

In order to understand the MLOps lifecycle, we need to be aware of the standard lifecycle of a machine learning model from start to “finish”.

The cycle can commonly be broken down into three phases:

1) Development of the pipeline

2) Pipeline training

3) Inference

If we expand a bit further on the process, the steps are undoubtedly more extensive and follow this order: data collection, pre-processing of data, dataset construction, model training, refinement, evaluation, and finally deployment. The great part of MLOps is that it covers the entire ML pipeline lifecycle — and more. MLOps spans from the entire design and development process of a pipeline to training the model, deployment, and monitoring in post-production.

The crucial thing to keep in mind about the machine learning model development lifecycle is that, once begun, it becomes a continuous loop with no fixed end point. Unlike conventional code-based applications, machine learning models must be continuously monitored and maintained to see how they perform and shift with new data, ensuring that they deliver real, ongoing business impact. When data becomes outdated, the model must be retrained on relevant data to restore accuracy, and so on. That makes the MLOps lifecycle a never-ending loop as well.

Popular MLOps tools

When building the pipelines discussed so far, teams need to choose carefully among the tools available on the market based on their needs. Building everything from scratch can be very tedious, so that approach is usually reserved for safety-critical and privacy-critical applications.

Major cloud providers offer various services to automate different areas of the MLOps lifecycle. These include AutoML tools that ease the model development process and tools that automatically detect model drift, which can be combined with their extensive CI/CD/CT automation tools to initiate model retraining and subsequent deployment.

On the other hand, their experiment tracking and ML Data Ops tooling, which we discussed before, tends to be less extensive, so they are often used in conjunction with dedicated tools that automate those parts of the lifecycle.

Databricks and Snowflake are major platforms specialized in MLOps that use the resources of the major cloud providers behind the scenes, integrate experiment tracking tools like MLflow, and provide extensive data manipulation, exploration, and visualization tools. At first glance, they cover everything one would need to build an end-to-end MLOps pipeline.

This would indeed be the case if you work only with tabular data. For other modalities, the provided services are either inefficient or need to be manually extended by data scientists, as they do not work out of the box.

Why do we need MLOps

Now that we've taken an in-depth look at what MLOps is, we can answer the question in the back of many people's minds, “Why do we need MLOps?” The better question would be to ask “Why not?” When MLOps best practices are integrated into the business model, a plethora of advantages become apparent and the implementation of data-centric AI becomes easier. As with anything in the business world, a cost-benefit analysis is necessary and it's up to you to decide what works best. Let's break down the pros and cons.

Benefits of MLOps

Increased scalability: A primary benefit of introducing MLOps to your business is the increase in scalability. Essentially, the goal of scalability is to increase output without a proportional increase in work that would inhibit growth. With MLOps solutions in place, the business does not have to start each process from scratch and can reuse previous models in future applications, which provides more avenues to scale up.
Security and governance: Issues related to security and governance are among the most common in the absence of MLOps. A primary purpose of MLOps is to monitor all environment changes, which limits the possibility of information loss and unidentified changes, and to ensure the model reaches its target goals through model risk management.
Maintain ML health: Much of MLOps applies to the period after deployment, when constant maintenance and monitoring are necessary to keep the ML model up to date and to detect any drift. Enabling MLOps considerably boosts ML health thanks to consistent monitoring and adjustment.
Quicker and more efficient deployment: Without MLOps, deploying a single ML model can take a month or more, and some models never make it past this stage. Streamlining this process is at the center of MLOps and helps businesses push more models to production to increase ROI.

Limitations of MLOps

The initial cost of implementation: An MLOps framework is costly to implement, which is what deters many people when they first learn about MLOps. Over a five-year period, you can expect to spend in the ballpark of $90,000 on it. Many will affirm that it is nonetheless an investment that pays off over time, yet it is costly for companies that do not have sufficient resources.
Manpower limitations: Another of the MLOps challenges businesses face is directly related to the individuals handling the operations. As mentioned above, ML operations are a collaborative effort that involves individuals from various backgrounds, including DevOps specialists, data engineers, and data scientists. As production volumes expand, businesses will require more manpower not only to maintain current models but also to manage additional ones, increasing labor costs accordingly. You will also need to oversee proper hand-offs when staffing changes.

Does your business need MLOps?

Unsure whether or not your company needs to implement MLOps right now or even in the near future? Many sources will tell you that if you have the intent of productionizing machine learning models, then MLOps is not an option but a necessity. However, there are companies with fewer projects and resources that are not in a hurry to implement MLOps just yet. Here are a few tell-tale signs that MLOps is becoming a necessity for your business:

You work with either Edge AI or cloud computing environments.
You work with diverse languages, libraries, and tools to the point that it becomes difficult to keep track of them all.
You aim to scale your ML applications in the near future.
You have multiple models that are stuck in the deployment phase, not making it to production.

MLOps in SuperAnnotate

Having a comprehensive set of MLOps tools is essential for businesses that want to manage and optimize their ML workflows from start to finish. The right annotation platform, one that offers easy project management, data curation, data versioning, model management, automation, and a complete SDK, allows businesses to ease their AI journey and automate complex AI pipelines. SuperAnnotate offers clients everything listed above, and you're welcome to request a demo!

Key takeaways

MLOps is a relatively new approach to machine learning model management that has taken a lot of inspiration from DevOps. It enables faster intervention when ML models degrade, which helps maintain data security and accuracy, and it gives enterprises a faster path to creating and deploying machine learning models than would otherwise be achievable. With MLOps in place, data scientists do not need to be involved in architecture operations or deployment and can concentrate on the phases of the lifecycle that match their expertise, reducing the time models stall on the way to production. As a result, the full process from planning to production can be executed more efficiently, benefiting both the business and the science. At this point in time, MLOps offers more benefits to the ML lifecycle than limitations and is increasingly considered a must rather than an option.