

The ML Operations Perspective
In this section we focus on finding the best pre-built solution from external providers. In their book Prediction Machines (2018), Agrawal et al. state that “prediction is the process of filling in missing information”. Accurate predictions are great, but external providers are often black boxes that we need to make at least partially transparent in order to compare them properly with each other. Let’s start with a categorical framework to better understand the landscape of Machine Learning providers.
​
​
| The three types of ML providers
​
Since 2016, we have evaluated more than 300 providers globally across a broad range of ML applications for our corporate clients. These evaluations have revealed three distinct clusters of providers: HyperScalers, Startups and Incumbents. No cluster has an intrinsic advantage, but each has characteristic benefits depending on the nature of the use case. We strongly advise evaluating potential provider solutions from all three clusters.
​
​
HyperScalers
​
Examples include Microsoft Cognitive Services, IBM Watson, AWS (Amazon Web Services), GCP (Google Cloud Platform) and other dominant players that offer standardized microservices to solve concrete problems, such as document understanding, natural language understanding or speech recognition. The advantages of such providers are that their solutions are well-documented, easy to integrate (even for non-specialists), well-maintained, and have the potential to be developed further in the next few years. These providers also have potential challenges worth considering: they think in terms of global market opportunities, depend on legacy revenues and must work through their own legal and business-driven processes to bring innovations to market. It is therefore important to take a deeper look at the solution to determine whether it is state-of-the-art and how much effort it will take to adapt the standard solution to the specific needs of your use case. Furthermore, such providers have rigid pricing schemes that might be unattractive for large-scale projects.
Startups
​
These are small companies that focus on applying ML to solve a specific industry problem. There are landscape reports and lists of hundreds of startups per use case or industry, so it doesn’t make sense to name only a few of them here. The advantages of startups are that they often have innovative approaches, with new core ideas to solve specific problems. They work in a very agile manner and with a strong drive to deliver something of great value in order to win prestigious customers. Startup founders and their investors generally aim to become a global market leader in an industry segment and, ideally, to go public through an initial public offering (IPO). The reality, however, looks different: many startups are difficult to integrate and die trying, and others are acquired by bigger companies. If the scope of the transaction is to acquire intellectual property (IP) or talent, the startup might be fully absorbed into the larger organization, and the startup’s solution may no longer be supported. Therefore, the potential startup-specific challenges are the maturity of the solution and business continuity.
​
Another consideration is that startups make lofty promises to companies, but few can actually deliver on those promises. This is especially relevant when it comes to service-level agreements; most startups prioritize scaling their product and customer base to meet investor expectations. If 24/7 availability is key to success in your industry, you should be clear on how the startup can guarantee that availability when working across multiple language markets or time zones.
​
​
Incumbents
​
These are companies that have already served the market for quite a while. They bring to the table good references and vast experience with relevant use cases. Nevertheless, we have seen challenges arise when such companies try to apply machine learning techniques without touching the core components of their existing solution.
​
​
| Understand the most relevant evaluation criteria
​
Different stakeholders from purchasing, IT and business teams will focus on a broad range of selection criteria. In this section we will discuss the most relevant aspects to consider for external ML solutions.
​
​
Performance
​
The first and most important selection criterion is whether the solution meets the minimum functional requirements in the production environment. As we described earlier, ML models often show a trade-off between accuracy and latency. Therefore, you should test and evaluate their performance under realistic conditions with respect to the desired production environment.
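To make this concrete, below is a minimal sketch of such a test in Python, measuring accuracy and latency against a held-out test set. The `call_provider` function is a hypothetical placeholder for whatever SDK or REST call the provider actually exposes, and the sample format is our own assumption.

```python
import statistics
import time

def call_provider(sample):
    """Hypothetical placeholder: replace with the provider's real SDK or HTTP call."""
    return sample["expected_label"]

def evaluate(test_samples):
    """Measure accuracy and latency of the provider on a held-out test set."""
    latencies, correct = [], 0
    for sample in test_samples:
        start = time.perf_counter()
        prediction = call_provider(sample)
        latencies.append(time.perf_counter() - start)
        correct += int(prediction == sample["expected_label"])
    return {
        "accuracy": correct / len(test_samples),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": statistics.quantiles(latencies, n=20)[18],  # ~95th percentile
    }

if __name__ == "__main__":
    # Illustrative dummy data; in practice, use realistic inputs and volumes.
    test_set = [{"text": f"example {i}", "expected_label": "greeting"} for i in range(100)]
    print(evaluate(test_set))
```

Running the same script against each candidate, with the same test set and from the same network environment, makes the accuracy/latency trade-off directly comparable.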
​
​
Integration effort
​
The effort and additional workloads required to integrate some “off-the-shelf” ML systems can result in organizational friction and hidden costs for setup and maintenance. Such a burden can far exceed the desired benefits and cost savings. If the system needs to access and retrieve information from corporate databases, the data management should not involve extra work, such as duplicating databases or creating additional data validation modules, nor should it pose risks to data integrity. That being said, applying pre-built machine learning systems to a new use case will always require a certain amount of manual work for adaptation and testing. To conduct a proper evaluation of the implementation effort, it is critical that you involve experts who conceptually understand the solution domain and the differences between providers. Only with this understanding will you be able to perform meaningful tests to determine whether the system behaves as expected.
​
​
Total cost of ownership (TCO)
​
To avoid comparing apples with oranges, you should estimate all the costs that each of the available pre-built solutions will create. The costs for setup and the solution life cycle should not be measured only by what you will pay to the provider. Depending on the use case and the pre-built solution, there might be large differences in the extra costs created elsewhere in your organization, in the form of auxiliary work or additional software licenses.
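As a simple illustration, the sketch below sums one-off and recurring costs over the same planning horizon for each candidate. All cost categories and figures are made-up assumptions, not real quotes; the point is only that the comparison covers more than the provider’s invoice.

```python
def total_cost_of_ownership(setup, one_off_integration, monthly_license,
                            monthly_internal_effort, months=36):
    """Rough TCO over a fixed planning horizon: one-off costs plus recurring costs."""
    return setup + one_off_integration + months * (monthly_license + monthly_internal_effort)

# Illustrative figures only; replace with your own estimates per provider.
candidates = {
    "Provider A": total_cost_of_ownership(setup=20_000, one_off_integration=15_000,
                                          monthly_license=3_000, monthly_internal_effort=1_500),
    "Provider B": total_cost_of_ownership(setup=5_000, one_off_integration=40_000,
                                          monthly_license=6_000, monthly_internal_effort=500),
}

for name, tco in sorted(candidates.items(), key=lambda item: item[1]):
    print(f"{name}: {tco:,.0f} over 36 months")
```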
​
​
Lock-in effects
​
The supply side of ML models and solutions is very diverse, with many proprietary and open-source models already available for use. Sooner rather than later, there is a good chance that you’ll want to switch from your current provider to a more performant one, as technical advancements frequently spill over from research into industry, resulting in large improvements in accuracy, latency and operational costs. When this happens in your use case, you’ll want to take advantage of the new technology and not be impeded by significant switching costs.
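One common way to keep switching costs low is to hide the provider behind a thin, provider-agnostic interface so that your application code never calls a provider SDK directly. The sketch below illustrates the idea; `ProviderAClient` and `ProviderBClient` are hypothetical wrappers, not real SDKs.

```python
from abc import ABC, abstractmethod

class PredictionClient(ABC):
    """Provider-agnostic interface the application depends on."""
    @abstractmethod
    def predict(self, text: str) -> str: ...

class ProviderAClient(PredictionClient):
    def predict(self, text: str) -> str:
        # Would wrap provider A's SDK or REST API here.
        return "label_from_provider_a"

class ProviderBClient(PredictionClient):
    def predict(self, text: str) -> str:
        # Would wrap provider B's SDK or REST API here.
        return "label_from_provider_b"

def classify_ticket(client: PredictionClient, text: str) -> str:
    # Application code only sees the abstract interface, so swapping
    # providers later only touches the adapter classes.
    return client.predict(text)

print(classify_ticket(ProviderAClient(), "My invoice is wrong"))
```

With this kind of seam in place, moving to a new provider means writing one new adapter rather than rewriting the rest of the code base.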
​
​
| How to find the right provider?
​
Now that we have defined the most critical selection criteria, we are ready to examine how to measure each of them.
​
​
Prepare yourself to ask the right questions
​
As Sculley et al. (2015) explain, “Developing and deploying ML systems is relatively fast and cheap but maintaining [and evolving] them over time is difficult and expensive”. Therefore, it is essential to understand where the respective provider stands in terms of architecture, data, performance, maturity, and market strategy. Not all providers and solutions will be prepared to easily adapt to future changes and to meet the expectations along your roadmap.
​
You should try to understand the available provider solutions as thoroughly as possible. There are five areas that we will take a closer look at now.
​

System and architecture. We define a system as the entire application to solve a problem or automation task. The particular assembly of the ML models into the production environment is what we call the architecture. When preparing to assess an ML system, ask the respective providers the following questions:
​
- Which machine learning methods are used and for which purpose?
- Why is a particular architecture chosen?
- Which outputs are being produced by the system?
​
Often, multiple components are required to solve an automation task. For example, text-based chatbots need to perform intent classification so that the system can search for the correct topic. This is followed by named entity recognition to improve context and semantic understanding, slot filling and gap analysis to get a sense of the relevant information that has already been provided to the system, and finally a search component (e.g. Elasticsearch) to retrieve the right information from the corporate systems and databases as an answer.
It is helpful to understand how all these methods are coupled together in the architecture, how the data is processed and where it is stored. Additionally, it is useful to understand the interaction of the ML architecture with other IT systems, e.g. if the chatbot can access and modify data from other systems like Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM) to enable self-service transactions and e-commerce.
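For illustration, the sketch below wires such a pipeline together with every ML component stubbed out. The function names, intents, slots and return formats are illustrative assumptions, not any particular provider’s API.

```python
def classify_intent(utterance: str) -> str:
    # Stub for an intent classifier; a real system would call an ML model here.
    return "invoice_status"

def extract_entities(utterance: str) -> dict:
    # Stub for named entity recognition.
    return {"invoice_id": "12345"}

def fill_slots(intent: str, entities: dict) -> dict:
    # Slot filling / gap analysis: which required pieces of information are still missing?
    required = {"invoice_status": ["invoice_id"]}
    missing = [slot for slot in required.get(intent, []) if slot not in entities]
    return {"filled": entities, "missing": missing}

def retrieve_answer(intent: str, slots: dict) -> str:
    # Stub for retrieval; a real system would query a search index or ERP/CRM here.
    if slots["missing"]:
        return f"Could you provide your {slots['missing'][0]}?"
    return f"Invoice {slots['filled']['invoice_id']} was paid last week."

def handle_utterance(utterance: str) -> str:
    intent = classify_intent(utterance)
    entities = extract_entities(utterance)
    slots = fill_slots(intent, entities)
    return retrieve_answer(intent, slots)

print(handle_utterance("What is the status of invoice 12345?"))
```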
Data for training and testing. Depending on the use case and maturity of the solution, the provider might need to train or retrain the models in the system. Ask the provider:
- On which and with how much data was the model initially trained?
- Which and how much additional data is necessary for adaptation to your use case?
- Which and how much test data is necessary for performance measurement?
High-quality training data is key to achieving a high degree of accuracy. However, your use case might be slightly different from the problem a provider is currently addressing, and they might not be allowed to use other clients’ data and models to adapt to your use case. In such cases, you will need to give the provider training data and reserve additional test data so you can measure the model’s performance in a meaningful way. Depending on the algorithmic approaches of different providers, there can be a huge difference in the amount of training data they need and the amount of work that you need to put into data preprocessing and labeling.
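A simple but important practice is to split your labeled data before sharing anything with a provider, so that performance is always measured on examples the provider has never seen. Below is a minimal sketch; the data format is an assumption and stands in for your real labeled dataset.

```python
import random

def split_for_provider(labeled_examples, test_fraction=0.2, seed=42):
    """Shuffle and split: one part goes to the provider for training/adaptation,
    the held-out part stays internal for your own performance measurement."""
    examples = list(labeled_examples)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - test_fraction))
    return examples[:cut], examples[cut:]

# Illustrative placeholder data.
labeled_examples = [{"text": f"sample {i}", "label": i % 3} for i in range(1000)]
share_with_provider, internal_test_set = split_for_provider(labeled_examples)
print(len(share_with_provider), len(internal_test_set))  # 800 200
```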
Solution maturity and references. The more mature the solution and the longer reference customers have used it in their production environment, the higher the accuracy and stability you can expect from a provider’s solution. Keep in mind, however, that some providers will claim to have corporate customers even if their solution is not actually deployed in the real world. So check the following:
- Which companies are already using the system, since when and in which environment (prototyping vs. production)?
- Is it possible to have an exchange of experiences with two or three of these customers?
- What is the roadmap of the past twelve months and for the upcoming twelve months?
It is especially important to understand what the provider has improved in the past and what is planned in the near future, hence the third question above.
From development to production. The ultimate goal of deploying an ML system is to solve the business problem at hand in production. That is why you should envision the entire roadmap from the very beginning. Ask the provider the following questions to get a better understanding:
- How long does the implementation take?
- Who owns the development, integration and maintenance of this system in the company?
- What is the DevOps approach, and how are software updates deployed?
To fulfill their purpose, ML systems need to be integrated into companies' production environments. To get the approval for integration and deployment, ML systems must follow multiple guidelines and pass various tests. Additionally, there might be specific adaptations that the provider will request from you in order to connect to their API endpoints.
Some startups might lack a proper definition of their integration requirements, so it is a good practice to clearly define the respective roles and responsibilities of the provider and your company for the monitoring and maintenance of the system once it is deployed in production.
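As a starting point for such monitoring responsibilities, the sketch below implements a minimal health check against a provider endpoint, logging availability and response time. The endpoint URL is hypothetical, and the single status-code check is an assumption; agree on the actual checks and thresholds with the provider.

```python
import logging
import time
import urllib.request

logging.basicConfig(level=logging.INFO)
ENDPOINT = "https://api.example-provider.com/health"  # hypothetical endpoint

def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint responds with HTTP 200 within the timeout."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = response.status == 200
    except Exception as exc:
        logging.error("Provider endpoint unreachable: %s", exc)
        return False
    logging.info("Provider healthy=%s, response time=%.3fs", ok, time.perf_counter() - start)
    return ok

if __name__ == "__main__":
    check_endpoint(ENDPOINT)
```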
Pricing model and costs. Once the accuracy and performance of the ML system are good enough, it is the costs that will have the most bearing on your business case. Since we are very early in the process, the following questions can help you better understand the cost implications you are getting into:
- What are the costs for setup and ongoing operations?
- What are the general cost drivers?
- What are the parameters underlying the cost and pricing assumptions?
It is always a good idea to distinguish the setup from the operational costs of the solution and to understand the cost drivers at each stage of the implementation. Costs can also heavily depend on underlying parameters, such as latency, service levels, security management, scalability, update frequency, and so on.
​
​
Deselect providers that do not meet your requirements
​
You should structure the selection process into five milestones, which we will describe below.
​
​

​
​
1. Start with a longlist. First, put together a list of all providers that might be relevant to your use case. As mentioned earlier, you will most likely find very different providers from the three clusters: HyperScalers, Startups and Incumbents. Don’t deselect providers from one cluster too early, since at this stage it is not clear which approach will solve your problem with sufficient accuracy at reasonable cost.
Here are some best practices to create a longlist of potential providers:
- Read market studies and research reports.
- Search for keywords on the internet.
- Ask colleagues and business partners.
​
2. Reduce to a shortlist. Sometimes a longlist can contain dozens or even hundreds of providers. To make the selection process manageable, reduce the number of providers that you will be talking to later in the process. Our experience tells us that you should not put too much emphasis on reports at this stage, since they are most likely incomplete or even outdated.
Put yourself in the driver’s seat and define selection criteria as early as possible. After acquiring more insights and experience during this stage, you might need to adapt or add some criteria.
Again, here is a condensed list of best practices to follow:
- Define initial selection criteria.
- Create a one-pager per provider.
- Rate the provider solutions (a simple scoring sketch follows below).
It is hard to provide a rule of thumb for how many providers to include on your shortlist, but we recommend narrowing it down to a number between ten and twenty.
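As referenced in the list above, a lightweight way to rate the shortlisted providers is a weighted scoring matrix. The criteria, weights and scores below are illustrative assumptions; use the criteria and weights your stakeholders have actually agreed on.

```python
# Weights should sum to 1.0; scores run from 1 (poor) to 5 (excellent).
weights = {"performance": 0.4, "integration_effort": 0.2, "tco": 0.2, "maturity": 0.2}

scores = {
    "Provider A": {"performance": 4, "integration_effort": 3, "tco": 2, "maturity": 5},
    "Provider B": {"performance": 3, "integration_effort": 5, "tco": 4, "maturity": 3},
    "Provider C": {"performance": 5, "integration_effort": 2, "tco": 3, "maturity": 2},
}

def weighted_score(provider_scores: dict) -> float:
    return sum(weights[criterion] * score for criterion, score in provider_scores.items())

# Rank candidates from best to worst overall score.
ranking = sorted(scores.items(), key=lambda item: weighted_score(item[1]), reverse=True)
for name, provider_scores in ranking:
    print(f"{name}: {weighted_score(provider_scores):.2f}")
```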
​
3. Request proposals from the providers. Ask the remaining providers to send over further information about their company and solution, based on a general description of the problem you are looking to solve. At this point, you should involve all relevant stakeholders from your company (if you haven’t already) to get a good sense of all the relevant knock-out questions from IT, security, data privacy and procurement.
Best practices at this stage and milestone include:
- Involve procurement to contact providers.
- Develop and send a structured questionnaire.
- Offer a conference call to clarify questions.
Some providers might hand over what we call a corporate sales playbook: a document that structures all relevant information from the corporate customer’s perspective in a very convenient manner, such as the use case and service description, competitive positioning, proof of concept and implementation roll-out plan with roles and responsibilities, pricing schema, service levels and a company overview.
​
4. Conduct face-to-face interviews with providers. Based on the information potential providers have sent, you might already have a clear picture of their capabilities and solutions. If not, use face-to-face interviews to clarify your open questions, and do not waste time with demos that are not relevant to your use cases or open questions.
Best practices in this stage include:
- Invite providers to demo their solutions.
- Use a questionnaire to structure the discussion.
- Start technical due diligence and keep asking until you understand their solution.
​
5. Prove that their technical concepts are mature enough. Before deciding which provider has the most appropriate solution for your problem and business case, it is mandatory to test at least two or three different solutions.
Best practices in this stage include:
- Prepare a proof of concept and determine costs and timings.
- Finalize the technical due diligence.
- Complete the evaluation and prepare the selection decision.
It is only by comparing different solutions that you will get a sense of what makes one provider’s solution stand out against the others. You will also get a better understanding of how realistic your assumptions regarding workload, costs and timings are. In addition, you will get a good feel for the provider’s working style.
​
​
Understand the transition from PoC to implementation
​
The final part of your selection decision should take into account how easily you can move from the PoC to the production environment. Not all providers will show the same level of technical expertise when integrating into your production infrastructure, nor will your IT department always possess the technical expertise to proceed smoothly with the integration of every solution. So, depending on the use case and solution, you might need to reorganize internal processes to better cope with the data flows and the maintenance procedures of the pre-built solution.
As Wilson and Daugherty (2018) put it, in order to implement and use ML applications in production, “employees need to do different things and to do things differently”. A good starting point is to ask the respective providers for their standard roll-out plans with new customers, the roles involved on both sides and the amount of time it will take to reach relevant milestones.
​
​
​
​
| Wrap up
​
Thank you for reading this guide! We aimed to share the knowledge and experiences most relevant to scoping, selecting and implementing your Machine Learning projects. We hope that you have enjoyed this journey with us and that it will inform and elevate your future work.
​