Applying machine learning models to real-world business

Digital Rubicon | By James Williams

Cambridge Innovation Capital's Ian Lane discusses how global enterprises can utilise machine learning models

Good afternoon and a warm welcome to this, my first edition of Digital Rubicon, a dedicated weekly newsletter bringing insights on AI and digitization trends in private markets.

Each week I will be bringing in-depth views from PE & VC firms, corporate entrepreneurs pushing the technology envelope, and academics. My aim is for deal teams and CTOs to keep up to date on what their peers are doing and how they view the AI trend, from both an external, investment perspective and an internal, operational one.

Without further ado, allow me to step aside and introduce the first article, based on an interview I conducted with Ian Lane, Partner at Cambridge Innovation Capital (CIC), in which we discuss how machine learning models can best be applied to the physical (not just digital) world and, in due course, support the digitization needs of large global enterprises.

“With respect to machine learning models, it is really a tale of two cities,” says Lane in his opening remark.

“On the one hand, you’ve got modern, digitally native companies – the Googles and Facebooks of the world – and on the other you’ve got traditional, non-digital companies such as Unilever, Walmart and Johnson & Johnson. The reality is that these two types of companies have very different ML experiences.

“The former have not only been able to build up internal ML expertise, they’ve also had access to the right data in the right place, and the right infrastructure, to put ML models at the core of their operational product environment. The classic example everyone talks about is Netflix, where 80% of stream time is driven by its ML recommendation system.”

The talent paradox
Lane worked at Unilever Ventures for more than 11 years before joining CIC in June 2021. CIC is a Cambridge-based VC manager that focuses on deep tech and life sciences and currently manages more than £300 million in assets. During his time in the corporate world, Lane came to realise that there are a number of reasons why traditional enterprises are still taking early steps on their AI/ML journey.

First, talent is a big issue. Large legacy businesses struggle to hire and retain the best data science and ML talent because of the allure of Silicon Valley and the burgeoning tech start-up space.

One way that Unilever has sought to overcome this is by establishing centres of excellence in India and other countries.

Second, they don’t always have access to the right data in the right machine-readable format.

And third, they often rely on third-party service vendors whose IT systems are bolted together, making it operationally difficult to apply machine learning tools uniformly.

One of Lane’s key areas of focus at CIC is how the algorithms that underpin a company’s machine learning models can be used in the real world or, more specifically, how an ML model fits into an organisation that has multiple functions acting together across different geographies with different data sets.

“I had an interesting discussion recently with Neil Lawrence, DeepMind Professor of Machine Learning at the University of Cambridge,” recalls Lane.

“He gives a lecture series entitled ‘Machine learning in the physical world’. The point he makes is that if you’re building artificial systems that interact with the real world, they come with very real challenges compared with building systems that interact with the digital world. Data may be scarce, and any decisions made can have irreversible consequences.”

Most machine learning models are probabilistic: they are likely to provide the right answer most of the time, but not every time.

That might be acceptable when the end result is to make a shopping suggestion to someone on Amazon, or when Netflix throws up a random programme or film that leads to a sub-optimal viewing experience.

But it certainly is not if you’re an airline pilot, for example.

“In those real-world scenarios, you can blend a combination of uncertainty and prior knowledge to make decisions that can be interrogated by what you already know,” says Lane.

“The key point I took from my time at Unilever Ventures is: how do you manage what you already know with the value that machine learning can provide?”

MLOps coming of age
One of CIC’s portfolio companies is Seldon Technologies, founded by Alex Housley in 2014. Its central aim is to accelerate the adoption of machine learning across global business to help solve some of the world’s most pressing problems. It does this by providing an MLOps solution for those looking to test proofs of concept and scale ML models into their business operations.

MLOps has become an integral part of how enterprises build out their AI strategy. It is, in essence, a framework of practices that enables effective collaboration between data scientists and operations professionals. As VentureBeat wrote in May 2021, citing Cognilytica, the MLOps market could grow from $350 million to $4 billion by 2025.

While putting conventional software into production requires a lot of infrastructure, machine learning involves even more complexity. Simply training a model on a data set and getting it into production, in a way that allows its decisions to be interrogated by a management team, takes a huge amount of compute and engineering time.

“When you’re bringing an ML model into production, there might be different hardware involved (e.g. graphics processing units, or GPUs) and a huge range of dependencies (e.g. software libraries) that need to be managed.

“If you’re operating in a highly regulated industry, you’re going to want to know the audit trail at any one time; that is, to understand what data the model was trained on, so that if there were to be an issue, the company could point back to the data set involved,” remarks Lane.
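To make the audit-trail idea more concrete, here is a minimal Python sketch. It assumes training data arrives as a list of records and simply stores a SHA-256 fingerprint of that data next to each model version, so that an issue can later be traced back to the exact data set involved. The names `TrainingRecord`, `record_training_run` and `models_trained_on` are hypothetical illustrations, not part of Seldon's or any other MLOps product's API.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrainingRecord:
    """Hypothetical lineage entry linking a model version to its training data."""
    model_id: str
    dataset_sha256: str
    n_rows: int
    trained_at: str


def fingerprint(rows: list[dict]) -> str:
    # Serialise rows deterministically (sorted keys) so identical data
    # always produces an identical hash, whatever the key order.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_training_run(model_id: str, rows: list[dict],
                        log: list[TrainingRecord]) -> TrainingRecord:
    record = TrainingRecord(
        model_id=model_id,
        dataset_sha256=fingerprint(rows),
        n_rows=len(rows),
        trained_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(record)  # an append-only log acts as the audit trail
    return record


def models_trained_on(rows: list[dict], log: list[TrainingRecord]) -> list[str]:
    # The auditor's question: which model versions saw this exact data set?
    digest = fingerprint(rows)
    return [r.model_id for r in log if r.dataset_sha256 == digest]


if __name__ == "__main__":
    audit_log: list[TrainingRecord] = []
    data_v1 = [{"customer": 1, "spend": 120.0}, {"customer": 2, "spend": 80.5}]
    data_v2 = data_v1 + [{"customer": 3, "spend": 42.0}]
    record_training_run("churn-model-1", data_v1, audit_log)
    record_training_run("churn-model-2", data_v2, audit_log)
    print(models_trained_on(data_v1, audit_log))  # ['churn-model-1']
    print(json.dumps(asdict(audit_log[0]), indent=2))
```

A production platform would persist such records in durable, access-controlled storage rather than an in-memory list; the list here only keeps the sketch self-contained.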

To reduce the complexity of testing ML models and bringing them into production as efficiently as possible, Seldon uses an open-source environment. Since its inception, it has deployed more than one million models and partnered with industry leaders including Google, Red Hat, IBM, NVIDIA and AWS.

“One of the perceptions of ML is that everything is done in a black box. For large organisations that have built brand reputations over many years and have loyal customer bases, understanding why an ML model is producing a certain decision or recommendation cannot be left to blind faith. You may not get a perfect explanation, but the ability to track and evaluate what an ML model is doing, and why, and to be crystal clear about the audit trail for the data sets being used, is certainly helpful. MLOps solutions provided by companies like Seldon are important to getting ML models into large enterprises,” says Lane.

Data Swamp
One of the challenges faced by global enterprises is the fragmented, heterogeneous nature of customer data. There’s a lot of it, but, as Lane says, what you typically end up with is “less of a data lake and more of a data swamp”. While companies might be able to extract some information, it is just not in a format that is useful enough for making decisions.

Some companies are all too aware of this and have sought to make their data more visual using dashboards. This can be a useful first step towards digitizing one’s operational environment, but if multiple IT vendors are involved, each providing its own dashboard, management teams can soon find themselves in a web of complexity, memorizing different passwords.

What they need is a way to automate the decisions those dashboards are meant to inform, and this is one area where ML can help.

“Rather than executives spending time logging in, looking at the different data and making decisions on the back of it, ML models can make 70 to 80% of those decisions, with only the outliers flagged for human attention,” Lane says.
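Lane's description matches a common human-in-the-loop pattern. One simple way to implement it, assumed here for illustration rather than taken from Lane or any particular product, is confidence thresholding: the model attaches a confidence score to each decision, decisions at or above a threshold are applied automatically, and the rest are flagged for review. The `Decision` fields, the 0.9 threshold and the function names are hypothetical, and the share of decisions automated depends on the model and the threshold chosen rather than on any fixed figure.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    item_id: str
    action: str        # what the model recommends, e.g. "reorder"
    confidence: float  # model's estimated probability that the action is right


def route(decisions: list[Decision],
          threshold: float = 0.9) -> tuple[list[Decision], list[Decision]]:
    """Split decisions into those applied automatically and those flagged for a human."""
    automated = [d for d in decisions if d.confidence >= threshold]
    flagged = [d for d in decisions if d.confidence < threshold]
    return automated, flagged


def automation_rate(decisions: list[Decision], threshold: float = 0.9) -> float:
    """Fraction of decisions the model is allowed to make on its own."""
    if not decisions:
        return 0.0
    automated, _ = route(decisions, threshold)
    return len(automated) / len(decisions)


if __name__ == "__main__":
    batch = [
        Decision("sku-1", "reorder", 0.97),
        Decision("sku-2", "hold", 0.95),
        Decision("sku-3", "reorder", 0.62),
        Decision("sku-4", "hold", 0.91),
    ]
    auto, flagged = route(batch)
    print([d.item_id for d in flagged])  # ['sku-3']
    print(automation_rate(batch))        # 0.75
```

Raising the threshold sends more cases to people and lowers the risk of an unattended error; lowering it automates more. Choosing that trade-off is as much a business decision as a technical one.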

Synthetic data sets
Lane’s earlier reference to audit trails is worth returning to when considering how global businesses might adopt ML models more readily. For highly regulated industries where data privacy is vital, it will ultimately be necessary to find ways to design and test machine learning tools safely before they are applied in the real world.

One way to achieve this, in Lane’s view, will be to train ML models using synthetic data.

He explains:

“The biggest pull for the use of synthetic data sets is within regulated industries like banking and telecommunications that have very clear data governance in place, and are sensitive to data privacy. They have to adhere to high standards in terms of where data can go and what it can be used for.

“When you start thinking about data science and training ML models, pulling in data from different places and testing it… it’s a very experimental, bottom-up approach. If your data is ring-fenced by high data governance standards, that’s hard to overcome.

“That’s where the application of synthetic data comes in: a private, synthetic version of the data that can be used in a test environment for data scientists to evaluate safely.
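As a deliberately naive Python sketch of the idea, assuming a small table of records with numeric and categorical columns, the code below learns each column's distribution separately and samples fresh rows from those summaries. It preserves the shape of each column but not the relationships between columns, and it offers no formal privacy guarantee; dedicated synthetic data tools generally use more sophisticated generative models and may add formal techniques such as differential privacy. The function names `fit_marginals` and `sample_synthetic` are hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_marginals(rows: list[dict]) -> dict:
    """Summarise each column independently (assumes every row has the same keys).

    Direct identifiers such as names or account numbers should be dropped
    before fitting, because categorical values are reused verbatim.
    """
    model = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: keep only the mean and population standard deviation.
            model[col] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            # Categorical column: keep each distinct value and how often it occurs.
            counts = Counter(values)
            model[col] = ("categorical", list(counts), list(counts.values()))
    return model


def sample_synthetic(model: dict, n: int, seed: int = 0) -> list[dict]:
    """Draw n synthetic rows; columns are sampled independently of one another."""
    rng = random.Random(seed)  # seeded so a test environment is reproducible
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mu, sigma = spec
                row[col] = rng.gauss(mu, sigma)  # assumes a roughly normal column
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    real = [
        {"age": 34, "tariff": "gold"},
        {"age": 51, "tariff": "basic"},
        {"age": 29, "tariff": "basic"},
    ]
    for row in sample_synthetic(fit_marginals(real), n=3, seed=42):
        print(row)
```

Because each column is sampled on its own, a data scientist could test a pipeline end to end on these rows, but any pattern that depends on how columns interact would not survive.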

“It is possible we will start to see AI-based audits.”

For VC managers such as CIC that invest in MLOps companies like Seldon, the end game is to maximise return on investment, which will depend on EBITDA growth and the level of market penetration. Some in the industry believe that, over the next decade, deep learning could overtake supervised machine learning models that use gradient boosting to improve accuracy.

Deep learning & quantum computing
PitchBook’s November research report, “Taking Off AI’s Training Wheels”, found that AI mega exits (more than $1 billion) rose from 8 to 26 between Q3 2020 and Q3 2021. The majority of these deals involved companies using supervised ML, such as SentinelOne. But while deep learning companies accounted for only one mega exit in 2019 and 2020, last year saw seven such deals.

Going forward, deep learning is likely to gain momentum in sectors such as autonomous vehicles, and in many others where unlabelled data sets can begin to be used to solve real-world problems.

In conclusion, I ask Lane whether he sees a role for quantum computing in further accelerating ML innovation.

“My view is there will be some overlap between the two fields,” he says. “But it’ll require people who are experts in both, and there are likely to be only a small number of them. I’m not sure quantum computing can help on the deep learning and neural network development side, but it could help with ML techniques like Markov chains as part of the way these ML models are trained. It might help make that process more efficient.”

He is excited by the scale of ambition in Europe’s VC space, and by the fact that more organisations are thinking about how to apply machine learning, even in traditional industries like construction.

“Company founders in Europe are now thinking about how to build truly global businesses. The quality of founders we meet now is so much better than it was 10 years ago and the opportunity to work with them is what I enjoy most of all,” concludes Lane.
