TechCrunch | Kyle Wiggers
18 Oct 2023
As the generative AI boom continues, startups building business models around the tech are beginning to delineate along two clear lines.
Some, convinced that a proprietary and closed source approach will give them an advantage over the swarms of competitors, are choosing to keep their AI models and infrastructure in-house, shielded from public view. Others are open sourcing their models, methods and datasets, embracing a more community-led path to growth.
Is there a right choice? Perhaps not. But every investor seems to have an opinion.
Dave Munichiello, a general partner at GV, an investment arm of Alphabet, makes the case that open source AI innovation can foster a sense of trust in customers through transparency. By contrast, closed source models — though potentially more performant, given the lightened documentation and publishing workload on teams — are inherently less explainable and thus a harder sell to “boards and executives,” he argues.
Ganesh Bell, the managing director at Insight Partners, generally agrees with Munichiello’s point of view. But he asserts that open source projects are often less polished than their closed source counterparts, with front ends that are “less consistent” and “harder to maintain and integrate.”
Depending on who you ask, the choice in developmental direction — closed source vs. open source — matters less for startups than the overarching go-to-market strategy, at least in the earliest stages.
Christian Noske, a partner at NGP Capital, says that startups should focus more on applying the outputs of their models, open source or not, to “business logic” and ultimately proving a return on investment for their customers.
But many customers don’t care about the underlying model and whether it’s open source, Ian Lane, a partner at Cambridge Innovation Capital, points out. They’re looking for ways to solve a business problem, and startups recognizing this will have a leg up in the overcrowded field for AI.
Now, what about regulation? Could it affect how startups grow and scale their businesses and even how they publish their models and supporting tooling? Possibly.
Noske sees regulation potentially adding cost to the product development cycle, strengthening the position of Big Tech companies and incumbents at the expense of small AI vendors. But he says more regulation is needed, particularly policies that outline the “clear” and “responsible” use of data in AI, labor market considerations and the many ways in which AI can be weaponized.
Bell, on the other hand, sees regulation as a potentially lucrative market. Companies building tools and frameworks to help AI vendors comply with regulations could be in for a windfall — and in the process “contribute to building trust in AI technologies,” he says.
Open source versus closed source, business model and regulation are just a handful of the topics covered here. The respondents also spoke to the pros and cons of transitioning from an open source to a closed source company, the possible security benefits and dangers of open source development, and the risks associated with relying on API-based AI models.
Read on to hear from:
Dave Munichiello, general partner, GV
Christian Noske, partner, NGP Capital
Ganesh Bell, managing director, Insight Partners
Ian Lane, partner, Cambridge Innovation Capital
Ting-Ting Liu, investor, Prosus Ventures
The responses have been edited for length and clarity.
Dave Munichiello, general partner, GV
What are some key advantages for open source AI models over their closed source competitors? Do the same trade-offs apply to UI elements like AI front ends?
Innovation in public (via open source) creates a dynamic where developers have a sense that the models they’re deploying have been deeply assessed by others, probed by the community, and that the organizations behind them are willing to connect their reputations to the quality of the model.
Academia and enterprise R&D were the sources of AI innovation for the past several decades. The OS community and products associated with OS make an effort to engage that critical part of the ecosystem, whose incentives differ from those of profit-seeking businesses.
Closed source models may be more highly performant (perhaps with a technical lead of 12 to 18 months?) but will be less explainable. Further, boards and executives will trust them less, unless they are strongly endorsed by a brand-name tech company willing to put its brand on the line to certify quality.
Is open sourcing potentially dangerous depending on the type of AI in question? The ways in which Stable Diffusion has been abused come to mind.
Yes, everything could be potentially dangerous if used and deployed in a dangerous way. Long-tail OS models may, in a rush to market, be less scrutinized than closed source competitors whose bar for quality and safety must be higher. As such, I would differentiate OS models with high usage and popularity from long-tail OS models.
Are all startups in the AI space that begin by embracing open source destined to eventually go closed source, once the commercial pressure’s on? Can you think of any profitable, financially stable open source AI businesses?
Lots of large AI businesses. It sounds like you’re looking for a venture-backed AI business? Most AI businesses are single-digit years old and have been encouraged to lean into this time of growth, so I’m not sure I would get excited about one focusing on profitability today. Worth a deeper discussion.
Can open source startups successfully transition to closed source without alienating their community and customers?
Nearly my entire portfolio is built on some sort of open source technology. But there are myriad business models to build on top of OS, in partnership with OS, etc.
How could regulation in the U.S. and abroad affect open source AI development? Are investors concerned?
Smart legislators should encourage innovation in AI and ML to happen out in the open, as it will accelerate U.S. capabilities and competitiveness.
Any other thoughts you’d like to add?
Happy to spend more time talking about our open source portfolio and our AI/ML portfolio. We haven’t yet invested in model-building companies, but we do have strong opinions about where the future of AI may be headed.
Christian Noske, partner, NGP Capital
What are some key advantages for open source AI models over their closed source competitors? Do the same trade-offs apply to UI elements like AI front ends?
User interfaces tend not to be central to most LLMs, as most developers access them via APIs. But there are several advantages to using an open source AI model instead of a closed source competitor, namely that open source AI is often cheaper, more customizable and more flexible.
Open source LLMs can also be deployed on-premise and even run in air-gapped environments, which can be better for compliance and information security.
Related to the previous question, can open source lead to more secure and stable products than closed source? I’m wondering specifically about identifying the weaknesses in models, like prompt injection vulnerabilities.
Open source AI can create secure, flexible and agile environments. That’s not to say that closed source models are insecure or inflexible; it’s just that the open source community, by its nature, can place more value on ethics and on combating biases and misinformation.
Open source models can be more cost-effective, too; there is no need to pay for closed source model use, which can seem cheap at first, but often scales dramatically with increased use. Typically, companies pay per API call as they use the programming interfaces.
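To make the scaling point concrete, here is a minimal Python sketch comparing usage-based API pricing with a flat monthly cost for self-hosting an open source model. The prices are hypothetical placeholders, not any vendor’s actual rates, and the sketch assumes self-hosting carries a fixed monthly cost (hardware, operations) rather than being free.

```python
# Hypothetical illustration: pay-per-call API pricing vs. a flat self-hosting cost.
# All prices below are made-up placeholders, not real vendor rates.

def api_cost(calls: int, price_per_call: float) -> float:
    """Usage-based cost: grows linearly with the number of API calls."""
    return calls * price_per_call

def breakeven_calls(self_host_monthly: float, price_per_call: float) -> float:
    """Monthly call volume above which the flat self-hosting cost is cheaper."""
    return self_host_monthly / price_per_call

PRICE_PER_CALL = 0.002       # hypothetical: $0.002 per API call
SELF_HOST_MONTHLY = 5_000.0  # hypothetical: $5,000/month for GPUs and operations

for calls in (100_000, 1_000_000, 10_000_000):
    print(f"{calls:>10,} calls -> API ${api_cost(calls, PRICE_PER_CALL):>9,.2f} "
          f"vs self-host ${SELF_HOST_MONTHLY:,.2f}")
print(f"break-even at ~{breakeven_calls(SELF_HOST_MONTHLY, PRICE_PER_CALL):,.0f} calls/month")
```

With these made-up numbers, break-even falls at 5,000 / 0.002 = 2.5 million calls a month: below that the API is cheaper, above it self-hosting wins. A real comparison would also account for per-token pricing, engineering time and hardware utilization.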
The new smaller versions of LLaMA and Mistral are great, for example, and perform nearly as well as larger, more expensive models. Generally speaking, open source model performance still trails closed source, but it’s getting closer.
Prompt injection is a concern for any AI model, particularly LLMs, but it tends to be a vulnerability in the front end and software engineering process, rather than with the model itself. So there isn’t much difference between open and closed source in that respect.
Is open sourcing potentially dangerous depending on the type of AI in question? The ways in which Stable Diffusion has been abused come to mind.
It’s early days, and I’m not sure I would describe open source AI as potentially dangerous. But closed source models have been created by businesses that are obliged to protect and control who is using their models. Their reputations are on the line. So if a malicious actor uses a model, most companies are going to put a stop to that pretty quickly.
Open source models, by comparison, can be deployed by anyone. Open models might inherently be able to block some malicious use cases, and they often include some form of moderation, but it’s almost impossible for them to block all malicious use. The same goes for any open source software, but open source LLMs create a new category of malicious use cases, such as phishing attacks and deepfakes, and malicious actors are already using them to create havoc.
The challenge is for the ecosystem to come up with a way to detect, regulate and tackle these problems. Open source LLMs as a phenomenon can’t be reversed, nor should [they] be. Better regulation won’t cause any problems for those of us who are using open source models for good.
Lots of startups have built their businesses around both open source models and closed source models available through APIs. How effectively will startups that use publicly or commercially available AI models be able to differentiate themselves?
Any model, irrespective of whether it is open source or not, will have certain strengths for solving specific problems. I believe startups will benefit from using a combination of open source and closed source AI models, but they need to ensure their technology is as plug and play as possible. For example, a startup might leverage Midjourney to get the highest-quality images while leveraging DreamStudio by Stability AI for highly customized images.
The best way for a startup to differentiate itself in any market is its ability to apply the output of any model to business logic and ultimately prove ROI for its customers. Smart hybrid model use will also enable developers to offer the most compelling solutions.
For startups relying on commercial models accessed via an API, how much platform risk (pricing, etc.) will they have to manage?
This isn’t really an issue for early-stage startups. But, like cloud cost optimization, once a startup starts to scale, that type of risk becomes very important. First, you need to make a solution work; then you can make it cheaper/more efficient.
Once you have scale, platform risk is always important to keep in mind; the best applications can be deployed today to AWS, Azure or GCP. Similarly, and in addition to the points I mentioned in my previous answer, you should be able to work flexibly within different platforms. Your customers will also expect high levels of flexibility and control. Keeping this in mind will increase your ability to negotiate on pricing and to reduce any platform risks.
Are all startups in the AI space that begin by embracing open source destined to eventually go closed source, once the commercial pressure’s on? See Anthropic, OpenAI, etc.
That’s certainly a standard evolution of any young industry. Right now, there are a lot of benefits for an initially open source model to move to closed source once it has achieved critical mass — from basic infrastructure needs to the current, unprecedented hype around the power of generative AI models.
That said, I believe that open source models are here to stay, and they will continue to be an important part of the future of gen AI models, because of the cost benefits, diversity of features and transparency they offer.
Can you think of any profitable, financially stable open source AI businesses? Certainly, I’m aware there are some on the infrastructure and tooling side.
Outside H2O.ai and a handful of others on the infrastructure side, a lot of funding has been raised, but I don’t see many profitable businesses yet. But that is likely to change.
Can open source startups successfully transition to closed source without alienating their community and customers?
No, I don’t think that is possible, especially if it’s a complete 180-degree change and the business in question fails to keep the core DNA of how and why the company was created. It would have to be a hybrid environment with an open core and a closed source environment around UI and business wrappers/integrations to keep everyone happy.
How could regulation in the U.S. and abroad affect open source AI development? Are investors concerned?
Regulation is always a consideration for investors, but open source AI will always exist, and increased regulation will make it more important for businesses to flexibly leverage different models, depending on the jurisdiction they are operating in.
My biggest concern is the additional cost of any new regulation. That can hamper innovation, which will strengthen the position of Big Tech and reduce innovation long-term. However, it is clear the responsible use of data, transparency, labor market considerations, deepfakes and weaponization all require some sort of government involvement and well understood (stable) rules.
Any other thoughts you’d like to add?
Open source is a fantastic and unique source for policymakers, innovators and commercial teams to learn, test and innovate. For example, it is a great platform for communities to agree on what “good” and “bad” look like. Once that is agreed, everyone can move forward with the sustainable development of exciting new technologies like AI.
Ganesh Bell, managing director, Insight Partners
What are some key advantages for open source AI models over their closed source competitors? Do the same trade-offs apply to UI elements like AI front ends?
Open source models like LLaMA, Falcon and Mistral can be inspected and audited by anyone, which can help ensure that they are unbiased and fair. While the communities and collaboration that build around open source can drive faster innovation, the scale of reinforcement learning behind closed source models may give them an edge in general intelligence tasks.
But customizability, cost-to-license/serve, “good-enough” performance, steerability and the ability to host models closer to private data will make open source options attractive for many use cases, even though their front ends may be less polished, less consistent, and harder to maintain and integrate. Open source models expand the market and have the potential to democratize AI and accelerate innovation. This means startups, scale-ups and enterprises alike can reimagine and solve interesting and pressing problems.
Related to the previous question, can open source lead to more secure and stable products than closed source? I’m wondering specifically about identifying the weaknesses in models, like prompt injection vulnerabilities.
Yes, transparency, independent audits and the diversity of contributions of open source will help dramatically, but it also largely depends on how well the open source project is managed and funded, and on the responsiveness of the community. Base models that have not been fine-tuned through either reinforcement learning from human feedback (RLHF) or constitutional AI (see Anthropic) will always have biases and no inhibitions in their responses, which could pose a risk. We are excited about the innovation we see across AI governance, model monitoring and observability.
Is open sourcing potentially dangerous depending on the type of AI in question? The ways in which Stable Diffusion has been abused come to mind.
Open sourcing does carry risks depending on the AI capabilities involved. Stable Diffusion illustrates how generative models can be misused to spread misinformation or inappropriate content if publicly released without safeguards. However, openness also enables positive advancements through collaboration. It is essential to carefully consider the ethical and security implications before open sourcing AI technologies, and to foster healthy communities around them.
Lots of startups have built their businesses around both open source models and closed source models available through APIs. How effectively will startups that use publicly or commercially available AI models be able to differentiate themselves?
AI (discriminative and generative) is a bigger programming model shift than cloud or mobile, but it has also been easier to incorporate into existing apps. But moats are still possible in this new architecture: UX, richness of integration, consumability, customizability, data feedback loops, etc., still matter.
Startups versus incumbents is about more than just access to technology: Transformative ideas reimagine products rather than sprinkle AI on top, go deeper into verticals, and codify deep domain expertise in models and code. There will be some sedimentation or commoditization of layers, but differentiation is possible in making hard things easy, easy things automated, and impossible things possible. I think this will also vary based on archetypes of applications.
For startups relying on commercial models accessed via an API, how much platform risk (pricing, etc.) will they have to manage?
There is more risk this time around than in past platform shifts, mainly because the AI models are still evolving at a rapid rate and consuming the functionality around them. In general, the risk is lower in the applications layer than in AI infrastructure and frameworks. Good applications are rich in functionality and are not just thin wrappers around foundation models. They also have abstraction layers where possible and degrade gracefully, if at all, when switching to alternate models.
Are all startups in the AI space that begin by embracing open source destined to eventually go closed source, once the commercial pressure’s on? See Anthropic, OpenAI, etc. Can you think of any profitable, financially stable open source AI businesses? Certainly, I’m aware there are some on the infrastructure and tooling side. Can open source startups successfully transition to closed source without alienating their community and customers?
The open source AI community is still young, and there is uncertainty about how open source AI startups can generate revenue and build sustainable businesses. This is also different at different layers of the stack. There are many, big and small, committed to and building on open source AI: Meta, Databricks, Posit, Anaconda, H2O.ai to name a few.
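The abstraction-layer idea can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor’s SDK: each provider is modeled as a plain function from prompt to text, `ModelUnavailable` is an invented exception, and the two providers simulate a revoked hosted API and a self-hosted open model.

```python
from typing import Callable, List, Sequence, Tuple

class ModelUnavailable(Exception):
    """Hypothetical error a provider adapter raises when it cannot serve a request."""

# A provider adapter maps a prompt to completion text. Real adapters would wrap a
# vendor SDK or a self-hosted inference endpoint; these ones are stand-ins.
Provider = Callable[[str], str]

def complete_with_fallback(prompt: str, providers: Sequence[Tuple[str, Provider]]) -> Tuple[str, str]:
    """Try providers in order of preference; return (provider_name, completion)."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")  # record the failure, try the next provider
    raise ModelUnavailable("all providers failed: " + "; ".join(errors))

def hosted_api(prompt: str) -> str:
    # Simulates a proprietary API that has been withdrawn or repriced out of reach.
    raise ModelUnavailable("API access revoked")

def self_hosted_open_model(prompt: str) -> str:
    # Simulates an open source model running on the startup's own infrastructure.
    return f"[open model] summary of: {prompt}"

name, text = complete_with_fallback(
    "quarterly report",
    [("hosted_api", hosted_api), ("open_model", self_hosted_open_model)],
)
print(name, "->", text)
```

Because application code only calls `complete_with_fallback`, swapping or reordering providers for pricing or availability reasons does not touch the rest of the product.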
However, some startups commercialize through proprietary IP over time to allow more control. A balanced approach is to incubate in open source, benefiting from collaboration, then develop proprietary complementary assets as needed for commercial viability. But for some, openness remains integral to their mission throughout.
How could regulation in the U.S. and abroad affect open source AI development? Are investors concerned?
Regulation is essential for credibility and responsible AI development. Requirements for transparency and accountability promote greater openness. We need a framework that works for research and that distinguishes AGI from narrow AI. We are encouraged by enterprises that prioritize responsible and ethical AI, moving beyond mere compliance. Additionally, we believe AI governance represents a significant opportunity for startups, enabling them to help organizations meet regulatory requirements while also contributing to building trust in AI technologies.
Ian Lane, partner, Cambridge Innovation Capital
What are some key advantages for open source AI models over their closed source competitors? Do the same trade-offs apply to UI elements like AI front ends?
Open source AI models offer similar benefits to the benefits any open source software provides in other areas: flexibility and transparency into how the models were created (e.g., data used for training).
Related to the previous question, can open source lead to more secure and stable products than closed source? I’m wondering specifically about identifying the weaknesses in models, like prompt injection vulnerabilities.
No, not unless there is a structure in place (e.g., something parallel to the maintainers in Linux) and a community of engaged people who value and want to improve the open source offering.
Is open sourcing potentially dangerous depending on the type of AI in question? The ways in which Stable Diffusion has been abused come to mind.
Any AI model can be abused whether it’s open or closed source, so I am not convinced there is any additional danger because a model happens to be open source. That’s assuming the open source structure for the AI models is well established, as mentioned above.
Lots of startups have built their businesses around both open source models and closed source models available through APIs. How effectively will startups that use publicly or commercially available AI models be able to differentiate themselves?
Customers don’t care about underlying foundation models (open or closed); they care about finding a way to solve their problems. Startups who are customer-focused will be able to build differentiated product offerings from freely available models.
For startups relying on commercial models accessed via an API, how much platform risk (pricing, etc.) will they have to manage?
There is always a risk that platform pricing keeps increasing and you can’t switch, but this is no different from choosing AWS over Azure for your cloud environment and having a risk mitigation strategy in place. When you’re a startup, that particular risk probably is not a priority, because you should be focused on finding product-market fit and the right business model.
Once you have addressed these issues and are building significant revenue, then perhaps platform risk becomes important. It’s always good practice to develop your product to be as platform agnostic as possible, so that migration becomes easier in the future should you need it.
Are all startups in the AI space that begin by embracing open source destined to eventually go closed source, once the commercial pressure’s on? See Anthropic, OpenAI, etc.
No, startups that are built in a more sustainable way will find a commercial open source model that works.
Ting-Ting Liu, investor, Prosus Ventures
What are some key advantages for open source AI models over their closed source competitors? Do the same trade-offs apply to UI elements like AI front ends?
Today, cost and the ability to fine-tune are the key advantages of leveraging open source models.
For example, if you can get an open source model with, say, 7B parameters to fulfill certain tasks as well as GPT-4, switching to this smaller, cheaper and more computationally efficient model makes a lot of sense. The ability to customize open source models to your specific use case is also a key advantage, and fine-tuned open source models can now outperform GPT-4 on specific tasks, while also providing the additional cost advantage. We’re therefore starting to see more startups adopt a hybrid approach and leverage an ensemble of different open source models for simpler and/or more specific tasks, alongside proprietary models for the tasks only where required.
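A hybrid setup like this often comes down to a routing rule. The Python sketch below is one hypothetical way to express it: the task names, the model labels `open-7b-finetuned` and `proprietary-frontier`, and the 2,000-character cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str   # label of the model that should handle the request
    reason: str  # short explanation, useful for logging and audits

# Hypothetical: narrow tasks a fine-tuned small open source model handles well.
SPECIALIZED_TASKS = frozenset({"classify_ticket", "extract_invoice_fields", "summarize_short"})

def route(task: str, prompt: str, max_small_model_chars: int = 2_000) -> Route:
    """Send narrow, short requests to the cheap open model; escalate everything else."""
    if task in SPECIALIZED_TASKS and len(prompt) <= max_small_model_chars:
        return Route("open-7b-finetuned", "narrow task within the fine-tuned model's scope")
    return Route("proprietary-frontier", "open-ended or long input; escalate")

if __name__ == "__main__":
    for task, prompt in [
        ("classify_ticket", "My invoice shows the wrong amount."),
        ("draft_strategy_memo", "Outline our expansion plan for next year."),
        ("summarize_short", "x" * 5_000),
    ]:
        r = route(task, prompt)
        print(f"{task:<22} -> {r.model} ({r.reason})")
```

In practice the rule might be driven by per-task evaluation results rather than a hard-coded set; the point is that only requests outside the small model’s proven scope are escalated to the more expensive proprietary model.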
It’s also likely that the performance of open source models continues to evolve over time with the collective intelligence of the AI community. It’s been remarkable to see how quickly people have already innovated and pushed forward these models in the last year. In the future, open source models may increasingly be chosen for their superior performance and innovation, versus only their cost advantages.
That said, closed source proprietary models today continue to offer significant advantages (e.g., OpenAI still has the best-performing general-purpose chatbot), and the owners of these models are investing heavily to stay ahead. It’s probably too early to tell what the eventual mix of open source and closed source model adoption will be, but there is potentially a world where both exist and have significant roles to play in the ecosystem.
Related to the previous question, can open source lead to more secure and stable products than closed source? I’m wondering specifically about identifying the weaknesses in models, like prompt injection vulnerabilities.
Potentially. The collective efforts of thousands of researchers and developers refining these models from all directions could lead to more robust and secure models in the long run. It’s essentially like having a red team of thousands of “adversaries” with different and unknown motives that can effectively poke these models in unexpected ways, and much faster than a single team.
Lots of startups have built their businesses around both open source models and closed source models available through APIs. How effectively will startups that use publicly or commercially available AI models be able to differentiate themselves?
Assuming you’re mostly referring to the application layer, differentiation is indeed a key question right now, as many of the startups developing applications today are largely building off a similar set of proprietary/closed source and open source models. To stand out, companies (especially those building verticalized applications) are differentiating by fine-tuning these models with proprietary data sets to improve performance for their specific use case. This is certainly a compelling approach, though it’s probably a bit early to tell exactly how much of a long-term moat these datasets will truly offer. The answer likely varies by use case, the nature of the data, and how difficult that data moat is to replicate.
Additionally, capturing traditional software competitive advantages such as network effects and robust integration with customer data and workflows will also be key to winning in the AI space.
We’ll likely start to see players who can execute well in the above dimensions pull ahead of the rest, even if the product itself isn’t technically the most differentiated from others.
For startups relying on commercial models accessed via an API, how much platform risk (pricing, etc.) will they have to manage?
Startups probably do face a fair amount of platform risk if they depend exclusively on closed-source models accessed by APIs. Pricing risk is a major one, but there are other risks as well, notably being at the mercy of any changes that are made to the underlying model. For example, in the case of GPT, it’s constantly being tweaked and in flux, meaning startups using OpenAI risk having inconsistent user experiences that are difficult to account for and control.
Another risk is that the owners of proprietary models could decide to remove the API access altogether, for example, if they decide to move toward becoming a full-stack product company versus being a platform. We’re therefore starting to see startups and enterprises more heavily prioritize exploring open source models in order to mitigate some of these risks.