After years of groundwork, the Open Source Initiative (OSI) has unveiled version 1.0 of the Open Source AI Definition (OSAID) — an "official" benchmark that marks a new era for open source in artificial intelligence. As the nonprofit that has guided the broader open source movement, OSI has now crafted a guideline to help the world determine what truly counts as open source in AI. It's a milestone, especially in an era when terms like "open source" are frequently overused or misused, particularly in AI.
Stefano Maffulli, OSI's Executive Director, emphasized that establishing consensus on open source AI is crucial. The AI industry is growing rapidly, and as regulatory frameworks emerge, there's an urgent need for a shared understanding of what constitutes "open" AI. "Regulators are already watching the space," he noted, pointing out that the OSI engaged stakeholders across academia, tech, and government to ensure that OSAID has the backing needed for real-world relevance.
What It Takes to Be “Open Source AI” Under OSAID
To qualify as open source under the OSAID, an AI model must meet several critical requirements:
- Transparency of Design: An open source AI must reveal enough of its design so that an individual can recreate it. The model should disclose how it was developed, the steps taken in data preprocessing, and the full code used in its training process.
- Clarity on Training Data: It’s essential that any AI model adhering to OSAID discloses detailed information about its training data — its sources, processing methods, and licensing status.
- Freedom to Use, Modify, and Build: True to the spirit of open source, OSAID-compliant models allow developers the freedom to use them as they wish, without restriction, to modify them, and to create derivative works.
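As a rough illustration (not an official OSI tool — the field names and logic below are this article's own shorthand for the three requirements), the criteria can be sketched as a simple checklist:

```python
from dataclasses import dataclass

# Hypothetical checklist mirroring the three OSAID requirements above.
# Illustrative sketch only; real OSAID evaluation is a human judgment,
# not a boolean test.

@dataclass
class ModelRelease:
    training_code_released: bool    # full training and preprocessing code disclosed
    data_sources_documented: bool   # provenance, processing, and licensing of data
    use_unrestricted: bool          # no field-of-use or user-count restrictions
    modification_allowed: bool      # modification and derivative works permitted

def meets_osaid_sketch(release: ModelRelease) -> bool:
    """Approximates the OSAID criteria described above: all must hold."""
    return all((
        release.training_code_released,
        release.data_sources_documented,
        release.use_unrestricted,
        release.modification_allowed,
    ))

# Example: a release that restricts large-scale users fails the sketch.
restricted = ModelRelease(True, True, False, True)
print(meets_osaid_sketch(restricted))  # False
```

The point of the sketch is that the requirements are conjunctive: falling short on any one of them — such as a user-count restriction like the one discussed below — is enough to fall outside the definition.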
Maffulli explained that an open source AI, under OSAID, must be a model “that allows you to fully understand how it’s been built.” This means developers should have access to all critical components — including the code used for training and data filtering — not just limited parts.
Why Does It Matter?
Why does a unified definition matter in a space already saturated with "open" and "open source" claims? One of OSI's goals is to give lawmakers and industry developers a shared understanding, especially as regulators like the European Commission move toward granting open source unique privileges. Maffulli pointed out that reaching a consensus across industries and regions isn't just about policing definitions but also about protecting open source as a valuable, clearly understood resource.
In the long term, OSI believes this consensus could curb misuse of the term "open source." Although OSI doesn't have any formal enforcement power, Maffulli is optimistic that community oversight will push companies to align with OSAID standards or risk a backlash. "Our hope is that when someone tries to abuse the term, the AI community will say, 'We don't recognize this as open source,'" he said. This kind of informal peer pressure has had mixed success in tech circles, but it still carries influence.
The Challenge of Truly “Open” AI: Tech Giants Face Scrutiny
Notably, big players like Meta, Google, and Stability AI have adopted the “open source” label for their AI models, but under OSI’s new guidelines, these claims may not fully align. For instance, Meta’s Llama model — although touted as open source — comes with restrictions: platforms with over 700 million monthly active users must request a special license for access. Meta has defended this licensing decision, stating that restrictions are necessary to prevent misuse, but Maffulli has openly questioned Meta’s use of the open source label under these terms.
Meta isn’t the only one to face scrutiny. Stability AI’s models, promoted as open source, require businesses with revenues exceeding $1 million to obtain an enterprise license. This, combined with similar practices from upstart AI companies like Mistral, has raised concerns among researchers. A recent study conducted by the Signal Foundation, the AI Now Institute, and Carnegie Mellon found that many so-called “open source” models are far from transparent, keeping data sources hidden and requiring intensive computational resources to operate — barriers that exclude smaller developers from participating.
This study suggested that instead of democratizing AI, such practices reinforce power structures, centralizing AI’s benefits in the hands of large corporations. Meta’s Llama models have been downloaded hundreds of millions of times, while Stability claims its models power up to 80% of all AI-generated imagery. While open source AI was originally intended to break down these barriers, the reality remains complex and somewhat contradictory.
Meta’s Stance on OSAID: A “Cautious Approach”
Meta has expressed reservations about the OSAID guidelines, despite participating in its drafting process. A spokesperson acknowledged Meta’s “cautious approach” to sharing model details, attributing this to concerns over emerging regulatory frameworks, such as California’s new transparency law for AI. “There is no single open source AI definition,” Meta’s spokesperson said, adding that the complexities of today’s AI models require a different interpretation of openness than in traditional software.
Other industry entities, like the Linux Foundation and the Free Software Foundation, have also weighed in with their own frameworks for what constitutes “open” AI, often with a more lenient approach to the training data’s openness.
Meta, which has contributed funding to the OSI’s work (alongside Amazon, Google, Microsoft, Cisco, Intel, and Salesforce), is still committed to working with OSI and others to balance accessibility with safety. As regulations around data usage and AI transparency evolve, the tension between openness and competitive advantage will likely intensify.
The Data Challenge: Balancing Transparency with Competitive Interests
At the heart of the OSAID debate is the handling of training data, which is frequently sourced from publicly available material across the internet, including images, audio, text, and videos. While OSAID champions transparency, companies face fierce competition and are often reluctant to disclose how they compile or process their data.
Beyond competitiveness, legal risks are another reason companies hesitate to disclose training data. Some authors, publishers, and artists have alleged that companies like Meta and Stability AI have used copyrighted material in training without permission, and lawsuits are mounting. For developers, providing detailed records of training data could increase liability, especially as lawsuits over copyright in AI escalate.
What Lies Ahead for OSAID and Open Source AI
Not everyone is convinced that OSAID has gone far enough. Luca Antiga, CTO of Lightning AI, raised concerns that the standard doesn't fully address proprietary training data licenses. A model could technically meet OSAID's standards, but if inspecting the training data requires access to privately licensed repositories, the model may still feel less than fully "open" to the community.
Moreover, the OSI hasn't yet tackled copyright questions for AI models. Can a model (or its components) be copyrighted at all, and if so, would releasing it under an open license satisfy open source requirements? Maffulli admitted that OSAID will need updates as the industry evolves, and OSI has set up a committee to monitor the OSAID's implementation, evaluating how well it aligns with industry needs and future legal landscapes.
“This isn’t the work of lone geniuses in a basement,” Maffulli said. “It’s work that’s being done in the open with wide stakeholders and different interest groups.”
In Conclusion
As AI continues to shape our digital future, an established open source standard like OSAID provides a much-needed benchmark for transparency and ethics. The OSAID's debut is an important step toward open AI, but the debate over what "open source" really means for AI is far from settled. For now, OSI has laid the groundwork — and as the industry navigates this fast-moving space, time will tell how the OSAID shapes real-world AI development, access, and regulation.