AI weights are not open "source"

Published: Jun 27, 2023
By: Sid Sijbrandij

AI licensing is extremely complex. Unlike software, AI can’t simply be covered by existing proprietary or open source software licenses. An AI system has multiple components (the source code, weights, data, etc.) that can be licensed differently. AI also has socio-ethical consequences that don’t exist on the same scale for computer software, and in some cases these call for additional restrictions, such as behavioral use restrictions and distribution restrictions. Because of these complexities, AI licensing has many layers: each component can carry its own license, and each license may involve additional considerations.

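[Table: AI license categories (proprietary, collaborator, available, ethical, open) by function and licensed component]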

With Heather Meeker’s input, I’ve created a set of categories to standardize how we talk about AI licensing. The table above names different categories of license types based on the function they serve and the component they apply to. Typical software licensing falls into two categories: proprietary and open source. Some people take the view that any license that isn’t open source is proprietary. I think it’s more nuanced than that, and I believe three more license types are worth naming: non-commercial NDA, non-commercial public, and ethical. Because there is no standard license for any of these categories, I suggest we categorize existing licenses as proprietary, collaborator, available, ethical, and open.

“Open Weights” is already gaining traction as the default name for “open source” AI weights. Meeker and Joseph Jacks of OSS Capital recently announced an Open Weights definition and licensing framework focused on “the original idea of openness, and preserving the original goals of Freedom Zero of free software and the non-discrimination principles of open source.” Eventually, we will need standard definitions and licensing frameworks for each category.

License categories

The proprietary license category covers fully closed licenses: the licensed component can’t be used for any purpose without explicit permission from the original author.

A non-commercial NDA license is a mostly closed license that allows non-commercial use under a non-disclosure agreement. Meta applied this license type to its AI model weights when it released LLaMA, allowing certain academic researchers, laboratories, and government agencies to download the weights. According to Meta, the purpose of this license is to let pre-defined, approved groups collaborate on improving the model and support further research without compromising its integrity or opening it up to misuse. Because this license type is limited to specific collaborators, I’ve named the category “collaborator.” You could argue that it should be combined with the next category, non-commercial public, but for now we think that having the weights out in public is a big enough difference to keep them separate.

A non-commercial public license is similar to the Creative Commons NonCommercial license, but for AI. Source code and weights are available for anyone to download and use (not just select collaborators) but can’t be used commercially. I’ve named this category “available” because it makes the component available to the public for non-commercial use.

The ethical license category applies to licenses that allow commercial use of the component but include field-of-endeavor and/or behavioral use restrictions set by the licensor. Naming these licenses “ethical” distinguishes them from true open source licenses. The RAIL license is an example of an ethical license: it requires the user to agree not to use the model in any way that violates the law, to exploit or harm minors in any way, or to generate verifiably false information for the purpose of harming others. The RAIL organization suggests adding the word “Open” to RAIL licenses that offer open access and free use similar to open source (e.g., OpenRAIL-M), but this is confusing, since a license is not open source as long as it includes usage restrictions. A better name would be EthicalRAIL-M. Using the term “ethical” for this category of license clearly indicates its functional difference from open source licenses.

The open source license category is reserved for licenses that adhere to the specific open source definition set by the Open Source Initiative (OSI). Only licenses that meet the 10 criteria in that definition should carry the “open” designation. Meeker and OSS Capital have started the effort to define Open Weights, stating, “We need a standard for Open Weights that recognizes the unique nature of NNWs [neural network weights] and provides legal and practical guidelines for their use, distribution, and sharing.”

Separating source from weights

Source code and weights are two different things. It doesn’t make sense to call weights “open source” when they aren’t source code. Code is a set of human-written instructions for how to do something, while weights are the output of training runs on data and are hard to decipher. Licenses designed for source code written by humans don’t translate directly to AI weights, which are effectively readable only by computers. A separate category of licenses designed specifically for weights is needed.

While the licensing principles can be similar when applied to different components, a distinction should be made when referencing licenses that apply to source code versus weights, data, etc. The Responsible AI Licenses (RAIL) initiative has adopted a similar naming convention by adding representative letters to its RAIL license (RAIL-D for data, RAIL-M for model). The same categories of licenses can be applied to each component individually, but the nomenclature should include the specific licensed component.

Better definitions to avoid “open washing”

The OSI’s definition of open source is specific and strict, and it intentionally does not allow discrimination against any persons, groups, or fields of endeavor. This is intended to uphold the principles and integrity of open source. AI poses different challenges than computer software, which leads some to consider it necessary to restrict certain potentially harmful uses. This is understandable, but it can’t be called “open source.” Using the term “ethical” for licenses that otherwise allow free and open use similar to open source, but add usage restrictions, prevents “open washing” and gives users of the model clarity.

It’s important to make these distinctions so that it’s clear which component is being discussed and what uses are permitted. The terms “open” and “open source” are being mistakenly applied to model weights that have some level of openness but are not actually open source. For example, people commonly call AI weights “open source” when (1) they are referring to weights, not source code, and (2) the weights are under a non-commercial public license, not an open source one. Weights under a non-commercial public license should be called “Available Weights.” It’s also common to see ethical licenses (like RAIL-M) called open source, which is problematic because they include behavioral use restrictions that are not permitted under the OSI definition. I propose calling this category “Ethical Weights” to distinguish it from “Open Weights” licenses, which follow the OSI open source definition as it applies to weights.

Many AI weights labeled “open” are not open source; they aren’t source code at all. Using the right nomenclature, like “Open Weights” and “Ethical Weights,” will ultimately help the industry move forward in developing standards for each of these categories.