December 19, 2023

#117 Flexing Open Weights


<< Previous Edition: Navigating the Layout

In Newsletter #105, we explored the intricacies of open-source in the realm of LLMs, particularly the daunting challenge that I called the "sausage-making" dilemma. Now, let's take a moment to revisit the three tiers of open-source as we lay the foundation for our ongoing discourse.

Levels of Openness in BC (Before ChatGPT Era)

Open-API: Fostering Interoperability

At the foundational level, Open-API represented a consensus among providers to adhere to a single API standard. This fostered interoperability and empowered users to seamlessly switch between providers. A notable example was Sun Microsystems' release of the J2EE APIs. Leading J2EE server providers, such as BEA and IBM, embraced this standard, which simplified Java adoption while maintaining clarity. Sun Microsystems also provided its own implementation, although it primarily served as a reference rather than achieving the anticipated commercial success.

The Open-Core Model: Balancing Openness and Commercial Viability

The Open-Core model, a pivotal strategy in software, melded open-source principles with commercial viability. RedHat Linux and Hadoop exemplified this approach. RedHat's core Linux operating system was open-source, encouraging community-driven enhancements, while proprietary add-ons like advanced security and tailored support catered to enterprise needs.

Hadoop, a game-changer in big data, followed a similar path. Foundational components like the Hadoop Distributed File System (HDFS) and MapReduce were open-source, enticing innovation from the user community. Commercial entities like Cloudera and Hortonworks then provided proprietary services and tools, driving Hadoop's adoption in enterprises.

Open-Source: The Apex of Collaboration

At the pinnacle of transparency and collaboration lay the Open-Source model. Here, every aspect, from source code to documentation, was freely accessible, fostering a global development community. Diverse perspectives and expertise led to robust, secure, and feature-rich software. However, monetization presented challenges, often overcome through premium services, dual licensing, or crowdfunding.

Levels of Openness in CE (ChatGPT Era)

Open API

In the BC period, Open-API, being the more straightforward approach, was seen as the initial step towards standardization in the industry. However, in the CE, achieving a unified Open-API might present more complexities. There are two potential paths forward:

  1. OpenAI as the Standard Setter: One possibility is that OpenAI, given its prominence and influence in the field, could establish its API as the de facto standard. This move would essentially invite other players in the AI landscape to align with OpenAI's API structure. By setting such a standard, OpenAI could significantly influence the direction of AI development, encouraging a more uniform approach to AI interfaces. This would streamline integration and interoperability among different AI services, but it also raises concerns about centralization and the dominance of one entity in shaping the AI ecosystem.

  2. Formation of an Independent Standardizing Body: Alternatively, the establishment of an independent foundation dedicated to defining API standards could democratize the process. Such a body would operate as a neutral entity, working collaboratively with various stakeholders in the AI field, including OpenAI. The foundation's role would be to develop and maintain a set of standardized APIs that all entities, regardless of size or influence, would adhere to. This approach promotes a more egalitarian and inclusive framework for AI development, ensuring that standards are not dictated by a single entity but are the result of industry-wide consensus.
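Either path ends in the same payoff for developers: one request schema that works across vendors. The sketch below illustrates that payoff with a hypothetical OpenAI-style chat-completions schema; the provider names and endpoint URLs are made up for illustration, not real services.

```python
import json

# Hypothetical provider endpoints -- illustrative only, not real services.
PROVIDERS = {
    "provider_a": "https://api.provider-a.example/v1",
    "provider_b": "https://api.provider-b.example/v1",
}

def build_chat_request(provider: str, model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions request.

    If every provider accepted the same schema (the premise of an
    Open-API standard), switching vendors would change only the base
    URL and model name -- the payload shape stays identical.
    """
    return {
        "url": f"{PROVIDERS[provider]}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        },
    }

req_a = build_chat_request("provider_a", "model-x", "Hello")
req_b = build_chat_request("provider_b", "model-y", "Hello")

# Same payload structure across providers; only endpoint and model differ.
print(json.dumps(req_a["body"]["messages"]) == json.dumps(req_b["body"]["messages"]))
```

This mirrors what happened with J2EE in the BC era: once the interface was fixed, vendors competed on the implementation behind it rather than on lock-in.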

The Obsolescence of Open-Core in Generative AI

The generative AI era has rendered the Open-Core model obsolete. This model is ill-suited for the nuances of generative AI, where the emphasis has shifted from software frameworks to the complexities of data, algorithms, and AI training methodologies. In this new landscape, the value lies in the holistic development of AI technologies, encompassing ethical considerations, data privacy, and democratization. The Open-Core model's focus on software alone is inadequate for addressing these broader aspects of AI technology.

As the generative AI era ushers in new methodologies and frameworks, the concept of Open Weights emerges as a successor to the now-obsolete Open-Core model. This evolution marks a significant transition in how AI development is approached, moving from the software-centric paradigm of Open-Core to a more comprehensive and holistic approach in Open Weights.

Open Weights: The New Paradigm in Generative AI

In the realm of generative AI, "Open Weights" is a novel approach. It involves making the weights and biases of a trained model publicly available. This approach balances transparency with practicality, particularly in complex models like Large Language Models (LLMs). Open weights facilitate research, fine-tuning, and adaptation to specific tasks without the resource-intensive training process.
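To make the core promise concrete, here is a deliberately toy sketch: once weights are published, a downstream user can load them, run inference, and fine-tune them on new data without ever seeing the original training corpus. The "model" is a single linear unit so the mechanics stand alone; real open-weights releases ship checkpoint files with billions of parameters, but the workflow is the same in spirit.

```python
import json

# Published "open weights" for a trivial linear model y = w*x + b.
# (Two numbers stand in for the multi-gigabyte checkpoints of a real LLM.)
published = json.dumps({"w": 2.0, "b": 1.0})

def load_weights(blob: str) -> dict:
    """Load released weights -- no access to training data required."""
    return json.loads(blob)

def predict(weights: dict, x: float) -> float:
    return weights["w"] * x + weights["b"]

def fine_tune(weights: dict, xs, ys, lr=0.02, epochs=1000) -> dict:
    """Adapt the published weights to a new task with plain gradient descent."""
    w, b = weights["w"], weights["b"]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}

weights = load_weights(published)
print(predict(weights, 3.0))  # behaviour of the released model: 7.0

# Fine-tune toward a new target function (y = 3*x) using only our own
# small dataset -- the resource-intensive original training never recurs.
tuned = fine_tune(weights, xs=[1.0, 2.0, 3.0], ys=[3.0, 6.0, 9.0])
```

The asymmetry this illustrates is exactly why open weights sit between Open-API and true open-source: users get a working, adaptable artifact, but the "sausage making" that produced the published numbers stays behind closed doors.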

While open weights in their simplest form mean making weights and biases public, that alone is not enough to build a complete picture. We need to define the key steps between open weights and true open-source. In my opinion, those steps are:

  1. Foundation of Open Weights
     - Publicly Available Weights and Biases: The core aspect of open weights is making the trained model's weights and biases accessible.
     - Deployment Code: Providing the necessary code to deploy these models, including scripts and tools for utilization.
     - Documentation and User Guides: Comprehensive guides and documentation that detail usage, best practices, and troubleshooting.

  2. Training Transparency
     - Publicly Available Training Data: Sharing the data used in the model's training process (the sausage making).
     - Training Methodologies and Algorithms: Detailed information on the training architecture, protocols, and optimization strategies.
     - Model Development Environment and Tools: Insights into the software, frameworks, hardware, and resources used during development.

  3. Community and Ethical Framework
     - Community Contribution Mechanisms: Platforms and processes for community contributions, bug reporting, and sharing adaptations.
     - Ethical Guidelines and Usage Policies: Clear guidelines addressing biases, privacy, and intended use to ensure responsible application.
     - Licensing Information: Comprehensive licensing details outlining how the model and its components can be used and distributed.
     - Security Review Processes: Analyzing potential vulnerabilities and establishing responsible disclosure practices as models are opened.
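The three-tier rubric above lends itself to a simple checklist: given what a release actually publishes, score how far it has climbed from bare open weights toward true open-source. The tier names and criteria come from the rubric itself; the scoring scheme and the example reading of Llama 2 are an illustrative sketch, not an official benchmark.

```python
# The three tiers from the rubric above, each mapped to its criteria.
TIERS = {
    "Foundation of Open Weights": [
        "weights_and_biases", "deployment_code", "documentation",
    ],
    "Training Transparency": [
        "training_data", "training_methodology", "dev_environment",
    ],
    "Community and Ethical Framework": [
        "contribution_mechanisms", "ethical_guidelines",
        "licensing_info", "security_review",
    ],
}

def openness_report(provided: set) -> dict:
    """Return, per tier, the fraction of criteria a release satisfies."""
    return {
        tier: sum(c in provided for c in criteria) / len(criteria)
        for tier, criteria in TIERS.items()
    }

# A rough, illustrative reading of Llama 2: weights, deployment code,
# docs and licensing terms are public, but training data is not fully
# disclosed and the other transparency criteria are unmet.
llama2 = {"weights_and_biases", "deployment_code", "documentation", "licensing_info"}
report = openness_report(llama2)
print(report["Foundation of Open Weights"])  # 1.0 -- foundational tier fully met
```

Scoring each tier separately, rather than producing a single number, preserves the rubric's point: a model can fully satisfy the foundation of open weights while leaving training transparency almost untouched.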

Llama 2: Meta's Step Towards Open Weights

Meta's Llama 2 is a substantial advancement in embracing the open-weights concept. As a suite of pretrained and fine-tuned Large Language Models (LLMs) ranging from 7 billion to 70 billion parameters, Llama 2 showcases its prowess, particularly in dialogue-centric applications. Its performance is competitive with leading closed-source models like ChatGPT and PaLM, underscoring its sophistication and utility. The development of Llama 2 incorporated training on publicly available online data, enhanced with over a million human annotations, showcasing Meta's commitment to leveraging open data sources for model training.

Despite these advancements, Llama 2's approach to openness is nuanced. While the model weights and deployment code are accessible, thus aligning with the foundational level of open weights, there are aspects where it doesn't fully align with a pure open-source model. The training data, while extensive, isn't entirely disclosed, and although the model is free for research and most commercial use, it ships under a custom commercial license rather than a standard open-source one. This licensing reflects Meta's attempt to balance openness with commercial pragmatism, allowing broad usage while retaining some degree of control.

Mixtral 8x7B: Pushing Boundaries in Open Weights

Mistral AI's Mixtral 8x7B, on the other hand, represents a more radical approach to transparency in the open-weights model. Licensed under Apache 2.0, a highly permissive license, Mixtral 8x7B provides extensive freedom in usage rights, significantly broadening the scope of applications and modifications that can be made. This degree of openness represents a significant leap towards the ideal of true open-source.

However, like Llama 2, Mixtral 8x7B doesn't entirely reach the zenith of open-source. While the model weights, deployment code, and extensive documentation are provided, aligning well with the foundational open-weights criteria, there's still a gap in terms of the complete transparency of training data and methodologies. The choice to distribute the model via torrent is a commendable step towards accessibility, yet the journey to complete open-source transparency, including the sharing of all training data and detailed training processes, is still in progress.

Charting the Future: Open Weights and Beyond in AI

In conclusion, the rise of Open Weights signifies a pivotal shift in the generative AI landscape, underscoring a new paradigm in the development and sharing of AI technologies. As we progress into the ChatGPT Era (CE), innovative models like Llama 2 and Mixtral 8x7B are at the forefront, carving paths toward a more transparent, collaborative, and ethically conscious AI environment. These models are setting the stage for what could become a new standard in AI development: a landscape where openness and accessibility are not just aspirations but practical realities. However, the journey toward a fully open-source future, one that includes not only open weights but also the entirety of training data and methodologies, is still unfolding. The advancements made by Llama 2 and Mixtral 8x7B are critical in directing future AI developments. They present a promising glimpse into a world where AI is more inclusive, diverse, and innovative, showcasing the vast potential of AI technologies developed in an open and collaborative ecosystem.

>> Next Edition: Upending Social Order