With the growing hype around compact models, it's easy to question whether large language models are still necessary and whether smaller models could suffice. That line of thinking is risky, though: it overlooks the significant advantages and capabilities that the sheer scale of LLMs provides.
Much like hiring a generally intelligent person for a role, it's crucial to start with a Large Language Model (LLM) that possesses a broad base of general intelligence. This foundational capability ensures that, whatever direction its fine-tuning takes, the model remains adaptable, capable, and ready to tackle specific challenges. Such broad yet deep understanding forms the bedrock on which specialized functions can be built effectively across diverse applications.
The Double-Edged Sword of Fine-Tuning
In a previous newsletter, amidst rumors that OpenAI might charge around $3M for model fine-tuning, I suggested it would be a worthwhile investment for enterprises. What I should have highlighted more is that this is a play for the big leagues: major players like OpenAI partnering with Fortune 500 companies. The catch is that fine-tuned weights don't transfer. With every new release of a base model, the custom-tailored weights and biases from previous fine-tuning effectively have to be redone from scratch. Unless you're prepared to invest in complete re-tuning (with costs likely exceeding $3M, factoring in inflation) each time a new LLM version rolls out, you might find yourself at a significant disadvantage.
For smaller companies, the hefty price tag of fine-tuning by industry giants isn't feasible. So, what's the alternative? The go-to strategy is often Retrieval Augmented Generation (RAG). In plain terms, relevant external documents are retrieved at query time, typically by embedding similarity, and pasted into the prompt rather than baked into the model's weights. While that might sound less efficient than encoding the knowledge directly into the model, the results might pleasantly surprise you.
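To make that concrete, here is a minimal sketch of the retrieval step, assuming the sentence-transformers library; the model name, documents, and query are placeholders for illustration, and a production setup would swap the in-memory list for a vector database and add a generation call on top.

```python
# Minimal RAG retrieval sketch (illustrative; model name and documents are placeholders).
import numpy as np
from sentence_transformers import SentenceTransformer

# Embed a small in-memory "knowledge base" once, up front.
documents = [
    "Our enterprise support plan includes 24/7 phone coverage.",
    "Invoices are issued on the first business day of each month.",
    "The on-premise deployment requires Kubernetes 1.27 or later.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

# The retrieved passages are inserted into the prompt of an off-the-shelf LLM,
# so no model weights are ever modified.
context = "\n".join(retrieve("When do invoices go out?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: When do invoices go out?"
print(prompt)
```

The appeal is exactly what the paragraph above describes: the knowledge lives outside the model, so updating it means re-embedding documents, not re-training anything.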
Another viable path is exploring open-source models, such as LLaMA or the Mixtral series. Because the weights are openly available, these models lend themselves to fine-tuning you can redo or refresh on your own schedule, giving smaller entities a more accessible and flexible way to leverage advanced language models without the exorbitant costs.
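As a sketch of what "easier to update" can look like in practice, here is one common approach: parameter-efficient fine-tuning with LoRA via Hugging Face's peft library. The checkpoint name and target modules below are assumptions for illustration, and a real run would add a dataset and a training loop.

```python
# LoRA fine-tuning sketch (illustrative; model name and target modules are assumptions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_id = "meta-llama/Llama-2-7b-hf"  # placeholder open-weights checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA adds small low-rank adapter matrices alongside the attention projections;
# the base weights stay frozen.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # common choice for LLaMA-style models
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# ... train with your usual Trainer / training loop on domain data ...

# Only the adapter is saved (tens of MB), so when a new base checkpoint ships,
# you re-train this small adapter rather than redoing a full fine-tune.
model.save_pretrained("my-domain-adapter")
```

The design point is that the adapter is a small, separate artifact: swapping in a newer base model means re-training megabytes of adapter weights on your domain data, not repeating a multi-million-dollar full fine-tune.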
Conclusion
Despite the increasing interest in compact models, that trend shouldn't be mistaken for large language models themselves getting smaller. If history and current trajectories are any indication, these models are set to grow even more extensive in size and complexity.