December 9, 2023

#116 Navigating the Layout: Large Language Models in the File System


<< Previous Edition: Elo vs. MMLU scores

In the realm of Large Language Models (LLMs), complexity intertwines with sophistication. Andrej Karpathy, in his recent video titled "The Busy Person's Intro to LLMs," gracefully unravels this intricacy, simplifying our understanding while adding depth. This blog will delve into select aspects from his presentation, providing a profound insight into the inner workings of LLMs, with a special emphasis on the remarkable Llama 2 70b model.

The Anatomy of LLMs

Karpathy explains that at their core, LLMs can be viewed as a two-file system. One file contains the model's parameters, which are essentially the weights, and the other, a smaller file, is the code needed to run the model. This section breaks down this dual-file concept, illustrating the simplicity and elegance of the underlying structure of LLMs.
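To make the two-file concept concrete, here is a minimal sketch in Python: a large binary file of weights and a small runner program that loads them. The file layout and function names are illustrative assumptions, not Llama 2's actual distribution format; Karpathy's point is only that the "code" half can be this small.

```python
# Sketch of the two-file idea: a big parameters file plus a small runner.
# The flat fp16 layout and the function names here are hypothetical.
import struct

def load_parameters(path: str) -> list[float]:
    """Read a flat binary file of 2-byte (fp16) weights into a list."""
    with open(path, "rb") as f:
        raw = f.read()
    count = len(raw) // 2
    # 'e' is the struct format code for IEEE 754 half-precision floats
    return list(struct.unpack(f"<{count}e", raw))

def run(weights: list[float], prompt: str) -> str:
    """Placeholder for the few hundred lines of actual inference code."""
    ...
```

In Karpathy's example, the runner is roughly 500 lines of C with no other dependencies; everything the model "knows" lives in the parameters file.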

Llama 2 stands out as the most prominent model with openly accessible weights, which makes analysis straightforward. However, ease of analysis does not necessarily translate to superior performance. According to Karpathy, Llama 2 trails leading language models like GPT-4 and Claude by a wide margin, on the order of a 10x gap in scale and capability. Our laboratory research at Roost.AI corroborates this evaluation.

With that disclaimer out of the way, a notable feature of the Llama 2 model is its ability to run on common hardware, such as a MacBook. By compiling the necessary files, users can operate the model independently, even in air-gapped environments. Furthermore, because the model's parameters are stored as 2-byte floating-point numbers, the data footprint stays reasonable: Llama 2's 70 billion parameters require approximately 140GB of disk space.
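The 140GB figure is simple arithmetic, worth spelling out:

```python
# Back-of-the-envelope disk footprint for Llama 2 70b, using the numbers
# from the post: 70 billion parameters at 2 bytes (fp16) each.
params = 70_000_000_000
bytes_per_param = 2              # 16-bit floating point
total_bytes = params * bytes_per_param
gigabytes = total_bytes / 1e9    # decimal GB
print(f"{gigabytes:.0f} GB")     # 140 GB
```

The same arithmetic explains why quantizing to 1 byte per parameter (int8) would roughly halve the footprint, which is how smaller machines squeeze these models in.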

The Intensive Process of Model Training

Karpathy describes the training of LLMs as akin to compressing a chunk of the internet. To train a model with 70 billion parameters, around 10 TB of text data is required. The resource requirements for training models like Llama 2 are colossal: Karpathy cites a cluster of roughly 6,000 GPUs running for about 12 days, at a cost of around $2 million.
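Putting the "compression" framing and the cost figures side by side gives a feel for the scale. These are rough, derived numbers based on the figures above, not anything reported separately:

```python
# Pre-training as lossy "compression": ~10 TB of text distilled into a
# ~140 GB parameter file. All inputs are the approximate figures from
# Karpathy's talk; the derived values are back-of-the-envelope only.
text_tb = 10
weights_gb = 140
compression_ratio = (text_tb * 1000) / weights_gb  # ~71x, and lossy

gpus = 6000
days = 12
cost_usd = 2_000_000
gpu_days = gpus * days                       # 72,000 GPU-days
cost_per_gpu_day = cost_usd / gpu_days       # ~$28 per GPU-day
```

The compression is lossy in an important sense: the model does not memorize the text, it learns a statistical gestalt of it.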

For many general use cases, this amount of training may suffice, but it is essentially a high-volume, low-quality approach; personally, I prefer the term "generalized training". Since it serves as a preliminary step toward more specific training, Karpathy aptly refers to it as "pre-training".

After pre-training, the next step is fine-tuning. While pre-training can be considered knowledge gathering, fine-tuning is about task-specific alignment. The key to fine-tuning is swapping the generalized dataset for a smaller, curated, custom dataset.
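The dataset swap can be sketched as follows. `ToyModel` and `train` are hypothetical stand-ins, not a real training API; the point is that the training loop is the same in both stages and only the data changes:

```python
# Conceptual sketch of pre-training vs. fine-tuning as the same loop
# over different datasets. ToyModel is a hypothetical stand-in that
# merely records its training examples.
class ToyModel:
    def __init__(self):
        self.seen = []

    def update(self, example):
        self.seen.append(example)  # a real model takes a gradient step here

def train(model, dataset):
    for example in dataset:
        model.update(example)
    return model

# Stage 1: pre-train on a huge, scraped, low-quality web corpus.
model = train(ToyModel(), ["web page 1", "web page 2"])

# Stage 2: fine-tune the SAME model on a small, curated, task-specific
# dataset, e.g. high-quality question/answer pairs written to a spec.
model = train(model, [("How do I ...?", "You can ...")])
```

In practice the fine-tuning corpus is orders of magnitude smaller than the pre-training corpus, which is why fine-tuning is affordable for teams that could never pre-train from scratch.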

An additional, optional step in this process is the use of comparison labels, often referred to as Reinforcement Learning from Human Feedback (RLHF). This process leverages the fact that verifying a solution is often much easier than generating one, echoing the intuition behind the P vs. NP distinction.
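A comparison label might look like the sketch below. The schema is illustrative, not any particular RLHF pipeline's format; what matters is that the human only ranks candidate answers rather than authoring one:

```python
# Illustrative shape of an RLHF preference label: the human picks the
# better of two model-generated candidates. Schema is hypothetical.
preference = {
    "prompt": "Write a haiku about paperclips",
    "candidates": ["haiku A ...", "haiku B ..."],
    "chosen": 1,  # the human judged candidate B to be better
}

def reward_pairs(label):
    """Expand one preference label into (better, worse) training pairs
    for a reward model."""
    chosen = label["candidates"][label["chosen"]]
    rejected = [c for i, c in enumerate(label["candidates"])
                if i != label["chosen"]]
    return [(chosen, r) for r in rejected]
```

Ranking is cheap, so a modest labeling budget yields many such pairs, which is what makes the comparison-based step economical.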

Enhancing Efficiency with LLMs

A recurring theme in our discussions is the elevation of human roles to higher levels of abstraction. This principle also applies in the context of LLMs, particularly in label generation and review. LLMs can draft initial labels, which humans can refine, or critique labels based on set instructions. This approach not only streamlines the process but also integrates LLMs more deeply into operational workflows, underscoring their growing importance in the AI landscape.
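The draft-then-review workflow described above can be sketched in a few lines. All function names here are hypothetical placeholders for whatever LLM call and human review step a real pipeline would use:

```python
# Hedged sketch of the human/LLM labeling collaboration: the LLM drafts
# a label for each item, and a human refines or approves the draft.
# llm_draft and human_review are hypothetical callables.
def label_with_review(items, llm_draft, human_review):
    """Return one reviewed label per item."""
    return [human_review(item, llm_draft(item)) for item in items]
```

The human's role shifts from producing labels to judging them, the same verification-over-generation asymmetry that RLHF exploits.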


In conclusion, the two-file structure of Large Language Models (LLMs) such as Llama 2 showcases a blend of elegance and complexity in their design. While these lighter models might not currently match the performance of more advanced systems, their portability is a significant advantage. As technology progresses, these desktop-grade AI models are poised to become highly useful, bridging the gap between sophistication and accessibility in the realm of artificial intelligence.

>> Next Edition: Flexing Open Weights