February 15 2024

#152 The High-Bandwidth Memory of GPUs

<< Previous Edition: In Data We Trust

Whenever I discuss complex technology, I adhere to Richard Feynman's belief that if you cannot explain it simply, you do not understand it well enough. Although I would not claim to fully grasp topics like memory that are outside my main area of expertise, I can still strive to explain them in simple terms.

A Post Singularity Utopian Town

Imagine a town where wealth isn't measured by industrial achievements but by the quality of social interactions and the flow of information. In this community, homes are more than just spacious living areas; they're interconnected with their neighbors, making each house not just a place to live but a vital part of a lively network of exchange and collaboration. With this concept in mind, city planners have developed various generations of housing designs over the years, which we will explore today.

The DDR Series of Homes

At the foundational level of our utopian town, we have the DDR Series of Homes, reminiscent of SDRAM technology and its iterations, adopting a townhome-style configuration. Each cluster within the town comprises units that represent the initial step towards realizing our vision of a vibrant, interconnected community. We've scaled up the capacity of these homes to as much as 16GB, significantly enhancing each unit's ability to partake in the town's network of exchange and collaboration. While the bandwidth of these early homes may not match that of later advancements, this series lays down the essential infrastructure for connectivity, reminiscent of the quaint walking paths that facilitate the fundamental interactions among neighbors.

The HBM: Medium-Rise

Moving up to the initial level of architectural evolution, we find the medium-rise configuration of HBM. This innovative design introduces the concept of vertically stacking homes, akin to the technological breakthrough of through-silicon vias (TSVs) and microbumps. This method significantly enhances neighborly access by vertically linking units, a stark contrast to traditional urban development which often seeks to minimize ground footprint. Our memory town, however, aspires to maximize the interactive footprint, fostering a denser, more connected community fabric.

The first iteration of HBM allowed for stacks of up to 4 layers (or floors) of homes, for about 1GB of capacity per stack, setting the foundation for a more densely populated and interactive environment. The bandwidth of HBM1, about 128GB/s per stack, transformed community interaction, upgrading from pedestrian paths to an efficient elevator system. This leap in connectivity enhances mobility within the community, allowing for a more dynamic and fast-paced exchange among its residents.

Make/Model : AMD Radeon R9 Fury X

The HBM2: High Rise

Elevating the concept in our utopian town to the HBM2 era, we envisage high-rises that not only reach skyward with up to 8 floors (both NVIDIA and AMD had builder's special upgrade options to 12 floors), each floor holding up to 8Gb (gigabits, which is 1GB), but also pioneer an unprecedented level of accessibility. This design innovation brings the total capacity to a robust 8GB per 8-floor stack, significantly densifying the urban fabric with vibrant, interconnected living spaces.

The advent of HBM2 transforms the communal experience, akin to outfitting every side of our buildings with high-speed elevators, and even embedding them within, ensuring that no corner is left disconnected. This architectural marvel elevates bandwidth to around 256GB/s, mirroring the introduction of an ultra-efficient transportation network within each structure. Such advancements ensure rapid, seamless access throughout the building, reflecting the swift data transfer rates of HBM2 technology.
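
The 128GB/s and 256GB/s figures follow from simple arithmetic: the number of data pins on a stack multiplied by the rate of each pin, converted from bits to bytes. Here is a minimal sketch of that calculation. The 1024-bit interface width and the per-pin rates of 1 Gb/s (HBM1) and 2 Gb/s (HBM2) are my assumptions about typical parts, and the function name is made up for illustration.

```python
def stack_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth of one memory stack in GB/s.

    Each of the bus_width_bits data pins moves pin_rate_gbit_s gigabits
    per second; dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * pin_rate_gbit_s / 8


# Assumed figures: a 1024-bit interface per stack,
# 1 Gb/s per pin for HBM1 and 2 Gb/s per pin for HBM2.
hbm1 = stack_bandwidth_gb_s(1024, 1.0)
hbm2 = stack_bandwidth_gb_s(1024, 2.0)

print(f"HBM1: {hbm1:.0f} GB/s per stack")
print(f"HBM2: {hbm2:.0f} GB/s per stack")
```

In the town's terms, the elevator count (pins) stays the same between the two generations while each elevator runs twice as fast.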

Make/Model : NVIDIA Tesla V100, AMD Radeon RX Vega 56 and Vega 64

The HBM2E: The High-Rise Complex

The rollout of HBM2E introduced a groundbreaking era in our urban narrative, heralding the age of the High-Rise Complex. These architectural giants redefine the essence of urban expansion, not merely by their ascent towards the heavens but also through their unprecedented capacity expansion. Each floor in these colossal structures holds up to 16Gb (gigabits, which is 2GB), accumulating to 16GB per 8-floor stack. Yet, the marvels of HBM2E extend even further, allowing for the integration of up to 8 stacks within a single package, akin, for illustrative purposes, to housing 8 distinct buildings (called active sites) within the same complex. This innovation multiplies the community's collective capacity, fostering a denser and profoundly interconnected living and working environment.
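
Capacity figures for stacked memory are easy to misread, because the density of a single floor (die) is usually quoted in gigabits (Gb) while the capacity of a stack or package is quoted in gigabytes (GB). Here is a small sketch of the conversion. The 8-floor stack built from 16Gb dies is my assumption for an HBM2E-class part, and the function names are hypothetical.

```python
def stack_capacity_gb(layers: int, die_density_gbit: int) -> float:
    """Capacity of one stack in gigabytes: floors * gigabits per floor / 8."""
    return layers * die_density_gbit / 8


def package_capacity_gb(stacks: int, layers: int, die_density_gbit: int) -> float:
    """Capacity of a package that carries several identical stacks."""
    return stacks * stack_capacity_gb(layers, die_density_gbit)


# Assumed HBM2E-class stack: 8 floors of 16Gb each.
per_stack = stack_capacity_gb(layers=8, die_density_gbit=16)

# A complex with 5 such buildings (active sites).
per_package = package_capacity_gb(stacks=5, layers=8, die_density_gbit=16)

print(f"{per_stack:.0f} GB per stack, {per_package:.0f} GB per package")
```

The same two functions cover the earlier generations by changing the floor count and the per-floor density.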

With bandwidth surging to a staggering 460GB/s, this generation mirrors the deployment of an expansive, high-capacity elevator network, engineered to whisk residents between levels and buildings with unparalleled speed and smoothness. This leap in bandwidth efficiency resembles the introduction of futuristic elevator systems, enhancing the ease with which people and information navigate the complex. Such advancements not only streamline day-to-day interactions but also significantly elevate the communal quality of life, paving the path toward a society marked by enhanced connectivity and a spirit of collaboration.

Make/Model : NVIDIA A100 (5 active sites), AMD Radeon Instinct MI100 (4 active sites)
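
Because each active site is an independent stack with its own interface, the bandwidth a processor sees is roughly the per-stack figure multiplied by the number of active sites. The sketch below uses the 460GB/s figure and the site counts quoted above; treat the results as upper bounds, since shipping products often run their memory below the peak rate. The function name is hypothetical.

```python
def package_bandwidth_gb_s(active_sites: int, per_stack_gb_s: float) -> float:
    """Upper bound on package bandwidth: every active stack at its peak rate."""
    return active_sites * per_stack_gb_s


PER_STACK_GB_S = 460.0  # per-stack peak quoted in the text

for name, sites in [("5 active sites", 5), ("4 active sites", 4)]:
    total = package_bandwidth_gb_s(sites, PER_STACK_GB_S)
    print(f"{name}: up to {total:.0f} GB/s ({total / 1000:.2f} TB/s)")
```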

HBM3: Expansive High-Rise Complex

The zenith of our utopian architecture is embodied by HBM3, introducing the era of extra wide high rises. These architectural giants extend the concept of vertical living, featuring floors with capacities surpassing the 16Gb benchmark set by their predecessors, and targeting even loftier aggregate capacities per stack. HBM3 stands as a testament to the peak of memory technology, ushering in a new era of data processing and storage capabilities. It fosters a degree of communal interaction and connectivity that was once beyond our wildest dreams.

The bandwidth capabilities of HBM3 elevate to roughly 819GB/s per stack, and well beyond 1TB/s once several stacks share a package, reminiscent of the kind of futuristic, supremely efficient elevator systems one might find in science fiction, capable of moving an entire building's populace as if by teleportation. This extraordinary leap in speed and efficiency ensures that the inhabitants of these skyscrapers enjoy a level of connectivity and collaborative potential that is unmatched, truly reflecting the ultimate ambition of our urban and technological advancements.

Make/Model : NVIDIA H100 (5 active sites on the 80GB model)

HBM3e: The Skybridge Towers

In our evolving urban landscape, HBM3e introduces the era of "Skybridge Towers," a visionary leap beyond the extra wide high rises brought forth by HBM3. These towers not only extend the vertical and horizontal boundaries of living spaces but also introduce skybridges that link buildings, facilitating an unparalleled level of integration and community connectivity. HBM3e, with its advancements in memory technology, lays the foundation for these architectural marvels, offering capacities that boldly surpass the benchmarks of the past and bandwidths that redefine the essence of seamless communication.

Skybridge Towers, enabled by HBM3e, boast an architectural sophistication that mirrors the technological leap to bandwidths exceeding 1TB/s, allowing for data—and metaphorically, people—to flow effortlessly between points at speeds previously unimaginable. This architectural innovation symbolizes a new phase of urban development, where the focus is on creating a hyper-connected community fabric, weaving together the digital and physical realms in harmony.

Make/Model : NVIDIA H200 (upcoming)

Conclusion

Memory management in GPUs is complex, but it is front and center in driving generative AI innovation. High-Bandwidth Memory in particular is the core reason that GPUs are an order of magnitude superior to CPUs, even for inference workloads. My goal in this newsletter is to provide an illustrative overview of the power of memory in GPUs. We are still only looking under the hood at the level of knowledge expected of a driver, not an automobile engineer. I have made a good-faith effort to double-check the numbers, but please post in the comments if you find any inaccuracies.
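
To make the inference point concrete, here is a rough back-of-the-envelope sketch. It assumes text generation is limited purely by memory bandwidth, meaning every generated token requires reading all of the model's weights from memory once. The model size and both bandwidth figures are hypothetical round numbers chosen for illustration, not measurements of any product.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when each token needs one full pass over the weights."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical model: 7 billion parameters stored at 2 bytes each.
model_bytes = 7e9 * 2

# Hypothetical memory systems.
hbm_bandwidth = 2000 * GB  # a GPU package with about 2 TB/s of HBM bandwidth
ddr_bandwidth = 100 * GB   # a CPU with about 100 GB/s of DDR bandwidth

gpu_rate = max_tokens_per_second(model_bytes, hbm_bandwidth)
cpu_rate = max_tokens_per_second(model_bytes, ddr_bandwidth)

print(f"HBM system: up to {gpu_rate:.0f} tokens/s")
print(f"DDR system: up to {cpu_rate:.0f} tokens/s")
print(f"Ratio: {gpu_rate / cpu_rate:.0f}x")
```

Under these assumptions the gap between the two systems is simply the ratio of their memory bandwidths, regardless of how much compute either one has.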

>> Next Edition: Storytelling with SORA