August 3, 2023

Will Vector Embeddings Cast a Broader Shadow Than Anticipated?


In our previous conversations, we explored the captivating realm of vector embeddings: a universal language, if you will, that bridges the gap between humans and generative agents. These embeddings transform language into numerical representations that sophisticated AI applications can search, compare, and build upon.
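
To make that concrete, here is a small Python sketch that embeds a few sentences and compares them. The library and model named here (sentence-transformers, all-MiniLM-L6-v2) are my illustrative picks, not anything prescribed in this piece; any embedding model would do.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# An off-the-shelf sentence-embedding model turns text into fixed-length vectors.
# The specific model is an illustrative choice, not a recommendation.
model = SentenceTransformer("all-MiniLM-L6-v2")

a, b, c = model.encode([
    "How do I reset my password?",
    "I forgot my login credentials.",
    "Quarterly revenue grew by 12 percent.",
])

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Related sentences land closer together in vector space than unrelated ones.
print(cosine(a, b))  # comparatively high
print(cosine(a, c))  # comparatively low
```

That closeness-in-space property is exactly what the rest of this piece builds on.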

Specialized databases like Pinecone have been carefully engineered to store and retrieve these embeddings efficiently. A clear use case is converting an enterprise knowledge base into embeddings, which enables swift, similarity-based data retrieval. Today, we pivot our attention to a tantalizing supposition: might these databases stretch their sphere of influence beyond their predetermined confines, casting a wider net than initially predicted?
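
For intuition, here is a deliberately minimal, in-memory sketch of what such a database does at its core: keep embeddings keyed by an id and answer nearest-neighbour queries by cosine similarity. The class and document names are hypothetical, and this is not Pinecone's API; production systems layer approximate nearest-neighbour indexes, metadata filtering, and persistence on top of this basic idea.

```python
import numpy as np

# A toy, in-memory stand-in for a vector database: store (id, embedding) pairs
# and answer nearest-neighbour queries by cosine similarity.
class ToyVectorStore:
    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def upsert(self, doc_id: str, embedding: np.ndarray) -> None:
        # Normalise once so a plain dot product later equals cosine similarity.
        vec = (embedding / np.linalg.norm(embedding)).astype(np.float32)
        if doc_id in self.ids:                      # update an existing entry in place
            self.vectors[self.ids.index(doc_id)] = vec
        else:                                       # or append a new one
            self.ids.append(doc_id)
            self.vectors = np.vstack([self.vectors, vec])

    def query(self, embedding: np.ndarray, top_k: int = 3) -> list[tuple[str, float]]:
        vec = embedding / np.linalg.norm(embedding)
        scores = self.vectors @ vec                 # similarity against every stored vector
        best = np.argsort(scores)[::-1][:top_k]
        return [(self.ids[i], float(scores[i])) for i in best]

# Usage: embed knowledge-base passages with any embedding model, store them,
# then fetch the passages closest to a user's question.
store = ToyVectorStore(dim=4)
store.upsert("expense-policy", np.array([0.1, 0.9, 0.2, 0.0]))
store.upsert("travel-policy",  np.array([0.8, 0.1, 0.1, 0.3]))
print(store.query(np.array([0.15, 0.85, 0.25, 0.05]), top_k=1))
```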

Unleashing the Potential of Adaptable Vector Databases

In an ideal scenario, each enterprise would operate with one foundational model, carefully fine-tuned (its weights and biases updated with enterprise data) to cater to the full spectrum of organizational use cases, in contrast to off-the-shelf Large Language Models (LLMs), which are trained only on public data. Yet, within the same enterprise, different departments often have unique requirements, necessitating separate sets of embeddings to fully unlock the prowess of generative AI tools. Stored independently for enhanced security and flexibility, these department-specific embeddings can significantly boost the performance of the central enterprise LLM.

"When the facts change, I change my mind - what do you do, sir?" - John Maynard Keynes

This is the juncture where adaptable vector databases prove beneficial. They can be refreshed as new data rolls in, preserving the relevance of each department's data without impacting the enterprise's foundational LLM. This results in a versatile system wherein generative AI tools can query these databases for the most fitting, department-specific embeddings as and when needed. Consequently, each department's knowledge base continues to evolve without necessitating significant alterations or retraining of the core LLMs.
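
A rough sketch of that pattern, under the assumption of one central fine-tuned LLM plus per-department retrieval: each department keeps its own refreshable embedding index, and the generative tool pulls department-specific context into the prompt at query time. Every name here (embed, enterprise_llm, DepartmentStore) is hypothetical, and the hash-based embedding stub exists only so the example runs; it carries no semantic meaning.

```python
import numpy as np

# Stand-ins for the enterprise's embedding model and its fine-tuned foundational LLM.
# Both are hypothetical placeholders.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # toy, non-semantic "embedding"
    vec = rng.standard_normal(8)
    return vec / np.linalg.norm(vec)

def enterprise_llm(prompt: str) -> str:
    return f"[central LLM, answering from the supplied context]\n{prompt}"

# One refreshable index per department: embeddings are rebuilt as that
# department's data changes, without retraining the central model.
class DepartmentStore:
    def __init__(self) -> None:
        self.texts: dict[str, str] = {}
        self.embeddings: dict[str, np.ndarray] = {}

    def refresh(self, documents: dict[str, str]) -> None:
        # Re-embed the department's latest documents; nothing else is touched.
        self.texts = dict(documents)
        self.embeddings = {doc_id: embed(text) for doc_id, text in documents.items()}

    def retrieve(self, question: str, top_k: int = 2) -> list[str]:
        q = embed(question)
        ranked = sorted(self.embeddings, key=lambda d: float(self.embeddings[d] @ q), reverse=True)
        return [self.texts[d] for d in ranked[:top_k]]

# Department-specific knowledge stays in separate stores (illustrative content only).
stores = {"finance": DepartmentStore(), "hr": DepartmentStore()}
stores["finance"].refresh({"q3": "Q3 travel spend rose 12% quarter over quarter."})
stores["hr"].refresh({"pto": "Employees accrue 1.5 vacation days per month."})

def answer(department: str, question: str) -> str:
    # Retrieval-augmented query: fetch department context, then ask the central LLM.
    context = "\n".join(stores[department].retrieve(question))
    return enterprise_llm(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("hr", "How many vacation days do I accrue each month?"))
```

The design point worth noting is that refresh() only touches a department's own index; the foundational model's weights never change.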

Redefining the Enterprise Data Ecosystem

The enterprise data ecosystem predominantly consists of Online Transaction Processing (OLTP) databases, which manage systems of record, and Online Analytical Processing (OLAP) systems for data analysis - each with its distinct role (I'm indeed generalizing here). While OLTP databases enjoy an unwavering status in the enterprise data panorama, the potential for disruption lies in the OLAP systems, which are designed to provide insights and address enterprise-wide queries.

Vector embeddings and foundational models don't threaten traditional OLTP databases, which are too vital to enterprise operations to be disrupted. With their prowess in transaction management and data-integrity maintenance, these systems of record will continue to hold their ground for the foreseeable future.

However, the dynamics change when it comes to OLAP systems and Business Intelligence (BI) tools. With the advent of vector embeddings and foundational models, these analytic systems are not just ripe for transformation but also on the cusp of being superseded. Of course, despite the swift advancement of generative AI, this disruption is anticipated to occur over a considerable span of time.

Final Thoughts

The arrival of generative AI and vector embeddings is dramatically reshaping the terrain of enterprise data management. While systems of record maintain their imperviousness, the realm of analytics and BI stands on the edge of a monumental shift. Vector databases, by making natural-language access to enterprise data practical, are primed to supplant traditional OLAP systems, heralding an era of superior data understanding and insights.

In future newsletters, I will delve into the finer aspects of the enterprise data ecosystem, which is influenced by everything from gravity to the speed of light and every other physical law in between.