
FOSDEM'24: How to Use Private Data in Generative AI

This event has passed.

This talk focuses on combining CrateDB and LangChain to get started with using private data as context for large language models (LLMs), applying the concept of Retrieval Augmented Generation (RAG). CrateDB, an open-source distributed SQL database, streamlines data management by supporting a wide variety of data models: documents, time series, full-text search, and, more recently, vectors and similarity search. It thus helps avoid technology sprawl in modern data architectures that must cope with structured, semi-structured, and unstructured data, without sacrificing the convenience of a standard SQL interface. LangChain, an open-source framework written in Python, is designed to facilitate the development of applications that use LLMs. It integrates language models with various data sources and services, acting as a bridge between AI language capabilities and private data.

During the talk, we will demonstrate how CrateDB stores and handles vector data, and how LLMs can use this data through LangChain, with RAG supplying richer context. In a live coding demonstration, we will walk through extracting data from CrateDB and passing it to an LLM using LangChain, thereby enriching the model's answers with new data. This integration has practical applications in semantic search, content recommendation, and data analytics, to name a few.
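The retrieve-then-augment flow described above can be sketched in a few lines of plain Python. This is only an illustration of the RAG idea, not the code shown in the talk: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `DOCUMENTS` list stands in for rows stored in CrateDB, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical private documents; in the talk's setup these would live in CrateDB.
DOCUMENTS = [
    "CrateDB stores time series, documents, and vectors behind a SQL interface.",
    "LangChain connects large language models to external data sources.",
    "The office coffee machine is cleaned every Friday.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: replaces a real embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context before it is sent to an LLM."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


question = "What does CrateDB store behind its SQL interface?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS, k=1))
```

In a real application, the resulting `prompt` would be passed to a language model, and the similarity search would be delegated to the database rather than computed in Python.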

The talk will appeal to anyone keen on using large language models, enhanced by Retrieval Augmented Generation, to unlock new insights from their private data. Participants will also learn how to use CrateDB to store vector data efficiently and apply vector search in advanced analytics applications.
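As a rough idea of what storing and searching vectors through SQL might look like, the sketch below only builds the SQL statements as strings and does not connect to a database. It assumes CrateDB's `FLOAT_VECTOR` column type and `KNN_MATCH` function as documented for its vector support; the table name `docs`, the column names, and both helper functions are hypothetical, and the exact syntax should be checked against the CrateDB documentation for the version in use.

```python
def create_table_sql(table: str, dimensions: int) -> str:
    """Build a CREATE TABLE statement with a vector column (hypothetical schema)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, "
        "content TEXT, "
        f"embedding FLOAT_VECTOR({int(dimensions)}))"
    )


def knn_search_sql(table: str, query_vector: list[float], k: int) -> str:
    """Build a k-nearest-neighbour query; the vector is inlined as an array literal."""
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return (
        f"SELECT content, _score FROM {table} "
        f"WHERE KNN_MATCH(embedding, {literal}, {int(k)}) "
        "ORDER BY _score DESC "
        f"LIMIT {int(k)}"
    )


# Example: a 3-dimensional toy vector; real embeddings have far more dimensions.
print(create_table_sql("docs", 3))
print(knn_search_sql("docs", [0.1, 0.2, 0.3], 2))
```

The statements could then be executed with any CrateDB client. Table names are interpolated directly here for brevity, so they must come from trusted code, not user input.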

Track: Python devroom
Room: UD2.218A
Day: Sunday
Start: 09:00
End: 09:25
Video only: ud2218a