Webinar

From Documents to Dialogue: Unlocking PDF Data with a Smart Chatbot 

Mountains of valuable information are locked away inside PDF files. Whether it’s business reports, regulatory documents, user manuals, or research papers, the ability to extract and utilize insights from these documents is becoming essential.  

In this recording, Simon Prickett, Developer Advocate at CrateDB, begins by showing how to extract data from the text and images in PDF files and store it in CrateDB. From there, you’ll see how to generate embeddings using AI models and perform hybrid semantic and keyword searches with SQL queries. Finally, he puts it all together and demonstrates a natural language chatbot that takes questions in plain English and returns responses from a large language model.

What you will learn

  • Data Preparation: Discover how to extract text from PDF documents, generate textual descriptions of images and store these in CrateDB for vector and full-text searching. 
  • Data Retrieval and Augmentation: Learn how natural language search queries are converted to vector embeddings and used in semantic and hybrid searches.  You’ll also see how to augment a Large Language Model (LLM) prompt with data from the database. 
  • Response Generation: In the final step of the pipeline, we’ll introduce techniques for generating coherent and fluent responses to users’ questions.
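
To make the retrieval and augmentation steps a little more concrete, here is a minimal Python sketch. It is not the code shown in the webinar. It assumes the semantic and keyword searches have already run and each returned a ranked list of chunk IDs, and it merges them with reciprocal rank fusion, which is one common way to combine hybrid results and may differ from the approach in the recording. The function names, chunk IDs, and prompt wording are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into a single ranking.

    Each ID earns 1 / (k + rank) from every list it appears in, so
    items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def build_prompt(question, chunks):
    """Augment the user's question with text retrieved from the database."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical results: in a real pipeline these would come from a
# vector search and a full-text search against the database.
semantic_hits = ["chunk-3", "chunk-1", "chunk-7"]
keyword_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])

# Hypothetical chunk texts keyed by ID.
chunk_texts = {
    "chunk-1": "The warranty covers parts for two years.",
    "chunk-3": "Warranty claims must include proof of purchase.",
}
top_chunks = [chunk_texts[c] for c in fused[:2]]
prompt = build_prompt("How long is the warranty?", top_chunks)
```

The resulting prompt string is what would be sent to the large language model, which is the response generation step described in the last bullet.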

Watch Now