Vector Data
Vector data querying with SQL
Hyper-fast. Queries in milliseconds.
SELECT text, _score
FROM word_embeddings
WHERE knn_match(embedding, [0.3, 0.6, 0.0, 0.9], 2)
ORDER BY _score DESC;
|------------------------|----------|
| text                   | _score   |
|------------------------|----------|
| Discovering galaxies   | 0.917431 |
| Discovering moon       | 0.909090 |
| Exploring the cosmos   | 0.909090 |
| Sending the mission    | 0.270270 |
|------------------------|----------|
SELECT text, _score
FROM word_embeddings
WHERE knn_match(embedding, (SELECT embedding FROM word_embeddings WHERE text = 'Discovering galaxies'), 2)
ORDER BY _score DESC;
|------------------------|----------|
| text                   | _score   |
|------------------------|----------|
| Discovering galaxies   | 1        |
| Discovering moon       | 0.952381 |
| Exploring the cosmos   | 0.840336 |
| Sending the mission    | 0.250626 |
|------------------------|----------|
Streamlined data management
Eliminate the need to manage multiple systems. CrateDB keeps your (meta-)data and vector representations in one place, so they stay aligned without separate data synchronization processes. Alongside vector search, it supports time series, geospatial, JSON, full-text search, and other data types.
Data enriched with semantics
Add vector data to any row in the database, providing context aligned with your (meta-)data and enhancing explainability.
Advanced search capabilities
Enhanced AI model integration
Improved scalability
Faster development & lower maintenance
Keynote - The transformative effects of real-time AI
In this keynote at the AI & Big Data Expo Europe 2023, CrateDB's VP Product shares his vision for the future with multi-model SQL databases and Large Language Models.
Dev Talk - How to use private data in generative AI
This talk at FOSDEM 2024 focuses on combining CrateDB and LangChain: it shows how to get started with using private data as context for large language models through LangChain, based on the concept of Retrieval Augmented Generation (RAG).
5 essential things you need to know about vector databases
This infographic gives you some basic understanding of vector databases, from what you should look for when choosing one to combining vector data with other data types.
Demo – Harnessing CrateDB’s multi-model capabilities for AI-powered applications
In this video, we explore the integration of CrateDB and PyCaret to detect anomalies in machine data, crucial for identifying potential failures or inefficiencies in technological systems. CrateDB's capability for handling large-scale data with ease pairs seamlessly with PyCaret's low-code approach to machine learning, offering a streamlined path to uncovering insights within vast datasets.
Curious to learn more?
CrateDB stands as a vector store database with key features that elevate its capabilities: vector storage and similarity search.
- Vector storage empowers users to efficiently store embeddings produced by their preferred machine learning models, creating a streamlined method for managing and accessing vectorized data.
- Similarity search enables users to effortlessly discover similarities within datasets represented as vectors, fostering advanced data exploration and in-depth analysis.
By offering these vector database capabilities within a single, scalable product, CrateDB streamlines data management, cutting down both development time and total cost of ownership.
Typical use cases for vector databases
Unlock the potential of CrateDB's vector storage and similarity search across a range of industries and applications:
E-commerce recommendations
Chatbots & customer support
Enhance customer interactions by understanding questions by their meaning rather than their exact wording. Contextualize conversations and provide better service, regardless of the terms users choose.
Anomaly & fraud detection
Multimodal search
Generative AI
Store embeddings, provide additional context in prompts, and act as conversational memory for LLM-based applications. Use vector search for retrieval augmented generation (RAG), which lets LLMs ground their answers in your own data.
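The retrieval step of RAG can be sketched in a few lines of Python. This is a minimal illustration, not CrateDB's or LangChain's implementation: the passages, their two-dimensional embeddings, the question vector, and the helper names `retrieve` and `build_prompt` are all hypothetical, and the embeddings are assumed to be unit-length so that a plain dot product ranks them by cosine similarity.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, store, k):
    """Return the k passages whose embeddings score highest against the query."""
    ranked = sorted(store, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical store of (passage, unit-length embedding) pairs.
store = [
    ("Returns are accepted within 30 days.", [1.0, 0.0]),
    ("Refunds are issued to the original payment method.", [0.6, 0.8]),
    ("Our office is closed on public holidays.", [0.0, 1.0]),
]

# In a real system an embedding model would produce this vector.
question = "How do I get my money back?"
question_vec = [0.8, 0.6]

passages = retrieve(question_vec, store, k=2)
prompt = build_prompt(question, passages)
print(prompt)
```

The resulting prompt would then be sent to the LLM. In a database-backed setup, the `retrieve` step is what a similarity query replaces.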
Additional resources on vector data
FAQ
In a geospatial context, vector data captures the geometry of points, lines, and polygons, which supports data analysis, mapping, and spatial decision-making. This kind of vector data can be stored in different file formats: Shapefile (.shp), GeoJSON (.geojson), KML (Keyhole Markup Language), and GML (Geography Markup Language). CrateDB stores vector data in a highly scalable database that can be queried using SQL, simplifying data management and reducing development time and overall costs.
Vectors are numerical representations used to quantify and compare features or characteristics of data items, such as text, images, or sounds, in a high-dimensional space. For example, a vector can look like this: [-0.32643065, -0.12308089, -0.2873811], representing a point in a multi-dimensional space. In CrateDB, vectors are stored as one-dimensional arrays of float values using the float_vector data type, allowing for efficient storage and querying of dense vector data.
A vector database is designed to store and manage high-dimensional data, grouping vectors based on their similarities. These databases use advanced indexing and search algorithms to find the most similar vectors to a given query quickly. Examples of vector databases include CrateDB, Pinecone, Zilliz, and Weaviate. CrateDB excels as a vector store database with features like vector storage and similarity search. If you want to learn more, read this blog post on how to choose the best database for vector data.
Vector data is used in various applications, including e-commerce recommendations, chatbots and customer support, anomaly and fraud detection, multimodal search, and generative AI. One of the key applications is similarity search, where algorithms like k-nearest neighbors (KNN) identify the data points most similar to a given query vector. This capability is crucial for recommendation systems, image retrieval, and anomaly detection. CrateDB enhances these applications by integrating vector storage and similarity search within a scalable database solution. Watch CrateDB’s keynote to learn more about vectors for real-time AI.
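To make the KNN idea concrete, here is a brute-force sketch in Python. It is only an illustration under stated assumptions: the item names and two-dimensional vectors are made up, the helper `knn` is hypothetical, and the score `1 / (1 + squared distance)` is an illustrative choice rather than a statement of how CrateDB computes `_score`. Production systems typically use an index instead of scanning every row.

```python
import heapq


def squared_euclidean(a, b):
    """Squared straight-line distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query, rows, k):
    """Return the k rows nearest to the query as (text, score), best first.

    rows is an iterable of (text, vector) pairs. The score maps a squared
    distance d2 to 1 / (1 + d2), so identical vectors score 1.0.
    """
    nearest = heapq.nsmallest(
        k, rows, key=lambda row: squared_euclidean(query, row[1])
    )
    return [
        (text, 1.0 / (1.0 + squared_euclidean(query, vec)))
        for text, vec in nearest
    ]


# Hypothetical catalog items with made-up 2-dimensional embeddings.
rows = [
    ("red sneakers", [1.0, 0.0]),
    ("blue sneakers", [0.9, 0.1]),
    ("garden hose", [0.0, 1.0]),
]

result = knn([1.0, 0.0], rows, k=2)
for text, score in result:
    print(f"{text}: {score:.4f}")
```

The two sneaker items come back first because their vectors lie closest to the query, while the unrelated item is left out.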
Typical distance metrics for comparing vectors include Euclidean distance, cosine similarity, and Manhattan distance. In AI and ML, they rank candidates in similarity search: the smaller the distance (or the higher the similarity), the closer the match.
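The three metrics can be written out in plain Python as follows. This is a minimal sketch for illustration: the two four-dimensional vectors are made up, and real workloads would normally rely on a numerical library or the database itself.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance; 0 means the vectors are identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings that differ only in the third component.
query = [0.3, 0.6, 0.0, 0.9]
candidate = [0.3, 0.6, 0.3, 0.9]

print("Euclidean:", euclidean_distance(query, candidate))
print("Manhattan:", manhattan_distance(query, candidate))
print("Cosine:   ", cosine_similarity(query, candidate))
```

Euclidean and Manhattan distance are sensitive to vector length, whereas cosine similarity only compares direction, which is why it is often preferred for text embeddings of varying magnitude.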