CrateDB fits naturally into the knowledge-assistant architecture outlined below, providing a unified data platform for landing zones, chunks, embeddings, configurations, operational stores, and logging and reporting. This greatly simplifies the architecture, replacing multiple specialized database technologies with a single solution.
The Issue of Proliferating Database Technologies
Unfortunately, the landscape of data challenges is shaped by the complexity and constant evolution of data architecture. It's common to begin with relational technology because of its familiarity. As user requirements evolve and expand, developers often find themselves incorporating additional capabilities into their applications, such as full-text search engines, document stores, and vector databases. In real-world applications, maintaining and scaling such a heterogeneous infrastructure is time-consuming and resource-intensive. Moreover, each new technology often requires learning a new query language, which drastically increases the effort needed to develop new applications.
The impact is felt in people, time, and money: highly skilled specialists must be hired for each language and technology, and keeping all systems in sync requires considerable effort. Both time to market and time to implement changes increase significantly, resulting in a high total cost of ownership.
How CrateDB Can Help
As AI adoption continues to grow, the need for databases that can adapt to complex data landscapes becomes paramount. A multi-model database capable of managing structured, semi-structured, and unstructured data is an ideal foundation for data modelling and application development in AI/ML scenarios. It is an enabler of complex, context-rich, real-time intelligent applications.
CrateDB combines diverse data types into single records accessible via SQL, making it easy to adopt for developers already familiar with relational databases.
Beyond native SQL, CrateDB offers dynamic schema capabilities, allowing schema changes on the fly and the definition of custom logic. Backed by a distributed storage and query engine, CrateDB supports high-volume reads and writes, making it well suited for real-time scenarios and fast, complex queries. It uses columnar storage, indexes all attributes by default (with custom indexing modes available), and ensures high availability and horizontal scalability by distributing data across nodes as they are added. Finally, CrateDB can be deployed in various scenarios: as a fully managed cloud service (available on AWS, Azure, and GCP), or self-deployed on-premises, in a private cloud, in hybrid architectures, or even on edge devices.
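To make the multi-model idea concrete, the following sketch assembles illustrative CrateDB-style SQL statements in Python, combining a structured column, a dynamic object column, and a vector column in a single table. The table and column names are hypothetical, and the statements are only printed here; executing them would require a running CrateDB instance and a client driver.

```python
# Illustrative CrateDB DDL/DML combining structured, semi-structured
# (OBJECT) and vector (FLOAT_VECTOR) data in one table. The statements
# are assembled as strings only; names and values are placeholders.

create_table = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    title TEXT,
    metadata OBJECT(DYNAMIC),
    embedding FLOAT_VECTOR(3)
);
"""

insert_row = """
INSERT INTO documents (id, title, metadata, embedding)
VALUES ('doc-1', 'CrateDB overview',
        '{"source": "handbook", "tags": ["db", "ai"]}',
        [0.1, 0.7, 0.2]);
"""

# A query mixing a structured column with a path into the dynamic object:
query = """
SELECT title, metadata['source'] AS source
FROM documents
WHERE metadata['source'] = 'handbook';
"""

for stmt in (create_table, insert_row, query):
    print(stmt.strip())
```

Because every column lives in the same record, one SQL statement can filter on relational attributes and object paths at once, which is the simplification the text describes.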
AI Ecosystem Integration
For AI-related visualization, CrateDB integrates with various tools such as Grafana, Tableau, Power BI, and Google Looker, as well as Python libraries such as Matplotlib and Plotly. These tools can be used in conjunction with CrateDB to build custom applications on top of it.
Applications that require Machine Learning and AI capabilities, such as Natural Language Processing (NLP), chatbots, classification, anomaly detection, and predictions, can easily integrate with CrateDB. It's also compatible with a number of orchestration frameworks. Furthermore, if you need to track your model training and execution, CrateDB can be used as the backend for MLflow, providing a comprehensive solution for your AI and Machine Learning initiatives.
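As a hedged sketch of the MLflow scenario: assuming the CrateDB adapter package for MLflow and a placeholder connection string (host, credentials, and schema are illustrative, not taken from this text), the setup might look roughly like this:

```shell
# Hypothetical setup: install an MLflow adapter for CrateDB
pip install mlflow-cratedb

# Point the MLflow tracking server at CrateDB as its backend store.
# The URI below is a placeholder; consult the adapter's documentation
# for the exact scheme and parameters.
mlflow-cratedb server \
  --backend-store-uri "crate://user:password@my-cratedb-host:4200"
```

Once the tracking server runs against this backend, experiment runs and metrics are persisted in CrateDB alongside the application's other data.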
LangChain Integration
LangChain is a popular framework for developing applications powered by language models. It enables applications that:
- Are context-aware: connect a language model to sources of context such as prompt instructions.
- Can reason: rely on a language model to reason about how to answer based on the provided context, which actions to take, and so on.
LangChain integrates easily with CrateDB, and the integration offers these capabilities:
- Vector store: store embeddings in CrateDB
- Document loader: load documents from CrateDB via SQL
- Message history: store conversations (user prompts, system prompts, AI responses), enabling the model to remember and maintain context throughout a conversation with a user.
The integration lets you create embeddings and chat interactions, and provides access to over 70 different LLMs.
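To make the vector-store capability above concrete, here is a minimal, self-contained sketch of the underlying idea (store embeddings, retrieve texts by similarity) in plain Python. It deliberately avoids the real LangChain and CrateDB APIs, whose class names and signatures are not given in this text; in the actual integration, storage and similarity search happen inside CrateDB rather than in memory.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """In-memory stand-in for a database-backed vector store."""

    def __init__(self):
        self.rows = []  # list of (text, embedding) tuples

    def add(self, text, embedding):
        self.rows.append((text, embedding))

    def similarity_search(self, query_embedding, k=1):
        # Rank stored rows by similarity to the query embedding.
        ranked = sorted(
            self.rows,
            key=lambda row: cosine_similarity(row[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("CrateDB stores embeddings", [0.9, 0.1, 0.0])
store.add("LangChain orchestrates LLMs", [0.1, 0.9, 0.0])

print(store.similarity_search([1.0, 0.0, 0.0], k=1))
# → ['CrateDB stores embeddings']
```

The document loader and message history capabilities follow the same pattern: plain SQL reads and writes against CrateDB tables, wrapped in the framework's interfaces.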