Let's take a moment to understand how CrateDB solves the challenges associated with today's diverse data sources and the demands of modern systems or application development.
When developing a system or application that needs to store and access data in a database, you'll often start by using a relational database system. Soon you realise you also need a search solution and add a search engine to the architecture. Now you have to manage data replication processes to keep that engine's data in sync with the relational database. Maybe you also need to store complex objects or arbitrary JSON, along with streaming data from sensors. Adding a document store and a time series database helps you to meet these needs.
As your system grows, geospatial data becomes important and you also need normalised abstractions for machine learning. To work with embeddings, a vector store is added. In the end you have a very complex architecture with a lot of data replication, many different technologies and a different language in use for each of them. This adds complexity to deployment, monitoring and operations. It also creates additional development overhead due to the different query languages and data abstractions used by each data store in the system. There are issues and implications with adopting this sort of data architecture.
Let's begin with the issues. Having a data store per use case requires data integration and synchronisation. It's difficult to manage scaling multiple data stores up and out, especially with time series databases. You also end up with a complex application back end because you're developing with different languages and in different data silos. This leads to growing complexity and technical debt over time. There are implications for your people: you'll need multiple skill sets, and more people to maintain this complexity. There's an implication for time to delivery as well. Time to value is slow because changes take a long time and there are many overhead activities associated with all of these different data stores. Overall, this leads to a high total cost of ownership and a high cost of change.
So what does a solution to these problems look like? Ideally, we want a single source of truth that keeps data up to date in near real time. We want to support multiple different data types in a single technology, and we want that technology to be performant and to scale.
Let's look at how CrateDB solves these challenges. CrateDB offers a multi-model approach that can cover tables, time series, geospatial data, documents or JSON, binary objects and vectors. All of these data types are accessible via standard SQL by data consumers. A dynamic schema makes the database more resilient and easier to change. Functions allow for more complex tasks. This is backed by a distributed query engine that allows for high-volume concurrent reads and writes. Columnar storage and advanced indexing help to support fast complex queries and ease development, as not all indexes need to be defined manually. The distributed nature allows for highly available and horizontally scalable architectures.
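To make the multi-model idea concrete, here is a minimal sketch of one table that mixes several of these data types and is queried with a single SQL statement. The table name, column names and sample values are hypothetical and chosen only for illustration. The sketch assumes a DB-API style cursor that accepts `?` placeholders, such as the one provided by the `crate` Python client, and a CrateDB version recent enough to support the `FLOAT_VECTOR` type.

```python
# Hypothetical table: one row combines a timestamp (time series), a plain
# text column (relational), a geo point (geospatial), a dynamic object
# (JSON document), a full-text indexed column (search) and a vector.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts        TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    location  GEO_POINT,
    payload   OBJECT(DYNAMIC),
    notes     TEXT INDEX USING FULLTEXT,
    embedding FLOAT_VECTOR(3)
)
"""

INSERT_ROW = """
INSERT INTO sensor_readings (ts, device_id, location, payload, notes, embedding)
VALUES (?, ?, ?, ?, ?, ?)
"""

# One query touching full-text search, a JSON sub-field and a geo distance.
QUERY = """
SELECT device_id, payload['temperature'] AS temperature
FROM sensor_readings
WHERE MATCH(notes, ?)
  AND payload['temperature'] > ?
  AND distance(location, ?) < ?
ORDER BY ts DESC
LIMIT 10
"""


def run_demo(cursor):
    """Create the table, insert one sample row and run the combined query."""
    cursor.execute(CREATE_TABLE)
    cursor.execute(
        INSERT_ROW,
        (
            "2024-01-01T00:00:00Z",               # timestamp as ISO 8601 text
            "dev-1",
            [9.74, 47.41],                        # geo point as [lon, lat]
            {"temperature": 41.5, "unit": "C"},   # arbitrary JSON payload
            "fan failure, device overheating",
            [0.1, 0.2, 0.3],
        ),
    )
    # Make the freshly written row visible to the following query.
    cursor.execute("REFRESH TABLE sensor_readings")
    # Distance is in metres; the reference point is given as WKT text.
    cursor.execute(QUERY, ("overheating", 30, "POINT (9.74 47.41)", 1000))
    return cursor.fetchall()
```

Assuming a CrateDB node is reachable at `http://localhost:4200` and the `crate` package is installed, `run_demo(client.connect("http://localhost:4200").cursor())`, with `client` imported from `crate`, would run the sketch end to end. Note that the `payload` keys were never declared up front: with a dynamic object, new keys are added to the schema as data arrives.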
CrateDB can be deployed on the edge, in your data centres and in cloud environments. It's also available as a managed service. A synchronisation method helps to keep the edge and other clusters in sync. For example, you could store and process individual data on the edge and synchronise relevant information into the cloud to perform holistic analysis.