CrateDB on Acceleration for Data Lakes

Move analytics from traditional architectures to modern architectures

A couple of years ago, data lakes became an early standard for storing large volumes of data and running business analytics on them. Today, requirements have increased, and real-time access at scale is the new normal. This is where CrateDB comes in.

In essence, data lakes built on cloud object stores are optimized for storing large data volumes but struggle with real-time analytics at scale. CrateDB is a modern component that enhances and accelerates analytical performance for Hadoop, Azure Data Lake, AWS S3, and more.

Traditional data lake architecture 

In a traditional data lake architecture:

  • Multiple data sources produce event data that is stored in various databases.
  • Batch-oriented, ETL-style data collection processes slow down synchronization and consolidation.
  • Multiple projects and experts are needed to make the data available through high-performance SQL.
  • As a result, dashboards and analytics are slow and complicated.

Such a data lake infrastructure can be simplified and accelerated for analytics with a database that offers fast, scalable data ingestion and sub-second queries on large data sets, while keeping the benefits and simplicity of standard SQL.

Introducing a modern architecture  

We are introducing a modern architecture in which CrateDB augments existing data stores, tools, and applications while simplifying the stack and greatly expanding the accessibility of data and the interoperability with surrounding systems.

  • Multiple sources can easily store data directly in CrateDB, in various data formats, and through almost any pipeline connection.
  • Real-time transformation and synchronization happen across multiple databases (think “real-time ETL”), while a distributed database that combines relational and semi-structured data exposes fast, scalable SQL for easy analytics access. BI strategies can be implemented with any BI tool (e.g., Power BI) or any data science/AI tool through open, standard SQL connectors.
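
To make the idea of combining relational and semi-structured data more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the table `sensor_events`, its columns, and the helper functions are hypothetical names chosen for this example. The sketch keeps two fixed columns relational and stores the rest of each event in a dynamic object column, which can then be queried with plain SQL.

```python
# Minimal sketch, assuming a DB-API style cursor such as the one returned by
# the crate Python client (crate.client.connect(...).cursor()).
# The table name, column names, and helper functions are hypothetical.

# Fixed relational columns plus a dynamic object column for everything else.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_events (
    ts TIMESTAMP WITH TIME ZONE,
    device_id TEXT,
    payload OBJECT(DYNAMIC)
)
"""

# Question-mark placeholders, one per column.
INSERT_EVENT = (
    "INSERT INTO sensor_events (ts, device_id, payload) VALUES (?, ?, ?)"
)

# Semi-structured fields are addressed with subscript notation in SQL.
AVG_TEMPERATURE = """
SELECT device_id, avg(payload['temperature']) AS avg_temp
FROM sensor_events
GROUP BY device_id
"""


def to_row(event):
    """Split an event dict into relational columns and an object payload."""
    payload = {k: v for k, v in event.items() if k not in ("ts", "device_id")}
    return (event["ts"], event["device_id"], payload)


def ingest(cursor, events):
    """Create the table if needed and bulk-insert events; return the row count."""
    cursor.execute(CREATE_TABLE)
    rows = [to_row(event) for event in events]
    cursor.executemany(INSERT_EVENT, rows)
    return len(rows)
```

With a live connection, `ingest(cursor, events)` followed by `cursor.execute(AVG_TEMPERATURE)` would give the kind of result a BI tool could read over a standard SQL connector. New fields that show up in later events land in `payload` without a schema change.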

In addition, CrateDB can be integrated with existing legacy architecture for archiving (e.g., cold storage) and data science processing (e.g., Spark) housed in legacy data lakes.

The modern architecture and database solution of CrateDB lets users benefit from real-time analytics performance across several data sources, with scalable SQL on a cost-effective platform integrated into Microsoft Azure or other hyperscaler cloud environments.