CrateDB is a Perfect Rockset Replacement for Real-Time Analytics and Hybrid Search

First of all, congratulations to the Rockset team on their acquisition by OpenAI. It is a testament to the exceptional technology and innovative solutions Rockset has built over the years. The acquisition demonstrates the immense value and potential of Rockset's real-time analytics database and its world-class data indexing and querying capabilities.  

Unfortunately, the Rockset service will be shut down by 30 September 2024, which requires all customers to migrate to other database solutions. For some customers, moving onto the service took multiple years, so migrating away in just three months is a tall order. While there are many alternatives on the market, CrateDB is the only solution that offers a similar approach to converged indexing, full-text search, vector search, and geospatial support in a single storage engine, accessible via native SQL and HTTP endpoints. This page provides a detailed comparison to help you select the right alternative and ensure a timely and seamless migration.

Here are the top five reasons for choosing CrateDB as a Rockset replacement.

1. Converged index aka automated indexing

Rockset stores all data in a Converged Index, which combines row and column storage, as well as inverted text indexes. This enables fast, compute-efficient queries, regardless of the access pattern or shape of the data.

CrateDB automatically indexes all data in real time and uses a row- and column-based format for efficient retrieval and fast aggregation of data, regardless of whether you store a simple table structure or deeply nested JSON. In addition, full-text and vector indexes can be combined to build hybrid search functionality. By comparison, Rockset’s BM25 implementation has only been available in private beta.
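To make the idea of indexing one document in several ways concrete, here is a toy Python sketch. It is a conceptual illustration only and says nothing about how Rockset or CrateDB lay out data internally; the class, method, and field names are hypothetical. Each inserted document is written to a row store (lookup of whole records), a column store (aggregation over one field), and an inverted index (term search).

```python
from collections import defaultdict


class ToyConvergedIndex:
    """Toy model: every document is indexed three ways at insert time."""

    def __init__(self):
        self.rows = {}                    # doc id -> full document
        self.columns = defaultdict(dict)  # field -> {doc id -> value}
        self.inverted = defaultdict(set)  # token -> {doc ids}

    def insert(self, doc_id, doc):
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            if isinstance(value, str):
                # Naive whitespace tokenizer, for illustration only.
                for token in value.lower().split():
                    self.inverted[token].add(doc_id)

    def get(self, doc_id):
        """Row access: fetch one complete document."""
        return self.rows[doc_id]

    def average(self, field):
        """Columnar access: aggregate over a single field."""
        values = list(self.columns[field].values())
        return sum(values) / len(values)

    def search(self, term):
        """Inverted-index access: find documents containing a term."""
        return sorted(self.inverted.get(term.lower(), set()))


index = ToyConvergedIndex()
index.insert(1, {"name": "pump station north", "temperature": 20.0})
index.insert(2, {"name": "pump station south", "temperature": 30.0})
index.insert(3, {"name": "conveyor belt", "temperature": 25.0})
```

With this data, `index.search("pump")` finds both pump stations via the inverted index, while `index.average("temperature")` touches only the temperature column.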

The most recent independent benchmark of CrateDB demonstrates that CrateDB can not only index all data in real time during the very demanding 1-billion-row ingest phase of the TSBS benchmark, but also does so 20x faster than MongoDB.

You can learn more about this powerful combination to build real-time monitoring, anomaly detection, prediction, and company-wide chatbots in the presentation by our customer TGW Logistics, which builds fully automated warehouse solutions across the globe. This use case combines complex JSON handling, real-time indexing, real-time querying (incl. our most recent release of a foreign data wrapper to access and join other databases in real time), as well as vector indexing and search.

2. Fully-Featured SQL and HTTP interface

Rockset uses SQL as its native query language – for search, aggregations, and joins on semi-structured data. Query lambdas offer easy integration from services. 
 
CrateDB uses SQL, offers JOINs, and implements the PostgreSQL wire protocol, which opens up a wide range of integrations. The HTTP endpoint lets you execute parameterized SQL queries from any kind of service.
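As a rough illustration of the HTTP endpoint, the sketch below builds (but does not send) a parameterized SQL request in Python. It assumes a CrateDB node reachable at http://localhost:4200 and a hypothetical table `sensor_readings`; the `_sql` path and the `stmt`/`args` JSON fields follow CrateDB's HTTP endpoint, while the helper name and the query are illustrative only.

```python
import json
import urllib.request

# Assumption: a local CrateDB node listening on the default HTTP port.
CRATEDB_URL = "http://localhost:4200/_sql"


def build_sql_request(stmt, args=None, url=CRATEDB_URL):
    """Build a POST request carrying a parameterized SQL statement."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = list(args)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table and columns, used only for illustration.
request = build_sql_request(
    "SELECT device_id, avg(temperature) FROM sensor_readings "
    "WHERE ts > ? GROUP BY device_id",
    args=["2024-07-01T00:00:00Z"],
)

# To execute against a running cluster:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)  # includes "cols" and "rows"
```

Keeping the values in `args` rather than interpolating them into the statement is what makes the same query reusable from different services.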

3. Support for handling structured, semi-structured, and unstructured data

Rockset users can easily analyze free-form JSON or text documents, incl. smart schema management.

CrateDB lets you ingest nested JSON as-is, without transformation or preprocessing. The schema can be dynamic, strict, or ignored entirely. CrateDB provides easy access to nested attributes using plain SQL. The database automatically indexes all attributes, regardless of their depth, enabling fast searches and efficient updates.
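The following Python sketch shows what access to nested attributes can look like. It assumes a hypothetical table `events` whose `payload` column is declared as `OBJECT(DYNAMIC)`; CrateDB addresses nested attributes with bracket subscripts, and the helper below (an illustrative name, not part of any client library) simply turns a dotted path into that notation.

```python
def nested_column(column, path):
    """Turn a dotted path into bracket-subscript notation.

    Example: ("payload", "device.location.city") becomes
    payload['device']['location']['city'].
    """
    keys = [key for key in path.split(".") if key]
    for key in keys:
        if "'" in key:
            # Keep the sketch simple: refuse keys that would need escaping.
            raise ValueError(f"unsupported key: {key!r}")
    return column + "".join(f"['{key}']" for key in keys)


# Hypothetical table: the object column accepts new attributes on the fly.
CREATE_TABLE = """
CREATE TABLE events (
    id TEXT PRIMARY KEY,
    payload OBJECT(DYNAMIC)
)
"""

city = nested_column("payload", "device.location.city")
QUERY = f"SELECT {city}, count(*) FROM events GROUP BY {city}"
```

Swapping `DYNAMIC` for `STRICT` or `IGNORED` in the column definition selects one of the other schema policies mentioned above.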

4. Support for real-time streaming and updates

Rockset is a database designed to deliver millisecond query latencies over event streams from multiple systems. Every attribute is mutable, i.e. attributes can also be updated individually in real time. To achieve this performance, Rockset offers local SSD storage for its virtual instances and cloud storage that is shared across multiple virtual instances.

CrateDB Cloud offers high-performance SSDs on all plans to deliver unparalleled performance for real-time analytics and search use cases. In-place updates of individual attributes are available in the same fashion as in Rockset, incl. consistency for by-primary-key queries. CrateDB’s lock-free ingestion and query mechanism is built on Optimistic Concurrency Control and delivers millisecond query latencies over billions of rows ingested into the database in real time, even with thousands of concurrent read users.
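The idea behind optimistic concurrency control can be sketched in a few lines of Python. This is a toy in-memory model, not CrateDB's implementation: a writer remembers the version it read, and its update is applied only if that version is still current, so no locks are held while reading. All class and method names are hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a record changed since the caller read it."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # primary key -> (version, attributes)

    def insert(self, key, attributes):
        self._rows[key] = (1, dict(attributes))

    def read(self, key):
        version, attributes = self._rows[key]
        return version, dict(attributes)

    def update(self, key, changes, expected_version):
        """Apply a partial update only if nobody else wrote in between."""
        version, attributes = self._rows[key]
        if version != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {version}"
            )
        attributes.update(changes)  # only the given attributes change
        self._rows[key] = (version + 1, attributes)
        return version + 1


store = VersionedStore()
store.insert("sensor-1", {"temperature": 20.5, "status": "ok"})

version, _ = store.read("sensor-1")
new_version = store.update("sensor-1", {"status": "warn"}, expected_version=version)

# A second writer still holding the old version is rejected and must re-read.
try:
    store.update("sensor-1", {"temperature": 21.0}, expected_version=version)
    conflict = False
except VersionConflict:
    conflict = True
```

The rejected writer loses nothing but a retry: it reads the record again, picks up the new version, and reapplies its change.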

5. CrateDB is fully open source and deployment-agnostic

Rockset is a proprietary solution available as a managed service on AWS only. As the abrupt shutdown of the service (or sudden price increases, like Microsoft Azure's back in 2023) demonstrates, it is important to run a critical service like a database on an open solution that is robust against market volatility.

CrateDB not only offers a managed service on AWS, Azure, and GCP, but is also fully open source, with a clear commitment to remain so. This has several advantages, outlined below.

Cost Control and Full Flexibility

With CrateDB as an open-source database, customers can customize and extend the software to suit their needs without relying on the vendor. Additional deployment options open up, ranging from single-node instances at the edge (e.g. for real-time monitoring and predictions) through on-premises installations to private, public, and hybrid cloud scenarios. Multiple installation methods are available: Docker containers, the Kubernetes operator, or binaries for multiple operating systems, giving you maximum flexibility.

Transparency and Security

The disclosure of the source code allows security researchers and developers to discover and fix potential vulnerabilities before they can be exploited. Companies can ensure that there are no hidden backdoors or unwanted data accesses in the code, which is often not verifiable with proprietary solutions.

No Vendor Lock-In 

With proprietary solutions, there is a risk that the vendor or a hyperscaler might discontinue support or change terms, leading to unpredictable costs and risks. CrateDB offers a multi-cloud service: Sudden increase in cloud prices of one hyperscaler – just move to another one. You don’t like CrateDB Cloud anymore – move to self-deployed. Running CrateDB in the cloud is as easy as five clicks (literally!).

Active Community and Innovation 

The community of developers and users contributes to rapid development and enhancement of the software. Companies can benefit from the latest developments and integrate them into their own solutions without waiting for the next version of proprietary software.

Interoperability and Open Standards

CrateDB supports common protocols and formats, enabling seamless integration with existing systems. Companies can leverage open standards to avoid vendor lock-in and future-proof their systems. 

Migration Offering

As it is never easy to get started with a new technology, we are offering free migration and consulting services to ensure a seamless transition from Rockset to CrateDB. Even if you have already started testing on your own and just have a quick question about some queries, we are here to help.

Feature Comparison

This section provides a detailed feature comparison to help you make an informed decision. These comparisons are not just our own claims: we also validated with the Rockset team that this is a fair comparison, to avoid any bias.

 

| Feature | Rockset | CrateDB |
| --- | --- | --- |
| Licensing | | |
| Open Source to avoid vendor lock-in + Cloud Service on par with Open Source | | |
| Deployment | | |
| SaaS Serverless | | |
| SaaS – Shared (for test and development workloads) | | |
| SaaS – Dedicated | | |
| SaaS – In your Environment | | |
| Multi-Cloud (AWS, Azure, GCP, incl. Marketplace Integrations) | | |
| On-premises | | |
| Edge | | |
| Kubernetes Operator | | |
| Scaling | T-Shirt Sizes | Increase resources of individual service that needs more compute and/or storage |
| Separation of Concerns | (Compute/Storage, Compute/Compute) | (Share-nothing with individual scaling of compute/storage and optionally workload isolation, horizontal scaling to thousands of concurrent sessions) |
| Ingestion | | |
| Real-Time Ingestion | | |
| Bulk File Ingestion | | |
| Upserts | | |
| Partial Upserts | | |
| Deduplication | | |
| Insert-Time Aggregations | | |
| Time Partitioning | | |
| Column Value Partitioning | ☑️ (Fixed count, PK only) | ✅ (Configurable, any column, multiple columns) |
| Write API | | |
| Materialized Views | ☑️ (On-Demand, Query Lambdas) | ☑️ (On-Demand, Scheduler) |
| Backups | | (Incremental Backups, Restore on Table and Partition Level) |
| Query Language | | |
| Postgres Compatible Query Language | | |
| Wire-Compatible Driver | | |
| Joins | | |
| Collocated Joins | | |
| Nested JSON Support | | (Strict, Dynamic, Ignored Schema, incl. indexing of all attributes, independent of depth) |
| Pagination | | |
| User-Defined Functions | (JavaScript) | ✅ (JavaScript) |
| Query Lambdas | | (HTTP requests with parameterized SQL queries) |
| RBAC | | |
| Table Aliases | | ☑️ (Views, Temporary Query Alias) |
| Query Scaling | (Multiple Virtual Instances) | (Horizontal Scaling, Partitioning & Sharding, Replicas) |
| Indices | | |
| Automated Indexing of Attributes | | |
| Converged (row, columnar, inverted, search, vector) | | |
| Real-Time Indexing | | |
| Mutability (Real-time updates don’t require full reindexing) | | |
| Configurable per Column | | (Disable indexing and columnar store on a per-column level, Enable BM25 / full-text index per single or combined columns) |
| Columnar | | |
| Inverted Index for Text | | |
| Full-Text for Fuzzy Search, etc. | (BM25 ranking in private beta only) | (Fuzzy Search, Phrase Search, Boosting, …) |
| Time-Column Partitioned | | (Partitioning across any single or multiple attributes) |
| Geospatial | | |
| JSON, incl. automatic schema derivation | | (Strict, Dynamic, Ignored Schema) |
| Vector | | |

Conclusion

CrateDB is a fully open source, scalable, and cost-effective real-time analytics database combining complex JSON handling, time series, geospatial data, full-text search, and vector search in one single storage engine that automatically indexes all your data to achieve millisecond response times for any kind of incoming query and aggregation.

While we respect the capabilities of other database solutions, we believe CrateDB offers a unique combination of features that makes it an ideal choice as a Rockset replacement. We invite you to explore CrateDB's capabilities and see how they align with your specific needs.

We encourage you to test the product and reach out to us with any questions. Remember, we are offering a free migration service to get you started and to integrate CrateDB into your data and streaming landscape, and we are happy to help you implement your requirements. Reach out to us via the website or book an appointment directly with one of our specialists. We look forward to hearing from you.