First of all, congratulations to the Rockset team on their acquisition by OpenAI. It is a testament to the exceptional technology and innovative solutions Rockset has built over the years. The acquisition demonstrates the immense value and potential of Rockset's real-time analytics database and its world-class data indexing and querying capabilities.
Unfortunately, the Rockset service will be shut down by 30 September 2024, which requires all customers to offboard to other database solutions. Migrating into the service sometimes took multiple years, so migrating away in just three months is a tall order. While there are many alternatives on the market, CrateDB is the only solution that offers a similar approach to converged indexing, full-text search, vector search, and geospatial support in a single storage engine, accessible via native SQL and HTTP endpoints. This page provides a detailed comparison to help you select the right alternative and ensure a timely and seamless migration.
Here are the top five reasons for choosing CrateDB as a Rockset replacement.
Rockset stores all data in a Converged Index, which combines row and column storage, as well as inverted text indexes. This enables fast, compute-efficient queries, regardless of the access pattern or shape of the data.
CrateDB automatically indexes all data in real-time and uses a row- and column-based format for efficient retrieval and fast aggregation of data – no matter if a simple table structure or a deeply nested JSON is stored. In addition, full-text indexes and vector indexes can be leveraged to build hybrid search functionality. Rockset’s BM25 implementation has only been available in private beta.
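As a rough illustration of hybrid search, the sketch below merges a full-text ranking and a vector ranking with reciprocal rank fusion (RRF). RRF is chosen here only as one common fusion technique and is not stated in the source; the table `docs`, its columns, and the two SQL strings are hypothetical and assume CrateDB's `MATCH` predicate and `KNN_MATCH` function.

```python
from collections import defaultdict

# Hypothetical statements (table and column names are assumptions):
FULLTEXT_SQL = "SELECT id FROM docs WHERE MATCH(body, ?) ORDER BY _score DESC LIMIT 10"
VECTOR_SQL = "SELECT id FROM docs WHERE KNN_MATCH(embedding, ?, 10) ORDER BY _score DESC LIMIT 10"


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists: each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Stand-ins for the ids returned by the two queries above.
fulltext_hits = ["doc-2", "doc-1", "doc-3"]
vector_hits = ["doc-1", "doc-4", "doc-2"]
fused = reciprocal_rank_fusion([fulltext_hits, vector_hits])
```

Documents that appear in both result lists rise to the top, while documents found by only one index still remain in the fused ranking.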
The most recent independent benchmark of CrateDB demonstrates that CrateDB can not only index all data in real time during the very demanding 1-billion-row ingest phase of the TSBS benchmark, but also does so 20x faster than MongoDB.
You can learn more about this powerful combination to build real-time monitoring, anomaly detection, prediction, and company-wide chatbots in the presentation by our customer TGW Logistics, which builds fully automated warehouse solutions across the globe. This use case combines complex JSON handling, real-time indexing, real-time querying (incl. our most recent release of a foreign data wrapper to access and join other databases in real time), as well as vector indexing and search.
Rockset uses SQL as its native query language – for search, aggregations, and joins on semi-structured data. Query lambdas offer easy integration from services.
CrateDB uses SQL, offers JOINs, and implements the PostgreSQL Wire Protocol, which opens up a huge set of integrations. The HTTP endpoint lets you execute and parameterize SQL queries from any kind of service.
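A minimal sketch of calling the HTTP endpoint from a service, using only the Python standard library. It assumes a CrateDB node reachable at `http://localhost:4200` and a hypothetical `sensors` table; the helper names are illustrative. The request body carries the statement in `stmt` and the positional parameters in `args`.

```python
import json
import urllib.request


def build_sql_request(base_url, stmt, args=None):
    """Build a POST request for the /_sql endpoint (nothing is sent yet)."""
    payload = {"stmt": stmt}
    if args is not None:
        payload["args"] = list(args)  # values bound to the ? placeholders
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url, stmt, args=None):
    """Send the statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_sql_request(base_url, stmt, args)) as resp:
        return json.load(resp)


# Hypothetical table and columns; the URL assumes a local default setup.
request = build_sql_request(
    "http://localhost:4200",
    "SELECT name, temperature FROM sensors WHERE temperature > ? LIMIT ?",
    [25.0, 10],
)
```

Calling `run_sql(...)` with the same arguments would execute the query against a running cluster. Keeping values in `args` rather than formatting them into the statement avoids quoting problems and SQL injection.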
Rockset users can easily analyze freely structured JSON or text documents, incl. smart schema management.
CrateDB lets you ingest nested JSON as-is, without transformation or preprocessing. The schema can be either dynamic, strict, or completely ignored. CrateDB provides easy access to nested attributes using plain SQL. The database automatically indexes all attributes, regardless of their depth, enabling fast searches and efficient updates.
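The following sketch shows how the schema policy and nested attribute access might look. The table `events`, its columns, and the helper `nested_column` are hypothetical; the statements assume CrateDB's `OBJECT(DYNAMIC)` column type (with `STRICT` and `IGNORED` as the other policies) and its subscript syntax for nested attributes.

```python
# Hypothetical table: new keys inside `payload` are added to the schema as they arrive.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    payload OBJECT(DYNAMIC)
)
"""


def nested_column(column, *path):
    """Turn ('payload', 'device', 'id') into payload['device']['id']."""
    for part in (column, *path):
        # Only plain identifiers are accepted, so nothing needs escaping.
        if not part.replace("_", "").isalnum():
            raise ValueError(f"unsupported identifier: {part!r}")
    return column + "".join(f"['{key}']" for key in path)


QUERY = (
    f"SELECT {nested_column('payload', 'device', 'id')} AS device_id "
    f"FROM events WHERE {nested_column('payload', 'temperature')} > ?"
)
```

The resulting `QUERY` string filters and projects attributes at different nesting depths with ordinary SQL and a bound parameter.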
Rockset is a database designed to deliver millisecond query latencies over event streams from multiple systems. Every attribute is mutable, i.e. attributes can be updated individually in real time as well. To achieve this performance, Rockset offers local SSD storage for the virtual instances and cloud storage that is shared across multiple virtual instances.
CrateDB Cloud offers high-performance SSDs on all plans to guarantee unparalleled performance for real-time analytics and search use cases. In-place updates of individual attributes are available in the same fashion as in Rockset, incl. consistency for by-primary-key queries. CrateDB’s lock-free ingestion and query mechanism is achieved via Optimistic Concurrency Control, and ensures millisecond query latencies over billions of rows ingested in real time into the database, while serving thousands of concurrent read users.
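To illustrate the general idea behind Optimistic Concurrency Control, here is a toy in-memory model. It is not CrateDB's implementation: the class `VersionedStore` and its methods are invented for this sketch. A writer remembers the version it read and its update is applied only if that version is still current, so no lock is held between the read and the write.

```python
class VersionedStore:
    """Toy key-value store where every row carries a version counter."""

    def __init__(self):
        self._rows = {}  # key -> (version, attributes)

    def insert(self, key, attrs):
        self._rows[key] = (1, dict(attrs))

    def read(self, key):
        version, attrs = self._rows[key]
        return version, dict(attrs)

    def update_if_unchanged(self, key, expected_version, changes):
        """Apply a partial update only if nobody else wrote in between."""
        version, attrs = self._rows[key]
        if version != expected_version:
            return False  # conflict: the caller should re-read and retry
        self._rows[key] = (version + 1, {**attrs, **changes})
        return True


store = VersionedStore()
store.insert("sensor-1", {"temp": 20.5, "status": "ok"})
seen_version, _ = store.read("sensor-1")

# Two writers both read version 1; only the first update is accepted.
first = store.update_if_unchanged("sensor-1", seen_version, {"temp": 21.0})
second = store.update_if_unchanged("sensor-1", seen_version, {"status": "stale"})
```

The rejected writer loses nothing: it re-reads the row, sees the new version, and retries its partial update against the current state.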
Rockset is a proprietary solution available as a managed service in AWS only. As the abrupt shutdown of the service (or other sudden price increases, like Microsoft Azure back in 2023) demonstrates, it is important to run a critical service like a database on an open solution that is robust against market volatility.
CrateDB not only offers a managed service available on AWS, Azure, and GCP, but is also fully open source with a clear commitment to remain so. This has several advantages, outlined below.
Cost Control and Full Flexibility
With CrateDB as an open-source database, customers can customize and extend the software to suit their needs without relying on the vendor. Additional deployment options arise, ranging from single-node instances on the Edge (e.g. for real-time monitoring and predictions) and on-premises installations to private, public, and hybrid cloud scenarios. Multiple installation methods are available, either in Docker containers, via the Kubernetes operator, or via binaries for multiple operating systems, giving you maximum flexibility.
Transparency and Security
The disclosure of the source code allows security researchers and developers to discover and fix potential vulnerabilities before they can be exploited. Companies can ensure that there are no hidden backdoors or unwanted data accesses in the code, which is often not verifiable with proprietary solutions.
No Vendor Lock-In
With proprietary solutions, there is a risk that the vendor or a hyperscaler might discontinue support or change terms, leading to unpredictable costs and risks. CrateDB offers a multi-cloud service: if one hyperscaler suddenly raises its prices, just move to another one. If you no longer like CrateDB Cloud, move to a self-managed deployment. Running CrateDB in the cloud is as easy as five clicks (literally!).
Active Community and Innovation
The community of developers and users contributes to rapid development and enhancement of the software. Companies can benefit from the latest developments and integrate them into their own solutions without waiting for the next version of proprietary software. Visit CrateDB Community
Interoperability and Open Standards
CrateDB supports common protocols and formats, enabling seamless integration with existing systems. Companies can leverage open standards to avoid vendor lock-in and future-proof their systems.
As it is never easy to get started with new technology, we are offering free migration and consulting services to ensure a seamless transition from Rockset to CrateDB. Even if you just have a quick question about some queries because you have already tested on your own, we are here to help.
This section provides a detailed comparison of features to enable you to make an informed decision. These comparisons are not just our own claims: to avoid any bias, we also validated with the Rockset team that this is a fair comparison.
| Feature | Rockset | CrateDB |
| --- | --- | --- |
| Licensing | | |
| Open Source to avoid vendor lock-in + Cloud Service on par with Open Source | ❌ | ✅ |
| Deployment | | |
| SaaS Serverless | ✅ | ❌ |
| SaaS – Shared (for test and development workloads) | ❌ | ✅ |
| SaaS – Dedicated | ❌ | ✅ |
| SaaS – In your Environment | ❌ | ✅ |
| Multi-Cloud | ❌ | ✅ (AWS, Azure, GCP, incl. Marketplace Integrations) |
| On-premises | ❌ | ✅ |
| Edge | ❌ | ✅ |
| Kubernetes Operator | ❌ | ✅ |
| Scaling | T-Shirt Sizes | Increase resources of individual service that needs more compute and/or storage |
| Separation of Concerns | ✅ (Compute/Storage, Compute/Compute) | ✅ (Share-nothing with individual scaling of compute/storage and optionally workload isolation, horizontal scaling to thousands of concurrent sessions) |
| Ingestion | | |
| Real-Time Ingestion | ✅ | ✅ |
| Bulk File Ingestion | ✅ | ✅ |
| Upserts | ✅ | ✅ |
| Partial Upserts | ✅ | ✅ |
| Deduplication | ❌ | ✅ |
| Insert-Time Aggregations | ✅ | ✅ |
| Time Partitioning | ✅ | ✅ |
| Column Value Partitioning | ☑️ | ✅ (Configurable, any column, multiple columns) |
| Write API | ✅ | ✅ |
| Materialized Views | ☑️ | ☑️ |
| Backups | ❌ | ✅ (Incremental Backups, |
| Query Language | | |
| Postgres Compatible Query Language | ✅ | ✅ |
| Wire-Compatible Driver | ❌ | ✅ |
| Joins | ✅ | ✅ |
| Collocated Joins | ✅ | ✅ |
| Nested JSON Support | ✅ | ✅ |
| Pagination | ✅ | ✅ |
| User-Defined Functions | ✅ (JavaScript) | ✅ (JavaScript) |
| Query Lambdas | ✅ | ✅ |
| RBAC | ✅ | ✅ |
| Table Aliases | ✅ | ☑️ |
| Query Scaling | ✅ | ✅ |
| Indices | | |
| Automated Indexing of Attributes | ✅ | ✅ |
| Converged (row, columnar, inverted, search, vector) | ✅ | ✅ |
| Real-Time Indexing | ✅ | ✅ |
| Mutability (Real-time updates don’t require full reindexing) | ✅ | ✅ |
| Configurable per Column | ❌ | ✅ |
| Columnar | ✅ | ✅ |
| Inverted Index for Text | ✅ | ✅ |
| Full-Text for Fuzzy Search, etc. | ❌ | ✅ (Fuzzy Search, Phrase Search, Boosting, …) |
| Time-Column Partitioned | ✅ | ✅ |
| Geospatial | ✅ | ✅ |
| JSON, incl. automatic schema derivation | ✅ | ✅ |
| Vector | ✅ | ✅ |
CrateDB is a fully open source, scalable, and cost-effective real-time analytics database combining complex JSON handling, time series, geospatial data, full-text search, and vector search in one single storage engine that automatically indexes all your data to achieve millisecond response times for any kind of incoming query and aggregation.
While we respect the capabilities of other database solutions, we believe CrateDB offers a unique combination of features that makes it an ideal choice as a Rockset replacement. We invite you to explore CrateDB's capabilities and see how they align with your specific needs.
We encourage you to test the product and reach out to us with any questions. Remember, we are offering a free migration service to get you started and to integrate CrateDB into your data and streaming landscape. We are happy to help you implement your requirements. Reach out to us via our website or book an appointment directly with one of our specialists. We look forward to hearing from you.