
CrateDB as a Cost-Effective Alternative to Rockset

Over the past few weeks, we have had multiple conversations with Rockset users who need to migrate to a different solution within just three months. In addition to the risk of feature and performance discrepancies, many of them also fear massively increased costs.

We provide a detailed feature comparison in a separate article and have also compared the streaming ingest performance of Rockset and CrateDB based on Rockbench. In addition, we have already gathered experience in a few migration projects. This post combines that knowledge to compare costs between Rockset and CrateDB.

Cost Comparison of Rockbench Execution 

While comparing streaming ingest performance, we saw not only that CrateDB is on average 6-9x faster than Rockset for streaming ingests, but also that it comes at lower cost. For a detailed explanation, please see the benchmark blog post.

The benchmark runs Rockset on 2XLarge and 4XLarge virtual instances, which we mapped to the CR4 node configurations available in CrateDB Cloud. The table below compares the two.

Rockset                        CrateDB
2XLarge                        4 Nodes CR4
Allocated Compute: 64 vCPU     Allocated Compute: 64 vCPU
Allocated RAM: 512 GB          Allocated RAM: 220 GB

4XLarge                        8 Nodes CR4
Allocated Compute: 128 vCPU    Allocated Compute: 128 vCPU
Allocated RAM: 1,024 GB        Allocated RAM: 440 GB


Both Rockset and CrateDB distinguish between compute and storage size. Prices for both are compared on AWS in us-east-1 (N. Virginia).

We compared costs for two scenarios per instance size: a non-HA setup and a high-availability (HA) setup. Because many customers use Rockset not just as an analytical database but build their whole business on top of it, high availability must be ensured. The typical Rockset setup requires at least two virtual instances and parallel ingest to achieve zero-downtime failover and maintenance. As CrateDB already runs as a multi-node cluster, we can increase the replication factor to one, so that a single node can fail without any impact on the availability of the overall cluster; only the storage size needs to be increased to provide sufficient disk space. Availability can be raised further by increasing the replication factor.
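The storage impact of the HA setup can be estimated with a short calculation. The following is a minimal sketch, assuming that each replica holds a full copy of the primary data, so that a replication factor of one roughly doubles the required disk space. The function name and the 1,000 GiB figure are hypothetical and not taken from the benchmark.

```python
def required_storage_gib(primary_data_gib: float, replicas: int) -> float:
    """Estimate total disk space when every replica is a full copy of the data.

    Assumption: one primary copy plus `replicas` additional copies, ignoring
    any headroom for segment merges, snapshots, or data growth.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return primary_data_gib * (1 + replicas)


# Hypothetical example: 1,000 GiB of primary data.
non_ha_gib = required_storage_gib(1000, replicas=0)  # no replica, non-HA
ha_gib = required_storage_gib(1000, replicas=1)      # replication factor of one
print(f"non-HA: {non_ha_gib:.0f} GiB, HA: {ha_gib:.0f} GiB")
```

In this simplified model, compute stays the same between the two setups and only the storage term grows.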

As the different scenarios show, CrateDB is about 25% more cost-effective than Rockset in the non-HA scenario and about 60% more cost-effective in a true HA setup. This holds for both the 2XLarge and the 4XLarge virtual instance sizes. Even the HA scenario in CrateDB is more cost-effective than the non-HA (single virtual instance) scenario in Rockset.

Comparison of the 2XLarge scenario between Rockset and CrateDB, Non-HA and HA

 

Comparison of the 4XLarge scenario between Rockset and CrateDB, Non-HA and HA

 

Pricing Comparison of Example Pricing Scenarios on the Rockset Website

Rockset provides five example scenarios on its website to illustrate its pricing. We combined the experience from real-world customer projects with the Rockbench results to compare prices for each of these scenarios. The scenarios show that CrateDB is in a similar price range and is often somewhat cheaper. The exceptions are the smaller development scenarios, which can be mitigated either by using the forever-free tier in CrateDB Cloud or by running a self-deployed instance on the developer workstation.

Scenario 1: Geosearch Application (Development)

You start to develop a geosearch feature for your logistics tracking application using Rockset. As you are in development mode, you select a shared virtual instance, nano, for your workload. 

You build your app on streaming data from Apache Kafka. You ingest 5 GiB of uncompressed data daily from Kafka. The total size of data on disk after compression and indexing for every day is 10 GiB. You retain the data for 1 day. You use Rockset’s developer edition in AWS US-West.

Scenario 2: Recommendation Engine

You go to production on a recommendation application using Rockset. Your recommendation application uses a small virtual instance during peak times, 30% of the month, and scales down to an XSmall virtual instance during off peak times, 70% of the month. 

Your recommendation application is built on data from MongoDB. You have 500 GiB of data already in MongoDB and ingest 5 GiB of change data from MongoDB every day during the month. Your total data size on disk after indexing and compression in Rockset remains at 500 GiB for the first 20 days and then increases to 700 GiB for the last 10 days. You use Rockset’s general purpose instance class in the standard edition in AWS US-West.
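Because the data size changes during the month, a single storage figure is needed to price this scenario. The sketch below computes a time-weighted average, assuming storage is charged on the average size held over the month; the scenario text does not state the billing rule, so treat that as an assumption. The function name is hypothetical.

```python
def average_storage_gib(periods: list[tuple[int, float]]) -> float:
    """Time-weighted average storage over a list of (days, size_in_gib) periods."""
    total_days = sum(days for days, _ in periods)
    if total_days <= 0:
        raise ValueError("periods must cover at least one day")
    return sum(days * gib for days, gib in periods) / total_days


# Scenario 2: 500 GiB for the first 20 days, 700 GiB for the last 10 days.
avg_gib = average_storage_gib([(20, 500.0), (10, 700.0)])
print(f"{avg_gib:.1f} GiB")  # 566.7 GiB
```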

Scenario 3: In-App Search and Analytics

You go to production on in-app search and analytics using Rockset. You isolate the search application from the analytics application using virtual instances. 

The virtual instance for the search application uses a Medium virtual instance for peak periods, 25% of the time, and an XSmall virtual instance for off-peak periods, 75% of the time.

The virtual instance for the analytics application uses a small virtual instance for peak periods, 25% of the time, and an XSmall virtual instance for off peak periods, 75% of the time. 
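The peak and off-peak shares above translate into instance hours per size, which is the quantity a compute price would be applied to. The following sketch assumes a 30-day month (720 hours); the helper name is hypothetical, and no prices are included because the scenario does not list them.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def instance_hours(shares: dict[str, float],
                   hours: float = HOURS_PER_MONTH) -> dict[str, float]:
    """Split the hours of a month across instance sizes by their time share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("time shares must add up to 100%")
    return {size: share * hours for size, share in shares.items()}


# Scenario 3: search and analytics run on separate virtual instances.
search_hours = instance_hours({"Medium": 0.25, "XSmall": 0.75})
analytics_hours = instance_hours({"Small": 0.25, "XSmall": 0.75})
print(search_hours)     # {'Medium': 180.0, 'XSmall': 540.0}
print(analytics_hours)  # {'Small': 180.0, 'XSmall': 540.0}
```

Multiplying each entry by the hourly rate of the corresponding instance size would give the monthly compute cost per application.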

Scenario 4: Real-Time Game Telemetry

You go to production on an application that collects real-time game behavior data to inform development. You isolate the compute for ingestion and indexing from analytics for predictable performance at scale using virtual instances. 

Your gaming data streams through Amazon Kinesis into Rockset for analysis. You ingest 500 telemetry events per second with an average size of 1 KiB per event. When retained for 30 days that is roughly 1,200 GiB of data. You use an XSmall virtual instance that can support streaming ingestion rates of 2 MiB/second. 
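The data volume in this scenario follows directly from the event rate. A minimal sketch of the arithmetic, using only the figures stated above and counting raw event size (any overhead or savings from compression and indexing are not modeled):

```python
events_per_second = 500
event_size_kib = 1
retention_days = 30

SECONDS_PER_DAY = 24 * 60 * 60
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024

# Sustained ingest rate, to compare against the 2 MiB/s the XSmall instance supports.
ingest_rate_mib_s = events_per_second * event_size_kib / KIB_PER_MIB

# Raw volume accumulated over the retention window.
retained_gib = (events_per_second * event_size_kib
                * SECONDS_PER_DAY * retention_days) / KIB_PER_GIB

print(f"ingest rate: {ingest_rate_mib_s:.2f} MiB/s")  # about 0.49 MiB/s
print(f"retained:    {retained_gib:.0f} GiB")         # about 1,236 GiB
```

The result is in line with the "roughly 1,200 GiB" quoted in the scenario, and the ingest rate stays well below the 2 MiB/s limit.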

The virtual instance for game analytics uses a Medium virtual instance for peak periods, 20% of the time, and an XSmall virtual instance for off-peak periods, 80% of the time.

Scenario 5: Anomaly Detection Application

You are in development on an anomaly detection application using Rockset. Your anomaly detection application streams data from Confluent Cloud. You ingest 10 GB of uncompressed data daily from Confluent Cloud. The size of data on disk after compression and indexing for every day is 20 GB. You retain the data for 30 days. 

As your total dataset size is 600 GB, over the storage size limit of shared virtual instances, you decide to use the standard edition in AWS US-West. 

You do not require access to the freshest data to build your application, so you configure microbatching with ingestion batched in intervals of 2 hours. You use the XSmall virtual instance with a peak streaming ingest rate of 2 MiB/s. Every 2 hours, 833 MiB have accumulated, which take 7 minutes to batch load into Rockset.
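The microbatching figures can be reproduced with a few lines of arithmetic. This sketch follows the scenario in treating the daily 10 GB as 10,000 MiB, which is a simplifying assumption about units; the variable names are illustrative.

```python
daily_ingest_mib = 10_000   # assumption: the scenario's 10 GB treated as 10,000 MiB
batch_interval_hours = 2
peak_ingest_mib_s = 2

batches_per_day = 24 // batch_interval_hours            # 12 batches
batch_size_mib = daily_ingest_mib / batches_per_day     # data accumulated per batch
load_minutes = batch_size_mib / peak_ingest_mib_s / 60  # time to load one batch

# Total time per day the ingest instance is actually busy loading.
ingest_hours_per_day = load_minutes * batches_per_day / 60

print(f"{batch_size_mib:.0f} MiB per batch, {load_minutes:.1f} min per load")
print(f"{ingest_hours_per_day:.2f} ingest hours per day")
```

Under these assumptions the ingest instance is busy for well under two hours a day, which is why microbatching reduces the compute that has to be paid for.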

You use an XSmall virtual instance to test the performance of application queries. You spin up the virtual instance when building your application and spin down when you are done for the day. On average, you spend 4 hours a workday using the query virtual instance. 

 

Conclusion

Based on the feature comparison, benchmark results, and cost comparisons in this post, CrateDB is a perfect replacement for Rockset, including from a commercial point of view. For a limited time, we are offering free migration services to achieve a seamless transition from Rockset to CrateDB. Please reach out to us via the website or book a meeting with one of our solution engineers to discuss your use case in detail.