Microsoft has a great article that provides an Internet of Things (IoT) reference architecture for the Microsoft Azure Platform as a Service (PaaS), including Azure IoT Central and OSS components such as SMACK (Spark, Mesos, Akka, Cassandra, Kafka) deployed on Azure VMs.
Crate.io partners with Microsoft, and we have a lot of experience implementing IoT solutions for major international companies such as ALPLA, Gantner Instruments, and Roomonitor using our IoT Data Platform.
In this post, I will build on the Azure IoT reference architecture by explaining how CrateDB fits into the Azure ecosystem and can be used to supercharge your Azure tech stack.
IoT adoption has been gaining momentum, and studies suggest that this trend is only likely to continue.
The basic IoT workflow looks like this:
Let's break that down:
When adopting IoT, there are a lot of important things that need to be considered. For instance: networking, protocol support, security, privacy, cost, high-availability, data retention policies, and so on.
Typical industrial use cases (e.g., a factory) generate massive amounts of complex time-series data (i.e., data that tracks changes over time). This data often requires real-time processing, querying, visualization, and analysis.
Like IoT as a whole, databases that specialize in this sort of data are also seeing growing adoption.
So, let's take a closer look at the Azure IoT reference architecture, using the diagram from the original article:
Let's break that down:
Phew! That's a lot. But hopefully, this is a useful summary. If you want more details, check out the original article.
CrateDB is a new type of SQL database that offers many of the performance and scaling benefits typically associated with NoSQL databases (including horizontal scaling and schemaless objects) without you having to ditch SQL.
Specifically, CrateDB is an eventually consistent, distributed SQL database that uses a shared-nothing architecture. CrateDB clusters are masterless, and nodes coordinate seamlessly with each other. Query execution is automatically parallelized and distributed across the nodes in the cluster.
This architecture is well-suited to containerization, meaning a cluster running on Kubernetes can be scaled up or down as easily as running a kubectl scale command.
Scale up operations can take minutes instead of weeks.
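To make that concrete, here's a minimal sketch of what such a scaling step could look like when scripted. It assumes the cluster runs as a Kubernetes StatefulSet; the StatefulSet name, namespace, and helper function names are hypothetical, and a real deployment may need additional CrateDB settings adjusted when the node count changes.

```python
import subprocess


def build_scale_command(statefulset, replicas, namespace="default"):
    """Build the argv for `kubectl scale` against a StatefulSet.

    The StatefulSet name and namespace are placeholders; use whatever
    your own deployment defines.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        "kubectl",
        "scale",
        f"statefulset/{statefulset}",
        f"--replicas={replicas}",
        "--namespace",
        namespace,
    ]


def scale_cluster(statefulset, replicas, namespace="default"):
    """Run the scale command; requires kubectl and cluster access."""
    subprocess.run(build_scale_command(statefulset, replicas, namespace), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_scale_command("crate", 5, namespace="iot")))
```

Keeping the command construction separate from the call that executes it makes the step easy to review or dry-run before touching a live cluster.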
A modest CrateDB cluster can ingest millions of records per second while also offering real-time queries (including joins and aggregations).
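As a rough illustration of what ingestion and querying can look like from a client, here's a sketch that batches sensor readings into a single request for CrateDB's HTTP endpoint and defines a per-minute aggregation. The table name, column names, and the localhost URL are assumptions made up for this example, and the query is just one plausible shape for a time-series rollup.

```python
import json
import urllib.request

# Hypothetical table: sensor_readings(ts, sensor_id, payload), where payload
# is an object column holding whatever fields the sensor reports.
INSERT_STMT = (
    "INSERT INTO sensor_readings (ts, sensor_id, payload) VALUES (?, ?, ?)"
)

# Average temperature per sensor per minute, newest first.
AVG_PER_MINUTE_STMT = """
SELECT DATE_TRUNC('minute', ts) AS minute,
       sensor_id,
       AVG(payload['temperature']) AS avg_temperature
FROM sensor_readings
GROUP BY DATE_TRUNC('minute', ts), sensor_id
ORDER BY minute DESC
LIMIT 100
"""


def build_bulk_insert(readings):
    """Encode many readings as one bulk request body (one row per reading)."""
    bulk_args = [[r["ts"], r["sensor_id"], r["payload"]] for r in readings]
    return json.dumps({"stmt": INSERT_STMT, "bulk_args": bulk_args}).encode("utf-8")


def post_sql(body, url="http://localhost:4200/_sql"):
    """POST a request body to the HTTP endpoint (needs a reachable cluster)."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    batch = [
        {"ts": 1700000000000, "sensor_id": "press-01", "payload": {"temperature": 71.2}},
        {"ts": 1700000001000, "sensor_id": "press-02", "payload": {"temperature": 69.8}},
    ]
    print(build_bulk_insert(batch).decode("utf-8"))
```

Sending readings in batches rather than one request per row is the usual way to keep ingestion overhead low when many sensors report at once.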
Some customers report that CrateDB is 20x faster than their previous database while running on 75% less hardware.
All of this means CrateDB excels at handling the velocity, volume, and diversity of huge industrial time-series workloads.
For example, industrial sensors often increase the frequency of their measurements when values exceed configured thresholds. With multiple sensors, cascading failures can result in huge data ingestion spikes. CrateDB is able to handle these spikes without sacrificing query performance, meaning that your reporting and analysis tools won't stop working just when you need them the most.
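To get a feel for how quickly such a spike builds, here's a small sketch of threshold-based sampling. The intervals (one reading per minute normally, one per second above the threshold) and the function names are invented for illustration; real sensors will have their own configuration.

```python
def sampling_interval_s(value, threshold, normal_interval_s=60.0, alert_interval_s=1.0):
    """Return how long a sensor waits before its next measurement.

    Above the threshold the sensor switches to the shorter alert interval.
    """
    return alert_interval_s if value > threshold else normal_interval_s


def readings_per_minute(values, threshold):
    """Total readings per minute across sensors, given each sensor's latest value."""
    return sum(60.0 / sampling_interval_s(v, threshold) for v in values)


if __name__ == "__main__":
    threshold = 80.0
    print(readings_per_minute([70.0, 72.0, 71.0], threshold))  # all sensors normal
    print(readings_per_minute([85.0, 90.0, 95.0], threshold))  # all sensors alerting
```

With these assumed intervals, three sensors go from 3 readings per minute to 180 readings per minute as soon as all of them cross the threshold, which is a 60x jump in ingestion load.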
Crate.io offers a hosted CrateDB product called CrateDB Cloud which runs on and is integrated with Azure.
Now that you're familiar with CrateDB and the Azure IoT reference architecture, I can show you where CrateDB Cloud fits in the Azure tech stack to handle hot, warm, and cold path data.
Here's a modified version of the previous diagram:
Let's take a closer look.
Hot storage is for data that sees continuous use and needs to be accessed immediately.
For example, if you're collecting sensor data from machines on a factory floor, you want to be able to spot faults as soon as possible and act on that data right away.
That means you need to pick a time-series database that can handle the amount of data your sensors are producing, as well as offering real-time query facilities.
Hot storage is the most expensive sort of storage because it typically involves the most performant hardware, advanced networking setups, redundant copies of the data, multiple geographic availability zones, and so on.
The Azure reference architecture uses Azure Stream Analytics and Azure Functions to handle hot path data ingestion. But those don't offer querying or visualization functionality. If you want to do that, you can add Azure Time Series Insights (TSI) to your tech stack.
Here's an overview of TSI:
Azure TSI is good at what it does. But for the demanding workloads of industrial time-series data, CrateDB has a more competitive pricing structure than Azure TSI. (We've seen a 10x better price-performance ratio.) Contact us for more details.
Warm storage is for data that sees frequent use but can tolerate small delays in access times. Tolerating those delays means lower infrastructure costs.
The Azure reference architecture uses CosmosDB for warm storage.
CosmosDB is a globally distributed, multi-model database with Service Level Agreements (SLAs) and a focus on throughput, latency, availability, and consistency.
Because CrateDB can serve both hot and warm time-series data, you have only one product to integrate with (as opposed to using CosmosDB for warm data and Azure TSI for hot data).
Additionally, CrateDB may be a better alternative here because it is more cost efficient for extreme time-series use cases. CosmosDB can become expensive to run with large, non-transactional workloads.
Cold storage is for data that sees infrequent use and can tolerate large delays in access times.
Typically, cold storage is used for historical data, where batch processing is more common than real-time querying.
This is the least expensive sort of data storage.
Azure Blob Storage can archive data indefinitely at low cost, and, per the Azure reference architecture, this is what we recommend for cold storage.
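One simple way to think about the three paths is to route data by age. Here's a sketch of that idea; the one-day and 30-day windows are hypothetical cutoffs chosen for illustration, and age is only a stand-in for how often the data is actually accessed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows; tune these to your own access patterns.
HOT_WINDOW = timedelta(days=1)
WARM_WINDOW = timedelta(days=30)


def storage_tier(recorded_at, now):
    """Classify a reading as 'hot', 'warm', or 'cold' based on its age."""
    age = now - recorded_at
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    for days_old in (0, 7, 90):
        recorded_at = now - timedelta(days=days_old)
        print(days_old, storage_tier(recorded_at, now))
```

A rule like this could drive a scheduled job that archives anything classified as cold to low-cost storage and removes it from the database.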
Now that you know how CrateDB fits into the Azure IoT reference architecture, let's take a look at some of the additional benefits of pairing CrateDB with Microsoft Azure:
In this post, I introduced you to CrateDB and the Microsoft Azure IoT reference architecture. I then showcased how CrateDB fits in the architecture and can help you improve performance, increase capacity, reduce complexity, add features, and reduce costs.
Here are some real-life single-database stats we've seen:
If you want to know more, please get in touch!