Independent Time Series Benchmark Confirms CrateDB’s Top-Tier Performance

Read now
Skip to content
Blog

5 Myths About Relational Databases, Time Series Data, and Operational Historians

This article is more than 4 years old

Relational databases are a go-to solution for many applications. However, traditional relational database architecture can often cause problems when you're working with the business intelligence demands of the Industrial Internet of Things (IIoT) and time series data. To address these issues, many people turn to modern time series databases or traditional operational historians.

But what if you could have your cake and eat it?

In this post, I will take a look at a few myths that contribute to the false idea that you must give up SQL and choose between a time series database and an operational historian.

Myth: Relational databases can't scale to handle industrial time series workloads

While it's true that storage and compute power is cheap, IIoT and time series data applications are capable of continuously generating previously unfathomable amounts of data that needs to be ingested and analyzed in real-time.

For these use cases, traditional Relational Database Management Systems (RDBMSes) can start to be a pain.

Many traditional RDBMSes are designed to scale vertically, which means buying new hardware and upgrading your machines every time you want to increase capacity. That process can be slow, error-prone, and expensive.

In response, many people turn to operational historians or NoSQL solutions. However, both of these options have downsides to consider.

Operational historians are expensive, difficult to operate, and typically behind the times. On the other hand, NoSQL time series databases require you to develop and maintain your own business intelligence software.

But there's no need to throw the SQL baby out with the bathwater.

With CrateDB, you get the power of an RDBMS (including ANSI SQL, replication, sharding, and snapshots) combined with all the benefits typically associated with NoSQL: distributed computing, limitless horizontal scalability, high-availability, containerization, and much more.

And, for business intelligence, you can use the Crate IoT Data Platform, a fully-managed operational historian and mission control Platform as a Service (PaaS) for discrete manufacturers.

Myth: Relational databases are too slow for industrial time series workloads

Traditional RDBMSes often struggle with query execution performance, write performance, read performance, or some combination of the three.

Why? Well, consider:

  1. Most RDBMSes provide ACID semantics: atomicity, consistency, isolation, and durability. Basically, once you've successfully written something to the database, you can be sure that any subsequent reads provide an up-to-date and consistent view of the data.ACID is a neat feature, but it is computationally expensive. Additionally, strong consistency can only be provided if a single write master handles all writes.
  2. Because strong consistency limits your RDBMS to a single write master, the capacity and the performance of your database is constrained by what you can achieve with a single machine.

One solution for this is to switch to BASE semantics: basically available, soft state, eventual consistency. These sorts of systems can be distributed and run across a cluster of any size, neatly solving most traditional performance and scaling issues.

However, in doing so, most eventually consistent databases do away with the relational model, and SQL to boot. (Hence why this new breed of database picked up the moniker NoSQL.)

Baby, meet bathwater.

Fortunately, it doesn't have to be like this. CrateDB is one of a new breed of relational databases that can be clustered to provide distributed SQL query execution with eventual consistency.

Myth: Using a relational database as a historian has been done before and it doesn't work

People have been trying to use traditional RDBMes as operational historians for decades. And yes, they run into issues, because traditional RDBMSes are challenging to scale for the needs of industrial time series workloads.

But that doesn't mean that relational databases are the problem. Single write-master restrictions and vertical scaling are the problems.

To solve those problems, it is common to switch to a distributed, horizontally-scalable system.

And sure, most databases that can do this also ditch the relational model. But it doesn't have to be that way. With a distributed SQL database like CrateDB, you can have the best of both worlds.

Myth: The only options are a modern time series database or a traditional operational historian

We debunk this myth in detail in a previous post.

In summary:

  1. Modern time series databases provide good performance and scalability, but you'll have to roll your own business intelligence features.
  2. On the other hand, traditional historians come with batteries included but are costly and challenging to work with.

With CrateDB and the Crate IoT Data Platform, you get a relational database with the best of both worlds: it's specialized for time series data, and it can do everything a traditional operational historian can do.

Myth: Working with time series data in a relational database is difficult

Indeed, working with time series data often requires a specialized approach. And that is easier with a specialized database. However, there's no particular reason why a relational database cannot be specialized for time series data.

For instance, here are some of the ways in which CrateDB is specialized for time series data:

  1. Distributed processing, data partitioning, and a multi-threaded architecture enable fast, linearly scalable time series data ingestion and limitless data storage without a performance penalty.
  2. Parallel processing, in-memory columnar indexes, and window functions enable real-time, sophisticated analytics, and machine learning on time series data.
  3. Adapt to changing data with dynamic schemas and complex objects queryable to an arbitrary depth with standard SQL.
  4. Elastic scaling, sharding, and replication for fast, always-on performance that stays consistent as data volume and concurrent clients grow.
  5. First-class interfaces to Azure IoT Hub, Azure Event Hubs, Prometheus, Telegraf, and other SQL tools (e.g., Grafana) via the Postgres wire protocol or HTTP REST API.

Wrap up

In this post, I detailed five myths about relational databases, time series data, and operational historians.

In particular, I argued against the false idea that you have to choose between using a relational database and working with series data or the business intelligence demands of discrete manufacturing.


Got questions? No problem! Get in touch, and we'll be happy to help.