IoT Analytics at Scale: Architecture Guide for Industrial Data

Every IoT analytics deployment starts the same way: sensors generate data, a collection agent picks it up, and a dashboard displays it. The architecture looks manageable on a whiteboard. The problems surface at scale.

Most IoT time series databases were designed for monitoring workloads: fixed metric sets with pre-defined aggregations and dashboards that refresh on a schedule. Industrial workloads break that model in two places. First, the device count grows without warning: a single distribution center can reach 900,000 sensors, and a global packaging manufacturer runs 900 distinct sensor types across 181 factories. Second, the queries that matter most to operations teams cannot always be anticipated in advance.

This guide covers the architecture for real-time industrial IoT analytics: how data moves from sensor to SQL, why the database choice determines whether you can query that data while it is still live, and what the SQL patterns look like in practice. All three layers, with specifics.

The three-layer IoT analytics architecture

Real-time IoT analytics has three distinct layers: ingestion, query, and visualization. Each has a clear job. Getting the boundary between them right is what makes the system maintainable as data volumes grow.

Layer 1: Ingestion via Telegraf

Telegraf is an open-source collection agent that reads from OPC-UA, MQTT, Modbus, and dozens of other industrial protocols, then writes to configurable outputs. For industrial IoT analytics, the output is CrateDB.

The CrateDB output plugin writes sensor data over the PostgreSQL wire protocol. A minimal Telegraf configuration looks like this:

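[[outputs.cratedb]]
  # PostgreSQL wire protocol connection URL (credentials and host are placeholders)
  url = "postgresql://user:password@cratedb-host:5432/sensor_data"
  # Table that receives the collected metrics
  table = "sensor_readings"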

Telegraf handles protocol translation, batching, and retry logic. It runs at the edge or close to your OT network, collecting from PLCs, historians, and SCADA systems without requiring direct network access to the database from the factory floor.

Layer 2: Query with CrateDB

CrateDB receives the sensor data and auto-indexes every field on ingestion. Data becomes available for SQL queries shortly after arrival, governed by the table's refresh interval (one second by default). No batch loading step, no pre-aggregation required, no schema migration when a new sensor type appears.

The shared-nothing distributed architecture means query execution scales horizontally. Adding nodes handles more concurrent dashboards, higher ingestion rates, and larger datasets without downtime or schema changes.

Layer 3: Visualization with Grafana

Grafana connects to CrateDB via the PostgreSQL datasource plugin, using a standard PostgreSQL connection string. Every SQL query runs directly against live sensor data. Dashboard refresh intervals determine query frequency. The data is always current.

The full pipeline: sensor → Telegraf → CrateDB → Grafana.

No export jobs. No staging tables. No scheduled aggregation tasks. The latency between a sensor event and a visible dashboard update is measured in seconds, not minutes.

Why most IoT time series databases stall at industrial scale

The IoT time series database category was built around a monitoring model: fixed sets of metrics with known cardinality, queried through pre-defined dashboards. Industrial data breaks two assumptions in that model.

Cardinality. Most time-series databases organize data by series, where a series is a unique combination of metric name and tag values. At low cardinality, this works well. At industrial scale, it degrades structurally. A factory with 900 sensor types, 50 production lines, and 10 product variants can generate up to 900 × 50 × 10 = 450,000 unique series per facility. At 181 facilities, that is over 81 million unique series. The storage model that works for IT monitoring does not survive this arithmetic.
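The cardinality arithmetic can be checked with a short Python sketch. The function name is hypothetical, and the calculation assumes the worst case in which every sensor type appears on every production line for every product variant, so the result is an upper bound rather than a measured figure.

```python
from math import prod


def series_count(*tag_cardinalities: int) -> int:
    """Upper bound on unique series: the product of distinct values per tag."""
    return prod(tag_cardinalities)


# Figures from the example: sensor types, production lines, product variants.
per_facility = series_count(900, 50, 10)

# Each facility contributes its own set of series, so the fleet total scales linearly.
fleet_wide = per_facility * 181

print(f"{per_facility:,} series per facility")
print(f"{fleet_wide:,} series across all facilities")
```

Adding one more tag dimension multiplies the total again, which is why series-per-tag-combination storage models are sensitive to each new dimension.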

Query flexibility. Monitoring workloads have predictable query patterns: "show me CPU utilization for this host over the last hour." Industrial analytics does not. A maintenance engineer investigating a quality event needs to correlate temperature, vibration, and throughput on a specific asset during a specific production run, joined against a parts catalog. That query does not exist in any dashboard until the engineer writes it. An industrial IoT analytics database must support ad-hoc queries over high-cardinality data without requiring pre-aggregation.

CrateDB is architected for high-cardinality, high-dimensionality analytics. Columnar storage and automatic indexing serve ad-hoc SQL queries across hundreds of millions of sensor records without requiring practitioners to define query patterns in advance. You should not need to know what you will ask before the database will answer it. For how high cardinality intersects with JSON schema evolution, unified observability, and mixed workloads in modern industrial architectures, see Beyond Time-Series: High-Cardinality Analytics, Flexible JSON, and Unified Observability.

Real-time IoT analytics in standard SQL

SQL is the query language your data engineers already know. Using it for industrial IoT analytics means your existing drivers, BI tools, and dashboards all connect to CrateDB unchanged via the PostgreSQL wire protocol. Here are two patterns that appear frequently in industrial deployments.

Time-bucketed sensor aggregation

This query computes per-minute averages, minimums, and maximums across all active sensors over the last hour. It runs on live data — no pre-computed summary tables.

Schema (illustrative):

CREATE TABLE sensor_readings (
    ts           TIMESTAMP WITH TIME ZONE NOT NULL,
    sensor_id    TEXT,
    sensor_type  TEXT,
    asset_id     TEXT,
    plant_id     TEXT,
    value        DOUBLE,
    month_bucket TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS DATE_TRUNC('month', ts)
) PARTITIONED BY (month_bucket);
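To make the partitioning column concrete, here is a minimal Python sketch of what truncating a timestamp to its month does. It assumes UTC timestamps and only mirrors the DATE_TRUNC('month', ts) expression in the schema; the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def month_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the first instant of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# Hypothetical reading timestamps that straddle a month boundary.
readings = [
    datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc),
    datetime(2024, 2, 1, 0, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 2, 15, 12, 30, 0, tzinfo=timezone.utc),
]

# Rows that share a month_bucket value belong to the same partition.
partitions = {month_bucket(ts) for ts in readings}
print(sorted(partitions))
```

Three readings map to two distinct bucket values here, so they would be stored in two partitions. A query that filters on a recent time window only needs the partitions for the months it touches.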

Aggregation query:

SELECT
    sensor_id,
    DATE_TRUNC('minute', ts) AS minute_bucket,
    AVG(value)               AS avg_value,
    MIN(value)               AS min_value,
    MAX(value)               AS max_value
FROM sensor_readings
WHERE ts > NOW() - INTERVAL '1 hour'
GROUP BY sensor_id, minute_bucket
ORDER BY sensor_id, minute_bucket;

This is the foundation of a live sensor dashboard. Grafana runs this on a configurable refresh interval. The WHERE clause uses NOW() so the query always covers the most recent window, regardless of when it executes.
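For readers who want to see what the aggregation computes, the following Python sketch reproduces the grouping logic on a handful of in-memory rows. It is an illustration only: the function name and sample readings are hypothetical, the WHERE filter on the last hour is omitted, and in a real deployment the database performs this work.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_per_minute(rows):
    """Group (sensor_id, ts, value) rows by sensor and minute; return avg/min/max."""
    groups = defaultdict(list)
    for sensor_id, ts, value in rows:
        # Equivalent of DATE_TRUNC('minute', ts)
        minute = ts.replace(second=0, microsecond=0)
        groups[(sensor_id, minute)].append(value)
    return [
        {
            "sensor_id": sensor_id,
            "minute_bucket": minute,
            "avg_value": sum(values) / len(values),
            "min_value": min(values),
            "max_value": max(values),
        }
        # Sorting the keys mirrors ORDER BY sensor_id, minute_bucket
        for (sensor_id, minute), values in sorted(groups.items())
    ]


# Hypothetical sample readings for illustration.
utc = timezone.utc
sample = [
    ("s1", datetime(2024, 5, 1, 10, 0, 5, tzinfo=utc), 20.0),
    ("s1", datetime(2024, 5, 1, 10, 0, 40, tzinfo=utc), 22.0),
    ("s1", datetime(2024, 5, 1, 10, 1, 10, tzinfo=utc), 30.0),
    ("s2", datetime(2024, 5, 1, 10, 0, 15, tzinfo=utc), 5.0),
]
result = aggregate_per_minute(sample)
for row in result:
    print(row)
```

The two readings from sensor s1 in the same minute collapse into one row, while the reading one minute later forms its own bucket.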

Cross-asset threshold detection

This query identifies assets where the average value for a specific sensor type exceeded a threshold in the last 15 minutes — across all plants in the dataset simultaneously.

SELECT
    asset_id,
    plant_id,
    AVG(value)   AS avg_reading,
    COUNT(*)     AS sample_count
FROM sensor_readings
WHERE
    sensor_type = 'temperature'
    AND ts > NOW() - INTERVAL '15 minutes'
GROUP BY asset_id, plant_id
HAVING AVG(value) > 85.0
ORDER BY avg_reading DESC;

CrateDB executes this across all cluster nodes in parallel. The result covers every plant in the deployment. A maintenance engineer does not run one query per facility and manually combine the results.
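The filter, group, and threshold steps of that query can be sketched in plain Python to show how WHERE, GROUP BY, HAVING, and ORDER BY interact. The function name, the dictionary row layout, and the sample data are assumptions made for illustration; they are not part of any CrateDB or Grafana API.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def assets_over_threshold(rows, now, threshold=85.0, window=timedelta(minutes=15)):
    """Return assets whose average temperature in the window exceeds the threshold."""
    cutoff = now - window
    groups = defaultdict(list)
    for row in rows:
        # WHERE sensor_type = 'temperature' AND ts > NOW() - INTERVAL '15 minutes'
        if row["sensor_type"] == "temperature" and row["ts"] > cutoff:
            # GROUP BY asset_id, plant_id
            groups[(row["asset_id"], row["plant_id"])].append(row["value"])
    results = [
        {
            "asset_id": asset_id,
            "plant_id": plant_id,
            "avg_reading": sum(values) / len(values),
            "sample_count": len(values),
        }
        for (asset_id, plant_id), values in groups.items()
    ]
    # HAVING AVG(value) > threshold
    flagged = [r for r in results if r["avg_reading"] > threshold]
    # ORDER BY avg_reading DESC
    return sorted(flagged, key=lambda r: r["avg_reading"], reverse=True)


def reading(asset_id, plant_id, sensor_type, ts, value):
    """Build one hypothetical sensor row."""
    return {"asset_id": asset_id, "plant_id": plant_id,
            "sensor_type": sensor_type, "ts": ts, "value": value}


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    reading("press-1", "plant-a", "temperature", now - timedelta(minutes=2), 90.0),
    reading("press-1", "plant-a", "temperature", now - timedelta(minutes=5), 88.0),
    reading("press-2", "plant-b", "temperature", now - timedelta(minutes=3), 80.0),
    reading("oven-7", "plant-b", "temperature", now - timedelta(minutes=1), 95.0),
    reading("oven-7", "plant-b", "temperature", now - timedelta(minutes=30), 10.0),
    reading("press-1", "plant-a", "vibration", now - timedelta(minutes=1), 500.0),
]
alerts = assets_over_threshold(rows, now)
for alert in alerts:
    print(alert)
```

Note that the stale oven-7 reading from 30 minutes ago is excluded before averaging. The threshold applies to the group average, not to individual samples.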

For OEE-specific analytics — Availability, Performance, and Quality calculated from live production data — see OEE Analytics on Live Data: How to Move from Nightly Exports to Real-Time Dashboards, which includes the full OEE SQL schema and query, plus the ALPLA proof point in detail.

Industrial IoT analytics in production: three deployments at scale

The architecture above runs in production today. Here are three deployments at industrial scale.

ALPLA: 900 sensor types, 181 factories, 250x faster

ALPLA manufactures packaging for global consumer brands including Coca-Cola and Unilever. They operate 181 facilities in 46 countries, each running up to 900 distinct sensor types feeding into a centralized production monitoring system.

Before CrateDB, production queries ran against Microsoft SQL Server. Execution time was 3 to 5 minutes per query. Cross-facility comparisons — the queries that let central operations identify underperforming sites and direct improvement resources — were too slow to use as day-to-day operational tools.

After migrating to CrateDB, query time dropped from the 3-to-5-minute range to milliseconds: a 250x improvement. ALPLA's 900 sensor types per factory land in a single table using CrateDB's dynamic schema capability. New sensor types are absorbed at ingestion without schema migrations or pipeline changes. ALPLA moved their existing SQL queries directly from SQL Server to CrateDB without rewriting them; PostgreSQL wire protocol compatibility meant their reporting tooling connected without changes.

"By collecting and analyzing sensor data in real time, we can direct people to the 'hot spots' and improve waste rate and efficiency." — Jodok Schäffler, General Manager, ALPLA

Full story: cratedb.com/stories/alpla

ABB Ability Genix: 1 million values ingested per second

ABB's Ability Genix industrial AI platform handles predictive maintenance and operational analytics for manufacturing customers globally. CrateDB ingests 1 million values per second, with event retrieval rates of 30,000 to 120,000 events per second. The deployment supports multi-platform data tiering — hot, warm, and cold storage — across production environments at industrial scale.

"Working with CrateDB brings positive outcomes. The ingestion and throughput have very good performance, with 1 million values/sec, the horizontal scalability where we can add as many nodes as we need and the automatic query distribution across the whole cluster." — Marko Sommarberg, Lead, Digital Strategy and Business Development, ABB

Full story: cratedb.com/stories/abb

TGW Logistics: 900,000 sensors per distribution center

TGW Logistics Group designs and operates automated distribution centers for global retailers. Each center runs 900,000 sensors, with CrateDB processing over 100,000 messages every few seconds. The same CrateDB deployment feeds real-time visibility dashboards for TGW's customers, predictive modeling for the operations team, and digital twin workloads for engineering — all from one database, on both cloud and on-premises infrastructure.

"CrateDB allows us to operate on any Cloud and on-prem/Edge with simplicity and stellar performance, and significant cost advantages." — Alexander Mann, Owner Connected Warehouse Architecture, TGW Logistics Group

Full story: cratedb.com/stories/tgw-logistics

Choosing the right database for industrial IoT analytics

Industrial IoT analytics is not a monitoring problem. It is a distributed SQL problem: high-cardinality sensor data, concurrent dashboard loads, ad-hoc queries across multi-dimensional datasets, and live ingestion with no tolerance for batch latency.

The IoT database guide covers how to evaluate database choices for sensor data workloads, including ingestion protocol support, query model trade-offs, and cardinality handling. For teams evaluating whether a purpose-built time-series database or a distributed SQL database better fits their workload, that guide includes the architectural comparisons that matter.

CrateDB is not the right database for every workload. If your IoT deployment is low-cardinality metric monitoring with pre-defined dashboards and no cross-dimensional queries, a time-series monitoring tool may be sufficient. If your deployment involves high device counts, mixed sensor types, ad-hoc queries from engineers, or any workload that requires joining sensor data with operational metadata, CrateDB is built for that intersection.

For teams extending this into predictive maintenance (fault detection, anomaly scoring, and maintenance triggers from the same sensor stream), the Predictive Maintenance Database Architecture guide covers the SQL patterns and schema design.

Getting started

The Telegraf to CrateDB to Grafana pipeline runs locally in Docker in under 30 minutes. The exploration path at cratedb.com/explore walks through the full setup: database, ingestion configuration, and a live SQL dashboard against your own data.

For production deployments on managed infrastructure, CrateDB Cloud removes the operational overhead. For teams with on-premises or edge requirements, CrateDB Enterprise supports the same SQL interface across factory-floor, private cloud, and hybrid deployments.

Real-time IoT analytics does not require replacing your dashboard tool, rewriting your queries, or redesigning your ingestion pipeline from scratch. It requires a database that can keep up with your sensors.

Try CrateDB Live — sensor to live SQL dashboard in under 30 minutes.
